
Data Bingo! Oh no!




Oh boy - look what a data hunter has dragged in this time! Why is this problem so common? And who on earth is Bonferroni?

Our friend here found one "statistically significant" result when he looked at goodness knows how many differences between groups of people. He's fallen totally for a statistical illusion that's a hazard of 'multiple testing'. And a lot of headline writers and readers will fall for it, too.

Then he's made it worse by taking his unproven hypothesis (that a particular drink on a particular day in a particular group of people prevented stroke) and whacking on another unproven hypothesis (that if everyone else drinks lots of it, benefits will ensue). But it's the problem of multiple testing (also called multiplicity) where Olive Jean Dunn comes in.

It's pretty much inevitable that multiple testing will churn out some totally random, unreliable answers.

A "statistically significant" difference isn't proof that the difference is a "real" one that could hold true for others. But it estimates that the probability of finding a difference roughly like this in this data, if it's not real, is less than 5 in 100, or 5% (a "p" value of less than 0.05).

If you test for multiple possibilities and none of the differences are real, you need to expect about 5 out of every 100 tests (or 1 in 20) to throw up a statistically significant "finding" that's just a freak occurrence. If you test only a few things, your chance of this kind of random error is fairly low.
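To put rough numbers on that, here is a small sketch of how the chance of at least one fluke grows with the number of tests. It assumes the tests are independent and that there are no real differences at all, which is a simplification. The function name is made up for illustration.

```python
def chance_of_false_alarm(num_tests, alpha=0.05):
    """Chance of at least one 'significant' result by luck alone.

    Assumes independent tests and no real differences anywhere.
    """
    # Each test avoids a false alarm with probability (1 - alpha),
    # so all of them avoid one with probability (1 - alpha) ** num_tests.
    return 1 - (1 - alpha) ** num_tests


for m in (1, 3, 10, 20, 100):
    print(f"{m:>3} tests: {chance_of_false_alarm(m):.0%}")
```

Under those assumptions, 3 tests carry about a 14% chance of at least one fluke, and 20 tests about 64%.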

But especially if you have a big dataset, the more things you look at, the higher the chance is that you'll drag total nonsense out. With high-powered computers crunching big data, this becomes a big problem - large numbers of spurious findings that can't be replicated.
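A quick simulation can make the data bingo effect concrete. This is only a sketch under stated assumptions: every "test" compares two groups drawn from exactly the same distribution (so no difference is real), and it uses a simple z-test with a known standard deviation of 1 rather than whatever test a real study would use. All names here are invented for the example.

```python
import random
from statistics import NormalDist, fmean


def p_value_no_real_difference(rng, group_size=50):
    """Compare two groups drawn from the same population; return the p-value."""
    a = [rng.gauss(0, 1) for _ in range(group_size)]
    b = [rng.gauss(0, 1) for _ in range(group_size)]
    # The difference in means has variance 1/n + 1/n = 2/n when the sd is 1.
    z = (fmean(a) - fmean(b)) / (2 / group_size) ** 0.5
    # Two-sided p-value from the standard normal distribution.
    return 2 * (1 - NormalDist().cdf(abs(z)))


def count_false_alarms(num_tests, alpha=0.05, seed=1):
    """Count how many tests come up 'significant' when nothing is real."""
    rng = random.Random(seed)
    return sum(p_value_no_real_difference(rng) < alpha for _ in range(num_tests))


print(count_false_alarms(1000), "spurious 'findings' out of 1000 tests")
```

The exact count depends on the seed, but it should land somewhere around 50 out of 1000, and every one of those would be nonsense.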

Carlo Bonferroni (1892-1960) was an Italian mathematician. His name graces some statistical tests used to interpret results when doing multiple tests. But the multiple testing methods with his name that we use today were developed by Olive Jean Dunn, in papers she published in 1959 and 1961 [PDF].
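The basic Bonferroni correction is simple enough to sketch in a few lines: divide the significance threshold by the number of tests, and only count a result if its p-value clears that stricter bar. The p-values below are invented for illustration, and the function name is hypothetical.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag which p-values survive a Bonferroni correction."""
    if not p_values:
        return []
    # Share the overall alpha equally among all the tests.
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


# Five made-up p-values: three are below 0.05 before any correction.
p_values = [0.001, 0.012, 0.04, 0.30, 0.75]
print(bonferroni_significant(p_values))
# -> [True, False, False, False, False]
```

With five tests the bar drops from 0.05 to 0.01, so only the first result still counts.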

There are other ways of approaching these problems. Some are concerned that techniques based on the Bonferroni correction are too conservative - too likely to throw the baby out with the bathwater, if you like. So they use measures that have a different basis, such as the False Discovery Rate (FDR) [PDF].
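As an illustration of the FDR idea, here is a sketch of the Benjamini-Hochberg procedure, one widely used way of controlling the false discovery rate. That choice is an assumption for the example, since the post doesn't say which FDR method it has in mind. The p-values are invented, and the procedure's guarantee is usually stated for independent tests.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Flag which p-values count as discoveries at the given FDR."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value is at most (k / m) * fdr.
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * fdr:
            cutoff_rank = rank
    # Everything ranked at or below the cutoff is kept.
    keep = [False] * m
    for i in order[:cutoff_rank]:
        keep[i] = True
    return keep


# Five made-up p-values: three are below 0.05 before any correction.
p_values = [0.001, 0.012, 0.04, 0.30, 0.75]
print(benjamini_hochberg(p_values))
# -> [True, True, False, False, False]
```

On these made-up numbers, a plain Bonferroni cut-off of 0.05 / 5 = 0.01 would keep only the first result, while this procedure keeps two. That's the sense in which it's less conservative.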

Statistical tests can't totally eliminate the chance of random error, though. So you usually need more than just a single possibly random test result to be sure about something.

Getting more technical...

What about multiplicity issues in systematic reviews? As the Cochrane Handbook (section 16.7.2) points out, systematic reviews concentrate on estimating pre-specified effects - not searching for possible effects. Safeguards still matter, though. Even pre-specified analyses need to be kept to a minimum. And how many analyses were done needs to be kept in mind when interpreting results.

If you would like to read more technical information about multiple testing, here are some free slides from the University of Washington. And if you want to read more about the controversies and issues, here's a primer in Nature and an article in the Journal of Clinical Epidemiology (behind paywalls).



Update on 20 March 2016: added Olive Jean Dunn (including creating a biography page for her on Wikipedia), changed the cartoon to have the female scientist calling for her aid rather than Bonferroni's, and refined the description of statistical significance - thanks to feedback from an anonymous commenter.
