Education: Power

Underpowered research has been described by Lemuel Moyé as looking for something in the basement… without bothering to switch the light on.

The concept of statistical power is fairly intuitive. To detect or observe an effect of interest, a researcher may need a magnifying glass. That instrument, however, offers only limited precision: smaller effects require a more powerful optical microscope, and smaller ones still may demand a larger, more powerful (and more expensive) electron microscope.

Clearly there is an inverse relationship between the size of the thing you’re looking for and the size/precision of the instrument you’ll need to find it. By not matching the appropriate instrument with the anticipated size of the effect of interest the researcher runs the risk of not being able to see what they set out to look for.

The instrument the clinical trialist uses to detect, observe and measure treatment effects is the clinical trial. The trial must be large enough (that is, precise enough) to allow them to grasp, or at least catch a glimpse of, the truth. This is the basic principle behind statistical power in a clinical trial.
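To put rough numbers on the instrument analogy, here is a minimal sketch using the standard normal-approximation formula for a two-arm trial comparing means. The function name `n_per_group`, the standardised effect sizes (0.8, 0.5, 0.2), the two-sided 5% significance level and the 80% power target are all illustrative assumptions, not values taken from this article or from any particular trial.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate patients needed per arm to detect a standardised
    difference in means (difference divided by the common standard
    deviation) with a two-sided test. Normal approximation only."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_beta = z.inv_cdf(power)           # quantile for the desired power
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


# Hypothetical effect sizes: the smaller the effect, the bigger the "instrument".
for d in (0.8, 0.5, 0.2):
    print(f"effect size {d}: about {n_per_group(d)} patients per group")
# effect size 0.8: about 25 patients per group
# effect size 0.5: about 63 patients per group
# effect size 0.2: about 393 patients per group
```

Because the sample size scales with one over the effect size squared, shrinking the effect from 0.8 to 0.2 multiplies the required sample size by roughly sixteen. A real calculation should be tailored to the trial's design, outcome and analysis, ideally with a statistician.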

Underpowered clinical trials are unethical. They exploit limited resources and patient altruism without ever having a credible chance of satisfactorily answering the question of interest. They merely sow doubt, confusion and suspicion.

Overpowered clinical trials are unethical. They are larger than they need to be and, as a result, use more resources and take longer to answer the research question than a smaller, more appropriately sized trial would.

A credible sample size justification in a journal article will convince readers that:
the investigators are looking hard enough for what they hope to find.
the investigators have planned a sharply focussed research question (and associated hypothesis).
the investigators understand the basic principles of experimental design.
the investigators have a deep understanding of the science underpinning their research.

Example: The Large Hadron Collider (LHC)
The mysterious Higgs boson particle was first predicted to exist in the 1960s by Peter Higgs of the University of Edinburgh. The existence of this elusive particle has fascinated the physics community for decades.

Previous research had narrowed the possible mass of the Higgs boson to a range of between 115 and 141 giga-electronvolts, with a best estimate centred at 125 giga-electronvolts. Scientists at CERN wanted to test for the existence of this hypothesised particle.

What the scientists at CERN did:
Based on all the evidence and information to hand regarding the hypothesised Higgs boson, they first calculated the size of particle accelerator necessary to test for its existence. Then, in collaboration with over 10,000 scientists from over 100 countries, they spent approximately €3.1 billion over 12 years building the world’s largest high-energy particle accelerator, with a circumference of 27 kilometres, around 100 metres underground beneath the French-Swiss border near Geneva. They built the right tool for the task at hand.

What the scientists at CERN didn’t do:
Use their existing particle collider on the grounds that it was what they’d used in previous experiments.
Use their existing particle collider on the grounds that it was cheap, they could get started immediately and they’d save €3.1 billion and 12 years.
Find the world’s largest existing particle collider and use that on the grounds that it was the best available and would save €3.1 billion and 12 years.
Build a particle collider based on the amount of funding they had available.

While the LHC is one of the most expensive scientific instruments ever made, the approach to discovery behind it is the same one that applies to a humble clinical trial. Trialists must ensure their trial is “powerful” enough to detect the effects of interest before starting.

Shortcuts, timesavers and all arguments centred on convenience, cost and other practicalities that compromise a trial’s available power are, unfortunately, not acceptable from a statistical point of view.

Example: Ready… Aim… FIRE!
Ready: Formulate the clinical question.
Aim: Design the right clinical trial and ensure you have sufficient power.
FIRE! Conduct the trial and report the results with confidence.

Example: Ready… FIRE!… Aim…
Ready: Formulate the clinical question.
FIRE! Conduct the trial as soon as possible. Don’t bother calculating the necessary sample size. Report indefinite, unclear, underpowered results.
Aim: Figure out the sample size you should have used. List the small sample size as a limitation. Plead for further research on the topic.
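To see what “Ready… FIRE!… Aim…” can cost, the sketch below estimates the power of a two-arm trial for a given number of patients per group, again using a normal approximation. The function name `approx_power`, the standardised effect size of 0.5, the two-sided 5% significance level and the sample sizes are hypothetical, chosen only for illustration.

```python
import math
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison of means.
    Normal approximation; the negligible chance of a significant result
    in the wrong direction is ignored."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    return z.cdf(effect_size * math.sqrt(n_per_group / 2) - z_alpha)


# Hypothetical trial hoping to detect a standardised effect of 0.5.
for n in (20, 63):
    print(f"n = {n:>2} per group -> power of about {approx_power(0.5, n):.2f}")
```

Under these assumptions, 20 patients per group gives a power of roughly 0.35, so the trial would miss a real effect of this size about two times in three. Around 63 per group is needed to reach the conventional 0.80.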

Muda (無駄) is a traditional Japanese term for an activity that is wasteful, unproductive or adds no value. Underpowered and overpowered research is muda. Scarcity of funding, ignorance of statistical power and of its scientific and ethical relevance (along with a widespread fixation on p-values), and the ‘publish or perish’ mindset have combined to produce muda on an industrial scale in psychiatric research.

Muda does not serve psychiatry, its evidence base, its patients or physicians. Muda in psychiatric research is unethical.

Good science is usually expensive, inconvenient and difficult.

A sample size/power justification demonstrates that you are going to look hard enough to find what you said you were looking for. Underpowered clinical trials are usually doomed to fail before they even begin.

Suggested reading:
What Underlies Sample Size Calculations
Sample size calculations simplified