J Atten Disorders: Adult ADHD, duloxetine vs placebo

Ranked #1 during June 2012: Duloxetine in Adults With ADHD: A Randomized, Placebo-Controlled Pilot Study.

Study design & execution:
A 6-week, parallel-design pilot RCT involving 30 adult patients with a diagnosis of ADHD, conducted to investigate the treatment potential of duloxetine. The Introduction section spells out the overarching aim of the trial:

The aim of the present pilot study was to investigate the effect of duloxetine in adults with ADHD …

This modest aim is in accordance with its pilot design. The design includes random treatment allocation & blinding, and the trial appears to have been well executed. Nine out of 15 patients in the duloxetine arm completed the 6-week trial.

An immediate problem is revealed from the stated primary objective:

The primary objective was to test the hypothesis that 60 mg of duloxetine daily is superior to placebo in the treatment of adult ADHD, as measured by…

This is misguided; “test the hypothesis” and “superiority” are the language of the fully powered phase III & IV clinical trial. Pilot or feasibility studies by definition do not intend to demonstrate superiority; their more modest aims are to explore feasibility and to gather useful information and insights to aid the design of a subsequent larger, fully powered, confirmatory study. Possibly the most common error observed in pilot studies is a misplaced emphasis on statistical significance instead of feasibility (the pop-gun fallacy). Perhaps because duloxetine is widely prescribed and well studied in depression, none of the objectives seems to investigate feasibility issues.

Analysis & Reporting:
The study’s confusion as to its raison d’être is conveyed by an admission in the Discussion section:

This is most likely the result of lack of power to detect group differences.

The focus on “power” and “detection of group differences” in a pilot study is misguided. Unfortunately a number of futile comparisons are undertaken throughout the text, for example:

No statistically significant difference in baseline and demographic factors was found when comparing the 6 participants who dropped out with the 24 who completed the trial.

The secondary outcome measure (Clinical Global Impression Scale) recorded in the clinicaltrials.gov registry has been elevated to a co-primary outcome measure (as stated in the manuscript’s Method / Assessments section).

From clinicaltrials.gov/ct2/show/NCT00940693:

Primary Outcome Measures: Impact of duloxetine on the Conners’ Adult attention-Deficit/Hyperactivity Disorder Rating Scale-Observer Report:Screening Version…
Secondary Outcome Measures: Impact of duloxetine on the Clinical Global Impression Scale in ADHD adults…

From the abstract:

Results: The Duloxetine group showed lower score on CGI-Severity at Week 6 (3.00 vs. 4.07 for placebo, p < .001), greater improvement on CGI-Improvement (2.89 vs. 4.00 at Week 6, p < .001), and greater decreases on five of eight subscales of the CAARS

Some of the secondary objectives specified in the clinical trials registry go unreported in this manuscript. From a statistical perspective this re-ordering (and non-disclosure) of the objectives is problematic.

The text states that a repeated measures ANOVA model was used; however, specific details of this model are missing from the Methods section (some details can be inferred from the Results section). Important details, such as whether adjustment was made for outcome score at baseline, are unclear. This is a serious omission, as the model forms the backbone of this longitudinal analysis. Its precise form needs to be clearly specified to the reader.
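To illustrate the level of detail a reader would need, one common baseline-adjusted specification is sketched below. This is an assumption made for illustration, not the model the authors fitted (which the manuscript does not fully describe); the symbols are introduced here and do not appear in the paper.

```latex
% Illustrative ANCOVA-type model (an assumption, not the authors' stated model)
% Y_i : outcome score for participant i at Week 6
% X_i : the same outcome score at baseline
% T_i : 1 if randomized to duloxetine, 0 if randomized to placebo
Y_i = \beta_0 + \beta_1 X_i + \beta_2 T_i + \varepsilon_i,
\qquad \varepsilon_i \sim N(0, \sigma^2)
```

Under a specification of this kind, the coefficient \beta_2 is the baseline-adjusted treatment difference, and it is this estimate, together with its confidence interval, that a reader would expect to see reported.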

The chosen analysis strategy of significance testing (in spite of a small sample size) has the effect of cornering the interpretation into strictly dichotomous conclusions (“significant” & “not significant”). This is not the information required from a pilot study. In an attempt to escape this self-imposed predicament the phrase “trend towards” is used in the Discussion and Results sections when the p-value is slightly above the defined significance level. This language is a common form of pleading that is specific to significance testing in underpowered studies. Reporting the confidence intervals and discussing the confidence limits would be a much more informative means of presenting the results.

Example: From the Discussion section:

There was no group difference on the CAARS-Inv:SV even though participants on duloxetine started at a higher total ADHD average score (33.44) than did the Placebo group (31.60) and finished at a lower score (25.67 vs. 31.33)

This highlights the poverty of significance testing when used to describe qualitatively the observed treatment effects in pilot studies. There was a group difference in change from baseline scores of 7.5 points (a reduction of 7.77 on duloxetine versus 0.27 on placebo); however, no confidence interval representing the uncertainty of this result is reported. Instead, the text reports “There was no group difference…” This interpretation hides useful information from readers and future investigators.
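The arithmetic behind the 7.5-point figure, and the kind of interval that could accompany it, can be sketched as follows. The four group means are those quoted from the Discussion section. Everything needed for the interval is an assumption made purely for illustration: the manuscript excerpt gives no standard deviation, so `assumed_sd` is a hypothetical placeholder, the group sizes of 15 per arm are assumed, and a normal approximation is used where a t-based or model-based interval would normally be preferred. The resulting interval is not a result of the study.

```python
from math import sqrt
from statistics import NormalDist

# Group means quoted in the Discussion section (CAARS-Inv:SV total score).
dulox_baseline, dulox_week6 = 33.44, 25.67
placebo_baseline, placebo_week6 = 31.60, 31.33

# Change from baseline, expressed as a reduction in score.
dulox_change = dulox_baseline - dulox_week6
placebo_change = placebo_baseline - placebo_week6

# Observed treatment effect: difference between the two reductions.
diff_in_change = dulox_change - placebo_change


def normal_ci(estimate, standard_error, level=0.95):
    """Normal-approximation confidence interval around an estimate."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * standard_error, estimate + z * standard_error


# HYPOTHETICAL inputs, not taken from the paper: a common SD for the
# change scores and 15 analysed participants per arm.
assumed_sd = 8.0
n_dulox, n_placebo = 15, 15

standard_error = assumed_sd * sqrt(1 / n_dulox + 1 / n_placebo)
low, high = normal_ci(diff_in_change, standard_error)

print(f"Difference in change from baseline: {diff_in_change:.2f}")
print(f"Illustrative 95% CI (hypothetical SD): {low:.2f} to {high:.2f}")
```

With the real standard deviations and analysed sample sizes substituted, the same few lines would give readers the estimate and its uncertainty rather than a bare “no group difference”.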

Twelve F statistics are reported; however, with the exception of the (undocumented) error bars on the plots, no confidence intervals are reported in the text.

The Tolerability Findings section does a good job of summarizing and discussing the observed adverse events; unfortunately, however, the RCT safety fallacy is committed in the opening sentence:

There were no significant differences in vital signs between placebo and treatment groups.

Were it not for the small sample size, the study objectives and analysis would look similar to those of a phase III-IV duloxetine trial. This represents a missed opportunity. The analysis, resting on significance testing, falls short of providing a qualitative description of the potential of duloxetine treatment for this indication.

The text concludes with a thoughtful discussion on how the results should inform the planning of a larger study including drop-out, tolerability, dose and dose titration.

Statistical suggestions: 
The aims of a pilot study should be to measure the observed treatment effect and its variation, and to investigate general feasibility issues, not to perform stringent significance testing. While “superiority”, “statistical significance”, etc. are inappropriate aims (given N=30), the investigators should still define a criterion for a successful pilot. The secondary objectives should be at least partly focused on the feasibility of conducting an antidepressant trial for adult ADHD.
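As an example of a feasibility quantity that can be estimated rather than tested, the completion rate in the duloxetine arm (nine of 15, as noted above) can be reported with a confidence interval. The sketch below uses the Wilson score interval, which is one reasonable choice for a proportion from a small sample; the function name `wilson_interval` is introduced here for illustration, and no success threshold is shown because the manuscript does not define one.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, n, level=0.95):
    """Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Completion in the duloxetine arm, as reported in the trial.
completed, randomized = 9, 15

rate = completed / randomized
low, high = wilson_interval(completed, randomized)

print(f"Completion rate: {rate:.0%} (95% CI {low:.0%} to {high:.0%})")
```

The interval is wide (roughly 36% to 80%), which is itself useful planning information: a pre-specified criterion such as a minimum acceptable completion rate could be judged against it when deciding whether and how to proceed to a larger trial.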

Choose a single outcome variable for the primary objective and report it as the top-line result. Relegate the others to secondary analyses (if there is a compelling reason for co-primary objectives, this should be explained in the Methods section). Rank the secondary objectives in order of importance and report them all in this order. This gives the investigation a fixed structure that helps to protect against false positive (type I) error; it shouldn’t be re-shuffled midstream.

Ensure the randomization mechanism and blinding methods are described succinctly to the reader.

Avoid analysis of percentage change; stick to the raw data.

Focus the analysis (and plots) primarily on the treatment effect. Significance testing is a clumsy and uninformative way to handle small sample sizes. Confidence intervals convey more information to the reader than do p-values.

The following links cover relevant statistical issues in more detail:

Quite often the emphasis is wrongly placed on statistical significance, not on feasibility – which is the main focus of the pilot study. Our experience in reviewing submissions to a research ethics board also shows that most of the pilot projects are not well designed: i.e. there are no clear feasibility objectives; no clear analytic plans; and certainly no clear criteria for success of feasibility.
(Thabane 2010)

A tutorial on pilot studies: the what, why and how
Design and analysis of pilot studies: recommendations for good practice
confidence intervals instead of p-values
shuffling the deck
low powered comparisons are futile
the pop-gun fallacy
the treatment effect 

Questions & comments for authors:
We would be happy to report the observed treatment difference on the CAARS-O:SV scale and its 95% confidence interval if available – this information would be useful to investigators planning a larger confirmatory study.

The authors are guaranteed this space for replies/rejoinder

Questions & comments for the journal:
A simple rule/policy may be helpful: Do not publish articles which claim to be pilot studies yet simultaneously embark on significance testing.

No details of the randomization or blinding mechanisms were provided; a non-statistical reviewer could easily have detected this using the CONSORT checklist.

The journal is guaranteed this space for replies/rejoinder

Questions & comments for readers:
Comments are welcome in the Discussion section below.

CONSORT checklist:
Click the link directly below for detailed CONSORT based appraisal.