Acta Psychiatr Scand: Bipolar depression, lamotrigine vs lamotrigine+divalproex

Ranked #1 during July 2012

Lamotrigine vs. lamotrigine plus divalproex in randomized, placebo-controlled maintenance treatment for bipolar depression.

Study design & execution:
A two-arm, randomized, double-blind, parallel study comparing lamotrigine against a combination of lamotrigine + divalproex as a maintenance treatment for patients with bipolar depression.

The text is not entirely clear about what the study’s specific primary objective and its associated primary outcome variable were:

“Primary Outcome Measures: Rates of response to treatment regimen.”

(Abstract) Objective: To compare the maintenance efficacy of lamotrigine (Lam) to combination therapy…

(Aims of Study section): We hypothesized that a combination of Lam and Div would provide superior outcomes in terms of depressive prophylaxis…

(Statistical Analysis section): …which would have power of approximately 0.75 if survival in the baseline group was…

(Statistical Analysis section): On the basis of the completed studies of Div and Lam, we anticipated median time to event for intervention for a depressive episode of …

(Fig. 3): Time to development of a depressive relapse, defined as a MADRS score ≥ 15.

(Abstract) Results: Time to depressive episode did not differ significantly…

The sample size justification hints at pragmatism:

Given funding resources, we aimed to recruit a sample that would result in approximately 40 subjects per randomized group, which would have power of approximately 0.75 if survival in the baseline group was 17% vs. 41%…

From a statistical point of view, pragmatic sample sizes are invariably too small, in which case they manage to be simultaneously honest and unethical: the small size results in a lack of power, which makes the research effort futile. Pragmatic sample sizes often hint at the ready, FIRE, aim approach to clinical research.
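The quoted power figure can be sanity-checked. The sketch below is our own back-of-envelope calculation (the authors’ exact method is unstated); it uses a simple unpooled normal approximation for a two-sided two-proportion comparison and lands in the region of 0.7, broadly consistent with the quoted 0.75 and with the complaint above about marginal power.

```python
import math

def norm_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def two_proportion_power(p1: float, p2: float, n_per_group: int) -> float:
    """Approximate power of a two-sided two-proportion z-test at alpha = 0.05
    (unpooled normal approximation)."""
    se = math.sqrt(p1 * (1 - p1) / n_per_group + p2 * (1 - p2) / n_per_group)
    z_alpha = 1.959964  # two-sided 5% critical value
    return norm_cdf(abs(p1 - p2) / se - z_alpha)

# The quoted design: ~40 subjects per arm, 17% vs. 41% survival.
power = two_proportion_power(0.17, 0.41, 40)
print(f"approximate power: {power:.2f}")  # roughly 0.7
```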

Completion was 16/45 and 13/41 at 8 months. The text reports:

On the average, patients completed about two thirds of the scheduled visits (Lam alone 6.5 ± 3.3; Lam + Div 6.3 ± 3.3, t = 0.2…)

Ideally, the standard deviation should not be expressed as ±SD; we suggest (SD). The completed-visits data are highly unlikely to be normally distributed, making the choice of a t-test a potentially poor one; a nonparametric alternative such as the Wilcoxon rank-sum test would be more defensible.
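A sketch of the kind of nonparametric alternative we have in mind: a Wilcoxon rank-sum test with a normal approximation, hand-rolled here for illustration (no tie correction; the example inputs are made up, not the paper’s visit counts).

```python
import math

def rank_sum_test(x, y):
    """Wilcoxon rank-sum (Mann-Whitney) test: normal approximation,
    average ranks for ties, no tie correction."""
    combined = sorted((v, i) for i, v in enumerate(list(x) + list(y)))
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank across the tied block
        for k in range(i, j + 1):
            ranks[combined[k][1]] = avg
        i = j + 1
    n1, n2 = len(x), len(y)
    w = sum(ranks[:n1])  # rank sum of the first sample
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p

# Illustrative only: identical samples give z = 0; well-separated samples
# give a small two-sided p-value.
print(rank_sum_test([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]))
print(rank_sum_test(list(range(10)), list(range(10, 20))))
```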

Analysis & Reporting:
No information on the randomization or blinding is supplied in the text. Given this is a randomized, controlled trial, the text requires a succinct description of these key design features, as per the CONSORT statement.

No table of baseline data is reported. This leaves the readers to take the rather vague statement “There were no significant differences…” in the Results section in good faith.

The Statistical Methods section should describe the hypotheses to be tested and the associated tests to be used or models to be fitted, rather than merely offer a list of statistical techniques that were employed. Various statistical tests are listed, but they are not associated with any particular objective (since the objectives are not formally laid out in the text).

A mixed-effects model is reported in the Results section, but few specific details of this model are given in the Methods / Statistical Analysis section.

A Cochran-Mantel-Haenszel analysis and a Breslow-Day homogeneity test are reported in the Results section, but there are no associated details in the Methods / Statistical Analysis section.

A proportional hazards regression model’s results are reported; again, no details of this model are supplied in the Methods / Statistical Analysis section (which mentions only Kaplan-Meier analysis for survival analyses).

Some continuous outcome variables are dichotomised. While popular with clinicians, this is a poor design decision from a statistician’s point of view: it lessens power and discards information that is available in the data.
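To make the power cost concrete, here is a small Monte Carlo sketch (our own construction, not from the paper): two normal samples differing by half a standard deviation, analysed once on the raw scale and once after dichotomising at the null median. The dichotomised comparison rejects noticeably less often.

```python
import math
import random

def simulate(n=40, shift=0.5, sims=2000, seed=42):
    """Monte Carlo comparison: z-test on means of the raw data vs. a
    two-proportion z-test after dichotomising at the null median (0)."""
    rng = random.Random(seed)
    cont_hits = dich_hits = 0
    for _ in range(sims):
        a = [rng.gauss(0.0, 1.0) for _ in range(n)]
        b = [rng.gauss(shift, 1.0) for _ in range(n)]
        # continuous: z-test on means (normal approximation, fine at n = 40)
        ma, mb = sum(a) / n, sum(b) / n
        va = sum((v - ma) ** 2 for v in a) / (n - 1)
        vb = sum((v - mb) ** 2 for v in b) / (n - 1)
        z_cont = (mb - ma) / math.sqrt(va / n + vb / n)
        cont_hits += abs(z_cont) > 1.96
        # dichotomised: compare proportions above the cut-point
        pa = sum(v > 0 for v in a) / n
        pb = sum(v > 0 for v in b) / n
        p_pool = (pa + pb) / 2
        se = math.sqrt(2 * p_pool * (1 - p_pool) / n)
        z_dich = 0.0 if se == 0 else (pb - pa) / se
        dich_hits += abs(z_dich) > 1.96
    return cont_hits / sims, dich_hits / sims

cont_power, dich_power = simulate()
print(f"continuous: {cont_power:.2f}  dichotomised: {dich_power:.2f}")
```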

An old-fashioned significance-testing approach was undertaken for the analysis. This is a poor choice given that the Limitations section states, somewhat obliquely: “Limitations of the study include a sample size with inadequate power for a strong test of several hypotheses.” (ready, FIRE, aim).

The “Baseline symptom moderators of outcome” analysis is clearly spelt out as exploratory, yet it still used significance testing; this is a poor approach for exploratory analyses (see suggestions below).

A subgroup analysis was reported with the admission:

“The number of BD II patients (N = 20) was too small for meaningful separate analysis, but… did not suggest a significant difference…”

The analyses repeatedly resort to significance testing when it is not warranted by the small amount of data (see suggestions below).

The text reports the RCT safety fallacy:

Discontinuation rates for adverse effects were low with both medication regimens and did not differ significantly.

The ready, FIRE, aim approach is confirmed by the limitations:

Limitations of the study include a sample size with inadequate power for a strong test of several hypotheses.

It is not clear why an ambitious (but futile) program of significance testing was undertaken given this serious limitation. In general, limitations should list factors beyond the investigators’ control (site closure, flu epidemic, etc.). If there was not sufficient power to test hypotheses, then they should not have been tested; other, more descriptive approaches could have been taken (see suggestions).

From the Discussion section:

“The results of this study indicate that the combination of Lam and Div was generally more effective…”

The choice of words “indicate” and “generally” seems like special pleading for the non-significant primary result. The purpose of clinical trials is to reduce clinical uncertainty.

“… thus serves as much for proof of concept as conclusive for the overall hypothesis of advantage of combination regimens for depressive prophylaxis in BD”

This claim is confusing; “conclusive” findings are unwarranted given the inconclusive primary results.

“… we found evidence that lamotrigine plus divalproex ER more effectively controlled depressive relapse in recently bipolar depressed patients than did lamotrigine alone”

Again, this choice of language seems too strong given the inconclusive primary results. The scientific method rests on strict and rigorous attempts to disprove theories, not on nurturing, protecting and promoting them.

Statistical suggestions:
Clearly specify the study’s primary objective, the primary outcome variable and the statistical methods planned to test it.

As noted earlier, the purpose of clinical trials is to reduce clinical uncertainty. Ensure your trial has adequate power to answer the question of interest; if not, do not proceed. A battery of futile significance tests may only sow more uncertainty. Ready, aim, FIRE! Do not embark on an analysis of (underpowered) significance tests and then concede in the Limitations section that there was probably inadequate power to do so.

Ensure all statistical models and methods used are fully accounted for in the Statistical Methods section. A laundry list of methods employed is not usually adequate. For statistical models, explain the choice of model and justify the covariates chosen and other key decisions and assumptions.

In cases where (pre-specified) exploratory or subgroup analyses are undertaken, do not use significance testing. A yes/no, significant/not-significant answer to these more nuanced inquiries is not useful and can be misleading. Instead, compute the 95% confidence interval and conjecture that the true result lies within the interval reported; if the interval is too wide, conclude that the trial was unable to shed much light on the matter.
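As an illustration of this reporting style, the sketch below computes a Wald 95% confidence interval for a difference of two proportions using hypothetical counts (not the paper’s data). An interval this wide, straddling zero, supports the conclusion that the trial sheds little light on the question.

```python
import math

def diff_proportion_ci(x1: int, n1: int, x2: int, n2: int,
                       z: float = 1.959964):
    """Wald 95% confidence interval for the difference of two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    d = p2 - p1
    return d - z * se, d + z * se

# Hypothetical subgroup counts (NOT the paper's data): 12/40 vs. 19/40
lo, hi = diff_proportion_ci(12, 40, 19, 40)
print(f"95% CI for the difference: ({lo:.2f}, {hi:.2f})")
```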

the wrong turn
ready, FIRE, aim
low powered significance tests are futile
the statistical power tutorial

Questions & comments for authors:
We feel the choice of significance testing is not ideal for a trial of this small size. We would suggest a more restrained conclusion based on the outcome of the primary analysis.

The authors are guaranteed this space for replies/rejoinder

Questions & comments for the journal:
No information was supplied on the randomization or blinding in the text; this could easily have been picked up with the CONSORT checklist. Much of the statistical methodology used was not first described in the Statistical Methods section. Pragmatic sample sizes should be treated with suspicion, as they hint at low power. We would urge the journal not to publish work which relies on strict significance testing but subsequently admits, as a limitation, to not having had adequate power to do so.

The journal is guaranteed this space for replies/rejoinder

Questions & comments for readers:
Comments are welcome in the Discussion section below.

CONSORT checklist:
Click the link directly below for detailed CONSORT based appraisal.