Non-adherence in non-inferiority trials: pitfalls and recommendations

July 12, 2020 by Yin Mo et al | BMJ

Non-adherence in non-inferiority trials can affect treatment effect estimates and often increases the chance of claiming non-inferiority under the standard intention-to-treat analysis. This article discusses the implications of different patterns of non-adherence in non-inferiority trials and offers practical recommendations for trial design, alternative analysis strategies, and outcome reporting to reduce bias in treatment estimates and improve transparency in reporting.

Randomised controlled trials that test for non-inferiority of the experimental arm are performed when a new treatment is compared with an established standard of care. Instead of being required to have superior clinical efficacy, the new treatment might be preferred for its improved safety, convenience, or reduced cost. These trials are increasingly prevalent because highly efficacious standard-of-care treatments have been established for many diseases, making demonstration of superiority against standard-of-care implausible and placebo controlled trials without any active comparators unethical to perform.1,2

A basic weakness of non-inferiority trials, compared with superiority trials, is that poor conduct of the trial or deviations from the protocol could result in false rejection of the null hypothesis that the experimental treatment is inferior. Most trials report that some participants do not adhere to their allocated treatment. Intention-to-treat analysis estimates the treatment effect accounting for this real world adherence pattern by comparing outcomes between groups of participants defined by their allocated treatment; it measures the effect of allocating a treatment on participant outcomes (the question addressed by what is often called an effectiveness trial), rather than the actual effect of the treatment. If the primary research interest is the causal effect of assigning treatments, then this estimate is likely to be the most relevant. In other situations, the question of primary interest is the causal effect of the treatment itself. Because many patterns of non-adherence result in reduced observed differences between the comparison arms, there is a risk that relying on the intention-to-treat analysis to conclude non-inferiority will lead to the adoption of treatments which, when taken, lead to worse outcomes.

Many trials also report the per protocol analysis, which includes only participants who received the treatment according to randomisation assignment to estimate the effect of the treatment itself. However, because the adherent participants might systematically differ in underlying prognostic factors compared with non-adherent participants, and the per protocol participants in each allocation group might differ in terms of prognostic characteristics, the per protocol analysis can give biased treatment effect estimates. In non-inferiority trials, this bias could lead to false conclusions of non-inferiority when the experimental treatment is actually inferior.

Most non-inferiority trials continue to rely on intention-to-treat and per protocol analyses even in the presence of high degrees of non-adherence. In this article, we discuss the appropriateness of the common analysis methods given the unique features of non-inferiority trials, explain the effects of various patterns of non-adherence on estimates obtained using these methods, and suggest measures to improve study design, statistical analysis and reporting to deal with this issue.

Summary points

Non-adherence to allocated treatment in non-inferiority trials typically dilutes observed treatment effects in respective allocation arms, and results in a higher probability of claiming non-inferiority
Different patterns of non-adherence can bias treatment efficacy estimates differently, depending on the influence of the confounding factors on the adherence to allocated treatment and on the study outcome
Potential confounders should be prespecified in order to collect relevant and complete data from both adherent and non-adherent participants during the trial
When estimating treatment efficacy, causal inference methods can help to minimise bias and risk of false non-inferiority claims

Challenges of non-inferiority trials

In non-inferiority trials, we ask whether a new treatment is no worse than the standard-of-care treatment, compared with asking whether a new treatment is better than the standard of care in a typical superiority trial (box 1). This shift in focus of comparison complicates non-inferiority trials for two main reasons.

Box 1: What are non-inferiority trials and how are they analysed?

Testing the non-inferiority hypothesis

Non-inferiority trials are conducted to show that an experimental treatment is not worse than the control by a predefined non-inferiority margin in terms of the primary outcome (the alternative hypothesis, H₁). The corresponding null hypothesis (H₀) is that the intervention is indeed worse than the control arm by more than or equal to the non-inferiority margin. These definitions contrast directly with those of superiority trials, which test the null hypothesis that neither treatment arm has superior clinical efficacy.

In an example trial that compares an experimental treatment with a control treatment using a primary outcome of mortality, the null hypothesis is tested by comparing the upper bound of the two sided confidence interval of the treatment effect estimate (experimental treatment effect minus control treatment effect on an absolute scale, or experimental treatment effect divided by control treatment effect on a relative scale) with the non-inferiority margin. Non-inferiority is concluded if the upper confidence interval bound is less than the non-inferiority margin.

A type I error in a non-inferiority trial is falsely concluding non-inferiority when the new treatment is inferior. Power is the probability of correctly concluding non-inferiority when the new treatment is non-inferior according to the predefined boundary.
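
The decision rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular trial: it assumes an absolute risk difference, a simple Wald confidence interval, and hypothetical event counts, and the function names are our own.

```python
from math import sqrt
from statistics import NormalDist


def risk_difference_ci(events_exp, n_exp, events_ctl, n_ctl, level=0.95):
    """Wald confidence interval for the absolute risk difference
    (experimental minus control). Returns (estimate, lower, upper)."""
    p_exp = events_exp / n_exp
    p_ctl = events_ctl / n_ctl
    diff = p_exp - p_ctl
    se = sqrt(p_exp * (1 - p_exp) / n_exp + p_ctl * (1 - p_ctl) / n_ctl)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return diff, diff - z * se, diff + z * se


def is_non_inferior(events_exp, n_exp, events_ctl, n_ctl, margin, level=0.95):
    """Conclude non-inferiority only if the upper confidence bound
    lies below the prespecified non-inferiority margin."""
    _, _, upper = risk_difference_ci(events_exp, n_exp, events_ctl, n_ctl, level)
    return upper < margin


# Hypothetical trial: 60/500 deaths (experimental) versus 50/500 (control).
diff, lower, upper = risk_difference_ci(60, 500, 50, 500)
print(f"risk difference {diff:.3f}, 95% CI {lower:.3f} to {upper:.3f}")
print("non-inferior at a 10 point margin:", is_non_inferior(60, 500, 50, 500, margin=0.10))
print("non-inferior at a 5 point margin:", is_non_inferior(60, 500, 50, 500, margin=0.05))
```

With these assumed counts the same data support non-inferiority under a 10 percentage point margin but not under a 5 point margin, which shows how strongly the conclusion depends on the chosen margin.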

Conventional analysis methods

Such trials are analysed in two ways: intention to treat and per protocol.

The intention-to-treat approach considers all randomised participants according to their assigned groups, regardless of whether participants received the allocated interventions. In this case, randomisation ensures no systematic selection or confounding bias. Unless adherence is 100%, the causal effect of the treatment allocated will not be identical to the effect of the treatment received, in general.
The per protocol approach analyses the subset of participants who adhered to their randomisation assignment. The per protocol population differs from the intention-to-treat population when there is non-adherence. Those patients who do not adhere could systematically differ in underlying characteristics (confounding factors) compared with adherent patients. These characteristics could be known or unknown. The per protocol population conditions on post-randomisation information. Removing patients or part of their follow-up in a per protocol analysis violates the integrity of the randomisation process. With time varying treatment, per protocol also involves censoring. Exclusion of time after non-adherence causes an immortal time bias. Therefore, the treatment effect estimate in a per protocol analysis is a combination of the true treatment effect and bias from selecting a subset of patients.
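
The difference between the two analysis populations can be made concrete with a small sketch. The participant records, field names, and counts below are hypothetical and chosen only to show how the two analyses select participants; the per protocol function simply drops anyone whose received treatment differs from the allocated one.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    allocated: str  # "experimental" or "control"
    received: str   # treatment actually taken
    event: bool     # primary outcome occurred (for example, death)


def risk(group):
    """Proportion of participants in the group with the outcome."""
    return sum(p.event for p in group) / len(group)


def itt_risk_difference(participants):
    """Compare arms as randomised, ignoring the treatment received."""
    exp = [p for p in participants if p.allocated == "experimental"]
    ctl = [p for p in participants if p.allocated == "control"]
    return risk(exp) - risk(ctl)


def per_protocol_risk_difference(participants):
    """Compare arms after excluding participants who did not adhere."""
    adherent = [p for p in participants if p.allocated == p.received]
    return itt_risk_difference(adherent)


def make(allocated, received, n, events):
    """Helper: n participants, of whom `events` had the outcome."""
    return [Participant(allocated, received, i < events) for i in range(n)]


# Hypothetical data: 20 of 100 experimental participants switch to control,
# and these switchers have a higher event rate (6/20) than adherers (8/80).
cohort = (
    make("experimental", "experimental", 80, 8)
    + make("experimental", "control", 20, 6)
    + make("control", "control", 100, 10)
)

itt = itt_risk_difference(cohort)
pp = per_protocol_risk_difference(cohort)
print(f"intention to treat: {itt:.3f}; per protocol: {pp:.3f}")
```

In this constructed example the per protocol estimate is zero while the intention-to-treat estimate is 4 percentage points, because the excluded switchers were the participants with the worst outcomes. Neither number need equal the effect of the treatment itself.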

The first complication is in deciding what we mean by “no worse than.” If the new treatment does lead to worse outcomes, but worse only by a small amount, we might reasonably conclude that it is non-inferior. The largest such “small amount” that is compatible with a conclusion of non-inferiority is known as the non-inferiority margin. This margin is a practically acceptable compromise in treatment efficacy that we are willing to sacrifice in exchange for the secondary benefits offered by the new treatment. The subjective nature of this measure arises from the debate around what margin is “clinically acceptable” and how the advantages of the new treatment are weighed against the potential loss in treatment efficacy; researchers also need to decide whether the non-inferiority margin should be on a relative or absolute scale.3 The non-inferiority margin might often be chosen for practical reasons such as reduction of sample size while maintaining adequate power required to conclude non-inferiority. Despite the development of many objective methods to justify the non-inferiority margin, its determination remains a contentious issue and highly context specific.2,4

The second complication is that poorly designed and conducted non-inferiority trials will often have an increased chance of concluding non-inferiority.3,5 Some examples include:

Non-specific endpoint measures—eg, using 30 day mortality as the primary outcome in a trial comparing drugs for treating cardiac arrhythmia in the intensive care unit. Even if one treatment is more effective than the other, the measurable difference will be diluted by mortalities due to other reasons such as sepsis or hypovolemic shock.6
Inappropriate participant cohort—eg, a trial comparing conservative medical treatment (control treatment) versus percutaneous coronary intervention (experimental treatment) in patients with stable angina but relatively good exercise tolerance, which has a primary endpoint defined as exercise increment at six weeks. Most participants, regardless of allocation, would not be expected to achieve this endpoint even though a clinically significant difference could exist between the two arms in patients with lower baseline exercise tolerance.7 Conversely, choosing patient groups with a high chance of spontaneous cure might also give misleading results—eg, in malaria endemic areas, adults commonly self-cure and treatment responses with ineffective medicines could produce excellent outcomes, but the same treatments in children can lead to high failure rates.8
Markedly different pharmacokinetic properties between the treatments with insufficient follow-up—eg, when the outcome of interest is recurrence in the treatment of malaria, recurrences might be delayed by slowly eliminated drugs. Terminating follow-up before all recurrences have occurred will favour the drug being eliminated more slowly.8,9

To counteract the above problems, all major guidelines for non-inferiority trials emphasise choosing appropriate control treatments that have previously been shown to be superior to placebo, and ensuring consistency in study design with the historical placebo controlled studies that established the standard of care.1,3,10 While these recommendations are useful as a regulatory approach to license new drugs, in many situations where placebo controlled trials were never performed, the appropriate choice of participants and outcomes becomes a more contentious issue.

Another challenge in clinical trials is non-adherence to allocated treatment: when non-adherence leads to a lower average treatment effect measured in the control group, or similar treatment effects measured in both groups, the experimental group will be more likely to appear non-inferior. Both types of non-adherence are frequently observed in non-inferiority trials. Because the control is usually a clinically available standard-of-care treatment, non-adherence often leads to study participants taking up a treatment from the opposite arm or taking an alternative treatment with similar efficacy to the control (box 2).

Box 2: Case study of the effect of non-adherence on intention-to-treat and per protocol analyses in a non-inferiority trial

A study compared dose reduction guided by disease activity (experimental treatment) with continuous prescription (control treatment) of disease-modifying anti-rheumatic drugs in patients with rheumatoid arthritis.11 The primary outcome was the proportion of participants who experienced a major flare by day 180 of follow-up. In the continuous treatment arm, 15% (nine of 59) of patients had dose reduction because they either had low disease activity or developed side effects and could not tolerate continuous treatment. In the dose reduction arm, 37% (45 of 121) had continuous treatment due to poorly controlled disease. The study concluded non-inferiority based on a per protocol analysis (absolute risk difference 2%, 95% confidence interval −12% to 12%), given a non-inferiority margin of 20%. Supplementary intention-to-treat analysis concurred with the per protocol analysis.

However, crossing over of participants could have left the per protocol population in the dose reduction arm with more patients with mild disease, and the per protocol population in the continuous dosing arm with more patients with severe disease. If such a difference existed in baseline disease severity in the two per protocol groups, the dose reduction group would likely have fewer patients with major flares than the continuous group. The per protocol estimate might therefore be biased in favour of the dose reduction group. In an intention-to-treat analysis, crossing over of participants resulted in a proportion of participants receiving treatment of the opposite arm, which diluted the treatment effect difference measured between the two arms. In this example, both the intention-to-treat and the per protocol estimates carry a higher risk of claiming non-inferiority than an unbiased estimate of treatment efficacy would.

Implications of non-adherence in non-inferiority trials

With intention-to-treat analysis, if only 10% of participants cross over to the opposite arm, the probability of falsely claiming non-inferiority can increase to as much as 8-10% from the nominal value of 2.5%.12 This inflation could lead to ineffective treatments being adopted as the standard of care and could lower the bar for subsequent clinical trials, enabling consecutively worse treatments to be accepted into clinical practice.13 Such a procession of ever-worsening care has been termed “biocreep” (fig 1).
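
The mechanism behind this inflation can be sketched with a normal approximation. The code below is an illustration under stated assumptions, not a reproduction of the cited simulation study: it assumes a dichotomous outcome, symmetric crossover that is unrelated to prognosis, an experimental treatment whose true excess risk sits exactly at the margin, and hypothetical values for the risks and sample size.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def prob_claim_non_inferiority(risk_ctl, risk_exp, margin, n_per_arm,
                               crossover, alpha=0.025):
    """Approximate probability that an intention-to-treat analysis
    concludes non-inferiority when a fraction `crossover` of each arm
    receives the opposite arm's treatment."""
    # Expected risks observed in each allocation arm after crossover.
    obs_exp = (1 - crossover) * risk_exp + crossover * risk_ctl
    obs_ctl = (1 - crossover) * risk_ctl + crossover * risk_exp
    obs_diff = obs_exp - obs_ctl  # equals (1 - 2 * crossover) * true difference
    se = sqrt(obs_exp * (1 - obs_exp) / n_per_arm
              + obs_ctl * (1 - obs_ctl) / n_per_arm)
    z = STD_NORMAL.inv_cdf(1 - alpha)
    # Non-inferiority is claimed when estimate + z * se < margin.
    return STD_NORMAL.cdf((margin - obs_diff) / se - z)


# Hypothetical trial: control risk 30%, experimental risk 40%,
# margin 10 percentage points, 250 participants per arm.
full_adherence = prob_claim_non_inferiority(0.30, 0.40, 0.10, 250, crossover=0.0)
ten_percent = prob_claim_non_inferiority(0.30, 0.40, 0.10, 250, crossover=0.10)
print(f"full adherence: {full_adherence:.3f}; 10% crossover: {ten_percent:.3f}")
```

With full adherence the function returns the nominal 2.5%. Under the assumed 10% crossover the observed difference shrinks to 80% of the true difference, and the probability of a false non-inferiority claim rises well above the nominal level; the exact size of the increase depends on the assumed risks and sample size.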

Fig 1. Effect of non-adherence on biocreep. Panels show four scenarios if consecutive non-inferiority trials (comparing standard-of-care versus treatment A; treatment A versus treatment B; treatment B versus treatment C; treatment C versus treatment D) were to be carried out at 100%, 90%, 80% and 70% adherence. X axis represents consecutive non-inferiority trials; y axis represents decrease in true efficacies of treatments A, B, C, and D compared with the initial standard-of-care treatment. Treatments A, B, C, and D are 10%, 20%, 30%, and 40% less effective than the standard of care, respectively. Dot sizes are probabilities (represented by percentages next to dots) for the new and inferior experimental treatment to be accepted as non-inferior at the end of each trial. For example, if 100% adherence is maintained in the trials (first panel), the probability of treatment A being accepted as the new standard of care is 2%. By contrast, when the consecutive trials are conducted with 70% adherence (last panel), treatment D has a 7% chance that it will be accepted as the new standard of care, when its true efficacy is 40% less than the current standard of care. This pattern of non-adherence is crossover (that is, in the 70% adherence scenario, 30% of participants from each arm cross over to the opposite arm).12

Non-adherence, unlike treatment assignment, will often not be an independent event but driven by confounding or non-confounding factors (fig 2). Non-confounding factors affect the probability of adhering to the intervention but do not affect the study outcome. An example might be intolerance to study drug treatments due to mild side effects such as nausea or rash, which cause enough discomfort to affect adherence but not the outcome of the disease.14,15 By contrast, non-adherence can be driven by factors that influence the study outcome, such as disease severity.16,17 For example, consider an open label study where more severely ill patients are more likely not to adhere to the experimental treatment and these patients cross over to the standard-of-care control arm. On the other hand, patients with less severe disease are more willing to adhere to the experimental treatment. Comparing the groups of patients according to the actual treatments received will be biased because they have different disease severities. In this case, disease severity is a confounder because it affects both adherence and disease outcome.
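
The open label example above can be worked through with expected values. The sketch below assumes two severity strata, an experimental treatment that adds the same excess risk in both, and crossover to control only from the experimental arm; every number is hypothetical and chosen for illustration.

```python
# severity: (population share, event risk on control, adherence to experimental)
STRATA = {
    "mild": (0.5, 0.10, 0.95),
    "severe": (0.5, 0.40, 0.60),
}
TRUE_EXCESS_RISK = 0.05  # assumed true harm of the experimental treatment


def expected_risk_differences(strata, excess):
    """Return (true, intention-to-treat, per protocol) risk differences,
    assuming full adherence in the control arm and that non-adherent
    experimental participants receive the control treatment."""
    values = list(strata.values())
    ctl_risk = sum(share * risk for share, risk, _ in values)
    exp_risk_if_all_treated = sum(share * (risk + excess) for share, risk, _ in values)
    true_diff = exp_risk_if_all_treated - ctl_risk

    adherent_share = sum(share * adh for share, _, adh in values)
    adherent_events = sum(share * adh * (risk + excess) for share, risk, adh in values)
    crossover_events = sum(share * (1 - adh) * risk for share, risk, adh in values)

    itt_diff = (adherent_events + crossover_events) - ctl_risk
    pp_diff = adherent_events / adherent_share - ctl_risk
    return true_diff, itt_diff, pp_diff


true_diff, itt_diff, pp_diff = expected_risk_differences(STRATA, TRUE_EXCESS_RISK)
print(f"true {true_diff:.3f}, intention to treat {itt_diff:.3f}, per protocol {pp_diff:.3f}")
```

Under these assumptions both analyses understate the true 5 percentage point harm. The per protocol estimate is the more distorted of the two here, because the adherent experimental participants are disproportionately those with mild disease.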

Fig 2. Common patterns of non-adherence in clinical trials. The directed acyclic graphs show the causal pathways from treatment allocation to outcome, highlighting the mechanisms causing non-adherence to allocated treatments

The effects of non-adherence patterns on intention-to-treat and per protocol analyses have been explored in a simulation study describing each of the scenarios shown in table 1.12 These simulations of a non-inferiority trial with a time-fixed intervention and dichotomous outcomes showed that most non-adherence patterns give the intention-to-treat analysis a higher probability of claiming non-inferiority when the experimental treatment is actually inferior in efficacy. Consider, as an example, a novel drug treatment that is compared with nicotine patch for smoking cessation. Suppose that 20% of the patients in the drug treatment group developed a rash and ended up taking various forms of nicotine replacement therapy, including nicotine patches, to quit smoking. These nicotine replacement therapies shift the effect measure in the experimental group closer to that of the control group, hence increasing the probability of claiming non-inferiority. The exception to this shift in effect measure occurs when non-adherent participants from the experimental arm receive no treatment or treatments inferior to both the control and the experimental treatments (which might happen if the experimental treatment resulted in intolerable side effects, leaving participants unable to take up the control or any further treatments).
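
The smoking cessation example and its exception can be expressed as a one-line expectation. In this sketch the outcome is failure to quit, and the risks assigned to each treatment are hypothetical values chosen only to show the direction of the shift.

```python
def expected_itt_difference(risk_exp, risk_ctl, switch_fraction, risk_after_switch):
    """Expected intention-to-treat risk difference (experimental minus control)
    when a fraction of the experimental arm switches to another treatment
    whose outcome risk is `risk_after_switch`."""
    observed_exp = (1 - switch_fraction) * risk_exp + switch_fraction * risk_after_switch
    return observed_exp - risk_ctl


# Hypothetical risks of failing to quit smoking.
RISK_NEW_DRUG = 0.70
RISK_NICOTINE_PATCH = 0.60
RISK_NO_TREATMENT = 0.85

true_difference = RISK_NEW_DRUG - RISK_NICOTINE_PATCH

# 20% of the drug arm switch to nicotine replacement (assumed as effective as the patch).
switch_to_nrt = expected_itt_difference(RISK_NEW_DRUG, RISK_NICOTINE_PATCH, 0.20, RISK_NICOTINE_PATCH)

# 20% of the drug arm stop all treatment instead.
switch_to_nothing = expected_itt_difference(RISK_NEW_DRUG, RISK_NICOTINE_PATCH, 0.20, RISK_NO_TREATMENT)

print(f"true {true_difference:.2f}, switch to NRT {switch_to_nrt:.2f}, stop treatment {switch_to_nothing:.2f}")
```

Switching to a treatment as good as the control shrinks the observed difference towards zero, whereas switching to no treatment pushes it away from zero, which is the exception described above.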

Per protocol analysis includes only adherent study participants and is therefore vulnerable to confounding bias. The direction of bias depends on the direction of influence the confounders have on adherence and outcome. Unless confounders driving non-adherence are measured and adjusted for, per protocol analysis will be biased and could contribute to an increased risk of falsely claiming non-inferiority when the experimental treatment is actually inferior. Box 3 lists several methods from causal inference that can be used to adjust for differences in confounding characteristics.