How to rate Risk of bias in Randomized controlled trials


See this topic in the GRADE handbook: Study limitations (Risk of Bias)

The content below is provided by Gordon Guyatt, co-chair of the GRADE working group

Supplemental reading: GRADE guidelines: 4. Rating the quality of evidence—study limitations (risk of bias)

Both randomized controlled trials (RCTs) and observational studies may yield misleading results if they are flawed in their design or conduct. Other publications refer to these problems as “validity”, “internal validity”, or “study limitations”; we will refer to them as “risk of bias”.

What method-issues to consider when assessing Risk of Bias

Concealment of randomization
Those enrolling patients are aware of the group (or period in a cross-over trial) to which the next enrolled patient will be allocated (major problem in “pseudo” or “quasi” randomized trials with allocation by day of week, birth date, chart number etc.)
Blinding
Patient, caregivers, those recording outcomes, those adjudicating outcomes, or data analysts are aware of the arm to which patients are allocated (or the medication currently being received in a cross-over trial)
Loss to follow-up
Loss to follow-up and failure to adhere to the intention to treat principle in superiority trials; or, in non-inferiority trials, loss to follow-up and failure to conduct both analyses considering only those who adhered to treatment, and all patients for whom outcome data are available
Selective outcome reporting
Incomplete or absent reporting of some outcomes and not others on the basis of the results
Use of unvalidated outcome measures (e.g., patient-reported outcomes)
Stopping early for benefit

How to do the assessment, practical aspects

Summarizing risk of bias must be outcome specific
Summarizing risk of bias requires consideration of all relevant evidence
Existing systematic reviews are often limited in summarizing study limitations across studies
What to do when there is only one RCT
Moving from risk of bias in individual studies to rating confidence in estimates across studies
Application of principles

What method-issues to consider when assessing Risk of Bias

1. Concealment of randomization

Although randomization is a powerful technique, it does not always succeed in creating groups with similar prognosis. Investigators may make mistakes that compromise randomization.
When those enrolling patients are unaware and cannot control the arm to which the patient is allocated, we refer to randomization as concealed. In unconcealed trials, those responsible for recruitment may systematically enroll sicker—or less sick—patients to either treatment or control groups. This behavior will compromise the purpose of randomization and the study will yield a biased result. Careful investigators will ensure that randomization is concealed through strategies such as remote randomization, in which the individual recruiting the patient makes a call to a methods center to discover the arm of the study to which the patient is assigned.
Consider, for instance, a trial of β-blockers vs angiotensin-converting enzyme (ACE) inhibitors for hypertension treatment that used opaque numbered envelopes to conceal randomization(1). At the time the study was conducted, evidence suggested that β-blockers were better for patients with heart disease. Significantly more patients with heart disease were assigned to receive β-blockers (P = .037). Also, evidence suggested that ACE inhibitors were better for patients with diabetes mellitus. Significantly more patients with diabetes were assigned to receive ACE inhibitors (P = .048). It is very possible that clinicians were opening envelopes and violating the randomization to ensure patients received what the clinicians believed was the best treatment. Thus, the prognostic balance that randomization could have achieved was prevented.

2. Blinding

If randomization succeeds, treatment and control groups begin with a similar prognosis. Randomization, however, provides no guarantees that the 2 groups will remain prognostically balanced. Blinding is the optimal strategy for maintaining prognostic balance.

Table 2 describes 5 groups involved in clinical trials that, ideally, will remain unaware of whether patients are receiving the experimental therapy or control therapy. Patients who take a treatment that they believe is effective may feel and perform better than those who do not, even if the treatment has no biologic activity. Investigators interested in determining the biologic impact of a treatment will ensure patients are blind to treatment allocation. Similarly, rigorous research designs will ensure blinding of those caring for participants, as well as those collecting, evaluating, and analyzing data. Demonstrations of bias introduced by unblinding—such as the results of a trial in multiple sclerosis in which a treatment benefit judged by unblinded outcome assessors disappeared when adjudicators of outcome were blinded(2) —highlight the importance of blinding. The more subjectivity involved in judging whether a patient has had a target outcome, the more important blinding becomes. For example, blinding of an outcome assessor is unnecessary when the outcome is all-cause mortality.

Finally, differences in patient care other than the intervention under study—cointerventions—can, if they affect study outcomes, bias the results. Effective blinding eliminates the possibility of either conscious or unconscious differential administration of effective interventions to treatment and control groups. When effective blinding is not possible, documentation of potential cointerventions becomes important.

Five Groups That Should, if Possible, Be Blind to Treatment Assignment
Patients: To avoid placebo effects

Clinicians: To prevent differential administration of therapies that affect the outcome of interest (cointervention)

Data collectors: To prevent bias in data collection

Adjudicators of outcome: To prevent bias in decisions about whether or not a patient has had an outcome of interest

Data analysts: To avoid bias in decisions regarding data analysis

3. Loss to Follow-up

Ideally, at the conclusion of a trial, investigators will know the status of each patient with respect to the target outcome. The greater the number of patients whose outcome is unknown—patients lost to follow-up—the more a study is potentially compromised. The reason is that patients who are lost often have different prognoses from those who are retained—they may disappear because they have adverse outcomes or because they are doing well and so did not return for assessment. The magnitude of the bias may be substantial. A systematic review suggested that up to a third of positive trials reported in high-impact journals may lose significance given plausible assumptions regarding differential loss to follow-up in treatment and control groups.

When does loss to follow-up pose a serious risk of bias? Although you may run across thresholds such as 20% for a serious risk of bias, such rules of thumb are misleading. Consider 2 hypothetical randomized trials, each of which enters 1000 patients into both treatment and control groups, of whom 30 (3%) are lost to follow-up (Table 3). In trial A, treated patients die at half the rate of the control group (200 vs 400), a relative risk (RR) of 50%. To what extent does the loss to follow-up threaten our inference that treatment reduces the death rate by half? If we assume the worst (ie, that all treated patients lost to follow-up died), the number of deaths in the experimental group would be 230 (23%). If there were no deaths among the control patients who were lost to follow-up, our best estimate of the effect of treatment in reducing the relative risk of death drops from 200/400, or 50%, to 230/400, or 58%. Thus, even assuming the worst makes little difference to the best estimate of the magnitude of the treatment effect. Our inference is therefore secure.

Contrast this with trial B. Here, the RR of death is also 50%. In this case, however, the total number of deaths is much lower; of the treated patients, 30 die, and the number of deaths in control patients is 60. In trial B, if we make the same worst-case assumption about the fate of the patients lost to follow-up, the results would change markedly. If we assume that all patients initially allocated to treatment—but subsequently lost to follow-up—die, the number of deaths among treated patients rises from 30 to 60, which is equal to the number of control group deaths. If this assumption is accurate, we would have 60 deaths in both the treatment and control groups and the effect of treatment would drop to 0. Because of this dramatic change in the treatment effect (50% RR if we ignore those lost to follow-up; 100% RR if we assume all patients in the treatment group who were lost to follow-up died), the 3% loss to follow-up in trial B threatens our inference about the magnitude of the RR.

Of course, this worst-case scenario is unlikely. When a worst-case scenario, were it true, substantially alters the results, you must judge the plausibility of a markedly different outcome event rate in the treatment and control group patients lost to follow-up.
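The worst-case arithmetic for trials A and B can be reproduced in a few lines. What follows is a minimal Python sketch, not part of the original guidance. The function names are illustrative, and, as in Table 3, risks are computed with all 1000 randomized patients per arm as the denominator.

```python
def relative_risk(events_t, n_t, events_c, n_c):
    """Risk in the treatment arm divided by risk in the control arm."""
    return (events_t / n_t) / (events_c / n_c)


def worst_case_rr(events_t, lost_t, n_t, events_c, n_c):
    """RR if every treated patient lost to follow-up had the event
    and no control patient lost to follow-up did."""
    return relative_risk(events_t + lost_t, n_t, events_c, n_c)


# Figures from the two hypothetical trials in the text:
# 1000 patients per arm, 30 lost to follow-up per arm.
trials = {
    "A": {"events_t": 200, "events_c": 400},
    "B": {"events_t": 30, "events_c": 60},
}

results = {}
for name, t in trials.items():
    observed = relative_risk(t["events_t"], 1000, t["events_c"], 1000)
    worst = worst_case_rr(t["events_t"], 30, 1000, t["events_c"], 1000)
    results[name] = (observed, worst)
    print(f"Trial {name}: observed RR = {observed:.2f}, worst-case RR = {worst:.2f}")
```

In trial A the worst case moves the RR from 0.50 to about 0.58. In trial B it moves from 0.50 to 1.0, which is why the same 3% loss to follow-up threatens the inference only in trial B.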

The issue is conceptually identical with continuous outcomes: was the loss to follow-up such that reasonable assumptions about differences in outcomes among those lost to follow-up in intervention and control groups could change the overall results in an important way?

Within the context of a systematic review, one can test, for each study and ultimately for the pooled estimate, a variety of assumptions about rates of events in those lost to follow-up when the outcome is a binary variable(3). One can also conduct such sensitivity analyses when the data are continuous(4). Such approaches represent the ideal way of determining whether to rate down for risk of bias as a result of loss to follow-up.
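One way to picture such a sensitivity analysis for a binary outcome is sketched below in Python. This is an illustration only, not the method of the cited references. It assumes that the event risk among patients lost to follow-up in an arm equals a chosen multiple (`ratio_t`, `ratio_c`, both hypothetical parameters) of the risk observed among those followed in that arm. Unlike Table 3, observed risk is computed among followed patients only.

```python
def rr_with_assumed_loss(events_t, lost_t, n_t, events_c, lost_c, n_c,
                         ratio_t=1.0, ratio_c=1.0):
    """RR after imputing events among patients lost to follow-up.

    ASSUMPTION (illustrative, not from the source): risk among those lost
    in an arm is `ratio` times the risk among those followed, capped at 1.
    """
    def total_events(events, lost, n, ratio):
        risk_followed = events / (n - lost)
        risk_lost = min(1.0, ratio * risk_followed)
        return events + lost * risk_lost

    risk_t = total_events(events_t, lost_t, n_t, ratio_t) / n_t
    risk_c = total_events(events_c, lost_c, n_c, ratio_c) / n_c
    return risk_t / risk_c


# Trial B from the text: 30 vs 60 deaths, 30 lost per arm, 1000 randomized per arm.
trial_b = dict(events_t=30, lost_t=30, n_t=1000, events_c=60, lost_c=30, n_c=1000)

# Vary the assumed risk multiplier in the treatment arm only.
sensitivity = {r: rr_with_assumed_loss(**trial_b, ratio_t=r) for r in (1, 2, 3, 5)}
for r, rr in sensitivity.items():
    print(f"risk among treated patients lost = {r}x observed: RR = {rr:.2f}")
```

If the RR stays clearly below 1 across the range of multipliers judged plausible, loss to follow-up is less likely to justify rating down. The same loop could be applied to each study before pooling.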

4. Stopping early for benefit

Theoretical considerations(5), simulations(6), and empirical evidence(7) all suggest that trials stopped early for benefit overestimate treatment effects. The most recent empirical work suggests that, in the real world, formal stopping rules do not reduce this bias; that the bias is evident in trials stopped early with fewer than 500 events; and that, on average, the ratio of relative risks in trials stopped early versus the best estimate of the truth (trials not stopped early) is 0.71(8).

Systematic review authors and guideline developers should consider this important source of bias. Systematic reviews should provide sensitivity analyses of results including and excluding studies that stopped early for benefit; if estimates differ appreciably, those restricted to the trials that did not stop early should be considered the more credible. When evidence comes primarily or exclusively from trials stopped early for benefit, authors should infer that substantial overestimates are likely in trials with fewer than 500 events and that large overestimates are likely in trials with fewer than 200 events(8).
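The sensitivity analysis described above can be sketched as a pooled estimate computed twice, once with all trials and once without those stopped early. The Python below is a minimal illustration using fixed-effect inverse-variance pooling on the log RR scale. The three trials and their counts are invented for the example and do not come from any real review.

```python
import math


def pooled_rr(trials):
    """Fixed-effect inverse-variance pooled relative risk (log scale)."""
    weighted_sum = 0.0
    weight_total = 0.0
    for t in trials:
        log_rr = math.log((t["e_t"] / t["n_t"]) / (t["e_c"] / t["n_c"]))
        # Standard large-sample variance of log RR
        variance = 1 / t["e_t"] - 1 / t["n_t"] + 1 / t["e_c"] - 1 / t["n_c"]
        weight = 1 / variance
        weighted_sum += weight * log_rr
        weight_total += weight
    return math.exp(weighted_sum / weight_total)


# HYPOTHETICAL data: e = events, n = patients, t = treatment, c = control.
trials = [
    {"name": "T1", "e_t": 80, "n_t": 1000, "e_c": 100, "n_c": 1000, "stopped_early": False},
    {"name": "T2", "e_t": 120, "n_t": 1500, "e_c": 150, "n_c": 1500, "stopped_early": False},
    {"name": "T3", "e_t": 15, "n_t": 300, "e_c": 35, "n_c": 300, "stopped_early": True},
]

rr_all = pooled_rr(trials)
rr_not_stopped = pooled_rr([t for t in trials if not t["stopped_early"]])
print(f"All trials: RR = {rr_all:.2f}")
print(f"Excluding trials stopped early: RR = {rr_not_stopped:.2f}")
```

If the two estimates differ appreciably, the guidance above treats the estimate restricted to trials that did not stop early as the more credible one. A random-effects model could be substituted without changing the logic of the comparison.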

5. Selective outcome reporting

Critics use the label “selective outcome reporting” when authors selectively report positive outcomes and analyses within a trial. Recent evidence suggests that selective outcome reporting, which tends to produce overestimates of the intervention effects, may be widespread(9-13).

For example, a systematic review of the effects of testosterone on erection satisfaction in men with low testosterone identified four eligible trials(14). The largest trial’s results were reported only as “not significant”, and could not, therefore, contribute to the meta-analysis. Data from the three smaller trials suggested a large treatment effect (1.3 standard deviations, 95% confidence interval 0.2 to 2.3). The review authors ultimately obtained the complete data from the larger trial: after including the less impressive results of the large trial, the magnitude of the effect was smaller and no longer statistically significant (0.8 standard deviations, 95% confidence interval -0.05 to 1.63)(15).

The Cochrane handbook suggests that definitive evidence that selective reporting has not occurred requires access to a protocol developed before the study was undertaken(16). Selective reporting is present if authors acknowledge pre-specified outcomes that they fail to report, or report outcomes incompletely such that they cannot be included in a meta-analysis. One should suspect reporting bias if the study report fails to include results for a key outcome that one would expect to see in such a study, or if composite outcomes are presented without the individual component outcomes.

Note that within the GRADE framework, which rates the confidence in estimates from a body of evidence, suspicion of publication bias in a number of included studies may lead to rating down of quality of the body of evidence. For instance, in the testosterone example above, had the authors not obtained the missing data, they would have considered rating down the body of evidence for the selective reporting bias suspected in the largest study.

How to do the assessment, practical aspects

1. Summarizing risk of bias must be outcome specific

Sources of bias may vary in importance across outcomes. Thus, within a single study, one may have higher quality evidence for one outcome than for another. For instance, RCTs of steroids for acute spinal cord injury measured both all-cause mortality and, based on a detailed physical examination, motor function (24-26). Blinding of outcome assessors is irrelevant for mortality, but crucial for motor function. Thus, as in this example, if the outcome assessors in the primary studies were not blinded, evidence might be categorized for all-cause mortality as having no serious risk of bias, and rated down for motor function by one level on the basis of serious risk of bias.

2. Summarizing risk of bias requires consideration of all relevant evidence

Every study addressing a particular outcome will differ, to some degree, in risk of bias. Review authors and guideline developers must make an overall judgment, considering all the evidence, whether quality of evidence for an outcome warrants rating down on the basis of risk of bias.

Individual trials achieve a low risk of bias when most or all key criteria are met, and any violations are not crucial. Studies that suffer from one crucial violation – a violation of crucial importance with regard to a point estimate (in the context of a systematic review) or decision (in the context of a guideline) – provide limited quality evidence. When one or more crucial limitations substantially lower confidence in a point estimate, a body of evidence provides only weak support for inferences regarding the magnitude of a treatment effect.

High quality evidence is available when most studies from a body of evidence meet bias-minimizing criteria. For example, of the 22 trials addressing the impact of beta blockers on mortality in patients with heart failure, most probably or certainly used concealed allocation, all blinded at least some key groups, and follow-up of randomized patients was almost complete(27).

GRADE considers a body of evidence of moderate quality when the best evidence comes from individual studies of moderate quality. For instance, we cannot be confident that, in patients with falciparum malaria, amodiaquine and sulfadoxine-pyrimethamine together reduce treatment failures compared to sulfadoxine-pyrimethamine alone because the apparent advantage of the combination was sensitive to assumptions regarding the event rate in those lost to follow-up in two of three studies(28).

Surgery versus conservative treatment in the management of patients with lumbar disc prolapse provides an example of rating down two levels due to risk of bias in RCTs(29). We are uncertain of the benefit of open discectomy in reducing symptoms after one year or longer because of very serious limitations in the one credible trial of open discectomy compared to conservative treatment. That trial suffered from inadequate concealment of allocation and unblinded assessment of outcome by potentially biased raters (surgeons) using unvalidated rating instruments (Table 6).

3. Existing systematic reviews are often limited in summarizing study limitations across studies

To rate overall confidence in estimates with respect to an outcome, review authors and guideline developers must consider and summarize study limitations considering all the evidence from multiple studies. For a guideline developer, using an existing systematic review would be the most efficient way to address this issue.

Unfortunately, systematic reviews usually do not address all important outcomes, typically focusing on benefit and neglecting harm. For instance, one is required to go to separate reviews to assess the impact of beta blockers on mortality(27) and on quality of life(30). No systematic review has addressed beta-blocker toxicity in heart failure patients.

Review authors’ usual practice of rating the quality of studies across outcomes, rather than separately for each outcome, further limits the usefulness of existing systematic reviews for guideline developers. This approach becomes even more problematic when review authors use summary measures that aggregate across quality criteria (e.g., allocation concealment, blinding, loss to follow-up) to provide a single score. These measures are often limited in that they focus on quality of reporting rather than on the design and conduct of the study(31). Furthermore, they tend to be unreliable and less closely correlated with outcome than individual quality components(32-34). These problems arise, at least in part, because calculating a summary score inevitably involves assigning arbitrary weights to different criteria.

Finally, systematic reviews that address individual components of study limitations are often not comprehensive and fail to make transparent the judgments needed to evaluate study limitations. These judgments are often challenging, at least in part because of inadequate reporting: just because a safeguard against bias isn’t reported doesn’t mean it was neglected(35, 36).

Thus, although systematic reviews are often extremely useful in identifying the relevant primary studies, members of guideline panels or their delegates must often review individual studies if they wish to ensure accurate ratings of study limitations for all relevant outcomes. As review authors increasingly adopt the GRADE approach (and in particular as Cochrane review authors do so in combination with using the Cochrane risk-of-bias tool) the situation will improve.

4. What to do when there is only one RCT

Many people are uncomfortable designating a single RCT as high quality evidence. Given the many instances in which the first positive report has not held up under subsequent investigation, this discomfort is warranted. On the other hand, automatically rating down quality when there is a single study is not appropriate. A single, very large, rigorously planned and conducted multi-centre RCT may provide evidence warranting high confidence. GRADE suggests especially careful scrutiny of all relevant issues (risk of bias, precision, directness, publication bias) when only a single RCT addresses a particular question.

5. Moving from risk of bias in individual studies to rating confidence in estimates across studies

Moving from 6 risk of bias criteria for each individual study to a judgment about rating down for quality of evidence for risk of bias across a group of studies addressing a particular outcome presents challenges.
We suggest the following 5 principles:

Judicious consideration
In deciding on the overall confidence in estimates, one does not average across studies (for instance if some studies have no serious limitations, some serious limitations, and some very serious limitations, one doesn’t automatically rate quality down by one level due to an average rating of serious limitations). Rather, judicious consideration of the contribution of each study, with a general guide to focus on the high quality studies (as we will illustrate), is warranted.
Evaluate how much each trial contributes
This judicious consideration requires evaluating the extent to which each trial contributes toward the estimate of magnitude of effect. This contribution will usually reflect study sample size and number of outcome events – larger trials with many events will contribute more, much larger trials with many more events will contribute much more.
Be conservative when rating down
One should be conservative in the judgment of rating down. That is, one should be confident that there is substantial risk of bias across most of the body of available evidence before one rates down for risk of bias.
Consider the context
The risk of bias should be considered in the context of other limitations. If, for instance, reviewers find themselves in a close call situation with respect to two quality issues (risk of bias and, say, precision) we suggest rating down for at least one of the two.
Be explicit
Notwithstanding the first four principles, reviewers will face close-call situations. They should acknowledge that they are in such a situation, make explicit why they think this is the case, and make the reasons for their ultimate judgment apparent.

6. Application of principles

In a systematic review of flavonoids to treat pain and bleeding associated with hemorrhoids(37), with respect to the primary outcome of persisting symptoms, most trials did not provide sufficient information to determine whether randomization was concealed, the majority violated the intention-to-treat principle and did not provide the data allowing the appropriate analysis (Table 7), and none used a validated symptom measure. On the other hand, most authors described their trials as double-blind, and although concealment and blinding are different concepts, blinded trials of drugs are very likely to be concealed(35) (Table 7). Because the questionnaires appeared simple and transparent, and because of the blinding of the studies, we would be hesitant to consider lack of validation introducing a serious risk of bias.

Nevertheless, in light of these study limitations, one might consider focusing on the highest quality trials. Substantial precision would, however, be lost (requiring rating down for imprecision) and the quality of the trials did not explain variability in results (i.e. the magnitude of effect was similar in the higher and lower risk of bias studies). Both considerations argue for basing an estimate on the results of all RCTs.

In our view, this represents a borderline situation in which it would be reasonable either to rate down for risk of bias, or not to do so. This illustrates that the great merit of GRADE is not that it ensures consistency of conclusions, but that it requires explicit and transparent judgments. Considering these issues in isolation, and following the principles articulated above, however, we would be inclined not to rate down for quality for risk of bias.

Three RCTs addressing the impact of 24-hour administration of high dose corticosteroids on motor function in patients with acute spinal cord injury illustrate another principle of aggregation(24-26). Although the degree of limitations is in fact a continuum (as Figure 1 illustrates), GRADE simplifies the process by categorizing these studies – or any other study – as having “no serious limitations”, “serious limitations”, or “very serious limitations” (as in Table 5).

The first of the 3 trials (Bracken in Figure 1), which included 127 patients treated within 8 hours of injury, ensured allocation concealment through central randomization, almost certainly blinded patients, clinicians, and those measuring motor function, and lost 5% of patients to follow-up at 1 year(24). The flaws in this RCT are sufficiently minor to allow classification as no serious risk of bias.

The second trial (Pointillart in Figure 1) was unlikely to have concealed allocation, did blind those assessing outcome (but not patients or clinicians), and lost only one of 106 patients to follow-up(26). Here, quality falls in an intermediate range, consistent with classification as moderate risk of bias. The third trial (Odani in Figure 1), which included 158 patients, almost certainly failed to conceal allocation, used no blinding, and lost 26% of patients to follow-up, many more in the steroid group than the control group(25). This third trial is probably best classified as having very serious risk of bias.

Considering these three RCTs, should one rate down for risk of bias with respect to the motor function outcome? If we considered only the first two trials, the answer would be no. Therefore the review authors must decide either to exclude the third trial (thereby only including trials with few limitations) or include it based on a judgment that overall there is a low risk of bias (since most of the evidence comes from trials with few limitations) despite the contribution of the trial with very serious limitations to the overall estimate of effect. This example illustrates that averaging across studies will not be the right approach.
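To make the point about contribution concrete, the short Python sketch below tallies how much of the evidence comes from each risk-of-bias category. It uses sample size as a crude stand-in for each trial's contribution, which is an assumption made for illustration: in a real meta-analysis the weights would also reflect the number of outcome events. The sample sizes and category labels are those given in the text.

```python
# Sample sizes and risk-of-bias judgments as described in the text.
trials = [
    {"name": "Bracken", "patients": 127, "risk_of_bias": "no serious"},
    {"name": "Pointillart", "patients": 106, "risk_of_bias": "intermediate"},
    {"name": "Odani", "patients": 158, "risk_of_bias": "very serious"},
]

total_patients = sum(t["patients"] for t in trials)

# ASSUMPTION: contribution is approximated by share of randomized patients.
share = {}
for t in trials:
    category = t["risk_of_bias"]
    share[category] = share.get(category, 0.0) + t["patients"] / total_patients

for category, fraction in share.items():
    print(f"{category}: {fraction:.0%} of patients")
```

Under this rough weighting, about 60% of patients come from the first two trials and about 40% from the trial with very serious limitations, which shows why the decision requires a judgment about contribution and cannot rest on an average of the three ratings.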

Tables:

TABLE 3. When Does Loss to Follow-up Seriously Increase Risk of Bias?
	Trial A		Trial B
	Treatment	Control	Treatment	Control
Number of patients randomized	1000	1000	1000	1000
Number (%) lost to follow-up	30 (3)	30 (3)	30 (3)	30 (3)
Number (%) of deaths	200 (20)	400 (40)	30 (3)	60 (6)
RR not counting patients lost to follow-up	0.2/0.4 = 0.50		0.03/0.06 = 0.50
RR for worst-case scenario^a	0.23/0.4 = 0.58		0.06/0.06 = 1
^a Worst-case scenario: all treated patients lost to follow-up died and no control patients lost to follow-up died.

Table 5: Summarizing study limitations for randomized trials

Extent of risk of bias	Risk of bias within a study	Risk of bias across studies	Interpretation across studies*	Example of summary across studies
No serious limitations, do not downgrade	Low risk of bias for all key criteria (Table 1)	Most information is from studies at low risk of bias	High quality evidence: The true effect lies close to that of the estimate of the effect.	Beta blockers reduce mortality in patients with heart failure¹
Serious limitations, downgrade one level (i.e. from high to moderate quality)	Crucial limitation for one criterion, or some limitations for multiple criteria sufficient to lower one's confidence in the estimate of effect	Most information is from studies at moderate risk of bias	Quality of evidence reduced from high to moderate quality evidence: The true effect is likely to be close to the estimate of the effect, but there is a possibility that it is substantially different.	Amodiaquine and sulfadoxine-pyrimethamine (SP) together likely reduce treatment failures compared to SP alone in patients with malaria²
Very serious limitations, downgrade two levels (i.e. from high to low quality, or moderate to very low)	Crucial limitation for one or more criteria sufficient to substantially lower one's confidence in the estimate of effect	Most information is from studies at high risk of bias	Quality of evidence reduced from high to low quality evidence: The true effect may be substantially different from the estimate of the effect.	Open discectomy may reduce symptoms after one year compared to conservative treatment of lumbar disc prolapse³

*This interpretation assumes no problems that necessitate rating down due to imprecision, inconsistency, indirectness and publication bias

1. Brophy JM, Joseph L, Rouleau JL. Beta-blockers in congestive heart failure. A Bayesian meta-analysis. Ann Intern Med 2001;134(7):550-60.

2. McIntosh H, Jones K. Chloroquine or amodiaquine combined with sulfadoxine-pyrimethamine for treating uncomplicated malaria. The Cochrane Database of Systematic Reviews 2005(4).
3. Gibson JNA, Waddell G. Surgical interventions for lumbar disc prolapse. Cochrane Database of Systematic Reviews 2007, Issue 2. Art. No.: CD001350. DOI: 10.1002/14651858.CD001350.pub4.

Table 6: Quality assessment for open discectomy versus conservative treatment (29, Gibson and Waddell)

Quality assessment
No of patients (studies)	Design	Limitations	Inconsistency	Indirectness	Imprecision	Publication bias
Outcome: Poor/bad result at 1 year – surgeon rated
126 (1)	RCT	Very serious limitations¹	Not relevant	No serious indirectness	Serious imprecision²	Unlikely
Outcome: Poor/bad result at 4 years – surgeon rated
126 (1)	RCT	Very serious limitations¹	Not relevant	No serious indirectness	Serious imprecision²	Unlikely
Outcome: Poor/bad result at 10 years – surgeon rated
126 (1)	RCT	Very serious limitations¹	Not relevant	No serious indirectness	Serious imprecision²	Unlikely

¹Inadequate concealment of allocation and unblinded, unvalidated assessment by the surgeon.

²Wide confidence intervals and few events (16 or fewer).

Table 7. Risk of bias for measurement of symptoms in studies of flavonoids in patients with hemorrhoids

Author	Randomization	Allocation concealment	Blinding	Loss to follow up^ / ITT principle observed or per protocol analysis	Other
Dimitroulopoulos D, 2005	Adequate* Computer generated random numbers*	Sealed opaque envelopes*	Described as single blind Care givers, patients and data collectors blinded*	3% / Per protocol	Unvalidated symptom measure
Misra MC, 2000	Adequate Computer generated random numbers*	Adequate Sealed opaque envelopes*	Patients and physicians* Described as double blind Placebo identical appearance	2% / Per protocol	Unvalidated symptom measure
Godeberge P, 1994	Adequate*	Adequate Sealed opaque envelopes*	Patients, physician-investigator, data manager, statistician and authors blinded	6% / Per protocol
Cospite M, 1994	Unclear	Unclear	Unclear Described as double blind	12% / IT	Unvalidated symptom measure
Chauvenet-M, 1994	Unclear	Unclear	Unclear	11% / Per protocol	Unvalidated symptom measure
Ho Y-H, 2000	Adequate Drawing of sealed opaque envelopes*	Adequate Sealed opaque envelopes	All parties blinded*	0% / IT	Unvalidated symptom measure
Thanapongsathorn W, 1992	Unclear	Unclear	Unclear Described as double blind	12% / Per protocol	Unvalidated symptom measure
Titapant V, 2001	Unclear	Unclear	Unclear Described as double blind Placebo identical appearance	12% / Per protocol	Unvalidated symptom measure
Wijayanegara H, 1992	Unclear	Unclear	Unclear Described as double blind	3% / Per protocol	Unvalidated symptom measure
Annoni F, 1986	Unclear	Unclear	Unclear Described as double blind Placebo identical appearance	Uncertain / Unclear	Unvalidated symptom measure
Thorp RH, 1970	Unclear	Unclear	Physicians and patients blinded Described as double blind Placebo identical appearance	20% / Per protocol	Unvalidated symptom measure
Clyne MB, 1967	Bottles numbered consecutively in accordance to random tables	Unclear	Physicians and patients blinded Described as double blind Placebo identical appearance	Uncertain / Per protocol	Unvalidated symptom measure
Sinnatamby CS, 1973	Unclear	Unclear	Physicians and patients blinded Described as double blind	53% / Per protocol	Unvalidated symptom measure
Trochet JP, 1992	Randomised by blocks of 3 (method unclear)	Unclear	Physicians blinded Placebo identical appearance	Uncertain / IT	Unvalidated symptom measure

IT: intention to treat principle observed, *: data provided by authors; ^ no important differences in rate of loss to follow-up between flavonoid and control groups in any study
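The statements in the text about concealment and intention-to-treat can be checked against Table 7 with a simple tally. The Python sketch below transcribes two columns of the table by hand. The coding of each cell as "described", "unclear", "per protocol", or "IT" is our own simplification for illustration, not part of the original table.

```python
from collections import Counter

# (author, allocation concealment, analysis) transcribed from Table 7.
# "described" = a concealment method (sealed opaque envelopes) is reported.
rows = [
    ("Dimitroulopoulos 2005", "described", "per protocol"),
    ("Misra 2000", "described", "per protocol"),
    ("Godeberge 1994", "described", "per protocol"),
    ("Cospite 1994", "unclear", "IT"),
    ("Chauvenet-M 1994", "unclear", "per protocol"),
    ("Ho 2000", "described", "IT"),
    ("Thanapongsathorn 1992", "unclear", "per protocol"),
    ("Titapant 2001", "unclear", "per protocol"),
    ("Wijayanegara 1992", "unclear", "per protocol"),
    ("Annoni 1986", "unclear", "unclear"),
    ("Thorp 1970", "unclear", "per protocol"),
    ("Clyne 1967", "unclear", "per protocol"),
    ("Sinnatamby 1973", "unclear", "per protocol"),
    ("Trochet 1992", "unclear", "IT"),
]

concealment = Counter(row[1] for row in rows)
analysis = Counter(row[2] for row in rows)

print(f"Trials: {len(rows)}")
print(f"Concealment unclear: {concealment['unclear']}")
print(f"Per protocol analysis: {analysis['per protocol']}")
print(f"Intention to treat observed: {analysis['IT']}")
```

On this coding, 10 of the 14 trials leave concealment unclear and 10 of 14 report a per protocol analysis, which matches the description of the hemorrhoid trials in the text.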

_____________________________

The text below is taken from the GRADE working group official JCE series, article number 4:
GRADE guidelines: 4. Rating the quality of evidence—study limitations (risk of bias)

"Readers can refer to many authoritative discussions of the study limitations that often afflict RCTs (Table 1). Two of these discussions are particularly consistent with GRADE’s conceptualization, which include a focus on outcome specificity (i.e., the focus of risk of bias is not the individual study but rather the individual outcome, and quality can differ across outcomes in individual trials, or a series of trials [1], [2]). We shall highlight three of the criteria in Table 1. The importance of the first of these, stopping early for benefit, has only recently been recognized. Recent evidence has also emerged regarding the second, selective outcome reporting [3], [4]. Furthermore, the positioning of selective outcome reporting in taxonomies of bias can be confusing. Some may intuitively think it should be categorized with publication bias, rather than as an issue of risk of bias within individual studies. Finally, we highlight loss to follow-up because it is often misunderstood.

Study limitations in randomized trials

1. Lack of allocation concealment

Those enrolling patients are aware of the group (or period in a crossover trial) to which the next enrolled patient will be allocated (major problem in “pseudo” or “quasi” randomized trials with allocation by day of week, birth date, chart number, etc)

2. Lack of blinding

Patient, care givers, those recording outcomes, those adjudicating outcomes, or data analysts are aware of the arm to which patients are allocated (or the medication currently being received in a crossover trial)

3. Incomplete accounting of patients and outcome events

Loss to follow-up and failure to adhere to the intention-to-treat principle in superiority trials; or in noninferiority trials, loss to follow-up, and failure to conduct both analyses considering only those who adhered to treatment, and all patients for whom outcome data are available

4. Selective outcome reporting bias

Incomplete or absent reporting of some outcomes and not others on the basis of the results

5. Other limitations

Stopping early for benefit

Use of unvalidated outcome measures (e.g., patient-reported outcomes)

Carryover effects in crossover trial

Recruitment bias in cluster-randomized trials"
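Because GRADE judges risk of bias per outcome rather than per study, one way to keep track of the list above is to store the judgments for each study and outcome pair separately. The following is only a minimal sketch under stated assumptions: the record layout, the class name OutcomeAssessment, its methods, and the example trial "Trial A" are all hypothetical and are not part of GRADE or of any GRADE software. Only the five category names come from the list above.

```python
from dataclasses import dataclass, field

# Limitation categories taken from the list above (Table 1 of the article).
LIMITATIONS = (
    "lack of allocation concealment",
    "lack of blinding",
    "incomplete accounting of patients and outcome events",
    "selective outcome reporting bias",
    "other limitations",
)


@dataclass
class OutcomeAssessment:
    """Judgments for ONE outcome in ONE trial (hypothetical record layout)."""

    study: str
    outcome: str
    # Maps a limitation name to a free-text note; a missing key means
    # that no concern was recorded for that limitation.
    concerns: dict = field(default_factory=dict)

    def flag(self, limitation: str, note: str) -> None:
        """Record a concern, rejecting names that are not on the checklist."""
        if limitation not in LIMITATIONS:
            raise ValueError(f"unknown limitation: {limitation}")
        self.concerns[limitation] = note

    def flagged(self) -> list:
        """Return the flagged limitations in checklist order."""
        return [name for name in LIMITATIONS if name in self.concerns]


# Hypothetical example: the same trial, assessed separately for two outcomes.
mortality = OutcomeAssessment("Trial A", "all-cause mortality")
symptoms = OutcomeAssessment("Trial A", "symptom score")
symptoms.flag("lack of blinding", "patients aware of allocation")

print(mortality.flagged())
print(symptoms.flagged())
```

Keeping one record per outcome, rather than one per trial, mirrors the point made in the quoted text that quality can differ across outcomes within the same trial.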

Go to the original article for the full text, or go to a specific chapter in the article:

Rating down quality for risk of bias

Study limitations in randomized trials

Stopping early for benefit

Selective outcome reporting

Loss to follow-up

Study limitations in observational studies

Case series: the problem of missing internal controls
Dealing with prognostic imbalance

Limitations of GRADE’s approach to assessing risk of bias in individual studies

Summarizing study limitations must be outcome specific

Summarizing risk of bias requires consideration of all relevant evidence

Existing systematic reviews are often limited in summarizing study limitations across studies

What to do when there is only one RCT

Moving from Cochrane risk of bias tables in individual studies to rating quality of evidence across studies

Application of principles

Recording judgments about study limitations

See the Assessing Risk of Bias training video from the McMaster CE&B GRADE site: http://cebgrade.mcmaster.ca/RoB/index.html

References

1.Hansson L, Lindholm LH, Niskanen L, Lanke J, Hedner T, Niklason A, et al. Effect of angiotensin-converting-enzyme inhibition compared with conventional therapy on cardiovascular morbidity and mortality in hypertension: the Captopril Prevention Project (CAPPP) randomised trial. Lancet. 1999 Feb 20;353(9153):611-6.

2.Noseworthy JH, Ebers GC, Vandervoort MK, Farquhar RE, Yetisir E, Roberts R. The impact of blinding on the results of a randomized, placebo-controlled multiple sclerosis clinical trial. Neurology. 1994 Jan;44(1):16-20.

3.Akl EA, Johnston BC, Alonso-Coello P, Neumann I, Ebrahim S, Briel M, et al. Addressing dichotomous data for participants excluded from trial analysis: a guide for systematic reviewers. PLoS One. 2013;8(2):e57132.

4.Ebrahim S, Akl EA, Mustafa RA, Sun X, Walter SD, Heels-Ansdell D, et al. Addressing continuous data for participants excluded from trial analysis: a guide for systematic reviewers. J Clin Epidemiol. 2013 Sep;66(9):1014-21 e1.

5.Pocock SJ. When (not) to stop a clinical trial for benefit. JAMA. 2005 Nov 2;294(17):2228-30.

6.Pocock SJ, Hughes MD. Practical problems in interim analyses, with particular regard to estimation. Control Clin Trials. 1989 Dec;10(4 Suppl):209S-21S.

7.Montori VM, Devereaux PJ, Adhikari NK, Burns KE, Eggert CH, Briel M, et al. Randomized trials stopped early for benefit: a systematic review. JAMA. 2005 Nov 2;294(17):2203-9.

8.Bassler D, Briel M, Montori VM, Lane M, Glasziou P, Zhou Q, et al. Stopping randomized trials early for benefit and estimation of treatment effects: systematic review and meta-regression analysis. JAMA. 2010 Mar 24;303(12):1180-7.

9.Furukawa TA, Watanabe N, Omori IM, Montori VM, Guyatt GH. Association between unreported outcomes and effect size estimates in Cochrane meta-analyses. JAMA. 2007 Feb 7;297(5):468-70.

10.Chan AW, Altman DG. Identifying outcome reporting bias in randomised trials on PubMed: review of publications and survey of authors. BMJ. 2005 Apr 2;330(7494):753.

11.Chan AW, Hrobjartsson A, Haahr MT, Gotzsche PC, Altman DG. Empirical evidence for selective reporting of outcomes in randomized trials: comparison of protocols to published articles. JAMA. 2004 May 26;291(20):2457-65.

12.Chan AW, Krleza-Jeric K, Schmid I, Altman DG. Outcome reporting bias in randomized trials funded by the Canadian Institutes of Health Research. CMAJ. 2004 Sep 28;171(7):735-40.

13.Mathieu S, Boutron I, Moher D, Altman DG, Ravaud P. Comparison of registered and published primary outcomes in randomized controlled trials. JAMA. 2009 Sep 2;302(9):977-84.

14.Bolona ER, Uraga MV, Haddad RM, Tracz MJ, Sideras K, Kennedy CC, et al. Testosterone use in men with sexual dysfunction: a systematic review and meta-analysis of randomized placebo-controlled trials. Mayo Clin Proc. 2007 Jan;82(1):20-8.

15.Sinha M, Montori VM. Reporting bias and other biases affecting systematic reviews and meta-analyses: a methodological commentary. Expert Rev Pharmacoeconomics Outcomes Res. 2006;6(1):603-11.

16.Higgins JP, Altman D. Assessing the risk of bias in included studies. In: Higgins J, Green S, editors. Cochrane Handbook for Systematic Reviews of Interventions 5.0.1. Chichester, U.K.: John Wiley & Sons; 2008.

17.Deeks JJ, Dinnes J, D'Amico R, Sowden AJ, Sakarovitch C, Song F, et al. Evaluating non-randomised intervention studies. Health Technol Assess. 2003;7(27):iii-x, 1-173.

18.West S, King V, Carey T, Lohr K, McKoy N, Sutton S, et al. Systems to rate the strength of scientific evidence [Evidence report/technology assessment no 47]. AHRQ Publication No 02-E016: Agency for Healthcare Research and Quality; 2002.

19.Proposed Evaluation Tools for COMPUS.: Assessment, November 29, 2005. Ottawa: Canadian Coordinating Office for Health Technology2005.

20.Sanderson S, Tatt ID, Higgins JP. Tools for assessing quality and susceptibility to bias in observational studies in epidemiology: a systematic review and annotated bibliography. Int J Epidemiol. 2007 Jun;36(3):666-76.

21.Greer IA, Nelson-Piercy C. Low-molecular-weight heparins for thromboprophylaxis and treatment of venous thromboembolism in pregnancy: a systematic review of safety and efficacy. Blood. 2005 Jul 15;106(2):401-7.

22.Sanson BJ, Lensing AW, Prins MH, Ginsberg JS, Barkagan ZS, Lavenne-Pardonge E, et al. Safety of low-molecular-weight heparin in pregnancy: a systematic review. Thromb Haemost. 1999 May;81(5):668-72.

23.Bowker SL, Majumdar SR, Veugelers P, Johnson JA. Increased cancer-related mortality for patients with type 2 diabetes who use sulfonylureas or insulin. Diabetes Care. 2006 Feb;29(2):254-8.

24.Bracken MB, Shepard MJ, Collins WF, Jr., Holford TR, Baskin DS, Eisenberg HM, et al. Methylprednisolone or naloxone treatment after acute spinal cord injury: 1-year follow-up data. Results of the second National Acute Spinal Cord Injury Study. J Neurosurg. 1992 Jan;76(1):23-31.

25.Otani K, Abe H, Kadoya S. Beneficial effect of methylprednisolone sodium succinate in the treatment of acute spinal cord injury. Sekitsui Sekizui. 1994;7:633-47.

26.Pointillart V, Petitjean ME, Wiart L, Vital JM, Lassie P, Thicoipe M, et al. Pharmacological therapy of spinal cord injury during the acute phase. Spinal Cord. 2000 Feb;38(2):71-6.

27.Brophy JM, Joseph L, Rouleau JL. Beta-blockers in congestive heart failure. A Bayesian meta-analysis. Ann Intern Med. 2001 Apr 3;134(7):550-60.

28.McIntosh H, Jones K. Chloroquine or amodiaquine combined with sulfadoxine-pyrimethamine for treating uncomplicated malaria. The Cochrane Database of Systematic Reviews. 2005(4).

29.Gibson J, Waddell G. Surgical interventions for lumbar disc prolapse. The Cochrane Database of Systematic Reviews. 2007(2).

30.Dobre D, van Jaarsveld CH, deJongste MJ, Haaijer Ruskamp FM, Ranchor AV. The effect of beta-blocker therapy on quality of life in heart failure patients: a systematic review and meta-analysis. Pharmacoepidemiol Drug Saf. 2007 Feb;16(2):152-9.

31.Moher D, Jadad AR, Nichol G, Penman M, Tugwell P, Walsh S. Assessing the quality of randomized controlled trials: an annotated bibliography of scales and checklists. Control Clin Trials. 1995 Feb;16(1):62-73.

32.Schulz KF, Chalmers I, Hayes RJ, Altman DG. Empirical evidence of bias. Dimensions of methodological quality associated with estimates of treatment effects in controlled trials. JAMA. 1995 Feb 1;273(5):408-12.

33.Emerson JD, Burdick E, Hoaglin DC, Mosteller F, Chalmers TC. An empirical study of the possible relation of treatment differences to quality scores in controlled randomized clinical trials. Control Clin Trials. 1990 Oct;11(5):339-52.

34.Juni P, Witschi A, Bloch R, Egger M. The hazards of scoring the quality of clinical trials for meta-analysis. JAMA. 1999 Sep 15;282(11):1054-60.

35.Devereaux PJ, Choi PT, El-Dika S, Bhandari M, Montori VM, Schunemann HJ, et al. An observational study found that authors of randomized controlled trials frequently use concealment of randomization and blinding, despite the failure to report these methods. J Clin Epidemiol. 2004 Dec;57(12):1232-6.

36.Soares HP, Daniels S, Kumar A, Clarke M, Scott C, Swann S, et al. Bad reporting does not mean bad methods for randomised trials: observational study of randomised controlled trials performed by the Radiation Therapy Oncology Group. BMJ. 2004 Jan 3;328(7430):22-4.

37.Alonso-Coello P, Zhou Q, Martinez-Zapata MJ, Mills E, Heels-Ansdell D, Johanson JF, et al. Meta-analysis of flavonoids for the treatment of haemorrhoids. Br J Surg. 2006 Aug;93(8):909-20.
