

result. The risk of bias (RoB) for the outcomes in each study
should be systematically assessed and sensitivity analyses
performed to examine the effect of RoB on the conclusions.
Observational and nonrandomized comparative studies in
SRs of interventions should not be included in MAs, because
the MA may provide very precise but spurious results
owing to confounding and patient selection bias.
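The Python sketch below illustrates this point with hypothetical numbers that are not drawn from any review. It applies standard fixed-effect, inverse-variance pooling to studies that all share the same confounding bias. As more of these studies are pooled, the confidence interval narrows around the biased value, and from k = 10 onwards it excludes the true null effect. The function name `fixed_effect_pool` and all inputs are illustrative assumptions.

```python
import math


def fixed_effect_pool(effects, ses):
    """Inverse-variance fixed-effect pooling.

    effects: per-study estimates on an additive scale (e.g. log risk ratios)
    ses: the corresponding standard errors
    Returns (pooled estimate, pooled standard error).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total
    return pooled, math.sqrt(1.0 / total)


# Hypothetical scenario: the true log risk ratio is 0 (no effect), but every
# observational study is shifted by the same confounding bias of +0.20.
TRUE_EFFECT, BIAS, STUDY_SE = 0.0, 0.20, 0.30

for k in (2, 10, 50):
    effects = [TRUE_EFFECT + BIAS] * k   # identical bias in every study
    ses = [STUDY_SE] * k                 # each study is individually imprecise
    est, se = fixed_effect_pool(effects, ses)
    lo, hi = est - 1.96 * se, est + 1.96 * se
    print(f"k={k:2d}: pooled log RR {est:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

Adding studies reduces only random error. The systematic error the studies share is untouched, so a precise pooled estimate does not guarantee a valid one.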
Only a nonrandom proportion of research projects
ultimately reaches publication in an indexed journal and
becomes readily identifiable for SRs. Statistically significant
“positive” results favoring an intervention are more likely to
be published, to be published more quickly, and to be published in
higher impact journals, leading to publication bias [26].
When these trials are pooled together in an MA, the treatment
effect may be exaggerated. Begg and Egger tests, along with
funnel plots, can be applied to detect publication bias, but these have
limited power in small MAs, such as those including fewer
than ten studies
[27]. To minimize publication bias, authors
should not only perform a comprehensive systematic
literature search looking for published trials in various
electronic databases, but should also search trial registries
for unpublished studies and conference abstracts or
proceedings
[18].
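Egger's test is an ordinary least-squares regression of each study's standardized effect (the effect divided by its standard error) on its precision (the reciprocal of the standard error). An intercept that differs from zero indicates funnel plot asymmetry. The Python sketch below is a minimal, standard-library illustration under that definition. The function name `egger_test` and the example data are hypothetical, and Begg's rank correlation test is not shown.

```python
import math


def egger_test(effects, ses):
    """Egger's regression test for funnel plot asymmetry.

    Regresses the standardized effect (effect / SE) on precision (1 / SE) by
    ordinary least squares. An intercept far from zero suggests small-study
    effects such as publication bias.
    Returns (intercept, SE of intercept, t statistic, degrees of freedom).
    """
    k = len(effects)
    if k < 3:
        raise ValueError("Egger's test needs at least 3 studies")
    x = [1.0 / se for se in ses]                    # precision
    y = [e / se for e, se in zip(effects, ses)]     # standard normal deviate
    x_bar, y_bar = sum(x) / k, sum(y) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    df = k - 2
    s2 = sum(r ** 2 for r in resid) / df            # residual variance
    se_intercept = math.sqrt(s2 * (1.0 / k + x_bar ** 2 / sxx))
    if se_intercept == 0:
        raise ValueError("perfect fit: intercept SE is zero")
    return intercept, se_intercept, intercept / se_intercept, df


# Hypothetical log risk ratios: the smaller studies (larger SEs) report larger effects.
b0, se_b0, t, df = egger_test([0.20, 0.32, 0.60, 0.60], [0.10, 0.20, 0.40, 0.50])
print(f"intercept {b0:.2f} (SE {se_b0:.2f}), t = {t:.1f} on {df} df")
```

A two-sided P value can be obtained from the t distribution with `df` degrees of freedom, for example with `scipy.stats.t.sf`. With only four studies, as in this toy example, the test has very low power. A nonsignificant result in such a small MA would therefore be uninformative.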
4. When the results of an RCT are in conflict with the results of an SR/MA
It is not uncommon for the results of a large RCT to appear to
be inconsistent with evidence from SRs/MAs. The most
extreme case is when an intervention thought to be
beneficial is demonstrated to be harmful in a large RCT
[9,10]. More commonly, an RCT may show that a treatment
is ineffective, or less effective than found in a previous MA,
or perhaps only effective in a subpopulation of patients.
Assuming the conflicting RCT was of high quality, a number
of issues should be explored to try to explain the
discrepancies.
4.1. Quality of the SR
The starting point is the methodological quality of the SR.
The Assessment of Multiple Systematic Reviews (AMSTAR) and
Documentation and Appraisal Review Tool (DART) checklists
[28–30] allow readers to judge a review’s quality by
focusing on the essential components of a well-conducted
SR. Items include the comprehensiveness of the search
strategy, a description of the characteristics of the included
studies, and an assessment of their scientific quality. A
poor-quality SR/MA may produce biased results that conflict
with a large RCT.
4.2. Small study effects and publication bias
Small study effects and publication bias can individually
and jointly produce results in an SR/MA that conflict with a
large RCT. Studies have shown that small RCTs can
exaggerate intervention effects, because shortcomings in
their methodological rigor can introduce bias [3].
Small studies that find statistically significant (but
unrealistically large) treatment effects are more likely to be
published than negative studies, and are therefore more likely to be
included in an SR and MA, leading to publication bias. Both of these
phenomena can be investigated using funnel plots [31].
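The mechanism can be illustrated with a short Python sketch on invented data. Six small trials are pooled with inverse-variance weights twice: first all together, then using only those that reached statistical significance in favor of the intervention, a hypothetical stand-in for selective publication. The names `pool` and `is_published` and the one-sided z > 1.96 publication rule are assumptions for illustration only.

```python
# Hypothetical small trials: (log risk ratio, standard error).
SMALL_TRIALS = [
    (0.60, 0.25), (-0.30, 0.25), (0.75, 0.30),
    (-0.40, 0.30), (0.10, 0.15), (0.55, 0.20),
]


def pool(trials):
    """Inverse-variance (fixed-effect) pooled log risk ratio."""
    weights = [1.0 / se ** 2 for _, se in trials]
    return sum(w * y for w, (y, _) in zip(weights, trials)) / sum(weights)


def is_published(trial, z_crit=1.96):
    """Assumed publication rule: only significant 'positive' results appear."""
    effect, se = trial
    return effect / se > z_crit


published = [t for t in SMALL_TRIALS if is_published(t)]
print(f"all trials:       pooled log RR {pool(SMALL_TRIALS):.2f}")
print(f"published trials: pooled log RR {pool(published):.2f}")
```

In this toy example the pooled log risk ratio rises from about 0.22 to about 0.61 when only the significant trials are retained. On a funnel plot, the published subset would leave the region of small null or negative trials empty. That asymmetry is what funnel plots are used to detect.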
4.3. Heterogeneity
Heterogeneity within an SR/MA can arise from many
sources, including the population recruited (age, sex,
disease severity, etc.), the intervention(s) and control
treatments, and the definition and timing of outcome
measurements. If studies included in an SR/MA differ
substantially from a subsequent large RCT, then judgment is
required on whether similar findings should be expected.
Another source of heterogeneity is differences in the
methodological quality of the studies included. Deficiencies
in the generation and concealment of the allocation
sequence, adherence to treatment, handling of missing
data, and outcome assessment can all introduce bias in the
outcomes reported in the studies included
[18]. Bias may
then be propagated in MAs through the pooling of biased
study effects, thus contributing to different estimates of
effectiveness between an SR/MA and subsequent large
RCTs. Nevertheless, because an MA is generally regarded as providing
a higher LE than a single RCT, the results of a poor-quality MA
may carry more weight than those of a well-conducted RCT.
Heterogeneity should be assessed using both clinical
knowledge and statistical methods. If substantial heteroge-
neity from any source is suspected, random effects models
are recommended; however, the pooling of data and
estimation of an overall treatment effect may be inappro-
priate with any statistical model in the presence of
heterogeneity. Meta-regression is a useful tool for exploring
the relationship between RCT effect sizes and
characteristics at a study level [32]; however, IPD MAs are required
for assessment at a patient level [21,33]. Appropriate
statistical modeling may show that after correcting for
sources of bias and heterogeneity, discrepancies between
SRs/MAs and definitive RCTs are reduced. Whatever the
approach, interpretation of results is less straightforward
when heterogeneity is present.
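As an illustration of the statistical side of this assessment, the Python sketch below computes Cochran's Q, the I² statistic, and a DerSimonian-Laird random-effects estimate. DerSimonian-Laird is one common estimator of the between-study variance τ², chosen here for brevity. It is not necessarily the method used in the reviews discussed in this article. The function name `random_effects_dl` and the two-study example are hypothetical.

```python
import math


def random_effects_dl(effects, ses):
    """DerSimonian-Laird random-effects meta-analysis.

    effects: per-study estimates on an additive scale (e.g. log risk ratios)
    ses: the corresponding standard errors
    Returns the pooled estimate and its SE, Cochran's Q, I^2 (%), and tau^2.
    """
    k = len(effects)
    if k < 2:
        raise ValueError("need at least 2 studies")
    w = [1.0 / se ** 2 for se in ses]                       # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)                           # truncated at zero
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0   # % variation from heterogeneity
    w_star = [1.0 / (se ** 2 + tau2) for se in ses]         # random-effects weights
    sws = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sws
    return {"pooled": pooled, "se": math.sqrt(1.0 / sws),
            "Q": q, "I2": i2, "tau2": tau2}


# Two hypothetical studies with clearly different effects.
print(random_effects_dl([0.0, 2.0], [0.5, 0.5]))
```

Here Q = 8 on 1 df and I² = 87.5%. The random-effects standard error (1.0) is much larger than the fixed-effect one (about 0.35). As the text notes, with heterogeneity of this size it may be better not to report a single pooled estimate at all.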
To provide guidance to clinicians and guideline devel-
opers when there is a conflict in results between a large RCT
and an SR/MA, a practical checklist of points to consider is
provided in
Table 3.
5. Examples of discrepancies between findings from MAs and large RCTs
5.1. Medical expulsive therapy
Five SRs and MAs on the management of uncomplicated
symptomatic ureteric stones using MET were published in
the past 10 yr
[34–38]. All five suggested that alpha blockers
and nifedipine were more effective than control in increasing
spontaneous passage of ureteric stones (risk
ratios ranging from 1.45 to 1.59). The reviews identified
numerous sources of potential bias that limited the strength
EUROPEAN UROLOGY 71 (2017) 811–819