Critical appraisal of a randomised controlled trial (RCT)
Another day, another RCT relating to the shoulder, or so it seems. The
shoulder seems to be a hot topic at the moment with hundreds of RCTs published
over recent years. On the face of it, this should be a good thing but in
reality it can be quite confusing particularly when clear and consistent
messages are not forthcoming.
So, with that in mind, this is the second blog in a series (the first
one here) that will critically appraise published RCTs
relating to the shoulder with the aim of understanding how these might be
relevant to practice.
Before we begin, for those of you not too familiar with the RCT, a previous blog discusses basic design and rationale, here. This might be a useful starting point if some of the terms used seem unfamiliar or confusing.
This month’s blog refers to: Turgut et al. (2017). Effects of scapular stabilization exercise training on scapular kinematics, disability, and pain in subacromial impingement: A RCT. Archives of Physical Medicine & Rehabilitation, 98, 1915-23.
Briefly, this RCT was designed to evaluate whether stretching and strengthening with additional scapular stabilisation exercises was better than stretching and strengthening alone in patients classified with:
· Painful arc during shoulder flexion or abduction,
· Pain with resisted external rotation and abduction,
· Scapular dyskinesis based on observational assessment combined with a reduction in shoulder pain with movement during the scapula assistance test.
The researchers hypothesised that the group who
received additional scapular stabilisation exercises would improve the position
and movement of their scapula and report less pain and disability than the
group who did not receive the additional scapular stabilisation exercises. In
summary, the authors report no significant difference in pain and disability
between the two groups after a 12-week training programme. But, they do report
that they observed statistically significant changes in scapular kinematics in
the group who received specific scapular stabilisation exercises compared to
those who didn’t. So, scapular kinematics appeared to improve but this was not
associated with greater reductions in pain or improvements in function.
Given that previous studies have not reported
changes in scapular kinematics as patients report reduced pain and improved
function [1] and given that the role of
scapular dyskinesis in shoulder pain remains uncertain [2], this seems an interesting
finding.
Critical appraisal
Rather than undertake a systematic and comprehensive critical appraisal, for blog purposes, we will focus on some key aspects that might help us judge whether we can trust the findings of a RCT or whether we should be cautious or even reject the findings. So, please keep this in mind and feel free to add to the debate as you see fit.
With regard to Turgut et al, there are four areas
we will focus on:
1. Differences between the characteristics of the groups
2. Differences in the dose of exercise received by the two groups
3. Measurement of scapular dyskinesis
4. Sample size and uncertainty
1. Differences between the characteristics of the groups
A feature of a well conducted RCT is that the two or more groups that are created are similar at the start of the trial in terms of the factors we know, for example age, height, weight etc, but also the factors we don’t know or are difficult to characterise, for example genetic profile. This is important because if we want to conclude at the end of the RCT that one intervention, i.e. scapular focused exercise, is better than another intervention like general strengthening exercise, then we need to be confident that the only true difference between the groups is the intervention they receive. If this is not so, we cannot be confident that any differences we observe are due to the intervention and not some other factor. If this seems confusing, then examples are given in the previous blog (here) to explain this further.
Now, this might not seem important in this RCT
because Turgut et al report no statistically significant differences between
the two groups after the 12-week training programme. But, on closer inspection it
is apparent that the baseline characteristics of the two groups are different;
the group receiving the additional scapular stabilisation exercises (the
intervention group) are on average six years younger (33.4 compared to 39.5
years) and on average have a Body Mass Index (BMI) two points lower (23.7
compared to 25.8 – equivalent to 7kg difference for a male who is 1.8m).
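The conversion from a BMI gap to a weight gap can be checked with a few lines of Python. This is only an arithmetic sketch: the function name is made up for illustration, and the 1.8m height is the same illustrative figure used above rather than a value from the trial.

```python
def bmi_gap_in_kg(bmi_low: float, bmi_high: float, height_m: float) -> float:
    """Weight difference (kg) implied by two BMI values at the same height."""
    # BMI = weight / height^2, so weight = BMI * height^2
    return (bmi_high - bmi_low) * height_m ** 2


# Group means quoted in the text; height is an illustrative assumption
gap = bmi_gap_in_kg(23.7, 25.8, 1.8)
print(f"Implied weight difference: {gap:.1f} kg")
```

At 1.8m the two-point BMI gap works out at just under 7kg, which matches the rounded figure quoted above.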
Why might this be important? It is recognised that
older age might be associated with poorer prognosis and that assessment of scapular
dyskinesis is difficult, i.e. it is inherently unreliable. It is not clear
whether the age difference in this RCT is relevant but it is likely that the
higher average BMI of the control group (as reflected in the selection criteria
for the RCT) would make assessment of scapular dyskinesis more challenging and
therefore more difficult to determine if change has happened.
Why the imbalance in baseline characteristics might
have occurred is unclear. Turgut et al use a valid method to generate their
random sequence but they do not report how they conceal their allocation, which
gives rise to concern. Other factors, for example pain and disability, appear
to be well balanced. One plausible explanation is the small sample size. To
demonstrate this, take two coins and one friend. Both of you flip your coin ten
times – do you get the same number of heads and tails as each other? Do you get
five heads and five tails? Probably not, because this is a random process. Now
try flipping the coins 20 times; it is likely that you will still get a
different number of heads and tails from each other but it is likely that the
split between heads and tails will be proportionally more balanced. Now try
flipping the coin 30 times and the impact of increasing your sample size, i.e.
number of flips, will become clearer as the proportions of heads and tails move
closer to 50:50 with more flips.
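If you would rather not flip coins all afternoon, the experiment can be simulated. The sketch below is illustrative only: the function names, the number of repeats, and the random seed are arbitrary choices, and it models a fair coin, not the stratified block randomisation used in the trial.

```python
import random


def heads_proportion(n_flips: int, rng: random.Random) -> float:
    """Proportion of heads in n_flips of a fair coin."""
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips


def average_imbalance(n_flips: int, n_repeats: int = 5000, seed: int = 1) -> float:
    """Average distance of the heads proportion from a perfect 50:50 split."""
    rng = random.Random(seed)
    total = sum(abs(heads_proportion(n_flips, rng) - 0.5) for _ in range(n_repeats))
    return total / n_repeats


for n in (10, 20, 30, 300):
    print(f"{n:>3} flips: average imbalance = {average_imbalance(n):.3f}")
```

The average imbalance shrinks as the number of flips grows, which is the same reason chance differences in age or BMI between groups are more likely in a trial of 30 participants than in a trial of 300.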
2. Differences in the dose of exercise received by the two groups
An exercise dose-response relationship has been reported for patients who complain
of shoulder pain, but can still move their arm [3,4]. This is important in
the context of this RCT that is looking to evaluate whether a specific type of
exercise, i.e. additional scapula stabilisation exercises, confers better
clinical outcomes. In this context, we need to consider whether any result is
due to the specific type of exercise or simply due to doing more exercise.
Turgut et al. standardized the stretching exercises between the groups but the
intervention group performed different strengthening exercises to the control
group (a step was added to each) and a different amount of resisted exercise
due to the addition of the scapula stabilisation exercises (a minimum of 240
reps and maximum of 480 reps 3 times per week compared to a minimum of 90 reps
and maximum of 180 reps 3 times per week in the control group). Hence, any
observed differences between the two groups could be due to the additional dose
of exercise rather than due to the specific type of exercise.
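To get a feel for the size of the dose gap, the repetition counts can be turned into weekly totals. This is a rough sketch that assumes the quoted figures are repetitions per session with three sessions per week; the variable and function names are made up for illustration.

```python
SESSIONS_PER_WEEK = 3

# (minimum, maximum) resisted-exercise reps per session, as quoted in the text
intervention_reps = (240, 480)
control_reps = (90, 180)


def weekly_reps(per_session: tuple[int, int]) -> tuple[int, int]:
    """Scale a (min, max) per-session rep range up to a weekly range."""
    low, high = per_session
    return low * SESSIONS_PER_WEEK, high * SESSIONS_PER_WEEK


def dose_ratio(a: tuple[int, int], b: tuple[int, int]) -> float:
    """Ratio of the minimum dose in range a to the minimum dose in range b."""
    return a[0] / b[0]


print("Intervention, reps per week:", weekly_reps(intervention_reps))
print("Control, reps per week:     ", weekly_reps(control_reps))
print(f"Dose ratio: {dose_ratio(intervention_reps, control_reps):.2f}")
```

On these figures the intervention group performed well over twice the volume of resisted exercise, so type of exercise and amount of exercise cannot be separated in this design.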
3. Measurement of scapular dyskinesis
As already mentioned, measurement of scapular dyskinesis is difficult and inherently unreliable. Turgut et al. used an electromagnetic tracking system and reported evidence supporting reliability and validity, but with standard errors of measurement ranging from 3.37° to 7.44° and a minimal detectable change ranging from 7.81° to 17.27°. Given that 8° was regarded as an important asymmetrical difference by the authors, the measurement challenge is clear to see. But, this limitation is appropriately recognised by Turgut et al. in their limitations section and it is not immediately apparent what they could have done differently with regards to the measurement tool used; still, this is an important limitation.
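For readers wondering how a minimal detectable change (MDC) relates to a standard error of measurement (SEM), a commonly used formula is MDC = z × √2 × SEM. The sketch below applies it to the SEM range quoted above. The choice of z = 1.645 (a 90% confidence level) is an assumption: it roughly reproduces the quoted MDC range, but this blog does not state which level the authors used. The function name is made up for illustration.

```python
import math

IMPORTANT_DIFFERENCE_DEG = 8.0  # asymmetry regarded as important, from the text


def minimal_detectable_change(sem: float, z: float = 1.645) -> float:
    """MDC from SEM using the common formula z * sqrt(2) * SEM.

    z = 1.645 (90% level) is an assumption, not confirmed by the source.
    """
    return z * math.sqrt(2) * sem


for sem in (3.37, 7.44):  # SEM range in degrees, as quoted in the text
    mdc = minimal_detectable_change(sem)
    exceeds = mdc > IMPORTANT_DIFFERENCE_DEG
    print(f"SEM {sem:.2f} deg -> MDC {mdc:.2f} deg (exceeds 8 deg: {exceeds})")
```

At the upper end of the SEM range, a change would need to be more than double the 8° threshold before it could be distinguished from measurement noise.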
However, given these measurement related issues, one design feature that
might be useful is blinding of the outcome assessor. Blinding is where
participants/ patients, clinicians, and those assessing outcomes for the
research are unaware of which treatment the patient has received and is
referred to as single, double, and triple blinding respectively. Blinding of
the outcome assessor guards against measurement bias. Measurement bias is a
risk where measurement is not objective, e.g. alive or dead, and where the
outcome assessor might be able to influence the measurement, consciously or
unconsciously, perhaps because they have a preference for one of the
interventions. For example, if the researchers who hypothesised that the
addition of scapular stabilisation exercises would result in better clinical
outcomes also undertook the measurement themselves, it is feasible to suggest
that there might be a risk of measurement bias.
Blinding of outcome assessment is not reported by Turgut et al but
should have been feasible, although many studies are done with resource
limitations which might have prevented employment of additional personnel.
Despite this, lack of outcome assessor blinding in this RCT is potentially another
limitation.
4. Sample size and uncertainty
Although it is possible to generate findings from a small RCT that are
regarded as internally valid, i.e. we can trust them, what then becomes
difficult is to generalise those findings to the wider population with any
degree of certainty. Remember, often in research we are trying to infer the
findings from our research sample to the wider population, i.e. with regard to
Turgut et al, from the 30 participants in the RCT to the wider population of
patients with this type of shoulder pain. The smaller the sample the more
uncertain we are that the findings are generalizable to the wider population
because quite simply we have less information. This uncertainty is often now
presented as a 95% confidence interval, i.e. the range of values within which
we are 95% certain that the true population value lies (recognising that we
won’t be 100% certain unless we do our research on the entire population which
is usually not possible). For example, a RCT might conclude that the difference
between the two groups in the trial was two points on a pain visual analogue
scale in favour of the intervention group with the 95% confidence interval
being -2 to +4. These statistics mean that in the RCT, the observed difference
between the groups was two points. But, if we were to repeat this study, the
actual difference might be two points in favour of the control group (-2) or as
much as four points in favour of the intervention group (+4). In this example
we see that the confidence interval crosses zero, i.e. the point where there is
no difference between the two groups and correspondingly this result would be
regarded as not statistically significant.
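To see how sample size drives the width of a confidence interval, here is a minimal sketch for the difference between two group means. All the numbers are hypothetical (a two-point difference, a standard deviation of four points in each group) and are not taken from Turgut et al. The critical value of 2.048 is the two-sided 5% point of the t distribution with 28 degrees of freedom; for the larger sample the normal value of 1.96 is used as an approximation.

```python
import math


def ci_for_difference(diff: float, sd: float, n_per_group: int,
                      crit: float = 1.96) -> tuple[float, float]:
    """95% CI for a difference in means, assuming equal SD and group size."""
    standard_error = math.sqrt(2 * sd ** 2 / n_per_group)
    margin = crit * standard_error
    return diff - margin, diff + margin


def crosses_zero(interval: tuple[float, float]) -> bool:
    """True when 'no difference' lies inside the interval."""
    low, high = interval
    return low <= 0.0 <= high


# Hypothetical trial: same observed difference, two different sample sizes
small = ci_for_difference(diff=2.0, sd=4.0, n_per_group=15, crit=2.048)
large = ci_for_difference(diff=2.0, sd=4.0, n_per_group=60)

for label, interval in (("15 per group", small), ("60 per group", large)):
    low, high = interval
    print(f"{label}: {low:.2f} to {high:.2f}, crosses zero: {crosses_zero(interval)}")
```

With 15 per group the interval includes zero; with 60 per group the same observed difference gives an interval that does not. Nothing about the treatment has changed, only the amount of information.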
You might be familiar with statistical significance with regard to the
p-value with p > 0.05 regarded as not being statistically significant. This
means, based on the sample data, that we are unable to reject the null
hypothesis that states no difference between the groups. It is important to
read this statement carefully because it is not the same as saying the two
groups are the same.
Given that the 95% confidence interval gives a range of values which are
easier to interpret, this is now preferred as requested in reporting
guidelines. Unfortunately, Turgut et al only present us with p-values that
suggest there is no statistically significant difference between the two groups
in terms of shoulder pain and disability at baseline, after six and 12 weeks. But,
looking more closely we observe that the difference between the two groups in
terms of the SPADI (Shoulder Pain & Disability Index) Total Score is seven
points by six weeks and 13 points by 12 weeks in favour of the group who
received the additional scapular stabilisation exercises (10 points is regarded
as a clinically significant change on the SPADI). So, why do Turgut et al report
that there is no difference? One reason could be that the number of
participants in the trial is so small (15 in each group) and the data so
variable that there is insufficient evidence, due to the limited information
provided by the small number of participants, to reject the null hypothesis
that there is no true difference between the two groups. So the lack of a
statistically significant difference is not because the results of the RCT
suggest the two interventions have the same effect but rather because there is
insufficient evidence from the data – this is a vitally important difference,
possibly indicative of a Type II error (some post blog extra reading for you).
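A quick simulation shows how easily a small trial can miss a real effect. The sketch below assumes a true between-group difference of 13 SPADI points and a standard deviation of 20 points in each group. The standard deviation is a purely hypothetical value chosen for illustration and is not taken from Turgut et al. The seed and number of simulated trials are also arbitrary, and the test used is a plain pooled-variance t-test, which may differ from the analysis in the paper.

```python
import random
import statistics

N_PER_GROUP = 15
T_CRIT_DF28 = 2.048  # two-sided 5% critical value of t, 28 degrees of freedom


def trial_is_significant(true_diff: float, sd: float, rng: random.Random) -> bool:
    """Simulate one two-group trial and run a pooled-variance t-test."""
    control = [rng.gauss(0.0, sd) for _ in range(N_PER_GROUP)]
    treated = [rng.gauss(true_diff, sd) for _ in range(N_PER_GROUP)]
    pooled_var = (statistics.variance(control) + statistics.variance(treated)) / 2
    standard_error = (2 * pooled_var / N_PER_GROUP) ** 0.5
    t_value = (statistics.mean(treated) - statistics.mean(control)) / standard_error
    return abs(t_value) > T_CRIT_DF28


def estimated_power(true_diff: float = 13.0, sd: float = 20.0,
                    n_trials: int = 2000, seed: int = 7) -> float:
    """Share of simulated trials that reach p < 0.05 (sd is hypothetical)."""
    rng = random.Random(seed)
    hits = sum(trial_is_significant(true_diff, sd, rng) for _ in range(n_trials))
    return hits / n_trials


print(f"Chance of detecting a true 13-point difference: {estimated_power():.0%}")
```

Under these assumptions, well under half of the simulated trials detect a difference that genuinely exists. The remainder would each be a Type II error.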
Conclusion
So, because of concern about difference in baseline characteristics, different dose of exercise, possible measurement error, and risk of bias we cannot be confident that the reported changes in scapular kinematics are valid and attributable to the addition of scapula stabilisation exercises. Also, because of the small sample size we cannot be confident that the report of no statistically significant difference between the two groups implies that the clinical effectiveness of the interventions in this RCT is similar. Therefore, based on this critical appraisal it is recommended that the findings of this RCT be treated with caution – currently it is not clear how these results develop our understanding about how, or even if, scapular kinematics change in response to intervention.
Hopefully there are a few learning points from this
blog. One clear implication for future physiotherapy-related research is
that RCTs looking to determine the effectiveness of different interventions
need to have sample sizes sufficient to generate treatment recommendations with
confidence and be designed to increase confidence that any differences observed
can be attributed to the variable of interest.
Elif Turgut’s response is below:
I have read your critical appraisal on our recent
RCT. I found very useful and beneficial to receive your perspectives. At some
points, I strongly agree with your comments, on the other hands in some parts
however I do not totally agree with your approach. As a researcher, you may
acknowledge me to be right that there is no perfect scientific study, each
design has its unique limitations.
As you suggested, in this study we used a
valid randomization and I believe we give enough details about allocation. An
independent researcher applied randomization by using computer- generated
numbers, which were stratified based on observed scapular dyskinesis type to
avoid clustering across study groups. A block size of 4 was used within the 2
strata. Therefore, you may see it is not a simple randomization with coin.
Additionally, the baseline characteristics such as age and BMI was not the
factors for the randomization process, so it is natural we have not got exactly
the same characteristics but statistically insignificant differences between
the groups were observed. On the other hand, a lot of factors may affect the
prognosis besides age but I am confused with the rationale behind you mention
about assessment of scapular dyskinesis? If there is an evidence for this I
would like to read it.
Furthermore, scapular dyskinesis type only the
factor to allocate participants in one or two study groups. As you mention
under “3. Measurement of scapular dyskinesis” section, it is not a scapular
dyskinesis measurement, in this study we assessed three-dimensional scapular
kinematics as an outcome measure which is pretty different things when compared
to scapular dyskinesis. Therefore, I see you may overlook and you may fail to
address the article broadly.
Also, the design of this RCT were based on our
hypothesis “A shoulder girdle stretching and strengthening program with
additional scapular stabilization exercises would improve scapular kinematics
and reduce disability and pain compared with a shoulder girdle stretching and
strengthening program without additional exercises in participants with SIS”
therefore we investigate the effect of additional scapular stabilization
training on the outcomes. I mean one group received “a” treatment and the other
group received “a+b”. Therefore, it is expected the training volumes were not
equal. However, as a result both groups were found similar with training.
Another importing thing I must say in this study we reported sample size
analysis, and we stopped the clinical trial when we received the predicted
sample size which is computed based on primary outcome. Therefore, the small sample size judgment
for this study in not valid.
I believe the findings add important insights
to describe how scapular kinematics change in response to specific exercise
training. It should also be noted that the study findings help us develop our
understanding of important components of an exercise programme. First the both
progressive exercise regimes were applicable and well tolerated. And no
additional benefit observed with adding scapular stabilization exercises.
Therefore, it is important to give an active treatment approach to the patients
with aforementioned symptoms independent from additional scapular focused
exercises in 12 week period you will probably have good outcomes.
---------------------------------------------------------------------------------------------------------------------------
Thanks for reading, hope it was useful; thoughts gratefully received.
Chris Littlewood, Tomas Parraguez, Brian Cho, Sijmen Hacquebord, Paul Regan
[1] Bury J, West M, Chamorro-Moriana G, Littlewood C. Effectiveness of scapula-focused approaches in patients with rotator cuff related shoulder pain: A systematic review and meta-analysis. Man Ther 2016;25:35–42. doi:10.1016/j.math.2016.05.337.
[2] Littlewood C, Cools AMJ. Scapular dyskinesis and shoulder pain: the devil is in the detail. Br J Sports Med 2017;0:bjsports-2017-098233. doi:10.1136/bjsports-2017-098233.
[3] Osteras H, Torstensen T, Haugerud L, Osteras B. Dose-response effects of graded therapeutic exercises in patients with long-standing subacromial pain. Adv Physiother 2009;11:199–209.
[4] Littlewood C, Malliaras P, Chance-Larsen K. Therapeutic Exercise for rotator cuff tendinopathy: A systematic review of contextual factors and prescription parameters. Int J Rehabil Res 2015;38.