Critical appraisal of an RCT (February 2018).
Exercise has been shown to be as effective as surgery for subacromial pain syndrome (SAPS) in a number of different studies [1,2,3]. While this is encouraging for proponents of exercise-based physiotherapy, it also raises more questions than it answers. What type of exercise? How much of it? For how long? Does it matter if it's painful or not? How, why, or even whether either treatment actually helps? This month we examine an RCT designed to shed some light on some of these questions by investigating whether one type of exercise (in this case non-painful eccentric training of the shoulder external rotators) produces better results than another (general exercise) in this patient group.
This is the third in a series of blogs (the first two can be found here and here). Again, this blog will not provide a systematic or comprehensive critical appraisal of the chosen paper, or employ one of the many critical appraisal tools available, but will highlight what we consider to be three important elements to consider when trying to interpret the results of the trial and apply them to our patients. This blog will consider various aspects of RCT design and implementation, an overview of which can be found here.
This month we will consider the potential sources of bias and the degree of confidence or doubt we have in the results of the following RCT:
Shoulder external rotator eccentric training versus general shoulder exercise for subacromial pain syndrome: a randomized controlled trial. The International Journal of Sports Physical Therapy, 12(7), 1121-33.
This study was designed to investigate whether eccentric training of the shoulder external rotators (ETER) or general exercise (GE) produces better clinical outcomes in those with SAPS. This study defined SAPS based on the presence of at least three of the following: a positive Neer, Hawkins-Kennedy, or empty can test, painful resisted external rotation, palpable tenderness of the supraspinatus or infraspinatus insertion, or a painful arc of abduction. The primary outcome measure was the Western Ontario Rotator Cuff Index (WORC), a patient-reported outcome measure which considers physical symptoms, sports and recreation, work, lifestyle, and emotions. Secondary outcome measures were a Numerical Pain Rating Scale (NPRS) for best, worst, and average pain, isometric strength, active range of movement, Y balance test, and Global Rating of Change (GROC). Forty-eight participants were randomised into two groups (25 in the ETER group and 23 in the GE group) and underwent a six week exercise program including four visits to a physical therapist (the study was conducted in the USA). The ETER group performed non-painful eccentric external rotation (3 sets of 15 with a 3 second eccentric phase), resisted scapula retraction (2 sets of 10), and a cross body stretch (3 reps with 30-45 second holds). The GE group performed active flexion and abduction with no resistance (2 x 10 reps of each), and the same resisted scapula retraction and stretching exercises as the ETER group. All outcomes were measured at baseline, 3 weeks, 6 weeks, and 6 months. The study found that ETER produced statistically significant superior results compared to GE at 3 weeks, 6 weeks, and 6 months according to WORC score, NPRS score, and isometric muscle strength. There were no statistically significant differences in active range of movement, Y balance, or GROC. The authors conclude that eccentric training may be efficacious to improve self-reported pain, function and strength in those with SAPS.
It is often easier to find faults when critically appraising a study, so let's start with the strengths of this RCT. It asked a clinically relevant question and used a design appropriate to answer it, and the study protocol was published before the trial began, which guards against bias. Additionally, the interventions and outcome measures were well described. This may sound simple, but the quality of description of interventions and outcome measures in RCTs, especially those including therapeutic exercise, is often poor, limiting interpretation, application, and replication of results [4,5].
However, there are a few aspects of the trial which we must consider before deciding how confident we can be in the reported superiority of ETER exercises over GE. The three areas we feel are most important to consider are:
1. Differential dropouts and statistical analysis.
2. The choice of comparator.
3. Methods of randomisation and baseline differences between groups.
1. Differential dropout rates and statistical analysis.
Dropouts and the resultant incomplete data cause investigators a number of problems when it comes to analysing and interpreting RCT results. One of the key characteristics of RCTs that reduces their susceptibility to bias is randomisation. Random allocation of participants to different treatment groups aims to ensure that the groups are comparable at baseline in relation to both known and unknown factors that might influence the outcome of the trial. This increases our confidence that any differences in treatment effectiveness are related to the intervention of interest rather than any baseline differences. In order to maintain this control of bias, the random treatment assignment must be preserved through the whole trial, including in the statistical analysis. This form of statistical analysis is known as an intention-to-treat (ITT) analysis. It analyses participants based on the group to which they were assigned at randomisation, irrespective of what treatment they actually received or whether they completed the trial. This is considered the most appropriate method of analysis when comparing the effectiveness of treatments in RCTs.
The authors in this study did not employ an ITT approach because they were concerned that the asymmetrical dropouts between groups may cause type I error (finding a significant difference when one does not exist). Instead they analysed only participants that completed the trial, which is termed a completed cases analysis. Whether asymmetrical dropouts cause error in the results of an RCT depends on why the data are missing (rather awkwardly termed ‘missingness’) and how the missing data are handled in the analysis. If those who drop out do so completely at random, then a completed cases analysis is reasonable because the two groups available for analysis are still based on chance alone. However, if the data are not missing completely at random, a completed cases approach analyses a non-random subset of those that entered the trial and compromises the initial randomisation process. In this trial the differential dropouts between groups (39% in the GE group dropped out compared to 12% in the ETER group) increase suspicion that the reasons for dropping out may not have been completely random. Using a completed cases analysis therefore increases doubt that the differences in treatment outcome can be confidently attributed to the intervention of interest.
Author response: Thank you, this is a great point and one that we deliberated over for quite some time. There are pros and cons with ITT, and we had concerns that week 3 between-group comparisons could be inflated, with the 2 subjects in the control group reporting a slight worsening in subjective outcomes and subsequently dropping from the trial early. This leads into your discussion of the comparator group interventions, but generally speaking my biggest concern was that the active range of motion exercises used as a comparator may have slightly increased symptoms in the early phase of treatment for some subjects in the control group, and that carrying over week 1 data could falsely inflate between-group differences in favor of the experimental group. If the 6 month dropouts were the only issue, using ITT to carry over week 6 data to 6 month data would be an easier decision, but the early dropouts from the control group were a big factor in the decision.
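To illustrate the concern about non-random dropout raised in this section, here is a minimal simulation sketch in Python. It is not a re-analysis of the trial. Only the dropout proportions (12% and roughly 39%) are taken from the figures quoted above; the outcome distribution, the sample size, and the rule linking dropout to poor improvement are all assumptions chosen for illustration. Both simulated arms have exactly the same true effect, so any difference between them is an artefact of how the data are analysed.

```python
import random
import statistics


def simulate(n_per_arm=5000, seed=1):
    """Simulate a two-arm trial with NO true difference between arms.

    Dropout is 12% completely at random in arm A, and about 39% in arm B,
    where it is concentrated among participants who improved least.
    All outcome values are hypothetical.
    """
    rng = random.Random(seed)

    # Both arms draw improvement scores from the same assumed distribution.
    arm_a = [rng.gauss(10, 15) for _ in range(n_per_arm)]
    arm_b = [rng.gauss(10, 15) for _ in range(n_per_arm)]

    # Arm A: every participant has the same 12% chance of dropping out.
    a_completers = [x for x in arm_a if rng.random() >= 0.12]

    # Arm B: below-average improvers drop out with probability 0.78,
    # which gives roughly 0.5 * 0.78 = 39% dropout overall.
    b_completers = [x for x in arm_b if not (x < 10 and rng.random() < 0.78)]

    return {
        # Comparison of everyone as randomised (only knowable in a simulation).
        "all_randomised_diff": statistics.mean(arm_a) - statistics.mean(arm_b),
        # Comparison restricted to those who completed the trial.
        "completers_diff": statistics.mean(a_completers) - statistics.mean(b_completers),
        "dropout_a": 1 - len(a_completers) / n_per_arm,
        "dropout_b": 1 - len(b_completers) / n_per_arm,
    }


result = simulate()
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

In this sketch the comparison of everyone as randomised stays close to zero, while the completers-only comparison does not, even though the two arms are identical by construction. The direction of the distortion here follows from the assumed dropout rule and says nothing about the direction of any bias in the actual trial. In a real trial the outcomes of those who drop out are unknown, so an ITT analysis also has to make assumptions about the missing values.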
2. The choice of comparator.
To accurately test how effective a treatment is, it needs to be compared to something. Studies where treatment outcomes are measured without a comparison group can show that a patient had a particular treatment and got better, but cannot show that they got better because of the particular treatment. Controlled studies (both RCTs and other non-randomised controlled studies) use a control group to demonstrate what would have happened if the participants had not had the treatment of interest, either by doing nothing (no treatment control), making patients think that they have had the treatment of interest but without administering the active components (placebo control), or by comparing to another treatment (active control). In this case an active control was chosen. While this is reasonable because the alternative to using eccentric exercises would be to provide an alternative exercise-based treatment, the most appropriate comparator would be representative of current practice (so we know whether changing to this ‘new’ or ‘different’ treatment is better than what we already do). This study uses range of movement exercises (with one resisted exercise that was standardised across groups) to represent general exercise. The authors themselves identify that this may not be representative of a typical exercise program used in clinical practice. Unless this reflects our current practice, it is very difficult to know what these results mean.
The choice of control also introduces doubt as to whether we can be sure that it was the type of exercise that was the decisive factor in determining the results of this trial. Both groups performed exercises that involved both concentric and eccentric phases. This makes it less clear whether this was a true comparison between two distinct types of exercise. The control group also performed lower dose and lower resistance exercises than the ETER group. Previous studies have suggested that exercise protocols that include resisted exercise may be more effective than those that do not, and that higher dose exercise may be more effective than lower dose exercise. Even if we accept the reported differences in outcomes between the two groups, can we be confident that it was the type of exercise that caused them?
Author response: This is another great point. I am not confident that the comparator exercise in this trial was representative of what a PT would do in practice. Simply having a patient actively move the shoulder through an elevation movement without load may not be a typical general exercise program. It's possible that the various differences between exercise programs, i.e. load, specific isolated movement, arm position etc., could be the reason for between-group differences rather than the fact that the experimental group utilized an eccentric exercise.
3. Methods of randomisation and baseline differences between groups.
As described, the benefit of randomisation is that it theoretically balances both known and unknown factors that could potentially influence the outcome of the trial between the groups. This increases our confidence that any difference in outcome is due to the intervention of interest and not some other known or unknown difference between groups. In this trial the researchers randomised patients by asking them to blindly place a pen on a table of random numbers. Manual randomisation methods such as this (or the use of a coin toss, drawing lots, or shuffling cards) introduce more doubt than more robust methods like using computer generated or remotely generated random numbers because either the participant or investigator could theoretically influence the process. For example, what would have happened if a participant's pen landed equidistant between 2, 3, or even 4 numbers in the table? This raises an important point in the assessment of the risk of bias: we are not saying that the participants or investigators did unduly influence the randomisation process in this trial, just that the method used increases doubt because they theoretically could have done so. We as readers will never know for sure if the results were unduly influenced, and that is why we assess for the risk of bias rather than actual bias itself.
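As an illustration of what a computer generated allocation sequence might look like, here is a minimal Python sketch of permuted-block randomisation. This is not the method used in the trial, and the block size, seed, and function name are assumptions made for the example. The point is simply that the sequence is produced by an algorithm rather than by anyone's hand.

```python
import random


def allocation_sequence(n_participants, block_size=4, seed=2018):
    """Return a list of arm labels generated by permuted-block randomisation.

    Within every block, half of the slots go to each arm, so group sizes
    stay close to equal throughout recruitment. The block size and seed
    are hypothetical values chosen for this example.
    """
    if block_size % 2 != 0:
        raise ValueError("block_size must be even for 1:1 allocation")

    # A fixed seed makes the sequence reproducible, so it can be audited later.
    rng = random.Random(seed)

    sequence = []
    while len(sequence) < n_participants:
        block = ["ETER"] * (block_size // 2) + ["GE"] * (block_size // 2)
        rng.shuffle(block)  # random order within the block
        sequence.extend(block)

    # Trim in case the last block overshoots the target sample size.
    return sequence[:n_participants]


sequence = allocation_sequence(48)
print(sequence[:8])
print("ETER:", sequence.count("ETER"), "GE:", sequence.count("GE"))
```

A sequence like this would normally be generated and held by someone not involved in recruiting or assessing participants, so that neither the participant nor the investigator can influence or foresee the next allocation.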
If there was bias in the randomisation process this would mean that there were systematic differences between the two treatment groups. However, the fact that there were systematic differences between the two groups does not necessarily mean that there was bias in the randomisation process. Randomisation can only maximise the probability that known and unknown factors are balanced between the two groups; it cannot guarantee that this is the case. The bigger the sample size the more likely they are to be balanced (the reasons for this were discussed in a previous blog here). In this study there were statistically significant differences in favour of the ETER group in strength (ABD/ER ratio) and Y balance. There were also non-significant (but not necessarily unimportant) differences in all other baseline strength measurements, most range of movement measurements, best pain, and age (the ETER group was younger). We do not really know how, why, or even if exercise really does help patients with SAPS, so we cannot know how, why, or if these baseline differences affected treatment outcomes. If it is feasible that younger, stronger patients with better range of movement and balance are more likely to benefit from exercise-based treatment, then we have to consider that it could have been differences between groups rather than differences in treatment effectiveness that caused the differences in outcomes.
Author response: I also agree with this; hindsight is 20/20. If we did a similar trial again the use of computer generated randomization would be much preferred. The topic of baseline variables that could be affiliated with improved outcomes is very important. I would love to have collected more baseline variables in a larger sample and run a regression off the responders to determine the patient characteristics that are consistent with a positive outcome. In this case we are examining the between-group mean, but some participants have dramatic improvements over others. It would be interesting to know which patients respond best to heavy load exercises and which do not respond as favorably.
This study reports that eccentric training of the external rotators of the shoulder produces significantly better results than general exercise in those with SAPS. However, as with any RCT, it is important to appraise the methods used before accepting the results. Differential dropouts between groups and the way the data were analysed might increase the risk of bias and hence decrease our confidence in the reported results, and baseline differences between groups and the choice of comparator increase doubt as to whether any differences in outcome can be specifically linked to the type of exercise performed.
Author response: One last point about this topic is that the progression of exercise mode, load and volume dosing is critically important. The level of tissue irritability is also an important factor to help dictate exercise prescription, and in clinical practice I wouldn’t arbitrarily prescribe eccentric exercises to any patient with chronic sub-acromial pain. Progressions in arm position, type of movement (i.e. isometric vs light isotonic vs eccentric) and load/dose increases respective of patient tolerance and baseline strength will be important to integrate into future trials. A pragmatic design that allows the clinician to manipulate these exercise prescription variables based on patient presentation will be important in future studies. Thank you all for your interest and review of this topic.
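The relationship between sample size and chance imbalance can be sketched with a short Python simulation. Everything here is hypothetical: the age distribution (mean 45, standard deviation 12), the group sizes, and the function name are assumptions for illustration, not data from the trial. The sketch repeatedly forms two groups purely at random and records how far apart their mean ages end up.

```python
import random
import statistics


def mean_abs_age_gap(n_per_group, n_trials=1000, seed=0):
    """Average absolute difference in mean age between two random groups.

    Ages are drawn from an assumed normal distribution (mean 45, SD 12);
    there is no allocation bias, so any gap is due to chance alone.
    """
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_trials):
        group_a = [rng.gauss(45, 12) for _ in range(n_per_group)]
        group_b = [rng.gauss(45, 12) for _ in range(n_per_group)]
        gaps.append(abs(statistics.mean(group_a) - statistics.mean(group_b)))
    return statistics.mean(gaps)


for n in (24, 240):
    print(f"{n} per group: typical gap in mean age = {mean_abs_age_gap(n):.2f} years")
```

With unbiased allocation, the typical gap shrinks as the groups get larger, which is why baseline imbalances in a trial of fewer than 50 participants are not in themselves evidence of a flawed randomisation process, but are still worth weighing when interpreting the results.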
Thanks for reading, hope it was useful; thoughts gratefully received.
Paul Regan, Chris Littlewood, Tomas Parraguez, Brian Cho, Sijmen Hacquebord
Østergaard S. Annals of Rheumatic Diseases.
Higgins JPT, Green S (editors). Cochrane Handbook for Systematic Reviews of Interventions Version 5.1.0 [updated March 2011]. The Cochrane Collaboration, 2011. Available from http://handbook.cochrane.org.
Bell ML, Kenward MG, Horton NJ (2013). Differential dropout and bias in randomised controlled trials: when it matters and when it may not. British Medical Journal, 346:e8668.
Physiotherapy Research International.