Friday, 12 January 2018

Critical Appraisal of a RCT (January 2018)

Critical appraisal of a randomised controlled trial (RCT)

Another day, another RCT relating to the shoulder, or so it seems. The shoulder seems to be a hot topic at the moment with hundreds of RCTs published over recent years. On the face of it, this should be a good thing but in reality it can be quite confusing particularly when clear and consistent messages are not forthcoming.

So, with that in mind, this is the second blog in a series (the first one here) that will critically appraise published RCTs relating to the shoulder with the aim of understanding how these might be relevant to practice.

Before we begin, for those of you not too familiar with the RCT, a previous blog discusses basic design and rationale, here. This might be a useful starting point if some of the terms used seem unfamiliar or confusing.

This month’s blog refers to: Turgut et al. (2017). Effects of scapular stabilization exercise training on scapular kinematics, disability, and pain in subacromial impingement: A RCT. Archives of Physical Medicine & Rehabilitation, 98, 1915-23.

Briefly, this RCT was designed to evaluate whether stretching and strengthening with additional scapular stabilisation exercises was better than stretching and strengthening alone in patients classified with:

·       Painful arc during shoulder flexion or abduction,
·       Pain with resisted external rotation and abduction,
·       Scapular dyskinesis based on observational assessment combined with a reduction in shoulder pain with movement during the scapula assistance test.

The researchers hypothesised that the group who received additional scapular stabilisation exercises would improve the position and movement of their scapula and report less pain and disability than the group who did not receive the additional scapular stabilisation exercises. In summary, the authors report no significant difference in pain and disability between the two groups after a 12-week training programme. But, they do report that they observed statistically significant changes in scapular kinematics in the group who received specific scapular stabilisation exercises compared to those who didn’t. So, scapular kinematics appeared to improve but this was not associated with greater reductions in pain or improvements in function.

Given that previous studies have not reported changes in scapular kinematics as patients report reduced pain and improved function [1] and given that the role of scapular dyskinesis in shoulder pain remains uncertain [2], this seems an interesting finding.

Critical appraisal
Rather than undertake a systematic and comprehensive critical appraisal, for blog purposes, we will focus on some key aspects that might help us judge whether we can trust the findings of a RCT or whether we should be cautious or even reject the findings. So, please keep this in mind and feel free to add to the debate as you see fit.

With regard to Turgut et al, there are four areas we will focus on:
1. Differences between the characteristics of the groups
2. Differences in the dose of exercise received by the two groups
3. Measurement of scapular dyskinesis
4. Sample size and uncertainty

1. Differences between the characteristics of the groups.

A feature of a well conducted RCT is that the two or more groups that are created are similar at the start of the trial in terms of the factors we know, for example age, height, weight etc, but also the factors we don’t know or are difficult to characterise, for example genetic profile. This is important because if we want to conclude at the end of the RCT that one intervention, i.e. scapular focused exercise, is better than another intervention like general strengthening exercise, then we need to be confident that the only true difference between groups is the intervention they receive. If this is not so, we cannot be confident that any differences we observe are due to the intervention and not some other factor. If this seems confusing, then examples are given in the previous blog (here) to explain this further.

Now, this might not seem important in this RCT because Turgut et al report no statistically significant differences between the two groups after the 12-week training programme. But, on closer inspection it is apparent that the baseline characteristics of the two groups are different; the group receiving the additional scapular stabilisation exercises (the intervention group) are on average six years younger (33.4 compared to 39.5 years) and on average have a Body Mass Index (BMI) two points lower (23.7 compared to 25.8 – equivalent to 7kg difference for a male who is 1.8m).
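For anyone who wants to check the arithmetic behind the 7kg figure, here is a minimal sketch. It assumes only the standard BMI definition (weight in kg divided by height in metres squared); the function name is ours, purely for illustration.

```python
# BMI = weight (kg) / height (m)^2, so a gap in BMI converts to a gap in
# weight by multiplying by height squared.
def bmi_gap_to_kg(bmi_a, bmi_b, height_m):
    """Weight difference (kg) implied by two BMI values at a given height."""
    return (bmi_a - bmi_b) * height_m ** 2

# Group averages quoted above, applied to a person who is 1.8m tall.
gap_kg = bmi_gap_to_kg(25.8, 23.7, 1.8)
print(round(gap_kg, 1))  # roughly 6.8 kg, i.e. about 7kg
```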

Why might this be important? It is recognised that age might be associated with poorer prognosis and that assessment of scapular dyskinesis is difficult, i.e. it is inherently unreliable. It is not clear whether the age difference in this RCT is relevant but it is likely that the higher average BMI of the control group (as reflected in the selection criteria for the RCT) would make assessment of scapular dyskinesis more challenging and therefore more difficult to determine if change has happened.

Why the imbalance in baseline characteristics might have occurred is unclear. Turgut et al use a valid method to generate their random sequence but they do not report how they conceal their allocation, which gives rise to concern. Other factors, for example pain and disability, appear to be well balanced. The imbalance could also simply relate to the small sample size. To demonstrate this, take two coins and one friend. Both of you flip your coin ten times – do you get the same number of heads and tails as each other? Do you get five heads and five tails? Probably not, because this is a random process. Now try flipping the coins 20 times; it is likely that you will still get a different number of heads and tails from each other but it is likely that the split between heads and tails will be closer to 50:50. Now try flipping the coin 30 times and the impact of increasing your sample size, i.e. number of flips, will become clearer as the proportions of heads and tails become closer with more flips.
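If you don't have a coin (or a friend) to hand, the experiment can be simulated. The sketch below is only an illustration: the function names, the number of repeats and the seed are our own choices and have nothing to do with the RCT itself. It reports, on average, how far the proportion of heads sits from a perfect 50:50 split for different numbers of flips.

```python
import random

def heads_proportion(n_flips, rng):
    """Flip a fair coin n_flips times and return the proportion of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

def average_imbalance(n_flips, n_repeats=2000, seed=1):
    """Average distance from a 50:50 split over many repeated experiments."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    total = sum(abs(heads_proportion(n_flips, rng) - 0.5)
                for _ in range(n_repeats))
    return total / n_repeats

for n in (10, 20, 30, 300):
    print(n, round(average_imbalance(n), 3))
```

The average imbalance shrinks as the number of flips grows, which is the same reason larger trials tend to produce better balanced groups.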

2. Differences in the dose of exercise received by the two groups
An exercise dose response relationship has been reported for patients who complain of shoulder pain, but can still move their arm [3,4]. This is important in the context of this RCT that is looking to evaluate whether a specific type of exercise, i.e. additional scapula stabilisation exercises, confers better clinical outcomes. In this context, we need to consider whether any result is due to the specific type of exercise or simply due to doing more exercise.

Turgut et al. standardised the stretching exercises between the groups but the intervention group performed different strengthening exercises to the control group (a step was added to each) and a different amount of resisted exercise due to the addition of the scapula stabilisation exercises (a minimum of 240 reps and maximum of 480 reps 3 times per week compared to a minimum of 90 reps and maximum of 180 reps 3 times per week in the control group). Hence, any observed differences between the two groups could be due to the additional dose of exercise rather than due to the specific type of exercise.
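To put the difference in dose into perspective, the repetition counts above can be totalled over the programme. This is a rough sketch only: it assumes every session was completed and that the quoted ranges applied for the whole 12 weeks, neither of which we can verify.

```python
SESSIONS_PER_WEEK = 3
WEEKS = 12

# (minimum, maximum) resisted repetitions per session, as quoted above.
reps_per_session = {
    "intervention": (240, 480),
    "control": (90, 180),
}

def programme_total(reps):
    """Total repetitions over the programme, assuming full adherence."""
    return reps * SESSIONS_PER_WEEK * WEEKS

totals = {group: tuple(programme_total(r) for r in bounds)
          for group, bounds in reps_per_session.items()}
ratio = reps_per_session["intervention"][0] / reps_per_session["control"][0]

print(totals)
print(round(ratio, 2))  # intervention dose relative to control
```

On these assumptions the intervention group performed between two and three times as many resisted repetitions as the control group.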

3. Measurement of scapular dyskinesis
As already mentioned, measurement of scapular dyskinesis is difficult and inherently unreliable. Turgut et al. used an electromagnetic tracking system and reported evidence supporting reliability and validity, but with standard errors of measurement ranging from 3.37° to 7.44° and a minimal detectable change ranging from 7.81° to 17.27°. Given that 8° was regarded as an important asymmetrical difference by the authors, the measurement challenge is clear to see. But, this limitation is appropriately recognised by Turgut et al. in their limitations section and it is not immediately apparent what they could have done differently with regards to the measurement tool used; still, this is an important limitation.
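For those wondering how a standard error of measurement (SEM) relates to a minimal detectable change (MDC), a commonly used convention is MDC = z × √2 × SEM. The sketch below applies that convention; the 90% confidence level is our assumption (it gives values close to, but not exactly, those reported) and is not taken from the paper.

```python
import math
from statistics import NormalDist

def minimal_detectable_change(sem, confidence=0.90):
    """MDC from SEM using the common convention z * sqrt(2) * SEM.

    The confidence level is an assumption, not a value from the paper.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * math.sqrt(2) * sem

IMPORTANT_DIFFERENCE = 8.0  # degrees, the threshold quoted above

for sem in (3.37, 7.44):  # smallest and largest SEM quoted above
    mdc = minimal_detectable_change(sem)
    detectable = IMPORTANT_DIFFERENCE >= mdc
    print(sem, round(mdc, 2), detectable)
```

Put simply, a change smaller than the MDC cannot be confidently distinguished from measurement error, so when the MDC is close to or above 8° a change of that size is hard to detect.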

However, given these measurement related issues, one design feature that might be useful is blinding of the outcome assessor. Blinding is where participants/patients, clinicians, and those assessing outcomes for the research are unaware of which treatment the patient has received and is referred to as single, double, and triple blinding respectively. Blinding of the outcome assessor guards against measurement bias. Measurement bias is a risk where measurement is not objective (unlike, for example, alive or dead) and where the outcome assessor might be able to influence the measurement, consciously or unconsciously, perhaps because they have a preference for one of the interventions. For example, if the researchers who hypothesised that the addition of scapular stabilisation exercises would result in better clinical outcomes also undertook the measurement, it is feasible to suggest that there might be a risk of measurement bias.

Blinding of outcome assessment is not reported by Turgut et al but should have been feasible, although many studies are done with resource limitations which might have prevented employment of additional personnel. Despite this, lack of outcome assessor blinding in this RCT is potentially another limitation.

4. Sample size and uncertainty
Although it is possible to generate findings from a small RCT that are regarded as internally valid, i.e. we can trust them, what then becomes difficult is to generalise those findings to the wider population with any degree of certainty. Remember, often in research we are trying to infer the findings from our research sample to the wider population, i.e. with regard to Turgut et al, from the 30 participants in the RCT to the wider population of patients with this type of shoulder pain. The smaller the sample the more uncertain we are that the findings are generalisable to the wider population because quite simply we have less information. This uncertainty is often now presented as a 95% confidence interval, i.e. the range of values within which we are 95% certain that the true population value lies (recognising that we won’t be 100% certain unless we do our research on the entire population which is usually not possible). For example, a RCT might conclude that the difference between the two groups in the trial was two points on a pain visual analogue scale in favour of the intervention group with the 95% confidence interval being -2 to +4. These statistics mean that in the RCT, the observed difference between the groups was two points. But, if we were to repeat this study, the actual difference might be two points in favour of the control group (-2) or as much as four points in favour of the intervention group (+4). In this example we see that the confidence interval crosses zero, i.e. the point where there is no difference between the two groups and correspondingly this result would be regarded as not statistically significant.
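To show where such an interval comes from, here is a minimal sketch using a normal approximation. All the numbers (a two-point difference, a standard deviation of 4.2 in each group, 15 per group) are hypothetical and chosen only for illustration; with samples this small a t-based interval would be slightly wider.

```python
import math
from statistics import NormalDist

def ci_for_difference(mean_diff, sd_a, n_a, sd_b, n_b, level=0.95):
    """Normal-approximation confidence interval for a difference in means."""
    se = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return mean_diff - z * se, mean_diff + z * se

# Hypothetical trial: observed difference of 2 points, 15 per group.
low, high = ci_for_difference(2.0, sd_a=4.2, n_a=15, sd_b=4.2, n_b=15)
crosses_zero = low < 0 < high

print(round(low, 2), round(high, 2), crosses_zero)
```

Increasing the number of participants per group in the call above narrows the interval, which is the point being made about small samples.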

You might be familiar with statistical significance with regard to the p-value with p > 0.05 regarded as not being statistically significant. This means, based on the sample data, that we are unable to reject the null hypothesis that states no difference between the groups. It is important to read this statement carefully because it is not the same as saying the two groups are the same.

Given that the 95% confidence interval gives a range of values which are easier to interpret, this is now preferred as requested in reporting guidelines. Unfortunately, Turgut et al only present us with p-values that suggest there is no statistically significant difference between the two groups in terms of shoulder pain and disability at baseline, after six and 12 weeks. But, looking more closely we observe that the difference between the two groups in terms of the SPADI (Shoulder Pain & Disability Index) Total Score is seven points by six weeks and 13 points by 12 weeks in favour of the group who received the additional scapular stabilisation exercises (10 points is regarded as a clinically significant change on the SPADI). So, why do Turgut et al report that there is no difference? One reason could be that the number of participants in the trial is so small (15 in each group) and the data so variable that there is insufficient evidence, due to the limited information provided by the small number of participants, to reject the null hypothesis that there is no true difference between the two groups. So the lack of a statistically significant difference is not because the results of the RCT suggest the two interventions have the same effect but rather because there is insufficient evidence from the data – this is a vitally important difference, possibly indicative of a Type II error (some post blog extra reading for you).
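As a starting point for that extra reading, the sketch below estimates statistical power, i.e. the chance of detecting a true difference if one exists, using a normal approximation. The standard deviation of 20 SPADI points is purely hypothetical (it is not taken from the paper); only the 13-point difference and the 15 participants per group come from the discussion above.

```python
import math
from statistics import NormalDist

def approx_power(true_diff, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison of means."""
    nd = NormalDist()
    se = sd * math.sqrt(2 / n_per_group)
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = true_diff / se
    # Probability of a significant result in either direction.
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)

HYPOTHETICAL_SD = 20  # assumed for illustration only

for n in (15, 40, 80):
    print(n, round(approx_power(13, HYPOTHETICAL_SD, n), 2))
```

Under this assumed variability, a trial with 15 per group would miss a true 13-point difference more often than it would find it, which is what a Type II error looks like in practice.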

So, because of concern about difference in baseline characteristics, different dose of exercise, possible measurement error, and risk of bias we cannot be confident that the reported changes in scapular kinematics are valid and attributable to the addition of scapula stabilisation exercises. Also, because of the small sample size we cannot be confident that the report of no statistically significant difference between the two groups implies that the clinical effectiveness of the interventions in this RCT is similar. Therefore, based on this critical appraisal it is recommended that the findings of this RCT be treated with caution – currently it is not clear how these results develop our understanding about how, or even if, scapular kinematics change in response to intervention.

Hopefully there are a few learning points from this blog. One clear implication for future physiotherapy-related research, though, is that RCTs looking to determine the effectiveness of different interventions need to have sample sizes sufficient to generate treatment recommendations with confidence and be designed to increase confidence that any differences observed can be attributed to the variable of interest.

Elif Turgut’s response is below:

I have read your critical appraisal of our recent RCT. I found it very useful and beneficial to receive your perspectives. At some points I strongly agree with your comments; in other parts, however, I do not totally agree with your approach. As a researcher, you may acknowledge that there is no perfect scientific study; each design has its unique limitations.

As you suggested, in this study we used a valid randomization and I believe we gave enough details about allocation. An independent researcher applied randomization by using computer-generated numbers, which were stratified based on observed scapular dyskinesis type to avoid clustering across study groups. A block size of 4 was used within the 2 strata. Therefore, you may see it is not a simple randomization with a coin. Additionally, baseline characteristics such as age and BMI were not factors in the randomization process, so it is natural that we did not get exactly the same characteristics, but the differences observed between the groups were statistically insignificant. On the other hand, a lot of factors may affect the prognosis besides age, but I am confused by the rationale behind what you mention about assessment of scapular dyskinesis. If there is evidence for this I would like to read it.

Furthermore, scapular dyskinesis type was only the factor used to allocate participants to one of the two study groups. Regarding what you mention under the “3. Measurement of scapular dyskinesis” section, it is not a scapular dyskinesis measurement; in this study we assessed three-dimensional scapular kinematics as an outcome measure, which is a pretty different thing when compared to scapular dyskinesis. Therefore, I see you may have overlooked this and may have failed to address the article broadly.

Also, the design of this RCT was based on our hypothesis “A shoulder girdle stretching and strengthening program with additional scapular stabilization exercises would improve scapular kinematics and reduce disability and pain compared with a shoulder girdle stretching and strengthening program without additional exercises in participants with SIS”; therefore we investigated the effect of additional scapular stabilization training on the outcomes. I mean one group received “a” treatment and the other group received “a+b”. Therefore, it is expected that the training volumes were not equal. However, as a result both groups were found similar with training. Another important thing I must say is that in this study we reported a sample size analysis, and we stopped the clinical trial when we reached the predicted sample size, which was computed based on the primary outcome. Therefore, the small sample size judgment for this study is not valid.

I believe the findings add important insights to describe how scapular kinematics change in response to specific exercise training. It should also be noted that the study findings help us develop our understanding of important components of an exercise programme. First, both progressive exercise regimes were applicable and well tolerated, and no additional benefit was observed with adding scapular stabilization exercises. Therefore, it is important to give an active treatment approach to patients with the aforementioned symptoms; independent of additional scapular focused exercises, in a 12 week period you will probably have good outcomes.

Thanks for reading, hope it was useful; thoughts gratefully received.

Chris Littlewood, Tomas Parraguez, Brian Cho, Sijmen Hacquebord, Paul Regan

[1]       Bury J, West M, Chamorro-Moriana G, Littlewood C. Effectiveness of scapula-focused approaches in patients with rotator cuff related shoulder pain: A systematic review and meta-analysis. Man Ther 2016;25:35–42. doi:10.1016/j.math.2016.05.337.
[2]       Littlewood C, Cools AMJ. Scapular dyskinesis and shoulder pain: the devil is in the detail. Br J Sports Med 2017;0:bjsports-2017-098233. doi:10.1136/bjsports-2017-098233.
[3]       Osteras H, Torstensen T, Haugerud L, Osteras B. Dose-response effects of graded therapeutic exercises in patients with long-standing subacromial pain. Adv Physiother 2009;11:199–209.
[4]       Littlewood C, Malliaras P, Chance-Larsen K. Therapeutic Exercise for rotator cuff tendinopathy: A systematic review of contextual factors and prescription parameters. Int J Rehabil Res 2015;38.
