We now present the results indicating the effects of social networks on clinicians' revisions to their diagnostic assessments and treatment recommendations. In the following analyses, diagnostic accuracy is measured by the absolute distance, in percentage points, between a clinician's diagnostic assessment and the most accurate diagnostic assessment. For clarity of presentation, we normalize diagnostic accuracy to a 0–1 scale by applying min-max normalization to the absolute error of clinicians' diagnostic assessments. Under this procedure, the minimum possible accuracy (0) corresponds to the diagnostic assessment with the greatest absolute error (i.e., an estimate as far as possible from the most accurate answer of 16%, which in this case is 84 percentage points away), while the maximum possible accuracy (1) corresponds to a diagnostic assessment that is 0 percentage points from the most accurate answer, i.e., identical to it (SI, Statistical Analyses). As above, in the discussion of our results we refer to the patient-actors in the standardized patient videos as patients.
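Because the most accurate answer (16%) and the largest possible error (84 percentage points) are fixed, the min-max normalization reduces to a linear rescaling of absolute error. A minimal Python sketch, assuming estimates are percentages between 0 and 100; the helper name `normalized_accuracy` is hypothetical and not taken from the study's materials:

```python
TRUE_ESTIMATE = 16.0  # most accurate diagnostic assessment, in percent
MAX_ERROR = 100.0 - TRUE_ESTIMATE  # largest possible absolute error (84 points), assuming a 0-100 scale


def normalized_accuracy(estimate: float) -> float:
    """Hypothetical helper: min-max normalize absolute error, then flip so 1 = most accurate.

    The minimum possible error is 0 and the maximum is MAX_ERROR, so the
    normalized error is (error - 0) / (MAX_ERROR - 0).
    """
    if not 0.0 <= estimate <= 100.0:
        raise ValueError("estimate must be a percentage between 0 and 100")
    error = abs(estimate - TRUE_ESTIMATE)
    normalized_error = (error - 0.0) / (MAX_ERROR - 0.0)
    return 1.0 - normalized_error


if __name__ == "__main__":
    for est in (16.0, 58.0, 100.0):
        print(est, normalized_accuracy(est))
```

Under these assumptions, an estimate of 16% maps to 1, an estimate of 100% maps to 0, and an estimate of 58% (42 points away) maps to 0.5.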
Clinicians' initial assessments and treatment recommendations were made independently. Figure 1 shows that, for the initial responses of all clinicians in the study, there were no significant differences in the accuracy of the diagnostic assessments (Fig. 1a, b) given to the Black female patient and the white male patient (p>0.5, n=28, Wilcoxon Rank Sum Test, Two-sided); nor were there any significant differences in the accuracy of initial diagnostic assessments when controlling for experimental condition using a regression approach (β=1.06, CI=[−3.79, 5.92], p=0.67, Supplementary Table 6). However, consistent with previous studies of bias in medical care2,3,4,5,6, despite clinicians providing both patients with similar diagnostic assessments, clinicians' treatment recommendations varied significantly between patients. Clinicians' initial treatment recommendations (Fig. 1c, d) show a significant disparity in the rate at which the guideline-recommended treatment was recommended for the white male patient versus the Black female patient. Overall, clinicians recommended Option C, referral to the emergency department for immediate evaluation, for the white male patient in 22% of responses, but for the Black female patient in only 14% of responses (p=0.02, n=28 observations, Wilcoxon Rank Sum Test, Two-sided).
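For readers reproducing the between-patient comparisons, a two-sided Wilcoxon rank-sum test can be sketched with the Python standard library. This version uses the large-sample normal approximation without tie or continuity correction (the same approximation as `scipy.stats.ranksums`); whether the study used this approximation or an exact test is not stated here, so the sketch is illustrative only, and `midranks` and `rank_sum_test` are hypothetical names:

```python
import math
from statistics import NormalDist


def midranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1  # convert 0-based positions to 1-based ranks
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test via the normal approximation (no tie correction)."""
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    w = sum(ranks[:n1])  # rank sum of the first sample
    expected = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - expected) / sd
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p
```

In use, `x` and `y` would be, for example, per-observation accuracy scores for the two patients; a negative `z` indicates that the first sample tends to have lower values.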
Panels a and b show the change (from the initial assessment to the final assessment) in the average diagnostic accuracy of clinicians. Panel a shows the control conditions; panel b shows the network conditions. The insets in both panels show the total improvement (in percentage points) in the accuracy of clinicians' diagnostic assessments. Panels c and d show the change (from the initial recommendation to the final recommendation) in the proportion of clinicians recommending the guideline-recommended treatment, referral to the emergency department for immediate cardiac evaluation (Option C), for the white male patient-actor and the Black female patient-actor. Panel c shows the control conditions; panel d shows the network conditions. The insets in both panels show the total improvement (in percentage points) in the percentage of clinicians recommending the guideline-recommended treatment. Panels e and f show the change (from the initial response to the final response) in the odds of clinicians recommending Option A (unsafe undertreatment) rather than Option C (highest-quality, guideline-recommended treatment) for each patient-actor. Panel e shows the control conditions; panel f shows the network conditions. The insets in both panels show the total reduction in the likelihood that clinicians would recommend unsafe undertreatment rather than the guideline-recommended treatment for each patient-actor. In all panels, error bars display 95% confidence intervals, and data points display the mean change for each of the trials (N=7) in each condition.
In the control conditions (Fig. 1a), after two rounds of revision there was no significant change in the accuracy of clinicians' assessments (i.e., diagnostic estimates) for either the white male patient (p>0.9, n=7, Fig. 1a inset, Wilcoxon Signed Rank Test, Two-sided) or the Black female patient (p>0.9, n=7, Fig. 1a inset, Wilcoxon Signed Rank Test, Two-sided). Correspondingly, Fig. 1c shows that in the control conditions there was no significant change in the rate at which clinicians recommended the guideline-recommended treatment for either the Black female patient (3 percentage point increase, p=0.81, n=7 observations, Wilcoxon Signed Rank Test, Two-sided) or the white male patient (1 percentage point increase, p=0.93, n=7 observations, Wilcoxon Signed Rank Test, Two-sided; Fig. 1c). Clinicians' final treatment recommendations in the control conditions still showed a significant disparity between the white male patient and the Black female patient in their rates of referral to the emergency department (p=0.04, n=14 observations, Wilcoxon Signed Rank Test, Two-sided; Fig. 1c).
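The within-trial comparisons above use the Wilcoxon signed-rank test on paired initial and final values (n=7 trials). With so few pairs, the exact null distribution can be enumerated over all 2^7 = 128 sign patterns. The following is a minimal sketch under that assumption (zero differences dropped, tied absolute differences given midranks, two-sided p-value taken as twice the smaller tail); statistical packages may instead use a normal approximation or a different zero-handling rule, so p-values can differ. Function names are hypothetical:

```python
from itertools import product


def midranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1  # 1-based average rank
        start = end + 1
    return ranks


def signed_rank_test(initial, final):
    """Exact two-sided Wilcoxon signed-rank test for paired (initial, final) values."""
    diffs = [f - i for i, f in zip(initial, final) if f != i]  # drop zero differences
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    ranks = midranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)  # sum of ranks of positive changes
    # Under H0 each rank is equally likely to carry a + or - sign.
    null = [sum(r for r, positive in zip(ranks, signs) if positive)
            for signs in product((False, True), repeat=n)]
    eps = 1e-9  # tolerance for fractional midranks
    lower = sum(s <= w_plus + eps for s in null) / len(null)
    upper = sum(s >= w_plus - eps for s in null) / len(null)
    return w_plus, min(1.0, 2 * min(lower, upper))
```

Enumeration is exact but grows as 2^n, which is practical only for small numbers of trials such as the seven per condition here.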
Figure 1b shows that in the network conditions there were significant improvements (from the initial response to the final response) in the accuracy of the assessments given to both the white male patient (p=0.04, n=7, Wilcoxon Signed Rank Test, Two-sided; Fig. 1b inset) and the Black female patient (p=0.01, n=7 observations, Wilcoxon Signed Rank Test, Two-sided; Fig. 1b inset). Figure 1d shows that in the network conditions, after two rounds of revision, there was no significant change in the rate at which clinicians recommended the guideline-recommended treatment for the white male patient (p=0.57, n=7 observations, Wilcoxon Signed Rank Test, Two-sided; Fig. 1d inset). This lack of change reflects the fact that, regardless of the accuracy of their initial assessments, clinicians were initially significantly more likely to recommend the guideline-recommended treatment for the white male patient (p<0.01, OR=1.78, CI=[1.2, 2.6], Supplementary Table 7). Consequently, improvements in assessment accuracy for the white male patient had a smaller positive effect on clinicians' likelihood of recommending the guideline-recommended treatment. By contrast, clinicians were initially significantly less likely to recommend the guideline-recommended treatment for the Black female patient (p<0.01, OR=0.56, CI=[0.38, 0.83], Supplementary Table 7), and significantly more likely to recommend unsafe undertreatment for this patient (p<0.05, OR=1.5, CI=[1.08, 2.04], Supplementary Table 8). Consequently, improvements in assessment accuracy had a substantially greater effect on the final treatment recommendations for the Black female patient (Fig. 1d). In the network conditions, the rate at which clinicians recommended the guideline-recommended treatment for the Black female patient increased significantly, from 14% in the initial response to 27% in the final response (p<0.01, n=7 observations, Wilcoxon Signed Rank Test, Two-sided; Fig. 1d).
As a result, clinicians' final treatment recommendations in the network conditions exhibited no significant disparity between the Black female patient and the white male patient in referral rates to the emergency department (p=0.22, n=14 observations, Wilcoxon Rank Sum Test, Two-sided; see Supplementary Table 11).
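The odds ratios cited above come from regression models (Supplementary Tables 7 and 8). As a simpler, unadjusted illustration of how an odds ratio and its 95% confidence interval relate, the following sketch computes a Wald interval from a hypothetical 2x2 table of counts; it does not reproduce the study's adjusted estimates, and `odds_ratio_wald_ci` is a hypothetical name:

```python
import math
from statistics import NormalDist


def odds_ratio_wald_ci(a, b, c, d, level=0.95):
    """Unadjusted odds ratio and Wald confidence interval for a 2x2 table.

    Group 1: a with the outcome, b without. Group 2: c with the outcome, d without.
    All four cells must be positive for the log-scale standard error to exist.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for the Wald interval")
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, (lower, upper)
```

Because the interval is symmetric on the log scale, it is asymmetric around the odds ratio itself, which is why intervals such as [1.2, 2.6] around 1.78 are not centered on the point estimate.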
The primary pathway for bias reduction in the network conditions was the effect of improvements in clinicians' assessment accuracy on reducing the initially high rate at which unsafe undertreatment was recommended for the Black female patient. Figure 1e, f show the odds of clinicians recommending unsafe undertreatment rather than the guideline-recommended treatment for both patients in both conditions. Consistent with the above discussion, treatment recommendations for the white male patient did not exhibit any bias toward unsafe undertreatment (p=0.19, n=14, Wilcoxon Signed Rank Test, Two-sided). As expected, improvements in assessment accuracy in the network conditions did not significantly affect clinicians' odds of recommending the guideline-recommended treatment rather than unsafe undertreatment for the white male patient (p=0.21, n=7, Wilcoxon Signed Rank Test, Two-sided). By contrast, clinicians initially had significantly greater odds of recommending unsafe undertreatment rather than the guideline-recommended treatment for the Black female patient (Fig. 1e, f; p<0.01, n=28 observations, Wilcoxon Signed Rank Test, Two-sided). Independent revision in the control conditions had no significant effect on the treatment recommendations for either the white male patient (p=1.0, n=7, Wilcoxon Signed Rank Test, Two-sided) or the Black female patient (p=0.81, n=7, Wilcoxon Signed Rank Test, Two-sided). However, assessment revisions in the network conditions led to a significant change in the odds of clinicians recommending the guideline-recommended treatment rather than unsafe undertreatment for the Black female patient (Fig. 1f; p=0.01, n=7, Wilcoxon Signed Rank Test, Two-sided). By the final round in the network conditions, there was no significant difference between patients in their odds of having clinicians recommend the guideline-recommended treatment rather than unsafe undertreatment (Fig. 1f; p=0.19, n=14, Wilcoxon Rank Sum Test, Two-sided).
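The quantity plotted in Fig. 1e, f is an odds of Option A rather than Option C. A per-trial version, and its change from the initial to the final response on the log scale, can be sketched as follows. The +0.5 continuity correction, which keeps the odds finite when a trial contains no A or no C responses, is an assumption of this sketch rather than a documented choice of the study, and the function names are hypothetical:

```python
import math


def odds_a_vs_c(recommendations, correction=0.5):
    """Odds of Option A rather than Option C among one trial's responses.

    `correction` is added to both counts (a Haldane-style adjustment, assumed here);
    with correction=0.0 the trial must contain at least one A and one C response
    for the log-odds below to be defined.
    """
    n_a = sum(1 for r in recommendations if r == "A")
    n_c = sum(1 for r in recommendations if r == "C")
    if n_c + correction == 0:
        raise ValueError("no Option C responses and no correction: odds undefined")
    return (n_a + correction) / (n_c + correction)


def log_odds_change(initial, final, correction=0.5):
    """Change in log-odds of A rather than C from the initial to the final response."""
    return math.log(odds_a_vs_c(final, correction)) - math.log(odds_a_vs_c(initial, correction))
```

A negative `log_odds_change` indicates that, within that trial, clinicians shifted away from unsafe undertreatment toward the guideline-recommended treatment.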
The network mechanism responsible for improvements in the accuracy of clinicians' assessments, and the corresponding reduction of race and gender disparities in their treatment recommendations, is the disproportionate influence of accurate individuals in the process of belief revision within egalitarian social networks13,15,16. As demonstrated in earlier studies of networked collective intelligence13,15,16, during belief revision in peer networks there is an expected correlation between the accuracy of an individual's beliefs and the magnitude of their belief revisions, such that accurate individuals revise their responses less; this correlation between initial error and revision magnitude is referred to as the revision coefficient13. Within egalitarian social networks, a positive revision coefficient has been found to give greater de facto social influence to more accurate individuals, which is predicted to produce network-wide improvements in the accuracy of individual beliefs. These improvements in collective accuracy have been found to produce a corresponding reduction in biased responses among initially biased participants12,13,15,16. Figure 2a tests this prediction for clinicians in our study. As expected, the results show a significant positive revision coefficient among clinicians in the network conditions (p<0.001, r=0.66, SE=0.1, clustered by trial, Supplementary Table 14), indicating that less accurate clinicians made larger revisions to their responses while more accurate clinicians made smaller revisions, giving greater de facto influence in the social network to more accurate clinicians. This correlation holds equally for clinicians' assessments of both the white male and Black female patients (Supplementary Table 14).
Figure 2b shows that, for both patients, improvements in assessment accuracy led to significant improvements in the quality of clinicians' treatment recommendations (p<0.05, OR=1.04, CI=[1.00, 1.09], Supplementary Table 9). Importantly, among clinicians who initially recommended unsafe undertreatment (Option A), we find that improvements in assessment accuracy significantly predict an increased likelihood of recommending the guideline-recommended treatment (Option C) by the final round (p<0.01, OR=1.17, CI=[1.03, 1.33], Supplementary Table 10). These improvements translated into a significant reduction in the inequity of recommended care for the Black female patient, for whom clinicians were initially significantly more likely to recommend unsafe undertreatment (see Fig. 3, below).
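Because logistic-regression odds ratios act multiplicatively on the odds, an odds ratio of 1.04 per unit of accuracy compounds over larger improvements. The units of the accuracy predictor in Supplementary Table 9 are not restated here, so the baseline probability and number of units in the following sketch are hypothetical inputs, and `probability_after_shift` is a hypothetical name:

```python
def probability_after_shift(p0, odds_ratio, units):
    """Probability implied by multiplying baseline odds by odds_ratio ** units.

    p0: hypothetical baseline probability of the outcome (strictly between 0 and 1).
    units: hypothetical change in the predictor, in the model's own units.
    """
    if not 0.0 < p0 < 1.0:
        raise ValueError("baseline probability must be strictly between 0 and 1")
    odds = p0 / (1.0 - p0) * odds_ratio ** units
    return odds / (1.0 + odds)
```

For instance, `probability_after_shift(0.5, 2.0, 1)` returns about 0.667, since doubling even odds (1:1) gives odds of 2:1.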
Panel a shows clinicians' propensity to revise their diagnostic assessments in the network conditions according to the initial error in their diagnostic assessments. Clinicians' accuracy is represented as the absolute distance, in percentage points, of a given assessment from the most accurate assessment of 16% (0 on the x-axis indicates a distance of 0 percentage points from the most accurate response). Magnitude of revision is measured as the absolute difference (in percentage points) between a clinician's initial and final diagnostic assessments. The accuracy of clinicians' initial assessments significantly predicts the magnitude of their revisions between the initial and final responses. The grey error band displays 95% confidence intervals for the fit of an OLS model regressing magnitude of revision on initial error of diagnostic assessment. Panel b shows the significant positive relationship between the improvement in clinicians' diagnostic accuracy (from the initial to the final assessment) and their likelihood of improving their treatment recommendation (i.e., the probability of switching from recommending Option A, B, or D to Option C) for clinicians in the network conditions. The trend line shows the estimated probability of clinicians improving their treatment recommendations according to a logistic regression, controlling for an interaction between experimental condition (control or network) and patient-actor demographic (Black female or white male) (Supplementary Table 9). Error bars show standard errors clustered at the trial level.
Each panel shows the fraction of clinicians giving each treatment recommendation at the initial and final responses, averaged first within each of the trials in each condition (N=7) and then across trials. Option A: 1-week follow-up (unsafe undertreatment). Option B: stress test in 2–3 days (undertreatment). Option C: immediate emergency department evaluation (guideline-recommended treatment). Option D: immediate cardiac catheterization (overtreatment). Panel a shows the change in control condition recommendations for the Black female patient-actor (initial recommendations light pink, final recommendations dark pink). Panel b shows the change in network condition recommendations for the Black female patient-actor (initial recommendations light pink, final recommendations dark pink). Panel c shows the change in control condition recommendations for the white male patient-actor (initial recommendations light blue, final recommendations dark blue). Panel d shows the change in network condition recommendations for the white male patient-actor (initial recommendations light blue, final recommendations dark blue).
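The two-stage averaging described in the caption (within each trial, then across trials) weights every trial equally regardless of how many clinicians it contains. A minimal sketch, with hypothetical trial data and function names:

```python
OPTIONS = ("A", "B", "C", "D")  # treatment options as labeled in Fig. 3


def option_fractions(trial):
    """Fraction of one trial's responses recommending each option."""
    n = len(trial)
    return {opt: sum(1 for r in trial if r == opt) / n for opt in OPTIONS}


def average_over_trials(trials):
    """Average the per-trial fractions, giving each trial equal weight."""
    per_trial = [option_fractions(t) for t in trials]
    return {opt: sum(f[opt] for f in per_trial) / len(per_trial) for opt in OPTIONS}
```

For two hypothetical trials `[["A", "C"], ["C", "C", "C", "D"]]`, the two-stage average for Option C is 0.625, whereas pooling all six responses would give 4/6 ≈ 0.67; the difference arises because the larger trial is not up-weighted.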
Figure 3 shows the changing rates at which clinicians recommended each option (Option A, unsafe undertreatment; Option B, undertreatment; Option C, guideline-recommended treatment; and Option D, overtreatment) for each patient, from the initial response to the final response, for all conditions. As discussed above, we are particularly interested in the inequity of patient care, defined as the rate at which clinicians made a clearly unsafe recommendation (Option A) versus the guideline-recommended treatment (Option C)23,24. Initial responses exhibited significant inequity between patients. Initially, across both conditions, 29.9% of clinicians recommended unsafe undertreatment for the Black female patient, while only 14.1% recommended the guideline-recommended treatment, a 15.7 percentage point difference in the rate at which clinicians recommended unsafe undertreatment rather than the guideline-recommended treatment for the Black female patient. By contrast, for the white male patient, 23.4% of clinicians recommended unsafe undertreatment, while 21.4% recommended the guideline-recommended treatment, a 2 percentage point difference in the likelihood of clinicians recommending unsafe undertreatment rather than the guideline-recommended treatment for the white male patient. This resulted in a 13.7 percentage point difference between the Black female patient and the white male patient in their likelihood of having clinicians recommend unsafe undertreatment rather than the guideline-recommended treatment (p=0.02, n=28 observations, Wilcoxon Rank Sum Test, Two-sided). Individual reflection did not reduce this inequity: the control conditions produced no significant change in the inequity between patients from the initial response to the final response (p=0.57, n=14 observations, Wilcoxon Signed Rank Test, Two-sided).
Accordingly, in the final response in the control conditions, there was a 15.3 percentage point difference between the Black female patient and the white male patient in their likelihood of having clinicians recommend unsafe undertreatment rather than the guideline-recommended treatment (p=0.04, n=14 observations, Wilcoxon Rank Sum Test, Two-sided; see SI Eq. 2). Strikingly, however, improvements in diagnostic accuracy in the network conditions produced a 20 percentage point reduction in the rate at which clinicians recommended unsafe undertreatment rather than the guideline-recommended treatment for the Black female patient (p=0.04, n=14 observations, Wilcoxon Rank Sum Test, Two-sided). By the final response in the network conditions, this inequity was eliminated: the Black female patient was no longer more likely than the white male patient to have clinicians recommend unsafe undertreatment rather than the guideline-recommended treatment (p=0.16, n=14 observations, Wilcoxon Rank Sum Test, Two-sided).
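The inequity measure described above is a difference of differences: for each patient, the percentage of responses recommending Option A minus the percentage recommending Option C, and then the Black female patient's gap minus the white male patient's gap. The sketch below pools all responses for each patient; the formal definition is given in SI Eq. 2 and may aggregate by trial, so this is an approximation with hypothetical function names:

```python
def inequity_gap(responses):
    """Percentage-point gap between Option A (unsafe undertreatment) and Option C (guideline care)."""
    n = len(responses)
    pct_a = 100 * responses.count("A") / n
    pct_c = 100 * responses.count("C") / n
    return pct_a - pct_c


def between_patient_difference(black_female, white_male):
    """How much larger the A-minus-C gap is for the Black female patient than for the white male patient."""
    return inequity_gap(black_female) - inequity_gap(white_male)
```

A positive result means clinicians were more likely to recommend unsafe undertreatment rather than guideline-recommended care for the Black female patient; a result near zero corresponds to the absence of inequity.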
Figure 3 (panels a–d) also shows that the network conditions improved the quality of clinical care recommended for both patients (white male and Black female). In particular, for both patients, the network conditions produced significantly greater reductions in the proportion of clinicians recommending unsafe undertreatment (Option A) than the control conditions (1.6 percentage point reduction in the control conditions versus 11.8 percentage point reduction in the network conditions; p<0.01, n=28 observations, Wilcoxon Signed Rank Test, Two-sided). This reduction in recommendations of unsafe undertreatment (Option A) was associated with significant increases in recommendations for safer care for both patients. Although Option B is not the guideline-recommended treatment, it is safer than Option A. Correspondingly, the network conditions significantly increased the proportion of clinicians recommending the safer undertreatment (Option B) relative to the control conditions (3.5 percentage point reduction in the control conditions versus 6.5 percentage point increase in the network conditions; p=0.03, n=28 observations, Wilcoxon Signed Rank Test, Two-sided). Strikingly, the rate of overtreatment (i.e., Option D, an unnecessary invasive procedure) for both patients decreased significantly in the network conditions, while it increased in the control conditions (2.8 percentage point reduction in the network conditions versus 3.1 percentage point increase in the control conditions; p<0.01, n=28 observations, Wilcoxon Signed Rank Test, Two-sided).
These results reveal a tendency for clinicians in the control conditions to increase the acuity (i.e., urgency) of care for all patients as a result of independent reflection, leading to an increase in overtreatment. By contrast, in the network conditions, clinicians adjusted their recommendations toward safer, more equitable care for both patients, significantly reducing both unsafe undertreatment (Option A) and overtreatment (Option D). Additional sensitivity analyses show these findings to be robust to variations in clinicians' characteristics26 (see SI, Sensitivity Analyses).
The reduction of race and gender bias in clinical treatment recommendations using clinician peer networks in an experimental setting - Nature.com