Original Investigation

Reliability of Risk-Adjusted Outcomes for Profiling Hospital Surgical Quality

Robert W. Krell, MD1; Ahmed Hozain, BS2; Lillian S. Kao, MD, MS3; Justin B. Dimick, MD, MPH1
Author Affiliations
1Department of Surgery, University of Michigan Health System, Ann Arbor
2Department of Surgery, Michigan State University College of Human Medicine, East Lansing
3Department of Surgery, The University of Texas at Houston Medical School, Houston
JAMA Surg. 2014;149(5):467-474. doi:10.1001/jamasurg.2013.4249.

Importance  Quality improvement platforms commonly use risk-adjusted morbidity and mortality to profile hospital performance. However, given small hospital caseloads and low event rates for some procedures, it is unclear whether these outcomes reliably reflect hospital performance.

Objective  To determine the reliability of risk-adjusted morbidity and mortality for hospital performance profiling using clinical registry data.

Design, Setting, and Participants  A retrospective cohort study was conducted using data from the American College of Surgeons National Surgical Quality Improvement Program, 2009. Participants included all patients (N = 55 466) who underwent colon resection, pancreatic resection, laparoscopic gastric bypass, ventral hernia repair, abdominal aortic aneurysm repair, and lower extremity bypass.

Main Outcomes and Measures  Outcomes included risk-adjusted overall morbidity, severe morbidity, and mortality. We assessed reliability (0-1 scale: 0, completely unreliable; and 1, perfectly reliable) for all 3 outcomes. We also quantified the number of hospitals meeting minimum acceptable reliability thresholds (>0.70, good reliability; and >0.50, fair reliability) for each outcome.

Results  For overall morbidity, the most common outcome studied, the mean reliability depended on sample size (ie, how high the hospital caseload was) and the event rate (ie, how frequently the outcome occurred). For example, mean reliability for overall morbidity was low for abdominal aortic aneurysm repair (reliability, 0.29; sample size, 25 cases per year; and event rate, 18.3%). In contrast, mean reliability for overall morbidity was higher for colon resection (reliability, 0.61; sample size, 114 cases per year; and event rate, 26.8%). Colon resection (37.7% of hospitals), pancreatic resection (7.1% of hospitals), and laparoscopic gastric bypass (11.5% of hospitals) were the only procedures for which any hospitals met a reliability threshold of 0.70 for overall morbidity. Because severe morbidity and mortality are less frequent outcomes, their mean reliability was lower, and even fewer hospitals met the thresholds for minimum reliability.

Conclusions and Relevance  Most commonly reported outcome measures have low reliability for differentiating hospital performance. This is especially important for clinical registries that sample rather than collect 100% of cases, which can limit hospital case accrual. Eliminating sampling to achieve the highest possible caseloads, adjusting for reliability, and using advanced modeling strategies (eg, hierarchical modeling) are necessary for clinical registries to increase their benchmarking reliability.


Clinical registries have had a prominent role in increasing transparency and accountability for the outcomes of surgical care. Many, if not all, of the preeminent surgical clinical registries use risk-adjusted outcomes feedback to benchmark performance and guide surgical quality improvement efforts.1-4 With the increased prevalence of linking postoperative outcomes to reimbursements and quality improvement efforts, it is important that outcome measures be highly reliable to avoid misclassifying hospitals.1,5

However, a systematic evaluation of the statistical reliability of commonly used outcome metrics in surgery is lacking.6-8 Because of financial or personnel limitations, not all surgical registries capture 100% of cases from their participating hospitals.9 As a consequence, the yearly maximum number of cases reported by many hospitals in those programs can be limited. The combination of low caseload and low outcome rates reduces the ability of many outcomes to distinguish true quality differences among providers, which results in low reliability, analogous to power limitations in clinical trials.7 Several studies10,11 have called into question the reliability of certain complications for measuring quality in specific clinical populations. A better understanding of the reliability of commonly reported risk-adjusted outcomes and measures to counteract low reliability will help to improve the accuracy of surgical outcome reporting.

In this context, we conducted an evaluation of the statistical reliability of 3 commonly used outcomes (mortality, severe morbidity, and overall morbidity) for profiling hospital performance across multiple procedures. We used logistic regression modeling techniques, a common risk-adjustment method, to calculate risk-adjusted mortality and morbidity rates following 6 different procedures. We then examined the reliability of those measures by investigating the effect of hospital caseload (ie, reported cases) on outcome reliability and then by assessing the number of hospitals that met 2 commonly accepted minimum reliability standards. We hypothesized that limited caseloads and rare event rates would result in low reliability for most commonly reported outcomes, even in clinically rich surgical registries.

Methods

Data Source and Study Population

We analyzed data from the 2009 American College of Surgeons National Surgical Quality Improvement Program (ACS-NSQIP) clinical registry. Details of data collection and validation in ACS-NSQIP have been provided elsewhere.12 In brief, the registry includes more than 135 variables encompassing patient and operative characteristics, 21 postoperative complications, reoperation, and 30-day mortality. Using relevant Current Procedural Terminology codes, we identified patients undergoing colon resection, pancreatic resection, laparoscopic gastric bypass, open ventral hernia repair, abdominal aortic aneurysm (AAA) repair, or lower extremity bypass procedures.

Outcomes

Our primary outcomes of interest were risk-adjusted overall morbidity, severe morbidity, and mortality. Postoperative complications recorded by ACS-NSQIP include surgical (wound dehiscence, bleeding, graft failure, or superficial, deep, or organ-space surgical site infection), medical (cardiac arrest, myocardial infarction, deep venous thrombosis, pulmonary embolism, urinary tract infection, renal insufficiency, or acute renal failure), pulmonary (pneumonia, prolonged intubation, or unplanned intubation), nervous (coma, stroke, or peripheral nerve injury), and systemic (sepsis or septic shock) complications. In addition, ACS-NSQIP records reoperation and 30-day postoperative mortality rates. For the present study, we defined 30-day morbidity as any of the 21 possible complications. To define severe morbidity, we excluded superficial surgical site infection, deep venous thrombosis, urinary tract infection, peripheral nerve injury, or progressive renal insufficiency.

Statistical Analysis
Creation of Risk-Adjusted Outcome Rates

We entered patient demographics, comorbid conditions, and operative characteristics when applicable into a forward stepwise logistic regression model with each outcome (mortality, severe morbidity, and morbidity) as a dependent variable. Those variables with coefficient P < .05 from the stepwise regression model were then used in a logistic regression model for each outcome to generate a patient’s probability of experiencing that particular outcome. We repeated the process across procedure types. To generate hospital risk-adjusted outcome rates, patient probabilities were then summed for each hospital and compared with each hospital’s observed outcome rate to generate hospital-level observed to expected ratios. Multiplying each hospital’s observed to expected ratio by the mean outcome rate yielded its risk-adjusted rate.
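
The observed to expected calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' Stata code: the function name and the input layout are hypothetical, the patient probabilities are assumed to come from an already fitted logistic regression model, and the "mean outcome rate" is assumed to be the pooled event rate across all patients.

```python
from collections import defaultdict


def risk_adjusted_rates(patients):
    """Return a dict of hospital_id -> risk-adjusted outcome rate.

    patients: iterable of (hospital_id, predicted_probability, outcome),
    where outcome is 1 if the event occurred and 0 otherwise.
    (Hypothetical input layout, chosen for illustration.)
    """
    observed = defaultdict(int)    # observed events per hospital
    expected = defaultdict(float)  # summed predicted probabilities per hospital
    total_events = 0
    total_patients = 0

    for hospital_id, probability, outcome in patients:
        observed[hospital_id] += outcome
        expected[hospital_id] += probability
        total_events += outcome
        total_patients += 1

    if total_patients == 0:
        return {}

    # Assumption: the mean outcome rate is the pooled rate over all patients.
    mean_rate = total_events / total_patients

    rates = {}
    for hospital_id, expected_events in expected.items():
        if expected_events == 0:
            continue  # O/E ratio is undefined without expected events
        oe_ratio = observed[hospital_id] / expected_events
        rates[hospital_id] = oe_ratio * mean_rate
    return rates


# Toy data: both hospitals have 1.0 expected events, but B observes 2.
toy_patients = [
    ("A", 0.2, 0), ("A", 0.2, 0), ("A", 0.1, 0), ("A", 0.5, 1),
    ("B", 0.25, 1), ("B", 0.25, 1), ("B", 0.25, 0), ("B", 0.25, 0),
]
toy_rates = risk_adjusted_rates(toy_patients)
```

In the toy data, hospital B has twice as many events as its case mix predicts, so its risk-adjusted rate is twice the pooled rate.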

Calculating Reliability

Reliability is a quantification of the proportion of provider performance variation explained by true quality differences (ie, statistical signal) and is measured on a scale of 0 (all differences attributable to measurement error) to 1 (all differences attributable to quality differences). A requisite for calculating reliability is the calculation of statistical “noise” for a particular outcome. Reliability is then defined as the ratio of signal to (signal + noise).13 To determine the reliability of each outcome measure, we used hierarchical logistic regression modeling. We defined signal as the variance of hospital random effect intercepts in the logistic model after full adjustment for patient risk factors.10 We quantified a hospital’s statistical noise by estimating that hospital’s measurement error variance in the logistic regression model.6,14 The reliability of each hospital’s risk-adjusted outcome rate was then calculated as signal/(signal + noise).
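
The ratio defined above, signal/(signal + noise), is simple to express directly. The sketch below assumes the two variances have already been estimated from a hierarchical model; the function name and the example values are illustrative and are not taken from the study.

```python
def reliability(signal_variance, noise_variance):
    """Reliability as signal / (signal + noise), on a 0 to 1 scale.

    signal_variance: between-hospital variance (true quality differences).
    noise_variance: measurement error variance for one hospital.
    """
    if signal_variance < 0 or noise_variance < 0:
        raise ValueError("variances must be non-negative")
    total = signal_variance + noise_variance
    if total == 0:
        return 0.0  # no variation at all: treated as uninformative here
    return signal_variance / total


# Illustrative values only: the same signal with progressively less noise.
for noise in (0.16, 0.04, 0.01):
    print(noise, round(reliability(0.04, noise), 2))
```

With the signal held fixed, reducing a hospital's noise moves its reliability toward 1, and a signal of zero gives a reliability of zero regardless of noise.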

To assess the influence of caseload on reliability, we created hospital caseload (ie, cases reported by each hospital) terciles for each procedure. We then calculated the mean reliability of each outcome measure across caseload terciles. In further analysis, we quantified the number of hospitals with greater than 0.70 or 0.50 reliability for each outcome by procedure. A reliability of 0.70 is considered adequate for differentiating provider performance.6,15 Finally, we used the hospital-level random intercept variance in the hierarchical model as well as the total measurement error for each procedure group to calculate the number of cases needed to achieve 0.70 and 0.50 reliability for each outcome across procedures.
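
The tercile and threshold summaries described above could be computed along the following lines. This is a sketch under stated assumptions: hospitals are split into terciles by caseload rank (ties may therefore straddle a boundary, unlike cutoff-based terciles), and the function names and input layout are hypothetical.

```python
def mean_reliability_by_tercile(hospitals):
    """Mean reliability in the low, middle, and high caseload terciles.

    hospitals: list of (caseload, reliability) pairs.
    Terciles are assigned by caseload rank (an assumption of this sketch).
    """
    ranked = sorted(hospitals, key=lambda pair: pair[0])
    count = len(ranked)
    groups = [[], [], []]
    for rank, (_, hospital_reliability) in enumerate(ranked):
        tercile = min(rank * 3 // count, 2)
        groups[tercile].append(hospital_reliability)
    # None marks an empty tercile (fewer than 3 hospitals).
    return [sum(g) / len(g) if g else None for g in groups]


def share_meeting(reliabilities, threshold):
    """Proportion of hospitals with reliability strictly above threshold."""
    if not reliabilities:
        raise ValueError("need at least one hospital")
    return sum(r > threshold for r in reliabilities) / len(reliabilities)


# Toy data, not study data.
toy = [(10, 0.1), (20, 0.2), (30, 0.4), (40, 0.5), (50, 0.7), (60, 0.8)]
tercile_means = mean_reliability_by_tercile(toy)
share_good = share_meeting([r for _, r in toy], 0.70)
share_fair = share_meeting([r for _, r in toy], 0.50)
```

The strict inequality follows the text's "greater than 0.70 or 0.50", so a hospital sitting exactly on a threshold does not count as meeting it.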

We performed all statistical analyses using Stata, release 12 (StataCorp). The study protocol was reviewed and determined as “not regulated” by the University of Michigan Institutional Review Board.

Results

There were 55 466 patients in 199 hospitals who underwent colon resection, pancreatic resection, laparoscopic gastric bypass, open ventral hernia repair, AAA repair, or lower extremity bypass procedures. Descriptive characteristics of the patients, unadjusted and adjusted outcome rates, and hospital caseload (ie, their collected cases) are presented in Table 1. Overall morbidity was the most frequent outcome across all procedures and ranged from 5.5% (laparoscopic gastric bypass) to 31.0% (pancreatic resection). Severe morbidity varied widely by procedure, ranging from 2.8% (laparoscopic gastric bypass) to 24.6% (pancreatic resection). Mortality was the least frequent outcome, ranging from 0.2% (laparoscopic gastric bypass) to 5.4% (AAA). Hospital caseload varied widely across procedures as well (Table 1). Colon resection was the most commonly captured procedure performed, with hospitals averaging 114 cases per year, and pancreatic resection was the least commonly captured procedure, with hospitals performing a mean of 17 cases per year.

Table 1.  Characteristics of Included Operations in American College of Surgeons National Surgical Quality Improvement Program, 2009

Mean reliability for each outcome across procedure types and hospital volume is presented in Table 2 and graphically in the Figure. Mean reliability for overall morbidity, the most frequent outcome, ranged from 0.17 (lower extremity bypass) to 0.61 (colon resection). Mean reliability for severe morbidity ranged from 0.13 (laparoscopic gastric bypass) to 0.49 (colon resection). Mean reliability for mortality ranged from 0 (laparoscopic gastric bypass) to 0.39 (colon resection).

Table 2.  Hospital Caseload Tercile Cutoffs and Mean Reliability for Outcomes by Procedure Type and Caseload Tercile
Figure.  Mean Reliability of Risk-Adjusted 30-Day Outcomes by Hospital Caseload Tercile and Procedure Type

A, Mortality; the mortality rate for laparoscopic gastric bypass was zero for all hospital caseloads. B, Severe morbidity. C, Any morbidity.

Reliability for each outcome depended on how frequently the event occurred, with more common outcomes having higher reliability (Figure). Mean reliability for infrequent events such as mortality was lower than that for more frequent events such as overall morbidity. For example, reliability for mortality following pancreatic resection (mean risk-adjusted mortality rate, 4.9%) was 0.06 and ranged from 0.01 in low-accrual hospitals to 0.13 in high-accrual hospitals. In contrast, reliability for overall morbidity following pancreatic resection (mean risk-adjusted overall morbidity rate, 31.0%) was 0.33 and ranged from 0.11 in low-accrual hospitals to 0.60 in high-accrual hospitals (Table 2). An exception to the trends we observed was with lower extremity bypass, in which reliability for severe morbidity was higher than reliability for overall morbidity across hospital caseloads (Table 2).

Reliability was generally higher for more commonly captured procedures (Table 2). For example, mean reliability for overall morbidity was higher for common procedures such as colon resection (mean caseload, 114/y; mean reliability, 0.61) than for less commonly captured procedures such as AAA repair (mean caseload, 25/y; mean reliability, 0.29). This relationship persisted when comparing only the highest-volume hospitals. Mean reliability for morbidity in high-volume hospitals for colon resections was 0.75, and mean reliability for high-volume hospitals for AAA repair was 0.47 (Table 2).

Moreover, reliability for all outcomes increased in a stepwise fashion as hospital caseload increased for all procedures (Figure). For example, reliability for overall morbidity following AAA repair (mean reliability, 0.29) ranged from 0.12 in low-caseload hospitals to 0.47 in high-caseload hospitals (Table 2). Pancreatic resection and laparoscopic gastric bypass showed the largest variation in outcome reliability across hospital caseloads. For example, mean reliability for severe morbidity following pancreatic resection ranged from 0.08 in low-caseload hospitals to 0.52 in high-caseload hospitals, and mean reliability for overall morbidity following laparoscopic gastric bypass ranged from 0.19 in low-caseload hospitals to 0.68 in high-caseload hospitals (Figure). An exception to this general trend was reliability for mortality following laparoscopic gastric bypass. All hospitals had reliability of zero for mortality regardless of caseload (Figure).

Table 3 reports the proportion of hospitals that met 2 common reliability benchmarks for each outcome. For overall morbidity, the most frequent outcome, colon resection (37.7% of hospitals), pancreatic resection (7.1%), and laparoscopic gastric bypass (11.5%) were the only procedures for which any hospitals met a reliability threshold of 0.70, which is considered good.6 When assessing a reliability threshold of 0.50, which is considered fair, few hospitals met the reliability benchmark for most procedures (Table 3). An exception was colon resection, in which 80.4% of hospitals met a 0.50 reliability threshold for overall morbidity. For lower event rate outcomes (ie, severe morbidity and mortality), fewer hospitals met reliability thresholds. Colon resection (2.5% of hospitals) and pancreatic resection (3.0% of hospitals) were the only procedures for which hospitals met a 0.70 reliability threshold for severe morbidity. Colon resection (1.5% of hospitals) was the only procedure for which hospitals met a 0.70 reliability threshold for mortality (Table 3). No hospitals met a reliability threshold of 0.70 for any outcome following ventral hernia repair, AAA repair, or lower extremity bypass.

Table 3.  Hospitals Meeting 0.70 and 0.50 Reliability for Postoperative Mortality, Severe Morbidity, and Morbidity

Table 4 lists the calculated number of cases required to achieve reliability benchmarks for each outcome across procedures. In general, as outcomes became less frequent, hospitals would have to provide larger caseloads to achieve 0.50 or 0.70 reliability. For example, to meet 0.50 reliability for mortality, a hospital would have to perform 147 colon resections, 237 pancreatic resections, 520 ventral hernia repairs, 1342 AAA repairs, or 151 lower extremity bypass procedures (Table 4). With more frequent outcomes (overall morbidity), hospitals would require smaller caseloads to meet reliability thresholds.
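
A rough sense of how such caseload requirements arise can be had by inverting the reliability ratio. The sketch below is not the authors' calculation: it assumes that a hospital's noise shrinks in proportion to its caseload (noise = per-case error variance / n), and the function name and inputs are hypothetical. Under that assumption, the required caseload is n = [R/(1 - R)] × (per-case error variance / signal).

```python
import math


def cases_needed(target, signal_variance, per_case_error_variance):
    """Smallest caseload n giving reliability >= target.

    Assumes noise = per_case_error_variance / n, so that
    reliability = signal / (signal + per_case_error_variance / n).
    Returns math.inf when there is no signal to detect.
    """
    if not 0 < target < 1:
        raise ValueError("target reliability must be between 0 and 1")
    if signal_variance <= 0:
        return math.inf
    odds = target / (1 - target)
    return math.ceil(odds * per_case_error_variance / signal_variance)


# Illustrative variances only, not estimates from the study.
n_fair = cases_needed(0.50, 0.25, 25.0)
n_good = cases_needed(0.70, 0.25, 25.0)
```

Under this assumption, moving from 0.50 to 0.70 reliability multiplies the required caseload by (0.70/0.30)/(0.50/0.50), or roughly 2.3, and a smaller signal or a larger per-case error pushes the requirement up further.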

Table 4.  Hospital Caseload Requirements for Meeting 0.70 and 0.50 Reliability Thresholds for Overall Morbidity, Severe Morbidity, and Mortality

Discussion

As quality measurement platforms are increasingly used for public reporting and value-based purchasing, it has never been more important to have reliable performance measures.5,16,17 Reliability is the most widely used indicator to assess an outcome’s capability to detect differences in quality if they exist.13 This is analogous to a power calculation used to avoid type II errors (failure to detect a real difference between groups) in clinical trials. Similar to the need for sufficient sample size and large enough treatment effect to have adequate power in a clinical trial, hospital outcomes measurements require both large enough caseloads and frequent enough adverse event rates to reliably capture quality differences.6 We have demonstrated that commonly used outcome measures have low reliability for hospital profiling for a diverse range of procedures. Hospital caseload was a strong driver for outcome reliability, with higher-caseload hospitals showing the most reliable outcomes. However, with infrequent outcomes, the number of submitted cases needed for adequate outcome reliability was much larger than most hospitals were able to provide. Our findings underscore the importance of carefully considering reliability when designing outcomes feedback programs for providers.

There have been few studies6,7,10,15 assessing outcome measure reliability using claims or clinical registry data. Most have shown that many hospitals lack the caseloads to reliably detect differences in performance for certain outcomes in specific clinical populations. Dimick et al7 demonstrated that few hospitals met caseload requirements to detect meaningful differences from performance benchmarks following cardiovascular, pancreatic, esophageal, or neurosurgical procedures. In a study similar to ours, Kao et al10 used ACS-NSQIP data to evaluate the reliability of surgical site infection as a quality indicator following colon resection and found that only half of the hospitals examined had adequate caseloads to meet reliability benchmarks. The present study goes further and provides a comprehensive evaluation of the reliability of 3 commonly used outcomes across a collection of general and vascular procedures and highlights the reliability problems that can occur with low caseloads and infrequent outcomes.

Outcomes with low reliability can mask both poor and outstanding performance relative to benchmarks. Hospitals with poor outcomes might assume they have no quality problems when they do (analogous to a type II error). Likewise, outcomes with low reliability may cause average (or well-performing) hospitals to be spuriously labeled as poor performers (analogous to a type I error: detecting a difference between groups when none exists). Without a formal assessment of outcome reliability, it is unclear whether a hospital’s performance is the result of quality or if it simply lacks an adequate caseload. When reporting outcomes, most quality reporting programs use P values and/or CIs to assign significance to a hospital’s performance relative to benchmarks. However, these significance measures are often relegated to a footnote or dismissed. When hospitals act to investigate and amend a spuriously high outcome rate, they may direct resources to where they do not have a problem—this is known as tampering in the quality improvement lexicon.18,19 Given the cost of maintaining and implementing quality improvement programs, hospitals have a vested interest in using highly reliable outcome measures to minimize misclassification and unnecessary spending.

There are 3 main strategies to improve the reliability of outcome measures. One approach is to increase the caseload by sampling 100% of certain procedures.20,21 An alternative approach gaining momentum is the use of reliability adjustment. This technique has been discussed extensively elsewhere22 and is gaining traction in several statewide and national outcomes reporting programs. In brief, reliability adjustment uses empirical Bayes techniques to shrink a provider’s risk-adjusted outcome rate toward the overall mean rate, according to the provider’s caseload.23 Reliability adjustment has been demonstrated11,24 to more accurately predict future hospital performance for both general surgical and vascular procedures. A third option to increase reliability is by using composite quality indicators that combine quality signal from other measures and procedures within a hospital, such as outcomes from multiple related procedures, length of stay, and reoperation rate.23,25,26 Composite measures have been shown25 to more accurately predict future hospital performance compared with a single risk-adjusted outcome measure. Although these strategies are far from universal, they are gaining traction in some registries. For example, ACS-NSQIP has been among the leaders in implementing best practices to increase the reliability of outcome measures. Specifically, ACS-NSQIP now offers 100% sampling for certain procedures, uses hierarchical modeling and reliability adjustment for reporting outcomes, and has investigated using composite measures for certain procedures for use in quality profiling.26
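
The shrinkage step in reliability adjustment can be illustrated with a reliability-weighted average, one common form of empirical Bayes shrinkage. This is a minimal sketch rather than the method of any particular registry; the function name is hypothetical and the example rates are invented for illustration.

```python
def reliability_adjusted_rate(risk_adjusted_rate, overall_mean_rate, reliability):
    """Shrink a hospital's rate toward the overall mean.

    A reliability of 1 leaves the hospital's own rate unchanged;
    a reliability of 0 replaces it with the overall mean rate.
    """
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must be between 0 and 1")
    return (reliability * risk_adjusted_rate
            + (1 - reliability) * overall_mean_rate)


# Invented example: the same apparent outlier rate (30% against a 20% mean)
# is pulled toward the mean more strongly when reliability is lower.
low = reliability_adjusted_rate(0.30, 0.20, 0.29)
high = reliability_adjusted_rate(0.30, 0.20, 0.61)
```

Because reliability rises with caseload, low-caseload hospitals are pulled further toward the mean, which limits the influence of chance on their reported rates.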

There are several important limitations to the present study. Our results may not be generalizable to clinical registries that already capture nearly 100% of their patients.22 However, even with 100% case capture, some hospitals that participate in clinical registries may not have the caseload for reliable benchmarking, especially if considering rare outcomes (eg, mortality) or uncommon procedures (eg, pancreatectomy). This underscores the importance of using other methods for increasing reliability (eg, composite measures and reliability adjustment) as well. Another limitation of this study is that ACS-NSQIP may not be generalizable to all US hospitals because it oversamples larger teaching hospitals.

Conclusions

Currently, outcomes reported by many clinical registries may have low reliability for profiling hospital performance for most commonly performed general and vascular surgery procedures. Implementing procedure-targeted data collection and accounting for statistical reliability when reporting outcomes will better inform hospitals of where they stand relative to their peers. More broadly, providers and payers should consider strategies to improve reliability when using clinical registry data for performance profiling, such as 100% sampling of high-risk conditions, reliability adjustment for outcomes reporting, and use of composite measures. Such measures should give more insight into quality differences between providers and better target high leverage areas for quality improvement.

Accepted for Publication: July 15, 2013.

Corresponding Author: Robert W. Krell, MD, Department of Surgery, University of Michigan, 2800 Plymouth Rd, Bldg 16, Office 016-100N-13, Ann Arbor, MI 48109 (rkrell@med.umich.edu).

Published Online: March 12, 2014. doi:10.1001/jamasurg.2013.4249.

Author Contributions: Dr Dimick had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Study concept and design: Krell, Dimick.

Acquisition, analysis, or interpretation of data: Krell, Hozain, Dimick.

Analysis and interpretation of data: All authors.

Drafting of the manuscript: Krell, Hozain, Dimick.

Critical revision of the manuscript for important intellectual content: All authors.

Statistical analysis: Krell, Hozain, Dimick.

Obtained funding: Dimick.

Administrative, technical, or material support: Dimick.

Study supervision: Dimick.

Conflict of Interest Disclosures: Dr Dimick has a financial interest in ArborMetrix, Inc, which had no role in the analysis herein. No other disclosures were reported.

Funding/Support: Dr Krell is supported by grant 5T32CA009672-22 from the National Institutes of Health.

Role of the Sponsor: The National Institutes of Health had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.

Disclaimer: The ACS-NSQIP and the hospitals participating in the ACS-NSQIP are the source of the original data and cannot verify or be held responsible for the statistical validity of the data analysis or the conclusions derived by the authors.

References

1. Steinbrook R. Public report cards—cardiac surgery and beyond. N Engl J Med. 2006;355(18):1847-1849.
2. Cohen ME, Bilimoria KY, Ko CY, Hall BL. Development of an American College of Surgeons National Surgery Quality Improvement Program: morbidity and mortality risk calculator for colorectal surgery. J Am Coll Surg. 2009;208(6):1009-1016.
3. Daley J, Forbes MG, Young GJ, et al. Validating risk-adjusted surgical outcomes: site visit assessment of process and structure: National VA Surgical Risk Study. J Am Coll Surg. 1997;185(4):341-351.
4. Campbell DA Jr, Henderson WG, Englesbe MJ, et al. Surgical site infection prevention: the importance of operative duration and blood transfusion—results of the first American College of Surgeons–National Surgical Quality Improvement Program Best Practices Initiative. J Am Coll Surg. 2008;207(6):810-820.
5. Lindenauer PK, Remus D, Roman S, et al. Public reporting and pay for performance in hospital quality improvement. N Engl J Med. 2007;356(5):486-496.
6. Adams JL, Mehrotra A, Thomas JW, McGlynn EA. Physician cost profiling—reliability and risk of misclassification. N Engl J Med. 2010;362(11):1014-1021.
7. Dimick JB, Welch HG, Birkmeyer JD. Surgical mortality as an indicator of hospital quality: the problem with small sample size. JAMA. 2004;292(7):847-851.
8. Russell EM, Bruce J, Krukowski ZH. Systematic review of the quality of surgical mortality monitoring. Br J Surg. 2003;90(5):527-532.
9. Campbell DA Jr, Englesbe MJ, Kubus JJ, et al. Accelerating the pace of surgical quality improvement: the power of hospital collaboration. Arch Surg. 2010;145(10):985-991.
10. Kao LS, Ghaferi AA, Ko CY, Dimick JB. Reliability of superficial surgical site infections as a hospital quality measure. J Am Coll Surg. 2011;213(2):231-235.
11. Osborne NH, Ko CY, Upchurch GR Jr, Dimick JB. The impact of adjusting for reliability on hospital quality rankings in vascular surgery. J Vasc Surg. 2011;53(1):1-5.
12. Shiloach M, Frencher SK Jr, Steeger JE, et al. Toward robust information: data quality and inter-rater reliability in the American College of Surgeons National Surgical Quality Improvement Program. J Am Coll Surg. 2010;210(1):6-16.
13. Adams JL. The reliability of provider profiling: a tutorial. Santa Monica, CA: RAND Corp; 2009. http://www.rand.org/pubs/technical_reports/TR653. Accessed October 15, 2012.
14. Hosmer DW, Lemeshow S. Confidence interval estimates of an index of quality performance based on logistic regression models. Stat Med. 1995;14(19):2161-2172.
15. Scholle SH, Roski J, Adams JL, et al. Benchmarking physician performance: reliability of individual and composite measures. Am J Manag Care. 2008;14(12):833-838.
16. Calikoglu S, Murray R, Feeney D. Hospital pay-for-performance programs in Maryland produced strong results, including reduced hospital-acquired conditions. Health Aff (Millwood). 2012;31(12):2649-2658.
17. Faber M, Bosch M, Wollersheim H, Leatherman S, Grol R. Public reporting in health care: how do consumers use quality-of-care information? a systematic review. Med Care. 2009;47(1):1-8.
18. Wan TTH, Connell AM. Total quality management and continuous quality improvement. In: Wan TTH, Connell AM. Monitoring the Quality of Health Care: Issues and Scientific Approaches. New York, NY: Springer; 2003:143-158.
19. Cheung YY, Jung B, Sohn JH, Ogrinc G. Quality initiatives: statistical control charts: simplifying the analysis of data for quality improvement. Radiographics. 2012;32(7):2113-2126.
20. Birkmeyer JD, Shahian DM, Dimick JB, et al. Blueprint for a new American College of Surgeons: National Surgical Quality Improvement Program. J Am Coll Surg. 2008;207(5):777-782.
21. Hendren S, Fritze D, Banerjee M, et al. Antibiotic choice is independently associated with risk of surgical site infection after colectomy: a population-based cohort study. Ann Surg. 2013;257(3):469-475.
22. Birkmeyer NJ, Dimick JB, Share D, et al; Michigan Bariatric Surgery Collaborative. Hospital complication rates with bariatric surgery in Michigan. JAMA. 2010;304(4):435-442.
23. Dimick JB, Staiger DO, Hall BL, Ko CY, Birkmeyer JD. Composite measures for profiling hospitals on surgical morbidity. Ann Surg. 2013;257(1):67-72.
24. Dimick JB, Ghaferi AA, Osborne NH, Ko CY, Hall BL. Reliability adjustment for reporting hospital outcomes with surgery. Ann Surg. 2012;255(4):703-707.
25. Dimick JB, Staiger DO, Osborne NH, Nicholas LH, Birkmeyer JD. Composite measures for rating hospital quality with major surgery. Health Serv Res. 2012;47(5):1861-1879.
26. Merkow RP, Hall BL, Cohen ME, et al. Validity and feasibility of the American College of Surgeons colectomy composite outcome quality measure. Ann Surg. 2013;257(3):483-489.

Figures

Place holder to copy figure label and caption
Figure.
Mean Reliability of Risk-Adjusted 30-Day Outcomes by Hospital Caseload Tercile and Procedure Type

A, Mortality; the mortality rate for laparoscopic gastric bypass was zero for all hospital caseloads. B, Severe morbidity. C, Any morbidity.

Graphic Jump Location

Tables

Table Graphic Jump LocationTable 1.  Characteristics of Included Operations in American College of Surgeons National Surgical Quality Improvement Program, 2009
Table Graphic Jump LocationTable 2.  Hospital Caseload Tercile Cutoffs and Mean Reliability for Outcomes by Procedure Type and Caseload Tercile
Table Graphic Jump LocationTable 3.  Hospitals Meeting 0.70 and 0.50 Reliability for Postoperative Mortality, Severe Morbidity, and Morbiditya
Table Graphic Jump LocationTable 4.  Hospital Caseload Requirements for Meeting 0.70 and 0.50 Reliability Thresholds for Overall Morbidity, Severe Morbidity, and Mortality

