Original Article

Clinical Benefit of a Diagnostic Score for Appendicitis: Results of a Prospective Interventional Study

Christian Ohmann, PhD; Claus Franke, MD; Qin Yang, PhD; and the German Study Group of Acute Abdominal Pain

From the Theoretical Surgery Unit (Drs Ohmann and Yang) and the Department of General and Trauma Surgery (Dr Franke), Heinrich-Heine-University, Düsseldorf, Germany.


Arch Surg. 1999;134(9):993-996. doi:10.1001/archsurg.134.9.993.

Hypothesis  Clinical use of a diagnostic score improves decision making in acute appendicitis.

Design  A before-and-after trial comparing a group of patients undergoing standard diagnostic workup with no additional diagnostic support (phase 1) with a group of patients undergoing additional diagnostic support with a score (phase 2).

Setting  Eight departments of surgery in Germany and Austria.

Patients  Eight hundred seventy patients with acute abdominal pain in phase 1 (October 1, 1994, to April 30, 1995) and 614 patients in phase 2 (February 1, 1995, to August 15, 1995).

Interventions  Structured and standardized history and clinical investigation in all patients with computer-based documentation; introduction of the diagnostic score after phase 1 and computer-supported use of the score in phase 2.

Results  The 2 groups were comparable with respect to signs, symptoms, and investigations related to acute appendicitis. Diagnostic performance of the final examiner decreased with the score (specificity, 86% vs 78%; positive predictive value, 67% vs 50%; and accuracy, 88% vs 81%). There were no differences in the rates of perforated appendix, appendectomy with normal findings, and complications; however, the delayed appendectomy rate (2% vs 8%) and the delayed discharge rate (11% vs 22%) were significantly lower with diagnostic support by the score (P=.02).

Conclusions  Integration of a score into the diagnostic process may have unforeseen clinical effects. The tested score cannot be recommended as a standard tool for diagnostic decision making in acute appendicitis.

THE EARLY and accurate diagnosis of acute appendicitis is still a difficult problem.1 Despite introduction of ultrasound and special laboratory investigations (eg, C-reactive protein), high diagnostic error rates are observed.2 As a consequence, perforation rates and rates of appendectomy with normal findings of 15% and more occur.3

In the last few years, several scoring systems have been developed for supporting the diagnosis of acute appendicitis.4-12 Initial evaluation studies have reported excellent results, indicating that scoring systems would be ideal as diagnostic aids because they perform well, require no special equipment, and are user-friendly and comprehensible to the clinician.1,7,10-12 However, the clinical benefit of a diagnostic score integrated into the diagnostic process has not yet been investigated in a prospective study with adequate methods. We therefore performed such a study with the use of a diagnostic score developed and evaluated in Germany.

The investigation was performed as a multicenter prospective interventional study with 8 German or Austrian surgical hospitals, including 3 university hospitals. Included were all patients with acute abdominal pain within 1 week before hospital admission. Excluded were patients with postoperative acute abdominal pain, trauma, or hernia; children less than 6 years old; patients who gave no informed consent; and patients with no definite final diagnosis. Acute appendicitis was diagnosed only on histopathological grounds according to the following criteria: macroscopic signs: intravascular injection of the serosa; fibrinous, purulent film; edematous, hemorrhagic, necrotic changes of the wall; and blood (not sufficient) or pus on opening of the appendix; microscopic signs: focal or expanded erosion, ulceration, abscess, fistula, necrosis, or perforation.

Not sufficient were fibrosis taken as evidence of subsided inflammation, intravascular injection of the serosa as the only finding, and description of few granulocytes.

Perforation had to be proved on histopathological grounds. There was no option for diagnosing "chronic appendicitis" or "subacute appendicitis." In the case of outpatients, a follow-up was performed after 30 days (telephone interview).

In all patients, a structured and standardized history and clinical investigation were performed according to international standards. Data were documented with a user-friendly computer program and form-based data entry.13 In case of computer breakdown, forms were available for data collection.

The study was performed in 2 consecutive phases: phase 1, no additional diagnostic support (4 months); and phase 2, diagnostic support with a score based on history, clinical examination, and basic laboratory data (4 months) (Table 1).

Table 1. Diagnostic Score for Acute Appendicitis*

The diagnostic score was introduced into the hospitals after phase 1 in several ways: distribution of a publication, presentation in training sessions and clinical conferences, and posting in the outpatient ward. The score was integrated into the computer program and automatically presented after data input of the history, clinical examination results, and basic laboratory data. After special laboratory investigations, ultrasound, and x-ray, the diagnosis of the final examiner (in the majority of cases, a senior surgeon), the final diagnosis at discharge, and the outcome of disease were documented prospectively with the computer program. Comparability of the study groups was investigated for signs and symptoms related to acute appendicitis, the distribution of the final diagnoses, and the diagnostic investigations performed.

The outcome criteria were the diagnostic accuracy of the final examiner with respect to appendicitis (sensitivity, specificity, positive and negative predictive value, and accuracy), the perforated appendix rate, the rate of appendectomy with normal findings, the rate of laparotomy with normal findings, the delayed appendectomy rate, the complication rate, and the delayed discharge rate. For the outcome criteria, the following definitions were used: perforated appendix rate, proportion of patients with acute appendicitis who had a histologically proved perforation; negative appendectomy rate, proportion of patients with appendectomy in whom no appendicitis was found; negative laparotomy rate, proportion of laparotomies that were unnecessary (no intraoperative or histological diagnosis); delayed appendectomy rate, proportion of patients with appendicitis in whom the appendectomy was performed the second day or later after admission; and delayed discharge rate, proportion of patients with appendicitis who were discharged 10 days or later after admission.
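The accuracy measures named above follow from a 2x2 table of the final examiner's diagnosis against the histopathological diagnosis. The following is a minimal sketch of those standard definitions; the class and function names are illustrative, and the counts are hypothetical, not study data.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """2x2 table: examiner diagnosis vs. histopathological final diagnosis."""
    tp: int  # examiner diagnosed appendicitis, histology confirmed it
    fp: int  # examiner diagnosed appendicitis, histology did not confirm it
    fn: int  # examiner missed a histologically confirmed appendicitis
    tn: int  # examiner and histology both negative


def diagnostic_metrics(c: ConfusionCounts) -> dict:
    """Return the five accuracy measures as proportions in [0, 1]."""
    total = c.tp + c.fp + c.fn + c.tn
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
        "accuracy": (c.tp + c.tn) / total,
    }


# Hypothetical counts for illustration only (not taken from the study).
example = ConfusionCounts(tp=90, fp=30, fn=10, tn=270)
metrics = diagnostic_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Because positive predictive value depends on how often the disease occurs in the group, it can differ between two phases even when sensitivity and specificity are unchanged.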

Statistical comparisons between the 2 phases were performed with the χ² test, excluding missing data.
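As an illustration of this kind of comparison, the sketch below applies a Pearson χ² test to a 2x2 table, using the appendicitis counts reported for the 2 phases (201 of 870 and 114 of 614). It assumes no continuity correction, which the article does not specify, and the article does not report a test result for this particular comparison; the function name is illustrative.

```python
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the chi-square survival function
    # equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Appendicitis vs. no appendicitis in each phase, from the reported counts.
phase1_app, phase1_total = 201, 870
phase2_app, phase2_total = 114, 614
stat, p = chi_square_2x2(
    phase1_app, phase1_total - phase1_app,
    phase2_app, phase2_total - phase2_app,
)
print(f"phase 1: {phase1_app / phase1_total:.1%}, phase 2: {phase2_app / phase2_total:.1%}")
print(f"chi2 = {stat:.2f}, p = {p:.3f}")
```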

There are no general guidelines and rules in Germany for the performance of studies with formal decision aids based on routinely assessed clinical variables. We decided to give an information brochure to the patients explaining the study and to give them the option not to take part in the study.

Overall, 1484 patients could be enrolled in the study: 870 patients in phase 1, with no additional diagnostic support, and 614 patients in phase 2, with diagnostic support by the score (Table 2). The starting date of the study varied between centers; phase 1 began between October 1, 1994, and April 30, 1995, and phase 2 between February 1, 1995, and August 15, 1995. The frequency of appendicitis in phase 1 was 23.1% (n=201) compared with 18.6% (n=114) in phase 2. Major diagnoses were no specific abdominal pain (phase 1, 25%; phase 2, 27%), acute dyspepsia (8%, 10%), acute biliary disease (8%, 9%), ileus (4%, 5%), urolithiasis (3%, 5%), urinary tract infection (3%, 4%), and acute diverticulitis (3%, 4%). There were no significant differences between the 2 phases with respect to signs and symptoms related to appendicitis (Table 3). Study groups were comparable with respect to ultrasound of the abdomen (phase 1, 65%; phase 2, 64%) and ultrasound of the appendix (11%, 9%). Leukocyte counts were determined significantly more often in phase 2 as a component of the score (88%, 95%; P<.001).

Table 2. Number of Patients in the Study Groups
Table 3. Comparability of Study Groups for Signs and Symptoms Related to Acute Appendicitis

Clinicians' diagnosis of appendicitis changed after introduction of the score (Table 4). Specificity, positive predictive value, and accuracy were significantly lower with diagnostic support by the score. Before introduction of the score, appendicitis was diagnosed less often by the final examiner (31%) than after introduction of the score (36%) (P=.10), whereas the actual frequency of appendicitis moved in the opposite direction (23% vs 19%). There were no significant differences with respect to the perforation, appendectomy with normal findings, and complication rates. The delayed appendectomy and delayed discharge rates were significantly lower with diagnostic support. However, timing of appendectomy was not significantly associated with the complication rate (24% in delayed appendectomy vs 10% in nondelayed appendectomy; P<.09; not differentiated between the study phases because of the small sample size). As expected, a higher complication rate was found in patients with delayed discharge than in those without delayed discharge (36% vs 5% in the total study population; P<.001).
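The reported percentages can be cross-checked against one another with the definitions of specificity and positive predictive value. The sketch below back-calculates the share of patients diagnosed with appendicitis and the overall accuracy from the rounded prevalence, specificity, and positive predictive value for each phase. It is an approximate consistency check under the assumption that all figures refer to the same patients; the function name is illustrative, and the outputs are not figures reported by the authors.

```python
def implied_rates(prevalence: float, specificity: float, ppv: float) -> dict:
    """Back-calculate examiner rates from prevalence, specificity, and PPV.

    All inputs are proportions in [0, 1]. Results are only as precise as
    the rounded percentages fed in.
    """
    # False positives as a share of all patients.
    false_pos = (1 - specificity) * (1 - prevalence)
    # From PPV = TP / (TP + FP), so TP = FP * PPV / (1 - PPV).
    true_pos = false_pos * ppv / (1 - ppv)
    # True negatives as a share of all patients.
    true_neg = specificity * (1 - prevalence)
    return {
        "diagnosed_positive": true_pos + false_pos,
        "accuracy": true_pos + true_neg,
    }


phase1 = implied_rates(prevalence=0.23, specificity=0.86, ppv=0.67)
phase2 = implied_rates(prevalence=0.19, specificity=0.78, ppv=0.50)
for label, rates in (("phase 1", phase1), ("phase 2", phase2)):
    print(label, {k: round(v, 3) for k, v in rates.items()})
```

Within rounding, the back-calculated values are close to the reported accuracy (88% and 81%) and to the reported share of patients in whom the final examiner diagnosed appendicitis (31% and 36%).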

Table 4. Clinical Outcome in the Study Groups

There was a linear relationship between the score values and frequency of appendicitis: less than 4.0 points, 3% (phase 1), 0% (phase 2); 4.0 to 5.5 points, 5%, 3%; 6.0 to 7.5 points, 11%, 10%; 8.0 to 9.5 points, 24%, 15%; 10.0 to 11.5 points, 32%, 24%; 12.0 to 13.5 points, 55%, 38%; and 14.0 points or more, 68%, 74%.
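The listing above can be written as a small table and checked programmatically. The sketch below only verifies that the frequency of appendicitis never decreases from one score band to the next in either phase; it does not test linearity, and the helper name is illustrative.

```python
# (score band in points, % appendicitis in phase 1, % appendicitis in phase 2),
# as reported in the text.
BANDS = [
    ("<4.0", 3, 0),
    ("4.0-5.5", 5, 3),
    ("6.0-7.5", 11, 10),
    ("8.0-9.5", 24, 15),
    ("10.0-11.5", 32, 24),
    ("12.0-13.5", 55, 38),
    (">=14.0", 68, 74),
]


def is_non_decreasing(values) -> bool:
    """True if each value is at least as large as the one before it."""
    return all(a <= b for a, b in zip(values, values[1:]))


phase1 = [p1 for _, p1, _ in BANDS]
phase2 = [p2 for _, _, p2 in BANDS]
print("phase 1 non-decreasing:", is_non_decreasing(phase1))
print("phase 2 non-decreasing:", is_non_decreasing(phase2))
```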

Despite all improvements (ultrasound, special laboratory values), routine diagnosis in acute appendicitis still poses a challenging problem. Major areas of concern are perforations (rate of up to 20%), negative appendectomies (rate of up to 30%), delayed operations, complications after operation, and late discharge.3,14 Therefore, several diagnostic scoring systems have been developed, characterized as noninvasive, understandable, user-friendly, and cost-effective.2,4-8,10-12 Evaluation studies have demonstrated a good performance for some of these scores, indicating their potential for diagnostic decision making.6-8,10,12 Testing of these scores on a prospective database of German cases revealed disappointing results.15 None of the scores fulfilled any of the given quality criteria. The lack of separate testing in a prospective study, small sample size, differences in the target population, and geographic variation of the incidence and presentation of the diseases were discussed as major factors.16 For that reason, a new score was developed on the basis of German data, which gave promising results in a first evaluation study.9

Unfortunately, the clinical benefit of these scores has not been tested in an adequately controlled study comparing the diagnostic performance of the clinician with and without the score. Some reports indicate improvement concerning the negative appendectomy rate or the perforation rate, if compared with historical data. In one study, 2 different surgical units were compared. In the unit that used the score, a negative appendectomy rate of 7% was found, and in the unit not using the score, a negative appendectomy rate of 17%.12 These studies cannot be taken as evidence of the clinical benefit of diagnostic scores in acute appendicitis.17 The optimal approach in clinical research is the randomized controlled clinical trial. In evaluating scores, this design has several pitfalls. Randomization of patients may result in carryover effects, since the physician may be influenced when deciding to treat control patients. A possible solution is to randomize physicians, but previous studies have shown that randomization to the intervention group may motivate physicians more than randomization to the control group.18 An alternative design is to perform a prospective intervention study with a before-and-after design, an approach used in our study. This design may be undermined by secular trends or sudden changes, either in the outcomes to be measured or in characteristics of the study population that influence these outcomes. This type of bias can never be excluded with this design, but it is probably low in our study for the following reasons: uniform data collection according to standard definitions in both phases, no differences between the study populations in the 2 phases (Table 3), and the short duration of each phase (4 months).

Systematic reviews have shown that the effectiveness of clinical guidelines and decision support is critically dependent on 3 factors: development, dissemination, and implementation strategy.19 The probability of being effective is highest if guidelines are developed internally, disseminated by specific educational initiatives, and implemented as patient-specific reminders at the time of consultation. In our study, the majority of participating centers were involved in the development of the score.9 The score was disseminated by specific training sessions or during clinical conferences, and it was applied during the consultation. The score did change clinical practice, although the accuracy of the score as a diagnostic aid was not convincing. Which factors may have biased the results in our study? In a previous multicenter study we showed that standardized and structured data collection did not change clinical performance in 6 German hospitals, so a checklist effect can be discounted. Because of the study design, with 2 consecutive phases and introduction of the score in phase 2, no carryover effects could occur. Systematic feedback was not provided in the study.

From the results of the study, it can be hypothesized that the diagnostic behavior of the clinician was changed in a systematic way. Although acute appendicitis occurred less often in the test phase, it was suspected more often, and the diagnostic decision was false positive in every second patient (positive predictive value, 50%). Although this did not influence the decision to operate (no difference in the negative appendectomy rate), it helped to avoid delayed but necessary operations. In Germany, the average hospital stay for acute appendicitis is rather long, as was demonstrated in our study. Nonperforated appendicitis in Germany is reimbursed per case (Fallpauschale). The calculation of reimbursement is based on an average hospital stay of 7.16 days for an open operation and 6.04 days for a laparoscopic operation. Only if hospital stay exceeds 14 days (open operation) or 13 days (laparoscopic operation) is additional reimbursement of costs possible (Grenzverweildauer). In our study, we defined a hospital stay of 10 days or longer as delayed discharge and could demonstrate that scoring led to an improvement with respect to this outcome criterion. In summary, scoring did not result in an improvement of the classic outcome criteria (negative appendectomy, perforated appendix, and complication rate). Even worse, scoring degraded diagnostic decision making of the final examiner, especially with respect to overprediction of acute appendicitis. However, decreased diagnostic performance did not result in poorer management and outcome; instead, positive effects on the timing of operation and duration of hospital stay were measured.

Two general conclusions and 1 specific conclusion can be drawn from this study. Testing of a score in new clinical environments is necessary before widespread application can be recommended. Integration of a score into the diagnostic process may have unforeseen clinical effects. The existing score cannot be recommended as a standard tool for diagnostic decision making in acute appendicitis.

This work was supported by a grant (project number 01 EI 9606/0) from the German Ministry of Education, Science, Research, and Technology, Bonn, Germany, within the Medizinische Wissensbasen (MEDWIS) program.

Joachim Walenzyk, MD, Georg Federmann, MD, Clinic of General Surgery, Kreiskrankenhaus Goslar, Goslar, Germany; Jörg Krenzien, MD, Gabiele Hansdorfer, MD, Surgical Clinic, Klinikum Ernst von Bergmann, Potsdam, Germany; Cornelia Berner, MD, Joachim Eibner, MD, Department of General and Trauma Surgery, Robert-Bosch-Krankenhaus Stuttgart, Stuttgart, Germany; Matthias Kraemer, MD, Klaus Kremer, MD, Surgical Clinic and Policlinic, University of Würzburg, Würzburg, Germany; Heinrich Böhner, MD, Surgical Clinic, Elisabeth-Krankenhaus Essen, Essen, Germany; Martin Labus, MD, Surgical Clinic, Bürgerhospital Frankfurt, Frankfurt, Germany; and Anton Klingler, PhD, Theoretical Surgery Unit, Surgical Clinic, University of Innsbruck, Innsbruck, Austria.

Reprints: Christian Ohmann, PhD, Funktionsbereich Theoretische Chirurgie, Klinik für Allgemein und Unfallchirurgie, Heinrich-Heine-Universität, Moorenstr 5, 40225 Düsseldorf, Germany (e-mail: ohmannch@uni-duesseldorf.de).

1. Hoffmann J, Rasmussen OO. Aids in the diagnosis of acute appendicitis. Br J Surg. 1989;76:774-779.
2. Izbicki JR, Wilker DK, Mandelkow HK, et al. Retro- and prospective studies on the value of clinical and laboratory chemical data in acute appendicitis [in German]. Chirurg. 1990;61:887-894.
3. Andersson RE, Hugander A, Thulin JG. Diagnostic accuracy and perforation rate in appendicitis: association with age and sex of the patient and with appendicectomy rate. Eur J Surg. 1992;158:37-41.
4. Eskelinen M, Ikonen J, Lipponen P. A computer-based diagnostic score to aid in diagnosis of acute appendicitis: a prospective study of 1333 patients with acute abdominal pain. Theor Surg. 1992;7:86-90.
5. Van Way CW, Murphy JR, Dunn EL, Elerding SC. A feasibility study of computer aided diagnosis in appendicitis. Surg Gynecol Obstet. 1982;155:685-688.
6. Alvarado A. A practical score for the early diagnosis of acute appendicitis. Ann Emerg Med. 1986;15:557-564.
7. Arnbjörnsson E. Scoring system for computer-aided diagnosis of acute appendicitis: the value of prospective versus retrospective studies. Ann Chir Gynaecol. 1985;74:159-166.
8. Fenyö G. Routine use of a scoring system for decision-making in suspected acute appendicitis in adults. Acta Chir Scand. 1987;153:545-551.
9. Ohmann C, Franke C, Yang Q, et al. Diagnostic score for acute appendicitis [in German]. Chirurg. 1995;66:135-141.
10. Lindberg G, Fenyö G. Algorithmic diagnosis of appendicitis using Bayes' theorem and logistic regression. Bayesian Stat. 1988;3:665-668.
11. Teicher I, Landa B, Cohen M, Kabnick LS, Wise L. Scoring system to aid in diagnoses of appendicitis. Ann Surg. 1983;198:753-759.
12. Christian F, Christian GP. A simple scoring system to reduce the negative appendicectomy rate. Ann R Coll Surg Engl. 1992;74:281-285.
13. Ohmann C, Belenky G, Platen C. Integration of a data dictionary and a clinical database in an expert system for acute abdominal pain. Medinfo. 1995;2:943-946.
14. Blind PJ, Dahlgren ST. The continuing challenge of the negative appendix. Acta Chir Scand. 1986;152:623-627.
15. Ohmann C, Yang Q, Franke C. Diagnostic scores for acute appendicitis. Eur J Surg. 1995;161:273-281.
16. deDombal FT, Staniland JR, Clamp SE. Geographical variation in disease presentation: does it constitute a problem and can information science help? Med Decis Making. 1981;1:59-69.
17. Johnston ME, Langton KB, Haynes B, Mathieu A. Effects of computer-based clinical decision support systems on clinician performance and patient outcome: a critical appraisal of research. Ann Intern Med. 1994;120:135-142.
18. North of England Study of Standards and Performance in General Practice. Medical audit in general practice, I: effects on doctors' clinical behaviour for common childhood conditions. BMJ. 1992;304:1480-1484.
19. Grimshaw JM, Russell IT. Effect of clinical guidelines on medical practice: a systematic review of rigorous evaluations. Lancet. 1993;342:1317-1321.
