UK accident and emergency medicine: EMTA - CTR guidance

CTR Guidance

The following guidance is by no means comprehensive or prescriptive, but reflects some of the experiences and research of trainees. Its content is not guaranteed by the College of Emergency Medicine, BAEM or EMTA. The information on critical appraisal is derived from courses and the resources referenced here - they come highly recommended. In particular the Centre for Evidence Based Medicine is worth bookmarking.

The College of Emergency Medicine have made available the slides and handout from the Critical Appraisal Course organized by Professor Steve Goodacre. The handout, CRITICAL APPRAISAL FOR EMERGENCY MEDICINE TRAINEES, is a large, readily available resource of exceptional quality. It is essential reading during CTR and exam preparation.

Most training regions now require each trainee to write one CTR/year. The first is always the hardest, but by the time your examination comes around, you should have the choice of 3-4 to present. If you want to read another trainee's CTR, or have finished your CTR & exam and would like to share it with others, please visit the Shared Resources section of this website.

KEEP IT SIMPLE! Choose a well defined topic, preferably one that does not return more than 100 references from your literature search. Aim for 20-30 references. Don't forget that you'll need to critically appraise each of these, and may be quizzed on your sources during your exam. At least one high profile Emergency Physician got a serious grilling about referenced papers during a 2005 sitting! Aim to be an expert in a small, defined area in which you have an interest.

Critical appraisal

The process of evaluating the quality of published evidence.

It allows health professionals to assess the accuracy of diagnostic tests and the efficacy of therapies and preventive interventions, and thereby improve their clinical practice.

The steps involved in doing a CTR:

  1. Decide on a topic
  2. Write (and later refine) inclusion and exclusion criteria for papers
  3. Perform the search
  4. Either:
    • Discover the topic is too broad/too many articles are available -> Return to step 1
    • Narrow the search down to identify 10-40 papers -> Go to step 5
  5. Summarize papers
  6. Assess their methodological quality
  7. Discuss findings
  8. Consider further original research to explore the topic
  9. Once complete:
    • Show your CTR to as many people as possible. Everyone will have a different angle on things.
    • Get into the habit of talking about your CTR. Arrange plenty of mock vivas.
    • Form your opinions on the real-world implications of your CTR.
    • Regularly recheck references and appraise new developments, and do so again immediately before your FCEM examination.


Recommended texts for guidance include:

  • Egger, M., Smith, G. D. - Systematic reviews in health care: meta-analysis in context (London: BMJ)
    -excellent book which gives a step by step guide to doing a systematic review.
  • Guyatt, G., Rennie, D. - Users' guides to the medical literature: a manual for evidence based clinical practice (Chicago: American Medical Association)

 Searching for evidence

1 Framing the question

If the question you wish to ask is well structured, finding the relevant papers becomes much easier. The BESTBETS website is a great source of three part questions that demonstrate this.

A four part approach can also be employed:

1 The Patient / Problem

Eg in patients with a suspected glass foreign body in the foot…

2 The Intervention (treatment / investigation / process)

…does emergency department USS…

3 The Comparison / Alternative (optional, if relevant)

…compared to plain XR…

4 The Outcome

…reliably exclude the presence of a FB?

This should help with your search for evidence, as you can take a more logical approach.

A literature search may include:

  • Electronic databases: Medline, Embase, CINAHL, Cochrane database etc.
  • Hand search of journals
  • Grey literature: reports (government or academic), conference proceedings, internet, libraries, professional societies, King's Fund, Nuffield etc.
  • Research registers: National Research Register, HTA database, Cochrane
  • Retrieved articles: bibliographies, searches on authors' names, citation threads
  • Contact with researchers or "experts"
  • Pharmaceutical industry

2 Searching the Medline database

This is usually done using OVID software in your hospital library, via the BMA library website (members only), or via Athens (available to all NHS employees). Alternatively, PubMed is available free on the internet. See the EBM Resources page on this website.

All use Boolean logic (and / or / not) combined with medical subject heading (MeSH) terms to narrow down a search and so find the most relevant papers.

See for information about MeSH terms

MeSH terms are organized into subjects and subdivisions of each, branching out like tree roots from very broad terms/categories to very specific ones. If you wish to find articles in which something is the main subject (rather than mentioned in passing in an article about something else), you can FOCUS the search on these articles.

However, if you wish to search for articles that may be included in subdivisions of a MeSH heading (eg when considering femoral fractures, and wishing to include femoral neck fractures), then you may wish to EXPLODE the search terms: this will include any articles found under the search term plus all its subdivisions.

The $ symbol is used for truncation: for example, searching for ultraso$ will return results for ultrasound, ultrasonography, ultrasonographer, etc.

If you know the word you're searching for is a MeSH term, you can add a forward slash (/) after it.
This can shortcut the mapping page in OVID and save time.

Searching for your CTR evidence:

When performing a literature search for your CTR or similar project:
  • Cast as wide a net as possible.
  • The problem with using Medline subheading abbreviations (eg "/us") is that you are relying on someone else to have already spotted that the paper is about your chosen term (ultrasound).
  • Forget complex terminology and abbreviations and use only the prefix exp and the suffix .mp
  • All terms should be searched for using both forms.
  • For example: "exp ultrasonography or ultraso$.mp"
  • The aim of the search is not to pull up a small number of papers (this would suggest a potential error in searching) but to find any possibly relevant papers, then search through them by eye, ideally with two people working independently.
  • If the search brings up 20 000 papers, then make the question more specific and start again from the beginning.
  • It would be wrong to change your inclusion/exclusion criteria after searching, as these must be clearly defined before starting.
Get someone experienced (and ideally published) in literature reviews to check the search strategy BEFORE performing the search. This will save you hours, days and weeks of re-searching when someone else points out the glaring omissions.
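Writing the strategy down before searching also makes it easy to show to someone else. The sketch below is a hypothetical helper (not an OVID feature) that turns each concept of a structured question into one "exp heading/ or stem$.mp" line and then combines the numbered lines with AND. The headings and word stems chosen for the glass foreign body question are illustrative assumptions, not a validated search strategy.

```python
def concept_line(mesh_heading: str, text_stem: str) -> str:
    """One search line per concept: exploded MeSH heading OR truncated free text (.mp)."""
    return f"exp {mesh_heading}/ or {text_stem}$.mp"

def build_strategy(concepts: dict[str, tuple[str, str]]) -> list[str]:
    """Number one line per concept, then AND the numbered result sets together."""
    lines = [concept_line(heading, stem) for heading, stem in concepts.values()]
    combine = " and ".join(str(i) for i in range(1, len(lines) + 1))
    return lines + [combine]

# Illustrative concepts for the glass foreign body question (assumed terms)
question = {
    "patient/problem": ("foreign bodies", "foreign bod"),
    "intervention": ("ultrasonography", "ultraso"),
    "comparison": ("radiography", "radiograph"),
}
for number, line in enumerate(build_strategy(question), start=1):
    print(number, line)
```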

Quick Searches

If you want to do a quick search to answer a specific question, you can use shortcuts to narrow things down quickly. This would not be wise when searching for CTR evidence, as many relevant articles may be missed; use the strategy above instead.

There are many ways of using notation to specify a search; this can speed things up once you get used to it:

Will find all the subheadings for a MeSH term
= author
= journal
= a single word, including anywhere it appears as a MeSH term
= word in abstract
= in title
= in title or abstract (textword)
= year of publication
= publication type (eg editorial, review article)
= title, original title, abstract, name of substance word, subject heading word

MeSH Subheadings:

= adverse effects
= congenital
= complications
= contraindications
= drug effects
= diagnosis
= drug therapy
= education
= epidemiology
= etiology
= nursing
= prevention & control
= radiography
= therapy (non-pharmacological)
= therapeutic use
= ultrasonography


Then the search can be LIMITED to eliminate unwanted irrelevant/unusable articles. Examples of limit set options are:

  • Human
  • Male
  • Review article
  • Abstracts
  • English language
  • Publication year
  • AIM journals (Abridged Index Medicus: the "mainstream" medical journals)

The above four part question example could therefore be searched as:

  • foreign - 21340 results
  • *foreign bodies - 13623 results
  • foreign bodies/us - 258 results
  • foreign bodies/ra - 3679 results
  • foreign bodies/ra and foreign bodies/us - 54 results
    -this appears to have sufficiently narrowed the search
    -however, the results look at either radiography or ultrasonography of foreign bodies,
    -not a comparison of the two modalities.

A better method would be:

  • *foreign bodies/ and exp radiography/ - 1670 results
  • *foreign bodies/ and exp ultrasonography/ - 205 results
  • exp foreign bodies/ and exp radiography/ and exp ultrasonography/ - 136 results
  • *foreign bodies/ and exp radiography/ and exp ultrasonography/ - 50 results

Of which five appear to address our question:

1. Mizel MS. Steinmetz ND. Trepman E. Detection of wooden foreign bodies in muscle tissue: experimental comparison of computed tomography, magnetic resonance imaging, and ultrasonography. [Journal Article] Foot & Ankle International. 15(8):437-43, 1994 Aug.

2. Donaldson JS. Radiographic imaging of foreign bodies in the hand. [Review] [28 refs] [Journal Article. Review] Hand Clinics. 7(1):125-34, 1991 Feb.

3. Ginsburg MJ. Ellis GL. Flom LL. Detection of soft-tissue foreign bodies by plain radiography, xerography, computed tomography, and ultrasonography. [Journal Article] Annals of Emergency Medicine. 19(6):701-3, 1990 Jun.

4. Torfing KF. Teisen HG. Skjodt T. Computed tomography, ultrasonography and plain radiography in the detection of foreign bodies in pork muscle tissue. [Journal Article] Rofo: Fortschritte auf dem Gebiete der Rontgenstrahlen und der Nuklearmedizin. 149(1):60-2, 1988 Jul.

5. De Flaviis L. Scaglione P. Del Bo P. Nessi R. Detection of foreign bodies in soft tissues: experimental comparison of ultrasonography and xeroradiography. [Journal Article] Journal of Trauma-Injury Infection & Critical Care. 28(3):400-4, 1988 Mar.

This will be difficult to grasp at first, but with practice it should get easier to quickly find the answers to any question you ask.

Most hospital libraries offer training to improve your evidence searching skills, which is usually free and very helpful.

 Types of studies

Pragmatic research

  • Whether a treatment works or how useful a test is in routine practice.
  • Unselected populations, randomised
  • Simple protocols, allowing leeway for physician judgement
  • Measures outcomes relevant to patients, eg mortality, quality of life, length of stay

Explanatory research

  • How or why a treatment works, whether it works under specific (usually ideal) circumstances.
  • Specific  staff / settings
  • Selected populations
  • May measure clinical / physiological outcomes, such as BP or arterial blood gas measurements
  • Methods may interfere with clinical care
  • May produce care that is highly structured and protocol-driven.

Evaluating A Therapy

Generally involves comparing a group of patients receiving the therapy with a comparable group receiving a placebo or a different therapy.

There are key design elements that influence the validity and generalizability of these studies:

  • Patient selection: if inclusion and exclusion criteria are highly selective, the results will only be applicable to the population included, ie they cannot be generalized.
  • Allocation of patients to a group (treatment or control): if patients, carers or researchers can influence allocation, bias will result (allocation or selection bias).
  • Randomization of the allocation process: this is used to ensure allocation is not influenced by patients, carers or researchers. Many methods may be used, and in fact a randomization schedule does not have to be completely random, so long as it is not predictable (eg in block randomization). To be effective, participants must be blinded to the upcoming allocation:
  • Allocation concealment: ensures participants are unable to predict the allocation of the next patient into any given group until the patient is enrolled and consented. The ideal (but expensive) method is a telephone randomization hotline. Sealed envelopes are effective, but all must be accounted for at the end of the trial and regularly checked for tampering. Allocation concealment is the key to avoiding bias.

  • Blinding: ensures the measurement of outcome is free from bias (and so avoids a different form of bias from that prevented by allocation concealment). It should be clear exactly who was blinded: patients, carers, those measuring the results (to prevent measurement bias), and those interpreting the results.
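Block randomization, mentioned above, can be sketched as follows: within each block of fixed size, equal numbers of treatment and control allocations are shuffled, so the group sizes stay balanced while the order within each block is random. The block size of 4, the two-arm design and the seed are illustrative assumptions; in a real trial the schedule would be generated and held centrally so that allocation stays concealed.

```python
import random
from typing import Optional

def block_randomization(n_patients: int, block_size: int = 4, seed: Optional[int] = None) -> list:
    """Two-arm permuted-block schedule: 'T' = treatment, 'C' = control."""
    if block_size % 2 != 0:
        raise ValueError("block size must be even so each block is balanced")
    rng = random.Random(seed)  # seeded only so the example is reproducible
    schedule = []
    while len(schedule) < n_patients:
        block = ["T"] * (block_size // 2) + ["C"] * (block_size // 2)
        rng.shuffle(block)  # order within each block is random
        schedule.extend(block)
    return schedule[:n_patients]

# Illustrative schedule for 12 patients in blocks of 4
print("".join(block_randomization(12, block_size=4, seed=2024)))
```

A known weakness is that anyone who knows the block size can predict the last allocation in each block, which is one reason varying block sizes are often used alongside allocation concealment.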

Diagnostic papers

The reference "gold standard" is the criterion by which it is decided that the patient has, or does not have, the disease. Typical reference standards might be:

  • A single diagnostic test that is known to be very accurate, e.g. XR in bony fracture
  • A combination of diagnostic tests that, used appropriately, will reliably rule in and rule out disease, e.g. VQ scanning for pulmonary embolus combined with pulmonary angiography in equivocal cases
  • Diagnostic testing with follow-up for negative cases to identify cases of disease that may have initially been misclassified as disease negative

An ideal reference standard should correctly classify patients with and without disease. However, it should also be safe and simple to apply, because it would be unethical to ask patients to undergo dangerous or complex testing purely for research purposes.

The same reference standard should be applied to all patients, regardless of the results of the diagnostic test under evaluation.

Work-up bias occurs when different reference standards are used depending on the perceived risk of a positive result (for example, depending on the result of the test under evaluation).

Incorporation bias occurs when the diagnostic test under evaluation forms part of the reference standard, eg cardiac markers in myocardial infarction.

Those measuring the reference standard must be blinded to the test under evaluation, and vice-versa.

Study populations should be representative of the population who would receive the test in routine practice. If the population is highly selected then this will bias estimates of sensitivity and specificity.

Evaluations of diagnostic tests should include some assessment of reliability. The most common method for estimating reliability is to measure the Kappa score. This calculates the agreement between observers beyond that expected due to chance. A value of 0 indicates agreement no better than chance and 1 indicates perfect agreement (negative values, meaning less agreement than expected by chance, are possible but uncommon).
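The Kappa calculation can be sketched for two observers who each classify the same patients as positive or negative. The observed agreement is compared with the agreement expected from each observer's overall positive/negative rates alone. The counts below are hypothetical.

```python
def cohens_kappa(both_pos: int, a_pos_b_neg: int, a_neg_b_pos: int, both_neg: int) -> float:
    """Cohen's kappa for two observers (A and B) making a yes/no rating."""
    n = both_pos + a_pos_b_neg + a_neg_b_pos + both_neg
    observed = (both_pos + both_neg) / n
    # Agreement expected by chance, from each observer's marginal proportions
    a_pos = (both_pos + a_pos_b_neg) / n
    b_pos = (both_pos + a_neg_b_pos) / n
    expected = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)
    return (observed - expected) / (1 - expected)

# Hypothetical: two clinicians reading the same 100 radiographs
print(round(cohens_kappa(40, 10, 5, 45), 2))  # 0.7
```

Here the observers agree on 85% of films, but 50% agreement would be expected by chance alone, so kappa is (0.85 - 0.5) / (1 - 0.5) = 0.7.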

Cohort Studies

Used to find out what has happened to patients. A group of individuals is identified and watched to see what happens to them. May have a control/comparison group, but not necessarily.

Essential features - the defining characteristic is the element of time. A set of individuals is identified at one point in time and followed up at a later time to see what has happened to them. The direction of time is always forwards, i.e. if in a study individuals are selected at one point and traced backwards to see how they were at some point in the past, it is not a cohort study.

Complications - some studies identify a set of patients at some point in the past and follow them up to the present � this is a cohort study because time flows forwards from the point at which the patients are identified.

Systematic Reviews

A systematic review is a scientific study. It follows the IMRD approach (introduction, methods, results, and discussion). The conclusion should represent an unbiased synthesis of available data relating to a specific question.

It consists of three stages:

  1. Literature searching and retrieval
  2. Selection of appropriate papers
  3. Quality assessment of selected papers

Ideally done by two independent assessors, blinded to each other's decisions.

Look for:

  • Focused question
  • Methodology described
  • Systematic and comprehensive literature search
  • Primary studies selected according to defined criteria
  • Quality of primary data assessed objectively according to predefined criteria 
  • Synthesis of primary data may be attempted using statistical techniques
  • Potential bias in selection of primary data may be assessed
  • Conclusions result from a scientific study of the available data

Guidelines

  • Was there a conflict of interest?
  • Are the guidelines concerned with an appropriate topic?
  • Do they state clearly the goal of ideal treatment in terms of clinical or economic outcome?
  • Who published? "Experts" in the field? Was there a meta-analyst involved?
  • Systematic review based or "consensus"?
  • Are the guidelines valid and reliable?
  • Are the guidelines clinically relevant, comprehensive and flexible?
  • Do the guidelines take patient acceptability into account?
  • Do the guidelines include suggestions for dissemination, implementation and review?



Approach to Appraisal

First and foremost, HAVE A SYSTEM, then practice it. A common one is known by the acronym:


  • Objectives
  • Methods
  • Results
  • Analysis
  • Discussion/Conclusion

  • Objectives
      • Do the authors have a clearly defined objective for their study?
  • Methods
      • Design
      • Setting
      • Participants (Inclusion / exclusion criteria, age range, underlying disease)
      • Interventions
      • Outcome measures (Primary / Secondary)

Plus, depending on the type of paper:

  • Therapy
      • Sample size
      • Follow up
      • Similar groups
      • Equal Rx
  • Diagnosis
      • Gold standard
      • Comparison with standard (blind & independent)
      • Both tests in all
  • Cohort
      • Follow up
  • Systematic review / Meta-analysis
      • Data source
      • Study selection + inter-rater agreement
      • Clinical question & objectives
      • Methodology of included studies
      • Weighting +/- rejection of poor quality studies
      • Handling of heterogeneity


  • Results
      • Main outcome: estimate / precision
      • Secondary outcomes: estimate / precision
  • Discussion & Conclusion
      • Authors'
        • Bottom line
        • Bias justification / limitation
        • External validation (compare to other studies / clinical practice)
      • Mine
        • Bottom line
        • Bias justification / limitation
        • Applicability (to my practice)
        • Limitation (considering my set-up)

Plus don't forget to look at the references.

The FCEM Examination includes a Critical Appraisal section. See the FCEM guidance page for advice and information about the Critical Appraisal section of the exam.



Null Hypothesis

Most research papers of value will have an objective that states a clear hypothesis (that there is a difference between two or more groups). The opposite of the hypothesis is the null hypothesis (a prediction that there is no difference between the groups).

Testing a hypothesis: the P-value

We start out with the assumption that the null hypothesis is true. We then see what difference, if any, is demonstrated between two groups (eg treatment group & placebo group), and use statistical tests to calculate the probability of observing a difference at least this large if the null hypothesis were true, ie by chance alone. This probability is the P-value.

The smaller the P-value, the less likely it is that a difference this large would arise by chance alone. If this probability is very small, then we can reject the null hypothesis (that there is no difference).

In other words, provided the study is free from bias and confounding, we can be confident the difference is due to the intervention.

The P-value threshold considered significant (eg P<0.05) should be defined at the beginning of a research project. This level is known as alpha.
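As an illustration of where a P-value comes from, the sketch below uses one common test for comparing two proportions: a two-sided z-test using the pooled proportion (a normal approximation). The choice of test and the trial counts are assumptions for illustration; the appropriate test depends on the data and the study design.

```python
from math import erfc, sqrt

def two_proportion_p_value(events_a: int, n_a: int, events_b: int, n_b: int) -> float:
    """Two-sided P-value for a difference in proportions (pooled z-test, normal approximation)."""
    p_a, p_b = events_a / n_a, events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)  # event rate expected if the null hypothesis is true
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    return erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|)), Phi = standard normal CDF

# Hypothetical trial: 30/100 events with placebo vs 15/100 with treatment
alpha = 0.05  # set before the study starts
p = two_proportion_p_value(30, 100, 15, 100)
print(f"P = {p:.3f}:", "reject the null hypothesis" if p < alpha else "do not reject the null hypothesis")
```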

Confidence intervals

Data collected in a research trial provides an estimate of a measurement that we use to answer the research question. Confidence intervals tell us how much uncertainty lies around this estimate.

Most confidence intervals are expressed as 95%: if the study were repeated many times, 95% of the confidence intervals calculated in this way would contain the true value. A narrow confidence interval indicates a more precise estimate.

Confidence intervals provide information about clinical significance, whether or not the result is statistically significant; P-values do not. Confidence intervals can also be used to gauge the likelihood of a type II error: a wide interval that includes a clinically important difference suggests that a real effect may have been missed.
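A 95% confidence interval for a difference between two event rates can be sketched with a simple normal-approximation (Wald) formula. This is only one of several methods and can be inaccurate with small samples or rates near 0 or 1; the counts are hypothetical. The second example keeps the same event rates but uses ten times as many patients, and the interval narrows.

```python
from math import sqrt

def risk_difference_ci(events_a: int, n_a: int, events_b: int, n_b: int, z: float = 1.96):
    """Difference in event rates (a minus b) with an approximate 95% CI (Wald, unpooled SE)."""
    p_a, p_b = events_a / n_a, events_b / n_b
    diff = p_a - p_b
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return diff, diff - z * se, diff + z * se

# Hypothetical: control event rate 30%, treatment event rate 15%
for events_control, events_treated, n in ((30, 15, 100), (300, 150, 1000)):
    diff, low, high = risk_difference_ci(events_control, n, events_treated, n)
    print(f"n={n} per group: difference {diff:.3f}, 95% CI {low:.3f} to {high:.3f}")
```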

Expressing Magnitude of Effect

                        Outcome event    No outcome event
  Experimental group          a                  b
  Control group               c                  d

  • Control Event Rate (CER) = c/(c+d)

  • Experimental Event Rate (EER) = a/(a+b)

Relative Risk (or Risk Ratio)

The ratio of the probability of developing an outcome over a specified time, with the intervention group compared to the control group. RR=EER/CER

Relative Risk Reduction 

The proportion by which an intervention reduces a harmful outcome in comparison to patients not receiving the intervention. RRR = [CER-EER] / CER

Absolute Risk Reduction

The difference in rates of an adverse event between study and control populations. ARR=CER-EER

At a very simplistic level, reporting the RRR can make the treatment sound more impressive than reporting the ARR. Both measures have their uses, but the ARR may be more useful for decision-making in the individual patient, particularly as it is used to calculate the number needed to treat.

Number Needed to Treat (NNT)

Number of patients who need to be treated over a specified period of time to achieve one additional good outcome. The inverse of absolute risk reduction [1/ARR]
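The measures above can be calculated directly from the counts of a two-arm trial, where a and b are the events and non-events in the experimental group and c and d those in the control group (matching EER = a/(a+b) and CER = c/(c+d)). The numbers are hypothetical. The NNT is rounded up to a whole patient, which is the usual convention.

```python
from math import ceil

def effect_measures(a: int, b: int, c: int, d: int) -> dict:
    """a/b = events/non-events with the intervention; c/d = events/non-events in controls."""
    eer = a / (a + b)  # experimental event rate
    cer = c / (c + d)  # control event rate
    arr = cer - eer    # absolute risk reduction
    return {
        "EER": eer,
        "CER": cer,
        "RR": eer / cer,       # relative risk
        "RRR": arr / cer,      # relative risk reduction = (CER - EER) / CER
        "ARR": arr,
        "NNT": ceil(1 / arr),  # number needed to treat, rounded up
    }

# Hypothetical trials with the same relative effect but different baseline risk
for label, counts in (("higher baseline risk", (15, 85, 30, 70)),
                      ("lower baseline risk", (15, 985, 30, 970))):
    measures = effect_measures(*counts)
    print(label, {name: round(value, 3) for name, value in measures.items()})
```

In both hypothetical trials the RRR is 50%, but with the lower baseline risk the ARR falls from 15% to 1.5% and the NNT rises from 7 to 67, which is why the RRR alone can make a treatment sound more impressive.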

Type I error

When you say there is a difference between the two groups when actually there isn't (ie reject the NH when it is true).

The probability of a type I error is alpha, so this becomes more likely if the significance level is set too high (eg P<0.1 rather than P<0.05).

Multiple hypothesis testing, where researchers collect data without any clear objective and then analyse the data to look for statistically significant results, is a common cause of type I errors in poorly planned studies. This can be hard to spot if only the positive results are reported; critics should look for a logical flow from the objectives and methods to identify a clear rationale for doing the test in question.
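The effect of multiple testing can be shown with a short calculation. Assuming k independent tests, each run at alpha = 0.05 when every null hypothesis is in fact true, the chance of at least one "significant" result is 1 - (1 - 0.05)^k. Independence is a simplifying assumption; real outcome measures are often correlated.

```python
def chance_of_false_positive(k_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one type I error across k independent tests, all nulls true."""
    return 1 - (1 - alpha) ** k_tests

for k in (1, 5, 20):
    print(f"{k:>2} tests: {chance_of_false_positive(k):.0%} chance of at least one false positive")
```

With 20 tests the chance of at least one false positive is about 64%.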

Type II error

There is a difference between the two groups but you fail to spot it (ie fail to reject the NH when it is false). Usually because study is underpowered (not enough numbers).

The probability of a false negative result (defined as beta) depends largely on the sample size: the larger the sample size, the smaller beta will be.

If confidence intervals are wide, estimates are imprecise and a false negative result is more likely. Similarly, if the study was designed to detect a large difference when a smaller difference would still have been clinically worthwhile, a real but smaller effect may have been missed (a type II error).


                                        Alternative hypothesis TRUE    Null hypothesis TRUE
  Research shows significant result     True positive                  False positive
  Research shows no significant result  False negative                 True negative


Power

The likelihood of detecting a true difference, ie the probability of rejecting the null hypothesis when it is false.
The power of a study is defined as 1-beta. Conventionally, a study should aim to recruit a sufficient sample size for the power to be 80 or 90%.

Several factors will influence study power:

  • Level at which alpha is set (0.05 by convention)
  • Sample size
  • Variability of the outcome measure (defined by its standard deviation)
  • The minimum clinically significant difference we wish to detect
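These factors come together in a sample size calculation. The sketch below uses one common normal-approximation formula for comparing two proportions; for a binary outcome the variability term comes from the event rates themselves rather than a separate standard deviation. The event rates are hypothetical, and a real trial would normally involve a statistician and allow for drop-outs.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(p_control: float, p_treatment: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate patients per group to detect p_control vs p_treatment (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    variability = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    difference = p_control - p_treatment           # minimum clinically significant difference
    return ceil((z_alpha + z_beta) ** 2 * variability / difference ** 2)

print(n_per_group(0.30, 0.15))             # baseline assumptions
print(n_per_group(0.30, 0.15, power=0.9))  # higher power -> more patients
print(n_per_group(0.30, 0.20))             # smaller difference -> many more patients
```

With these assumed rates the baseline call gives roughly 118 patients per group; raising the power or shrinking the difference to be detected increases the number required.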

Intention to treat analysis

Patients should be analyzed in the group to which they were originally randomized, regardless of whether they actually received the treatment they were allocated to. This ensures that the protection from bias created by allocation concealment is maintained.
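A toy illustration of why this matters: each hypothetical record holds the allocated arm, the treatment actually received and whether the outcome event occurred. Intention to treat groups patients by allocation only; grouping them by treatment received instead ("as treated") can reintroduce bias, because patients who cross over may differ from those who do not.

```python
from collections import defaultdict

# Hypothetical records: (allocated arm, treatment actually received, had outcome event)
patients = [
    ("treatment", "treatment", False),
    ("treatment", "control", True),   # crossed over; still 'treatment' under intention to treat
    ("treatment", "treatment", False),
    ("control", "control", True),
    ("control", "control", False),
    ("control", "treatment", False),  # crossed over
]

def event_rates(records, by_index: int) -> dict:
    """Event rate per group: field 0 = allocated arm (ITT), field 1 = received (as treated)."""
    events, totals = defaultdict(int), defaultdict(int)
    for record in records:
        group = record[by_index]
        totals[group] += 1
        events[group] += record[2]  # True counts as 1
    return {group: events[group] / totals[group] for group in totals}

print("Intention to treat:", event_rates(patients, by_index=0))
print("As treated:        ", event_rates(patients, by_index=1))
```

In this made-up example the event rates are equal under intention to treat, but the as-treated analysis makes the treatment look better, because the patient who crossed over to control had the event.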

The Hawthorne Effect

When examining changes within an organization, studies that simply measure outcomes before and after an intervention, and then conclude that intervention caused the change in outcome may be subject to confounding by the Hawthorne Effect. Based on experiments undertaken at the Hawthorne works of the Western Electric Company in Chicago, this describes the observation that people change their behaviour when they think that you are watching them. Therefore any intervention, if subsequently monitored, will produce a recordable change in processes or outcomes, which is lost when monitoring ceases.

Essential definitions

Accuracy

  • % of all results that are true positives or true negatives

Hypothesis

  • A prediction

Null Hypothesis

  • A prediction that there is no difference between two groups

Validity

  • Is the finding true (ie can we trust the results, have they measured what they are supposed to)?

Reliability

  • Gets the same results every time = reproducibility.

Generalizability

  • Is the finding applicable elsewhere?

Bias

  • Results are affected by systematic error
    Bias leads to inaccurate estimates. Accuracy can only be determined by examining the methods of a study and deciding if they have led to bias.

Chance

  • Results affected by random error
    P values tell us how likely this is
    Chance leads to imprecise estimates
    Confidence intervals give us an indication of the precision of an estimate

Confounding

  • Results have been misinterpreted (ie part of the observed relationship between two variables is due to the action of a third). A false conclusion is drawn. Known confounders can be accounted for in the analysis; unknown confounders cannot.

Efficacy

  • Whether a treatment can work under ideal conditions

Effectiveness

  • Whether a treatment does work under normal conditions

Presenting results:

Case positive

  • An individual with the disease in question, i.e. the gold standard is positive.

Case negative

  • An individual without the disease in question, i.e. the gold standard is negative.

Test positive

  • An individual with a positive result for the diagnostic test under investigation.

Test negative

  • An individual with a negative result for the diagnostic test under investigation.

True positives

  • Diseased individuals who test positive

False positive

  • Disease free but test positive

True negative

  • Disease free and test negative

False negative

  • Diseased individuals who test negative


                  Case Positive    Case Negative
  Test Positive        A                B
  Test Negative        C                D

Sensitivity = A/(A+C)

  • The proportion of people who have the disease who test positive for the disease
    If a test/sign has a high sensitivity, a negative result can help rule out the diagnosis (SNout).

Specificity = D/(B+D)

  • The proportion of disease free people who test negative for the disease
    If a test/sign has a high specificity, a positive result can help rule in the diagnosis (SPin)
    Sensitivity and specificity are constant when the prevalence varies

Positive predictive value = A/(A+B)

  • The probability that a patient has the condition if test is positive
    PPV increases with increasing prevalence

Negative predictive value = D/(C+D)

  • The probability that a patient doesn't have the condition if the test is negative
    NPV decreases with increasing prevalence

Likelihood ratio for a positive test

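  • How much more likely a positive result is to be found in a person with, as opposed to without, the condition.
    LR+ = sensitivity / (1 - specificity)

Likelihood ratio for a negative test

  • How likely a negative result is to be found in a person with, as opposed to without, the condition.
    LR- = (1 - sensitivity) / specificity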
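The formulas above can be computed from the 2x2 counts, taking A = true positives, B = false positives, C = false negatives and D = true negatives (consistent with sensitivity = A/(A+C) and specificity = D/(B+D)). The counts are hypothetical. The second part holds sensitivity and specificity fixed and changes the prevalence, showing PPV rising and NPV falling as prevalence increases.

```python
def diagnostic_summary(A: int, B: int, C: int, D: int) -> dict:
    """A = true positives, B = false positives, C = false negatives, D = true negatives."""
    sens = A / (A + C)
    spec = D / (B + D)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "PPV": A / (A + B),
        "NPV": D / (C + D),
        "LR+": sens / (1 - spec),
        "LR-": (1 - sens) / spec,
        "prevalence": (A + C) / (A + B + C + D),
    }

def predictive_values(sens: float, spec: float, prevalence: float) -> tuple:
    """PPV and NPV at a given prevalence, with sensitivity and specificity held constant."""
    true_pos = sens * prevalence
    false_pos = (1 - spec) * (1 - prevalence)
    true_neg = spec * (1 - prevalence)
    false_neg = (1 - sens) * prevalence
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)

# Hypothetical study: 100 patients with the disease, 200 without
summary = diagnostic_summary(A=90, B=20, C=10, D=180)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")

for prevalence in (0.01, 0.10, 0.50):
    ppv, npv = predictive_values(summary["sensitivity"], summary["specificity"], prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.2f}, NPV {npv:.2f}")
```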

Likelihood ratio: value of additional information

  • 1: None at all
  • 0.5 to 2: Little clinical significance
  • 2 to 5: Moderately increases likelihood of disease. Useful additional information, but not diagnostic.
  • 0.2 to 0.5: Moderately decreases likelihood of disease. Useful additional information, but not rule-out.
  • 5 to 10: Markedly increases likelihood of disease. May be diagnostic if other information is supportive.
  • 0.1 to 0.2: Markedly decreases likelihood of disease. May rule out if other information is supportive.
  • > 10: Diagnostic. If this does not convince you that the patient has the disease then you probably shouldn't have done the test.
  • < 0.1: Rules out disease.
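Likelihood ratios are applied to a pre-test probability by converting it to odds: post-test odds = pre-test odds x LR, and the post-test odds are then converted back to a probability. The sketch below shows this standard calculation; the pre-test probability and the LR values are hypothetical.

```python
def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Apply a likelihood ratio to a pre-test probability via odds."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

pre_test = 0.20  # hypothetical clinical estimate of 20%
for lr in (0.1, 0.5, 1, 2, 10):
    print(f"LR {lr:>4}: post-test probability {post_test_probability(pre_test, lr):.0%}")
```

With a pre-test probability of 20%, an LR of 10 gives a post-test probability of about 71% and an LR of 0.1 about 2%, while an LR of 1 leaves it unchanged.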

Relative Risk

  • The risk of an event (eg death) after the experimental treatment/procedure as a percentage of the original (standard) risk

Power of a Study

The likelihood of detecting a true difference. Usually 80-90%.
It is also the probability of rejecting the null hypothesis when it is false.


Prevalence

  • The proportion of the population with the condition of interest.
    Prevalence = (a+c) / (a+b+c+d)

Type I error

  • When you say there is a difference between the two groups when actually there isn't, ie reject the NH when it is in fact true

Type II error

  • There is a difference between the two groups but you fail to spot it ie wrongly fail to reject the NH.

Resources available


  • Emergency Medicine Manual: good for definitions, thumbnail versions of terms, stats etc
  • How to read a paper (Greenhalgh): comprehensive & very good; recommended for dipping into certain chapters, but you don't need to read, know or understand it all.
  • Pocket guide to critical appraisal (Crombie): worth buying; easy reading with small chapters and big headings.



JAMA 1994;271 had a series of articles called Users' Guides to the Medical Literature that are OK but a bit wordy and detailed. They are probably worth looking at in the library to see if they suit your style of revision/learning.

Basic statistics for clinicians Can Med Assoc J 1995 152

  • A series of four small well written papers:
      • Hypothesis testing I
      • Interpreting study results: confidence intervals
      • Assessing the effects of treatment; measures of association
      • Correlation and regression
  • Sounds heavy reading but actually isn't, and is very understandable and logical.





The Owner(s) of this section : Mr Daniel P STRONG
2024 Copyright