The following guidance is by no means comprehensive or prescriptive,
but reflects some of the experiences and research of trainees. Its content
is not guaranteed by the College of Emergency Medicine, BAEM or EMTA. The
information on critical appraisal is derived from courses and the resources
referenced here -
they come highly recommended. In particular the Centre for Evidence Based Medicine is worth bookmarking.

The College of Emergency Medicine have made available the slides and handout from the Critical Appraisal Course organized by Professor Steve Goodacre. The handout - CRITICAL APPRAISAL FOR EMERGENCY MEDICINE TRAINEES - is a huge and valuable resource of exceptional quality and readily available. It is essential reading during CTR and exam preparation.

Most training
regions now require each trainee to write one CTR/year. The first is always
the hardest, but by the time your examination comes around, you should have
the choice of 3-4 to present. If you want to read another trainee's CTR, or
have finished your CTR & exam and would like to share it with others, please
visit the Shared
Resources section of this website.

KEEP IT SIMPLE!
- Choose a well defined topic, preferably one that does not bring >100
references on your literature search. Aim for 20-30 references. Don't forget
that you'll need to critically appraise each of these, and may be quizzed
on your sources during your exam. At least one high profile Emergency Physician
got a serious grilling about referenced papers during a 2005 sitting! Aim
to be an expert in a small defined area in which you have an interest.

Critical appraisal

The process of
evaluating the quality of published evidence.

It allows health professionals to assess the accuracy of diagnostic tests
and efficacy of therapies and preventions, and thereby improve and optimize
their clinical practice.

1. Write (and later refine) inclusion and exclusion criteria for papers

2. Perform the search

Either:

3. Discover the topic is too broad / too many available articles -> return to step 1

4. Narrow the search down to identify 10-40 papers -> go to step 5

5. Summarize the papers

6. Assess their methodological quality

7. Discuss the findings

8. Consider further original research to explore the topic

Once complete:

Show your CTR to as many people as possible. Everyone will
have a different angle on things

Get into the habit of talking about your CTR. Arrange plenty
of mock vivas

Form your opinions on the implications in the real world
of your CTR

Regularly recheck references and appraise new developments,
and again immediately before your FCEM examination

Recommended texts
for guidance include:

Egger, M., Smith, G. D. - Systematic reviews in health care: meta-analysis in context (London: BMJ) - an excellent book which gives a step by step guide to doing a systematic review.

Guyatt, G., Rennie, D. - Users' guides to the medical literature: a manual for evidence-based clinical practice (Chicago: American Medical Association)

If the question you wish to ask is well structured, finding
the relevant papers will become much easier. The BESTBETS website
is a great source of three part questions that demonstrate this.

A four part approach can also be employed:

1. The Patient / Problem

Eg "In patients with suspected glass foreign body in foot..."

2. The Intervention (treatment / investigation / process)

"...does emergency department USS..."

3. The Comparison / Alternative (optional, if relevant)

"...compared to plain XR..."

4. The Outcome

"...reliably exclude presence of FB?"

This should help you with your search for evidence, as you can be more logical in your approach.

A literature search may include:

Electronic databases: Medline, Embase, Cinahl, Cochrane database
etc.

Hand search of journals

Grey literature: reports (government or academic), conference
proceedings, internet, libraries, professional societies, Kings
Fund, Nuffield etc.

Research registers: National Research Register, HTA database,
Cochrane

Usually using OVID software in your hospital library, via the BMA
library website (members only), or via Athens (available
to all NHS employees). Alternatively PUBMED is
available free on the internet. See the EBM Resources page in this
website via the button above.

All use Boolean logic (and / or / not) combined with medical subject
heading (MeSH) terms to narrow down a search and so find the most relevant
papers.
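The effect of these operators can be pictured with simple set operations. Below is a toy sketch using made-up article IDs (not real database output) to show how AND narrows a search while OR broadens it:

```python
# Hypothetical article-ID sets returned by two single-term searches
ultrasound = {101, 102, 103, 104}
foreign_body = {103, 104, 105}

# AND narrows the search: only papers indexed under both terms
both = ultrasound & foreign_body

# OR broadens it: papers indexed under either term
either = ultrasound | foreign_body

# NOT excludes a term: ultrasound papers not indexed under foreign bodies
excluded = ultrasound - foreign_body

print(both)      # {103, 104}
print(either)    # {101, 102, 103, 104, 105}
print(excluded)  # {101, 102}
```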

MeSH terms are organized into subjects and subdivisions of each, branching out like tree roots from very broad terms/categories until you get to very specific ones. Thus if you wish to search for an article about something specific (ie that is the subject of the article, rather than mentioned in passing in an article about something else) you can FOCUS the search to these articles.

However, if you wish to search for articles that may be included in subdivisions of a MeSH heading (eg considering femoral fractures, and wishing to include femoral neck fractures) then you may wish to EXPLODE the search terms - this will include any articles found under the search term plus all its subdivisions.

The $ symbol is used as a truncation - ie if you search for ultraso$ you will get results for ultrasound, ultrasonography, ultrasonographer, etc.

If you know the word you're searching is a MeSH term, you can add a "forward slash": /
This can shortcut the mapping page in OVID and save time.

Searching for your CTR evidence:

When performing a literature search for your CTR or similar project:

Use as broad a net as possible.

The problem with using Medline abbreviations (eg "/us") is that you are relying on someone else having already spotted that the paper is about your chosen term (ultrasound).

Forget complex terminology and abbreviations and use only the prefix exp and the suffix .mp

All terms should be searched for using both forms.

For example: "exp ultrasonography or ultraso$.mp"

The aim of the search is not to pull up a small number of papers (this would suggest a potential error in searching) but to find any possibly relevant papers, then search through them by eye, ideally with two people working independently.

If the search brings up 20 000 papers, then make a more specific question and start from the beginning again.

It would be wrong to change your inclusion/exclusion criteria after searching as these must be clearly defined before starting.

Get someone experienced (and ideally published) in literature reviews to check the search strategy BEFORE performing the search. This will save you hours, days and weeks of re-searching when someone else points out the glaring omissions.

Quick Searches

If you want to do a quick search to answer a specific question, you can use shortcuts to narrow things down quickly. This would not be wise when searching for CTR evidence - many relevant articles may be missed - use the strategy above instead.

There are a lot of ways of using notation to specify a search - this can speed things up once you get used to it:

Prefixes:

*    = FOCUS
exp  = EXPLODE
sh   = will find all the subheadings for a MeSH term

Suffixes:

.au = author
.jn = journal
.me = a single word, including anywhere it appears as a MeSH term
.ab = word in abstract
.ti = word in title
.tw = word in title or abstract (textword)
.yr = year of publication
.pt = publication type (eg editorial, review article)
.mp = title, original title, abstract, name of substance word, subject heading word

MeSH Subheadings:

/ae = adverse effects
/cn = congenital
/co = complications
/ct = contraindications
/de = drug effects
/di = diagnosis
/dt = drug therapy
/ed = education
/ep = epidemiology
/et = etiology
/nu = nursing
/pc = prevention & control
/ra = radiography
/th = therapy (non-pharmacological)
/tu = therapeutic use
/us = ultrasonography

Then the search can be LIMITED to eliminate irrelevant or unusable articles. Examples of limit set options are:

Human

Male

Review article

Abstracts

English language

Publication year

AIM journals (Abridged Index Medicus - the "mainstream" medical journals)

The above four part question example could therefore be searched as:

foreign bodies.mp - 21340 results

*foreign bodies - 13623 results

foreign bodies/us - 258 results

foreign bodies/ra - 3679 results

foreign bodies/ra and foreign bodies/us - 54 results
- this appears to have sufficiently narrowed the search
- however the results look at either radiography or ultrasonography of foreign bodies
- not a comparison of the two modalities.

A better method would be:

*foreign bodies/ and exp radiography/ - 1670 results

*foreign bodies/ and exp ultrasonography/ - 205 results

exp foreign bodies/ and exp radiography/ and exp ultrasonography/
- 136 results

*foreign bodies/ and exp radiography/ and exp ultrasonography/ -
50 results

Of which five appear to address our question:

1. Mizel MS, Steinmetz ND, Trepman E. Detection of wooden foreign bodies in muscle tissue: experimental comparison of computed tomography, magnetic resonance imaging, and ultrasonography. Foot & Ankle International. 15(8):437-43, 1994 Aug.

2. Donaldson JS. Radiographic imaging of foreign bodies in the hand. [Review] Hand Clinics. 7(1):125-34, 1991 Feb.

3. Ginsburg MJ, Ellis GL, Flom LL. Detection of soft-tissue foreign bodies by plain radiography, xerography, computed tomography, and ultrasonography. Annals of Emergency Medicine. 19(6):701-3, 1990 Jun.

4. Torfing KF, Teisen HG, Skjodt T. Computed tomography, ultrasonography and plain radiography in the detection of foreign bodies in pork muscle tissue. Rofo: Fortschritte auf dem Gebiete der Rontgenstrahlen und der Nuklearmedizin. 149(1):60-2, 1988 Jul.

5. De Flaviis L, Scaglione P, Del Bo P, Nessi R. Detection of foreign bodies in soft tissues: experimental comparison of ultrasonography and xeroradiography. Journal of Trauma-Injury Infection & Critical Care. 28(3):400-4, 1988 Mar.

This will be difficult to grasp at first, but with practice it should
get easier to quickly find the answers to any question you ask.

Most hospital libraries offer training to improve your evidence searching
skills, which is usually free and very helpful.

Pragmatic research

Whether a treatment works or how useful a test is in routine practice.

Unselected populations, randomised

Simple protocols, allowing leeway for physician judgement

Measures outcomes relevant to patients - eg mortality, quality of life, length of stay

Explanatory research

How or why a treatment works, whether it works under specific (usually
ideal) circumstances.

Specific staff / settings

Selected populations

May measure clinical / physiological outcomes, such as BP or arterial
blood gas measurements

Methods may interfere with clinical care

May produce care that is highly structured and protocol-driven.

Evaluating A Therapy

Generally involves comparing a group of patients receiving the therapy
with a comparable group receiving a placebo or a different therapy.

There are key design elements that influence the validity and generalizability
of these studies:

Patient selection - if inclusion and exclusion criteria are highly selective, the results will only be applicable to the population included, ie will not be generalizable

Allocation of patients to a group (treatment or control) - if patients,
carers or researchers can influence allocation, bias will result (allocation
or selection bias)

Randomization of the allocation process - this is used to ensure allocation is not influenced by patients, carers or researchers. Many methods may be used, and in fact a randomization schedule does not have to be completely random, so long as it is not predictable (eg in block randomization). To be effective, participants must be blinded to allocation:

Allocation Concealment - ensures participants are unable to predict the allocation of the next patient into any given group until the patient is enrolled and consented. The ideal (but expensive) method is the telephone randomization hotline. Sealed envelopes are effective but all must be accounted for at the end of the trial and regularly checked for tampering. Allocation concealment is the key to avoiding bias.

Blinding - ensures the measurement of outcome is free from bias (hence avoids a different form of bias from allocation concealment). It should be clear exactly who was blinded - patients, carers, those measuring the results (to prevent measurement bias), and those interpreting the results.

Diagnostic papers

The reference "Gold Standard" is the criterion by which it is decided that the patient has, or does not have, the disease. Typical reference standards might be:

A single diagnostic test that is known to be very accurate, e.g.
XR in bony fracture

A combination of diagnostic tests that, used appropriately, will reliably rule-in and rule-out disease, e.g. VQ scanning for pulmonary embolus combined with pulmonary angiography in equivocal cases

Diagnostic testing with follow-up for negative cases to identify
cases of disease that may have initially been misclassified as disease
negative

An ideal reference standard should correctly classify patients with
and without disease. However, it should also be safe and simple to apply,
because it would be unethical to ask patients to undergo dangerous or
complex testing purely for research purposes.

The same reference standard should be applied to all patients, regardless
of the results of the diagnostic test under evaluation.

Work-up bias occurs when different reference standards are used
depending on the perceived risk of a positive test

Incorporation bias occurs when the diagnostic test under evaluation
forms part of the reference standard, eg cardiac markers in myocardial
infarction.

Those measuring the reference standard must be blinded to the test under
evaluation, and vice-versa.

Study populations should be representative of the population who would
receive the test in routine practice. If the population is highly selected
then this will bias estimates of sensitivity and specificity.

Evaluations of diagnostic tests should include some assessment of reliability. The most common method for estimating reliability is to measure the Kappa score. This calculates the agreement between observers beyond that expected due to chance. Values range from 0 (agreement no better than chance) to 1 (perfect agreement); negative values indicate agreement worse than chance.
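As a sketch of what a kappa calculation involves, the function below computes Cohen's kappa from two observers' ratings. The ratings are illustrative, not data from any real study:

```python
def cohens_kappa(rater1, rater2):
    """Agreement between two raters beyond that expected by chance."""
    assert len(rater1) == len(rater2)
    n = len(rater1)
    categories = set(rater1) | set(rater2)
    # Observed agreement: proportion of cases where the raters agree
    p_observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Expected chance agreement, from each rater's marginal rates
    p_expected = sum(
        (rater1.count(c) / n) * (rater2.count(c) / n) for c in categories
    )
    return (p_observed - p_expected) / (1 - p_expected)

# Illustrative ratings: 1 = test positive, 0 = test negative
obs1 = [1, 1, 0, 0, 1, 0, 1, 0, 1, 1]
obs2 = [1, 1, 0, 0, 1, 0, 0, 0, 1, 1]
print(round(cohens_kappa(obs1, obs2), 2))  # 0.8 - very good agreement
```

Note that raw percentage agreement here is 90%, but kappa is lower (0.8) because some of that agreement would be expected by chance alone.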

Cohort Studies

Used to find out what has happened to patients. A group of individuals
is identified and watched to see what happens to them. May have a control/comparison
group, but not necessarily.

Essential features - the defining characteristic is the element of time. A set of individuals is identified at one point in time and followed up at a later time to see what has happened to them. The direction of time is always forwards - i.e. if in a study individuals are selected at one point and traced backwards to see how they were at some point in the past, it is not a cohort study.

Complications - some studies identify a set of patients at some point in the past and follow them up to the present - this is still a cohort study because time flows forwards from the point at which the patients are identified.

Systematic Reviews

A systematic review is a scientific study. It follows the IMRD approach
(introduction, methods, results, and discussion). The conclusion should
represent an unbiased synthesis of available data relating to a specific
question.

It consists of three stages:

Literature searching and retrieval

Selection of appropriate papers

Quality assessment of selected papers

Ideally done by two independent assessors, blinded to each other's decisions.

Look for:

Focused question

Methodology described

Systematic and comprehensive literature search

Primary studies selected according to defined criteria

Quality of primary data assessed objectively according to predefined
criteria

Synthesis of primary data may be attempted using statistical techniques

Potential bias in selection of primary data may be assessed

Conclusions result from a scientific study of the available data

Guidelines

Was there a conflict of interest?

Are the guidelines concerned with an appropriate topic?

Do they state clearly the goal of ideal treatment in terms of clinical
or economic outcome?

Who published? "Experts" in the field? Was there a meta-analyst
involved?

Systematic review based or "consensus"?

Are the guidelines valid and reliable?

Are the guidelines clinically relevant, comprehensive and flexible?

Do the guidelines take patient acceptability into account?

Do the guidelines include suggestions for dissemination, implementation
and review?

First and foremost, HAVE A SYSTEM, then practice it. A common one
is known by the acronym:

OMRAD

Objectives

Methods

Results

Analysis

Discussion/Conclusion

Objectives

Do the authors have a clearly defined objective for their
study?

Methods

Design

Setting

Participants (Inclusion / exclusion criteria, age range,
underlying disease)

Interventions

Outcome measures (Primary / Secondary)

Plus, depending on the type of paper:

Therapeutic:

Ethics

Sample size

Randomization

Allocation

Blinding

Follow up

Similar groups

Equal Rx

Diagnostic:

Ethics

Gold Standard

Comparison with standard (blind & independent)

Both tests in all

Reproducible

Follow up

Systematic review / Meta-analysis:

Data source

Study selection + inter-rater observation

Clinical question & objectives

Methodology of included studies

Weighting +/- rejection of poor quality studies

Handling of heterogeneity

Results

Main outcome - estimate / precision

Secondary outcomes - estimate / precision

Discussion & Conclusion

Authors'

Bottom line

Bias justification / limitation

External validation (compare to other studies / clinical
practice)

Mine

Bottom line

Bias justification / limitation

Applicability (to my practice)

Limitation (considering my set-up)

Plus don't forget to look at the References.

The FCEM Examination includes a Critical Appraisal section. See the FCEM guidance page for advice and information about the Critical Appraisal section of the exam.

Most research papers of value will have an objective that states a clear
hypothesis (that there is a difference between two or more groups). The
opposite of the hypothesis is the null hypothesis (a prediction that
there is no difference between two groups)

Testing a hypothesis: the P-value

We start out with the assumption that the null hypothesis is true.
We then see what if any difference is demonstrated between two groups
(eg treatment group & placebo group), and use statistical tests
to calculate the probability that this difference could have arisen
by chance. This probability is the P-value.

The smaller the P-value, the smaller the probability is that the difference
arose by chance. If this probability is very small, then we can reject
the null hypothesis (that there is no difference).

In other words we can be confident the difference is due to the intervention.

The P-values considered significant (eg P<0.05) should be defined
at the beginning of a research project. This level is known as alpha.
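As an illustration of how such a p-value might be computed, here is a simple two-proportion z-test on made-up trial numbers (a normal approximation, suitable for large samples; it is one of several valid tests for comparing two event rates):

```python
import math

def two_proportion_p_value(events_a, n_a, events_b, n_b):
    """Two-sided p-value for a difference between two proportions
    (normal approximation; reasonable for large samples)."""
    p_a, p_b = events_a / n_a, events_b / n_b
    p_pool = (events_a + events_b) / (n_a + n_b)  # pooled rate under the null
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Illustrative trial: 30/100 deaths on treatment vs 45/100 on placebo
p = two_proportion_p_value(30, 100, 45, 100)
print(round(p, 3))  # below the conventional alpha of 0.05
```

Since p < 0.05 here, we would reject the null hypothesis at the conventional alpha level: a difference this large would arise by chance less than 5% of the time if the treatments were truly equivalent.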

Confidence intervals

Data collected in a research trial provides an estimate of a measurement
that we use to answer the research question. Confidence intervals tell
us how much uncertainty lies around this estimate.

Most confidence intervals are expressed as 95% - that the true value
has a 95% probability of lying within the confidence interval. If the
confidence interval is narrow, the estimate is more precise.

Confidence intervals provide information about clinical significance,
whether the result is statistically significant or not. P-values do not.
Confidence intervals can also be used to estimate the likelihood of a
type II error.
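To illustrate how sample size drives precision, a simple (Wald) 95% confidence interval for a proportion can be sketched as follows. The numbers are made up: the same 30% event rate estimated from 20 patients versus 200 patients gives very different intervals:

```python
import math

def proportion_ci_95(events, n):
    """Approximate 95% confidence interval for a proportion (Wald method)."""
    p = events / n
    se = math.sqrt(p * (1 - p) / n)  # standard error of the estimate
    margin = 1.96 * se               # 1.96 = z-value for 95% confidence
    return p - margin, p + margin

# Same 30% event rate, different sample sizes:
print(proportion_ci_95(6, 20))    # wide interval - imprecise estimate
print(proportion_ci_95(60, 200))  # narrower interval - more precise
```

The point estimate is identical in both cases; only the uncertainty around it shrinks as the sample grows.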

Expressing Magnitude of Effect

                 Outcome: Yes    Outcome: No

Intervention          a               b

Control               c               d

Control Event Rate (CER) = c/(c+d)

Experimental Event Rate (EER) = a/(a+b)

Relative Risk (or Risk Ratio)

The ratio of the probability of developing an outcome over a specified time in the intervention group compared to the control group. RR=EER/CER

Relative Risk Reduction

The proportion that an intervention reduces a harmful outcome in comparison
to patients not receiving the intervention. RRR = [CER-EER] / CER

Absolute Risk Reduction

The difference in rates of an adverse event between study and control
populations. ARR=CER-EER

At a very simplistic level, reporting the RRR can make the treatment sound more impressive than reporting the ARR. Both measures have their uses, but the ARR may be more useful for decision-making in the individual patient, particularly as it is used to calculate the number needed to treat.

Number Needed to Treat (NNT)

Number of patients who need to be treated over a specified period of
time to achieve one additional good outcome. The inverse of absolute
risk reduction [1/ARR]
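All four measures follow directly from the 2x2 table above. In this made-up example the same trial yields an impressive-sounding RRR of 50% but an ARR of only 0.1, illustrating why the two can give such different impressions:

```python
def effect_measures(a, b, c, d):
    """Effect sizes from the 2x2 table above:
    a/b = events/non-events on intervention, c/d = same for control."""
    eer = a / (a + b)        # experimental event rate
    cer = c / (c + d)        # control event rate
    rr = eer / cer           # relative risk
    rrr = (cer - eer) / cer  # relative risk reduction
    arr = cer - eer          # absolute risk reduction
    nnt = 1 / arr            # number needed to treat
    return rr, rrr, arr, nnt

# Illustrative trial: 10/100 deaths on treatment vs 20/100 on control
rr, rrr, arr, nnt = effect_measures(10, 90, 20, 80)
print(rr, rrr, arr, nnt)  # RR 0.5, RRR 50%, ARR ~0.1, NNT ~10
```

So "halves the death rate" (RRR 50%) and "10 patients must be treated to prevent one death" (NNT 10) describe exactly the same result.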

Type I error

When you say there is a difference between the two groups when actually there isn't (ie reject the NH when it is true).

This is more likely when the significance level (alpha) is set too high.

Multiple hypothesis testing, where researchers collect data without any clear objective and then analyse the data to look for statistically significant results, is a common cause of type I errors in poorly planned studies. This can be hard to spot if only the positive results are reported - critics should look for a logical flow from the objectives and methods to identify a clear rationale for doing the test in question.

Type II error

There is a difference between the two groups but you fail to spot it (ie fail to reject the NH when it is false). Usually because the study is underpowered (not enough numbers).

The probability of a false negative result (defined as beta) is determined
by the sample size. The larger the sample size, the smaller beta will
be.

If confidence intervals are wide, estimates are imprecise and false
negative result more likely. If the minimum clinically significant difference
considered could in fact have been made smaller (and still worthwhile
detecting), a type II error is possible.

                                        Alternative hypothesis TRUE       Null hypothesis TRUE

Research shows a significant result     True Positive                     False Positive (TYPE I ERROR)

Research shows no significant result    False Negative (TYPE II ERROR)    True Negative

Power

The likelihood of detecting a true difference. It is also the probability of rejecting the null hypothesis when it is false.
The power of a study is defined as 1-beta. Conventionally, a study should aim to recruit a sufficient sample size for the power to be 80 or 90%.

Several factors will influence study power:

Level at which alpha is set - 0.05 by convention

Sample size

Variability of the outcome measure (defined by its standard deviation)

The minimum clinically significant difference we wish to detect
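As a rough sketch of how these factors combine, the standard normal-approximation formula for comparing two proportions is shown below, with z-values hardcoded for the usual alpha and power choices. The mortality figures are made up; note how halving the detectable difference multiplies the required sample size several-fold:

```python
import math

def sample_size_per_group(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate sample size per group for comparing two proportions
    (normal approximation; z-values hardcoded for common choices)."""
    z_alpha = {0.05: 1.96, 0.01: 2.576}[alpha]  # two-sided significance level
    z_beta = {0.80: 0.842, 0.90: 1.282}[power]
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    delta = p_control - p_treatment  # minimum difference we wish to detect
    return math.ceil((z_alpha + z_beta) ** 2 * variance / delta ** 2)

# To detect a drop in mortality from 20% to 10% with 80% power:
print(sample_size_per_group(0.20, 0.10))
# A smaller difference (20% to 15%) needs a much larger sample:
print(sample_size_per_group(0.20, 0.15))
```

This is why trials chasing small but clinically worthwhile differences need very large recruitment, and why small studies so often commit type II errors.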

Intention to treat analysis

Patients should be analyzed in the group to which they were originally
randomized, regardless of whether they actually received the treatment
they were allocated to. This ensures that the protection from bias created
by allocation concealment is maintained.

The Hawthorne Effect

When examining changes within an organization, studies that simply measure
outcomes before and after an intervention, and then conclude that intervention
caused the change in outcome may be subject to confounding by the Hawthorne
Effect. Based on experiments undertaken at the Hawthorne works of the
Western Electric Company in Chicago, this describes the observation that
people change their behaviour when they think that you are watching them.
Therefore any intervention, if subsequently monitored, will produce a
recordable change in processes or outcomes, which is lost when monitoring
ceases.

Accuracy

The percentage of all results that are true (true positives plus true negatives, out of all results)

Hypothesis

A prediction

Null Hypothesis

A prediction that there is no difference between two groups

Validity

Is the finding true? (ie can we trust the results; have they measured what they are supposed to?)

Reliability

Gets same results every time = reproducibility.

Generalisability

Is the finding applicable elsewhere?

Bias

Results are affected by systematic error
Bias leads to inaccurate estimates. Accuracy can only be determined
by examining the methods of a study and deciding if they have led
to bias.

Chance

Results affected by random error
P values tell us how likely this is
Chance leads to imprecise estimates
Confidence intervals give us an indication of the precision of
an estimate

Confounding

Results have been misinterpreted (ie part of the observed relationship
between two variables is due to action of a third.) A false conclusion
is drawn. Known confounders can be accounted for in the analysis,
unknown confounders cannot.

Efficacy

Whether a treatment can work under ideal conditions

Effectiveness

Whether a treatment does work under normal conditions

Presenting results:

Case positive

An individual with the disease in question, i.e. the gold standard
is positive.

Case negative

An individual without the disease in question, i.e. the gold standard
is negative.

Test positive

An individual with a positive result for the diagnostic test under
investigation.

Test negative

An individual with a negative result for the diagnostic test under
investigation.

True positives

Diseased individuals who test positive

False positive

Disease free but test positive

True negative

Disease free and test negative

False negative

Diseased individuals who test negative

                 Case Positive    Case Negative

Test Positive         A                 B

Test Negative         C                 D

Sensitivity = A/(A+C)

The proportion of people who have the disease who test positive
for the disease
If a test/sign has a high sensitivity, a negative result can help
rule out the diagnosis (SNout).

Specificity = D/(B+D)

The proportion of disease free people who test negative for the
disease
If a test/sign has a high specificity, a positive result can help
rule in the diagnosis (SPin)
Sensitivity and specificity are constant when the prevalence varies

Positive predictive value = A/(A+B)

The probability that a patient has the condition if test is positive
PPV increases with increasing prevalence

Negative predictive value = D/(C+D)

The probability that a patient doesn't have the condition if the test is negative
NPV decreases with increasing prevalence

Likelihood ratio for a positive test

How much more likely is a positive result to be found in a person
with as opposed to without the condition.
Sensitivity
/ (1 - specificity)

Likelihood ratio for a negative test

(1-sensitivity) / specificity

Likelihood ratio    Value of additional information

1                   None at all

0.5 - 2             Little clinical significance

2 - 5               Moderately increases likelihood of disease. Useful additional information, but not diagnostic.

0.2 - 0.5           Moderately decreases likelihood of disease. Useful additional information, but not rule-out.

5 - 10              Markedly increases likelihood of disease. May be diagnostic if other information is supportive.

0.1 - 0.2           Markedly decreases likelihood of disease. May rule-out if other information is supportive.

> 10                Diagnostic. If this does not convince you that the patient has the disease then you probably shouldn't have done the test.

< 0.1               Rules out disease.

Relative Risk

The risk of an event (eg death) after the experimental treatment/procedure as a proportion of the original (standard) risk

Power of a Study

The likelihood of detecting a true difference. Usually 80-90%.
It is also the probability of rejecting the null hypothesis when it is false.

Prevalence

The proportion of the population with the condition of interest.
Prevalence = (a+c) / (a+b+c+d)

Type I error

When you say there is a difference between the two groups when actually there isn't, ie reject the NH when it is in fact true.

Type II error

There is a difference between the two groups but you fail to spot it, ie wrongly fail to reject the NH.

Emergency Medicine Manual - good for definitions, thumbnail
versions of terms, stats etc

How to read a paper (Greenhalgh) - Comprehensive & very good, recommended for dipping into certain chapters but you don't need to read or know or understand it all.

Pocket guide to critical appraisal (Crombie) - worth buying, easy reading with small chapters and big headings.

JAMA 1994; 271 had a series of articles called "Users' Guides to the Medical Literature" that are OK but a bit wordy and detailed. Probably worth looking at in the library to see if they suit your style of revision/learning.

Basic statistics for clinicians Can Med Assoc J 1995 152

A series of four small well written papers:

Hypothesis testing I

Interpreting study results: confidence intervals

Assessing the effects of treatment; measures of association

Correlation and regression

Sounds heavy reading but actually isn't, and is very understandable and logical.