
A quick survey of
Epidemiology and its
methods
Seminar by:
Diego Villarreal
Outline





What is epidemiology and what is it
used for
A quick glance at statistical
hypothesis testing
The normal distribution and p-values
Example of an epidemiological
survey conducted in India
Concluding remarks
What is Epidemiology?
Epidemiology is often defined as the study of
factors that determine the occurrence and
distribution of diseases in a population
Epidemiology is used to guide medical treatment,
public health policy and scientific research.
Areas in which epidemiology plays an important
role: Nutrition, Environmental Health, HIV etc…
Jekel J.F., Katz D., Elmore J. Epidemiology, Biostatistics, and Preventive Medicine. W.B. Saunders Company.
Philadelphia, USA: 2001.
Epidemiology is often divided into two
branches:
Classical Epidemiology and Clinical
Epidemiology
Classical Epidemiology: population oriented;
studies the risk factors associated with a
population (HIV, Pb level in blood etc…).
Clinical Epidemiology: studies patients in
health care settings in order to improve
the diagnosis and treatment of various
diseases (drugs, nutrition etc…).
How do we know that the observations we
make of a specific population are correct?
Are they always the same?
Can we quantify our “correctness”?
Epidemiologists use tools such as
probability and statistics to assess and
validate their findings.
A little Probability and Statistics
Before testing a hypothesis, we must set up the hypothesis
in a quantitative manner.
The measurements done in epidemiological studies must be
expressed as numbers (e.g. number of patients that did
not receive a drug and died, mean blood pressure in HIV
patients, Pb levels of children in India etc…).
In statistics, there are usually two types of variables:
continuous and discrete.
Continuous variables can assume any value within a
range (Pb levels, blood pressure, age, height).
Discrete variables can only assume a limited set of
distinct values (sex, pregnancy status etc…).
Montgomery D.C., Runger G.C. Applied Statistics and Probability for Engineers. John Wiley & Sons. New York
NY: 2003.
The Null Hypothesis
A hypothesis that is tested statistically is called a
Null hypothesis (Ho).
The null hypothesis usually takes the form of a
“no difference hypothesis”, and we try to reject it
with the gathered data.
Example:
We want to test the efficacy of a drug intended to
reduce the death rate, by comparing two different groups.
Ho: Death rate group A = Death rate group B
or
Ho : Death rate group A- Death rate group B = 0
The null hypothesis is tested against an
Alternative hypothesis HA
HA: Death rate A - Death rate B ≠ 0
Ho is rejected if the observed difference
in the death rates of the two groups is too
large to be attributed to chance.
By rejecting Ho we accept HA.
Notice that HA is two-sided: it does not
specify whether Death rate A > Death rate B
or the reverse.
Wassertheil-Smoller S. Biostatistics and Epidemiology. Springer. New York NY: 2004.
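As a rough illustration of how Ho: Death rate A - Death rate B = 0 can be turned into a number, the sketch below computes a pooled two-proportion z statistic. The slides do not say which test would be used for this example, so the choice of test, the function name and the counts are all assumptions made for illustration.

```python
import math

def two_proportion_z(deaths_a, n_a, deaths_b, n_b):
    """z statistic for Ho: death rate A - death rate B = 0 (pooled estimate)."""
    rate_a = deaths_a / n_a
    rate_b = deaths_b / n_b
    # Under Ho both groups share one death rate, so the counts are pooled.
    pooled = (deaths_a + deaths_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (rate_a - rate_b) / standard_error

# Hypothetical counts, not taken from any study.
z = two_proportion_z(deaths_a=30, n_a=200, deaths_b=50, n_b=200)
print(f"z = {z:.2f}")  # z = -2.50
```

A z statistic far from zero (in either direction, since HA is two-sided) counts as evidence against Ho.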
Errors associated with accepting or
rejecting a hypothesis:
Rejecting the null hypothesis when it is true: Type I
Error
Failing to reject the null hypothesis when it is false: Type II Error
                          TRUE STATE OF NATURE
Decision on Basis         Drug has no effect    Drug has effect
of Sample                 (Ho true)             (Ho false, HA true)
Do not reject Ho          No error              Type II error
Reject Ho (accept HA)     Type I error          No error
It is important to understand that we can never
eliminate the risk of making one of these errors.
However we may lower the probability of making
these errors.
The probability of making a Type I error is known
as the Significance Level of a Statistical Test
For a fixed sample size, lowering the probability
of a Type I error increases the probability of a
Type II error.
To lower both probabilities, one must increase
sample size.
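The trade-off between the two error types can be sketched numerically. The example below assumes a simple one-sided z-test with known unit standard deviation and a hypothetical true effect of 0.3 standard deviations. None of these settings come from the slides; they only make the arithmetic concrete.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def type_ii_error(alpha, effect_size, n):
    """Probability of a Type II error for a one-sided z-test of
    Ho: mean = 0 against HA: mean = effect_size > 0 (unit standard deviation)."""
    critical = STD_NORMAL.inv_cdf(1 - alpha)  # reject Ho when z > critical
    shift = effect_size * n ** 0.5            # mean of z when HA is true
    return STD_NORMAL.cdf(critical - shift)

for alpha in (0.05, 0.01):
    for n in (25, 100):
        beta = type_ii_error(alpha, effect_size=0.3, n=n)
        print(f"alpha={alpha}  n={n}  beta={beta:.3f}")
```

At a fixed n, the smaller significance level gives the larger Type II error probability. Raising n from 25 to 100 lowers it at both significance levels.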
The most widely used model for the
distribution of a random variable is the
normal distribution, which is described by
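the following mathematical expression:
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
Where:
µ = Mean
σ = Standard Deviation
The total area under the curve is 1:
\int_{-\infty}^{\infty} f(x)\,dx = 1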
It is important to notice the following facts:
About 68% of the observations fall within 1 standard deviation of the
mean, that is, between µ - σ and µ + σ.
About 95% of the observations fall within 2 standard deviations of the
mean, that is, between µ - 2σ and µ + 2σ.
About 99.7% of the observations fall within 3 standard deviations of
the mean, that is, between µ - 3σ and µ + 3σ.
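These three percentages can be checked from the normal cumulative distribution function. The short sketch below uses `NormalDist` from Python's standard library; the helper name `coverage_within` is made up for this example.

```python
from statistics import NormalDist

def coverage_within(k, mu=0.0, sigma=1.0):
    """Probability that a normal variable lies between mu - k*sigma and mu + k*sigma."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {coverage_within(k):.4f}")
# within 1 standard deviation(s): 0.6827
# within 2 standard deviation(s): 0.9545
# within 3 standard deviation(s): 0.9973
```

The result does not depend on the particular values of µ and σ, only on the number of standard deviations k.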
P-values
The p-value is an index of the strength of
the evidence with regard to rejecting the
null hypothesis.
The p-value is the probability, assuming the
null hypothesis is true, of obtaining a result
at least as extreme as the one observed. A small
p-value means the data would be unlikely to
arise from mere chance.
By convention, if the p-value > 0.05, we say
that the result is NOT statistically
significant, and we fail to reject the Null
hypothesis.
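To make the 0.05 convention concrete, the sketch below converts a z statistic into a two-sided p-value and applies the cutoff. It assumes the test statistic follows a standard normal distribution under Ho. The function names and the example z values are hypothetical.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """Probability, under Ho, of a z statistic at least as extreme as the one observed."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

def decide(p_value, alpha=0.05):
    """Apply the convention: p > alpha is not statistically significant."""
    return "fail to reject Ho" if p_value > alpha else "reject Ho"

for z in (1.0, 2.5):  # hypothetical test statistics
    p = two_sided_p_value(z)
    print(f"z = {z}: p = {p:.3f} -> {decide(p)}")
```

Failing to reject Ho does not show that Ho is true. It only means the data do not provide enough evidence against it.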
Blood lead levels in Bombay:
A Case Study
As stated at the beginning, Epidemiology studies
the occurrence and distribution of diseases in a
population.
In 2002 a study was done to compare the blood
lead levels (BLL) of children in Bombay after the
use of lead in gasoline was prohibited.
The data collected was compared against existing
data from 1997 (before Pb was prohibited).
In 1997 the Georges Foundation conducted a study of 291
children (ages 6-10) in Bombay to test the BLL of children
in metropolitan areas in India
This study showed the following:
61.8% (n=180) had BLL > 10 μg/dL
14.7% (n=43) had BLL > 20 μg/dL
2.7% (n=8) had BLL > 30 μg/dL
0.6% (n=2) had BLL > 40 μg/dL
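The percentages above follow from the counts and the sample size of 291. The sketch below recomputes them. The reported figures appear to match the values truncated, rather than rounded, to one decimal place; that reading is an inference from the arithmetic, not something the slides state.

```python
import math

TOTAL_CHILDREN = 291  # sample size reported for the 1997 study
COUNTS_ABOVE = {10: 180, 20: 43, 30: 8, 40: 2}  # BLL threshold (μg/dL) -> n

def percent(n, total):
    return 100 * n / total

def truncate_1dp(value):
    """Drop everything after the first decimal place (no rounding)."""
    return math.floor(value * 10) / 10

for threshold, n in COUNTS_ABOVE.items():
    pct = percent(n, TOTAL_CHILDREN)
    print(f"BLL > {threshold} μg/dL: {pct:.2f}% (truncated: {truncate_1dp(pct)}%)")
```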
The study also pointed out that the mean BLL was in the
range of 8.6-14.4 μg/dL
Concentrations of lead in air in various locations ranged
from 0.10 to 1.18 μg/m³
At the time of this study, seasonal variations (monsoon vs.
non-monsoon season) were not studied.
Nichani, V; Li-W.I; Smith M.A; Noonan G; Kulkarni M.; Kodavor M; Naeher L.P; Science of the Total
Environment 2005 (In Press).
In the 2002 study, measurements were done in two
different campaigns (monsoon and non-monsoon season).
A total of 754 children under 12 years of age were sampled:
276 (36.6%) during non-monsoon season and 478 (63.4%)
during monsoon season.
BLL were measured using an ESA Lead Care Portable
Analyzer.
The study locations were Panchseel Hospital (Mulund) and
low socioeconomic areas in Mulund and Thane. This was
done to include children with different socioeconomic status
(SES).
SES was determined by parental occupation and geographic
location.
The dependent variable used in the
analysis was BLL.
Independent variables were age, sex,
SES, and season.
Sex, SES and season were treated as
discrete variables, while age and BLL were
treated as continuous variables.
Since the distribution of BLL was not
normal, the data were transformed toward
normality using a Box-Cox transformation.
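A minimal sketch of the Box-Cox transformation is given below. The slides do not report the value of λ used in the study, so λ = 0 (the log transform) is an assumption here, and the BLL readings are hypothetical. The skewness helper is included only to show the effect.

```python
import math

def box_cox(values, lam):
    """Box-Cox transform: (x**lam - 1) / lam for lam != 0, log(x) for lam == 0."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]

def skewness(values):
    """Moment-based skewness: 0 for a symmetric sample, > 0 for a long right tail."""
    n = len(values)
    centre = sum(values) / n
    m2 = sum((v - centre) ** 2 for v in values) / n
    m3 = sum((v - centre) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed BLL readings in μg/dL (not the study's data).
bll = [4.2, 5.1, 6.8, 7.5, 9.3, 11.0, 14.6, 22.4, 35.0]
transformed = box_cox(bll, lam=0)  # lam = 0 is an assumed value
print(f"skewness before: {skewness(bll):.2f}, after: {skewness(transformed):.2f}")
```

In practice λ is usually estimated from the data, for example by maximum likelihood, rather than fixed in advance.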
Results:
t-tests showed that differences in BLL
across SES were significant
(t = -5.9, p < 0.0001), with lower SES
having higher BLL.
Differences in BLL across seasons were also
statistically significant (t = 5.4, p < 0.0001),
with higher BLLs in the monsoon season.
Differences in BLL between boys and girls were
not statistically significant (t = 1.1, p = 0.28).
Age was associated with increasing BLLs to a
small degree (p = 0.014).
Nichani, V; Li-W.I; Smith M.A; Noonan G; Kulkarni M.; Kodavor M; Naeher L.P; Science of the Total
Environment 2005 (In Press).
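For readers unfamiliar with the t statistic reported above, the sketch below computes one for two groups. The slides do not say which variant of the t-test the study used, so Welch's unequal-variance form is an assumption, and the two samples are hypothetical.

```python
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """t statistic for the difference in means, not assuming equal variances."""
    # variance() is the sample variance (divides by n - 1).
    se_squared = variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b)
    return (mean(sample_a) - mean(sample_b)) / se_squared ** 0.5

# Hypothetical transformed BLL values for two groups (not the study's data).
higher_ses = [1.9, 2.1, 2.0, 2.4, 1.8, 2.2]
lower_ses = [2.6, 2.9, 2.5, 3.1, 2.8, 2.7]
print(f"t = {welch_t(higher_ses, lower_ses):.2f}")
```

The sign of t depends only on the order of the groups. Here a negative value means the first group has the lower mean.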
Several conclusions can be drawn from the study done in India:
1. Eliminating Pb from gasoline is extremely important in
lowering the BLL of individuals.
2. Children with lower SES are more susceptible to lead
poisoning.
3. During monsoon season, the BLLs of children tend to be
higher.
4. Developing countries around the world MUST prohibit the use
of Pb in gasoline in order to secure the health of their citizens.
Conclusion



Epidemiology is a great tool for
shaping public health policy.
Statistics helps epidemiologists determine
whether observations within a
population are relevant and significant.
Epidemiological results MUST be used not
only by scientists and doctors, but also by
politicians, in order to have a healthy and
productive society.
QUESTIONS?!?!?!