Transcript Lecture 1
Introduction: Why statistics?
Petter Mostad
2005.08.29
Statistics is…
• …a way to summarize and describe
information: not very interesting in itself
• …an important tool for research in my
field, and something I look forward to
learning more about
• …an important tool for research in my
field, but I only learn what I must learn
about this
• … boring
What best describes your attitude towards statistics?
How much do you already know?
• Definition of mean value, median,
standard deviation?
• Bayes formula?
• t-tests?
• p-values?
• Computing the probability of getting dealt
a flush in a game of poker?
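As a reference point for the last question, here is a minimal counting sketch. It assumes a single 5-card hand dealt from a well-shuffled standard 52-card deck with no card exchanges. Conventions differ on whether straight flushes count as flushes, so both variants are computed. The variable names are illustrative.

```python
from math import comb

# Number of possible 5-card hands from a 52-card deck
total_hands = comb(52, 5)

# Flush: all 5 cards from one of the 4 suits (13 ranks per suit)
flush_hands_incl_straight = 4 * comb(13, 5)

# Straight flushes: 10 possible straights (A-2-3-4-5 up to 10-J-Q-K-A) in each of 4 suits
straight_flushes = 4 * 10

p_any_flush = flush_hands_incl_straight / total_hands
p_plain_flush = (flush_hands_incl_straight - straight_flushes) / total_hands

print(f"P(flush, incl. straight flushes) = {p_any_flush:.6f}")
print(f"P(flush, excl. straight flushes) = {p_plain_flush:.6f}")
```

Counting favourable hands and dividing by all equally likely hands is valid here only because every 5-card hand is assumed equally likely.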
Why a course in statistics?
What is research?
• A distinguishing feature of scientific
research is that its conclusions are
reproducible by other scientists
• Thus, research must
– contain information about exactly what has
been done
– somehow convince the reader that if she
repeats what has been done, she will reach
the same conclusions
A goal of science: To study
causality
• Ultimately, much of science is concerned
with establishing statements like ”If A
happens, then B will follow”
• In other words, one wants to show that B
is reproduced every time A happens.
Example: Studying causality
through intervention
• Retrospective studies can show
covariation between variables, but not
causality.
• Intervention can be used to argue that
changing a certain variable causes
another variable to change.
• To study the effect of an intervention, a
control group is needed.
Example: Reproducibility through
randomization
Assume an experiment is done with two
groups receiving different ”treatments”:
• Differences in the result could be caused
by differences in the treatments, or by
differences between the groups from the
start.
• Randomising the division into groups
makes it unlikely that the groups are
systematically different from the start
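Randomised group assignment can be sketched in a few lines: shuffle the list of participants and split it in half. This is a minimal sketch, assuming a simple two-group design of roughly equal size. The function name `randomize_groups` and the subject labels are hypothetical.

```python
import random

def randomize_groups(subjects, seed=None):
    """Randomly split subjects into a treatment group and a control group."""
    rng = random.Random(seed)      # seeded generator makes the split reproducible
    shuffled = list(subjects)      # copy so the caller's list is not modified
    rng.shuffle(shuffled)          # every ordering is equally likely
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

treatment, control = randomize_groups([f"subject_{i}" for i in range(10)], seed=1)
print("Treatment:", treatment)
print("Control:  ", control)
```

Because chance alone decides who ends up in which group, any systematic difference between the groups at the start becomes unlikely.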
Example: blind, or double-blind
studies
• Differences between the two groups could
be caused by the participants’ knowledge of
which group they are in.
• Differences could also be caused by the
experimenters’ (e.g. doctors’) knowledge of
who is in which group.
• Removing the first knowledge gives a blind
study, removing the second gives a
double-blind study.
Quantitative and qualitative
research
• Quantitative: Focus on things that can be
measured or counted
• Qualitative: Focus on descriptions and
examples.
• Two different scientific traditions. Health
economics and administration has
elements from both.
• Both have advantages and disadvantages.
Which?
Quantitative research
• For quantitative research, we have many
good tools to ensure reproducibility of
conclusions
• Statistics is one of the most important of these tools
• Statistics used in this way can be called
inferential statistics
Example: Reproducibility through
statistics
• If you repeat a quantitative investigation (a
questionnaire, an observation of a social
phenomenon, a measurement) you are
unlikely to get exactly the same numbers.
• Statistics can help you estimate how much
the results are likely to differ between repetitions.
• This can tell you which conclusions are
likely to be reproducible in a potential
repetition of the investigation.
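A small simulation shows what "unlikely to get exactly the same numbers" means in practice. This sketch assumes a hypothetical yes/no questionnaire where the true share of "yes" answers is 0.3 and each repetition asks 100 randomly chosen people. It compares the spread of the estimates across repetitions with the standard error formula sqrt(p(1 - p)/n) for a sample proportion. The function name is illustrative.

```python
import random
import statistics

def simulate_repeated_surveys(true_p, n, repetitions, seed=0):
    """Repeat a yes/no survey of n people and return the estimated share from each repetition."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(repetitions):
        # Each respondent independently answers "yes" with probability true_p
        yes = sum(rng.random() < true_p for _ in range(n))
        estimates.append(yes / n)
    return estimates

true_p, n = 0.3, 100  # hypothetical population share and sample size
estimates = simulate_repeated_surveys(true_p, n, repetitions=2000)

observed_spread = statistics.stdev(estimates)
theoretical_se = (true_p * (1 - true_p) / n) ** 0.5
print(f"Spread of estimates across repetitions: {observed_spread:.4f}")
print(f"Standard error from formula:            {theoretical_se:.4f}")
```

In a real study only one repetition is available, so the formula is what lets us say how much a repetition would probably differ.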
Descriptive vs. inferential statistics
• Descriptive statistics: To sum up, present,
and visualize data.
• Inferential statistics: A tool for handling
uncertain information, and for drawing
(”inferring”) reproducible conclusions from it.
Descriptive statistics
• Goal: To reduce the amount of data while
extracting the ”most important information”
• Can be done with single numbers
(”summary statistics”), tables, or graphical
figures.
• My next lecture will look at descriptive
statistics
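Summary statistics such as the mean, median and standard deviation each reduce a whole data set to one number. This minimal sketch uses Python's standard `statistics` module on a small made-up data set; the function name `summarize` is hypothetical.

```python
import statistics

def summarize(data):
    """Reduce a list of numbers to a few summary statistics."""
    return {
        "n": len(data),
        "mean": statistics.mean(data),
        "median": statistics.median(data),  # middle value (average of the two middle values if n is even)
        "stdev": statistics.stdev(data),    # sample standard deviation, divides by n - 1
    }

print(summarize([2, 4, 4, 4, 5, 5, 7, 9]))
```

Note that `statistics.stdev` divides by n - 1; `statistics.pstdev` divides by n, which gives a different number for the same data.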
Can descriptive statistics be
”objective”?
• A person makes choices about:
– What to measure
– How to measure (for example what questions
to ask or what scale to use)
– How to present the result
• Thus: A presentation or publication should
always contain information about exactly
how results have been obtained
Inferential statistics:
Hypothesis test example
• You throw a die ten times and get a 1
seven of those ten times. You
conclude that this is not a fair die. Is the
conclusion reproducible?
• You need to compute what observations
are to be expected if the die is fair.
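One way to make "what to expect from a fair die" concrete is to compute how probable a result at least this extreme would be. This sketch assumes the relevant question is the probability of seven or more 1s in ten independent throws, where each throw shows a 1 with probability 1/6 (a binomial model). The choice of "seven or more" as the measure of extremeness is an assumption, and the function name is hypothetical.

```python
from math import comb

def binomial_upper_tail(n, k, p):
    """P(X >= k) when X counts successes in n independent trials with success probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Probability that a fair die shows a 1 at least 7 times in 10 throws
p_value = binomial_upper_tail(n=10, k=7, p=1 / 6)
print(f"P(at least 7 ones in 10 throws of a fair die) = {p_value:.6f}")
```

The result is about 0.00027. Since a fair die so rarely produces this outcome, concluding that the die is unfair is likely to be reproduced by others who repeat the experiment.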
Example: probability calculations
• The disease X has a 1% prevalence in the
population. There is a test for X, and
– If you are sick, the test is positive in 90% of
cases.
– If you are not sick, the test is positive in 10%
of cases.
• You have a positive test: What is the
probability that you are sick?
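The answer follows from Bayes' formula: divide the probability of "positive and sick" by the total probability of a positive test. This is a minimal sketch using the numbers on the slide; the function and parameter names are hypothetical.

```python
def prob_sick_given_positive(prevalence, sensitivity, false_positive_rate):
    """Bayes' formula: P(sick | positive test)."""
    # P(positive and sick) = P(positive | sick) * P(sick)
    pos_and_sick = sensitivity * prevalence
    # P(positive and not sick) = P(positive | not sick) * P(not sick)
    pos_and_healthy = false_positive_rate * (1 - prevalence)
    # Divide by the total probability of a positive test
    return pos_and_sick / (pos_and_sick + pos_and_healthy)

p = prob_sick_given_positive(prevalence=0.01, sensitivity=0.90, false_positive_rate=0.10)
print(f"P(sick | positive) = {p:.4f}")
```

With these numbers the probability is 0.009 / 0.108 = 1/12, or about 8%. Because the disease is rare, most positive tests come from the large group of healthy people.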
Example: decisions based on
uncertain information
• An oil company wants to produce the
maximum amount from an oil field.
• Available information:
– Measurements (seismics) describing
approximately the geometry of the rock layers
– Information from a couple of test drills
– Information from geologists
• Where should they place the wells, and
how should they produce?
The concept of a MODEL
• What separates inferential statistics from
descriptive statistics is the use of a model.
• A model is a (mathematical) description of
the connections between the variables you
are interested in.
• It is a simplification of reality, and so never
”correct” or ”wrong”, but it can be more or
less useful.
Statistical (or stochastic) models
• In statistical models, the variables are predicted
with some variation or uncertainty:
– The model for force moving a mass: F=ma, is exact.
– The model for which face a fair die will show
contains probabilities
• We can use the observed data to choose
between possible models.
• The word ”stochastic” is often used when we are
focusing more on the model than on the data.
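To illustrate choosing between models with observed data, this sketch reuses the dice data from the hypothesis-test example (seven 1s in ten throws). Each candidate model is a value p, the probability that a throw shows a 1. It picks the candidate under which the observed data are most probable. That criterion is maximum likelihood, one common choice but not the only one. The grid of candidates and the function name are assumptions made for illustration.

```python
from math import comb

def binomial_likelihood(p, k, n):
    """Probability of exactly k ones in n throws if each throw shows a one with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

k, n = 7, 10  # observed: seven 1s in ten throws
candidate_models = [i / 100 for i in range(101)]  # candidate values of p: 0.00, 0.01, ..., 1.00

best_p = max(candidate_models, key=lambda p: binomial_likelihood(p, k, n))

print(f"Likelihood under a fair die (p = 1/6): {binomial_likelihood(1 / 6, k, n):.2e}")
print(f"Most likely candidate model: p = {best_p}")
```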
Example
• Assume a certain proportion of the population
carries a specific gene; you want to know
how large this proportion is
• The model is simply the unknown
proportion p
• You select and measure a number of
individuals, and use the information to
select the right model, i.e., the right p
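A minimal sketch of this step: estimate p as the share of carriers in the sample, and give a range of plausible values. The range here is the simple normal-approximation (Wald) 95% interval, which is one standard choice and is known to be unreliable for small samples or for p close to 0 or 1. The sample counts in the usage line are hypothetical.

```python
import math

def estimate_proportion(carriers, n, z=1.96):
    """Estimate a population proportion from a simple random sample.

    Returns the point estimate and an approximate 95% (Wald) interval.
    """
    p_hat = carriers / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # estimated standard error of p_hat
    return p_hat, (p_hat - z * se, p_hat + z * se)

# Hypothetical sample: 20 carriers among 200 selected individuals
p_hat, (low, high) = estimate_proportion(carriers=20, n=200)
print(f"Estimated p = {p_hat:.3f}, approximate 95% interval ({low:.3f}, {high:.3f})")
```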
Example
• You want to know the height distribution
among 30-year-old Norwegian women.
• Based on experience, you assume that a
good model is a normal distribution with
some expectation and some variance
• You use data from a number of women to
select a model (i.e. an expectation and
variance), or a range of likely such
expectations and variances
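Selecting a normal model from data can be sketched as estimating its expectation and variance from the sample, plus a range of likely expectations. This sketch uses the sample mean and the sample variance (dividing by n - 1) and a normal-approximation 95% interval for the expectation; for small samples a t-quantile would be more appropriate. The height values are made up for illustration only, and the function name is hypothetical.

```python
import math
import statistics

def fit_normal(heights_cm, z=1.96):
    """Estimate expectation and variance of a normal model from a sample.

    Also returns an approximate 95% interval for the expectation.
    """
    n = len(heights_cm)
    mean = statistics.mean(heights_cm)
    variance = statistics.variance(heights_cm)  # sample variance, divides by n - 1
    half_width = z * math.sqrt(variance / n)
    return mean, variance, (mean - half_width, mean + half_width)

# Hypothetical measurements in cm, for illustration only
sample = [162.0, 168.5, 171.0, 165.5, 159.0, 174.0, 167.0, 169.5]
mean, variance, (low, high) = fit_normal(sample)
print(f"Expectation ~ {mean:.1f} cm, variance ~ {variance:.1f} cm^2")
print(f"Range of likely expectations: ({low:.1f}, {high:.1f}) cm")
```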
Sampling
• Often, the model can be a simplifying description
of the population we want to study.
• We investigate the model by sampling from the
population.
• When each individual is selected independently
and randomly from the population, we call it
(simple) random sampling
• Simple random sampling makes it easier to
compute what we can conclude about the model
from the data
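Drawing a simple random sample is straightforward in code. This sketch uses `random.Random.sample`, which draws without replacement so that every subset of the requested size is equally likely. Strictly independent draws would instead use `random.Random.choices` (with replacement); when the population is much larger than the sample, the two give nearly the same results. The population of ID numbers is hypothetical.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct individuals at random; every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical population: individuals identified by ID numbers 1..10000
population = list(range(1, 10001))
sample = simple_random_sample(population, n=20, seed=7)
print(sample)
```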
Using the results
• Selecting some models over others means
that you increase your understanding of
each variable, and the relationships
between variables
• Once a model has been selected, it can be
used to forecast or predict the future
• Being able to predict the likely results of
different decisions can be used to improve
decision-making
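Once a model is selected, predictions follow directly from it. This sketch assumes a normal height model has been selected with expectation 168 cm and standard deviation 6 cm. These parameter values are hypothetical, chosen only for illustration. It uses `statistics.NormalDist` to predict the share of the population above a given height.

```python
from statistics import NormalDist

# Hypothetical selected model: expectation 168 cm, standard deviation 6 cm (assumed values)
height_model = NormalDist(mu=168, sigma=6)

# Predicted share of the population taller than 180 cm
share_above_180 = 1 - height_model.cdf(180)
print(f"Predicted share taller than 180 cm: {share_above_180:.3f}")

# Predicted number of such people in a group of 1000
print(f"Expected count in a group of 1000: {1000 * share_above_180:.0f}")
```

A prediction like this is only as good as the model: if the normal assumption or the parameter estimates are off, so is the forecast.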
The goals of this course
• To enable you to understand, use, and
criticise research results produced by
others, and in particular to understand and
critically assess the statistical arguments
• To enable you to produce your own valid
research results, using statistical tools.
Overview of statistics topics
we will look at
• Descriptive statistics
• Probability theory
• Sampling and estimation
• Regression
• Non-parametrics
• Analysis of variance
• Decision theory
• Some more advanced topics
Much information is, and will be, available on
the course web page