s14_lect1 - Biostatistics
Spatial Distribution of Cancer Rates
Introduction to Biostatistics (BIOL 200)
Cancer Distributions
Roll/Cards
Context of Biology 200
Introduction to Statistics and Observational Studies
Course Overview
Syllabus
Expectations
Introductions
Guessing Ages
What might explain this pattern?
Cancer Distributions
Student Roll Cards
Fill out a card with the following:
1. Your name
2. Do vision/hearing limitations require you to sit up front?
3. Number of college units taken to date
4. Are you a crasher/on the wait list?
5. Previous college biology courses taken
6. Previous statistics courses taken
7. Previous college math courses
8. What degree are you seeking and why?
9. Do you know where/how to look for functions in Excel?
10. What do you look forward to learning in this course?
11. What are your biggest concerns about this course?
Context of Biology 200
Biostatistics
Objectives
Modern Science
Science is a method and
literature based on the
method
Science uses empirical tests
of hypotheses
Tests based on quantitative
observation
Data analysis is the pivot of
this scientific process
Data analysis is the
combination of the use and
interpretation of statistics
Modern Science
Science is a method and the
literature produced through
the method
Science uses empirical tests
of hypotheses
Tests are based on
quantitative or quantifiable
observation
Data analysis is the pivot of
this scientific process
Data analysis is both the use
and interpretation of statistics
Good data analysis requires
an understanding and use of
statistics
What biologists “do”?
Biologists study the living world through observation and
experimentation
Two types of scientific questions:
Description (“WHAT?” questions)
Explanation (“HOW?” questions)
Course Overview
At the end of the course students will
be able to:
explain why biologists benefit from a background in statistics
demonstrate knowledge of statistical analyses in the context of
scientific research
write about, explain, and communicate statistical concepts and
procedures
use, apply, and learn statistical software to enter data, graph and
analyze data, test hypotheses, and interpret results from simple
samples and experiments
‘A’ students interpret and discuss biological research presented in
the scientific literature and popular publications
‘A’ students write about, explain, and communicate biological
findings using statistical concepts and results.
Over the course of the semester,
students will learn how to:
produce data through observational studies and experiments
visualize biological data
describe the distribution of single variables
examine and describe relationships between variables
draw inferences from a sample to a population
link biological hypotheses to statistical hypotheses
construct simple statistical models and estimate model parameters
test various types of simple statistical hypotheses
‘A’ students will know how to construct and interpret multiple regression models.
Grades
Routes to Awards
[Flowchart: routes to awards. Box labels: Primary Assessment; Sufficiency Award (Tentative); Second Chances; Automatic Conversion; Sufficiency Award (Final); Secondary Assessments; Final Exam; Most Awards; End of Semester. Outcomes along each route are marked A or U.]
A=Award given
U=No Award & Homework Completed
M=Misunderstanding demonstrated
U*=No Award w/o Homework Completed
Study Flow
Questions, Concerns, & Requests
Missed Classes & Integrity:
If you seek an excused absence:
READ the syllabus before planning an absence
Read the syllabus before an absence occurs
Review the syllabus when an absence occurs
Ask if you have questions about the policy or have recommendations
for changes to the policy.
Read the syllabus before contacting me about an absence.
Call and Email ASAP in an emergency absence. (save my number)
Cheating:
Zero tolerance for any form of cheating, big or small. You cannot
pass this course without an award for scientific ethics.
Review the rules about plagiarism when in doubt. Expulsions do
happen, even at UCSD, for cheating on a single assignment, on the
first (caught) offense.
Lecture 1. Samples and
observational studies
Chapter 7
The Practice of Statistics in the Life Sciences
Introduction to statistics
Statistics is "a quantitative technology for
empirical science; it is a logic and methodology
for the measurement of uncertainty and for an
examination of that uncertainty."
The key word here is "uncertainty." Statistics
becomes necessary in science when
observations are variable.
How do we deal with variability?
One approach is to measure every single entity in every possible
location of our population of interest
NOTE: this might not always be a “biological population”
Censusing a population is rarely possible in practice.
Potential solution: we measure a subset of all the possible entities
(i.e., a sample) and use these values to make inferences about the
population
Goals of Statistics
Estimate important quantitative values
Test hypotheses about those values
Statistics is about good
scientific practice
Good design and interpretation
Feline High-Rise Syndrome (FHRS)
The injuries associated with a cat falling out
of a window.
“The diagnosis of high-rise syndrome is not
difficult. Typically, the cat is found
outdoors, several stories below, and a
nearby window or patio door is open.”
High falls show lower injury rates
Whitney and Mehlhaff, Journal of the American Veterinary Medical Association, 1987
Why?
Knowledgeable speculation:
1. Cats have high surface-to-volume ratios
2. Cats have excellent vestibular systems
3. Cats reach terminal velocity quickly, relax, and therefore absorb impact better
4. Cats land on their limbs and absorb shock through soft tissue
Jared Diamond, Nature 1988
A sample of convenience is a
collection of individuals that
happen to be available.
A newer study reports more injuries with
longer falls
Vnuk et al. 2004. Feline high-rise syndrome: 119 cases (1998-2001). J. Fel. Med. Surg. 6:305-312.
Objectives (W&S Chpt 1, supplemental)
Samples and observational studies
Know, understand, evaluate, integrate, and utilize information about:
Observation versus experiment
Population versus sample
The role of randomness in sampling
The simple random sample (SRS)
Other probability samples
Sample surveys
Comparative observational studies
Observational versus experimental studies
Observational study: Record data on individuals without attempting
to influence the responses.
Do female crickets choose their mates on the basis of
their health? Observe health of male crickets and mating status.
Experimental study: Deliberately impose a treatment on individuals,
and record their responses. Influential factors can be controlled.
Individuals do not choose which treatment they get.
Infect some male crickets with intestinal parasites and
keep other males healthy. Set traps to see whether
females tend to choose healthy rather than sick males.
Confounding
Two variables are confounded when their effects on a response
variable cannot be distinguished.
Observational studies often fail to yield clear causal conclusions,
because the explanatory variable is confounded with lurking variables.
Hours studying
CAUSE?
Good grade on test
“Intelligence”
Confounding?
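The diagram above can be illustrated with a small simulation. This is only a sketch under invented assumptions: a single lurking variable (standing in for "intelligence") drives both hours studied and test grade, and hours studied has no direct effect on the grade at all. The variable names and effect sizes are hypothetical.

```python
import random

rng = random.Random(11)  # fixed seed so the sketch is repeatable


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


n_students = 5000

# Hypothetical lurking variable that we never record in the study.
lurking = [rng.gauss(0, 1) for _ in range(n_students)]

# Both observed variables depend on the lurking variable plus noise.
hours = [z + rng.gauss(0, 1) for z in lurking]
grade = [z + rng.gauss(0, 1) for z in lurking]  # hours never enters here

r = pearson_r(hours, grade)
print(f"correlation between hours and grade: {r:.2f}")
```

Hours and grade come out clearly correlated even though, by construction, one does not cause the other. An observational study that recorded only these two variables could not tell this situation apart from a real causal effect.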
Experiment or observational study?
In 1992, several major medical organizations said that women should take
hormones such as estrogen after menopause, when natural production of
these hormones ends. Indeed, women who had taken hormones seemed to
reduce their risk of a heart attack by 35% to 50%.
By 2002, several studies with women of different ages concluded that
hormone replacement does not reduce the risk of heart attacks. These
studies had assigned women to either hormone replacement or to dummy
pills that look and taste the same as the hormone pills. The assignment was
done by a coin toss, so that all kinds of women were equally likely to get
either treatment.
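The coin-toss assignment described above can be sketched in a few lines. The participant IDs and the group labels are hypothetical; the point is only that chance, not the participant or the researcher, decides who gets which pill.

```python
import random

rng = random.Random(2002)  # fixed seed so the sketch is repeatable

# Hypothetical list of 200 enrolled participants.
participants = [f"participant_{i:03d}" for i in range(1, 201)]

# One simulated fair coin toss per participant.
assignment = {
    person: ("hormone" if rng.random() < 0.5 else "placebo")
    for person in participants
}

counts = {"hormone": 0, "placebo": 0}
for treatment in assignment.values():
    counts[treatment] += 1

print(counts)
```

Because assignment ignores everything about the participant, lurking variables such as age or general health should be spread about evenly between the two groups.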
Population versus sample
Population: The entire group
of individuals in which we are
interested but can’t usually
assess directly
Sample: The part of the
population we actually examine
and for which we do have data
Population
Sample
A parameter is a number
summarizing a characteristic
of the population.
A statistic is a number
summarizing a characteristic of
a sample.
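A minimal sketch of the distinction, using an invented population of cricket body lengths (the numbers are made up for illustration): the population mean is the parameter, and the mean of the 50 individuals we actually measure is the statistic.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: body lengths (mm) of 10,000 crickets.
population = [rng.gauss(20.0, 2.5) for _ in range(10_000)]

# Parameter: summarizes the whole population (usually unknown in practice).
population_mean = statistics.mean(population)

# Statistic: summarizes the sample we actually measured.
sample = rng.sample(population, 50)
sample_mean = statistics.mean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

The statistic is close to the parameter but not equal to it, and it would change if we drew a different sample.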
The role of randomness in sampling
How do you select the individuals/units in a sample?
Voluntary response sampling: individuals choose to be involved
Convenience sampling: ask whoever is around (mall, street) or ask
the first group through the door.
Probability sampling: individuals or units are randomly selected;
the sampling process is unbiased if done properly.
Ann Landers summarizing responses of readers: 70% of
(~10,000) parents wrote in to say that having kids was not
worth it—if they had to do it over again, they wouldn’t.
But a random sample showed that 91% of parents WOULD have kids again.
What do you think explains such drastically different responses?
Would you expect very different responses on the
potential legalizing of marijuana if you asked the first
people you saw on the parking lot of a university or the
first people you saw on the parking lot of a church?
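One possible explanation for the Ann Landers result can be sketched as a simulation. Everything here is an assumption made for illustration: a population in which 91% of parents would have kids again, and made-up write-in probabilities under which unhappy parents are far more likely to respond.

```python
import random

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Hypothetical population of parents; True means "would have kids again".
population = [rng.random() < 0.91 for _ in range(100_000)]


def writes_in(would_again):
    """Assumed write-in behavior: unhappy parents respond far more often."""
    probability = 0.01 if would_again else 0.20
    return rng.random() < probability


def share_not_worth_it(responses):
    """Fraction of responses saying kids were not worth it."""
    return sum(1 for answer in responses if not answer) / len(responses)


# Voluntary response sample: whoever chooses to write in.
voluntary = [answer for answer in population if writes_in(answer)]

# Simple random sample of 1,000 parents.
srs = rng.sample(population, 1000)

print(f"voluntary response: {share_not_worth_it(voluntary):.0%} say not worth it")
print(f"random sample:      {share_not_worth_it(srs):.0%} say not worth it")
```

Under these assumptions the voluntary sample is dominated by the unhappy minority, while the random sample stays close to the population.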
The simple random sample
A Simple Random Sample (SRS) is made of randomly selected
individuals. Each individual in the population has the same probability of
being in the sample. All possible samples of size n have the same
chance of being drawn.
How to choose an SRS?
Draw from a hat (lottery style)
Flip a coin
Use a table of published random numbers (Table A)
Use software that generates random numbers
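As one example of the software option, Python's standard library can draw an SRS directly. The sampling frame of 500 numbered individuals is hypothetical.

```python
import random

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical sampling frame: individuals labeled 1 through 500.
sampling_frame = list(range(1, 501))

# random.sample draws without replacement; every subset of size 25
# has the same chance of being chosen.
srs = rng.sample(sampling_frame, 25)

print(sorted(srs))
```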
Other probability samples
A stratified random sample is the combination of two or more SRSs
taken from subgroups of a given population. The subgroups are chosen
to contain all the individuals with a certain characteristic.
The SRSs taken within each group in a stratified random sample need
not be of the same size.
Stratified random sample of 100 male and 100 female registered
voters in your county
Stratified random sample of 30 seniors and 20 juniors from your
college major
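The seniors-and-juniors example can be sketched as two separate SRSs, one per stratum. The roster sizes (120 seniors, 150 juniors) are invented for illustration.

```python
import random

rng = random.Random(3)  # fixed seed so the sketch is repeatable

# Hypothetical rosters for each stratum.
strata = {
    "senior": [f"senior_{i}" for i in range(120)],
    "junior": [f"junior_{i}" for i in range(150)],
}

# Sample sizes per stratum; they need not be equal.
sizes = {"senior": 30, "junior": 20}

# Take an independent SRS within each stratum.
stratified_sample = {
    name: rng.sample(members, sizes[name]) for name, members in strata.items()
}

for name, chosen in stratified_sample.items():
    print(name, len(chosen))
```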
Multistage samples are drawn in multiple stages, by taking a random
sample within a random sample.
Start by taking a random sample of 5 counties in the state. Within each
selected county, take a random sample of 2 school districts. Within each
school district, take a random sample of 2 schools. (20 schools total)
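The three stages above can be sketched as nested random draws. The state layout (40 counties, 6 districts per county, 8 schools per district) is hypothetical.

```python
import random

rng = random.Random(5)  # fixed seed so the sketch is repeatable

# Hypothetical state: 40 counties x 6 districts x 8 schools.
state = {
    f"county_{c}": {
        f"county_{c}_district_{d}": [
            f"county_{c}_district_{d}_school_{s}" for s in range(8)
        ]
        for d in range(6)
    }
    for c in range(40)
}

selected_schools = []
for county in rng.sample(sorted(state), 5):               # stage 1: counties
    districts = state[county]
    for district in rng.sample(sorted(districts), 2):     # stage 2: districts
        schools = districts[district]
        selected_schools.extend(rng.sample(schools, 2))   # stage 3: schools

print(len(selected_schools))  # 5 x 2 x 2 = 20 schools
```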
Multistage samples are often
used by the government to
survey the whole population,
because they are more
economical. However,
statistical analysis for them is more
complex than for an SRS.
National Electronic Injury Surveillance System
Sampling frame:
Hospitals with 6+ beds having an
emergency department [psychiatric
and penal institutions excluded]
Four strata:
Small hospitals
Large hospitals
Very large hospitals
Children’s hospitals
Map of NEISS hospitals
Data collected:
Interviews of people who came to emergency rooms with specific injuries.
Sample surveys
A sample survey is an observational study that relies on a random
sample drawn from the entire population.
Opinion polls are sample surveys that typically use voter registries or
telephone numbers to select their samples.
In epidemiology, sample surveys are used to establish the incidence
(rate of new cases per year) and the prevalence (rate of all cases at
one point in time) of various medical conditions, diseases, and lifestyles.
These are typically stratified or multistage samples.
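A small worked example of the two rates, using invented counts. It is simplified: incidence is computed here against the whole surveyed population, whereas in practice the denominator is usually the population at risk.

```python
# Hypothetical survey results for one condition.
population_size = 50_000
new_cases_this_year = 125        # diagnosed during the year
existing_cases_on_date = 1_400   # all cases present on the survey date

# Incidence: rate of new cases per year.
incidence_per_1000 = new_cases_this_year / population_size * 1000

# Prevalence: proportion of all cases at one point in time.
prevalence_pct = existing_cases_on_date / population_size * 100

print(f"incidence:  {incidence_per_1000:.1f} new cases per 1,000 per year")
print(f"prevalence: {prevalence_pct:.1f}% of the population")
```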
Some survey challenges
Nonresponse: Individuals selected to be interviewed, but who either
cannot be reached or refuse to participate.
Response bias: Fancy term for lying or simply forgetting. Questions
about the past or sensitive issues often get biased responses (“How
much do you drink?” or “How much do you weigh?”).
Wording effects: Questions worded like “Do you agree that it is awful
that…” are prompting you to give a particular response. Questions
may also be too complicated and confusing.
Undercoverage: Sometimes parts of the population are left out in the
process of choosing the sample. For instance, surveys of households
omit homeless individuals. Surveys conducted online tend to include a
younger population.
How bad is nonresponse?
The Census Bureau’s American Community Survey (ACS): ~ 2.5%
Via mail with reminders. Response is mandatory.
University of Chicago’s General Social Survey (GSS): ~ 30%
In person.
Private polling firms such as SurveyUSA: ~90%
Over the phone (with interviewer or automated call) or online.
1995-2002
What do you think might explain the different results of these two surveys?
Results are based on
telephone interviews with
1018 national adults,
aged 18 and older,
conducted Feb. 6-7,
2009, as part of Gallup
Poll Daily tracking.
BBC World News America/The Harris Poll of 2158
adults surveyed online between February 3-5, 2009.
“Do you believe Charles Darwin’s theory which states
that plants, animals and human beings have evolved
over time?”
The Nurses’ Health Study is one of the largest prospective
observational studies designed to examine factors that may
affect major chronic diseases in women.
Since 1976, the study has followed a cohort of over 100,000 registered nurses.
Every two years, they receive a follow-up questionnaire about diseases and
health-related topics. Response rate: ~ 90% each time.
2007 report on age-related memory loss:
About 20,000 women ages 70+ had completed telephone interviews every two
years to assess their memory with a set of cognitive tests. One of the findings:
the more women walked during their late 50s and 60s, the better their memory
score was at age 70 and older.
However, we cannot unambiguously conclude that walking has a protective
effect against memory loss. Why?
Variable
A variable is a characteristic measured on individuals drawn from a
population under study.
Data are measurements of one or more variables made on a
collection of individuals.
Explanatory and response variables
We try to predict or explain a response variable from an
explanatory variable.
Older terminology:
dependent variable and independent variable
Recently my 5 year old son, Blake, came to me and
said tell them it is my ‘dee dee’ (his dee dee is the
blanket he wraps up in when he does not feel good.)
Although conventional physicians tend to dismiss the
results of alternative medicine as ‘simply a placebo
effect,’ my son’s observation captures a simple
truth too many of us have forgotten. Like my son’s
security blanket there is a magical and intangible
element in the healing process which cannot be
measured – the role of the mind and the emotions.
Foreword to an Encyclopedia of Alternative Medicine by Deepak Chopra MD
[Flowchart: routes to awards. Box labels: Primary Assessment; Sufficiency Award (Tentative); Second Chances; Automatic Conversion; Sufficiency Award (Final); Secondary Assessments; Final Exam; Most Awards; End of Semester. Outcomes along each route are marked A, U, M, or U*.]
A = Achievement, U = Unclear demonstration of skill, M = Evidence of misunderstanding
*Indicates ineligibility for secondary assessments
Guessing Ages
The class will divide into 6 groups, of various sizes.
Each group will get a photograph of a person.
The group will record their single best guess of the age of the person in
the photograph.
Groups will then swap photographs until they have assessed each of
the 12 photographs.
Finally, the true ages will be revealed. Each group will determine the
precision and accuracy of their age estimation.
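One way a group might score itself, sketched with invented ages and guesses. The choice of measures is an assumption, not something the activity prescribes: accuracy is taken here as the mean error (bias), and precision as the standard deviation of the errors.

```python
import statistics

# Hypothetical true ages and one group's guesses for the 12 photographs.
true_ages = [23, 35, 41, 19, 58, 67, 30, 45, 52, 27, 73, 38]
guesses = [25, 32, 45, 22, 55, 60, 33, 47, 50, 30, 68, 40]

# Error for each photograph: positive means the group guessed too old.
errors = [guess - truth for guess, truth in zip(guesses, true_ages)]

bias = statistics.mean(errors)     # accuracy: how far off on average
spread = statistics.stdev(errors)  # precision: how scattered the errors are

print(f"mean error (bias): {bias:.2f} years")
print(f"standard deviation of errors: {spread:.2f} years")
```

A group can be accurate on average (bias near zero) and still imprecise, if its overestimates and underestimates are large but cancel out.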
Comparative observational studies
Case-control studies start with 2 random samples of individuals with
different outcomes, and look for exposure factors in the subjects’ past
(“retrospective”).
Individuals with the condition are cases, and those without are controls.
Good for studying rare conditions. Selecting controls is challenging.
Cohort studies enlist individuals of common demographics, and keep
track of them over a long period of time (“prospective”). Individuals
who later develop a condition are compared with those who don’t.
Cohort studies examine the compounded effect of factors over time.
Good for studying common conditions. Very expensive.
Aflatoxicosis epidemics
Aflatoxins are secreted by a fungus found in damaged
crops and can cause severe poisoning and death.
The Kenya Ministry of Health investigated a 2004 outbreak of aflatoxicosis resulting
in over 300 cases of liver failure. A sample of 40 case-patients and 80 healthy
controls were asked how they had stored and prepared their maize.
The case-patients were randomly selected from a list of individuals admitted to a
hospital during the 2004 outbreak for unexplained acute jaundice.
Control individuals were selected to be as similar to the case-patients as possible,
yet randomly selected.
Preliminary data suggested that soil, microclimate, and farming practices
might have played a role, but not age or gender.
For each case-patient, two individuals from the patient’s village with no
history of jaundice symptoms were randomly selected.
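The matching step can be sketched as follows. The village rosters, resident IDs, and case list are all hypothetical; the sketch only shows the selection rule of two randomly chosen controls per case-patient, drawn from the same village and with no history of jaundice.

```python
import random

rng = random.Random(2004)  # fixed seed so the sketch is repeatable

# Hypothetical rosters: 50 residents per village, some with jaundice history.
villages = {
    name: [
        {"id": f"{name}_resident_{i}", "jaundice_history": i % 10 == 0}
        for i in range(50)
    ]
    for name in ("village_A", "village_B", "village_C")
}

# Hypothetical case-patients and their home villages.
case_patients = [
    ("case_1", "village_A"),
    ("case_2", "village_B"),
    ("case_3", "village_A"),
]

already_used = set()
matched_controls = {}
for case_id, village in case_patients:
    # Eligible controls: same village, no jaundice history, not yet chosen.
    eligible = [
        resident["id"]
        for resident in villages[village]
        if not resident["jaundice_history"] and resident["id"] not in already_used
    ]
    controls = rng.sample(eligible, 2)
    already_used.update(controls)
    matched_controls[case_id] = controls

for case_id, controls in matched_controls.items():
    print(case_id, controls)
```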
Creative Problem Solving and
Creative Discovery Require Practice
Science is More than Decoding Textbooks
http://www.ted.com/talks/dan_meyer_math_curriculum_makeover.html
Our Focus is on Understanding
The CPR analogy does not apply;
memorizing recipes is not the same as understanding.
Higher learning should prepare you for science and citizenship
Clean “textbook” examples are rare
Colleges have a responsibility to develop higher thinking
- Creatively apply knowledge, integrate ideas, understand concepts, evaluate new
information, and utilize knowledge in new and ambiguous situations.
Presentation of Material in this Course
Direct instruction builds content knowledge for simple topics
Hands-on, active-learning, student-centered classrooms have been demonstrated
to be superior in promoting understanding in science and statistics courses.
Direct instruction reduces risk to the classroom flow.
Student-centered classrooms are dynamic; be flexible and expect crashes.
Exams
Multiple-choice is inexpensive, but poor at recording students’ reasoning.
Self-evaluation is a weak spot for most Mesa students.
"Statistical thinking will one day be as necessary for efficient citizenship
as the ability to read and write." H. G. Wells
Context of Instructional
Choices
Pedagogical Considerations