03 - DePaul University


Producing Data
Design of Experiments
IPS Chapter 3.1
© 2009 W.H. Freeman and Company
Objectives (IPS Chapter 3.1)
Design of experiments

Anecdotal and available data

Comparative experiments

Randomization

Randomized comparative experiments

Cautions about experimentation

Matched pairs designs

Block designs
Obtaining data
Available data are data that were produced in the past for some other
purpose but that may help answer a present question inexpensively.
The library and the Internet are sources of available data.
Look for primary data…
For example, government statistical offices are the primary source for
demographic, economic, and social data (visit the Fed-Stats site at
www.fedstats.gov).
A new 4-letter (or 9-letter) Word: Anecdotal
Anecdotal evidence refers to isolated stories / incidents / situations,
etc. These are the kinds of things we hear from friends, case studies in the press, “crazy coincidences,” etc.
Anecdotes are based on selected individual cases, which we tend
to remember because they are often unusual in some way.
There is a very good chance that anecdotes are in no way
representative of any larger group of cases.
Beware of drawing conclusions from personal experiences or hearsay!!
Anecdote:

Smoking doesn't cause lung cancer: "My grandmother lived to 95,
smoked constantly, and didn't die of lung cancer."


This anecdote shows that not all smokers die of lung cancer, but it ignores
the data showing that about 90 percent of lung cancer cases occur in smokers.
People wearing top hats seemed to live longer. This claim was
supported by a great deal of anecdotal evidence: “My grandpa wore
a top hat and lived until 97 years old!”

People who wore top hats were usually richer, and therefore could afford better
food, shelter, sanitation, and medical care. A broader study that took
people's income into account showed the claim to be false.
Population versus sample

Population: The entire group of individuals in which we are interested but usually can’t assess directly.
Example: all humans, all working-age people in California, all crickets.

Sample: The part of the population we actually examine and for which we do have data. How well the sample represents the population depends on the sample design.

[Figure: a sample drawn from a larger population]

A parameter is a number describing a characteristic of the population.

A statistic is a number describing a characteristic of a sample.
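A minimal Python sketch of the distinction, assuming a made-up population of 10,000 ages (hypothetical data, since in practice the population is usually too large to measure): the population mean is a parameter, and the mean of a random sample of 50 is a statistic that estimates it.

```python
import random
import statistics

random.seed(1)

# Hypothetical population: ages of 10,000 made-up individuals (illustration only)
population = [random.randint(18, 65) for _ in range(10_000)]

# Parameter: a number describing the whole population (usually unknown in practice)
mu = statistics.mean(population)

# Statistic: a number describing one sample drawn from that population
sample = random.sample(population, 50)
x_bar = statistics.mean(sample)

print(f"parameter (population mean mu): {mu:.2f}")
print(f"statistic (sample mean x-bar):  {x_bar:.2f}")
```

Drawing another sample would change `x_bar`, while `mu` stays fixed.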
Observational study: Record data on individuals without attempting
to influence the responses. You simply observe.
Example: Based on observations you make in nature,
you suspect that female crickets choose their
mates on the basis of their health.  Observe
health of male crickets that mated.
Experimental study: Deliberately impose a treatment on individuals
and record their responses. Influential factors can be controlled.
Example: Deliberately infect some males with
intestinal parasites and see whether females
tend to choose healthy rather than ill males.
Observational studies vs. Experiments
• Observational studies are essential sources of data on a variety of
topics. However, if our goal is to understand cause and effect, then
experiments are the only source of fully convincing data.
• Two variables are confounded when their effects on a response
variable cannot be distinguished from each other. This is similar to the
idea of a lurking variable.
• Example: If we simply observe cell phone use and brain cancer, any
effect of radiation on the occurrence of brain cancer is confounded
with lurking variables such as age, occupation, and place of
residence.
• Well-designed experiments take steps to defeat confounding.
(Terminology)

The individuals in an experiment are the experimental units. If they
are human, we call them subjects.

In an experiment, we do something to the subjects and measure the
response. The “something” we do is called a treatment; the explanatory
variable whose values define the treatments is called a factor.

The factor may be the administration of a drug.

One group of people may be placed on a diet/exercise program for six
months (treatment), and their blood pressure (response variable) would
be compared with that of people who did not diet or exercise.

If the experiment involves giving two different doses of a drug, we
say that we are testing two levels of the factor.

A response to a treatment is statistically significant if it is larger
than you would expect by chance (due to random variation among
the subjects). We will learn how to determine this later.
In a study of sickle cell anemia, 150 patients were given the drug
hydroxyurea, and 150 were given a placebo (dummy pill). The researchers
counted the episodes of pain in each subject. Identify:
• The subjects: the patients (all 300)
• The factor / treatments: hydroxyurea and placebo
• The response variable: episodes of pain
** Comparative experiments
Experiments are comparative in nature: We compare the response to a
treatment to:
• Another treatment
• No treatment (a control)
• An older / original treatment (another form of control)
• A placebo (another form of control)
• Any combination of the above
*** A control is a group to which the new treatment is NOT administered. It
serves as a reference mark for comparison (e.g., a group of subjects that do not
receive any drug or pill of any kind).
A placebo is a fake treatment, such as a sugar pill. This is to test the hypothesis
that the response is due to the actual treatment and not to the subject’s belief
that they were treated.
*** Without a control group, you should be very suspicious about any conclusions
drawn from the experiment.
About the placebo effect
The “placebo effect” is an improvement in health not due to any
treatment, but only to the patient’s belief that he or she will improve.
The “placebo effect” is not well understood, but it is believed to have
therapeutic results on up to a whopping 35% of patients.
It can sometimes ease the symptoms of a variety of ills, from asthma to
pain to high blood pressure, and even to heart attacks.
An opposite, or “negative placebo effect,” has been observed when
patients believe their health will get worse.
The most famous, and maybe most powerful, placebo
is a kiss / blow / hug delivered to a crying child.
BIAS: Caution about experimentation
The design of a study is biased if it systematically favors certain outcomes.
The best way to exclude biases from an experiment is to randomize
the design. Both the individuals and treatments are assigned
randomly.
Let’s play: Find the Bias
What toothpaste do people prefer?

Experiment: In order to determine which brand of toothpaste
Americans prefer, researchers wait outside of Walgreens and ask
everyone who bought toothpaste which brand they preferred. What
are some biases present in this experiment?

Potential biases:
• Colgate is on sale
• Crest just had an advertising blitz during the Super Bowl
• Oprah mentioned in an interview that she likes Aquafresh
* How can we reduce bias?
A major objective in research.
• A double-blind experiment is one in which neither the subjects nor the
experimenter knows which individuals got which treatment until the experiment
is completed. The goal is to avoid forms of placebo effects and biases based
on interpretation.
• Randomize (discussed later)
• Have a control group, often (though not always) by using a placebo
• Replicate: The best way to make sure your conclusions are robust is to
replicate your experiment, that is, do it over. Replication helps ensure that particular
results are not due to uncontrolled factors or errors of manipulation.
The ideal experiment: A “randomized, double-blind,
controlled” trial.
Assume that all data is biased – it’s just a
matter of degree…
A reputable journal will only publish studies that demonstrate a significant
effort to minimize all forms of bias.

Thoughts?

Survey: Obtained 36,000 physician office fax numbers, delivered ~16,000 faxes
and received ~700 replies. Their respondents were mostly private practice
physicians, and mostly mid-career. (Source:
http://www.dpmafoundation.org/physician-attitudes-on-medicine.html).


The Doctor Patient Medical Association (DPMA) and the Patient Power Alliance
(PPA) work to repeal health care reform and call themselves "a nonpartisan
association of doctors and patients dedicated to preserving free choice in
medicine." The organization is a member of the National Tea Party Federation
and the "American Grassroots Coalition."
Note which magazine published this article - hardly a fly-by-night magazine!
Lack of realism aka ‘Validity’
Lack of realism is a serious weakness of experimentation. The
subjects or treatments or setting of an experiment may not realistically
duplicate the conditions we really want to study. In that case, we
cannot generalize about the conclusions of the experiment.
Is the treatment appropriate for the response you want to study?
Is studying the effects of eating red meat on cholesterol values in a group of
middle-aged men a realistic way to study factors affecting heart disease
problems in humans?

What about studying the effects of hair spray
on rats to determine what will happen
to women with big hair?

Designing “controlled” experiments
Sir Ronald Fisher, the “father of statistics,” was
sent to Rothamsted Agricultural Station in the
United Kingdom to evaluate the success of
various fertilizer treatments.
Fisher found that the data from experiments that had been going on for decades
was basically worthless because of poor experimental design.

Fertilizer had been applied to a field one year and not the following year, in order to
compare the yield of grain produced with vs. without the fertilizer.

What are the flaws in this research methodology?
• It may have rained more or been sunnier during different years.
• The seeds used may have differed between years as well.

In one case, fertilizer was applied to one field and not applied to a nearby field in the
same year.
BUT:
• The two fields might have had different soil, water, drainage, and history (that
is, the two fields may have been farmed differently in previous years).

Too many factors affecting the results were “uncontrolled.”
** Any suggestions for a valid control?
Fisher’s solution:
“Randomized comparative experiments”

In the same field and same year, apply fertilizer to randomly spaced plots
within the field. Analyze plants from similarly treated plots together.

This minimizes the effect of variation within the field (in drainage and soil
composition) on yield, and also controls for weather.

[Figure: a field divided into plots, with the fertilized plots (marked "F") scattered at random]
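The layout can be sketched in Python. This is a hypothetical 6 × 8 field of 48 plots (the grid size and the one-half split are assumptions for illustration, not Fisher's actual plan); `random.sample` picks which plots get fertilizer, so no one chooses the plots by hand.

```python
import random

ROWS, COLS = 6, 8  # hypothetical field: 48 equal-sized plots
rng = random.Random(2009)

plots = [(r, c) for r in range(ROWS) for c in range(COLS)]

# Randomly choose half of the plots to receive fertilizer
fertilized = set(rng.sample(plots, k=len(plots) // 2))

# Print the field: "F" = fertilized plot, "." = unfertilized (control) plot
for r in range(ROWS):
    print(" ".join("F" if (r, c) in fertilized else "." for c in range(COLS)))
```

Yields from the "F" plots are then analyzed together and compared with the "." plots; because both groups share the same field and season, differences in soil and weather are spread across the two groups by chance.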
The control and experimental treatments must
be given to similar groups of individuals

An experiment to compare the growth rates of two varieties of corn:
Be sure that both varieties of corn are planted in equally fertile soil.

Testing one cancer treatment vs. another:
Both treatments must be given to patients with similar severity of disease.
For example, don’t give one of the drugs to a more seriously ill group of patients.
* Randomization
Any decent study will randomize which subjects are in the control group
vs which are in the experimental group.
For example, if you are comparing a new cancer treatment vs the ‘older’
treatment, which patients get the new treatment and which get the older
treatment must be decided at random.
Principles of Experimental Design
The KEY ideas of experimental design:
• Control the effects of lurking variables on the response by
comparing the treatment you are interested in with a second group
that receives either a placebo or a different treatment.
• Randomize: use some kind of randomization technique to assign
subjects to treatments. In other words, the researcher does not pick
who goes in the treatment group and who goes in the control group.
• Replicate each treatment on enough subjects to reduce chance
variation in the results.
• Blind: This is another major factor, particularly in medical trials.
Neither the experimenter nor the subjects should be aware of which
subjects are receiving the experimental treatment and which
subjects are receiving the control treatment.
Statistical Significance


Statistical Significance: An observed effect so large that it would
rarely occur by chance is called statistically significant.
More on this later…
Completely randomized designs
Completely randomized experimental designs:
Individuals are randomly assigned to groups, then
the groups are randomly assigned to treatments.
Which of the two groups is the control group?
Group 1 is the “experimental group”
Group 2 is the “control group”
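A minimal Python sketch of a completely randomized design, assuming 20 hypothetical subjects and two treatments: the subjects are shuffled and dealt into groups, and then the groups are paired with treatments in random order. The function name `completely_randomized` is illustrative, not from any library.

```python
import random

def completely_randomized(subjects, treatments, seed=None):
    """Randomly assign individuals to groups, then groups to treatments."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)                         # random order of individuals
    k = len(treatments)
    groups = [shuffled[i::k] for i in range(k)]   # deal into k near-equal groups
    order = list(treatments)
    rng.shuffle(order)                            # randomly match groups to treatments
    return dict(zip(order, groups))

subjects = [f"subject{i:02d}" for i in range(1, 21)]
assignment = completely_randomized(subjects, ["new treatment", "control"], seed=7)

for treatment, group in assignment.items():
    print(treatment, len(group), group[:3], "...")
```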
** Block aka “stratified” designs
In a block, or stratified, design, subjects are divided into groups, or
blocks, prior to experiments, to test hypotheses (i.e. theories) about
differences between the groups.
For example, suppose you are evaluating three different acne treatments
on a group of teenagers between 14 and 16 years old. You would want to
randomize into a minimum of three groups. In fact, you should randomize into
four groups in order to include a control group.
We’ll stick to three for now, but failure to include a control is a major factor to consider when
you are drawing conclusions or making decisions.
What is another potential flaw/bias?
Gender! At this age, there are all kinds of hormonal changes
affecting teenagers, and they affect males vs females differently.
A stratified experiment:
So, we divide the subjects into groups, or blocks, prior to the
experiments. This allows us to test hypotheses about differences
between the groups.
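A sketch of randomizing within blocks for the acne example, assuming 12 hypothetical girls and 12 hypothetical boys and the four-group version (three treatments plus a control). Randomization happens separately inside each gender block, so every treatment is tried on both genders; `block_randomize` is a made-up helper name.

```python
import random
from collections import Counter

def block_randomize(subjects, block_of, treatments, seed=None):
    """Randomly assign treatments separately within each block."""
    rng = random.Random(seed)
    blocks = {}
    for s in subjects:
        blocks.setdefault(block_of(s), []).append(s)
    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)          # random order within the block
        order = list(treatments)
        rng.shuffle(order)            # so no treatment always gets any leftovers
        for i, s in enumerate(members):
            assignment[s] = order[i % len(order)]
    return assignment

teens = [(f"F{i}", "female") for i in range(12)] + [(f"M{i}", "male") for i in range(12)]
groups = ["treatment A", "treatment B", "treatment C", "control"]
plan = block_randomize(teens, lambda t: t[1], groups, seed=3)

for gender in ("female", "male"):
    print(gender, Counter(plan[t] for t in teens if t[1] == gender))
```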
Identify the blocks:
A researcher wants to see if there is a significant difference in
resting pulse rates between men and women. Twenty-eight
men and 24 women had their pulse rate measured at rest in
the lab. How would you stratify?

Stratify by gender: men and women form the two blocks.
A researcher wants to determine if BST, a hormone intended to spur greater
milk production, works as advertised. A farming research facility makes available
60 dairy cattle. How would you stratify and run this experiment?
• Randomize the cattle into 2 groups.
• One group gets a BST injection.
• The other group, the “control,” gets a fake injection.
• Compare the change in milk production (before vs. after the injection) between the two groups.
Matched pairs designs
Matched pairs: Try to find subjects that are closely matched in some
ways (e.g., same sex, same race, similar height, similar weight, similar
age). Within each matched pair or group, randomly assign which individuals receive
which treatment.
For example, in a study comparing two cancer treatments, if there are 40 African
Americans in the study, randomly assign 20 to the new treatment group and
20 to the control group.
The most closely matched pair studies use identical twins.
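For matched pairs, randomization happens inside each pair: a fair coin decides which member gets the new treatment. A minimal sketch, assuming hypothetical twin pairs; `assign_matched_pairs` is an illustrative name.

```python
import random

def assign_matched_pairs(pairs, seed=None):
    """Within each matched pair, flip a fair coin to pick who gets the treatment."""
    rng = random.Random(seed)
    plan = []
    for a, b in pairs:
        if rng.random() < 0.5:   # coin flip: swap roles half the time
            a, b = b, a
        plan.append({"treatment": a, "control": b})
    return plan

# Hypothetical pairs of identical twins
twins = [("Ann", "Amy"), ("Ben", "Bob"), ("Cal", "Cy")]
for pair in assign_matched_pairs(twins, seed=1):
    print(pair)
```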
Matched pairs in medical experimentation

Medical studies will frequently stratify on matched groups such as:
• age
• ethnicity
• socioeconomic status
• geographic location
• employment
• etc.
This helps minimize/mitigate the effect of lurking variables.
See the JAMA claudication study on the course page.
Blinding



Eg: In hospital studies, a patient is sometimes given a bar-code
which they wear on a wristband. The medications also are not
labeled, and instead have a bar-code. The researcher/nurse giving
the medication will scan the wristband and match it with an
appropriate medication bar-code. So neither the patient nor the
researcher knows if they are getting the treatment or the
placebo/control.
If the patient doesn’t know whether they are receiving the treatment or the
control, the study is said to be ‘blinded’.
If the researcher/physician also doesn’t know, the study is said to be
double-blinded. This is the preferred design.
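The bar-code scheme can be sketched in Python. This is a hypothetical illustration, not any hospital's actual system: after random assignment, each patient gets an opaque random code (printed as a bar code on the wristband and on the matching medication packet), and the key linking codes to "treatment" or "placebo" is held by someone outside the care team (assumed here to be a pharmacist) until the study ends.

```python
import random
import secrets

def blinded_allocation(patient_ids, seed=None):
    """Randomly split patients into treatment and placebo, then hide
    each assignment behind an opaque code."""
    rng = random.Random(seed)
    ids = list(patient_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    wristband = {}   # patient -> code (what the nurse scans)
    key = {}         # code -> arm (held by a third party, NOT the care team)
    for i, pid in enumerate(ids):
        code = secrets.token_hex(4)
        while code in key:           # guard against a (very unlikely) duplicate code
            code = secrets.token_hex(4)
        wristband[pid] = code
        key[code] = "treatment" if i < half else "placebo"
    return wristband, key

wristband, key = blinded_allocation([f"patient{i}" for i in range(10)], seed=4)
print(wristband)  # the care team sees only codes, never the arm
```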
Weaknesses in experimental design

There is no such thing as the perfect experiment. Your goal is to
decide whether any of the limitations in the design are significant
enough to limit the validity of the conclusions.

Outside of reputable journals, serious design flaws are extremely common!
Example of a randomized, double-blind controlled trial
A major cancer center is excited to hear about a promising new treatment for
pancreatic cancer. So:
• They contact all of the patients in their files with this condition.
• They find 408 patients who agree to be in their trial.
• They exclude from their trial 11 patients who say they are moving out of state, since
that group cannot be monitored by the center.
• They exclude 43 others from the trial because they have other significant
medical illnesses, which would be confounding.
• They stratify the remaining group based on gender and age.
• Stratification: Now they have 354 patients remaining.
  Gender: 190 are female and 164 are male.
  Age: They further stratify into the following age groups: 20-40 / 40-60 / 60-80.
• Randomization and Control: Within each of these 6 groups, the patients are
randomly assigned to receive their usual treatment (the control group) or the
new treatment (the experimental group).
• Double Blind: Neither the patients nor the physicians know which patient is
receiving which treatment. They will not find out until the study has been
completed.
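The counts in this example can be checked with a few lines of Python; the last part simply lists the 6 gender-by-age strata within which the randomization takes place.

```python
from itertools import product

agreed = 408
moving_out = 11       # excluded: cannot be monitored
other_illness = 43    # excluded: confounding illnesses
remaining = agreed - moving_out - other_illness
print(remaining)      # 354

female, male = 190, 164
print(female + male == remaining)   # True: the gender split accounts for everyone

genders = ["female", "male"]
age_groups = ["20-40", "40-60", "60-80"]
strata = list(product(genders, age_groups))
print(len(strata))    # 6
```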
Very good, but there are still some flaws in the design of this study…
Limitations/Flaws in the pancreatic cancer study?
• Stage of cancer
• Choice of age groups
• Lack of placebo control (ethics)
Ethics:
Why couldn’t we use a placebo as the control?
Giving seriously ill cancer patients a placebo in place of their usual treatment would
withhold care from them, which is unethical. For the same reason, if the new treatment
showed staggeringly effective results, the experiment would be halted and all patients
would probably be switched to the new drug. This is very rare, however.
Producing Data
Sampling designs – Intro to Inference
IPS Chapters 3.2 and 3.3
Objectives (IPS Chapters 3.2 and 3.3)
Sampling designs; Toward statistical inference

Sampling methods

Simple random samples

Stratified samples

Caution about sampling surveys

Population versus sample

Toward statistical inference

Sampling variability

Capture–recapture sampling
* Population vs Sample
• A political scientist wants to know what % of college students consider
themselves conservatives.
• An automaker hires a market research firm to learn what % of adults
18-35 recall seeing TV ads for a new SUV.
• Government economists want to know about average household
income.
It would be impossible to ask these questions of every single college
student / adult / household. Instead, we ask a sample of college
students / adults / households.
• The population refers to the entire group that we want information about.
• The sample is the small section of the population that we actually examine.
• The GOAL of a study is for the information we derive from the sample
to generalize to the population as accurately as possible.
Identify the population of interest for the three examples above.
If you’re biased and you know it…


Biases are everywhere
It is very important to be aware of the different types of bias and
where they tend to show up
Two examples of bias seen in sampling methods
1. Convenience sampling: Just ask whoever is around.
Example: “Man on the street” survey (cheap, convenient, often quite
opinionated or emotional; now very popular with TV “journalism”).
Which men, and on which street?
• Ask about gun control or legalizing marijuana “on the street” in
Berkeley or in some small town in Idaho and you would probably get
totally different answers.
• Even within an area, answers would probably differ if you did the
survey outside a high school or a country western bar.
Bias: Opinions limited to individuals who are present.
2. Voluntary Response Sampling:

Individuals choose to be involved. These samples are very
susceptible to being biased because different people are motivated
to respond or not. Often called “public opinion polls,” these are not
considered valid or scientific.

Bias: Sample design systematically favors a particular outcome.
Bias present? Ann Landers summarizing responses of her
readers:
“70% of (10,000) parents wrote in to say that having kids was
not worth it—if they had to do it over again, they wouldn’t. “
Bias: Most letters to newspapers are written by disgruntled people. A
later sample found the exact opposite result! Incidentally, it turned out
that this sample was also very flawed.
CNN Online surveys – Is there a bias?
Bias: People have to care enough about an issue to bother replying. This sample
is probably a combination of people who hate “wasting the taxpayers’ money” and
“animal lovers.”
Ideally: RANDOM Sampling
Random sampling:
Individuals are randomly selected. No one group should be overrepresented.
Random sampling avoids several potential sources of bias.
* Simple random samples
A Simple Random Sample (SRS) is made of randomly selected
individuals. Each individual in the population has the same probability of
being in the sample, and all possible samples of size n have the same
chance of being drawn. You’ll see ‘SRS’ frequently throughout the
course.
A sample that is not random is essentially useless for drawing
conclusions about the population.
‘Nuff said.
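In Python, `random.sample` draws without replacement, with every set of k individuals equally likely, which is the SRS property. A minimal sketch, assuming a hypothetical numbered list (sampling frame) of 22,000 students:

```python
import random

# Hypothetical sampling frame: one entry for every individual in the population
frame = [f"student{i:05d}" for i in range(1, 22_001)]

rng = random.Random(42)
srs = rng.sample(frame, k=100)   # SRS of size n = 100, drawn without replacement

print(srs[:5])
```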
Stratified samples
A common and important form of random sampling:
A stratified random sample is essentially a series of SRSs performed
on subgroups of a given population. The subgroups are chosen to
contain all the individuals with a certain characteristic. For example:
• Divide the population of DePaul students into males and females.
• Divide the population of California by major ethnic group.
• Divide the counties in America into urban or rural based on population
density.
* The SRSs taken within the groups of a stratified random sample need
not be of the same size. For example:
• A stratified random sample of students may end up with 100 male and 150
female students.
• A stratified random sample of 100 Californians will likely have a
higher percentage of Hispanics than, say, an SRS taken in Arkansas.
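A stratified random sample is an SRS taken separately within each stratum. A minimal sketch, assuming a hypothetical student body labeled by gender and the 100 male / 150 female split from the example above; `stratified_sample` is an illustrative helper name.

```python
import random
from collections import Counter

def stratified_sample(population, stratum_of, sizes, seed=None):
    """Take a separate SRS within each stratum; `sizes` maps each
    stratum to how many individuals to draw from it (sizes may differ)."""
    rng = random.Random(seed)
    strata = {}
    for person in population:
        strata.setdefault(stratum_of(person), []).append(person)
    sample = []
    for stratum, n in sizes.items():
        sample.extend(rng.sample(strata[stratum], n))   # SRS within this stratum
    return sample

# Hypothetical student body of 5,000, labeled by gender
students = [(f"s{i}", "male" if i % 2 else "female") for i in range(5000)]
sample = stratified_sample(students, lambda s: s[1], {"male": 100, "female": 150}, seed=0)

print(Counter(s[1] for s in sample))
```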
Multistage samples use multiple stages/levels of stratification. They are often
used by the government to obtain information about the U.S. population.
Example: first sample both urban and rural areas, then people in different income
groups within the urban and rural areas, and then, within those strata, individuals
of different ethnicities (e.g., to study abortion rates or obesity rates).
Data are obtained by taking an SRS from each substratum.
Statistical analysis for multistage samples is more complex than for an SRS.
* Common biases in surveys:

Nonresponse Bias: People who feel they have something to hide
or who don’t like their privacy being invaded probably won’t answer.
Yet they are absolutely part of the population under study!

Remember that the most important objective of a good sample is for that
sample to accurately represent the population.

Response Bias: Fancy term for lying. This is particularly important
when the questions are very personal (e.g., “How much do you
drink?”)

Wording effects: Questions worded like “Do you agree that it
is awful that…” are prompting you to give a particular response.

Etc., etc.: Bias can show up in all kinds of unexpected ways.

Another biggie: Undercoverage
Occurs when parts of the population are left out in the
process of choosing the sample.
Because the U.S. Census goes “house to house,” homeless people
are not represented. Illegal immigrants also avoid being counted.
Geographical districts with a lack of coverage tend to be poor.
Representatives from wealthy areas typically oppose statistical
adjustment of the census.
Historically, clinical trials have avoided including women in
their studies because of their periods and the chance of
pregnancy. This means that medical treatments were not
appropriately tested for women. This problem is slowly
being recognized and addressed.
1. To assess the opinion of students at the Ohio State University about campus
safety, a reporter interviews 15 students he meets walking on the campus late
at night who are willing to give their opinion.
• What is the sample here? What is the population? Why?
  - All those students walking on campus late at night
  - All students at universities with safety issues
  - The 15 students interviewed
  - All students approached by the reporter
2. An SRS of 1200 adult Americans is selected and asked: “In light of the huge
national deficit, should the government at this time spend additional money to
establish a national system of health insurance?” Thirty-nine percent of those
responding answered yes.
• What can you say about this survey?
If it is truly an SRS, then we can say that the sampling process is sound.
However, the wording is biased. The results probably understate the
percentage of people who do favor a system of national health
insurance.
Should you trust the results of the first survey? Of the second? Why?

Flaws?

To assess the opinion of students at the Ohio State University about
campus safety, a reporter interviews 15 students he meets walking
on the campus late at night who are willing to give their opinion.



People who feel safe are more likely to walk out at night. People who
don’t feel safe probably won’t do so as often. They would be underrepresented in the sample.
Non-Response: Entirely possible that some people would hurry away or
refuse to answer if someone approaches them with a question at night.
Others?
** Statistical Inference
Inferential statistics refers to the process of drawing conclusions about a
population based on the information determined from a sample.
Inference is the bread-and-butter of statistics.
However, ALL of the following concepts are very important to bear in mind:

Your estimate of the population is only as good as your sampling design. Work
hard to eliminate biases.

Trying to determine the relationship between number of beers and blood alcohol content (BAC)
by sampling a group of NFL athletes will not properly generalize to the population of all Americans.

** Your sample is only an estimate, and if you randomly sampled again you
would probably get a somewhat different result.

Calculate the average height of 20 people. Then do it again: you will almost certainly get
a different result.

The bigger the sample the better. However, big sample sizes have their own
problems.

E.g., studies with large sample sizes can become very expensive.
* Sampling variability
Recall that your sample is only an estimate. Each time we take a
random sample from a population, we are likely to get a different set of
individuals and, therefore, calculate a different statistic. This is called
sampling variability.
The good news is that, if we take lots of random samples of the same size from
a given population, the variation from sample to sample (the sampling
distribution) will follow a predictable pattern. All of statistical inference is
based on this knowledge.
Be sure to understand what is meant by
a ‘sampling distribution’…

Refers to the results taken from many, many samples.
Example:
• A sample of the heights of 20 college-aged women would give one
result. Of course, this is only one sample. If we repeat this sample,
we’d almost certainly obtain a different result.
• After repeating the sample once, we have 2 results. If we repeat
this sample 100 times, we have 100 results.
• If we plot these 100 results on a histogram, we get an approximation of the
sampling distribution.
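The repeated-sampling idea can be simulated. This sketch assumes, purely for illustration, that heights of college-aged women are roughly normal with mean 64.5 inches and standard deviation 2.5 inches (made-up values, not data from the course). It takes 100 samples of 20 women each and prints a text histogram of the 100 sample means, an approximation of the sampling distribution.

```python
import random
import statistics
from collections import Counter

rng = random.Random(2009)
MU, SIGMA = 64.5, 2.5   # hypothetical population of heights (inches)

def sample_mean(n):
    """Mean height of one random sample of n women."""
    return statistics.mean(rng.gauss(MU, SIGMA) for _ in range(n))

# Repeat the same sample design 100 times
means = [sample_mean(20) for _ in range(100)]

# Crude text histogram of the 100 sample means, binned to the nearest 0.25 inch
bins = Counter(round(m * 4) / 4 for m in means)
for b in sorted(bins):
    print(f"{b:6.2f} | {'#' * bins[b]}")
```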
Sampling variability contd
The variability of a statistic is described by the spread of its sampling
distribution. Recall that each sample will give a different result. So what
if you took many, many samples of people’s heights and calculated the
mean from each sample?
Answer: You would get a distribution of sample values. (Specifically,
you’d get a distribution of sample means).
In a later lecture, we will show that if you graphed all of these means on
a histogram, they would approximately follow a normal distribution pattern.
Recall that a normal distribution has a spread (aka variability), which we quantify
as the standard deviation. How wide the spread (sd) is depends on the sampling
design and the sample size. Larger sample sizes lead to lower spread.
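A quick simulation shows the spread of the sample mean shrinking as n grows. It assumes a hypothetical normal population of heights (mean 64.5, sd 2.5 inches, made-up values) and compares the simulated standard deviation of the sample means with σ/√n, the standard formula for the standard deviation of a sample mean under random sampling.

```python
import math
import random
import statistics

rng = random.Random(7)
MU, SIGMA = 64.5, 2.5   # hypothetical population of heights (inches)

def spread_of_means(n, reps=1000):
    """Standard deviation of `reps` simulated sample means, each from n draws."""
    means = [statistics.fmean(rng.gauss(MU, SIGMA) for _ in range(n)) for _ in range(reps)]
    return statistics.stdev(means)

for n in (20, 80, 320):
    print(f"n = {n:3d}: simulated sd = {spread_of_means(n):.3f}, "
          f"sigma / sqrt(n) = {SIGMA / math.sqrt(n):.3f}")
```

Quadrupling the sample size roughly halves the spread, because the spread scales with 1/√n.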
Sampling variability contd

Statistics from large samples are almost always close estimates of the
true population parameter.

Of course, this only applies to random samples. If there is bias, the results
will NOT be accurate!
I repeat: The higher the bias, the less trust you can place in the results.
Remember: There will always be bias. A good study, however, will do
everything it can to minimize it.
Eg: “QuickVote” online surveys. They are worthless
no matter how many people participate because they
use a voluntary sampling design and not random
sampling.
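A simulation makes the point that size does not cure bias. The setup is entirely hypothetical: 40% of a population of 200,000 holds a "yes" opinion, and (by assumption) people holding "no" are three times as likely to volunteer a response as people holding "yes". The huge voluntary sample misses the true proportion badly, while a small SRS lands close to it.

```python
import random

rng = random.Random(11)

# Hypothetical population: about 40% hold opinion "yes" (True)
N = 200_000
population = [rng.random() < 0.40 for _ in range(N)]
true_p = sum(population) / N

# Voluntary response: ASSUME "no" holders are 3x as motivated to reply
reply_prob = {True: 0.05, False: 0.15}
volunteers = [x for x in population if rng.random() < reply_prob[x]]

# Small simple random sample from the same population
srs = rng.sample(population, 500)

print(f"true proportion         : {true_p:.3f}")
print(f"voluntary (n={len(volunteers)}): {sum(volunteers) / len(volunteers):.3f}")
print(f"SRS (n=500)             : {sum(srs) / len(srs):.3f}")
```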
A very effective way to reduce sampling variability is
to use a larger sample size
However, large samples are not always attainable.

Sometimes the cost, difficulty, or preciousness of what is studied drastically limits
the possible sample size.

Blood samples/biopsies: No more than a handful of repetitions
acceptable. We often even make do with just one.

Opinion polls have a limited sample size due to time and cost of
operation. During election times though, sample sizes are increased
for better accuracy.

There are several techniques used to get around the problem of a
limited sample size. One is discussed on the following slide.
Producing Data
Ethics
IPS Chapter 3.4
Objectives (IPS Chapter 3.4)
Ethics

Institutional review boards

Informed consent

Confidentiality

Clinical trials

Behavioral and social science experiments
Institutional review boards (IRBs)

An organization that carries out the study often has an institutional
review board that reviews all planned studies in advance in order to
protect the subjects from possible harm.


Every medical school / teaching hospital in the United States has an IRB.
“The purpose of an institutional review board is to protect the rights and
welfare of human subjects (including patients) recruited to participate in
research activities.”

The institutional review board:
• reviews the plan of study
• can require changes
• reviews the consent form
• monitors progress at regular intervals

Sometimes they will close a study right in the middle!
Informed consent

All subjects must give their informed consent before data are
collected.

Subjects must be informed in advance about the nature of a study
and any risk of harm it might bring.

Subjects must then consent in writing.

Who can’t give fully informed, voluntary consent?
• prison inmates
• very young children
• people with mental disorders
Confidentiality

All individual data must be kept confidential. Only statistical
summaries may be made public.

Confidentiality is not the same as anonymity. Anonymity prevents
follow-ups to improve non-response or inform subjects of results.

Separate the identity of the subjects from the rest of the data
immediately!
Clinical trials

Clinical trials study the effectiveness of medical treatments on actual
patients – these treatments can harm as well as heal.

Points for a discussion:


Randomized comparative experiments are the only way to see the
true effects of new treatments.

Most benefits of clinical trials go to future patients. We must
balance future benefits against present risks.

The interests of the subject must always prevail over the interests
of science and society.
In the 1930s, the Public Health Service Tuskegee study recruited
399 poor Black men with syphilis and 201 without the disease in order to
observe how syphilis progressed without treatment. The Public
Health Service prevented any treatment until word leaked out and
forced an end to the study in the 1970s.
Behavioral and social science experiments

Many behavioral experiments rely on hiding the true purpose of the
study.

Subjects would change their behavior if told in advance what
investigators were looking for. (This is why there is nothing “real”
about Reality TV).

The “Ethical Principles” of the American Psychological Association
require consent unless a study merely observes behavior in a public
space.
Example – Claudication Study (on web page)
Methods: The first thing they mention is IRB approval; randomized; design: 3 groups; location
(Northwestern).
Inclusion & Exclusion Criteria: defining the population
Measurement: How they measured the results – sometimes straight-forward, sometimes can be a
huge and contentious issue. How do you measure pain symptoms? How do you measure
improvement?
Blinding: Obviously could not be double-blinded since patients knew their ‘treatment’. However,
researchers were blinded. They just saw the data results. They did not know which patients were in
which group as the experiment was going on.
Details: Many other issues and techniques employed by the study are explained in careful detail.
Stratifications (Blocks): Claudication vs No Claudication.
Control group: Nutritional consulting, regular meetings with data-gathering team, etc, but NO
exercise.
Outcomes: In particular, note the very frequent mention of p-values and confidence intervals. These
are very important, and we will be learning about them.
Charts and graphs:

p159: Breakdown of stratifications. Also note the ‘exclusion’ disclaimer at the bottom of the
graph. If you’re gonna leave people out of your analysis, you’d better explain why. In this case, 4
were left out in the end because they did not respond to follow-up.

Table 1, p.170: A careful breakdown and description of the people in each strata (block)
Conclusion: A study should at some point summarize the researchers’ recommendations on what
the study can tell us. In this study it is in the very last paragraph: “Physicians should recommend
supervised treadmill exercise programs for PAD patients regardless of whether they have classic
symptoms of intermittent claudication”.