Relationships Between Categorical Variables

Thought Questions
1. Suppose a news article claimed that drinking coffee doubled your risk of developing a
certain disease. Assume the statistic was based on legitimate, well-conducted research.
What additional information would you want about the risk before deciding whether to quit
drinking coffee?
(Hint: Does this statistic provide any information on your actual risk?)
2. A recent study estimated that the “relative risk” of a woman developing lung cancer if she
smoked was 27.9.
What do you think is meant by the term relative risk?
3. A study classified pregnant women according to whether they smoked and whether they were
able to get pregnant during the first cycle in which they tried to do so.
What do you think is the question of interest? Attempt to answer it. Here are the results:
                 Pregnancy Occurred After
             First Cycle   Two or More Cycles   Total
Smoker            29               71            100
Nonsmoker        198              288            486
Total            227              359            586
Displaying Relationships Between Categorical Variables: Contingency Tables
• Count the number of individuals who fall into each combination of categories.
• Present the counts in a table, called a contingency table.
• Each row and column combination = cell.
• Row = explanatory variable; column = response variable.
Example 1: Aspirin and Heart Attacks
Variable A = explanatory variable = aspirin or placebo
Variable B = response variable = heart attack or no heart attack
Contingency Table with explanatory as row variable, response as column variable, four cells.
           Heart Attack   No Heart Attack    Total
Aspirin         104           10,933        11,037
Placebo         189           10,845        11,034
Total           293           21,778        22,071
Conditional Percentages and Rates
Example: Find the Conditional (Row) Percentages
Aspirin Group:
Percentage who had heart attacks = 104/11,037 = 0.0094 or 0.94%
Placebo Group:
Percentage who had heart attacks = 189/11,034 = 0.0171 or 1.71%
Rate: the number of individuals per 1000 or per 10,000 or per 100,000.
Percentage: rate per 100
Example: Percentage and Rate Added
           Heart Attack   No Heart Attack    Total    Heart Attacks (%)   Rate per 1000
Aspirin         104           10,933        11,037          0.94               9.4
Placebo         189           10,845        11,034          1.71              17.1
Total           293           21,778        22,071
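The row percentages and rates above can be reproduced with a few lines of Python. This is a minimal sketch: the function name `conditional_summary` and the dictionary layout are illustrative choices, not part of the original slides.

```python
# Counts from the aspirin example: (heart attack, no heart attack) per group.
counts = {
    "Aspirin": (104, 10_933),
    "Placebo": (189, 10_845),
}

def conditional_summary(with_trait, without_trait, per=1000):
    """Return (percentage, rate per `per`) for one row of the table."""
    total = with_trait + without_trait
    proportion = with_trait / total
    return round(proportion * 100, 2), round(proportion * per, 1)

summary = {group: conditional_summary(*cells) for group, cells in counts.items()}
for group, (pct, rate) in summary.items():
    print(f"{group}: {pct}% had heart attacks, {rate} per 1000")
```

A percentage is a rate per 100, so the two numbers differ only in where the decimal point sits.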
Example: Ease of Pregnancy for Smokers and Nonsmokers (Retrospective Observational Study)
Variable A = explanatory variable = smoker or nonsmoker
Variable B = response variable = pregnant in first cycle or not
Time to Pregnancy for Smokers and Nonsmokers
A much higher percentage of nonsmokers (198/486 = 40.7%) than smokers (29/100 = 29%) were able to
get pregnant during the first cycle, but because this is an observational study we cannot conclude
that smoking caused a delay in getting pregnant.
Risk, Probability, and Odds
Percentage with trait = (number with trait/total)×100%
Proportion with trait = number with trait/total
Probability of having trait = number with trait/total
Risk of having trait = number with trait/total
Odds of having trait = (number with trait/number without trait) to 1
Relative Risk, Increased Risk, and Odds
A population contains 1000 individuals, of which 400 carry the gene for a disease.
Equivalent ways to express this proportion:
• Forty percent (40%) of all individuals carry the gene.
• The proportion who carry the gene is 0.40.
• The probability that someone carries the gene is 0.40.
• The risk of carrying the gene is 0.40.
• The odds of carrying the gene are 4 to 6 (or 2 to 3, or 2/3 to 1).
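The same five statements can be checked numerically. The sketch below uses exact fractions so the odds reduce to 2 to 3 automatically; the helper name `describe_trait` is hypothetical and used only for illustration.

```python
from fractions import Fraction

def describe_trait(with_trait, total):
    """Express one count as a percentage, a proportion, and odds."""
    without_trait = total - with_trait
    proportion = Fraction(with_trait, total)      # also the probability and the risk
    odds = Fraction(with_trait, without_trait)    # number with trait : number without
    return {
        "percentage": float(proportion * 100),
        "proportion": float(proportion),
        "odds": odds,
    }

gene = describe_trait(400, 1000)
print(f"{gene['percentage']:.0f}% carry the gene")
print(f"proportion = probability = risk = {gene['proportion']:.2f}")
print(f"odds = {gene['odds'].numerator} to {gene['odds'].denominator}")
```

Note that the odds divide by the number without the trait (600), while the proportion, probability, and risk all divide by the total (1000).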
Baseline Risk and Relative Risk
Baseline Risk: risk without treatment or behavior
• Can be difficult to find. Example: Risk of getting lung cancer if you don’t smoke.
• If placebo included, baseline risk = risk for placebo group.
Relative Risk: for two categories of the explanatory variable, the ratio of the risks of the
outcome in each category.
• Relative risk of 3: risk of developing disease for one group is 3 times what it is for
another group.
• Relative risk of 1: risk is same for both categories of the explanatory variable (or
both groups).
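As an illustration, the aspirin study from Example 1 includes a placebo group, so the placebo risk can serve as the baseline risk. The slides do not report this relative risk; the sketch below simply applies the definition to the counts given earlier, and the function names are illustrative.

```python
def risk(with_outcome, total):
    """Risk = number with the outcome / total in the group."""
    return with_outcome / total

def relative_risk(risk_group, risk_baseline):
    """Ratio of the risk in one group to the risk in the baseline group."""
    return risk_group / risk_baseline

baseline_risk = risk(189, 11_034)   # placebo group = baseline
aspirin_risk = risk(104, 11_037)    # aspirin group

rr_aspirin = relative_risk(aspirin_risk, baseline_risk)
print(f"relative risk (aspirin vs. placebo) = {rr_aspirin:.2f}")
```

A value below 1 means the risk in the aspirin group is lower than the baseline risk.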
Swedish Study: Effectiveness of Population-Based Service Screening With Mammography for
Women Ages 40 to 49 Years
RESULTS: During the study period, there were 803 breast cancer deaths in the study group (7.3
million person-years) and 1238 breast cancer deaths in the control group (8.8 million person-years).
The estimated RR (crude) for women aged 40-49 was 0.79 (95% CI, 0.72-0.86).
Study Group: 803/7,261,415 = 0.00011
Control Group: 1238/8,843,852 = 0.00014
Relative Risk (study group vs. control group) = 0.00011/0.00014 = 0.79
Relative Risk (control group vs. study group) = 0.00014/0.00011 = 1.27
Breast Cancer Death Rates: Study Group vs. Control Group
11 per 100,000 person-years versus 14 per 100,000 person-years
Person-Years Definition: The number of members of a population multiplied by the number of years
each has been observed or affected by a certain condition (for example, years of treatment with a
given drug).
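The death rates and both relative risks can be recomputed from the counts and person-years quoted above. This is a minimal sketch; the variable names are illustrative.

```python
# Deaths and person-years as quoted in the study results.
study_deaths, study_person_years = 803, 7_261_415
control_deaths, control_person_years = 1238, 8_843_852

study_rate = study_deaths / study_person_years
control_rate = control_deaths / control_person_years

# Rates per 100,000 person-years.
study_per_100k = study_rate * 100_000
control_per_100k = control_rate * 100_000

# The relative risk depends on which group is in the denominator.
rr_study_vs_control = study_rate / control_rate
rr_control_vs_study = control_rate / study_rate

print(f"study:   {study_per_100k:.0f} per 100,000 person-years")
print(f"control: {control_per_100k:.0f} per 100,000 person-years")
print(f"RR study vs. control = {rr_study_vs_control:.2f}")
print(f"RR control vs. study = {rr_control_vs_study:.2f}")
```

The two relative risks are reciprocals of each other, so a report should always state which group is the baseline.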
Example: Relative Risk of Developing Breast Cancer
• Risk of breast cancer for women having first child at 25 or older = 31/1628 = 0.0190
• Risk of breast cancer for women having first child before 25 = 65/4540 = 0.0143
• Relative risk = 0.0190/0.0143 = 1.33
What does this RR mean?
Increased Risk = (change in risk/baseline risk)×100%
• Baseline risk for those who had child before age 25 = 0.0143
• Risk for women having first child at 25 or older = 0.0190
• Change in risk = (0.0190 – 0.0143) = 0.0047
• Increased risk = (0.0047/0.0143) = 0.329 or 32.9%
What does this increased risk mean?
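The arithmetic for relative risk and increased risk can be sketched as follows. The variable names are illustrative. The code computes the increased risk twice, once from the risks rounded to four decimals as on the slide and once from the unrounded risks, to show that the small difference comes only from rounding.

```python
# Counts from the breast cancer example.
risk_25_or_older = 31 / 1628   # first child at 25 or older
risk_before_25 = 65 / 4540     # first child before 25 (baseline)

rr = risk_25_or_older / risk_before_25

# Increased risk = (change in risk / baseline risk) x 100%
increased_exact = (risk_25_or_older - risk_before_25) / risk_before_25 * 100

# Same formula using risks rounded to four decimals, as on the slide.
r_older, r_base = round(risk_25_or_older, 4), round(risk_before_25, 4)
increased_rounded = (r_older - r_base) / r_base * 100

print(f"relative risk = {rr:.2f}")
print(f"increased risk (rounded risks)   = {increased_rounded:.1f}%")
print(f"increased risk (unrounded risks) = {increased_exact:.1f}%")
```

Increased risk is just (relative risk − 1) × 100%, so a relative risk of 1.33 and an increased risk of about 33% describe the same comparison.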
Odds Ratio
Odds Ratio: ratio of the odds of getting the disease for one category of the explanatory variable
to the odds of getting the disease for the other category. If the risk is small, the odds ratio is
about the same as the relative risk.
Example: Odds Ratio for Breast Cancer
• Odds for women having first child at age 25 or older = with breast cancer/without breast
cancer = 31/1597 = 0.0194
• Odds for women having first child before age 25 = 65/4475 = 0.0145
• Odds ratio = 0.0194/0.0145 = 1.34
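A short sketch of the odds ratio calculation, placed next to the relative risk for the same counts. The variable names are illustrative.

```python
# (with breast cancer, total) for each group, from the example.
older_cases, older_total = 31, 1628      # first child at 25 or older
younger_cases, younger_total = 65, 4540  # first child before 25

# Odds = number with the disease / number without the disease.
odds_older = older_cases / (older_total - older_cases)        # 31/1597
odds_younger = younger_cases / (younger_total - younger_cases)  # 65/4475
odds_ratio = odds_older / odds_younger

# Relative risk for comparison: risks divide by the group total instead.
relative_risk = (older_cases / older_total) / (younger_cases / younger_total)

print(f"odds ratio    = {odds_ratio:.2f}")
print(f"relative risk = {relative_risk:.2f}")
```

Because both risks are small (under 2%), the number without the disease is close to the group total, which is why the odds ratio and the relative risk nearly agree here.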
STUDY: ASSOCIATION BETWEEN CELLULAR-TELEPHONE CALLS AND MOTOR VEHICLE COLLISIONS
Background
Because of a belief that the use of cellular telephones while driving may cause collisions, several
countries have restricted their use in motor vehicles, and others are considering such regulations.
Methods
The study was conducted in Toronto, an urban region with no regulations against using a cellular
telephone while driving.
•Persons who came to the North York Collision Reporting Centre between July 1, 1994, and August
31, 1995, during peak hours (10 a.m. to 6 p.m.) on Monday through Friday were included in the study
if they had been in a collision with substantial property damage (as judged by the police).
•We studied 699 drivers who had cellular telephones and who were involved in motor vehicle
collisions resulting in substantial property damage but no personal injury.
•Each person’s cellular-telephone calls on the day of the collision and during the previous week were
analyzed through the use of detailed billing records.
Time of the Motor Vehicle Collision
The time of each collision was estimated from the subject’s statement, police records, and telephone
listings of calls to emergency services.
Analytic Method
•We used case–crossover analysis, a technique for assessing the brief change in risk associated with a
transient exposure.
•According to this method, each person serves as his or her own control; confounding due to age,
sex, visual acuity, training, personality, driving record, and other fixed characteristics is thereby
eliminated.
•We used the pair-matched analytic approach to contrast a time period on the day of the collision
with a comparable period on a day preceding the collision.
•In this instance, case–crossover analysis would identify an increase in risk if there were more
telephone calls immediately before the collision than would be expected solely as a result of chance.
Results
•A total of 26,798 cellular-telephone calls were made during the 14-month study period.
•The risk of a collision when using a cellular telephone was about four times the risk when a
cellular telephone was not being used (relative risk, 4.3; 95 percent confidence interval, 3.0 to 6.5).
•The relative risk was similar for drivers who differed in personal characteristics such as age and
driving experience.
•Calls close to the time of the collision were particularly hazardous (relative risk, 4.8 for calls placed
within 5 minutes of the collision, as compared with 1.3 for calls placed more than 15 minutes before
the collision; P = 0.001).
•Units that allowed the hands to be free (relative risk, 5.9) offered no safety advantage over handheld units (relative risk, 3.9).
Weaknesses of the Study
•The study included only drivers who consented to participate.
•People vary in their driving behavior from day to day, which makes the selection of a
control period problematic.
•Case–crossover analysis does not eliminate all forms of confounding. Imbalances
in some temporary conditions related to the driver, the vehicle, or the environment are possible.
Misleading Statistics about Risk
Common ways the media misrepresent statistics about risk:
1. The baseline risk is missing.
2. The time period of the risk is not identified.
3. The reported risk is not necessarily your risk.
Missing Baseline Risk
“Evidence of new cancer-beer connection” Sacramento Bee, March 8, 1984
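The article reported that men who drank 500 ounces or more of beer a month (about 16 ounces a day)
were three times more likely to develop cancer of the rectum than nondrinkers. Without the baseline
risk for nondrinkers, the size of the tripled risk cannot be judged.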
Risk over What Time Period?
“Italian scientists report that a diet rich in animal protein and fat—cheeseburgers, french
fries, and ice cream, for example—increases a woman’s risk of breast cancer threefold,”
Prevention Magazine’s Giant Book of Health Facts (1991, p. 122)
If 1 in 9 women get breast cancer, does this mean that a woman who eats the above diet has a
1 in 3 chance of getting breast cancer?
What other information do we need to know?
Reported Risk versus Your Risk
“Older cars stolen more often than new ones” Davis (CA) Enterprise, 15 April 1994
Reported: “Among the 20 most popular auto models stolen [in California] last year, 17 were at
least 10 years old.”
What factors determine which cars are stolen?
Simpson’s Paradox: The Missing Third Variable
• Can be dangerous to summarize information over groups.
Example: Simpson’s Paradox for Hospital Patients
Survival Rates for Standard and New Treatments
Risk Compared for Standard and New Treatments
Looks like new treatment is a success at both hospitals, especially at Hospital B.
Estimating the Overall Reduction in Risk
• More serious cases were treated at Hospital A (famous research hospital).
• More serious cases were also more likely to die, no matter what.
• A higher proportion of patients at Hospital A received the new treatment.
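The reversal described by these three points can be seen in a small numerical sketch. The counts below are HYPOTHETICAL: the tables from the original slides are not reproduced in this transcript, so the numbers were chosen only to match the description (Hospital A treats the more serious cases and gives most of its patients the new treatment). They are not the study data.

```python
# HYPOTHETICAL counts, for illustration only (not from the source slides).
# hospital -> treatment -> (number who died, total patients)
hospitals = {
    "Hospital A": {"standard": (95, 100), "new": (900, 1000)},
    "Hospital B": {"standard": (500, 1000), "new": (5, 100)},
}

def risk_of_dying(died, total):
    return died / total

# Risk within each hospital: the new treatment has the lower risk in both.
within = {
    name: {t: risk_of_dying(*cells) for t, cells in treatments.items()}
    for name, treatments in hospitals.items()
}

# Risk after pooling the two hospitals: the comparison reverses.
pooled = {}
for treatment in ("standard", "new"):
    died = sum(h[treatment][0] for h in hospitals.values())
    total = sum(h[treatment][1] for h in hospitals.values())
    pooled[treatment] = risk_of_dying(died, total)

for name, risks in within.items():
    print(f"{name}: standard {risks['standard']:.2f}, new {risks['new']:.2f}")
print(f"Pooled:     standard {pooled['standard']:.2f}, new {pooled['new']:.2f}")
```

In this sketch the pooled figures make the new treatment look worse only because most of its patients come from the hospital with the sickest cases; the hospital is the missing third variable.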