Sociology 601 (Martin)
Lecture for week 2: September 9 - 11
• Chapter 3.1:
– Making Charts
• Chapter 3.2 – 3.5 (if time permits)
– Measures of central tendency
– Measures of variation
• Walk-through of the STATA graphical user interface.
Definitions for charts
• frequency distribution: a graph listing intervals of possible
values for a variable (on the x-axis), and number of
observations in each interval (on the y-axis).
• relative frequency distribution: as above, but the y-axis has the
percent or proportion of observations in each interval.
• bar graph: the variable is ordinal or nominal scale.
– The bars should not touch
• histogram: the variable is interval scale.
– The bars should touch
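As a rough illustration of the first two definitions, the Python sketch below tallies a frequency distribution and a relative frequency distribution for a nominal variable. The data (marital status for ten respondents) and all variable names are hypothetical, made up for this example; they do not come from the course data set.

```python
from collections import Counter

# HYPOTHETICAL nominal-scale data: marital status of ten respondents.
marital_status = [
    "married", "single", "married", "divorced", "single",
    "married", "widowed", "single", "married", "divorced",
]

# Frequency distribution: number of observations in each category.
frequency = Counter(marital_status)

# Relative frequency distribution: proportion of observations in each category.
n = len(marital_status)
relative_frequency = {category: count / n for category, count in frequency.items()}

# Print one row per category: label, frequency, relative frequency.
for category in sorted(frequency):
    print(f"{category:<10} {frequency[category]:>3} {relative_frequency[category]:>6.2f}")
```

Because the variable is nominal, these counts would be drawn as a bar graph (bars not touching) rather than as a histogram.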
General Rules for Relative Frequency
Distributions
• Whether you are making a bar graph or histogram:
– Make sure each observation is in one and only one
category.
– Use categories of equal width.
– Choose an appealing number of categories.
– Decide whether to provide labels
– Double-check your graph.
• If you use fewer bars to describe the distribution of a
variable, you lose information but gain clarity.
Example from Text, p. 36
• Murders per 100,000 population, by State for 1993
State            Rate   State            Rate   State            Rate
Alabama          11.6   Louisiana        20.3   Ohio              6.0
Alaska            9.0   Maine             1.6   Oklahoma          8.4
Arizona           8.6   Maryland         12.7   Oregon            4.6
Arkansas         10.2   Massachusetts     3.9   Pennsylvania      6.8
California       13.1   Michigan          9.8   Rhode Island      3.9
Colorado          5.8   Minnesota         3.4   South Carolina   10.3
Connecticut       6.3   Mississippi      13.5   South Dakota      3.4
Delaware          5.0   Missouri         11.3   Tennessee        10.2
Florida           8.9   Montana           3.0   Texas            11.9
Georgia          11.4   Nebraska          3.9   Utah              3.1
Hawaii            3.8   Nevada           10.4   Vermont           3.6
Idaho             3.5   New Hampshire     2.0   Virginia          8.3
Illinois         11.4   New Jersey        5.3   Washington        5.2
Indiana           7.5   New Mexico        8.0   West Virginia     6.9
Iowa              2.3   New York         13.3   Wisconsin         4.4
Kansas            6.4   North Carolina   11.3   Wyoming           3.4
Kentucky          6.6   North Dakota      1.7
Frequency Distribution
• Murders per 100,000 population for 1993, by State
[Figure: frequency distribution chart. x-axis: murder rate (0 to 20); y-axis: number of states (0 to 3)]
• What have we lost? What have we gained?
Relative Frequency Distribution
• Murders per 100,000 population, by State
[Figure: relative frequency distribution chart. x-axis: murder rate (0 to 20); y-axis: relative frequency (0 to 0.06)]
Collapsed Relative Frequency Distribution
• Murders per 100,000 population, by State
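[Figure: collapsed relative frequency distribution chart. x-axis: murder rate, in categories 0-1.9, 2-3.9, 4-5.9, 6-7.9, 8-9.9, 10-11.9, 12-13.9, 14-15.9, 16-17.9, 18-19.9, 20-21.9; y-axis: relative frequency (0 to 0.3)]
• What have we lost? What have we gained?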
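The collapsed distribution can be reproduced from the state table. The sketch below is one possible way to do it in Python, not the method used to draw the slides: it sorts the 50 murder rates into equal-width categories of 2 and reports the frequency and relative frequency of each. The names WIDTH, N_BINS, and counts are assumptions made for this example.

```python
# Murder rates per 100,000 population, by state, 1993 (from the table in the text).
rates = [
    11.6, 9.0, 8.6, 10.2, 13.1, 5.8, 6.3, 5.0, 8.9, 11.4,
    3.8, 3.5, 11.4, 7.5, 2.3, 6.4, 6.6, 20.3, 1.6, 12.7,
    3.9, 9.8, 3.4, 13.5, 11.3, 3.0, 3.9, 10.4, 2.0, 5.3,
    8.0, 13.3, 11.3, 1.7, 6.0, 8.4, 4.6, 6.8, 3.9, 10.3,
    3.4, 10.2, 11.9, 3.1, 3.6, 8.3, 5.2, 6.9, 4.4, 3.4,
]

WIDTH = 2     # equal-width categories: 0-1.9, 2-3.9, ..., 20-21.9
N_BINS = 11   # enough categories to cover the largest rate (20.3)

# Floor division puts each rate in one and only one category.
counts = [0] * N_BINS
for rate in rates:
    counts[int(rate // WIDTH)] += 1

# Relative frequency: proportion of the 50 states in each category.
relative_frequencies = [count / len(rates) for count in counts]

for i, (count, rel) in enumerate(zip(counts, relative_frequencies)):
    low = i * WIDTH
    print(f"{low:>2}-{low + WIDTH - 0.1:<4.1f} {count:>3} {rel:>5.2f}")
```

The 2-3.9 category holds 13 of the 50 states (relative frequency 0.26). What is lost is the exact rate of each state; what is gained is a shape that can be read at a glance.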
3.2: Measuring central tendency - mean
• Mean: sum of measurements divided by number of
measurements.
• Equation for the mean of a sample: $\bar{Y} = \frac{\sum_{i=1}^{n} Y_i}{n}$
• or, if you don’t have an equation editor,
Ybar = SUM(Yi) / n
where…
Ybar is the sample mean
Yi is a measurement of Y for case i
n is the number of cases in the sample
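A minimal Python sketch of the formula Ybar = SUM(Yi) / n, using the murder rates of the first five states in the text's table (Alabama through California) as a small sample. The variable names are assumptions made for this example.

```python
# Murder rates for the first five states in the text's table
# (Alabama, Alaska, Arizona, Arkansas, California).
rates = [11.6, 9.0, 8.6, 10.2, 13.1]

n = len(rates)       # number of cases in the sample
total = sum(rates)   # SUM(Yi)
ybar = total / n     # Ybar = SUM(Yi) / n

print(f"n = {n}, SUM(Yi) = {total:.1f}, Ybar = {ybar:.2f}")
```

The five rates sum to 52.5, so the sample mean is 10.5 murders per 100,000 population.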
Weighted means
• Weighted sample mean: the sum of the measurements, each
multiplied by the number of cases it represents, divided by
the total number of cases
$\bar{Y}_{weighted} = \frac{\sum_j n_j Y_j}{\sum_j n_j}$
– Example: we could weight the state murder rates by the
number of persons in each state in 1993 to get the mean
murder rate for persons in the US
• If n = 2 the equation for the weighted mean is
$\bar{Y}_{weighted} = \frac{n_1 Y_1 + n_2 Y_2}{n_1 + n_2}$
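The n = 2 case can be sketched in Python as follows. The two murder rates come from the text's table (Alabama and Alaska), but the population figures are HYPOTHETICAL round numbers chosen only to show the arithmetic; they are not actual 1993 populations. The function name weighted_mean is also an assumption.

```python
def weighted_mean(values, weights):
    """Return sum(n_j * Y_j) / sum(n_j)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * y for y, w in zip(values, weights)) / sum(weights)

# Two murder rates from the text's table (Alabama, Alaska).
rates = [11.6, 9.0]
# HYPOTHETICAL populations in millions, for illustration only.
populations = [4.0, 0.6]

unweighted = sum(rates) / len(rates)
weighted = weighted_mean(rates, populations)

print(f"unweighted mean: {unweighted:.2f}")
print(f"weighted mean:   {weighted:.2f}")
```

The weighted mean sits closer to the rate of the larger state, because that state contributes more persons to the total.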
3.3 Other measures of central tendency
• Median: the measurement that falls in the middle of an
ordered sample
– the median is the value of the 50th percentile
• Percentile: the pth percentile is the number such that p% of
scores fall below it and (100-p)% of scores fall above it
• Mode: the value that occurs most frequently
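The three definitions above can be checked on a small sample. This Python sketch uses a HYPOTHETICAL set of seven scores; the helper percent_below is an assumption written for this example, not a library function.

```python
import statistics

# HYPOTHETICAL sample of seven scores, made up for this example.
scores = [9, 3, 12, 5, 3, 8, 2]

ordered = sorted(scores)               # the ordered sample
median = statistics.median(scores)     # middle value of the ordered sample
modes = statistics.multimode(scores)   # most frequent value(s), as a list

def percent_below(values, x):
    """Percent of scores that fall strictly below x."""
    return 100 * sum(v < x for v in values) / len(values)

print("ordered sample:", ordered)
print("median:", median)
print("mode(s):", modes)
print(f"percent of scores below the median: {percent_below(scores, median):.1f}")
```

In a sample this small the percentile definition holds only approximately: 3 of the 7 scores fall below the median and 3 of 7 fall above it, with the median itself in between.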
3.4: Measures of variation
• range: the difference between the largest and smallest
observations
• interquartile range: the difference between the 75th
and 25th percentiles
• deviation: for any observation, the difference between
that observation and the sample mean
Di = Yi - Ybar
(one averaged measure of variation for a sample would be to
take the mean of the absolute values of all the deviations for
the sample)
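A short Python sketch of these measures on a HYPOTHETICAL sample of seven scores. Statistical packages use slightly different conventions for locating quartiles, so the interquartile range below reflects the default of Python's statistics.quantiles and may differ a little from other software.

```python
import statistics

# HYPOTHETICAL sample of seven scores, made up for this example.
scores = [2, 3, 3, 5, 8, 9, 12]

# Range: largest observation minus smallest observation.
sample_range = max(scores) - min(scores)

# Interquartile range: 75th percentile minus 25th percentile.
q1, _q2, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1

# Deviations: Di = Yi - Ybar. They always sum to zero.
ybar = statistics.fmean(scores)
deviations = [y - ybar for y in scores]

# Mean absolute deviation: the mean of the absolute values of the deviations.
mean_abs_dev = statistics.fmean([abs(d) for d in deviations])

print("range:", sample_range)
print("interquartile range:", iqr)
print("sum of deviations:", sum(deviations))
print(f"mean absolute deviation: {mean_abs_dev:.2f}")
```

Because the deviations sum to zero, their plain mean is useless as a measure of spread, which is why the absolute values (or the squares) are averaged instead.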
Variance and standard deviation:
the most common measures of variation
$s^2 = \frac{\sum (Y_i - \bar{Y})^2}{n - 1}$
$s = \sqrt{\frac{\sum (Y_i - \bar{Y})^2}{n - 1}}$
• variance: the sum of the squared deviations for a sample
divided by n - 1 (roughly the mean squared deviation), labeled s^2.
• standard deviation: the square root of the variance, or
the root mean squared deviation, labeled s.
Practice: Calculate the mean, variance, and standard
deviation.
yi    ybar    yi - ybar    (yi - ybar)^2
1
2
3
3
4
4
7
8
Σyi:
ybar:
Σ(yi - ybar)^2:
s^2:
s:
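One way to check the practice worksheet is to follow its columns step by step. The Python sketch below does that for the eight practice values and divides by n - 1, as in the variance formula; the variable names are assumptions made for this example.

```python
import math
import statistics

# The eight practice values from the worksheet.
y = [1, 2, 3, 3, 4, 4, 7, 8]

n = len(y)
total = sum(y)                          # sum of yi
ybar = total / n                        # sample mean
deviations = [yi - ybar for yi in y]    # yi - ybar
squared = [d ** 2 for d in deviations]  # (yi - ybar)^2
sum_squared = sum(squared)              # sum of squared deviations
variance = sum_squared / (n - 1)        # s^2
std_dev = math.sqrt(variance)           # s

# Print the worksheet columns, then the summary rows.
print(f"{'yi':>3} {'yi - ybar':>10} {'(yi - ybar)^2':>14}")
for yi, d, sq in zip(y, deviations, squared):
    print(f"{yi:>3} {d:>10.1f} {sq:>14.1f}")
print(f"sum of yi = {total}, ybar = {ybar}")
print(f"sum of squared deviations = {sum_squared}")
print(f"s^2 = {variance:.3f}, s = {std_dev:.3f}")
```

The values sum to 32, so ybar = 4; the squared deviations sum to 40, so s^2 = 40/7 (about 5.71) and s is about 2.39.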
Interpreting the standard deviation.
• s is (formally) the root mean squared deviation.
• s is one version of the typical distance of an
observation from the sample mean.
• Because s accounts for squared deviations, it is
affected by extreme scores.
– Is this a desirable property?
– Compare these samples: (-3,-3,+3,+3) vs (-2,-2,-2,+6)
• Generally, for a continuous quantitative variable Y with a roughly
bell-shaped distribution, about 68% of scores fall between Ybar - s and Ybar + s.
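The two samples in the comparison above can be worked through in Python. This sketch computes the mean, the mean absolute deviation, and s for each; the helper mean_abs_deviation is an assumption written for this example.

```python
import statistics

sample_a = [-3, -3, 3, 3]
sample_b = [-2, -2, -2, 6]

def mean_abs_deviation(values):
    """Mean of the absolute deviations from the sample mean."""
    ybar = statistics.fmean(values)
    return statistics.fmean([abs(y - ybar) for y in values])

mad_a = mean_abs_deviation(sample_a)
mad_b = mean_abs_deviation(sample_b)
s_a = statistics.stdev(sample_a)   # divides by n - 1
s_b = statistics.stdev(sample_b)

print(f"A: mean = {statistics.fmean(sample_a)}, mean abs dev = {mad_a}, s = {s_a:.3f}")
print(f"B: mean = {statistics.fmean(sample_b)}, mean abs dev = {mad_b}, s = {s_b:.3f}")
```

Both samples have a mean of 0 and a mean absolute deviation of 3, but s is about 3.46 for the first and 4 for the second. The single extreme score (+6) raises s because deviations are squared before they are averaged.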
Interpreting sample statistics.
• Recall that…
– A statistic is a single number estimated from a sample
– A parameter is a single number that summarizes some
quality of a variable in a population.
• For means:
– The population mean is μ (mu).
– The sample mean Ybar is an estimator of μ.
• For standard deviations:
– The population standard deviation is σ (sigma).
– The sample standard deviation s is an estimator of σ.
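The distinction between a parameter and a statistic can be illustrated with a simulation. In the Python sketch below, the population is HYPOTHETICAL (10,000 simulated ages), and the seed, population size, and sample size are arbitrary choices made for this example.

```python
import random
import statistics

random.seed(601)  # arbitrary seed so the sketch is repeatable

# HYPOTHETICAL population: 10,000 simulated ages.
population = [random.gauss(45, 15) for _ in range(10_000)]

# Parameters summarize the whole population.
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)   # divides by N

# Statistics are computed from a sample drawn from that population.
sample = random.sample(population, 400)
ybar = statistics.fmean(sample)
s = statistics.stdev(sample)            # divides by n - 1

print(f"population: mu = {mu:.2f}, sigma = {sigma:.2f}")
print(f"sample:     Ybar = {ybar:.2f}, s = {s:.2f}")
```

Ybar and s will usually land close to μ and σ without matching them exactly, and a different sample would give slightly different estimates.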
A conceptual map of STATA
[Diagram: flow from source, through the interface, to output. Elements shown: .do file, outside data set, interactive data entry, command window, data editor, pull-down menus, icons, log file, results window, graphics, active data set]
The STATA windows environment - icons
– Open (use)
– Save
– Print Results
– Begin Log
– Start viewer
– Bring results window to front
– Bring graph window to front
– Do-file editor
– Data editor
– Data browser
– Clear
– Break
The .do file:
interface of choice for social research
• Icons within the .do file:
– New
– Open
– Save
– Print
– Find
– Cut
– Copy
– Paste
– Undo
– Do current file
– Run current file
Sample commands in a .do file
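* load the course data set, replacing any data already in memory
use "I:\601Fall08\socy601data.dta", clear
* summary statistics for AGE, unweighted and then weighted by ADULTS
summarize AGE
summarize AGE [weight=ADULTS]
* frequency table for AGE, unweighted and then weighted by ADULTS
tabulate AGE
tabulate AGE [weight=ADULTS]
* remove the data from memory
clear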
How to create a log file
• One approach is to use the log icon to start and
stop a log.
• Another approach is to type the log-starting
command into a .do file:
log using I:\601Fall08\week01hmwk.txt, replace
*. . . (your work here) . . .
log close