Stat 350 Lab Session
GSI: Yizao Wang
Section 016 Mon 2:30-4:00 pm MH 444-D
Section 043 Wed 2:30-4:00 pm MH 444-B
Outline
• Introduction
• Syllabus
• A brief review
• Module1: Activity1,2
• Module2: Activity2
Something about me
• My name: Yizao Wang
• My brief CV:
Originally from Beijing
Studied in Paris for the last three years
Now a first-year graduate student in the Department of Statistics
• I play Go when I have time…
(do you know where the UMich Go club is?)
Introduce yourself
• What is your name?
• Where are you from?
• What is your major?
• Which year are you in?
Syllabus
Any questions?
What is statistics…
Data
|
Analysis
|
Inference/conclusion
Let’s start with data
When we are collecting (sampling) data…
How many types of variables are there?
2
What are they?
Categorical variables
Quantitative/numerical variables
                     Categorical                      Quantitative
Raw data             Consisting of groups of names    Consisting of numerical values
                     that do not necessarily have     taken on each individual.
                     a logical order
Example              Gender, eye color                Height, test score
Graphical summary    Bar graph, pie chart             Histogram, boxplot
Numerical summary    Frequency table                  5 number summary (median,
                                                      quartiles and extremes)
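As a small illustration of the numerical summary for a categorical variable, here is a minimal Python sketch of a frequency table. The eye-color responses are made up for this example, and Python stands in for SPSS here; it is not part of the lab materials.

```python
from collections import Counter

# Hypothetical eye-color responses (made up for illustration only).
eye_color = ["brown", "blue", "brown", "green", "brown", "blue"]

counts = Counter(eye_color)
n = len(eye_color)

# Frequency table: category, count, and percentage of the sample.
for category, count in counts.most_common():
    print(f"{category:<6} {count:>3} {100 * count / n:5.1f}%")
```

A bar graph or pie chart is simply a picture of these counts or percentages.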
Some big ideas
Different types of data lead to different
statistical methods, numerical summaries
and plots.
Histograms: the (shape of the) distribution of
a quantitative response
Boxplots: picture of the 5 number summary;
most useful for comparing 2+ sets of data
Module 1: Activity 1
visualizing and exploring a data set
Start up SPSS and open the employee data set
What type of variable is gender? Categorical
What type of graphs would be good to make for
this variable? Bar graphs
What type is current salary? Quantitative
What type of graphs for it? Histogram
Module 1: Activity 1
visualizing and exploring a data set
Let’s make a histogram of current salary
Don’t forget the title!
What shape do we see for the distribution of
salary?
Skewed to the right
Change the color
Module 1: Activity 1
visualizing and exploring a data set
Basic summary measures for current salary
Get five number summary
Save output
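For readers who want to check a five number summary by hand, the following is a minimal Python sketch. The salaries are hypothetical (they are not the SPSS employee data), and the quartile rule used here is only one of several conventions, so SPSS output may differ slightly for Q1 and Q3.

```python
import statistics

# Hypothetical salaries in dollars (NOT the SPSS employee data set).
salaries = [21000, 24000, 27000, 29000, 31000, 35000, 40000, 52000, 90000]

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    ordered = sorted(values)
    # "inclusive" is one quartile convention among several;
    # other software may report slightly different Q1 and Q3.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]

print(five_number_summary(salaries))
```

Note how far the maximum sits from Q3 compared with how far the minimum sits from Q1, which is what a right-skewed variable such as salary typically looks like.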
Module1: Activity 2
The Mean and the Median
Open the applet
http://www.ruf.rice.edu/%7Elane/stat_sim/descriptive/index.html
Produce a positive skew and a negative
skew, and compare the relationship between
the mean and the median
Try different shapes of distribution and compare
their standard deviations. Comment?
Toy question: with N=10, give the distributions
with the largest and smallest standard deviations
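One way to explore the toy question outside the applet is the Python sketch below. It assumes the values must lie in a fixed range of 0 to 10 (a hypothetical range, not taken from the applet) and uses the population standard deviation; the ordering of the three data sets is the same with the sample version.

```python
import statistics

# Assumed range of allowed values (hypothetical, not taken from the applet).
LOW, HIGH = 0, 10

all_equal = [5] * 10                       # every point at the same value
evenly_spread = [0, 1, 2, 3, 4, 6, 7, 8, 9, 10]
split_extremes = [LOW] * 5 + [HIGH] * 5    # half the points at each end

for name, data in [
    ("all equal", all_equal),
    ("evenly spread", evenly_spread),
    ("split at extremes", split_extremes),
]:
    # pstdev is the population standard deviation (divides by N).
    print(f"{name:<18} SD = {statistics.pstdev(data):.2f}")
```

Putting every point at one value gives a standard deviation of zero, while pushing the points as far from the mean as the range allows gives the largest one.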
Module1: Activity 2
The Mean and the Median
In a symmetric distribution, the mean and the
median are equal.
In positively skewed distributions, the mean is
generally larger than the median.
In negatively skewed distributions, the mean is
generally smaller than the median.
In a skewed distribution, which is a good
measure of the center of the distribution? Median
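The three statements above can be checked on small data sets. This is a minimal Python sketch with made-up numbers chosen to give each shape; it illustrates the general pattern and is not a proof.

```python
import statistics

# Hypothetical data sets, made up to illustrate each shape.
symmetric = [2, 3, 4, 5, 6, 7, 8]
right_skewed = [2, 3, 3, 4, 4, 5, 30]   # long tail of large values
left_skewed = [-20, 5, 6, 6, 7, 7, 8]   # long tail of small values

for name, data in [
    ("symmetric", symmetric),
    ("right skewed", right_skewed),
    ("left skewed", left_skewed),
]:
    mean = statistics.mean(data)
    median = statistics.median(data)
    print(f"{name:<13} mean = {mean:6.2f}  median = {median}")
```

The single extreme value drags the mean toward the tail while the median barely moves, which is why the median is usually the better measure of center for skewed data.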
Module1: Activity 2
The Mean and the Median
Standard deviation:
On average, salaries are expected to fall
approximately $___ from the mean salary of
$___.
On average, salaries vary by about $___
from the mean salary of $___.
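To see how the blanks get filled in, here is a minimal Python sketch that computes the mean and the sample standard deviation and prints the interpretation sentence. The five salaries are hypothetical, not values from the employee data set.

```python
import statistics

# Hypothetical salaries in dollars (made up for illustration only).
salaries = [30000, 32000, 34000, 36000, 38000]

mean = statistics.mean(salaries)
# stdev is the sample standard deviation (divides by n - 1).
sd = statistics.stdev(salaries)

print(
    f"On average, salaries vary by about ${sd:,.0f} "
    f"from the mean salary of ${mean:,.0f}."
)
```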
Module2: Activity 1
How do genders compare on SSHA scores?
Background: Survey of Study Habits and
Attitudes (SSHA) of college freshmen. It is known that
scores on the SSHA may explain success in
college. Data were collected for both females
and males.
Use side-by-side boxplots to examine (compare)
the distribution of the scores by gender.
Module2: Activity 1
How do genders compare on SSHA scores?
Produce a side-by-side boxplot
Add a title
Which gender had the lowest score? Male
Which had the highest score? Female
Which gender had the lowest median score? Male
How to compare the variability? IQRs
Can you tell the shape from a boxplot? No!
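Comparing variability with IQRs can be sketched in Python as follows. The scores are hypothetical and are NOT the SSHA data used in lab; the quartile rule is one convention among several, so SPSS may report slightly different quartiles.

```python
import statistics

# Hypothetical scores, NOT the SSHA data used in lab.
scores = {
    "female": [110, 125, 130, 140, 150, 165, 170, 180, 200],
    "male": [70, 90, 95, 110, 115, 125, 145, 150, 185],
}

def iqr(values):
    """Interquartile range: Q3 minus Q1, the width of the box in a boxplot."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

for gender, values in scores.items():
    print(
        f"{gender:<6} min = {min(values)}  median = {statistics.median(values)}  "
        f"max = {max(values)}  IQR = {iqr(values)}"
    )
```

The group with the larger IQR has the wider box, meaning its middle 50% of scores is more spread out.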
Module2: Activity 1
How do genders compare on SSHA scores?
Split file and make histograms (organize
output by groups)
(Get descriptive summaries using the
Frequencies option)
Review of lab 1
What does statistics do?
Categorical variables and numerical
variables
Using plots to visualize data
Histogram to see the distribution
Standard deviation and shape of distribution
Boxplot with 5 number summary
Are you able to do HW1 with SPSS?
Before we finish today…
Comments on today’s lab?
Qwizdom system
Survey to complete