Stat 350 Lab Session GSI: Yizao Wang Section 016 Mon

Stat 350 Lab Session
GSI: Yizao Wang
Section 016 Mon 2:30pm-4pm MH 444-D
Section 043 Wed 2:30pm-4pm MH 444-B
Outline
• Introduction
• Syllabus
• A brief review
• Module 1: Activity 1, 2
• Module 2: Activity 2
Something about me
• My name: Yizao Wang
• My brief CV:
Originally from Beijing
Studied in Paris for the last three years
Now a first-year graduate student in the Department of Statistics
• I play Go when I have time…
(do you know where the Umich Go club is?)
Introduce yourself
• What is your name?
• Where are you from?
• What is your major?
• Which year are you in?
Syllabus
Any questions?
What is statistics…
Data
|
Analysis
|
Inference/conclusion
Let’s start with data
When we are collecting (sampling) data…
• How many types of variables are there? 2
• What are they? Categorical variables and quantitative/numerical variables
Categorical
• Raw data: consisting of groups of names that do not necessarily have a logical order
• Example: gender, eye color
• Graphical summary: bar graph, pie chart
• Numerical summary: frequency table
Quantitative
• Raw data: consisting of numerical values taken on each individual
• Example: height, test score
• Graphical summary: histogram, boxplot
• Numerical summary: 5 number summary (median, quartiles and extremes)
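As a small illustration of the numerical summary for a categorical variable, here is a minimal Python sketch of a frequency table. The lab itself uses SPSS; the eye-color values and the helper name `frequency_table` are made up for this example.

```python
from collections import Counter

# Hypothetical eye-color observations (not from the lab's data set).
eye_color = ["brown", "blue", "brown", "green", "brown", "blue"]

def frequency_table(values):
    """Return {category: (count, relative frequency)} for a categorical variable."""
    counts = Counter(values)
    n = len(values)
    return {category: (count, count / n) for category, count in counts.items()}

table = frequency_table(eye_color)
for category, (count, share) in sorted(table.items()):
    # One row per category: name, count, percentage of all observations.
    print(f"{category:<6} {count:>2} {share:.0%}")
```

The same counts are what a bar graph or pie chart would display.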
Some big ideas
• Different types of data lead to different statistical methods, numerical summaries and plots.
• Histograms: show the (shape of the) distribution of a quantitative response
• Boxplots: a picture of the 5 number summary, most useful for comparing 2+ sets of data
Module 1: Activity 1
visualizing and exploring a data set
• Start up SPSS and open the employee data set
• What type of variable is gender? Categorical
• What type of graphs would be good to make for this variable? Bar graphs
• What type is current salary? Quantitative
• What type of graphs for it? Histogram
Module 1: Activity 1
visualizing and exploring a data set
• Let’s make a histogram of current salary
• Don’t forget the title!
• What shape do we see for the distribution of salary? Skewed to the right
• Change the color
Module 1: Activity 1
visualizing and exploring a data set
• Basic summary measures for current salary
• Get five number summary
• Save output
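For readers who want to check a five number summary outside SPSS, a minimal Python sketch follows. The salary values are hypothetical (they are not the employee data set), and the function name `five_number_summary` is chosen for this example. Quartile conventions differ between software packages, so the quartiles may not match SPSS output exactly.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a quantitative variable."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, median, q3 = statistics.quantiles(values, n=4)
    return (min(values), q1, median, q3, max(values))

# Hypothetical salaries in dollars, for illustration only.
salaries = [21000, 24000, 26000, 27000, 30000, 35000, 40000, 57000, 103000]

summary = five_number_summary(salaries)
print(summary)
```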
Module 1: Activity 2
The Mean and the Median
• Open the applet http://www.ruf.rice.edu/%7Elane/stat_sim/descriptive/index.html
• Produce a positive skew and a negative skew, and compare the relationship between the mean and the median
• Try different shapes of distribution and compare their standard deviations. Comment?
Toy question: with N=10, give the distributions with the largest and smallest standard deviation
Module 1: Activity 2
The Mean and the Median
• In a symmetric distribution, the mean and the median are equal.
• With positively skewed distributions, the mean is generally larger than the median.
• With negatively skewed distributions, the mean is generally smaller than the median.
• In a skewed distribution, which is a good measure of the center of the distribution? Median
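The relationship between the mean and the median can also be checked numerically. The sketch below uses three small hypothetical samples, chosen only to illustrate the three shapes. They are not output from the applet.

```python
import statistics

# Hypothetical samples chosen only to illustrate the three shapes.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 2, 3, 3, 4, 20]   # long tail toward large values
left_skewed = [-20, 1, 2, 2, 3, 3, 4]   # long tail toward small values

for name, data in [("symmetric", symmetric),
                   ("right skewed", right_skewed),
                   ("left skewed", left_skewed)]:
    mean = statistics.mean(data)
    median = statistics.median(data)
    print(f"{name:<12} mean = {mean:6.2f}  median = {median:6.2f}")
```

In the skewed samples a single extreme value pulls the mean toward the tail while the median stays put, which is why the median is preferred as a measure of center there.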
Module 1: Activity 2
The Mean and the Median
• Standard deviation:
On average, salaries are expected to fall approximately $___ from the mean salary of $___.
On average, salaries vary by about $___ from the mean salary of $___.
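To see where the two blanks come from, here is a minimal sketch that computes a mean and a sample standard deviation and fills in the interpretation sentence. The salaries are hypothetical, and the sample (n - 1) version of the standard deviation is assumed.

```python
import statistics

# Hypothetical salaries in dollars, for illustration only.
salaries = [30000, 32000, 35000, 38000, 45000]

mean_salary = statistics.mean(salaries)
sd_salary = statistics.stdev(salaries)  # sample standard deviation (divides by n - 1)

print(f"On average, salaries vary by about ${sd_salary:,.0f} "
      f"from the mean salary of ${mean_salary:,.0f}.")
```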
Module 2: Activity 1
How do genders compare on SSHA scores
• Background: Survey of Study Habits and Attitudes (SSHA) of college freshmen. It is known that scores on the SSHA may explain success in college. Data were collected for both females and males.
• Use side-by-side boxplots to examine (compare) the distribution of the scores by gender.
Module 2: Activity 1
How do genders compare on SSHA scores
• Produce a side-by-side boxplot
• Add a title
• Which gender had the lowest score? Male
• Which had the highest score? Female
• Which gender had the lowest median score? Male
• How to compare the variability? IQRs
• Can you tell the shape from boxplot? No!
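Comparing variability with IQRs can be sketched in a few lines of Python. The scores below are hypothetical stand-ins, not the SSHA data used in the lab, and the helper name `iqr` is chosen for this example. Quartile conventions vary by software, so SPSS may report slightly different values on real data.

```python
import statistics

def iqr(values):
    """Interquartile range: Q3 - Q1, the length of the box in a boxplot."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

# Hypothetical scores by gender, for illustration only.
scores = {
    "female": [110, 125, 135, 140, 150, 160, 185],
    "male": [70, 100, 115, 120, 130, 145, 170],
}

for gender, values in scores.items():
    print(f"{gender:<6} median = {statistics.median(values)}  IQR = {iqr(values)}")
```

The group with the larger IQR has the more spread-out middle half of scores, which shows up as the longer box in a side-by-side boxplot.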
Module 2: Activity 1
How do genders compare on SSHA scores
• Split file and make histograms (organize output by groups)
• (Get descriptive summaries using frequencies option)
Review of lab 1
• What does statistics do?
• Categorical variables and numerical variables
• Using plots to visualize data
• Histogram to see the distribution
• Standard deviation and shape of distribution
• Boxplot with 5 number summary
• Are you able to do HW1 with SPSS?
Before we finish today…
• Comments on today’s lab?
• Qwizdom system
• Survey to complete