Introduction to Statistics Chapter 2
Statistical Analysis - Chapter 2
“Organizing and Analyzing Data”
Fashion Institute of Technology
Dr. Roderick Graham
Showing Data Graphically
When we collect a sample, we initially want to get a
picture of how the data “looks”.
Graphs let us easily show our “stakeholders” what the patterns
in the data are
What do we mean by “stakeholders”?
Three of the ways to show data are Histograms,
Frequency Polygons, and Circle Graphs
Showing Data Graphically
Look at the listing of numbers on p.17
This is called “ungrouped” data
Sometimes it is better to “group” data into
categories…this makes it easier to represent data
graphically (p.18)
Histograms
Let’s look at the move from “ungrouped data” to the
construction of a histogram in your textbook…(pp. 17 –
18)
1. Start with a survey of numbers…or “ungrouped data”
2. Decide on the categories you want to use and “group”
the numbers into the categories that fit them
3. Now the data has been changed from a series of ages,
to GROUPS of ages
4. We can compute statistics for both grouped and
ungrouped data
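The move from ungrouped to grouped data in steps 1 to 3 can be sketched in a few lines of Python. This is only an illustration: the ages below are made up (they are not the textbook's numbers), and the age bands and the names `categorize` and `group` are assumptions chosen for the example.

```python
from collections import Counter

# Hypothetical ungrouped ages (illustrative only, not the textbook's data).
ages = [19, 22, 23, 27, 31, 34, 36, 41, 45, 52, 58, 67]

# Assumed age categories as (label, low, high); high=None means "and above".
CATEGORIES = [
    ("18-24", 18, 24),
    ("25-34", 25, 34),
    ("35-44", 35, 44),
    ("45-54", 45, 54),
    ("55-64", 55, 64),
    ("65+", 65, None),
]

def categorize(age):
    """Return the label of the category an age falls into, or None."""
    for label, low, high in CATEGORIES:
        if age >= low and (high is None or age <= high):
            return label
    return None

def group(values):
    """Count how many values fall into each category (step 2)."""
    counts = Counter(categorize(v) for v in values)
    return {label: counts.get(label, 0) for label, _, _ in CATEGORIES}

grouped = group(ages)

# A rough text histogram: one '#' per person in the category.
for label, count in grouped.items():
    print(f"{label:>6} | {'#' * count} ({count})")
```

Each row of the printout is one bar of the histogram: the longer the row, the more people in that category.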
Histograms
Let’s figure out this
Histogram (taken from actual
data I am using)…
1 = 18 – 24
2 = 25 – 34
3 = 35 – 44
4 = 45 – 54
5 = 55 – 64
6 = 65+
How many people are
between ages 45 and 54?
Frequency Polygon (Line Graph)
This is a line graph representing
the shape of a histogram
Usually when you have “too
many bars” (categories) you
may want to use a line graph
This can show trends more
easily than a histogram.
Circle Graph
These graphs are used to show what percentage (proportion) of a sample is doing
what.
Your textbook goes into some detail about how to create circle graphs with a
protractor…lucky for us we have Excel!
Below is an example from the CDC showing the percentages of how people have
become infected with HIV…
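Whether you use a protractor or Excel, the arithmetic behind a circle graph is the same: each category's share of the total becomes a percentage, and that same share of 360 degrees becomes the size of its slice. A minimal sketch, using made-up counts (not the CDC figures) and a hypothetical helper named `circle_graph_slices`:

```python
# Hypothetical category counts (illustrative only, not the CDC data).
counts = {"Category A": 50, "Category B": 30, "Category C": 20}

def circle_graph_slices(counts):
    """Return {category: (percentage, degrees)} for each category."""
    total = sum(counts.values())
    return {
        name: (100 * count / total, 360 * count / total)
        for name, count in counts.items()
    }

slices = circle_graph_slices(counts)
for name, (percent, degrees) in slices.items():
    print(f"{name}: {percent:.1f}% of the sample, {degrees:.1f} degrees")
```

The percentages always add up to 100 and the degrees to 360, which is a quick check that no category was left out.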
Key Points
It is up to you (the researcher) to decide which graph is most
appropriate for presenting your data. For me…
1. If I am showing a small number of categories, I use a
histogram
2. If I am showing trends through time, or a large number
of categories, I use a line graph
3. If I want to show percentages, I use a circle graph (this is
always the best way to show percentages)
Our first “statistics”
Remember that statistics are values that we compute
from our sample of data that we have collected. We will
learn two basic and important types of statistics:
Measures of Central Tendency – What are the middle
values for our data?
Measures of Dispersion or Spread – How diverse is
our data…or how widely scattered is our data?
You can compute these statistics for both grouped and
ungrouped data
Measures of Central Tendency
(ungrouped)
What if we had collected data about one measure, and
we wanted to know what the middle value was for this
measure?
Ex. What is the middle value, in age, for those who listen to
Lady Gaga?
Ex. How many times do young Hispanic women report
shopping at H&M?
Knowing this middle, or central, value is important for
describing our data.
There are three measures of central tendency…
Measures of Central Tendency
(ungrouped)
Mean (p.24)
This is the mathematical average of a set of numbers
Median (p.26)
This is the middle value of a set of data that has been arranged
from lowest to highest
Mode (p. 27)
The value that occurs the most in a set of data
We can use income as a good way of discussing these
three measures. Imagine that we wanted to know the
average incomes for FIT students. Imagine that we took a
random sample of incomes for FIT students. …
Measures of Central Tendency
(ungrouped)
The sample gives these
values:
5000, 6000, 30000, 110000,
15000, 6000, 17000, 13000,
12000, 11000, 8000, 6000,
15000, 6000, 11500
The Mean
This is the average….
Sum of values = 271500
Total N = 15
Mean = 18100
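The same calculation can be checked in Python. This is just a sketch of the arithmetic on this slide; the variable names are my own.

```python
# The 15 incomes from the FIT sample on this slide.
incomes = [5000, 6000, 30000, 110000, 15000, 6000, 17000, 13000,
           12000, 11000, 8000, 6000, 15000, 6000, 11500]

total = sum(incomes)   # sum of values
n = len(incomes)       # total N
mean = total / n       # mean = sum of values / N

print(f"Sum of values = {total}, N = {n}, Mean = {mean}")
```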
Measures of Central Tendency
(ungrouped)
The sample gives these
values:
5000, 6000, 30000, 110000,
15000, 6000, 17000, 13000,
12000, 11000, 8000, 6000,
15000, 6000, 11500
The Median
This is the middle value:
5000, 6000, 6000, 6000,
6000, 8000, 11000, 11500,
12000, 13000, 15000, 15000,
17000, 30000, 110000
The median here is 11500
In cases where there are
two middle values, we
average the two.
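Here is a small sketch of that rule in Python, covering both the odd case (one middle value) and the even case (two middle values, averaged). The function name `median` and the short even-length list are assumptions made for the example.

```python
def median(values):
    """Middle value of the sorted data; average the two middle values if N is even."""
    ordered = sorted(values)          # arrange from lowest to highest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd N: a single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even N: average the two

incomes = [5000, 6000, 30000, 110000, 15000, 6000, 17000, 13000,
           12000, 11000, 8000, 6000, 15000, 6000, 11500]

print(median(incomes))        # 15 values, so the 8th sorted value is the median
print(median([1, 3, 5, 7]))   # hypothetical even-sized list: average of 3 and 5
```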
Measures of Central Tendency
(ungrouped)
The sample gives these
values:
5000, 6000, 30000, 110000,
15000, 6000, 17000, 13000,
12000, 11000, 8000, 6000,
15000, 6000, 11500
The Mode
This is the most numerous
value:
5000, 6000, 6000, 6000,
6000, 8000, 11000, 11500,
12000, 13000, 15000, 15000,
17000, 30000, 110000
The Mode here is 6000.
Sometimes there is no
mode…or even two
modes!
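A sketch of how the mode could be found in Python, including the "no mode" and "two modes" cases. The helper `modes` is hypothetical, and treating "every value occurs exactly once" as "no mode" is a convention assumed here, not something the textbook is confirmed to use.

```python
from collections import Counter

def modes(values):
    """Return a sorted list of the most frequent value(s); empty if there is no mode."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # assumed convention: nothing repeats, so there is no mode
    return sorted(v for v, c in counts.items() if c == highest)

incomes = [5000, 6000, 30000, 110000, 15000, 6000, 17000, 13000,
           12000, 11000, 8000, 6000, 15000, 6000, 11500]

print(modes(incomes))            # 6000 occurs four times
print(modes([1, 2, 3]))          # hypothetical data with no mode
print(modes([1, 1, 2, 2, 3]))    # hypothetical data with two modes
```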
Measures of Central Tendency
(ungrouped)
So given these values…
5000, 6000, 6000, 6000,
6000, 8000, 11000, 11500,
12000, 13000, 15000,
15000, 17000, 30000,
110000
…what is the best
measure of central
tendency for this random
sample of FIT students?
Mean?...18100
Median?...11500
Mode?...6000
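One way to think about this question is to see what happens when the single very large income (110000) is left out. The sketch below uses Python's standard `statistics` module; dropping that one value is only a thought experiment for comparison, not a recommendation to delete data.

```python
import statistics

incomes = [5000, 6000, 6000, 6000, 6000, 8000, 11000, 11500,
           12000, 13000, 15000, 15000, 17000, 30000, 110000]

# Thought experiment: the same sample without the one extreme income.
without_extreme = [x for x in incomes if x != 110000]

mean_all = statistics.mean(incomes)
median_all = statistics.median(incomes)
mean_trimmed = statistics.mean(without_extreme)
median_trimmed = statistics.median(without_extreme)

print(f"All 15 values:        mean = {mean_all:.0f}, median = {median_all:.0f}")
print(f"Without 110000 (14):  mean = {mean_trimmed:.0f}, median = {median_trimmed:.0f}")
```

The mean moves a lot when one value is removed while the median barely moves, which is worth keeping in mind when deciding which measure best describes this sample.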
Measures of Dispersion or Spread
(ungrouped)
Range (p.29)
The highest value minus the lowest value….
From our last example, the range would be: 110000 – 5000 =
105000
Standard Deviation (p.29 – 35)
This is, roughly, the average distance of your values from the
mean score.
Best shown through example…
Measures of Dispersion or Spread
(ungrouped)
Standard Deviation
Let’s return to our FIT
random sample…
5000, 6000, 6000, 6000, 6000,
8000, 11000, 11500, 12000,
13000, 15000, 15000, 17000,
30000, 110000
Follow the steps below
while we(I) calculate the
standard deviation as a class
on the board
1. Calculate the
mean…which is 18100
2. Find the distance that each
value has from the mean
3. Square the distance
4. Add up these squared distances
and divide by the sample
size – 1 (at this point, this
number is called the
variance).
5. Then we get the square
root of this number
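The five steps can also be followed in Python, one line per step. This is a sketch of the procedure described above (dividing by the sample size minus 1); the variable names are my own, and the last line simply compares the result with the standard library's `statistics.stdev`, which uses the same sample formula.

```python
import math
import statistics

incomes = [5000, 6000, 6000, 6000, 6000, 8000, 11000, 11500,
           12000, 13000, 15000, 15000, 17000, 30000, 110000]

n = len(incomes)
mean = sum(incomes) / n                       # step 1: the mean
deviations = [x - mean for x in incomes]      # step 2: distance from the mean
squared = [d ** 2 for d in deviations]        # step 3: square each distance
variance = sum(squared) / (n - 1)             # step 4: sum, divide by N - 1
std_dev = math.sqrt(variance)                 # step 5: square root

print(f"Mean = {mean}")
print(f"Sum of squared distances = {sum(squared):.0f}")
print(f"Variance = {variance:.2f}")
print(f"Standard deviation = {std_dev:.0f}")
print(f"statistics.stdev gives {statistics.stdev(incomes):.0f}")
```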
Standard Deviation
X         Mean (x-bar)   X – x-bar   (X – x-bar)²
5000      18100          -13100      171,610,000
6000      18100          -12100      146,410,000
6000      18100          -12100      146,410,000
6000      18100          -12100      146,410,000
6000      18100          -12100      146,410,000
8000      18100          -10100      102,010,000
11000     18100          -7100       50,410,000
11500     18100          -6600       43,560,000
12000     18100          -6100       37,210,000
13000     18100          -5100       26,010,000
15000     18100          -3100       9,610,000
15000     18100          -3100       9,610,000
17000     18100          -1100       1,210,000
30000     18100          11900       141,610,000
110000    18100          91900       8,445,610,000
Standard Deviation
We sum (X – x-bar)², divide this sum by the sample size – 1
(15 – 1 = 14) to get the variance, and then get the square root.
This is the standard deviation. What is the square root of
the variance?
Appx. 26,219
Right now, this number means very little…but in the
following chapters we will gain a better understanding of
the standard deviation
Measures of Central Tendency and Dispersion
(Grouped Data)
Remember that grouped data is a collection of data that
has been placed into categories…
Thus we need to calculate the mean and standard
deviation differently, but the idea is the same.
P. 36 – 39 show the formulas for these measures.
Calculating the Mean for Grouped Data
Let’s say we conducted a random sample of FIT students, and
asked them their GPA. We decided to group GPA into
categories. Here is the data below:
GPA Category    Number of Students
3.5 – 4.0       15
3.0 – 3.49      25
2.0 – 2.9       50
Below 2.0       11
So…what is the mean? Look at pages 36 – 38 and I will wait
for someone to tell me how to go about answering this
question.
Calculating the Mean for Grouped Data
X = the midpoint of each category (the average of its lowest and highest values)
f = number of students
So can someone answer this question on the board (with
help from classmates)?
GPA Category    Number of Students
3.5 – 4.0       15
3.0 – 3.49      25
2.0 – 2.9       50
Below 2.0       11
GPA Category             X        Number of Students (f)
3.5 – 4.0                3.75     15
3.0 – 3.49               3.245    25
2.0 – 2.9                2.45     50
Below 2.0 (0 – 1.9)      .95      11
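One way to work this out is to multiply each category's X by its f, add those products up, and divide by the total number of students. The sketch below assumes that this is the grouped-mean formula on pp. 36 – 38; the function name `grouped_mean` is my own.

```python
# (category, midpoint X, frequency f) from the table above.
groups = [
    ("3.5 - 4.0", 3.75, 15),
    ("3.0 - 3.49", 3.245, 25),
    ("2.0 - 2.9", 2.45, 50),
    ("Below 2.0 (0 - 1.9)", 0.95, 11),
]

def grouped_mean(groups):
    """Sum of (X * f) divided by the sum of f."""
    total_f = sum(f for _, _, f in groups)
    total_fx = sum(x * f for _, x, f in groups)
    return total_fx / total_f

n_students = sum(f for _, _, f in groups)
print(f"Number of students = {n_students}")
print(f"Grouped mean GPA = {grouped_mean(groups):.2f}")
```

Because every student in a category is treated as sitting at the category midpoint, the result is an estimate of the mean, not the exact mean of the original GPAs.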
Calculating the Standard Deviation of
Grouped Data
Now let’s calculate the standard deviation for this same
set of data…
GPA Category    Number of Students
3.5 – 4.0       15
3.0 – 3.49      25
2.0 – 2.9       50
Below 2.0       11
Who can do this one on the board?
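A sketch of one way to do it, following the same idea as the ungrouped steps: find each midpoint's distance from the grouped mean, square it, weight it by f, add up, divide by the number of students minus 1, and take the square root. The textbook (pp. 36 – 39) may write the formula in a different but equivalent form, so treat this version and the name `grouped_std_dev` as assumptions.

```python
import math

# (midpoint X, frequency f) for each GPA category; midpoints as computed earlier.
groups = [(3.75, 15), (3.245, 25), (2.45, 50), (0.95, 11)]

def grouped_std_dev(groups):
    """Sample standard deviation estimated from category midpoints and frequencies."""
    n = sum(f for _, f in groups)
    mean = sum(x * f for x, f in groups) / n
    # Each squared distance counts once per student in the category.
    sum_sq = sum(f * (x - mean) ** 2 for x, f in groups)
    variance = sum_sq / (n - 1)
    return math.sqrt(variance)

s = grouped_std_dev(groups)
print(f"Grouped standard deviation = {s:.2f}")
```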
Writing Research Reports (pp. 48 – 50)
Background Statement (5 pts)
I will give you data…use your imagination
Why was the study performed (why was the data collected)?
Design and Procedures of the Study (10 pts)
How did you conduct the study?
How was the study internally valid/externally valid?
These two sections are not the most important…simply use
your imagination to complete these two sections
Writing Research Reports (pp. 48 – 50)
Results (55 pts.)
The most important section.
For this first report, this is where you present your data
graphically, show measures of dispersion, and central tendency
Analysis and Discussion (10 pts.)
What is interesting to you about the results?
Conclusions and Recommendations (20 pts.)
(this section you will not do for your report…this is where
you present your results and analysis to the class. The class
can ask you questions, so be on point!)
END