What Exactly is Statistics?


Brought to you by
Tutorial Support Services
The Math Center



Statistics is the study of how to collect,
organize, analyze, and interpret numerical
information.
Descriptive statistics generally characterizes or
describes a set of data elements by graphically
displaying the information or describing its
central tendencies and how it is distributed.
Inferential statistics tries to infer information
about a population by using information
gathered by sampling.





Population: The complete set of data elements
where N refers to the Population Size.
Sample: A portion of a population selected for
further analysis.
Midrange: The arithmetic mean of the highest
and lowest data elements.
Parameter: A characteristic of the whole
population.
Statistic: A characteristic of a sample,
presumably measurable.

The Arithmetic Mean is obtained by summing
all elements of the data set and dividing by the
number of elements:
$$\text{Mean} = \bar{x} = \frac{\sum x}{n}$$
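As a minimal sketch of the formula above, the mean can be computed in Python as follows. The function name `arithmetic_mean` and the data values are illustrative assumptions, not part of the original handout.

```python
def arithmetic_mean(data):
    """Return the sum of the elements divided by the number of elements n."""
    n = len(data)
    if n == 0:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(data) / n


# Hypothetical data set, chosen only for illustration.
values = [2, 4, 6, 8]
print(arithmetic_mean(values))  # 20 / 4 = 5.0
```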


The Sample Size is the number of elements in
a sample. It is referred to by the symbol n,
whereas x refers to each element in the data set.
The Mode is the data element which occurs
most frequently.
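One way to find the mode is to count how often each element occurs and keep the most frequent one. The sketch below assumes that, when several elements tie for the highest count, all of them should be reported; the function name `modes` and the data are hypothetical.

```python
from collections import Counter


def modes(data):
    """Return every element that occurs with the highest frequency."""
    counts = Counter(data)
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]


# Hypothetical data sets, chosen only for illustration.
print(modes([1, 2, 2, 3]))     # [2]
print(modes([1, 1, 2, 2, 3]))  # [1, 2] (two elements tie)
```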

The Median is the middle element when the data
set is arranged in order of magnitude.
1. When n is odd, simply take the middle value of the data set.
2. When n is even, take the two middle values (the pair with the same number of values before it as after it), add them, and divide by 2.
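The two cases above can be sketched in Python as follows. The function name `median` and the example data are assumptions made for illustration.

```python
def median(data):
    """Return the middle element of the data once it is sorted."""
    ordered = sorted(data)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    middle = n // 2
    if n % 2 == 1:
        # n is odd: a single middle value exists.
        return ordered[middle]
    # n is even: average the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2


# Hypothetical data sets, chosen only for illustration.
print(median([7, 1, 5]))     # sorted: 1, 5, 7 -> 5
print(median([7, 1, 5, 3]))  # sorted: 1, 3, 5, 7 -> (3 + 5) / 2 = 4.0
```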
The Midrange is the arithmetic mean of the
highest and lowest data element:
xmax   xmin 
Midrange 
2
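A short sketch of the midrange formula, with a hypothetical function name and data:

```python
def midrange(data):
    """Return the mean of the highest and lowest data elements."""
    return (max(data) + min(data)) / 2


# Hypothetical data set, chosen only for illustration.
print(midrange([3, 9, 4, 11]))  # (11 + 3) / 2 = 7.0
```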





Example: A sample of size 9 (n = 9) is taken of student quiz scores with the following results:
5, 6, 7, 7, 8, 8, 8, 9.5, 10
Answer: The mean is:
$$\bar{x} = \frac{5 + 6 + 7 + 7 + 8 + 8 + 8 + 9.5 + 10}{9} = \frac{68.5}{9} \approx 7.61$$
The median is 8, since this is the middle (fifth) element.
The mode is 8, since it is the data value that appears most frequently in the distribution.
The midrange is:
$$\frac{10 + 5}{2} = \frac{15}{2} = 7.5$$
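The quiz-score example can be checked with Python's standard `statistics` module. This is only one possible way to verify the arithmetic; the variable names are illustrative.

```python
import statistics

# Quiz scores from the example above (n = 9).
scores = [5, 6, 7, 7, 8, 8, 8, 9.5, 10]

mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)
# The statistics module has no midrange function, so compute it directly.
midrange = (max(scores) + min(scores)) / 2

print(round(mean, 2))  # 7.61
print(median)          # 8
print(mode)            # 8
print(midrange)        # 7.5
```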


Range is the difference between the highest and lowest data element.
The Standard Deviation is another way to measure dispersion. It is the
most common and useful measure because it reflects the typical distance
of each score from the mean. The formula for the sample standard deviation
is as follows:
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
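The sample formula can be sketched step by step in Python. The function name `sample_std` and the data are assumptions chosen so the arithmetic is easy to follow.

```python
import math


def sample_std(data):
    """Sample standard deviation: square root of sum((x - mean)^2) / (n - 1)."""
    n = len(data)
    if n < 2:
        raise ValueError("at least two data elements are required")
    mean = sum(data) / n
    sum_of_squares = sum((x - mean) ** 2 for x in data)
    return math.sqrt(sum_of_squares / (n - 1))


# Hypothetical data: the mean is 3, the squared deviations are 4, 0, 4.
print(sample_std([1, 3, 5]))  # sqrt(8 / 2) = 2.0
```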
The Population Standard Deviation is as follows:
$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{N}}$$
Notice the difference between the sample and population standard
deviations. The sample standard deviation uses (n - 1) in the denominator,
so for the same data it is slightly larger than the population standard
deviation, which uses N (often written as n).
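To see the effect of the two denominators, the sketch below applies both formulas to the quiz scores from the earlier example, using `statistics.stdev` (divides by n - 1) and `statistics.pstdev` (divides by N) from Python's standard library. The variable names are illustrative.

```python
import math
import statistics

scores = [5, 6, 7, 7, 8, 8, 8, 9.5, 10]
n = len(scores)

s = statistics.stdev(scores)       # sample formula, denominator n - 1
sigma = statistics.pstdev(scores)  # population formula, denominator N

# The sample value is the larger of the two for the same data.
print(s > sigma)  # True

# The two results differ by exactly the factor sqrt(n / (n - 1)).
print(math.isclose(s / sigma, math.sqrt(n / (n - 1))))  # True
```

As n grows, the factor sqrt(n / (n - 1)) approaches 1, so the two values differ less for larger data sets.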
Variance is the third method of measuring dispersion:
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}; \qquad \sigma^2 = \frac{\sum (x - \bar{x})^2}{N}$$
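Both variance formulas share the same numerator and differ only in the denominator, which the following sketch makes explicit. The function name `variance`, its `sample` flag, and the data are hypothetical.

```python
def variance(data, sample=True):
    """Return the sample variance (n - 1) or the population variance (N)."""
    n = len(data)
    if n < 2 and sample:
        raise ValueError("the sample variance needs at least two elements")
    mean = sum(data) / n
    sum_of_squares = sum((x - mean) ** 2 for x in data)
    denominator = n - 1 if sample else n
    return sum_of_squares / denominator


# Hypothetical data: the mean is 5 and the squared deviations sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance(data, sample=False))  # 32 / 8 = 4.0
print(variance(data, sample=True))   # 32 / 7, slightly larger
```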
First, we want to calculate the mean and sample
standard deviation of the following
distribution: 1, 2, 3, 4, 5. We calculate our
mean, and it is:
$$\text{Mean} = \frac{1 + 2 + 3 + 4 + 5}{5} = \frac{15}{5} = 3$$
Now we construct a table in which to keep track
of our data:
  x    $x - \bar{x}$    $(x - \bar{x})^2$
  1         -2                  4
  2         -1                  1
  3          0                  0
  4          1                  1
  5          2                  4
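The same table can be generated programmatically, which is a convenient check on the hand calculation. This is a sketch; the variable names and the column layout are illustrative.

```python
data = [1, 2, 3, 4, 5]
mean = sum(data) / len(data)  # 15 / 5 = 3.0

# One row per element: x, its deviation from the mean, and the squared deviation.
rows = [(x, x - mean, (x - mean) ** 2) for x in data]

print(f"{'x':>3} {'x - mean':>10} {'(x - mean)^2':>14}")
for x, deviation, squared in rows:
    print(f"{x:>3} {deviation:>10.0f} {squared:>14.0f}")

sum_of_squares = sum(squared for _, _, squared in rows)
print(sum_of_squares)  # 4 + 1 + 0 + 1 + 4 = 10.0
```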



We now want to find the sum of $(x - \bar{x})^2$:
4 + 1 + 0 + 1 + 4 = 10.
The total number of values is n = 5. To find n - 1, subtract 1 from 5 to get 4.
Now we find the sample standard
deviation:
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{10}{4}} \approx 1.58113$$

Using the formula for the population standard
deviation gives us the following:
$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{N}} = \sqrt{\frac{10}{5}} \approx 1.414$$
The variance of our distribution 1, 2, 3, 4, 5 is:
$$\text{Variance} = s^2 = 1.58113^2 \approx 2.5$$

Squaring σ gives us:
$$\sigma^2 = 1.414^2 \approx 2$$
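The results of this worked example can be confirmed with Python's standard `statistics` module. The sketch converts the variances to `float` so they print uniformly. The library rounds the sample standard deviation to 1.58114, whereas the value 1.58113 above is truncated.

```python
import statistics

data = [1, 2, 3, 4, 5]

s = statistics.stdev(data)       # sample standard deviation, sqrt(10 / 4)
sigma = statistics.pstdev(data)  # population standard deviation, sqrt(10 / 5)

print(round(s, 5))      # 1.58114
print(round(sigma, 3))  # 1.414

# The variances are the squares of the standard deviations.
print(float(statistics.variance(data)))   # 10 / 4 = 2.5
print(float(statistics.pvariance(data)))  # 10 / 5 = 2.0
```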

Descriptive Statistics Handout