2-Excel statistics
Download
Report
Transcript 2-Excel statistics
Basic Statistics
with Microsoft Excel
Helen Dixon
Aim and Objectives
Aim of today’s course
To
illustrate how Excel can be used to carry
out some basic statistical analyses and tests
Objectives
To
show you how to use some of the
statistical worksheet functions available within
Excel
To show you how to use some of the tools
available in the Analysis ToolPak
To make you aware of the limitations of Excel
Why use Excel?
Software more accessible
Previous familiarity with software
Easy to format output
Better charting facilities than some
statistical applications
Access to other key Excel facilities
Easy to use results with other applications
Problems with Excel
Errors due to rounding, missing data or extreme
values
Not suitable for very large data sets
Output labelled or arranged inappropriately
Need to repeat processes for different variables
or options
No record of analyses
Some algorithms are numerically unstable - little
or no information about algorithms employed
Analysis ToolPak results are not dynamic and
may vary with results generated by functions
Statistical Functions
Frequency Distributions
Mean, Median and Mode
Percentiles and Quartiles
Deviation and Squared Deviation about
the Mean
Variance and Standard Deviation
Covariance and the Correlation Coefficient
Frequency
Use COUNTIF to count how many times
an item appears in a list
=COUNTIF(range,
criteria)
Use FREQUENCY to calculate how often
values occur within a range
=FREQUENCY(data_array,
bins_array)
Can also use Histogram tool in Analysis
Toolpak
Mean, Median, Mode
Use AVERAGE or AVERAGEA to calculate
the arithmetic mean
=AVERAGE(number1,
Use MEDIAN to return the middle number
=MEDIAN(number1,
number2, etc.)
number2, etc)
Use MODE to return the most common
value
=MODE(number1,
number2, etc)
Percentiles and Quartiles
Use PERCENTILE to return the kth percentile of
a data set
=PERCENTILE(array,
percentile)
Percentile argument is a value between 0 and 1
Use QUARTILE to return the given quartile of a
data set
=QUARTILE(array,
quart)
Quart
is 1, 2, 3 or 4
IQR = Q3-Q1
May return different values to statistical package
Variance and Standard Deviation
Use VAR, VARA, VARP or VARPA to
calculate the variance for a range
E.g.
=VAR(value1, value 2, etc.)
Squared deviations about the mean/N or /n-1
Use STDEV, STDEVA, STDEVP or
STDEVPA to calculate the standard
deviation for a range
=E.g.
=STDEV(value1, value2, etc.)
Positive square root of variance
Covariance and
the Correlation Coefficient
Use COVAR to calculate the covariance
=COVAR(array1,
array2)
Average of products of deviations for each
data point pair
Depends on units of measurement
Use CORREL to return the correlation
coefficient
=CORREL(array1,
array2)
Returns value between -1 and +1
Also available in Analysis ToolPak
Probability
Numerical measure of the likelihood that
an event will occur
Some probabilities that can be calculated
using Excel:
Binomial
Probabilities
Poisson Probabilities
Hypergeometric Probabilities
Normal Probabilities
Exponential Probabilities
Binomial Probabilities
Use BINOMDIST to compute binomial
distribution probabilities and cumulative
binomial probabilities
=BINOMDIST(number_s,
trials, probability_s,
cumulative)
Calculates the probability that a sequence of
independent trials with two possible outcomes
will have a given number of successes
Cumulative is either TRUE or FALSE
Poisson Probabilities
Use POISSON to compute Poisson
Probabilities
=POISSON(x,
mean, cumulative)
Shows the probability of x occurrences of an
event over a specified interval of time or
space
Hypergeometric Probabilities
Use HYPGEOMDIST to compute
hypergeometric probabilities
=HYPGEOMDIST(sample_s,
number_sample, population_s, number_pop)
Computes the probability of x successes
(sample_s) in n trials (number_sample) when
the trials are dependent
Similar to Binomial except trials are not
independent – probability of success changes
from trial to trial
Does not compute cumulative probabilities
Normal Probabilities
Use NORMSDIST or NORMDIST to
compute the cumulative probability
=NORMSDIST(z)
=NORMDIST(x,
mean, standard_dev,
cumulative)
Use NORMSINV or NORMINV to compute
the z or x value given a cumulative
probability
=NORMSINV(probability)
=NORMINV(probability,
mean, standard_dev)
Exponential Probabilities
Use EXPONDIST to compute exponential
probabilities
=EXPONDIST(x,
lambda, cumulative)
x is the random variable
Lambda is 1/mean
Useful in computing probabilities for the time it
takes to complete a task
Analysis ToolPak
Descriptive Statistics
Correlation
Linear Regression
t-Tests
z-Tests
ANOVA
Covariance
Descriptive Statistics
Mean, Median, Mode
Standard Error
Standard Deviation
Sample Variance
Kurtosis
Skewness
Confidence Level for
Mean
Range
Minimum
Maximum
Sum
Count
kth Largest
kth Smallest
Correlation and Regression
Correlation is a measure of the strength of linear
association between two variables
Values between -1 and +1
Values close to -1 indicate strong negative relationship
Values close to +1 indicate strong positive relationship
Values close to 0 indicate weak relationship
Linear Regression is the process of finding a line of best
fit through a series of data points
Can also use the SLOPE, INTERCEPT, CORREL and RSQ
functions
t-Tests and z-Tests
Used to test hypotheses by comparing
means
If sample means are equal suggests both
samples came from same population
t-Test – n <30
Equal
or unequal variances or paired test
Check result using TTEST function
z-Test – n>30
Used
for means with known variances
ANOVA: Analysis of Variances
Compares variances in two or more data sets
If difference is found it can be assumed that the
means of the data sets are different
Single Factor – use instead of t-Test for more
than 2 samples
Two Factor with Replication – useful when data
can be classified along 2 different dimensions
Two Factor without Replication – as above but
only one observation for each pair
PivotTables
Use for crosstabulations
Data must be in tabular format: columns
with headings, no blank columns
Easy to pivot data
Easy to create PivotCharts
Can summarise and analyse data without
affecting data source
Final Tips
Excel only suitable for basic analysis using small
data sets
Later versions of Excel more reliable than Excel
97
Check Analysis TookPak results with worksheet
functions
Check overall results by hand or with dedicated
statistical package