A Primer of Statistics


Statistics
What are Statistics?
• Methods for interpreting raw data relative to a hypothesis
– And for presenting and organizing data for interpretation
• Should be taken into account when designing experiments
– And when collecting data
• Standardized techniques used by scientists
What are Statistics?
• Vocabulary and symbols for communicating about data
• A set of tools
– How do you know which tool to use?
– What do you want to know?
– What type of data do you have?
History of Statistics in Science
• Probability theory
• Used to determine odds in betting
– Game theory
• Insurance premiums on nautical vessels
– Based on intuitive risk
[Figure: first British edition of Snakes and Ladders, from 1800. The game originated in India in 200 BCE.]
What are Statistics?
Two main branches:
• Descriptive statistics
– Tools for summarising, organising and simplifying data
• Inferential statistics
– Data from a sample are used to draw inferences about the population
Statistical terms
• Population
– Complete set of individuals, objects or measurements
• Sample
– A subset of a population
• Variable
– A characteristic which may take on different values
• Data
– Numbers or measurements collected
• A parameter is a characteristic of a population
– e.g., the average height of all Britons.
• A statistic is a characteristic of a sample
– e.g., the average height of a sample of Britons.
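The difference between a parameter and a statistic can be sketched in a few lines of Python. The population here is synthetic (simulated heights in cm, not real data on Britons), and the names `population`, `sample`, `mu` and `x_bar`, the seed and the sample size are all illustrative assumptions.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 simulated heights in cm (not real data).
population = [random.gauss(170, 10) for _ in range(10_000)]

# Parameter: a characteristic of the whole population.
mu = statistics.mean(population)

# Sample: a randomly chosen subset of the population.
sample = random.sample(population, 50)

# Statistic: the same characteristic, computed from the sample only.
x_bar = statistics.mean(sample)

print(f"population mean (parameter): {mu:.1f} cm")
print(f"sample mean (statistic):     {x_bar:.1f} cm")
```

In practice the parameter is usually unknown, and the statistic is what we use to estimate it.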
Sample vs. Population
[Figure: a sample drawn from a population]
The goal: to select a subset of the population that is representative of the variation in the population.
Variables
• Dependent (Response)
– Variable of primary interest
– Not controlled by the experimenter
– e.g. blood pressure in an antihypertensive drug trial.
• Independent (Predictor)
– Also called a factor
– Controlled by the experimenter.
– e.g. administration of an antihypertensive drug or a placebo.
Types of Variables/Data
• Qualitative
– Descriptive
• Quantitative
1. Nominal
2. Ordinal
3. Interval
4. Ratio
1. Nominal
• Categorical scale
• Uses numbers, names or symbols to classify objects
• Example: Human eye color
2. Ordinal
• Ranking scale
• Objects are placed in order
• Divisions or gaps between objects may not be equal
• Example: Scoville Heat Index
3. Interval
• Equal distances between adjacent scale points
• But no true zero
• Example: Temperature scales
– Celsius: 0 and 100 are arbitrarily placed at the melting and boiling points of water.
3. Interval
• Able to quantify the difference between two interval scale values
– But there is no natural zero.
• 25 °C is warmer than 20 °C
– A 5 °C difference
– Has a physical meaning.
• However, it does not make sense to say that 20 °C is twice as hot as 10 °C.
– Because 0 °C is arbitrary
4. Ratio
• An interval scale with a true zero
• The ratio of any two scale points is independent of the units of measurement
• Example: Length
4. Ratio
• It is meaningful to say that 10 m is twice as long as 5 m.
• This ratio holds true regardless of the units in which the object is measured (e.g. meters or yards).
• This is because there is a natural zero.
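A short Python sketch makes the contrast between interval and ratio scales concrete. Converting Celsius to Fahrenheit is an illustrative choice that the slides do not use, and the helper names `c_to_f` and `m_to_yd` are hypothetical.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


def m_to_yd(metres):
    """Convert metres to yards (1 yd = 0.9144 m)."""
    return metres / 0.9144


# Interval scale: the ratio changes when the units change.
ratio_c = 20 / 10                     # ratio of the Celsius readings
ratio_f = c_to_f(20) / c_to_f(10)     # ratio of the same temperatures in Fahrenheit

# Ratio scale: the ratio survives a change of units.
ratio_m = 10 / 5
ratio_yd = m_to_yd(10) / m_to_yd(5)

print(ratio_c, ratio_f)    # the two temperature ratios disagree
print(ratio_m, ratio_yd)   # the two length ratios agree
```

The temperature ratio depends on where each scale places its zero, while the length ratio does not, because both length units share the same natural zero.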
Data
• Quantitative data can be further divided into two groups: discrete and continuous.
• Discrete: the set of possible values, when pictured on the number line, consists only of isolated points.
• Continuous: the set of possible values, when pictured on the number line, consists of intervals.
Discrete and Continuous
[Figure: examples of continuous and discrete data]
Why do we use statistics?
1. To determine the effect of modified/experimental variables
2. Using a sample of a population allows us to draw descriptive conclusions
– And make inferential statements
3. To provide a probability (a level of confidence) that our conclusions are based on actual differences
Why do we use statistics?
• Samples represent the population, so they should give us an idea of the variation that exists within it.
• Statistics express the confidence with which we can say that similarities or differences exist between variables (e.g. a correlation)
– Or the probability that the differences we see are not due to chance.
Correlation & Causation
Types of Questions
The most common questions we can ask using statistics:
1. How do our results compare to expectations?
2. What is the relationship between independent and dependent variables?
3. Do groups differ (in the dependent variable) between levels of an independent variable?
4. Do groups differ (in the dependent variable) due to the interaction of two or more independent variables?
Intermission
Previously in Animal Behavior…
Types of Questions
The most common questions we can ask using statistics:
1. How do our results compare to expectations?
– Chi-squared (χ²)
2. What is the relationship between independent and dependent variables?
– Correlation (r)
3. Do groups differ (in the dependent variable) between levels of an independent variable?
– t-test (t) or one-way ANOVA (F)
4. Do groups differ (in the dependent variable) due to the interaction of two or more independent variables?
– Two-way ANOVA (F)
Types of Questions
How can we describe data, and the relationship between variables?
• Descriptive statistics
– Mean, standard deviation, standard error
– Median, mode and range
• Inferential statistics
– Chi-squared, t-test, regression, ANOVA, etc.
• Graphical representation of the data
– Histograms and line graphs
Descriptive Statistics: Mean
Arithmetic average:
Sample: $\bar{x} = \frac{\sum x}{n}$
Population: $\mu = \frac{\sum x}{N}$
Example: X = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], so $\sum X / n = 5.5$
Can be affected by extreme values
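The sensitivity of the mean to extreme values can be checked with the slide's own data set. In the sketch below, the value 1000 is a hypothetical outlier added only for illustration, and the median is shown for comparison.

```python
import statistics

X = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
mean_x = sum(X) / len(X)

# Hypothetical extreme value: replace the last observation (10) with 1000.
X_outlier = X[:-1] + [1000]
mean_outlier = sum(X_outlier) / len(X_outlier)
median_outlier = statistics.median(X_outlier)

print(mean_x)          # mean of the original data
print(mean_outlier)    # mean after one extreme value is introduced
print(median_outlier)  # median of the data with the extreme value
```

A single extreme observation moves the mean a long way, while the median stays where it was.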
Frequency Distribution
Variance
• The sum of the squared deviations from the sample mean, divided by n − 1:
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
Standard Deviation
• The sample standard deviation, s, is the square root of the variance
• s has the advantage of being in the same units as the original variable x
$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
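The variance and standard deviation formulas can be followed step by step in Python. This sketch reuses the data set from the mean slide and cross-checks the hand calculation against the standard library. The standard error line uses the usual definition s / √n, which the slides name but do not define.

```python
import math
import statistics

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
n = len(x)
x_bar = sum(x) / n

# Sum of squared deviations from the sample mean.
ss = sum((xi - x_bar) ** 2 for xi in x)

variance = ss / (n - 1)      # sample variance, s^2
sd = math.sqrt(variance)     # sample standard deviation, s
se = sd / math.sqrt(n)       # standard error of the mean (assumed definition)

print(f"s^2 = {variance:.3f}, s = {sd:.3f}, SE = {se:.3f}")

# The standard library applies the same n - 1 denominator.
print(statistics.variance(x), statistics.stdev(x))
```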
Standard Deviation
• Simply put, Standard Deviation tells us how
much variation we expect in the population
– Based on our sample.
Inferential Statistics - Summary
• Using a mathematical test of probability to determine if there is a difference between groups
– Using a sample representative of the population
– Measuring the response of the dependent variable to the independent variable.
• All an inferential statistical test tells us is whether there is a reliable difference between groups.
– The probability that the difference is not due to chance, but rather an effect of the independent variable.
Alpha
• Inferential statistics are only relevant in context.
• Put them into context by identifying differences
– Comparing group means and standard deviations
– To see how similar or different the results are across independent variables (treatments)
• We use α to determine if our hypothesis is supported.
Alpha
• α is a probability threshold set to decide whether the relationship between variables is due to treatment or to random chance
– Type I error
– P-value
• We typically set α to 0.05, or 5%, which means we are 95% confident that differences attributed to the independent variables are not due to random chance.
Degrees of Freedom
• Number of values in the calculation that are allowed to vary
• Define the number of test assumptions you are able to make with a statistical test
– Before the chance of Type I error exceeds your predicted probability.
d.f. = n − 1
Linear Correlations
• What is the relationship between independent and dependent variables?
– Correlation
• Positive
• Negative
• Curvilinear
• Reported as β or r²
[Figure: scatterplots showing strong positive correlation (R = +1), moderate positive correlation (R = +0.75), no correlation (R = 0), strong negative correlation (R = −1), moderate negative correlation (R = −0.75), and a curvilinear relationship (R = 0.75)]
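As a minimal sketch, the correlation coefficient r can be computed directly from its definition (Pearson's r is assumed here, since the slides do not name the coefficient). The data are made up for illustration, and `pearson_r` is a hypothetical helper name.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


x = [-2, -1, 0, 1, 2]
r_pos = pearson_r(x, [2 * a + 1 for a in x])   # points on a rising straight line
r_neg = pearson_r(x, [-3 * a for a in x])      # points on a falling straight line
r_curve = pearson_r(x, [a ** 2 for a in x])    # symmetric U-shaped relationship

print(r_pos, r_neg, r_curve)
```

The U-shaped example shows why a scatterplot is worth checking: a linear coefficient can be near zero even when the variables are clearly related.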
Chi Squared
• Do the measured values match expectations?
– χ²
• Goodness-of-fit test
– Compares actual results to predicted results
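The comparison of actual to predicted results can be sketched as a chi-squared calculation. The counts below are hypothetical (50 animals choosing between two arms of a maze), and the critical value 3.841 is the standard table value for α = 0.05 with 1 degree of freedom.

```python
# Hypothetical counts: 50 animals choosing between two arms of a maze.
observed = [30, 20]
expected = [25, 25]   # expectation if there is no preference

# Chi-squared statistic: sum of (observed - expected)^2 / expected.
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

CRITICAL_05_DF1 = 3.841   # table value for alpha = 0.05, df = 1
significant = chi2 > CRITICAL_05_DF1

print(f"chi2({df}) = {chi2:.2f}, significant at 0.05: {significant}")
```

With these made-up counts the statistic falls below the critical value, so the departure from a 25:25 split would not be treated as reliable.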
T-Test
• Are two groups different (response/dependent variable) due to a single independent variable?
– t-test
• Compares the means of two groups
– Instead of comparing results to predicted values
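A two-group comparison can be sketched with the pooled-variance (Student's) t statistic. The slides do not say which variant of the t-test is meant, so the equal-variance form is an assumption here, and the measurements are hypothetical.

```python
import math
import statistics

# Hypothetical measurements of the dependent variable in two groups.
treatment = [5, 6, 7, 8, 9]
control = [1, 2, 3, 4, 5]

n1, n2 = len(treatment), len(control)
mean1, mean2 = statistics.mean(treatment), statistics.mean(control)
var1, var2 = statistics.variance(treatment), statistics.variance(control)

# Pooled variance: assumes both groups share the same population variance.
df = n1 + n2 - 2
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df

# Standard error of the difference between the two means.
se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
t = (mean1 - mean2) / se_diff

print(f"t({df}) = {t:.2f}")
```

The statistic is then compared with a critical value from a t table for the chosen α and degrees of freedom (2.306 for a two-tailed test at α = 0.05 with 8 d.f.).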
ANOVA
• Are groups different (response/dependent variable) due to several levels of a single independent variable?
– ANalysis Of Variance (ANOVA)
• Test of variance
– Effect of treatment on the distribution of a group, relative to the others.
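The idea of comparing variation between groups with variation within groups can be sketched as a one-way ANOVA F calculation. The three groups and their values are hypothetical, chosen so the arithmetic is easy to follow by hand.

```python
# Hypothetical responses for three levels of one independent variable.
groups = {
    "A": [1, 2, 3],
    "B": [2, 3, 4],
    "C": [6, 7, 8],
}


def mean(values):
    return sum(values) / len(values)


all_values = [v for g in groups.values() for v in g]
grand_mean = mean(all_values)

# Variation of the group means around the grand mean.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
# Variation of individual values around their own group mean.
ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)

f_stat = (ss_between / df_between) / (ss_within / df_within)
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

A large F means the group means differ by more than the scatter within groups would suggest. The value is then compared with a critical value from an F table for the two degrees of freedom.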
Two-Way ANOVA
• Are groups different (response/dependent variable) due to multiple independent variables?
• Effect of different treatments on the distribution of groups
– Individual variables
– Interaction
Two-way ANOVA Design
                     Variable B: Sugar
Variable A: Yeast    Maltose    Glucose    Sucrose
Bakers               35 ml      342 ml     100 ml
Brewers              551 ml     120 ml     73 ml
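The design table can be explored in Python to see what "individual variables" and "interaction" mean. This sketch treats each cell as a single summary value, which is an assumption. It computes only marginal means and differences, because a full two-way ANOVA would need the replicate measurements behind each cell.

```python
# CO2 produced (ml), copied from the design table; one value per cell.
co2 = {
    "Bakers":  {"Maltose": 35,  "Glucose": 342, "Sucrose": 100},
    "Brewers": {"Maltose": 551, "Glucose": 120, "Sucrose": 73},
}
sugars = ["Maltose", "Glucose", "Sucrose"]

# Individual variables: average over the other factor.
yeast_means = {y: sum(row.values()) / len(row) for y, row in co2.items()}
sugar_means = {s: sum(co2[y][s] for y in co2) / len(co2) for s in sugars}

# Interaction: does the yeast difference depend on the sugar?
diff = {s: co2["Bakers"][s] - co2["Brewers"][s] for s in sugars}

print(yeast_means)
print(sugar_means)
print(diff)
```

The Bakers minus Brewers difference changes sign from maltose to glucose, which is the pattern an interaction term is meant to capture.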
[Figures: amount of CO2 after 24 hours (ml), plotted by yeast (Bakers, Brewers), by sugar (Maltose, Glucose, Sucrose), and by yeast and sugar together]
Reporting Results
• Must provide results of statistical calculations
– Test result (the value of r, t, F, etc.)
– Degrees of freedom (d.f.)
– Probability value (p)
• This allows readers to see that any results you report are supported by the statistics.
– Is there a reliable difference due to the variables?
– Statistical significance
Reporting Results
• Regression
– R = , p =
• T-test
– t(d.f.) = , p =
• ANOVA
– F(d.f. [between], d.f. [within]) = , p =
Biologically Relevant
• Statistically significant ≠ relevant
• Reasons results might not be statistically significant
– Too small a sample
– Too much variation in one or more groups
• You should report statistics, but not let them dictate what you consider relevant results
– Use figures to make an argument for relevant and interesting results.
Thank You
Xkcd.com