playing chess


Last lecture summary
• Which measures of central tendency do you know?
• Which measures of variability do you know?
• Empirical rule
• Population, census, sample, statistic, parameter
• Statistical inference
Statistical jargon
population (census) vs. sample
parameter (population) vs. statistic (sample)
Population - parameter
Mean 𝜇
Standard deviation 𝜎
Sample - statistic
Sample mean $\bar{x}$
Sample standard deviation s
Sampling
• Representative sample, random sample
• Sampling with/without replacement
• Bias
New stuff
Bessel’s correction
$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$
www.udacity.com – Statistics
Sample vs. population SD
• We use the sample standard deviation to approximate the
population parameter σ
$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}} \approx \sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{n}}$
• But don’t confuse this with the actual standard deviation
of a small dataset.
• For example, take the dataset 5 2 1 0 7. Do you
divide by 𝑛 or by 𝑛 − 1?
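A minimal Python sketch of the two choices for the dataset above, using the standard `statistics` module. Which divisor applies depends on whether the five values are treated as the whole population or as a sample from a larger one; that reading is an assumption here, since the slide leaves the question open.

```python
import statistics

data = [5, 2, 1, 0, 7]

# Treat the five values as the whole population: divide by n.
population_sd = statistics.pstdev(data)

# Treat them as a sample from a larger population: divide by n - 1
# (Bessel's correction).
sample_sd = statistics.stdev(data)

print(f"divide by n:     {population_sd:.3f}")
print(f"divide by n - 1: {sample_sd:.3f}")
```

The n - 1 version is always the larger of the two, and the gap is most visible for small datasets like this one.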
Median absolute deviation (MAD)
• standard deviation is not robust
• IQR is robust
• median absolute deviation (MAD) – a robust equivalent of the
standard deviation
• Take your data, find the median, calculate the absolute deviations
from the median, find the median of the absolute deviations
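The recipe above can be sketched in a few lines of Python. The helper name `median_absolute_deviation` and the outlier value 700 are illustrative assumptions; the first dataset is the small one used earlier in the lecture.

```python
import statistics

def median_absolute_deviation(values):
    """Median of the absolute deviations from the median."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])

data = [5, 2, 1, 0, 7]
with_outlier = [5, 2, 1, 0, 700]  # hypothetical: one extreme value

print(median_absolute_deviation(data))          # 2
print(median_absolute_deviation(with_outlier))  # 2
print(statistics.stdev(data))
print(statistics.stdev(with_outlier))
```

The MAD is the same for both datasets, while the standard deviation changes drastically once the outlier is present, which is what "robust" means here.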
Median absolute deviation (MAD)
Columns: Data, Deviation from median, Absolute deviation
Values on the slide: 5, 10, 30, 20, 30, 5, 15, 10, 15
Median:
MAD:
NORMAL DISTRIBUTION
Playing chess
• Pretend I am a chess player.
• Which of the following tells you most about how good I
am:
1. My rating is 1800.
2. 8110th place among world competitive chess players.
3. Ranked higher than 88% of competitive chess players.
Distribution
We should use relative frequencies and convert all absolute
frequencies to proportions.
Distribution of scores in one particular year
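Converting absolute frequencies to proportions is a single division by the total count. A small Python sketch follows; the rating bands and counts are hypothetical and are not taken from the slide.

```python
def relative_frequencies(counts):
    """Convert absolute frequencies to proportions that sum to 1."""
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}

# Hypothetical number of players per rating band (illustration only).
absolute = {"1000-1399": 120, "1400-1799": 200, "1800-2199": 60, "2200+": 20}
relative = relative_frequencies(absolute)

for band, proportion in relative.items():
    print(f"{band}: {proportion:.2f}")
```

Proportions make distributions of different sizes comparable, for example the scores from two different years.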
Height data – absolute frequencies
http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data_Dinov_020108_HeightsWeights
Height data – relative frequencies
What proportion of values is between 170 cm and 173.75 cm?
30%
173.5
Height data – relative frequencies
What proportion of values is between 170 cm and 175 cm?
We can’t tell for certain.
• How should we modify the data/histogram to show more
detail?
1. Adding more values to the dataset
2. Increasing the bin size
3. Using a smaller bin size
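A rough Python sketch of why the bin size matters. It assumes bins of width 3.75 cm starting at 155 cm (so that 170 and 173.75 are bin edges, as on the slide) and uses simulated heights rather than the real dataset; all names and numbers in the code are illustrative.

```python
import random

def bin_edges(start, width, n_bins):
    """Edges of n_bins equal-width bins beginning at start."""
    return [start + i * width for i in range(n_bins + 1)]

def relative_histogram(data, edges):
    """Proportion of the data falling into each bin [left, right)."""
    counts = [0] * (len(edges) - 1)
    for x in data:
        for i in range(len(edges) - 1):
            if edges[i] <= x < edges[i + 1]:
                counts[i] += 1
                break
    return [c / len(data) for c in counts]

rng = random.Random(0)
heights = [rng.gauss(173, 5) for _ in range(2000)]  # simulated, not real data

coarse = bin_edges(155, 3.75, 10)  # 175 is not an edge
fine = bin_edges(155, 1.25, 30)    # 175 is an edge

print(175 in coarse)  # False
print(175 in fine)    # True

# With the finer bins, the bins from 170 up to 175 can simply be added up.
fine_hist = relative_histogram(heights, fine)
proportion = sum(p for left, p in zip(fine, fine_hist) if 170 <= left < 175)
print(f"proportion between 170 and 175 cm: {proportion:.3f}")
```

With the coarse bins, 175 cm falls inside a bin, so the histogram alone cannot answer the question; with the finer bins it falls on an edge.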
Height data – relative frequencies
What proportion of values is between 170 cm and 175 cm?
36%
Height data – relative frequencies
recall the empirical rule
Normal distribution
68-95-99.7
N(𝜇,σ)
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$
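The density of N(𝜇,σ) and the 68-95-99.7 rule can be checked numerically. The sketch below writes the density by hand (the function name `normal_pdf` is an assumption for illustration) and uses `statistics.NormalDist` from the Python standard library for the cumulative proportions.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution N(mu, sigma) at x."""
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return math.exp(exponent) / (sigma * math.sqrt(2 * math.pi))

dist = NormalDist(mu=0, sigma=1)

# Proportion of values within k standard deviations of the mean.
for k in (1, 2, 3):
    within = dist.cdf(k) - dist.cdf(-k)
    print(f"within {k} SD: {within:.4f}")
```

The three printed proportions round to 68%, 95% and 99.7%.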
STANDARD NORMAL DISTRIBUTION
Who is more popular?
s.d. = 36, Z = -3.53
s.d. = 60, Z = -2.57
Standardizing
Formula
$Z = \frac{x - \bar{x}}{s}$
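As a small illustration, the standardizing formula in Python. The helper name `z_score` is an assumption, and the example values (mean 173 cm, standard deviation 5 cm) are the height figures used later in the lecture.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

print(z_score(180, 173, 5))  # 1.4
print(z_score(168, 173, 5))  # -1.0
```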
Quiz
• What does a negative Z-score mean?
1. The original value is negative.
2. The original value is less than the mean.
3. The original value is less than 0.
4. The original value minus the mean is negative.
Quiz II
• If we standardize a distribution by converting every value
to a Z-score, what will be the new mean of this
standardized distribution?
• If we standardize a distribution by converting every value
to a Z-score, what will be the new standard deviation of
this standardized distribution?
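One way to explore both questions is to standardize a small dataset and look at the result. The sketch below reuses the dataset 5 2 1 0 7 from earlier in the lecture; the variable names are illustrative.

```python
import statistics

data = [5, 2, 1, 0, 7]
mean = statistics.mean(data)
sd = statistics.stdev(data)

# Convert every value to its Z-score.
z_scores = [(x - mean) / sd for x in data]

print(statistics.mean(z_scores))   # close to 0, up to rounding error
print(statistics.stdev(z_scores))  # close to 1, up to rounding error
```

Subtracting the mean shifts the centre to 0, and dividing by the standard deviation rescales the spread to 1, whatever the original units were.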
Standard normal distribution
N(0,1)
Z – number of standard deviations away from the mean
If the Z-value is +1, what percentage of values is less than that value?
approx. 84%
(Plot: standard normal curve, Z axis from -3 to +3)
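The 84% figure can be checked with the cumulative distribution function of N(0,1), here via `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

# Proportion of a standard normal distribution below Z = +1.
below_plus_one = NormalDist().cdf(1)
print(f"{below_plus_one:.4f}")  # 0.8413
```

This matches the empirical rule: 50% lies below the mean, plus half of the 68% that lies within one standard deviation.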
Proportion of human heights
$\bar{x} = 173$ cm, $s = 5$ cm
(Plot: normal curve, Z axis from -2 to +2)
Quiz
• Approximately what proportion of people is shorter than
168 cm?
$\bar{x} = 173$ cm, $s = 5$ cm
16%
(Plot axis: 163, 168, 173, 178, 183 cm)
Quiz
• Approximately what proportion of people is taller than
183 cm?
$\bar{x} = 173$ cm, $s = 5$ cm
2.5%
(Plot axis: 163, 168, 173, 178, 183 cm)
Quiz
• Approximately what proportion of people is between 163
cm and 178 cm tall?
$\bar{x} = 173$ cm, $s = 5$ cm
81.5%
(Plot axis: 163, 168, 173, 178, 183 cm)
Quiz
• Approximately what proportion of people is shorter than
180 cm?
$\bar{x} = 173$ cm, $s = 5$ cm
approx. 91.5%
(Plot axis: 163, 168, 173, 178, 183 cm)
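The four height quizzes can be cross-checked against a normal model with mean 173 cm and standard deviation 5 cm. This is a sketch that assumes heights are exactly normal; the quiz answers come from the empirical rule, so the computed values differ slightly from them.

```python
from statistics import NormalDist

heights = NormalDist(mu=173, sigma=5)

shorter_than_168 = heights.cdf(168)                   # Z < -1
taller_than_183 = 1 - heights.cdf(183)                # Z > +2
between_163_178 = heights.cdf(178) - heights.cdf(163) # -2 < Z < +1
shorter_than_180 = heights.cdf(180)                   # Z < +1.4

print(f"shorter than 168 cm:    {shorter_than_168:.4f}")
print(f"taller than 183 cm:     {taller_than_183:.4f}")
print(f"between 163 and 178 cm: {between_163_178:.4f}")
print(f"shorter than 180 cm:    {shorter_than_180:.4f}")
```

Only the last question needs more than the empirical rule, because 180 cm is 1.4 standard deviations above the mean and does not fall on a whole number of standard deviations.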
Quiz
• What is the probability of randomly selecting a height in
the sample that is >5 standard deviations above the
mean?
1. 0.01
2. 0.3
3. 0.8
4. 0.99
Quiz
• What is the probability of randomly selecting a height in
the sample that is <5 standard deviations below the
mean?
1. 0.01
2. 0.3
3. 0.8
4. 0.99
Quiz
• For a normal distribution, what proportion of the data lies
either more than 2 standard deviations below or more than 2
standard deviations above the mean?
(Plot labels: 95% in the middle, 2.5% in each tail)
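The two tails can be added up numerically as well. A short sketch with `statistics.NormalDist`; by symmetry both tails have the same size, so one tail is computed and doubled.

```python
from statistics import NormalDist

standard = NormalDist()  # N(0, 1)

lower_tail = standard.cdf(-2)  # more than 2 SD below the mean
outside = 2 * lower_tail       # both tails together, by symmetry

print(f"one tail:   {lower_tail:.4f}")
print(f"both tails: {outside:.4f}")
```

The exact value is a little under the 5% given by the empirical rule, because 95% corresponds to roughly 1.96 standard deviations rather than exactly 2.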
Z-table
What is the proportion less than the point with the Z-score -2.75?
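A Z-table lookup corresponds to evaluating the cumulative distribution function of N(0,1). A minimal sketch with `statistics.NormalDist`, as an alternative to reading the printed table:

```python
from statistics import NormalDist

z = -2.75
proportion_below = NormalDist().cdf(z)
print(f"P(Z < {z}) = {proportion_below:.4f}")  # 0.0030
```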