ENGR-25_Lec-18_Statistics-1x
Engr/Math/Physics 25
Chp7
Statistics-1
Bruce Mayer, PE
Licensed Electrical & Mechanical Engineer
[email protected]
Engineering/Math/Physics 25: Computational Methods
1
Bruce Mayer, PE
[email protected] • ENGR-25_Lec-18_Statistics-1.pptx
Learning Goals
Use MATLAB to solve Problems in
• Statistics
• Probability
Use Monte Carlo (random) Methods to
Simulate Random processes
Properly Apply InterPolation or
ExtraPolation to Estimate values
between or outside of known data points
Histogram
Histograms are
COLUMN Plots that
show the
Distribution of Data
• Height Represents
Data Frequency
Some General
Characteristics
• Used to represent
continuous grouped,
or BINNED, data
– BIN ≡ SubRange
within the Data
• Usually Does not
have any gaps
between bars
• Areas represent
%-of-Total Data
HistoGram ≡ Frequency Chart
A HistoGram shows how OFTEN some
event Occurs
• Histograms are
often constructed
using Frequency
Tables
Histograms In MATLAB
MATLAB has 6
Forms of the
Histogram Cmd
The Simplest
hist(y)
• Generates a
Histogram with
10 bins
Example: HI Temps
at Oakland AirPort in
Jul-Aug08
TmaxOAK = [70, 75, 63, 64, 65, 66, ...
65, 65, 67, 78, 75, 73, 79, ...
71, 72, 67, 69, 69, 70, 74, ...
71, 72, 71, 74, 77, 77, 86, ...
90, 90, 70, 71, 66, 66, 72, ...
68, 73, 72, 82, 91, 82, 76, ...
75, 72, 72, 69, 70, 68, 65, ...
67, 65, 63, 64, 72, 70, 68, ...
71, 77, 65, 63, 69, 69, 67]
The Plot Statement
hist(TmaxOAK), ylabel('No. Days'), ...
xlabel('Max. Temp (°F)'), title('Oakland Airport - Jul-Aug08')
hist Result for Oakland
[Figure: histogram 'Oakland Airport - Jul-Aug08'; x-axis: Max. Temp (°F), 60 to 95; y-axis: No. Days, 0 to 15]
It was COLD in Summer 08
Bin Width = (91-63)/10 = 2.8 °F
Histograms In MATLAB
Next Example: Max
Temp at Stockton
AirPort in Jul-Aug08
hist(y)
• Generates a
Histogram with
10 bins
TmaxSTK = [94, 98, 93, 94, ...
91, 96, 93, 87, 89, 94, ...
100, 99, 103, 103, 103, 97, ...
91, 83, 84, 90, 89, 95, 94, ...
99, 97, 94, 102, 103, 107, ...
98, 86, 89, 95, 91, 84, 93, ...
98, 104, 105, 107, 103, 91, ...
90, 96, 93, 86, 92, 93, 95, ...
95, 86, 81, 93, 97, 96, 97, ...
101, 92, 89, 92, 93, 94]
The Plot Statement
hist(TmaxSTK), ylabel('No. Days'), ...
xlabel('Max. Temp (°F)'), title('Stockton Airport - Jul-Aug08')
hist Result for Stockton
[Figure: histogram 'Stockton Airport - Jul-Aug08'; x-axis: Max. Temp (°F), 80 to 110; y-axis: No. Days, 0 to 16]
It was HOT in Summer 08
Bin Width = (107-81)/10 = 2.6 °F
hist Command Refinements
Adjust The number and width of the bins using
hist(y,N)
hist(y,x)
• Where
– N ≡ an integer specifying the NUMBER of Bins
– x ≡ A vector that Specifies Bin CENTERs
Consider Summer 08 HI-Temp Data from Oakland and Stockton
Make 2 Histograms
• 17 bins
• 60 °F → 110 °F by 2.5’s
hist Plots 17 Bins
>> hist(TmaxSTK,17), ylabel('No. Days'), xlabel('Max. Temp (°F)'), title('Stockton, CA - Jul-Aug08')
>> hist(TmaxOAK,17), ylabel('No. Days'), xlabel('Max. Temp (°F)'), title('Oakland, CA - Jul-Aug08')
[Figure: 17-bin histogram 'Stockton, CA - Jul-Aug08'; x-axis: Max. Temp (°F), 80 to 110; y-axis: No. Days, 0 to 10]
[Figure: 17-bin histogram 'Oakland, CA - Jul-Aug08'; x-axis: Max. Temp (°F), 60 to 95; y-axis: No. Days, 0 to 10]
hist Plots Same Scale
>> x = [60:2.5:110];
>> hist(TmaxSTK,x), ylabel('No. Days'), xlabel('Max. Temp (°F)'), title('Stockton, CA - Jul-Aug08')
>> x = [60:2.5:110];
>> hist(TmaxOAK,x), ylabel('No. Days'), xlabel('Max. Temp (°F)'), title('Oakland, CA - Jul-Aug08')
[Figure: histogram 'Stockton, CA - Jul-Aug08'; x-axis: Max. Temp (°F), 60 to 110; y-axis: No. Days, 0 to 16]
[Figure: histogram 'Oakland, CA - Jul-Aug08'; x-axis: Max. Temp (°F), 60 to 110; y-axis: No. Days, 0 to 16]
hist Numerical Output
hist can also provide numerical Data about the Histogram
n = hist(y)
• Gives the number of values in each of the (default) 10 Bins
For the Stockton data
k =
     2     5     1    10    16     7     9     2     7     3
We can also spec the number and/or Width of Bins
>> k13 = hist(TmaxSTK,13)
k13 =
     2     2     4     4     6    10    10     7     5     2     6     2     2
>> k2_5s = hist(TmaxOAK,x)
hist Numerical Output
Bin-Count and Bin-Locations (Frequency Table) for the Oakland Data
>> [u, v] = hist(TmaxOAK,x)
u =
     0     3    11     7    15     9     6     4     1     2     1     0     3     0     0     0     0     0     0     0     0
v =
   60.0000   62.5000   65.0000   67.5000   70.0000   72.5000   75.0000
   77.5000   80.0000   82.5000   85.0000   87.5000   90.0000   92.5000
   95.0000   97.5000  100.0000  102.5000  105.0000  107.5000  110.0000
Histogram Commands - 1
Command      Description
bar(x,y)     Creates a bar chart of y versus x.
hist(y)      Aggregates the data in the vector y into 10 bins evenly spaced between the minimum and maximum values in y.
hist(y,n)    Aggregates the data in the vector y into n bins evenly spaced between the minimum and maximum values in y.
hist(y,x)    Aggregates the data in the vector y into bins whose center locations are specified by the vector x. The bin widths are the distances between the centers.
Histogram Commands - 2
Command              Description
[z,x] = hist(y)      Same as hist(y) but returns two vectors z and x that contain the frequency count and the 10 bin locations.
[z,x] = hist(y,n)    Same as hist(y,n) but returns two vectors z and x that contain the frequency count and the n bin locations.
[z,x] = hist(y,x)    Same as hist(y,x) but returns two vectors z and x that contain the frequency count and the bin locations. The returned vector x is the same as the user-supplied vector x.
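The two-output form of hist can be sketched as below: a minimal example, assuming the Stockton Jul-Aug08 data from the earlier slides and the 60:2.5:110 bin centers. The names relFreq, zMax, and iMax are illustrative only and are not from the lecture.

```matlab
% Stockton Jul-Aug08 daily high temperatures (°F), from the lecture data
TmaxSTK = [94, 98, 93, 94, 91, 96, 93, 87, 89, 94, 100, 99, 103, ...
           103, 103, 97, 91, 83, 84, 90, 89, 95, 94, 99, 97, 94, ...
           102, 103, 107, 98, 86, 89, 95, 91, 84, 93, 98, 104, 105, ...
           107, 103, 91, 90, 96, 93, 86, 92, 93, 95, 95, 86, 81, 93, ...
           97, 96, 97, 101, 92, 89, 92, 93, 94];
x = 60:2.5:110;                  % bin CENTERS, 21 bins
[z, ctr] = hist(TmaxSTK, x);     % z = counts per bin, ctr = bin centers
relFreq = z / sum(z);            % fraction of the days falling in each bin
[zMax, iMax] = max(z);           % locate the most populated bin
fprintf('Busiest bin: %g degF with %d of %d days\n', ctr(iMax), zMax, sum(z));
```

Calling bar(ctr, relFreq) afterward would redraw the same histogram with fractional heights instead of raw counts.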
Bar vs. Hist
Bar is Sequential while HIST is GROUPED
[Figure, BAR: 'Tmax in Stockton, CA • Jul-Aug08'; x-axis: day, 1 to 62; y-axis: MaxTemp (°F), 80 to 110]
[Figure, HIST: 'Stockton Airport - Jul-Aug08'; x-axis: Max. Temp (°F), 80 to 110; y-axis: No. Days, 0 to 16]
BAR construction file
% Bruce Mayer, PE • 06Apr16
% ENGR25
clear, close, clc
% The data (the ... continues the row vector onto the next line)
TmaxSTK = [94, 98, 93, 94, 91, 96, 93, 87, 89, ...
94, 100, 99, 103, 103, 103, 97, 91, 83, 84, 90, ...
89, 95, 94, 99, 97, 94, 102, 103, 107, 98, 86, ...
89, 95, 91, 84, 93, 98, 104, 105, 107, 103, 91, ...
90, 96, 93, 86, 92, 93, 95, 95, 86, 81, 93, 97, ...
96, 97, 101, 92, 89, 92, 93, 94]
%
% the BAR graph: one bar per day, in calendar sequence
bar(TmaxSTK), axis([ 1 62 80 110]), grid
xlabel('day'); ylabel('MaxTemp (°F)')
title('Tmax in Stockton, CA • Jul-Aug08')
whitebg([0.8 1 1])   % light-cyan plot background
Check Default Bin Widths
• The previous HandCalc of 2.8 °F CONFIRMED by MATLAB
– Note use of diff command
Oakland
>> Tlo = min(TmaxOAK)
Tlo =
    63
>> Thi = max(TmaxOAK)
Thi =
    91
>> [n,BinCtr] = hist(TmaxOAK)
n =
    11    10    15    11     7     2     2     0     1     3
BinCtr =
   64.4000   67.2000   70.0000   72.8000   75.6000   78.4000   81.2000   84.0000   86.8000   89.6000
>> DelBC = diff(BinCtr)
DelBC =
    2.8000    2.8000    2.8000    2.8000    2.8000    2.8000    2.8000    2.8000    2.8000
Check Default Bin Widths
• The previous HandCalc of 2.6 °F CONFIRMED by MATLAB
– Note use of diff command
Stockton
>> Tlo = min(TmaxSTK)
Tlo =
    81
>> Thi = max(TmaxSTK)
Thi =
   107
>> [n,BinCtr] = hist(TmaxSTK)
n =
     2     5     1    10    16     7     9     2     7     3
BinCtr =
   82.3000   84.9000   87.5000   90.1000   92.7000   95.3000   97.9000  100.5000  103.1000  105.7000
>> DelBC = diff(BinCtr)
DelBC =
    2.6000    2.6000    2.6000    2.6000    2.6000    2.6000    2.6000    2.6000    2.6000
Data Statistics Tool - 1
Make LinePlot of Temp
Data for
Stockton, CA
Use the Tools
Menu to find
the Data
Statistics Tool
Data Statistics Tool - 2
Use the
Tool to Add
Plot Lines
for the
Temp Data
• The Mean
• ±StdDev
Data Statistics Tool - 3
Quite a Nice
Tool,
Actually
The Result
The Avg
Max Temp
Was
96.97 °F
Probability
Probability ≡ The LIKELIHOOD that a
Specified OutCome Will be Realized
• The “Odds” Run from 0% to 100%
Class Question: What are the
Odds of winning the California
MEGA-MILLIONS Lottery?
258 890 850 ... EXACTLY???!!!
To Win the MegaMillions Lottery
• Pick five numbers from 1 to 75
• Pick a MEGA number from 1 to 15
The Odds for the 1st ping-pong Ball
= 5 out of 75
The Odds for the 2nd ping-pong Ball
= 4 out of 74, and so On
The Odds for the MEGA are 1 out of 15
258 890 850... Calculated
Calc the OverAll Odds as the
PRODUCT of each of the Individual
OutComes
$$\text{Odds} = \frac{5}{75}\cdot\frac{4}{74}\cdot\frac{3}{73}\cdot\frac{2}{72}\cdot\frac{1}{71}\cdot\frac{1}{15} = \frac{5!\,70!}{75!}\cdot\frac{1}{15} = \frac{120}{31{,}066{,}902{,}000} = \frac{1}{258{,}890{,}850}$$
• This is Technically a COMBINATION
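The product of individual odds can be checked numerically. This is a minimal sketch, assuming only the game rules quoted on the slide (5 of 75 plus 1 of 15); the variable names are illustrative.

```matlab
nBalls = 75; nPick = 5; nMega = 15;     % game rules quoted on the slide
waysWhite = nchoosek(nBalls, nPick);    % draw order does NOT matter: combination
oddsCombo = waysWhite * nMega;          % a 1-in-this-many chance of winning
oddsPerm  = prod(71:75) * nMega;        % 1-in-this-many if draw ORDER had to match
fprintf('Combination: 1 in %d\n', oddsCombo);
fprintf('Permutation: 1 in %d (%dX worse)\n', oddsPerm, oddsPerm / oddsCombo);
```

The ratio of the two results is 5! = 120, the number of orderings of the five drawn balls.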
258 890 850... is a DEAL!
The ORDER in Which the Ping-Pong
Balls are Drawn Does NOT affect the
Winning Odds
If we Had to Match the Pull-Order:
$$\text{Odds} = \frac{1}{75}\cdot\frac{1}{74}\cdot\frac{1}{73}\cdot\frac{1}{72}\cdot\frac{1}{71}\cdot\frac{1}{15} = \frac{70!}{75!}\cdot\frac{1}{15} = \frac{1}{31{,}066{,}902{,}000}$$
→ 120X the Current
• This is a PERMUTATION
Normal Distribution - 1
Consider Data (Freq Tab) on the Height
of a sample group of 20 year old Men
Plot this Frequency Data using bar
>> y_abs=[1,0,0,0,2,4,5,4,8,11,12,10,9,8,7,5,4,4,3,1,1,0,1];
>> xbins = [64:0.5:75];
>> bar(xbins, y_abs), ylabel('No.'), xlabel('Height (Inches)'), title('Height of 20 Yr-Old Men')
[Figure: bar chart 'Height of 20 Yr-Old Men'; x-axis: Height (Inches), 62 to 76; y-axis: No., 0 to 12]
Ht (in)   No.
64        1
64.5      0
65        0
65.5      0
66        2
66.5      4
67        5
67.5      4
68        8
68.5      11
69        12
69.5      10
70        9
70.5      8
71        7
71.5      5
72        4
72.5      4
73        3
73.5      1
74        1
74.5      0
75        1
Normal Distribution - 2
We can also SCALE the Bar/Hist such that
the AREA UNDER the CURVE equals 1.00,
exactly
The Game Plan for Scaling
• Calc the Height of Each Bar To Get the
Total Area = Σ([Bin Width] x [individual counts])
$$A = \sum_k \Delta A_k = \sum_k BW_k \times IC_k = 0.5 \times 100 = 50$$
• The individual Bar Area =
[Bin Width] x [individual count]
• %-Area any one bar → [Bar Area]/[Total Area]
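The scaling game plan can be sketched as follows, assuming the height frequency table from the previous slide. The names BW, totArea, y_scaled, and areaCheck are illustrative and not taken from the author's Scaled_Histogram file.

```matlab
xbins = 64:0.5:75;                                           % bin centers, inches
y_abs = [1,0,0,0,2,4,5,4,8,11,12,10,9,8,7,5,4,4,3,1,1,0,1];  % counts per bin
BW = xbins(2) - xbins(1);            % bin width (all bins equal)
totArea = sum(BW * y_abs);           % total area = sum of width x count
y_scaled = y_abs / totArea;          % bar heights rescaled so total area = 1
areaCheck = sum(BW * y_scaled);      % area under the scaled bars
fprintf('Total area = %g, scaled area = %g\n', totArea, areaCheck);
```

Plotting with bar(xbins, y_scaled) would then show the scaled histogram; the fractional area of any one bar is BW*y_scaled(k).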
Normal Distribution - 3
Use bar to construct the Scaled
Histogram with Area of 1.0000
• See File Scaled_Histogram_1206.m
– Again, use bar to construct histogram
Would need to enter all 100 raw data pts to use hist
[Figure: scaled bar chart 'Height of 20 Yr-Old Men'; x-axis: Height (Inches), 62 to 76; y-axis: Frequency, 0 to 0.12]
Probability Distribution Fcn (PDF)
Because the Area Under the Scaled Plot is 1.00, exactly,
The FRACTIONAL Area under any bar, or set-of-bars gives
the probability that any randomly Selected 20 yr-old
man will be that height
e.g., from the Plot we Find
• 67.5 in → 4%
• 68 in → 8%
• 68.5 in → 11%
Summing → 23 %
Thus by this dataset 23% of 20 yr-old
men are 67.25–68.75 inches tall
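The bar-summing step can be reproduced from the frequency table. This is a minimal sketch, assuming the same xbins and y_abs vectors used for the bar plot; the names p, inRange, and P are illustrative.

```matlab
xbins = 64:0.5:75;                                           % bin centers, inches
y_abs = [1,0,0,0,2,4,5,4,8,11,12,10,9,8,7,5,4,4,3,1,1,0,1];  % counts per bin
p = y_abs / sum(y_abs);                        % fraction of the sample in each bin
inRange = (xbins >= 67.5) & (xbins <= 68.5);   % the bins centered at 67.5, 68, 68.5 in
P = sum(p(inRange));                           % fractional area of those three bars
fprintf('P(67.25 in < height < 68.75 in) = %.2f\n', P);
```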
Random Variable
A random variable x takes on a defined set of
values with different probabilities; e.g.,
• If you roll a die, the outcome is random (not fixed)
and there are 6 possible outcomes, each of which
occurs with equal probability of one-sixth.
• If you poll people about their voting preferences,
the percentage of the sample that responds “Yes
on Proposition 101” is also a random variable
– the %-age will be slightly different every time you poll.
Roughly, probability is how frequently we
expect different outcomes to occur if we
repeat the experiment over and over
(“frequentist” view)
Random variables can be Discrete
or Continuous
Discrete random variables have a
countable number of outcomes
• Examples: Dead/Alive, Red/Black,
Heads/Tails, dice, deck of cards, etc.
Continuous random variables have an
infinite continuum of possible values.
• Examples: Battery Current, human weight,
Air Temperature, the speed of a car, the
real numbers from 7 to 11.
Probability Distribution Functions
A Probability Distribution Function
(PDF) maps the possible values of x
against their respective probabilities of
occurrence, p(x)
p(x) is a number from 0 to 1.0, or
alternatively, from 0% to 100%.
The area under a probability
distribution function curve is
always 1 (or 100%).
Discrete Example: Roll The Die
x    p(x)
1    p(x=1)=1/6
2    p(x=2)=1/6
3    p(x=3)=1/6
4    p(x=4)=1/6
5    p(x=5)=1/6
6    p(x=6)=1/6
[Figure: plot of p(x) = 1/6 for x = 1 to 6]
$$\sum_{\text{all } x} p(x) = \frac{1}{6}+\frac{1}{6}+\frac{1}{6}+\frac{1}{6}+\frac{1}{6}+\frac{1}{6} = 1$$
or $p(x) = \frac{1}{6}$ so $\sum_{\text{all } x} p(x) = 6 \times \frac{1}{6} = 1$
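The frequentist reading of the die table can be tried with a small Monte Carlo run. This is only a sketch: the roll count N is an assumed value, and the estimates change slightly from run to run because no random seed is fixed.

```matlab
N = 60000;                              % number of simulated rolls (assumed value)
rolls = randi(6, N, 1);                 % uniform random integers 1..6
counts = accumarray(rolls, 1, [6 1]);   % how many times each face came up
pHat = counts / N;                      % relative-frequency estimate of p(x)
disp([(1:6)', pHat])                    % each row should be near 1/6 = 0.1667
fprintf('Sum of estimated probabilities = %g\n', sum(pHat));
```

Raising N tightens the estimates around 1/6, while the estimates always sum to 1 because every roll lands on exactly one face.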
Continuous Case
The probability function that accompanies a
continuous random variable is a continuous
mathematical function that integrates to 1.
The Probabilities associated with
continuous functions are just areas under a
Region of the curve (→ Definite Integrals)
Probabilities are given for a range of
values, rather than a particular value
• e.g., the probability of getting a math SAT
score between 700 and 800 is 2%.
Continuous Case PDF Example
Recall the negative exponential function
(in probability, this is called
an “exponential distribution”):
$$f(x) = e^{-x}$$
This Function Integrates to 1 (as
required for all PDF’s) for limits of 0 → ∞
$$\int_0^{\infty} e^{-x}\,dx = \left. -e^{-x}\right|_0^{\infty} = 0 + 1 = 1$$
Continuous Case PDF Example
The probability that
x is any exact value
(e.g.: 1.9476) is 0
• we can ONLY assign
Probabilities
to possible
RANGES of x
• NO Area Under a LINE
For example, the
probability of x
falling within 1 to 2:
$$P(1 \le x \le 2) = \int_1^2 e^{-x}\,dx = \left. -e^{-x}\right|_1^2 = -e^{-2} + e^{-1} = -0.135 + 0.368 = 0.23 = 23\%$$
[Figures: plots of p(x) = e^{-x}, one marking a single x value (a line, no area) and one shading the area between x = 1 and x = 2]
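The range probability for the exponential PDF can also be checked numerically. This is a minimal sketch using the trapezoid rule; the grid sizes and the upper limit of 50 standing in for infinity are assumptions chosen for illustration.

```matlab
f = @(x) exp(-x);                   % exponential PDF from the slide
x12 = linspace(1, 2, 1001);         % fine grid over the range 1..2
P_num = trapz(x12, f(x12));         % trapezoid-rule area under the curve
P_exact = exp(-1) - exp(-2);        % closed form from the antiderivative
xAll = linspace(0, 50, 500001);     % 50 stands in for infinity (e^-50 is ~0)
A_tot = trapz(xAll, f(xAll));       % total area, should be very close to 1
fprintf('P(1<=x<=2): trapz = %.4f, exact = %.4f\n', P_num, P_exact);
fprintf('Total area 0..50 = %.4f\n', A_tot);
```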
Gaussian Curve
The Man-Height HistoGram had
some Limited, and thus DISCRETE, Data
If we were to Measure 10,000 (or
more) young men we would obtain a
HistoGram like this
As We increase the number and
fineness of the measurements The
PDF approaches a CONTINUOUS Curve
Gaussian Distribution
A Distribution that
Describes Many Physical
Processes is called the
GAUSSIAN or NORMAL
Distribution
Gaussian (Normal) distribution
• Gaussian → famous “bell-shaped curve”
– Describes IQ scores, how fast horses can run, the no. of
Bees in a hive, wear profile on old stone stairs...
• All these are cases where:
– deviation from mean is equally probable in either direction
– Variable is continuous (or large enough integers
to look continuous)
Normal Distribution
Real-valued PDF: f(x) → −∞ < x < +∞
2 independent fitting parameters:
µ , σ (central location and width)
Properties:
• Symmetrical about Mode at µ
• Median = Mean = Mode
• Inflection points (IP) at µ ± σ
Area (probability of observing event) within:
• ± 1σ = 0.683
• ± 2σ = 0.955
For larger σ, bell shaped curve becomes
wider and lower (since area =1 for any σ)
Normal Distribution
Mathematically
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}$$
• Where
– σ² = Variance
– µ = Mean (& Median, Mode)
The Area Under the Curve
$$\int_{-\infty}^{+\infty} f(x)\,dx = \int_{-\infty}^{+\infty} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}\,dx = 1$$
68-95-99.7 Rule for Normal Dist
[Figure: bell curve with 68% of the data within ±σ of the mean, 95% of the data within ±2σ, and 99.7% of the data within ±3σ]
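68-95-99.7 Rule in Math terms…
Using Definite-Integral Numerical Calculus
$$\int_{\mu-\sigma}^{\mu+\sigma} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} dx = 0.68 = 68\%$$
$$\int_{\mu-2\sigma}^{\mu+2\sigma} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} dx = 0.95 = 95\%$$
$$\int_{\mu-3\sigma}^{\mu+3\sigma} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} dx = 0.997 = 99.7\%$$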
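These three areas can be evaluated with MATLAB's built-in erf. The sketch below relies on the standard identity that the normal-PDF area within ±k·σ of the mean equals erf(k/√2), which follows from the substitution y = (x−µ)/(σ√2); the names k and P are illustrative.

```matlab
k = [1 2 3];                    % half-widths of the interval, in units of sigma
P = erf(k / sqrt(2));           % P(mu - k*sigma <= x <= mu + k*sigma)
for i = 1:numel(k)
    fprintf('within %d sigma of the mean: %.4f\n', k(i), P(i));
end
```

Note that µ and σ drop out entirely, which is why the rule holds for every normal distribution.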
Error Function (erf) & Probability
Gauss’s Defining Eqn for the erf
$$\operatorname{erf}(z) = \frac{2}{\sqrt{\pi}} \int_0^z e^{-y^2}\,dy$$
This looks a lot Like the normal dist
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}$$
Consider the Gaussian integral
$$I_G = \int \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}\,dx$$
Now Let
$$y^2 = \frac{(x-\mu)^2}{2\sigma^2} \quad\text{Or}\quad y = \frac{x-\mu}{\sigma\sqrt{2}}$$
$$\frac{dy}{dx} = \frac{1}{\sigma\sqrt{2}} \quad\text{Or}\quad dx = \sigma\sqrt{2}\,dy$$
Error Function (erf) & Probability
Subbing for x & dx
$$I_G = \int \frac{1}{\sigma\sqrt{2\pi}}\, e^{-y^2}\,\sigma\sqrt{2}\,dy = \frac{1}{\sqrt{\pi}} \int e^{-y^2}\,dy$$
As
$$\operatorname{erf}(z) = \frac{2}{\sqrt{\pi}} \int_0^z e^{-y^2}\,dy$$
ReArranging
$$I_G = \frac{1}{2}\cdot\frac{2}{\sqrt{\pi}} \int e^{-y^2}\,dy$$
Error Function (erf) & Probability
Now the Limits
This Fcn is Symmetrical about y=0
$$f(y) = e^{-y^2}$$
[Figure: plot of f(y) = exp(-y²); x-axis: y, -3 to 3; y-axis: 0 to 1]
Recall
$$\operatorname{erf}(z) = \frac{2}{\sqrt{\pi}} \int_0^z e^{-y^2}\,dy$$
And the erf properties
• erf(0) = 0
• erf(∞) = 1
Error Function (erf) & Probability
By Symmetry about y = 0 for $e^{-y^2}$ the AUC’s
$$\frac{2}{\sqrt{\pi}} \int_{-\infty}^{0} e^{-y^2}\,dy = 1 = \frac{2}{\sqrt{\pi}} \int_{0}^{\infty} e^{-y^2}\,dy$$
Thus for Positive B
$$\frac{2}{\sqrt{\pi}} \int_{-\infty}^{B} e^{-y^2}\,dy = \frac{2}{\sqrt{\pi}} \int_{-\infty}^{0} e^{-y^2}\,dy + \frac{2}{\sqrt{\pi}} \int_{0}^{B} e^{-y^2}\,dy$$
So Finally integrating: −∞ → B
$$\frac{2}{\sqrt{\pi}} \int_{-\infty}^{B} e^{-y^2}\,dy = 1 + \operatorname{erf}(B)$$
Error Function (erf) & Probability
Note That for a Continuous PDF
• Probability that x is Less or Equal to b
$$P(x \le b) = \int_{-\infty}^{b} f(x)\,dx$$
• Probability that x is between a & b
$$P(a \le x \le b) = \int_{a}^{b} f(x)\,dx$$
The probability for the Normal Dist
$$P(x \le b) = \int_{-\infty}^{b} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}\,dx$$
$$P(a \le x \le b) = \int_{a}^{b} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}\,dx$$
But
$$\int \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}\,dx = I_G \quad\text{so}\quad I_G = \frac{1}{2}\operatorname{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)$$
Error Function (erf) & Probability
If We Scale this Properly we can Cast these Eqns
into the ½∙erf Form
$$P(x \le b) = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{b-\mu}{\sigma\sqrt{2}}\right)\right]$$
$$P(a \le x \le b) = \frac{1}{2}\left[\operatorname{erf}\left(\frac{b-\mu}{\sigma\sqrt{2}}\right) - \operatorname{erf}\left(\frac{a-\mu}{\sigma\sqrt{2}}\right)\right]$$
MATLAB has the erf built-in, so if we have
the POPULATION Mean & StdDev We can
Calc Probabilities for Normally Distributed
Quantities
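The two ½∙erf forms translate directly into anonymous functions. This is a sketch only: the mean and standard deviation are borrowed from the runner-weight example in the appendix (127.8 lbs and 15.5 lbs), and the function names Ple and Pab are hypothetical.

```matlab
mu = 127.8; sigma = 15.5;       % runner-weight mean and std dev from the appendix
% P(x <= b) in the one-half erf form
Ple = @(b) 0.5 * (1 + erf((b - mu) / (sigma * sqrt(2))));
% P(a <= x <= b) in the one-half erf form
Pab = @(a, b) 0.5 * (erf((b - mu) / (sigma * sqrt(2))) ...
                   - erf((a - mu) / (sigma * sqrt(2))));
fprintf('P(x <= mu)              = %.4f\n', Ple(mu));
fprintf('P(112.3 <= x <= 143.3)  = %.4f\n', Pab(mu - sigma, mu + sigma));
fprintf('P(x <= 112.3)           = %.4f\n', Ple(mu - sigma));
```

Multiplying Pab(mu - sigma, mu + sigma) by the group size of 120 gives the expected number of runners within one standard deviation of the mean.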
MATLAB and Gaussian Prob
Thus MATLAB has the tools
needed to Calculate any
Gaussian Probability for
• −∞ < b < +∞
• a < b
$$P(x \le b) = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{b-\mu}{\sigma\sqrt{2}}\right)\right]$$
$$P(a \le x \le b) = \frac{1}{2}\left[\operatorname{erf}\left(\frac{b-\mu}{\sigma\sqrt{2}}\right) - \operatorname{erf}\left(\frac{a-\mu}{\sigma\sqrt{2}}\right)\right]$$
erf(𝒛) can be NEGATIVE
For the previous erf calcs to work the erf
must be NEGATIVE when 𝑏 is negative;
e.g.:
$$P(x \le -0.73) = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{-0.73-\mu}{\sigma\sqrt{2}}\right)\right]$$
where the erf term MUST be < 0
A quick Check
>> erfM73 = erf(-0.73)
erfM73 =
-0.6981
>> erfP73 = erf(+0.73)
erfP73 =
0.6981
Estimating µ & σ (1)
The Location & Width Parameters, µ & σ, are Calculated
from the ENTIRE POPULATION
• Mean, µ
$$\mu = \frac{1}{n}\sum_{k=1}^{n} x_k$$
• Variance, σ²
$$\sigma^2 = \frac{1}{n}\sum_{k=1}^{n} (x_k - \mu)^2$$
• Standard Deviation, σ
$$\sigma = \sqrt{\sigma^2}$$
For LARGE Populations it is usually impractical to
measure all the $x_k$
In this case we take a Finite SAMPLE to
ESTIMATE µ & σ
Estimating µ & σ (2)
Say we want to characterize Miles/Yr driven by
Every Licensed Driver in the USA
We assume that this quantity is Normally
Distributed, so we take a Sample of
N = 1013 Drivers
We Take the Mean of the SAMPLE
$$\bar{x} = \frac{1}{N}\sum_{k=1}^{N} x_k$$
Use the SAMPLE-Mean to Estimate the
POPULATION-Mean
$$\mu \approx \bar{x} = \frac{1}{N}\sum_{k=1}^{N} x_k$$
Estimating µ & σ (3)
Now Calc the SAMPLE Variance & StdDev
$$S^2 = \frac{\sum_{k=1}^{N} (x_k - \bar{x})^2}{N-1}$$
• Number decreased from N to (N – 1) To
Account for case where N = 1
– In this case $\bar{x} = x_1$, and the S² result is
meaningless
Estimate: σ ≈ S
• standard deviation: positive square root of
the variance
– small std dev: observations are
clustered tightly around a central value
– large std dev: observations are
scattered widely about the mean
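The sample formulas can be compared against MATLAB's built-ins. This is a minimal sketch, assuming the Stockton Jul-Aug08 temperatures serve as the sample; xbar, S2, and S are illustrative names. MATLAB's std uses the (N – 1) divisor by default.

```matlab
% Stockton Jul-Aug08 daily high temperatures (°F), treated here as the SAMPLE
TmaxSTK = [94, 98, 93, 94, 91, 96, 93, 87, 89, 94, 100, 99, 103, ...
           103, 103, 97, 91, 83, 84, 90, 89, 95, 94, 99, 97, 94, ...
           102, 103, 107, 98, 86, 89, 95, 91, 84, 93, 98, 104, 105, ...
           107, 103, 91, 90, 96, 93, 86, 92, 93, 95, 95, 86, 81, 93, ...
           97, 96, 97, 101, 92, 89, 92, 93, 94];
N = length(TmaxSTK);                          % sample size
xbar = sum(TmaxSTK) / N;                      % sample mean
S2 = sum((TmaxSTK - xbar).^2) / (N - 1);      % sample variance, (N-1) divisor
S = sqrt(S2);                                 % sample standard deviation
fprintf('By formula: N = %d, xbar = %.2f, S = %.2f\n', N, xbar, S);
fprintf('Built-ins:  mean = %.2f, std = %.2f\n', mean(TmaxSTK), std(TmaxSTK));
```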
All Done for Today
Gaussian? Or Normal?
Normal distribution was introduced by French
mathematician A. De Moivre in 1733.
• Used to approximate probabilities of coin tossing
• Called it the exponential bell-shaped curve
1809, K.F. Gauss, a German mathematician, applied it to
predict astronomical entities… it became known as the
Gaussian distribution.
Late 1800s, most believed the majority of physical data
would follow the distribution, hence the name normal
distribution
Recall De Moivre’s Theorem
$$z = R\cos\theta + jR\sin\theta$$
$$z^k = R^k\left(\cos k\theta + j\sin k\theta\right)$$
Engr/Math/Physics 25
Appendix
Some Normal Dist
Examples
Bruce Mayer, PE
Licensed Electrical & Mechanical Engineer
[email protected]
How Good is the Rule for Real?
Check some example data:
The mean, µ, of the weight of a large
group of women
Cross Country
Runners = 127.8 lbs
The standard
deviation (σ)
for this Group
= 15.5 lbs
68% of 120 = .68x120 = ~ 82 runners
In fact, 79 runners fall within 1σ (15.5 lbs) of the mean
[Figure: histogram of runner weights with markers at 112.3, 127.8, and 143.3 lbs; x-axis: POUNDS, 80 to 160; y-axis: Percent, 0 to 25]
95% of 120 = .95 x 120 = ~ 114 runners
In fact, 115 runners fall within 2σ of the mean
[Figure: histogram of runner weights with markers at 96.8, 127.8, and 158.8 lbs; x-axis: POUNDS, 80 to 160; y-axis: Percent, 0 to 25]
99.7% of 120 = .997 x 120 = 119.6 runners
In fact, all 120 runners fall within 3σ of the mean
[Figure: histogram of runner weights with markers at 81.3, 127.8, and 174.3 lbs; x-axis: POUNDS, 80 to 160; y-axis: Percent, 0 to 25]
Engr/Math/Physics 25
Appendix
f x 2 x 7 x 9 x 6
3
2
Bruce Mayer, PE
Licensed Electrical & Mechanical Engineer
[email protected]
Basic Fitting Demo File
% Bruce Mayer, PE
% ENGR25 * 29Jun12 * Rev 27Oct14
% file = Demo_Basic_Fitting_Stockton_Temps_1410.m
%
% Data for Stockton AirPort from
% http://www7.ncdc.noaa.gov/IPS/cd/cd.html;jsessionid=1926FA20901D9A52D64FC06A0A449C00
% (the ... continues each row vector onto the next line)
TmaxSTK1107 = [93 99 100 100 102 101 98 97 90 88 82 82 79 78 80 81 81 ...
86 89 96 96 93 91 88 89 91 95 98 93 87 92]
N07 = length(TmaxSTK1107)
TmaxSTK1108 = [89 93 93 86 92 91 88 91 94 95 91 92 95 95 92 94 94 95 88 ...
86 86 90 97 97 94 96 95 96 94 89 89]
N08 = length(TmaxSTK1108)
%
% join the two months into one series and plot against day number
TmaxSTK11 = [TmaxSTK1107,TmaxSTK1108]
Ntot = length(TmaxSTK11)
nday = [1:Ntot];
plot(nday, TmaxSTK11, '-dk', 'LineWidth', 2), xlabel('No. Days after 30Jun11'),...
ylabel('Max. Temp (°F)'), title('Stockton, CA - Jul-Aug11'), grid
Carl Friedrich Gauss
Gaussian/Normal Distribution Eqn
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}$$
Calculate in MATLAB using the
Error Function, erf(z)
>> TestPerf = erf(0.41)
TestPerf =
0.4380
>> TestNerf = erf(-0.41)
TestNerf =
-0.4380
Normal Dist Data
Ht (in)   No.   Area (BW*No.)   No./TotArea   BW*(No./TotArea)
64        1     0.5             0.0200        1.00%
64.5      0     0               0.0000        0.00%
65        0     0               0.0000        0.00%
65.5      0     0               0.0000        0.00%
66        2     1               0.0400        2.00%
66.5      4     2               0.0800        4.00%
67        5     2.5             0.1000        5.00%
67.5      4     2               0.0800        4.00%
68        8     4               0.1600        8.00%
68.5      11    5.5             0.2200        11.00%
69        12    6               0.2400        12.00%
69.5      10    5               0.2000        10.00%
70        9     4.5             0.1800        9.00%
70.5      8     4               0.1600        8.00%
71        7     3.5             0.1400        7.00%
71.5      5     2.5             0.1000        5.00%
72        4     2               0.0800        4.00%
72.5      4     2               0.0800        4.00%
73        3     1.5             0.0600        3.00%
73.5      1     0.5             0.0200        1.00%
74        1     0.5             0.0200        1.00%
74.5      0     0               0.0000        0.00%
75        1     0.5             0.0200        1.00%
Σ               50.0                          100.00%
SPICE Circuit