Transcript Lecture 1
Probability and Statistics
Course Requirements
1. Quizzes – 25%
2. First Long Exam – 25%
3. Second Long Exam – 25%
4. Third Long Exam – 25%
5. Total – 100%
Passing – 60%
Statistics
A branch of mathematics that deals with the
collection, organization and analysis of numerical
data and with such problems as experiment design
and decision making.
3 Important features of Statistics:
1. Data gathering
2. Data analysis
3. Decision making
Definition of terms
1. Raw data
Data collected in original form
2. Variable
Characteristic or attribute that can assume
different values
3. Population
All subjects possessing a common characteristic
that is being studied
4. Sample
A subgroup or subset of a population
5. Parameter
Characteristic or measure obtained from a
population
6. Qualitative variables
Variables which assume non-numerical values
7. Quantitative variables
Variables which assume numerical values
8. Discrete variables
Variables which assume a finite or countable
number of possible values, usually obtained by
counting
9. Continuous variables
Variables which can assume any value within an
interval (infinitely many possible values), usually
obtained by measurement
Everyone involved in the experiment must have a
clear idea about what is to be studied, how the data
are to be collected, and at least a qualitative
understanding of how these data are to be
analyzed.
Guidelines for designing experiments:
1. Statement of the problem / recognition of the
problem
Develop all the ideas about the objectives of
the experiment
2. Choice of factors and levels
Choose the factors to be varied in the
experiment
Choose the ranges over which these factors will
be varied
Identify the specific levels at which runs will be
made
3. Selection of the response variable
The experimenter should be certain that this
variable really provides useful information
about the process under study
4. Choice of experimental design
Involves the consideration of sample size
(number of replicates/trials), the selection of a
suitable run order for the experimental trials,
and the determination of whether or not
blocking or other randomization restrictions
are involved.
5. Performing the experiment
Monitor the process carefully to ensure that
everything is being done according to plan
6. Data analysis
Analyzing the data collected during the
experiment by statistical methods
7. Conclusions
Making decisions based on the statistical results
Methods of Sampling
1. Random sampling
Sampling in which the data are collected using
chance methods or random numbers.
2. Systematic sampling
Sampling in which the data is collected by
selecting every kth object
3. Stratified sampling
Sampling in which the population is divided into
groups (strata) according to some characteristic.
Each stratum is then sampled either randomly or
systematically
4. Cluster sampling
Sampling in which the population is divided into
groups (usually geographically). Some of these
groups are randomly selected, and then all of
the elements in those groups are selected.
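The four sampling methods can be sketched in Python as follows. The population of 20 numbered subjects, the odd/even strata, the clusters of five and the fixed seed are all assumptions made for illustration; none of them come from the lecture.

```python
import random

random.seed(1)  # fixed seed (assumed) so the sketch is repeatable
population = list(range(1, 21))  # hypothetical subjects numbered 1..20

# 1. Random sampling: pick 5 subjects by chance.
random_sample = random.sample(population, 5)

# 2. Systematic sampling: random start, then every k-th subject.
k = 4
start = random.randrange(k)
systematic_sample = population[start::k]

# 3. Stratified sampling: divide into strata, then sample each stratum.
strata = {
    "odd": [x for x in population if x % 2 == 1],
    "even": [x for x in population if x % 2 == 0],
}
stratified_sample = [x for group in strata.values()
                     for x in random.sample(group, 2)]

# 4. Cluster sampling: divide into groups, randomly select some groups,
#    and keep every element of the selected groups.
clusters = [population[i:i + 5] for i in range(0, 20, 5)]
cluster_sample = [x for group in random.sample(clusters, 2) for x in group]

print(random_sample, systematic_sample, stratified_sample, cluster_sample)
```

Note that stratified sampling takes a few elements from every group, while cluster sampling takes every element from a few groups.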
Methods of Summarizing/Characterizing Data
1. Tabular Methods
a. Frequency Distribution
b. Cumulative Frequency
c. Stem and Leaf Table
2. Graphical Methods
a. Frequency Histogram
b. Frequency Polygon
c. Ogive
d. Pie chart
3. Numerical Methods
a. Measures of Central Tendencies
Mean/Average, Median, Mode
b. Measures of Dispersion
Range, Variance, Standard Deviation
c. Measures of Shape
Skewness, Kurtosis
d. Measures of Data Locations
Percentiles, Deciles, Quartiles
Tabular Methods
1. Frequency Distribution
The organization of raw data in tabular form
with classes and frequencies
Steps in Constructing a Frequency Distribution Table:
1. Determine the number of class intervals, k, needed
to summarize the data:
(k = No. of class intervals, n = No. of samples)
2. Find the range of observations
Range = Maximum value - Minimum value
3. Determine the width of the class intervals
Class width = Range / No. of class intervals
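Steps 1 to 3 can be checked numerically for a sample of 40 values ranging from 0.72 to 2.55 (the nicotine data used in the illustration). The formula for k is not legible in this transcript, so the sketch below assumes Sturges' rule, k = 1 + 3.322 log10(n). Rounding the class width up to the precision of the data (0.01) is also an assumption.

```python
import math

n = 40                          # No. of samples
minimum, maximum = 0.72, 2.55   # lowest and highest observations

# Step 1: number of class intervals (Sturges' rule, assumed here).
k = round(1 + 3.322 * math.log10(n))

# Step 2: range of the observations.
data_range = maximum - minimum

# Step 3: class width, rounded up to the precision of the data (0.01).
# The inner round() removes floating-point noise before taking the ceiling.
class_width = math.ceil(round(data_range / k * 100, 6)) / 100

print(k, round(data_range, 2), class_width)
```

With these inputs the sketch gives 6 class intervals of width 0.31, which matches the class intervals 0.72 – 1.02, 1.03 – 1.33, and so on, used in the illustration.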
4. Form the frequency table
Columns: Class Interval, Class Boundaries, Class Mark (xi), Frequency (fi), Relative Frequency (%)
Class interval
Separates one class in a grouped frequency
distribution from the others
Its limits could actually appear in the raw data;
the first interval begins with the lowest value
Class boundary
Separates one class in a grouped frequency
distribution from the others
It has one more decimal place than the raw data
and therefore it does not appear in the data
Class Mark (Midpoint), xi
The number in the middle of the class
Frequency, fi
The number of times a certain value or class of
values occurs
Relative Frequency, %
Frequency divided by the total number of data
values, multiplied by 100
This gives the percent of values falling in that class
Illustration: the nicotine contents, in milligrams, for 40
cigarettes of a certain brand were recorded as follows:
1.09  1.74  1.58  2.11  1.64  1.79  1.37  1.75
1.92  1.47  2.03  1.86  0.72  2.46  1.93  1.63
2.31  1.97  1.70  1.90  1.69  1.88  1.40  2.37
1.79  0.85  2.17  1.68  1.85  2.08  1.64  1.75
2.28  1.24  2.55  1.51  1.82  1.67  2.09  1.69
Class Interval
0.72 – 1.02
1.03 – 1.33
1.34 – 1.64
1.65 – 1.95
1.96 – 2.26
2.27 – 2.57
Class Interval   Class Boundaries   Class Mark, xi
0.72 – 1.02      0.715-1.025        0.87
1.03 – 1.33      1.025-1.335        1.18
1.34 – 1.64      1.335-1.645        1.49
1.65 – 1.95      1.645-1.955        1.80
1.96 – 2.26      1.955-2.265        2.11
2.27 – 2.57      2.265-2.575        2.42
Class Boundaries   Frequency, fi
0.715-1.025        2
1.025-1.335        2
1.335-1.645        8
1.645-1.955        17
1.955-2.265        6
2.265-2.575        5
Class Interval   Class Boundaries   Class Mark, xi   Frequency, fi   Relative Frequency, %
0.72 – 1.02      0.715-1.025        0.87             2               5.00
1.03 – 1.33      1.025-1.335        1.18             2               5.00
1.34 – 1.64      1.335-1.645        1.49             8               20.00
1.65 – 1.95      1.645-1.955        1.80             17              42.50
1.96 – 2.26      1.955-2.265        2.11             6               15.00
2.27 – 2.57      2.265-2.575        2.42             5               12.50
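The frequency table for the nicotine data can be reproduced with a short Python sketch. It takes the 6 classes and the class width of 0.31 from the illustration as given. Working in integer hundredths is a choice made here to avoid floating-point edge cases at the class limits; the variable names are illustrative only.

```python
data = [
    1.09, 1.74, 1.58, 2.11, 1.64, 1.79, 1.37, 1.75,
    1.92, 1.47, 2.03, 1.86, 0.72, 2.46, 1.93, 1.63,
    2.31, 1.97, 1.70, 1.90, 1.69, 1.88, 1.40, 2.37,
    1.79, 0.85, 2.17, 1.68, 1.85, 2.08, 1.64, 1.75,
    2.28, 1.24, 2.55, 1.51, 1.82, 1.67, 2.09, 1.69,
]

values = [round(x * 100) for x in data]  # data in hundredths of a milligram
k, width = 6, 31                         # 6 classes, width 0.31 = 31 hundredths
lowest = min(values)

rows = []
for i in range(k):
    lower = lowest + i * width           # lower class limit
    upper = lower + width - 1            # upper class limit
    f = sum(lower <= v <= upper for v in values)
    rows.append({
        "interval": (lower / 100, upper / 100),
        "boundaries": ((lower - 0.5) / 100, (upper + 0.5) / 100),
        "mark": (lower + upper) / 200,   # midpoint of the class limits
        "frequency": f,
        "relative": 100 * f / len(values),
    })

for row in rows:
    print(row)
```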
Cumulative Frequency Distribution Table:
Cumulative Frequency, cfi
Gives the running total of the frequencies
The number of observations in the sample whose
values are less than or equal to the upper boundary
of the class interval
Relative Cumulative Frequency
(cfi / total number of samples) * 100
Percent of the values which are less than the upper
boundary
Class Interval   Class Boundaries   Class Mark, xi   Frequency, fi   Cumulative Frequency, cfi   Relative Cum. Frequency, %
0.72 – 1.02      0.715-1.025        0.87             2               2                           5.00
1.03 – 1.33      1.025-1.335        1.18             2               4                           10.00
1.34 – 1.64      1.335-1.645        1.49             8               12                          30.00
1.65 – 1.95      1.645-1.955        1.80             17              29                          72.50
1.96 – 2.26      1.955-2.265        2.11             6               35                          87.50
2.27 – 2.57      2.265-2.575        2.42             5               40                          100.00
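The cumulative columns are running totals of the frequency column. A minimal Python sketch, using the frequencies from the nicotine illustration (the variable names are illustrative only):

```python
from itertools import accumulate

frequencies = [2, 2, 8, 17, 6, 5]           # fi for each class

cumulative = list(accumulate(frequencies))  # cfi: running total of fi
n = cumulative[-1]                          # total number of samples
relative_cumulative = [100 * cf / n for cf in cumulative]

print(cumulative)
print(relative_cumulative)
```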
Graphical Methods
Frequency Histogram
A graph which displays the data by using vertical
bars of various heights to represent frequencies
The horizontal axis can either be class intervals,
class boundaries, or class marks
[Figure: frequency histogram. Horizontal axis: class mark (0.87, 1.18, 1.49, 1.80, 2.11, 2.42). Vertical axis: frequency (0 to 18).]
Frequency Polygon
A line graph between frequency and class mark
[Figure: frequency polygon. Horizontal axis: class mark (0.87, 1.18, 1.49, 1.80, 2.11, 2.42). Vertical axis: frequency (0 to 18).]
Ogive
A frequency polygon of relative cumulative
frequency against upper class boundaries
[Figure: ogive. Horizontal axis: upper class boundary (1.025, 1.335, 1.645, 1.955, 2.265, 2.575). Vertical axis: relative cumulative frequency (0 to 120).]
Pie chart
The angle of each slice is proportional to the
relative frequency
[Figure: pie chart. Slices (relative frequency, %): 5, 5, 20, 42.5, 15, 12.5.]
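Since a full circle is 360 degrees, each slice angle is the relative frequency (in percent) times 360 / 100. A minimal Python sketch for the nicotine data (the variable names are illustrative only):

```python
relative_freq = [5, 5, 20, 42.5, 15, 12.5]   # relative frequency, %

# Angle of each slice in degrees: percent of the full 360-degree circle.
angles = [rf * 360 / 100 for rf in relative_freq]

print(angles)
print(sum(angles))
```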
Numerical Methods
Measures of Central Tendencies
1. Mean / Average
The sum of the product of class mark and the
corresponding frequency divided by the total
number of samples
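In symbols, the definition above reads Mean = (sum of fi * xi) / n. A minimal Python sketch using the class marks and frequencies of the nicotine illustration (the variable names are illustrative only):

```python
class_marks = [0.87, 1.18, 1.49, 1.80, 2.11, 2.42]   # xi
frequencies = [2, 2, 8, 17, 6, 5]                    # fi

n = sum(frequencies)                                 # total number of samples
mean = sum(f * x for f, x in zip(frequencies, class_marks)) / n

print(round(mean, 4))
```

This is the mean of the grouped data. It can differ slightly from the mean of the raw data, because every value in a class is represented by the class mark.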
2. Median
The value that will divide the samples into two
equal halves when the samples are arranged
from lowest to highest
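Median = L + ((n/2 - cfb) / fm) * w
where
L = lower class boundary of the median class
n = total number of samples
cfb = total frequencies of all class intervals before the median class
fm = frequency of the median class
w = class width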
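A minimal Python sketch of the grouped-data median for the nicotine illustration. It assumes the usual interpolation formula, Median = L + ((n/2 - cfb) / fm) * w, where L is the lower class boundary of the median class, cfb the total frequency before it, fm its frequency and w the class width. The variable names are illustrative only.

```python
lower_boundaries = [0.715, 1.025, 1.335, 1.645, 1.955, 2.265]
frequencies = [2, 2, 8, 17, 6, 5]
width = 0.31                        # class width, w
n = sum(frequencies)

cum_before = 0                      # total frequency before the current class
for lower, f in zip(lower_boundaries, frequencies):
    if cum_before + f >= n / 2:     # first class whose cumulative frequency reaches n/2
        median = lower + (n / 2 - cum_before) / f * width
        break
    cum_before += f

print(round(median, 4))
```

Here the median class is 1.645-1.955, with 12 observations before it and a frequency of 17.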
3. Mode
The most frequent number
Mode = L + (d1 / (d1 + d2)) * w
where
L = lower class boundary of the modal class
d1 = frequency difference of the modal class and the preceding class
d2 = frequency difference of the modal class and the succeeding class
w = class width
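A minimal Python sketch of the grouped-data mode for the nicotine illustration. It assumes the usual formula, Mode = L + (d1 / (d1 + d2)) * w, where L is the lower class boundary of the modal class, d1 and d2 are the frequency differences with the preceding and succeeding classes, and w is the class width. It also assumes the modal class is neither the first nor the last class.

```python
lower_boundaries = [0.715, 1.025, 1.335, 1.645, 1.955, 2.265]
frequencies = [2, 2, 8, 17, 6, 5]
width = 0.31                                   # class width, w

i = frequencies.index(max(frequencies))        # index of the modal class
d1 = frequencies[i] - frequencies[i - 1]       # difference with the preceding class
d2 = frequencies[i] - frequencies[i + 1]       # difference with the succeeding class
mode = lower_boundaries[i] + d1 / (d1 + d2) * width

print(round(mode, 4))
```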
Measures of Variability / Dispersion
1. Range
Measures how the samples are clustered.
It is the difference between the highest and the
lowest values of the raw data
Range = Maximum value - Minimum value
2. Variance
Measures how the samples are dispersed.
3. Standard deviation, s
The positive square root of the variance
Coefficient of variation, Cv
The standard deviation expressed as a percent of
the mean: Cv = (s / mean) * 100
If Cv < 10 – the data are considered clustered,
else the data are dispersed
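The variance formula is not legible in this transcript. The Python sketch below assumes the sample variance of grouped data, s^2 = (sum of fi * (xi - mean)^2) / (n - 1), and Cv = (s / mean) * 100. The n - 1 divisor is an assumption; some texts divide by n instead.

```python
import math

class_marks = [0.87, 1.18, 1.49, 1.80, 2.11, 2.42]   # xi
frequencies = [2, 2, 8, 17, 6, 5]                    # fi

n = sum(frequencies)
mean = sum(f * x for f, x in zip(frequencies, class_marks)) / n

# Sample variance of the grouped data (divisor n - 1, assumed).
variance = sum(f * (x - mean) ** 2
               for f, x in zip(frequencies, class_marks)) / (n - 1)
std_dev = math.sqrt(variance)        # positive square root of the variance
cv = std_dev / mean * 100            # coefficient of variation, in percent

print(round(variance, 4), round(std_dev, 4), round(cv, 2))
```

Under these assumptions Cv for the nicotine data comes out well above 10, so by the rule above the data would be considered dispersed.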
Measures of Shape
1. Skewness
A measure of the symmetry of the distribution
of the sample
If Sk < 0 – the distribution is skewed to the left
(i.e., left tail is longer than right tail)
If Sk = 0 – the distribution is symmetric with
respect to the mean, i.e., right and left tails are
of equal length (as in the normal or Gaussian
distribution)
If Sk > 0 – the distribution is skewed to the right
(i.e., right tail is longer than left tail)
2. Kurtosis
A measure of the peakedness of the distribution
relative to the normal distribution
If kurtosis < 0 – the distribution has a low peak
or is almost flat
If kurtosis = 0 – the distribution has the same
peakedness as the normal distribution
If kurtosis > 0 – the distribution has a high peak
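The formulas for Sk and kurtosis are not legible in this transcript. The Python sketch below assumes the moment-based definitions, Sk = m3 / m2^1.5 and kurtosis = m4 / m2^2 - 3, where mr is the r-th central moment of the grouped data with divisor n. Subtracting 3 makes the kurtosis of a normal distribution equal to 0, which matches the comparisons against 0 above. Other definitions (for example Pearson's coefficient of skewness) give different numbers.

```python
class_marks = [0.87, 1.18, 1.49, 1.80, 2.11, 2.42]   # xi
frequencies = [2, 2, 8, 17, 6, 5]                    # fi

n = sum(frequencies)
mean = sum(f * x for f, x in zip(frequencies, class_marks)) / n


def central_moment(r):
    """r-th central moment of the grouped data (divisor n, assumed)."""
    return sum(f * (x - mean) ** r
               for f, x in zip(frequencies, class_marks)) / n


m2, m3, m4 = central_moment(2), central_moment(3), central_moment(4)
skewness = m3 / m2 ** 1.5
kurtosis = m4 / m2 ** 2 - 3      # excess kurtosis: 0 for a normal distribution

print(round(skewness, 3), round(kurtosis, 3))
```

Under these assumptions the nicotine data give a skewness of about -0.31 (skewed to the left) and a kurtosis of about 0.15 (slightly higher peak than the normal distribution).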
Measures of Data Location
1. Quartiles: Q1, Q2, Q3
The values below which 25%, 50% and 75% of the
data fall, respectively
2. Deciles: D1, D2, D3, … D9
The values below which 10%, 20%, 30%, … 90% of the
data fall, respectively
3. Percentiles: P1, P2, P3, … P99
The values below which 1%, 2%, 3%, … 99% of the
data fall, respectively
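The lecture does not give a formula for these measures. As a sketch, they can be computed from the raw nicotine data with `statistics.quantiles` from the Python standard library (Python 3.8 or later). Its default "exclusive" interpolation method is an assumption and may differ from the method used in class.

```python
import statistics

data = [
    1.09, 1.74, 1.58, 2.11, 1.64, 1.79, 1.37, 1.75,
    1.92, 1.47, 2.03, 1.86, 0.72, 2.46, 1.93, 1.63,
    2.31, 1.97, 1.70, 1.90, 1.69, 1.88, 1.40, 2.37,
    1.79, 0.85, 2.17, 1.68, 1.85, 2.08, 1.64, 1.75,
    2.28, 1.24, 2.55, 1.51, 1.82, 1.67, 2.09, 1.69,
]

q1, q2, q3 = statistics.quantiles(data, n=4)       # quartiles Q1, Q2, Q3
deciles = statistics.quantiles(data, n=10)         # D1 .. D9
percentiles = statistics.quantiles(data, n=100)    # P1 .. P99

print(q1, q2, q3)
```

Q2, D5 and P50 all coincide with the median of the data.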
Quiz
The diameters of 36 rivet heads, in 1/100 of an inch, are given below:
6.72  6.66  6.66  6.72  6.77  6.64
6.62  6.74  6.82  6.76  6.72  6.81
6.70  6.73  6.76  6.79  6.78  6.80
6.70  6.78  6.70  6.72  6.78  6.66
6.62  6.76  6.76  6.76  6.75  6.76
6.67  6.76  6.66  6.68  6.70  6.72
1. Construct a Cumulative Frequency Table
2. Determine the Mean, Median and Mode
3. Determine the Variance, Standard deviation and the
coefficient of variation
4. Determine the skewness and kurtosis of the distribution and
make a conclusion about the shape of the distribution