Chapter 6.
Descriptive Statistics
6.1 Experimentation
6.2 Data Presentation
6.3 Sample Statistics
6.4 Examples
NIPRL
• Data: a mixture of nature and noise.
• Is the noise manageable?
  – Ideally, the noise can be represented by a probability distribution.
• Statistical inference:
  – The science of deducing properties of an underlying probability distribution from data.
• Can we obtain information on the underlying probability distribution?
  – The information is given in the form of (functions of) the data.
Figure 6.1 The relationship between probability theory and statistical inference
6.1 Experimentation
6.1.1 Samples
• Population: the set of all possible observations available from a particular probability distribution.
• Sample: a subset of a population.
• Random sample: a sample whose elements are chosen at random from the population.
• A sample should be representative of the population.
• Types of observations: numerical and nominal.
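The idea of a random sample can be sketched in a few lines of Python. This is only an illustration: the population of 100 labelled items, the sample size, and the helper name `draw_random_sample` are assumptions, not part of the slides.

```python
import random

def draw_random_sample(population, n, seed=None):
    """Return n elements chosen at random, without replacement."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(population, n)

# Hypothetical population: 100 items labelled 1..100.
population = list(range(1, 101))
sample = draw_random_sample(population, 10, seed=0)
print(sample)
```

Every element has the same chance of being chosen, which is what makes the sample likely (though not guaranteed) to be representative of the population.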
6.1.2 Examples
• Example 1: Machine breakdowns
  – Suppose that an engineer in charge of the maintenance of a machine keeps records of the breakdown causes over a period of a year.
  – Suppose that the engineer observed 46 breakdowns (see Figure 6.2).
  – What is the population from which this sample is drawn?
  – Factors to consider when checking whether the data are representative:
      – Quality of operators
      – Workload on the machine
      – Peculiarities of the observation period (e.g., more rainy days than in other years)
Figure 6.2 Data set of machine breakdowns
• Example 2: Defective computer chips
  – The chip boxes are selected at random from …..
• Points to check on data:
  – What is the data type?
  – Are the data representative?
  – How is the randomness of the data realized?
• Statistical problem:
  – What is the population from which the data are sampled?
Figure 6.4 Data set of defective computer chips
6.2 Data presentation
6.2.1 Bar and Pareto charts
6.2.2 Pie charts
6.2.3 Histograms
6.2.4 Outliers
  – An outlier is an observation that does not come from the distribution from which the main body of the sample is drawn.
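For the Pareto chart of Section 6.2.1, the categories of a nominal data set are ordered by decreasing frequency, usually together with a cumulative percentage. A minimal sketch of that ordering follows; the category names and counts are hypothetical and are not the data of Figure 6.2, and `pareto_table` is an illustrative helper.

```python
from collections import Counter

def pareto_table(observations):
    """Return (category, count, cumulative %) rows, most frequent first."""
    counts = Counter(observations)
    total = sum(counts.values())
    rows = []
    cumulative = 0
    for category, count in counts.most_common():
        cumulative += count
        rows.append((category, count, 100.0 * cumulative / total))
    return rows

# Hypothetical nominal observations (not taken from the slides).
causes = ["cause_a"] * 5 + ["cause_b"] * 3 + ["cause_c"] * 2
table = pareto_table(causes)
for category, count, cum_pct in table:
    print(f"{category:8s} {count:3d} {cum_pct:6.1f}%")
```

A bar chart uses the same counts in any category order; the Pareto chart differs only in the sorting and the cumulative line.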
Figure 6.7 Bar chart of machine breakdowns data set
Figure 6.9 Pareto chart of customer complaints for Internet company
Figure 6.12 Pie chart for machine breakdowns data set
Figure 6.14 Histogram of computer chips data set
Figure 6.16 Histograms of metal cylinder diameter data set with different bandwidths
Figure 6.18 A histogram with positive skewness
Figure 6.19 A histogram with negative skewness
Figure 6.21 Histogram of a data set with a possible outlier
6.3 Sample statistics
6.3.1 Sample mean
6.3.2 Sample median
6.3.3 Sample trimmed mean
6.3.4 Sample mode
6.3.5 Sample variance
6.3.6 pth Sample quantiles
6.3.7 Boxplots
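The statistics listed above can be computed with Python's standard `statistics` module. The sketch below rests on several assumptions: the data set is hypothetical, the trimmed mean drops 10% of the points from each end, and `statistics.quantiles` uses its default "exclusive" method, which is one of several quantile conventions and may differ slightly from the textbook's.

```python
import statistics

def trimmed_mean(data, proportion=0.10):
    """Mean after dropping `proportion` of the points from each end."""
    ordered = sorted(data)
    k = int(len(ordered) * proportion)  # points removed per end
    kept = ordered[k:len(ordered) - k] if k > 0 else ordered
    return statistics.mean(kept)

# Hypothetical data set with one large value.
data = [2, 3, 3, 4, 5, 6, 7, 8, 9, 43]

summary = {
    "mean": statistics.mean(data),
    "median": statistics.median(data),
    "trimmed_mean": trimmed_mean(data, 0.10),
    "mode": statistics.mode(data),
    "variance": statistics.variance(data),         # divides by n - 1
    "quartiles": statistics.quantiles(data, n=4),  # three cut points
}
for name, value in summary.items():
    print(name, value)
```

With this data the single large value pulls the mean well above the median, while the trimmed mean stays close to the median.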
Cf. Chebyshev's inequality:
Let $E[X] = \mu$ and $\mathrm{Var}(X) = \sigma^2$.
Then $P\{|X - \mu| \le c\sigma\} \ge 1 - 1/c^2$, or
$P\{|X - \mu| \ge c\sigma\} \le 1/c^2$.
In general, $P\{|X - \mu| \ge \varepsilon\} \le \sigma^2/\varepsilon^2$.
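Chebyshev's inequality holds for any distribution with finite variance, including the empirical distribution that puts equal mass on each value of a finite data set. The sketch below checks the bound for c = 2 on a hypothetical data set; the values and the helper name `chebyshev_check` are assumptions for illustration.

```python
import statistics

def chebyshev_check(values, c):
    """Return (proportion within c standard deviations, Chebyshev lower bound)."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population standard deviation
    within = sum(1 for x in values if abs(x - mu) <= c * sigma)
    return within / len(values), 1 - 1 / c**2

# Hypothetical values, treated as the whole distribution.
values = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20]
proportion, bound = chebyshev_check(values, 2)
print(proportion, bound)
```

The observed proportion is at least the bound, and typically well above it, since Chebyshev's inequality makes no assumption about the shape of the distribution.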
Cf. Theorem: the weak law of large numbers
Let $X_i$, $i = 1, \ldots, n$, be a sequence of i.i.d. random variables, each having mean $\mu$ and variance $\sigma^2$.
Then, for any $\varepsilon > 0$,
$\lim_{n \to \infty} P\{|\bar{X} - \mu| \ge \varepsilon\} = 0$,
where $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$.
(proof)
$E[\bar{X}] = \mu$ and $\mathrm{Var}(\bar{X}) = \sigma^2/n$.
It follows from Chebyshev's inequality that
$P\{|\bar{X} - \mu| \ge \varepsilon\} \le \frac{\sigma^2}{n\varepsilon^2}$.
Therefore,
$\lim_{n \to \infty} P\{|\bar{X} - \mu| \ge \varepsilon\} = 0$.
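The weak law can be illustrated by simulation. The sketch below assumes fair-die rolls (mean 3.5, variance 35/12) as the i.i.d. variables, which is a choice made here and not part of the slides; it estimates the tail probability for a given n and compares it with the Chebyshev bound used in the proof.

```python
import random

DIE_MEAN = 3.5
DIE_VARIANCE = 35 / 12

def tail_frequency(n, epsilon, trials=500, seed=0):
    """Estimate P{|sample mean - mu| >= epsilon} for n fair-die rolls."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xbar = sum(rng.randint(1, 6) for _ in range(n)) / n
        if abs(xbar - DIE_MEAN) >= epsilon:
            hits += 1
    return hits / trials

def chebyshev_bound(n, epsilon, variance=DIE_VARIANCE):
    """Upper bound sigma^2 / (n * epsilon^2), capped at 1."""
    return min(1.0, variance / (n * epsilon**2))

for n in (4, 40, 400):
    print(n, tail_frequency(n, 0.5), chebyshev_bound(n, 0.5))
```

Both the estimated frequency and the bound shrink as n grows; the bound is usually far from tight.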
Figure 6.22 Illustrative data set
Figure 6.23 Relationship between the sample mean, median, and trimmed mean for positively and negatively skewed data sets
Figure 6.20 A histogram for a bimodal distribution
Figure 6.24 Boxplot of a data set
Figure 6.30 Rolling mill process
Figure 6.31 % scrap data set from rolling mill process
Figure 6.32 Histogram of rolling mill scrap data set
Figure 6.33 Boxplot and summary statistics for rolling mill scrap data set