CHAPTER 3 Analysis of Data

Data Analysis
The tasks in connection with the analysis of
data include the following:
1. Reduction of raw data
2. Summary of data
3. Study of relations between variables
1. Reduction of Raw Data



The units in which data are recorded differ with the
measurement method, e.g., kN for loads
or mm for deformations.
Because most data have meaning only in comparison
with similar data, they should be reduced to
comparable values; e.g., loads are reduced
to stresses in MPa and deformations to strains.
In reducing data, corrections have to be
applied for systematic errors.
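A minimal sketch of such a reduction, assuming a hypothetical tension test on a round bar. The specimen dimensions and readings are illustrative values, not data from this chapter, and no correction for systematic error is applied.

```python
import math

# Hypothetical readings from a tension test (illustrative values only).
diameter_mm = 12.0        # specimen diameter
gauge_length_mm = 50.0    # original gauge length
load_kN = 45.0            # recorded load
elongation_mm = 0.095     # recorded deformation

area_mm2 = math.pi * diameter_mm**2 / 4.0

# 1 kN = 1000 N and 1 N/mm^2 = 1 MPa, so stress in MPa = 1000 * kN / mm^2.
stress_MPa = load_kN * 1000.0 / area_mm2

# Strain is dimensionless: deformation divided by original length.
strain = elongation_mm / gauge_length_mm

print(f"stress = {stress_MPa:.1f} MPa, strain = {strain:.5f}")
```

Once reduced to stress and strain, results from specimens of different sizes can be compared directly.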
2. Summary of Data


It is important to assemble and evaluate
the accumulated masses of data in large-scale experiments.
Statistical procedures are advantageous
for summarizing the data.
3. Study of Relations between Variables
The final step is to develop relations between the data
obtained from the test and previously obtained
data or some theory.
The skill with which this is done depends on the
capacity and background of the analyst.
Common devices employed in studying such relations
are tabulations, graphs, bar charts, and correlation
diagrams; the procedure is usually to hold
constant all variables except two, whose relation is
investigated.
Statistical Methods


Descriptive methods help us to present data
in a comprehensible form.
Inference methods help us generalize from
the properties of a limited sample to those of
the whole population, thus making testing
more efficient.
Random Variables



A random variable may be either discrete or
continuous.
If the set of all possible values of the random
variable is either finite or countably infinite,
then the random variable is discrete.
If the set of all possible values of the random
variable is an interval, then the random
variable is continuous.
3.1 Variations in Data




All data derived from tests are subject to variation.
After the measurements have been corrected for the effects
of systematic errors, it is usually found that the variations in
corrected measurements follow a chance distribution.
For large numbers of data, variations in measurements and
measures of properties have been found to coincide closely
with variations computed from theoretical considerations.
When the data are few, the coincidence is often not so
good, but the concepts developed from the theory of
probability are applied and afford a fairly workable means
of summarizing and utilizing data.
Raw Data



Raw data: the data collected in original form
or the results listed in order of testing.
It is hard to analyse raw data.
Chart 3.1 shows the net mass of the
galvanized iron sheets before and after the
galvanization process.
Ungrouped frequency distribution




Ungrouped frequency distribution (u.f.d.): the items are
arranged according to magnitude, usually in
ascending order.
The minimum and maximum values may be
selected, and the mean, median and range may be
calculated from the u.f.d.
It is also possible to study the array by dividing it
into equal parts, such as quartiles (four parts),
deciles (10 parts), or percentiles (100 parts).
Chart 3.2 shows the previous data in this form;
each of the columns in the table represents one
quartile.
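The steps above can be sketched with the Python standard library. The masses below are hypothetical values, not the data of Chart 3.1, and the quartile cut points follow the library's default convention, which is only one of several in use.

```python
import statistics

# Hypothetical coating masses in grams, in order of testing (raw data).
raw = [42.1, 39.8, 44.6, 41.3, 43.0, 40.5, 45.2, 42.7]

array = sorted(raw)                      # ungrouped frequency distribution
minimum, maximum = array[0], array[-1]
data_range = maximum - minimum
mean = statistics.mean(array)
median = statistics.median(array)

# Three cut points dividing the array into four quartiles.
# Several quartile conventions exist; this uses the library default.
q1, q2, q3 = statistics.quantiles(array, n=4)

print(f"min {minimum}, max {maximum}, range {data_range:.1f}")
print(f"mean {mean:.2f}, median {median:.2f}")
print(f"quartile cut points: {q1:.2f}, {q2:.2f}, {q3:.2f}")
```

Passing `n=10` or `n=100` to `statistics.quantiles` gives the cut points for deciles or percentiles in the same way.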
3.2 Data Grouping



Analyzing the data is important so that the results may be
presented in tabular or graphical form.
Most data in materials are grouped according to
magnitude. The arrangement of data according to
magnitude results in a frequency distribution series.
When the time of occurrence (time of testing) is important,
a chronological sequence is sometimes used and the
data are presented as a time series, e.g., the amount of
concrete placed on a project each day, the determination of
creep, or the deterioration of materials after alternate
freeze-thaw cycles. Some data, such as results of test
borings, may require geographical grouping.
Frequency Distribution




It is often useful to group the data according to
subdivisions called cells, classes, or step intervals.
After the length of the interval has been decided, the
number of items in each interval, called the class
frequency (or frequency), is determined.
When there is a large number of items, 13 to 20 class
intervals are recommended. Too many intervals may
give an irregular distribution; in that case 10 class
intervals are chosen. When the total number of items
is less than 25, such a presentation is of little value.
Chart 3.3 shows the frequency histogram of the example.
Frequency Histogram
Graphical illustrations usually help us to visualize the nature of
data. The x axis shows the variable studied. The frequencies, actual
or relative, are plotted as ordinates.
[Chart 3.3: Frequency histogram. x axis: mass of coating, g (38 to 47); y axes: number of sheets (0 to 20) and relative frequency (0.00 to 0.25).]
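As a rough sketch of how class frequencies are obtained, the snippet below counts items per class interval. The masses, the interval length of 1 g and the first class boundary of 37.5 g are assumptions chosen for illustration, and the list is kept short for readability, so it is far below the 25 items needed for a useful histogram.

```python
from collections import Counter

# Hypothetical coating masses in grams (illustrative values only).
masses = [41.2, 42.8, 43.1, 40.6, 42.3, 44.9, 41.7, 42.5, 43.6, 39.9]

width = 1.0     # length of each class interval, g
lower = 37.5    # lower boundary of the first class, g

# Index of the class interval each item falls in.
counts = Counter(int((m - lower) // width) for m in masses)

n = len(masses)
for k in sorted(counts):
    lo, hi = lower + k * width, lower + (k + 1) * width
    freq = counts[k]
    print(f"{lo:4.1f} to {hi:4.1f} g: frequency {freq}, relative {freq / n:.2f}")
```

Plotting the frequencies (or relative frequencies) as ordinates over each interval gives the histogram.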
Cumulative Frequency Diagram
Sometimes it is of interest to know the number of
data that fall below (or above) a certain value.
For this reason the cumulative frequency or the
relative cumulative frequency may be shown.
Chart 3.4 shows the cumulative frequency diagram
of the example.
The variable under consideration is plotted on the x axis. When
both the x axis and the y axis are arithmetic, the cumulative
distribution takes a characteristic form called an ogive curve.
[Chart 3.4: Cumulative frequency diagram. x axis: mass of coating, g (class boundaries 37.5 to 47.5); y axes: cumulative frequency (0 to 80) and relative cumulative frequency (0.00 to 1.00).]
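A short sketch of how cumulative and relative cumulative frequencies follow from class frequencies. The boundaries and frequencies are hypothetical, not the values behind Chart 3.4.

```python
from itertools import accumulate

# Hypothetical class frequencies for successive class intervals
# (illustrative values only, not the chapter's data).
upper_boundaries = [38.5, 39.5, 40.5, 41.5, 42.5, 43.5]   # g
frequencies = [1, 3, 6, 9, 7, 4]

# Running total of the class frequencies.
cumulative = list(accumulate(frequencies))
total = cumulative[-1]
relative_cumulative = [c / total for c in cumulative]

for ub, c, rc in zip(upper_boundaries, cumulative, relative_cumulative):
    print(f"below {ub:.1f} g: {c} items ({rc:.2f})")
```

Each cumulative value is plotted at the upper boundary of its class, which is why the x axis of such a diagram is marked at class boundaries.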
3.3 Sampling and Statistical Errors



Samples should be taken in a random manner,
so that each specimen has an equal chance of
being selected every time a choice is
made.
Sampling may be done with or without
replacement: the chosen specimen may be
returned to the population before the next choice
is made, or discarded.
For destructive tests the latter method must be
used and it is usually more efficient in any case.
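The two sampling schemes can be illustrated with Python's `random` module. The population of 20 numbered specimens and the fixed seed are assumptions made only for this sketch.

```python
import random

rng = random.Random(0)             # fixed seed so the run is repeatable
population = list(range(1, 21))    # hypothetical specimen labels 1..20

# Without replacement: each specimen can be chosen at most once.
without = rng.sample(population, k=5)

# With replacement: a specimen is returned before the next choice,
# so the same label may appear more than once.
with_repl = rng.choices(population, k=5)

print("without replacement:", without)
print("with replacement:   ", with_repl)
```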
Sample Size-1
The size of the sample is important, as the
mean of one sample is likely to differ from
that of another.
If in the example problem we had made only 4
observations instead of 80, we feel that we
would have obtained a less accurate
representation of the population, but we don’t
know how much less accurate.
Sample Size-2
If we have a population of size N, the
number of possible samples of size n
is N!/[n!(N-n)!].
The mean of all the individual sample
means equals the mean of the
population.
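Both statements can be checked by brute force on a small case. The six population values below are hypothetical; samples are drawn without replacement and every possible sample of size n is enumerated.

```python
import math
from itertools import combinations
from statistics import mean

population = [38.0, 40.0, 41.0, 43.0, 44.0, 46.0]   # hypothetical values, N = 6
n = 2                                               # sample size

N = len(population)
n_samples = math.comb(N, n)          # N! / [n! (N - n)!]

# Mean of every possible sample of size n.
sample_means = [mean(s) for s in combinations(population, n)]

print("number of possible samples:", n_samples)
print("mean of sample means:", mean(sample_means))
print("population mean:     ", mean(population))
```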

Sample Size-3

If N is very large compared to n, the standard
deviation σs of the sample means about the
population mean is:
$$\sigma_s = \frac{\sigma_p}{\sqrt{n}}$$
where σp is the standard deviation of the population.
σs is called the standard error of the mean.
If σp is unknown, as is usually the case, it may be
estimated, for example, by using the standard
deviation of the sample as an approximation.
Sample Size-4

In our example of 80 galvanized sheet
specimens, the mean is calculated to be
42.69 g and the standard deviation to be
2.089 g. If we assume the standard
deviation of the entire population to be
equal to this value, then the standard error
of the mean is 2.089/√80 = 0.234 g. If we
had chosen only four specimens, the
corresponding value would be 1.045 g.
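The arithmetic of this example can be reproduced with a few lines. The function name `standard_error` is an illustrative choice, and the sample standard deviation is used in place of the unknown population value, as the text assumes.

```python
import math

def standard_error(sigma_p: float, n: int) -> float:
    """Standard error of the mean, sigma_p / sqrt(n), for N much larger than n."""
    return sigma_p / math.sqrt(n)

s = 2.089   # sample standard deviation in g, used as the estimate of sigma_p

se_80 = standard_error(s, 80)
se_4 = standard_error(s, 4)
print(f"n = 80: {se_80:.3f} g")
print(f"n = 4:  {se_4:.4f} g")
```

Because the standard error varies with 1/√n, a sample twenty times larger reduces it by a factor of only about 4.5.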
Errors vs Residuals
Error is the amount by which an observation differs
from its expected value (the population average). Errors are unobservable.
A residual, on the other hand, is an observable estimate
of the unobservable error. The sample average is
used as an estimate of the population average.
The difference between the tensile strength of each
reinforcement in the sample and the unobservable
population average is an error, and
the difference between the tensile strength of each
reinforcement in the sample and the observable
sample average is a residual.
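The distinction can be made concrete with a small sketch. The strengths are hypothetical, and a population average is assumed here purely so that the errors can be displayed; in a real test it would be unknown.

```python
from statistics import mean

# Hypothetical tensile strengths in MPa (illustrative values only).
sample = [512.0, 498.0, 505.0, 521.0, 494.0]

# In practice the population average is unknown; a value is assumed here
# only so that the errors can be displayed.
assumed_population_mean = 500.0

sample_mean = mean(sample)
errors = [x - assumed_population_mean for x in sample]
residuals = [x - sample_mean for x in sample]

print("errors:   ", errors)
print("residuals:", residuals)
print("sum of residuals:", sum(residuals))
```

Residuals about the sample average always sum to zero, whereas the errors generally do not.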
3.4 Correlation
Correlation indicates the strength and direction
of a linear relationship between two random
variables.
In order to study the relation within a group of paired
measurements, the obvious procedure is to
construct a scatter diagram.
The line representing the best fit is the regression
line. If the line is straight, its general form is
y = mx + b, where m and b are the regression
coefficients.
If all points were on the regression line, the
correlation would be perfect and the coefficient of
correlation would be +1 or -1, the sign depending on the
slope of the line.
For a straight regression line, a wide scatter would
decrease the magnitude of the coefficient of correlation (r).
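A minimal sketch of a straight regression line and its coefficient of correlation. It assumes the line is fitted by ordinary least squares, which the text does not specify; the function name `fit_line` and the paired hardness and strength values are hypothetical.

```python
import math

def fit_line(x, y):
    """Least-squares slope m, intercept b, and correlation coefficient r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    m = sxy / sxx
    b = my - m * mx
    r = sxy / math.sqrt(sxx * syy)
    return m, b, r

# Hypothetical paired measurements (illustrative values only).
hardness = [150.0, 170.0, 190.0, 210.0, 230.0]
strength = [520.0, 575.0, 655.0, 700.0, 790.0]   # MPa

m, b, r = fit_line(hardness, strength)
print(f"y = {m:.3f} x + {b:.1f}, r = {r:.4f}")

# Points lying exactly on a falling line give a perfect negative correlation.
m2, b2, r2 = fit_line([1.0, 2.0, 3.0], [5.0, 3.0, 1.0])
print(f"perfect case: m = {m2:.1f}, b = {b2:.1f}, r = {r2:.1f}")
```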
Example: Tensile Strength vs Hardness
[Scatter diagram of tensile strength against hardness.]
The heavy dashed lines equally spaced on both sides of
the regression line can be placed so as to indicate
any desired probability limits.
The frequency polygon shows that the most likely or
probable strength for a hardness H is the central value S.
For the example given, a hardness of H indicates that
the chances are even (1 to 1) that the tensile strength
will be between s1 and s2, because the limits are
placed 0.6745 standard deviations on each side of the central value S.
In the frequency distribution shown to the right, the open
area is equal to that shown cross-hatched, each
being one-half the total.
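Assuming the strengths at a given hardness are normally distributed, the factor 0.6745 can be checked with the standard library: it is the half-width, in standard deviations, of the central band that contains half of the distribution.

```python
from statistics import NormalDist

std_normal = NormalDist()          # mean 0, standard deviation 1

# Half-width z such that half of the area lies between -z and +z.
z = std_normal.inv_cdf(0.75)

# Fraction of the distribution between -z and +z.
inside = std_normal.cdf(z) - std_normal.cdf(-z)

print(f"z = {z:.4f}, area between the limits = {inside:.2f}")
```

Other probability limits follow the same way, by replacing 0.75 with the cumulative probability at the upper limit.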
3.5 Quality Control Charts



It is practically impossible to attain a given value of quality in
each successive manufactured article because the quality is
a variable and the change in its magnitude follows a frequency
distribution.
The variation in the magnitude of some statistic of a
measurable property such as tensile strength can be used
as a criterion of quality.
Values of a given function of quality, such as the arithmetic
mean of the tensile strength of samples, each containing an
equal number of items, say five, are plotted as ordinates
against a scale of abscissas that gives a numerical
sequence of samples, increasing in the customary way from left
to right.
Example: Quality Control Chart
The control chart presents the data so that their
consistency and regularity can be seen at a
glance.
The limits of variability, the lines parallel to the
abscissas, are commonly set at three standard
deviations on both sides of the central value.
With a normal distribution, 99.73% of the samples
will then satisfy the criterion.
When the control chart is used in
connection with a standard, the limits are
established with respect to the specified
value, but if no standards are given, the
limits are determined on the basis of the
data themselves as they are
accumulated.
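A minimal sketch of a control chart for sample means when no standard is given. The strengths are hypothetical, and the spread of the plotted statistic is estimated directly from the sample means, which is one simple choice among several used in practice.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical tensile strengths in MPa, five items per sample.
samples = [
    [512.0, 498.0, 505.0, 521.0, 494.0],
    [503.0, 509.0, 497.0, 515.0, 500.0],
    [490.0, 507.0, 511.0, 499.0, 503.0],
    [508.0, 502.0, 496.0, 513.0, 506.0],
]

sample_means = [mean(s) for s in samples]
central = mean(sample_means)

# Assumption: no standard is given, so the spread of the plotted
# statistic is estimated from the sample means themselves.
sigma = stdev(sample_means)
upper, lower = central + 3 * sigma, central - 3 * sigma

# Sample numbers (1-based) whose mean falls outside the limits.
out_of_control = [i for i, v in enumerate(sample_means, start=1)
                  if not lower <= v <= upper]

# Share of a normal distribution within three standard deviations.
coverage = NormalDist().cdf(3) - NormalDist().cdf(-3)

print(f"central {central:.2f}, limits {lower:.2f} to {upper:.2f}")
print("samples outside limits:", out_of_control)
print(f"normal coverage within 3 sigma: {coverage:.4%}")
```

When a standard is specified, `central` and `sigma` would be taken from the specification instead of being computed from the data.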