Data Collection


STATISTICS AND PROBABILITY
IN CIVIL ENGINEERING
TS4512
Doddy Prayogo, Ph.D.
Statistics
Science of data collection,
summarization, presentation and
analysis for better decision making.
– How to collect data?
– How to summarize it?
– How to present it?
– How to analyze it, draw conclusions,
and make correct decisions?
Role of Statistics
• Many aspects of engineering deal with data:
product and process design
• Identify sources of variability
• Essential for decision making
Data Collection
• Observational study
– Observe the system
– Historical data
• The objective is to build a model of the system,
usually called an empirical model
• Design of experiments
– Plays a key role in engineering design
Data Collection
• Sources of data collection:
– Observation of something you can't control
(observe a system or historical data)
– Conducting an experiment
– Surveying people's opinions on a social
problem
Data Collection
• Example:
– traveling time and frequency in site layout
– experimental data in concrete laboratory
Forms of Data Description
• Point summary
• Tabular format
• Graphical format
• Diagrams
Point Summary
1) Central tendency measures
– Sample mean: $\bar{x} = \sum_{i=1}^{n} x_i / n$
– Population mean (µ)
– Median: middle value
– Mode: most frequent value
– Percentile
Point Summary
2) Variability measures
– Range = $\max_i x_i - \min_i x_i$
– Variance: $V = S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
also $= \frac{\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 / n}{n - 1}$
– Standard deviation: $S = \sqrt{V}$
– Coefficient of variation = $S / \bar{x}$
– Inter-quartile range (IQR)
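The point summaries above can be checked numerically. The sketch below is only an illustration: it uses the class grades listed on the dot diagram slide of this deck and Python's standard `statistics` module, and the variable names are chosen here for illustration. It also confirms that the two variance formulas give the same value.

```python
import math
import statistics

# Class grades taken from the dot diagram example in this deck.
grades = [50, 23, 40, 90, 95, 10, 80, 50, 75, 55, 60, 40]
n = len(grades)

# Central tendency measures
mean = sum(grades) / n
median = statistics.median(grades)
modes = statistics.multimode(grades)  # every value tied for most frequent

# Variability measures
data_range = max(grades) - min(grades)
var_definition = sum((x - mean) ** 2 for x in grades) / (n - 1)
var_shortcut = (sum(x * x for x in grades) - sum(grades) ** 2 / n) / (n - 1)
std_dev = math.sqrt(var_definition)
coeff_variation = std_dev / mean

print(f"mean={mean:.2f} median={median} modes={modes}")
print(f"range={data_range} variance={var_definition:.2f} S={std_dev:.2f}")
print(f"coefficient of variation={coeff_variation:.3f}")
```

The shortcut form needs only the running sums of the x_i and the x_i squared, which is why it was convenient for hand calculation.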
Diagrams: Dot Diagram
• A dot diagram plots each observation as a dot
along the x-axis. Example: the following grades
of a class:
50, 23, 40, 90, 95, 10, 80, 50, 75, 55, 60,
40.
[Figure: dot diagram of the grades on an axis from 0 to 100]
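As a rough illustration, a dot diagram of the grades can be drawn in plain text. This is a minimal sketch, not the plot from the slide: the function name `dot_diagram` and the choice to give each column 5 grade points are assumptions made to keep the output narrow.

```python
from collections import Counter

grades = [50, 23, 40, 90, 95, 10, 80, 50, 75, 55, 60, 40]

def dot_diagram(values, lo=0, hi=100, step=5):
    """Return a text dot diagram with one '.' per observation."""
    # Each column covers `step` units of the axis (an illustrative choice).
    counts = Counter((v - lo) // step for v in values)
    n_cols = (hi - lo) // step + 1
    rows = []
    # Draw the tallest stack level first so dots pile up from the axis.
    for level in range(max(counts.values()), 0, -1):
        row = "".join("." if counts.get(c, 0) >= level else " "
                      for c in range(n_cols))
        rows.append(row.rstrip())
    # Axis line with a tick mark at 0, 50 and 100.
    rows.append("".join("|" if (c * step) % 50 == 0 else "-"
                        for c in range(n_cols)))
    return "\n".join(rows)

diagram = dot_diagram(grades)
print(diagram)
```

Repeated values (here 40 and 50) stack vertically, which is how a dot diagram shows where the data cluster.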
Graphical Format
• Time Frequency Plot
The Time Frequency Plot shows the
following:
1) The Center of Data
2) The Variability
3) The Trends or Shifts in the data
• Control Chart
Time Frequency Plot
[Figure: time frequency plot of y (5 to 15) against observation number (0 to 50)]
Control Charts
• Central Line = Average ($\bar{X}$)
• Lower Control Limit (LCL) = $\bar{X} - 3S$
• Upper Control Limit (UCL) = $\bar{X} + 3S$
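The three control chart lines can be computed directly from a sample. The sketch below assumes S is the sample standard deviation; the readings are hypothetical values invented for illustration, not the data behind the chart in this deck, and `is_out_of_control` is a helper name introduced here.

```python
import statistics

# Hypothetical concentration readings, invented for illustration only.
readings = [91.2, 92.5, 90.1, 93.4, 89.8, 91.9, 92.2, 90.7, 91.0, 92.8]

center = statistics.mean(readings)   # central line, X-bar
s = statistics.stdev(readings)       # sample standard deviation S
lcl = center - 3 * s                 # lower control limit
ucl = center + 3 * s                 # upper control limit

def is_out_of_control(x):
    """True if an observation falls outside the control limits."""
    return not (lcl <= x <= ucl)

print(f"LCL={lcl:.2f}  center={center:.2f}  UCL={ucl:.2f}")
print("99.0 out of control?", is_out_of_control(99.0))
```

A new observation outside the limits signals that the process may have shifted and is worth investigating.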
Control Charts
[Figure: control chart of concentration (75 to 105) against observation number (0 to 30); upper control limit = 100.5, center line x̄ = 91.50, lower control limit = 82.54]
Population and Sample
• Population is the totality of observations we
are concerned with.
• Example: All Engineers in the Kingdom,
All CE students etc.
• Sample: a subset of the population.
• Example: 50 engineers selected at random,
10 CE students selected at random.
Mean and Variance
• Sample mean $\bar{X}$
• Population mean µ
• Sample variance $S^2$
• Population variance $\sigma^2$
Percentiles
• The Pth percentile of the data is a value such that
at least P% of the data take on this value or less
and at least (100 − P)% of the data take on this
value or more.
• The median is the 50th percentile (Q2).
• First quartile Q1 is the 25th percentile.
• Third quartile Q3 is the 75th percentile.
Percentile Computation : Example
Data : 5, 7, 25, 10, 22, 13, 15, 27, 45, 18, 3, 30
Compute 90th percentile.
1. Sort the data from smallest to largest
3, 5, 7, 10, 13, 15, 18, 22, 25, 27, 30, 45
2. Multiply (90/100) × 12 = 10.8 and round it up
to the next integer, which is 11.
Therefore the 90th percentile is point #11,
which is 30.
Percentile Computation : Example
• If the product of the percent and the number
of data points comes out to be an integer, then
the percentile is the average of the data point
at that rank and the data point at the next rank.
• Quartiles are computed in the same way as
percentiles.
• Pth percentile: compute r = (P/100) × n
– r is not an integer: round it up and take the data point at that rank
– r is an integer: take the average of the data point at rank r and the one after it
• Inter-quartile range = Q3 – Q1
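The rank rule above can be written as a short function. This is a sketch of the rule as stated on these slides (statistical packages often use other interpolation conventions); the function name `percentile` is chosen here for illustration, and P is assumed to be a whole number between 1 and 99.

```python
def percentile(data, p):
    """Pth percentile by the rank rule on these slides (p = 1..99)."""
    xs = sorted(data)
    n = len(xs)
    # r = (p / 100) * n, kept as quotient and remainder to avoid
    # floating-point rounding when testing whether r is an integer.
    whole, rem = divmod(p * n, 100)
    if rem:
        # r is not an integer: round up to rank whole + 1,
        # which is index `whole` in a 0-based list.
        return xs[whole]
    # r is an integer: average the values at ranks r and r + 1.
    return (xs[whole - 1] + xs[whole]) / 2

data = [5, 7, 25, 10, 22, 13, 15, 27, 45, 18, 3, 30]

p90 = percentile(data, 90)
q1 = percentile(data, 25)
q3 = percentile(data, 75)
iqr = q3 - q1
print(p90, q1, q3, iqr)
```

For the 12 data points of the example, r = 10.8 for P = 90, so the function returns the 11th sorted value, matching the hand calculation.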
• Frequency Distribution Table:
1) Number of class intervals (k): 5 < k < 20,
with $k \approx \sqrt{n}$
2) Width of the intervals: W = Range/k
= (Max − Min)/k
Data: Table 1.1 Compressive Strength (psi) of 80 Aluminum-Lithium Alloy Specimens
105  221  183  186  121  181  180  143
 97  154  153  174  120  168  167  141
245  228  174  199  181  158  176  110
163  131  154  115  160  208  158  133
207  180  190  193  194  133  156  123
134  178   76  167  184  135  229  146
218  157  101  171  165  172  158  169
199  151  142  163  145  171  148  158
160  175  149   87  160  237  150  135
196  201  200  176  150  170  118  149
Class Interval (psi)   Frequency   Relative Frequency (Frequency/n)   Cumulative Relative Frequency
 70 ≤ x <  90           2          0.0250                             0.0250
 90 ≤ x < 110           3          0.0375                             0.0625
110 ≤ x < 130           6          0.0750                             0.1375
130 ≤ x < 150          14          0.1750                             0.3125
150 ≤ x < 170          22          0.2750                             0.5875
170 ≤ x < 190          17          0.2125                             0.8000
190 ≤ x < 210          10          0.1250                             0.9250
210 ≤ x < 230           4          0.0500                             0.9750
230 ≤ x < 250           2          0.0250                             1.0000
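The frequency distribution table can be reproduced from the raw data. The sketch below is an illustration using the 80 strengths from Table 1.1 and the same 20-psi classes starting at 70; the variable names are chosen here and only the Python standard library is used.

```python
from itertools import accumulate

# The 80 compressive strengths (psi) from Table 1.1.
strengths = [
    105, 221, 183, 186, 121, 181, 180, 143,
    97, 154, 153, 174, 120, 168, 167, 141,
    245, 228, 174, 199, 181, 158, 176, 110,
    163, 131, 154, 115, 160, 208, 158, 133,
    207, 180, 190, 193, 194, 133, 156, 123,
    134, 178, 76, 167, 184, 135, 229, 146,
    218, 157, 101, 171, 165, 172, 158, 169,
    199, 151, 142, 163, 145, 171, 148, 158,
    160, 175, 149, 87, 160, 237, 150, 135,
    196, 201, 200, 176, 150, 170, 118, 149,
]
n = len(strengths)

# Class boundaries 70, 90, ..., 250; each class is lo <= x < hi.
edges = list(range(70, 251, 20))
classes = list(zip(edges, edges[1:]))

freq = [sum(lo <= x < hi for x in strengths) for lo, hi in classes]
rel_freq = [f / n for f in freq]
# Accumulate the counts first, then divide, so the last value is exactly 1.
cum_rel_freq = [c / n for c in accumulate(freq)]

for (lo, hi), f, r, c in zip(classes, freq, rel_freq, cum_rel_freq):
    print(f"{lo:3d} <= x < {hi:3d}  {f:3d}  {r:.4f}  {c:.4f}")
```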
[Figure: histogram of frequency (0 to 25) against compressive strength (psi), class boundaries from 70 to 250]
[Figure: cumulative frequency (0 to 90) against strength]
Histogram: the graph of the frequency distribution table, showing
class intervals versus frequency or (cumulative) relative frequency.
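To make the link between table and graph concrete, the class frequencies from the frequency distribution table can be drawn as a rough text histogram. This is only a sketch: the bars run horizontally, one `#` per observation, whereas a plotting library would normally be used for a real figure.

```python
# Class boundaries and frequencies from the frequency distribution table.
edges = list(range(70, 251, 20))
freq = [2, 3, 6, 14, 22, 17, 10, 4, 2]

# One line per class: the interval, a bar of '#', and the count.
lines = [
    f"{lo:3d} to {hi:3d} | {'#' * f} ({f})"
    for lo, hi, f in zip(edges, edges[1:], freq)
]
print("\n".join(lines))
```

The longest bar falls in the 150 to 170 psi class, and the bars tail off on both sides of it.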
2nd Assignment
• Due date: August 30th
• Group of 2 persons
• Please download the Concrete Data from the UCI
repository:
https://archive.ics.uci.edu/ml/machine-learning-databases/concrete/compressive/
2nd Assignment
1. Provide a data description of the Concrete
Data (8 input variables and 1 output
variable). Report at least (but not limited to)
the upper bound, lower bound, mean, and
standard deviation.
2. Make a histogram of 2 selected input
variables and 1 output variable.