ENGR 610 Applied Statistics Fall 2005
ENGR 610
Applied Statistics
Fall 2007 - Week 1
Marshall University
CITE
Jack Smith
http://mupfc.marshall.edu/~smith1106
Overview for Today
Syllabus
Introductions
Chapters 1-3
Introduction to Statistics and Quality Improvement
Tables and Charts
Describing and Summarizing Data
Homework assignment
Syllabus
Week 1 (Aug 23)
Introduction - Descriptive Statistics
1-3
Week 2 (Aug 30)
Discrete Probability Distributions
4
Week 3 (Sept 6)
Continuous Probability Distributions
5
Week 4 (Sept 13)
Estimation Procedures
8
Week 5 (Sept 20)
Review, Exam 1
1-5, 8
Week 6 (Sept 27)
Hypothesis Testing
9
Week 7 (Oct 4)
Hypothesis Testing
9
Week 8 (Oct 11)
Design of Experiments
10
Week 9 (Oct 18)
Design of Experiments
11
Week 10 (Oct 25)
Review, Exam 2
9-11
Syllabus, cont’d
Week 11 (Nov 1)
Simple Linear Regression
12
Week 12 (Nov 8)
Multiple Regression
13
Week 13 (Nov 15)
More Regression
13
Fall Break (Nov 22)
(no class)
Week 14 (Nov 29)
Review, Exam 3
12-13
Week 15 (Dec 6)
(Exam 3 due)
Text: Levine, Ramsey, Smidt, “Applied Statistics for Engineers and Scientists: Using Microsoft Excel and MINITAB” (Prentice-Hall, 2001) - with CD-ROM
Grading
25% - Homework and attendance
25% - Exam 1
25% - Exam 2
25% - Exam 3
Introductions
Name
Home town
Undergraduate degree, major, where
Major focus of study at MU
Occupation, if working
Background in statistics
Hopes for this course
Introduction to Statistics (Ch 1)
What is Statistics?
Variables
Operational Definitions
Sampling
Software
What is Statistics?
Descriptive Statistics
Methods that lead to the collection, tabulation,
summarization and presentation of data
Inferential Statistics
Methods that lead to conclusions, or estimates of
parameters, about a population (of size N)
based on summary measures (statistics) on a
sample (of size n) - in lieu of a census
Why Statistics?
Describe numerical information
Draw conclusions on a large population from
sample information only
Derive and test models
Understand and control variation
Improve quality of processes
Design experiments to extract maximum
information
Predict or affect future behavior
Variables
Categorical
Nominal
Mutually exclusive
Collectively exhaustive
Numerical
Discrete or Continuous
Scale
Ordered
Interval - equally spaced
Ratio - with absolute zero
Operational Definitions
Objective, not subjective
Specific tests, measurements
Specific criteria
Agreed to by all
Consistent between individuals
Stable over time
Sampling
Advantages
Cost, time, accuracy, feasibility, scope
Minimize destructive tests
Probability samples
Simple random
Systematic random
With or without replacement
Random start, but constant increment or rate
Non-probability samples
Convenience, Judgment, Quota (representative)
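The two probability sampling schemes above can be sketched in a few lines of Python. This is only an illustration: the function names, the sampling frame of 100 numbered items and the seed are assumptions made for the example, not part of the course material.

```python
import random

def simple_random_sample(population, n, replace=False, seed=None):
    """Simple random sample of size n, with or without replacement."""
    rng = random.Random(seed)
    if replace:
        # With replacement: the same item may be drawn more than once.
        return [rng.choice(population) for _ in range(n)]
    # Without replacement: every item appears at most once.
    return rng.sample(population, n)

def systematic_sample(population, n, seed=None):
    """Random start, then a constant increment k = N // n."""
    if n <= 0 or n > len(population):
        raise ValueError("n must be between 1 and the population size")
    rng = random.Random(seed)
    k = len(population) // n      # constant increment
    start = rng.randrange(k)      # random start within the first k items
    return [population[start + i * k] for i in range(n)]

# Hypothetical sampling frame: items numbered 1..100
frame = list(range(1, 101))
srs = simple_random_sample(frame, 10, seed=1)
srs_wr = simple_random_sample(frame, 10, replace=True, seed=1)
sys_sample = systematic_sample(frame, 10, seed=1)
```

With N = 100 and n = 10 the systematic sample takes every 10th item after the random start.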
Software
Historical (mainframe, batch)
SAS, SPSS,…
Specialized (workstations, stand-alone)
SAS, SPSS, MINITAB, S-PLUS (R*), BMDP,…
Integrated (standard desktops)
DataDesk, JMP, SYSTAT, MINITAB
Excel, add-ons (e.g., PHStat - from Prentice-Hall)
MATLAB (Octave*)
*Open Source
Introduction to Quality Improvement
Quality = fitness for use
Meeting user/customer needs,
expectations, perceptions and experience
Quality of…
Design - intentional differences, grades
Conformance - meets/exceeds design
Performance - long-term consistency
History of Quality Improvement
Middle Ages
> Industrial Revolution
> Information Age
Smith, Taylor, Ford, Shewhart, Deming
Read text!
Themes of Quality Improvement
The primary focus is on process improvement
Shewhart-Deming cycle: Plan, Do, Study, Act
Most of the variation in a process is systemic and
not due to the individual
Teamwork is an integral part of a quality-management organization
Customer satisfaction - primary organizational goal
Organizational transformation needs to occur to
implement quality management
Fear must be removed from organizations
Higher quality costs less, not more, but it requires
an investment in training
Tables and Charts (Ch 2)
Process Flow Diagrams
Cause-and-Effect Diagrams
Time-Order Plots
Numerical Data
Concentration Diagrams
Categorical Data
Bivariate Categorical Data
Graphical Excellence
Process Flow Diagrams
Cause-and-Effect Diagrams
Also known as an Ishikawa or a
“fishbone” Diagram
Cause categories (branches of the diagram):
Procedures or methods
People or personnel
Environment
Materials or supplies
Machinery or equipment
Effect (head of the diagram)
Time-Order Plots
Tables and Charts for Numerical Data
Stem-and-Leaf Displays
Poor man’s histogram
Frequency Distribution
“Binning” by range
Histogram
Polygon
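A short Python sketch of the first two displays, under stated assumptions: the data values are made up, stems are taken as the tens digit and leaves as the units digit, and the bins are equal-width ranges starting at a chosen lower limit. The function names are illustrative only.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group non-negative integers by stem (tens) with sorted leaves (units)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return dict(stems)

def frequency_distribution(values, low, width, bins):
    """Count observations in `bins` equal-width ranges starting at `low`."""
    counts = [0] * bins
    for v in values:
        idx = int((v - low) // width)
        if 0 <= idx < bins:       # values outside the ranges are ignored
            counts[idx] += 1
    return counts

# Made-up data for illustration
data = [12, 15, 21, 23, 23, 28, 31, 34, 35, 39, 42, 47]
stems = stem_and_leaf(data)
freq = frequency_distribution(data, low=10, width=10, bins=4)

for stem, leaves in stems.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

The bin counts in `freq` are the bar heights a histogram of the same data would show.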
Concentration Diagrams
Data points overlaid on schematic or
picture of object or process of interest
By location
Displayed as individual symbols or
tallies
Tables and Charts for Categorical Data
Bar Chart
Pie Chart
Almost always in percentages
Pareto Diagram
Sorted (usually descending)
Overlaid with cumulative line (polygon) plot
Separate scales
Usually in percentages
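The numbers behind a Pareto diagram can be computed as follows. This is a minimal sketch: the defect categories and counts are invented for the example, and `pareto_table` is a hypothetical helper name.

```python
def pareto_table(counts):
    """Sort categories by descending count; add percent and cumulative percent."""
    total = sum(counts.values())
    rows = []
    cumulative = 0.0
    for category, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        pct = 100.0 * count / total
        cumulative += pct
        rows.append((category, count, pct, cumulative))
    return rows

# Made-up defect tallies for illustration
defects = {"scratch": 12, "dent": 30, "misalignment": 6, "other": 2}
table = pareto_table(defects)

for category, count, pct, cumulative in table:
    print(f"{category:<14}{count:>4}{pct:>8.1f}{cumulative:>8.1f}")
```

The percent column gives the bar heights and the cumulative column gives the overlaid line, which is why the diagram normally carries two scales.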
Examples
Tables and Charts for Bivariate Categorical Data
Contingency Table
Cross-classification
Joint responses
Percentages by row, column, total
        A   B   C   Total
1       5   3   2   10
2       2   3   4    9
3       0   2   3    5
Total   7   8   9   24
Side-by-Side (Cluster) Bar Chart
May prefer stacked bars with percentage data
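The row, column and total percentages for the small contingency table on this slide can be worked out as below. The sketch assumes that 1, 2, 3 label the rows and A, B, C label the columns; the function name is illustrative.

```python
# Counts from the slide's example table (rows 1-3, columns A-C assumed)
table = {
    "1": {"A": 5, "B": 3, "C": 2},
    "2": {"A": 2, "B": 3, "C": 4},
    "3": {"A": 0, "B": 2, "C": 3},
}

row_totals = {r: sum(cols.values()) for r, cols in table.items()}
col_totals = {c: sum(table[r][c] for r in table) for c in ("A", "B", "C")}
grand_total = sum(row_totals.values())

def percentages(basis):
    """Cell percentages relative to the 'row', 'column' or 'total' count."""
    result = {}
    for r, cols in table.items():
        result[r] = {}
        for c, count in cols.items():
            if basis == "row":
                denom = row_totals[r]
            elif basis == "column":
                denom = col_totals[c]
            else:
                denom = grand_total
            result[r][c] = 100.0 * count / denom
    return result

by_row = percentages("row")
by_column = percentages("column")
by_total = percentages("total")
```

For instance, cell (1, A) holds 5 of the 10 responses in row 1, so its row percentage is 50%.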
Graphical Excellence
Tufte, “The Visual Display of Quantitative Information”
Graphical excellence… gives the viewer the largest number of ideas, in the shortest time, with the least ink - clearly, precisely, efficiently, and truthfully
Data-ink Ratio
(data-ink)/(total ink used in graphic)
Chartjunk
Non-data or redundant “ink”
Lie Factor
(size of effect in graph)/(size of effect in data)
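A worked example of the lie factor, as a sketch only. It assumes "size of effect" means the relative change between two values, and the numbers (a quantity rising from 10 to 15, drawn as bars 2 cm and 6 cm tall) are made up.

```python
def relative_change(first, second):
    """Size of effect taken as (second - first) / first (an assumption)."""
    return (second - first) / first

def lie_factor(graph_first, graph_second, data_first, data_second):
    """(size of effect in graph) / (size of effect in data)."""
    return (relative_change(graph_first, graph_second)
            / relative_change(data_first, data_second))

# Data rise by 50% (10 -> 15) but the bars grow by 200% (2 cm -> 6 cm)
factor = lie_factor(2.0, 6.0, 10.0, 15.0)
print(factor)  # 4.0
```

A faithful graphic has a lie factor close to 1; here the graphic overstates the effect fourfold.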
Describing and Summarizing Data - Descriptive Statistics (Ch 3)
Measures of…
Central Tendency
Variation
Shape
Skewness
Kurtosis
Box-and-Whisker Plots
Measures of Central Tendency
Mean (arithmetic)
Average value: $\frac{1}{N}\sum_{i=1}^{N} X_i$
Median
Middle value - 50th percentile (2nd quartile)
Mode
Most popular (peak) value(s) - can be multi-modal
Midrange
(Max+Min)/2
Midhinge
(Q3+Q1)/2 - average of 1st and 3rd quartiles
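The five measures can be checked on a small data set with Python's standard `statistics` module. The seven values are made up for the example. The quartiles come from `statistics.quantiles`, whose default "exclusive" method places the k-th quartile at position k(N+1)/4 in the ranked data; whether the text handles fractional positions the same way is not assumed here.

```python
import statistics

# Made-up sample of seven observations
data = [3, 5, 5, 6, 8, 9, 13]

mean = statistics.mean(data)            # sum of values divided by N
median = statistics.median(data)        # middle value of the ranked data
modes = statistics.multimode(data)      # most frequent value(s); may be several
midrange = (max(data) + min(data)) / 2  # (Max+Min)/2

# Quartiles at positions k(N+1)/4 in the ranked data
q1, q2, q3 = statistics.quantiles(data, n=4)
midhinge = (q3 + q1) / 2                # (Q3+Q1)/2

print(mean, median, modes, midrange, midhinge)
```

Here the mean (7) lies above the median (6), which already hints at a tail toward larger values.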
Measures of Variation
Range (max-min)
Inter-Quartile Range (Q3-Q1)
Variance
Sum of squares (SS) of the deviation from mean divided by
the degrees of freedom (df) - see pp 113-5
df = N, for the whole population
df = n-1, for a sample
2nd moment about the mean (dispersion)
(1st moment about the mean is zero!)
Standard Deviation
Square root of variance (same units as variable)
Sample ($s^2$, $s$, $n$) vs Population ($\sigma^2$, $\sigma$, $N$)
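The sample and population forms differ only in the degrees of freedom used as the divisor. A minimal sketch with made-up data; `variance` here is a hand-written illustration, not a library function.

```python
import math

def variance(values, population=False):
    """Sum of squared deviations from the mean, divided by df."""
    count = len(values)
    mean = sum(values) / count
    ss = sum((x - mean) ** 2 for x in values)   # sum of squares (SS)
    df = count if population else count - 1     # N for population, n-1 for sample
    return ss / df

# Made-up observations
data = [3, 5, 5, 6, 8, 9, 13]
mean = sum(data) / len(data)

data_range = max(data) - min(data)
first_moment = sum(x - mean for x in data) / len(data)  # always zero about the mean
sample_var = variance(data)                     # SS / (n - 1)
population_var = variance(data, population=True)  # SS / N
sample_sd = math.sqrt(sample_var)               # same units as the data
```

For these values SS = 66, so the sample variance is 66/6 = 11 and the population variance is 66/7.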
Quantiles
Equipartitions of ranked array of observations
Percentiles - 100
Deciles - 10
Quartiles - 4 (25%, 50%, 75%)
Median - 2
Pn = n(N+1)/100 -th ordered observation
Dn = n(N+1)/10
Qn = n(N+1)/4
Median = (N+1)/2 = Q2 = D5 = P50
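The position formulas above can be tried out directly. The slide does not say what to do when a position falls between two observations, so this sketch assumes linear interpolation between the neighbouring ranked values; the data and function names are made up.

```python
def value_at_rank(ordered, position):
    """Value at a 1-based, possibly fractional, rank (linear interpolation assumed)."""
    size = len(ordered)
    if position <= 1:
        return ordered[0]
    if position >= size:
        return ordered[-1]
    lower = int(position)              # rank just below the position
    fraction = position - lower
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])

def quantile(values, k, parts):
    """k-th of `parts` equipartitions: position k(N+1)/parts."""
    ordered = sorted(values)
    position = k * (len(ordered) + 1) / parts
    return value_at_rank(ordered, position)

# Made-up observations, N = 7
data = [3, 5, 5, 6, 8, 9, 13]

q1 = quantile(data, 1, 4)      # position 2
q2 = quantile(data, 2, 4)      # position 4
q3 = quantile(data, 3, 4)      # position 6
d5 = quantile(data, 5, 10)     # position 4
p50 = quantile(data, 50, 100)  # position 4
```

Q2, D5 and P50 all land on position (N+1)/2 = 4, the median.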
Measures of Shape
Symmetry
Skewness - extended tail in one direction
3rd moment about the mean
Kurtosis
Flatness, peakedness
Leptokurtic - highly peaked, long tails
Mesokurtic - “normal”, triangular, short tails
Platykurtic - broad, even
4th moment about the mean
See p 118.
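Skewness and kurtosis can be computed from the 3rd and 4th moments about the mean. The sketch below uses one common convention, dividing by powers of the 2nd moment to make the measures unit-free; the text's formula on p 118 may differ. The data sets are made up.

```python
def central_moment(values, order):
    """Average of (x - mean)**order over all observations."""
    count = len(values)
    mean = sum(values) / count
    return sum((x - mean) ** order for x in values) / count

def skewness(values):
    """3rd moment about the mean, scaled by the 2nd moment to the power 3/2."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def kurtosis(values):
    """4th moment about the mean, scaled by the square of the 2nd moment."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2

# Made-up data sets
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 2, 2, 10]

skew_sym = skewness(symmetric)        # zero for symmetric data
skew_right = skewness(right_tailed)   # positive: tail extends to the right
kurt_sym = kurtosis(symmetric)
```

Under this convention a normal distribution has kurtosis 3, so the evenly spread `symmetric` data (kurtosis 1.7) fall on the platykurtic side.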
Box-and-Whisker Plots
Graphical representation of five-number summary
Min, Max (full range)
Q1, Q3 (middle 50%)
Median (50th %-ile)
See pp 123-5
Shows symmetry (skewness) of distribution
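The five-number summary behind a box-and-whisker plot, sketched with the standard library. The data are made up, `five_number_summary` is an illustrative name, and the quartiles use the k(N+1)/4 position rule as implemented by the default method of `statistics.quantiles`.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max)."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return (min(values), q1, median, q3, max(values))

# Made-up observations
data = [3, 5, 5, 6, 8, 9, 13]
low, q1, median, q3, high = five_number_summary(data)

# Rough symmetry check: compare the two halves of the box and the two whiskers
box_left, box_right = median - q1, q3 - median
whisker_left, whisker_right = q1 - low, high - q3
right_skewed = box_right > box_left and whisker_right > whisker_left
```

Here both the right half of the box and the right whisker are longer than their left counterparts, which the plot would show as a right-skewed distribution.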
Homework
Ch 1
Appendix 1.2
Problems: 1.25
Ch 2
Excel, Analysis ToolPak, PHStat add-in
Appendix 2.1
Problems: 2.54, 2.55, 2.61
Ch 3
Appendix 3.1
Problems: 3.27, 3.31 (data on CD)
Next Week
Probability and
Discrete Probability Distributions
(Ch 4)