7 - 1 - Wiley
Download
Report
Transcript 7 - 1 - Wiley
Management Science: The Art of
Modeling with Spreadsheets, 2e
Chapter 7: Data Analysis for
Modeling
S.G. Powell
K.R. Baker
© John Wiley and Sons, Inc.
PowerPoint Slides Prepared By:
Alan Olinsky
Bryant University
7-1
7-1
Data Analysis in the
Context of Modeling
Supports the modeling process
Improves accuracy of model
Improves usefulness of conclusions
Modeling is the primary goal.
Data analysis is a means to that goal.
7-2
Topics for Chapter
Finding facts in databases
Editing, searching, sorting, filtering, and
tabulating
Sampling
Estimating parameters
Point estimates and interval estimates
7-3
Finding Facts from Databases
Tables of information
Each row is a record in the database.
Each column is a field for the records.
Excel calls such a table a list.
7-4
Excel Lists
First row contains names for each field
Each successive row contains one record.
Lists may be:
Searched and edited
Sorted
Filtered
Tabulated
7-5
Searching and Editing Lists
First assign a range name to entire list.
Include column titles.
With list selected choose Data – Form.
Examine records one at a time:
Find Prev.
Find Next.
Enter new record with New button.
Delete record with Delete button.
7-6
Database Form
7-7
Criteria Button
Found under Data – Form
Allows for searching of records
Enter data into a field.
Click Find Next.
7-8
Alternate Excel Search
Techniques
Highlight entire database.
Use Edit – Find to search.
Use Find and Replace to edit entries.
In Find and Replace
“?” stands for any single symbol
“*” stands for any sequence of symbols
7-9
Sorting: Data – Sort Command
7 - 10
Filtering
Select database then Data – Filter – AutoFilter.
Will filter lists based on values
Found under arrow at the title of each column
Arrow on title turns blue to remind list is filtered
Can remove filter by:
Select (All) using the list arrow; or
Selecting Show All under Data – Filter
7 - 11
More Filtering
Top 10 option returns records with smallest
or largest value of a numerical record
Custom option allows filtering with
compound criteria
More complicated compound criteria can be
achieved with Data – Filter – Advanced
Filter submenu.
7 - 12
Tabulating
Select Data – Pivot Table.
Creates summary tables
Layout button on
third step of wizard
creates the format
for the table
7 - 13
Analyzing Sample Data
Data is unlikely to cover whole population
Work with sample from population
Statistics are summary measures about sample
Want to construct statistics that represent population
Convenience sampling
Have easy access to information on subset of population
Subset may not be representative
Random sampling
All objects in population have equal chance of appearing
in sample
7 - 14
Descriptive Statistics
Summarizes information in sample
Gives numerical picture of observations
Excel Tools – Data Analysis
Descriptive Statistics table produced based on
data given as input
7 - 15
Inferential Statistics
Use information in sample to make inferences about
population
Systematic Error
If sample not representative of population
Avoid by careful sampling
Sampling Error
Sample is merely subset of population
Mitigated by taking large samples
7 - 16
Estimating Parameters: Point
Estimates
The sample average is calculated as: x x n
n
i 1
i
The sample variance is calculated as:
(xi x )2
s
n 1
i 1
2
n
and its square root is the sample standard deviation:
n
s
2
(x
x
)
i
i 1
n 1
7 - 17
(Optional) Estimating Parameters:
Interval Estimates
We can estimate parameters in two ways, with
point estimates and with interval estimates.
The interval estimate approach produces a range of
values in which we are fairly sure that the
parameter lies, in addition to a single-value point
estimate.
A range of values for a parameter allows us to
perform sensitivity analysis in a systematic fashion,
and it provides input for tornado charts or
sensitivity tables.
7 - 18
Interval Estimates for the Mean
P(L <= m <= U) = 1 – a.
L and U represent the lower and upper limits of the
interval.
1 – a represents the confidence level.
Usually a large percentage like 95 or 99%
m represents the (unknown) true value of the
parameter.
7 - 19
Sampling Theory
Working with a population described by a Normal
probability model
Mean m and standard deviation s.
Take repeated samples of n items from population
Calculate the sample average each time
The sample averages will follow a Normal
distribution with a mean of m and a variance of
s2/n.
7 - 20
Estimates
Standard error: the standard deviation of
some function being used to provide an
estimate.
Use the sample average to estimate the
population mean.
The standard deviation of the sample average
is called the standard error of the mean:
sx s / n
7 - 21
Z-scores
The z-score measures the number of standard
deviations away from the mean.
The z-score corresponding to any particular sample
xm xm
average is:
z
sx
s
n
Tells how many standard errors from the mean
90% of the sample averages will have z-scores
between –1.64 and +1.64.
The chances are 90% that the sample average will fall no
more than 1.64 standard errors from the true mean.
7 - 22
Confidence Intervals for Means
Upper and lower limits on estimate for mean:
x z(s / n )
n>30 recommended unless original population
resembles Normal
z can be computed using NORMSINV(1-a/2)
Replace s by the sample standard deviation s
Provided that sample is larger than n = 30
Excel Descriptive Statistics also will calculate halfwidth of confidence interval
7 - 23
Interval Estimates for a
Proportion
To estimate the sample proportion p, the
interval estimate is:
p(1 p)
pz
n
Sample size should be at least 50 for this
formula to be reliable
7 - 24
Sample Size Determination
Suppose want to estimate mean of sample to
within a range of ±R
n = (zs / R)2
Assumes:
Sampling from Normal distribution
Known variance – can begin with small sample
to estimate standard deviation
7 - 25
Sample Size Determination for
Proportions
Suppose want to estimate a proportion to
within a range of ±R
n = z2p(1 – p) / R2
Value maximized at p = 0.5
Conservative value:
n = (z/2)2 / R2
7 - 26
Summary
Data collection and analysis support the modeling task
where appropriate.
When early sensitivity testing indicates that certain
parameters must be estimated precisely, we turn to data
analysis for locating relevant information and for estimating
model parameters.
The process of finding facts in data is aided by a facility
with Excel and in particular with its database capabilities.
Excel provides an array of commands for searching, sorting,
filtering, and tabulating data.
Excel’s Data Analysis tool for calculating descriptive
statistics enables rapid construction of point estimates and
interval estimates from raw data.
7 - 27
Copyright 2008 John Wiley & Sons, Inc.
All rights reserved. Reproduction or translation
of this work beyond that permitted in section 117 of the
1976 United States Copyright Act without express
permission of the copyright owner is unlawful. Request
for further information should be addressed to the
Permissions Department, John Wiley & Sons, Inc. The
purchaser may make back-up copies for his/her own use
only and not for distribution or resale. The Publisher
assumes no responsibility for errors, omissions, or
damages caused by the use of these programs or from
the use of the information herein.
7 - 28