Chapter 2 - Portal UniMAP
Chapter 2
STATISTICAL CONCEPTS AND LANGUAGE
2.1 THE DIFFERENCE BETWEEN THE POPULATION AND A SAMPLE
2.2 THE DIFFERENCE BETWEEN THE PARAMETER AND A STATISTIC
2.3 MEASUREMENT LEVELS
2.4 SAMPLING METHODS
(SIMPLE RANDOM SAMPLING, STRATIFIED RANDOM SAMPLING,
CLUSTER SAMPLING, SYSTEMATIC SAMPLING, AND
CONVENIENCE SAMPLING)
2.0 Statistical Concepts and Language
Data Set:
Measurements of items
e.g., Yearly sales volume for your 23 salespeople
e.g., Cost and number produced, daily, for the past month
Elementary Units:
The items being measured
e.g., Salespeople, Days, Companies, Catalogs, …
A Variable:
The type of measurement being done
e.g., Sales volume, Cost, Productivity, Number of defects, …
How many variables?
Univariate data set: One variable measured for each elementary unit
e.g., Sales for the top 30 computer companies.
Can do: Typical summary, diversity, special features
Bivariate data set: Two variables
e.g., Sales and # Employees for top 30 computer firms
Can also do: relationship, prediction
Multivariate data set: Three or more variables
e.g., Sales, # Employees, Inventories, Profits, …
Can also do: predict one from all other variables
2.1 The Difference Between the Population and a Sample
Population — the whole
a collection of all persons, objects, or items under study
Census — gathering data from the entire population
Sample — gathering data on a subset of the population
Use information about the sample to infer about the population
Population and Census Data
Identifier  Color  MPG
RD1         Red    12
RD2         Red    10
RD3         Red    13
RD4         Red    10
RD5         Red    13
BL1         Blue   27
BL2         Blue   24
GR1         Green  35
GR2         Green  35
GY1         Gray   15
GY2         Gray   18
GY3         Gray   17
Sample and Sample Data
Identifier  Color  MPG
RD2         Red    10
RD5         Red    13
GR1         Green  35
GY2         Gray   18
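The census/sample distinction can be made concrete with the two MPG tables above. The short Python sketch below computes the mean MPG from the full census (all 12 cars) and from the four sampled cars. The variable names are illustrative and do not come from the slides.

```python
from statistics import mean

# MPG values from the "Population and Census Data" table (all 12 cars = a census)
population_mpg = {
    "RD1": 12, "RD2": 10, "RD3": 13, "RD4": 10, "RD5": 13,
    "BL1": 27, "BL2": 24,
    "GR1": 35, "GR2": 35,
    "GY1": 15, "GY2": 18, "GY3": 17,
}

# The four cars listed in the "Sample and Sample Data" table
sample_ids = ["RD2", "RD5", "GR1", "GY2"]
sample_mpg = [population_mpg[i] for i in sample_ids]

census_mean = mean(population_mpg.values())  # uses every unit in the population
sample_mean = mean(sample_mpg)               # uses only the sampled units

print(f"Census (population) mean MPG: {census_mean:.2f}")  # 19.08
print(f"Sample mean MPG:              {sample_mean:.2f}")  # 19.00
```

The sample mean (19.00) lands close to the census value (19.08) here. A different sample of four cars could land much farther away.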
2.2 The Difference Between the Parameter and a Statistic
Parameter - descriptive measure of the population
Usually represented by Greek letters
μ denotes population mean
σ² denotes population variance
σ denotes population standard deviation
Statistic - descriptive measure of a sample
Usually represented by Roman letters
x̄ denotes sample mean
s² denotes sample variance
s denotes sample standard deviation
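For reference, the usual defining formulas for these symbols are given below. They are standard textbook definitions and were not shown on the slide. The population variance divides by N. The sample variance conventionally divides by n - 1.

```latex
\mu = \frac{\sum_{i=1}^{N} x_i}{N}, \qquad
\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}, \qquad
\sigma = \sqrt{\sigma^2}

\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}, \qquad
s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}, \qquad
s = \sqrt{s^2}
```

Applied to the car data, the 12 census values give σ² = 10859/144 ≈ 75.41. The four sampled values (10, 13, 35, 18) give x̄ = 19 and s² = 374/3 ≈ 124.67.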
Process of Inferential Statistics
1. Population (parameter μ)
2. Select a random sample
3. Sample (statistic x̄)
4. Use x̄ to estimate μ
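The four steps of inferential statistics can be sketched with Python's standard-library `statistics` module. The sketch reuses the 12 MPG values from the census table as the population. In that module, `pvariance`/`pstdev` compute the population (parameter) versions and `variance`/`stdev` compute the sample (statistic) versions. The seed and variable names are arbitrary choices for illustration.

```python
import random
from statistics import mean, pstdev, pvariance, stdev, variance

# Step 1: the population (MPG of all 12 cars from the census table)
population = [12, 10, 13, 10, 13, 27, 24, 35, 35, 15, 18, 17]

# Parameters: computed from every unit (variance divides by N)
mu = mean(population)
sigma2 = pvariance(population)
sigma = pstdev(population)

# Step 2: select a simple random sample of n = 4 cars, without replacement
rng = random.Random(2011)  # fixed seed only so the run is repeatable
sample = rng.sample(population, k=4)

# Step 3: statistics computed from the sample (variance divides by n - 1)
x_bar = mean(sample)
s2 = variance(sample)
s = stdev(sample)

# Step 4: use x-bar as the estimate of mu
print(f"mu = {mu:.2f}, sigma^2 = {sigma2:.2f}, sigma = {sigma:.2f}")  # mu = 19.08, sigma^2 = 75.41, sigma = 8.68
print(f"sample = {sample}")
print(f"x-bar = {x_bar:.2f} (estimate of mu), s^2 = {s2:.2f}, s = {s:.2f}")
```

Changing the seed draws a different sample and gives a different x̄. Inferential methods must account for this sampling variability.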
Statistics in Business
Probability is used in statistics:
• To estimate the level of confidence in a confidence interval
• To calculate the p-value in hypothesis testing
2.3 Measurement Levels
Nominal - In nominal measurement the values just "name" the attribute uniquely.
No ordering of the cases is implied.
For example, a person's gender is nominal. It doesn't matter whether you call them boys vs. girls, males vs. females, or XY vs. XX chromosomes.
Another example is religion: Catholic, Protestant, Muslim, etc.
Ordinal - A variable is measurable at the ordinal level if its values can be ranked.
For example, in the Olympics a gold medal reflects superior performance to a silver or bronze medal. You can't say that a gold and a bronze medal average out to a silver medal, though.
Preference scales are typically ordinal: how much do you like this cereal? Like it a lot, somewhat like it, neutral, somewhat dislike it, dislike it a lot.
Interval - In interval measurement the distance between attributes does have meaning.
Numerical data typically fall into this category.
For example, when measuring temperature (in Fahrenheit), the distance from 30 to 40 is the same as the distance from 70 to 80. The interval between values is interpretable.
Ratio - In ratio measurement there is always a meaningful absolute zero.
This means that you can construct a meaningful fraction (or ratio) with a ratio variable.
In applied social research most "count" variables are ratio, for example, the number of clients in the past six months.
Nominal Level Data
Numbers are used to classify or categorize
Example: Employment Classification
1 for Educator
2 for Construction Worker
3 for Manufacturing Worker
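A small Python sketch shows what "classifying and counting" means for nominal data. The codes 1 to 3 from the slide are mapped back to their labels and tallied. The list of responses is hypothetical, invented only for illustration.

```python
from collections import Counter

# Coding scheme from the slide: the numbers only label categories
EMPLOYMENT_CODES = {1: "Educator", 2: "Construction Worker", 3: "Manufacturing Worker"}

# Hypothetical coded responses (illustration only)
responses = [1, 3, 3, 2, 1, 3, 1, 3]

# Classifying and counting are meaningful for nominal data
counts = Counter(EMPLOYMENT_CODES[code] for code in responses)
mode_category = counts.most_common(1)[0][0]

print(counts)
print("Most frequent category:", mode_category)  # Manufacturing Worker

# Averaging the codes is NOT meaningful: 2.125 does not name any category
print("Mean of codes (meaningless):", sum(responses) / len(responses))
```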
Ordinal Level Data
Numbers are used to indicate rank or order
Relative magnitude of numbers is meaningful
Differences between numbers are not comparable
Example: Ranking productivity of employees
Example: Position within an organization
1 for President
2 for Vice President
3 for Plant Manager
4 for Department Supervisor
5 for Employee
Ordinal Data
Faculty and staff should receive preferential treatment for parking space.
Strongly Agree: 1
Agree: 2
Neutral: 3
Disagree: 4
Strongly Disagree: 5
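For a coded preference scale like this one, order-based summaries such as the median are meaningful, while the mean of the codes is not. The sketch below uses the slide's coding with a hypothetical set of answers invented for illustration. It relies on Python's `statistics.median_low`, which always returns an observed value.

```python
from collections import Counter
from statistics import median_low

# Coding scheme from the slide
SCALE = {1: "Strongly Agree", 2: "Agree", 3: "Neutral", 4: "Disagree", 5: "Strongly Disagree"}

# Hypothetical answers to the parking-space statement (illustration only)
answers = [2, 1, 4, 2, 3, 5, 2, 4, 1, 2]

# Order-based summaries are meaningful for ordinal data. median_low always
# returns one of the observed codes, so the result maps back to a label.
middle = median_low(answers)
print("Median response:", SCALE[middle])  # Median response: Agree

# Counting still works (every nominal operation applies to ordinal data too)
print("Most common:", Counter(SCALE[a] for a in answers).most_common(1))

# A mean of the codes would treat "Agree" -> "Neutral" as the same distance as
# "Neutral" -> "Disagree", which ordinal data do not guarantee.
```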
Interval Level Data
Interval level data - Distances between consecutive integers are equal
Relative magnitude of numbers is meaningful
Differences between numbers are comparable
Location of origin, zero, is arbitrary
Vertical intercept of the unit-of-measure transform function is not zero (e.g., °C = (°F - 32) × 5/9)
Example: Fahrenheit Temperature
Example: Monetary Utility
Ratio Level Data
Highest level of measurement
Relative magnitude of numbers is meaningful
Differences between numbers are comparable
Location of origin, zero, is absolute (natural)
Vertical intercept of the unit-of-measure transform function is zero
Examples: Height, Weight, and Volume
Example: Monetary variables, such as profit and loss, revenues, expenses, and financial ratios (such as P/E ratio, inventory turnover, and quick ratio)
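The "vertical intercept" criterion can be checked numerically. The sketch uses the standard Fahrenheit-to-Celsius conversion, which has a nonzero intercept, and an approximate kilogram-to-pound factor of 2.20462, which has a zero intercept. The function names are illustrative.

```python
def fahrenheit_to_celsius(f: float) -> float:
    # Interval-scale transform: the intercept (-32 * 5/9) is not zero
    return (f - 32) * 5 / 9


def kilograms_to_pounds(kg: float) -> float:
    # Ratio-scale transform: a pure rescaling, so the intercept is zero
    return kg * 2.20462  # approximate conversion factor


# Interval data: equal differences stay equal after changing units...
d1 = fahrenheit_to_celsius(40) - fahrenheit_to_celsius(30)
d2 = fahrenheit_to_celsius(80) - fahrenheit_to_celsius(70)
print(f"30-40 F and 70-80 F in Celsius: {d1:.3f} vs {d2:.3f}")  # 5.556 vs 5.556

# ...but ratios are not preserved: 80 F is not "twice as hot" as 40 F
print("Ratio in F:", 80 / 40)                                                    # 2.0
print(f"Ratio in C: {fahrenheit_to_celsius(80) / fahrenheit_to_celsius(40):.2f}")  # 6.00

# Ratio data: the ratio survives the change of units
print("Ratio in kg:", 80 / 40)                                                 # 2.0
print(f"Ratio in lb: {kilograms_to_pounds(80) / kilograms_to_pounds(40):.2f}")  # 2.00
```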
Parametric and Nonparametric Statistics
Parametric statistics require that the data be interval or ratio
Nonparametric statistics are used if the data are nominal or ordinal
Nonparametric statistics can also be used to analyze interval or ratio data
Data Level, Operations, and Statistical Methods
Data Level   Meaningful Operations
Nominal      Classifying and counting
Ordinal      All of the above plus ranking
Interval     All of the above plus addition, subtraction, multiplication, and division (including means, standard deviations, etc.)
Ratio        All of the above
Copyright 2011 John Wiley & Sons, Inc.
2.4 Sampling Methods
Reasons for Sampling
Sampling - A means for gathering information about a population without conducting a census
Information is gathered from the sample, and inferences are made about the population
Sampling has advantages over a census:
Sampling can save money.
Sampling can save time.
Random Versus Nonrandom Sampling
Nonrandom sampling - Not every unit of the population has the same probability of being included in the sample
Random sampling - Every unit of the population has the same probability of being included in the sample
Random Sampling Techniques
Simple Random Sample - basis for other random sampling techniques
Each unit is numbered from 1 to N (the size of the population)
A random number generator can be used to select the n items that form the sample
Random Sampling Techniques
Stratified Random Sample
The population is broken down into strata with like characteristics (e.g., men and women, or old, young, and middle-aged people)
Efficient when differences between strata exist
Proportionate: the percentage of the sample drawn from each stratum equals that stratum's percentage of the whole population
Systematic Random Sample
Define k = N/n. Choose one random unit from the first k units, and then select every kth unit from there.
Cluster (or Area) Sampling
The population is in pre-determined clusters (students in classes, apples on trees, etc.)
A random sample of clusters is chosen, and all or some of the units within the chosen clusters are used as the sample
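The stratified and cluster designs above can be sketched in Python with `random.Random.sample`. The helpers `proportionate_stratified_sample` and `cluster_sample` are hypothetical names, and so is the example population of 60 women and 40 men in 8 classes of 25 students. Allocation uses simple rounding, so uneven stratum shares could make the total differ slightly from n. The cluster version keeps all units in each chosen cluster (one-stage clustering).

```python
import random
from collections import Counter, defaultdict


def proportionate_stratified_sample(units, stratum_of, n, rng):
    """Draw about n units so each stratum's share of the sample matches its
    share of the population (hypothetical helper)."""
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for members in strata.values():
        n_h = round(n * len(members) / len(units))  # proportionate allocation
        sample.extend(rng.sample(members, n_h))     # simple random sample within the stratum
    return sample


def cluster_sample(clusters, m, rng):
    """Randomly choose m whole clusters and keep every unit inside them
    (hypothetical helper)."""
    chosen = rng.sample(sorted(clusters), m)
    return [unit for name in chosen for unit in clusters[name]]


rng = random.Random(7)  # fixed seed only for repeatability

# Hypothetical population: 60 women and 40 men
people = [("woman", i) for i in range(60)] + [("man", i) for i in range(40)]
strat = proportionate_stratified_sample(people, lambda p: p[0], 10, rng)
print(Counter(group for group, _ in strat))  # 6 women and 4 men

# Hypothetical clusters: 8 classes of 25 students each
classes = {f"class{c}": [f"class{c}-student{s}" for s in range(25)] for c in range(8)}
clus = cluster_sample(classes, 2, rng)
print(len(clus))  # 50 students drawn from 2 randomly chosen classes
```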
Simple Random Sample: Population Members
Population size of N = 30
Desired sample size of n = 6
01 Alaska Airlines
02 Alcoa
03 Ashland
04 Bank of America
05 BellSouth
06 Chevron
07 Citigroup
08 Clorox
09 Delta Air Lines
10 Disney
11 DuPont
12 Exxon Mobil
13 General Dynamics
14 General Electric
15 General Mills
16 Halliburton
17 IBM
18 Kellogg
19 KMart
20 Lowe’s
21 Lucent
22 Mattel
23 Mead
24 Microsoft
25 Occidental Petroleum
26 JCPenney
27 Procter & Gamble
28 Ryder
29 Sears
30 Time Warner
Simple Random Sampling: Random Number Table
Using a random number table, select 6 values from 1 to 30 (ignoring repeats). The companies with those numbers form the simple random sample.
[Figure: random number table and the resulting simple random sample members]
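A software random number generator can stand in for the random number table. The sketch below uses Python's `random.sample`, which draws distinct values, so "ignore repeats" is handled automatically. The company list is copied from the population slide. The selected companies change from run to run.

```python
import random

# Population members 01..30 from the slide
companies = [
    "Alaska Airlines", "Alcoa", "Ashland", "Bank of America", "BellSouth",
    "Chevron", "Citigroup", "Clorox", "Delta Air Lines", "Disney",
    "DuPont", "Exxon Mobil", "General Dynamics", "General Electric", "General Mills",
    "Halliburton", "IBM", "Kellogg", "KMart", "Lowe’s",
    "Lucent", "Mattel", "Mead", "Microsoft", "Occidental Petroleum",
    "JCPenney", "Procter & Gamble", "Ryder", "Sears", "Time Warner",
]
N = len(companies)  # population size, 30
n = 6               # desired sample size

rng = random.Random()                     # software substitute for the random number table
numbers = rng.sample(range(1, N + 1), n)  # 6 distinct labels from 1..30, no repeats
sample = [(k, companies[k - 1]) for k in sorted(numbers)]

for k, name in sample:
    print(f"{k:02d} {name}")
```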
Systematic Sampling: Example
Purchase orders for the previous fiscal year are serialized 1 to 10,000 (N = 10,000).
A sample of fifty (n = 50) purchase orders is needed for an audit.
k = 10,000/50 = 200
The first sample element is randomly selected from the first 200 purchase orders. Assume the 45th purchase order was selected.
Subsequent sample elements: 45, 245, 445, 645, ...
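The purchase-order example can be reproduced with a few lines of Python. The helper `systematic_sample` is a hypothetical name. As in the example, the sketch assumes N is an exact multiple of n, so k = N/n is a whole number. When no start is given, it picks a random one from the first k units.

```python
import random


def systematic_sample(N, n, start=None, rng=random):
    """Return the serial numbers (1..N) chosen by 1-in-k systematic sampling.

    Assumes N is a multiple of n (hypothetical helper).
    """
    k = N // n
    if start is None:
        start = rng.randint(1, k)  # random start among the first k units
    return list(range(start, N + 1, k))


orders = systematic_sample(10_000, 50, start=45)
print(orders[:4], "...", orders[-1])  # [45, 245, 445, 645] ... 9845
print(len(orders))                    # 50
```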
Convenience (Nonrandom) Sampling
Nonrandom sampling - sampling techniques used to select elements from the population by any mechanism that does not involve a random selection process
These techniques are not desirable for making statistical inferences
Example - choosing members of this class as if they accurately represented all students at our university, or selecting the first five people who walk into a store and asking them about their shopping preferences