Sorting Data
Review: Stages in Research Process
Formulate Problem
Determine Research Design
Determine Data Collection Method
Design Data Collection Forms
Design Sample & Collect Data
Analyze and Interpret Data
Prepare Written/Oral Report
Data Analysis: Two Key
Considerations
(1) Is the variable to be analyzed by itself
(univariate analysis) or in relationship to
other variables (multivariate analysis)?
(2) What level of measurement was used?
If you can answer these two questions, data
analysis is easy...
Level of Measurement
CATEGORICAL MEASURES: A
commonly used expression for nominal
and ordinal measures.
CONTINUOUS MEASURES: A
commonly used expression for interval
and ratio measures.
Basic Univariate Statistics:
Categorical Measures
FREQUENCY ANALYSIS: A count of the
number of cases that fall into each of the
response categories.
Use of Percentages
Percentages are very useful for
interpreting the results of categorical
analyses and should be included
whenever possible.
Unless your sample size is VERY large,
however, report percentages as whole
numbers (i.e., no decimals)
Frequency Analysis
Researchers almost always work with
“valid” percentages which are simply
percentages after taking out cases with
missing data on the variable being
analyzed.
Note: In the example, there were no missing
cases. As a result, the “Percent” column entries
were identical to the “Valid Percent” column
entries.
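A minimal sketch of a frequency analysis with "Percent" and "Valid Percent" columns. The response data and the function name frequency_table are hypothetical, and missing answers are assumed to be coded as None.

```python
from collections import Counter

def frequency_table(responses):
    """Return {category: (count, percent, valid_percent)}.

    None marks a missing response. "Percent" uses all cases as the base;
    "valid percent" uses only cases with an answer on this variable.
    Percentages are rounded to whole numbers.
    """
    total = len(responses)
    valid = [r for r in responses if r is not None]
    counts = Counter(valid)
    table = {}
    for category, count in counts.items():
        percent = 100 * count / total
        valid_percent = 100 * count / len(valid)
        table[category] = (count, round(percent), round(valid_percent))
    return table

# Hypothetical answers to a yes/no question, with two missing cases.
responses = ["yes", "no", "no", None, "yes", "no", "no", "no", None, "no"]
for category, row in frequency_table(responses).items():
    print(category, row)
```

With two of ten cases missing, the two percentage columns differ: "yes" is 20% of all cases but 25% of valid cases.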
Uses of Frequency Analysis
Identify blunders and cases with
excessive item nonresponse
Identify outliers
Univariate categorical analysis
Determine empirical distribution of a
variable
Confidence Interval
A projection of the range within which a
population parameter will lie at a given
level of confidence based on a statistic
obtained from a probabilistic sample.
This is why you need to draw a
probability sample!
Confidence Intervals for Proportions
CONFIDENCE INTERVAL: $p \pm z\sqrt{p(1-p)/n}$
where z = z score associated with the desired level of
confidence; p = the proportion obtained from the sample;
and n = the number of valid cases overall on which the
proportion was based.
Confidence Intervals for Proportions
Therefore, we can be 95% confident that the proportion
of people in the population who would respond that
they had financed their most recent car purchase is
between .21 and .39, inclusive.
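A minimal sketch of the interval calculation, assuming the usual large-sample formula p ± z·sqrt(p(1−p)/n). The inputs behind the car-financing result are not preserved in this transcript; p = 0.30 and n = 100 are assumed values that happen to reproduce the reported interval, and z = 1.96 is the standard z score for 95% confidence.

```python
import math

def proportion_ci(p, n, z=1.96):
    """Confidence interval for a proportion: p +/- z * sqrt(p(1-p)/n)."""
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

# Assumed inputs (not stated in the slides): p = 0.30, n = 100.
low, high = proportion_ci(0.30, 100)
print(f"{low:.2f} to {high:.2f}")  # 0.21 to 0.39
```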
CAUTION in Interpreting
Confidence Intervals
The confidence interval only takes
sampling error into account.
It DOES NOT account for other common
types of error (e.g., response error,
nonresponse error).
The goal is to reduce TOTAL error, not
just one type of error.
Basic Univariate Statistics:
Continuous Measures
DESCRIPTIVE STATISTICS: Statistics
that describe the distribution of
responses on a variable. The most
commonly used descriptive statistics are
the mean and standard deviation.
Converting Continuous Measures to
Categorical Measures
Sometimes it is useful to convert
continuous measures to categorical
measures.
This is legitimate, because measures at
higher levels of measurement (in this case,
continuous measures) have all the
properties of measures at lower levels of
measurement (categorical measures).
Why do this? Ease of interpretation
for managers
Converting Continuous Measures to
Categorical Measures
TWO-BOX TECHNIQUE: A technique for
converting an interval-level rating scale
into a categorical measure usually used
for presentation purposes. The
percentage of respondents choosing one
of the top two positions on a rating scale
is reported.
Converting Continuous Measures to
Categorical Measures
Please rate the quality of service provided by Better Smiles Dental
Office on the following scales:
                      very poor   poor   neutral   good   very good
Dental technicians       (2)      (6)     (36)     (32)     (24)
Receptionist            (10)     (16)     (18)     (36)     (20)
Dentist                 (17)     (17)     (35)     (21)     (10)
Frequency count of respondents selecting each response
category shown in parentheses
Converting Continuous Measures to
Categorical Measures
                      two-box   mean (s.d.)
Dental technicians      56%     3.70 (0.97)
Receptionist            56%     3.40 (1.25)
Dentist                 31%     2.90 (1.21)
(n=100)
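The two-box percentages and means can be recomputed from the frequency counts in the Better Smiles example. This is a sketch that assumes the scale is coded 1 (very poor) through 5 (very good); the function names are illustrative.

```python
SCALE = [1, 2, 3, 4, 5]  # assumed coding: 1 = very poor ... 5 = very good

# Frequency counts from the Better Smiles example (n = 100 per row).
counts = {
    "Dental technicians": [2, 6, 36, 32, 24],
    "Receptionist": [10, 16, 18, 36, 20],
    "Dentist": [17, 17, 35, 21, 10],
}

def two_box(freqs):
    """Percentage of respondents choosing one of the top two scale positions."""
    return 100 * sum(freqs[-2:]) / sum(freqs)

def mean_rating(freqs):
    """Mean scale value, weighting each scale point by its frequency."""
    return sum(x * f for x, f in zip(SCALE, freqs)) / sum(freqs)

for item, freqs in counts.items():
    print(f"{item}: two-box {two_box(freqs):.0f}%, mean {mean_rating(freqs):.2f}")
```

The dental technicians and the receptionist share a two-box score of 56% but have different means (3.70 versus 3.40), which shows what the categorical summary hides.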
Confidence Intervals for Means
CONFIDENCE INTERVAL: $\bar{x} \pm z \, s/\sqrt{n}$
where $\bar{x}$ = the sample mean; z = z score associated with the
desired level of confidence; s = the sample standard deviation; and
n = the total number of cases used to calculate the mean.
Confidence Intervals for Means
EXAMPLE: A sample of 100 car owners
revealed that the mean number of family
members was 4.0, with a sample standard
deviation of 1.9 family members. Assuming
that the 100 respondents had been secured
using a probability sampling plan, what is the
95% confidence interval for the mean number
of family members in the population?
Confidence Intervals for Means
Therefore, we can be 95% confident that the mean
number of family members in the population lies
somewhere between 3.6 and 4.4, inclusive.
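The car-owner example can be checked with a short sketch, assuming the large-sample formula mean ± z·s/sqrt(n) and z = 1.96 for 95% confidence. The function name mean_ci is illustrative.

```python
import math

def mean_ci(mean, s, n, z=1.96):
    """Confidence interval for a mean: mean +/- z * s / sqrt(n)."""
    half_width = z * s / math.sqrt(n)
    return mean - half_width, mean + half_width

# Values from the example: mean 4.0, s = 1.9, n = 100.
low, high = mean_ci(4.0, 1.9, 100)
print(f"{low:.1f} to {high:.1f}")  # 3.6 to 4.4
```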
Hypothesis Testing
THE ISSUE: How can we tell if a
particular result in the sample
represents the true situation in the
population or simply occurred by
chance?
Hypotheses
Unproven propositions about some
phenomenon of interest.
Hypothesis Testing
Null Hypothesis (H0): The hypothesis that a
proposed result is not true for the population.
Researchers typically attempt to reject the null
hypothesis in favor of some alternative
hypothesis.
Alternative Hypothesis (HA): The hypothesis
that a proposed result is true for the population.
Typical Hypothesis Testing
Procedure
(1) Specify Null and Alternative Hypotheses after
Analyzing the Research Problem
(2) Choose an Appropriate Statistical Test Considering the
Research Design and after Determining the Sampling
Distribution That Applies Given the Chosen Test Statistic
(3) Specify the Significance Level (Alpha) for the
Problem Being Investigated
(4) Collect the Data and Compute the Value of the Test Statistic
Appropriate for the Sampling Distribution
(5) Determine the Probability of the Test Statistic under the Null
Hypothesis Using the Sampling Distribution Specified in Step 2
(6) Compare the Obtained Probability with the Specified Significance
Level and Then Reject or Do Not Reject the Null Hypothesis on
the Basis of the Comparison
Significance Level (α)
The acceptable probability of committing a
Type I error, selected by the researcher and
usually set at 0.05. A Type I error is
rejecting the null hypothesis when it is
actually true for the population.
p-value
The probability of obtaining a result at least
as extreme as the one observed if in fact the
null hypothesis were true in the population.
A result is regarded as statistically
significant if the p-value is less than the
chosen significance level of the test.
Common Misinterpretations of What
“Statistically Significant” Means
Viewing the α or p levels as if they are somehow related
to the probability that the research (alternative)
hypothesis is true (e.g., a p-value such as p<.001 is
“highly significant” and therefore more valid than p<.05).
Viewing p-values as if they represent the probability that
the results occurred because of sampling error (e.g.,
p=.05 implies that there is only a .05 probability that the
results were caused by chance).
Assuming that statistical significance is the same thing
as managerial significance.
Testing Hypotheses about
Individual Variables
Chi-square Goodness-of-Fit Test for Frequencies:
A statistical test to determine whether some observed
pattern of frequencies corresponds to an expected
pattern.
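A minimal sketch of the goodness-of-fit statistic, the sum of (observed − expected)² / expected over the categories. As an illustration only, it reuses the dental technician rating counts and assumes an expected pattern of equal frequencies across the five categories; that expectation is a hypothetical choice, not part of the original example. The critical value 9.488 is the chi-square cutoff for 4 degrees of freedom at α = 0.05.

```python
def chi_square_gof(observed, expected):
    """Chi-square goodness-of-fit statistic: sum of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Observed counts: dental technician ratings from the Better Smiles example.
observed = [2, 6, 36, 32, 24]
# Hypothetical expected pattern: ratings spread evenly over 5 categories.
expected = [sum(observed) / len(observed)] * len(observed)

statistic = chi_square_gof(observed, expected)
CRITICAL_05_DF4 = 9.488  # chi-square critical value, df = 5 - 1, alpha = 0.05
print(f"chi-square = {statistic:.1f}, reject H0: {statistic > CRITICAL_05_DF4}")
```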
Testing Hypotheses about
Individual Variables
Kolmogorov-Smirnov Test: A statistical test used with
ordinal data to determine whether some observed
pattern of frequencies corresponds to some expected
pattern; also used to determine whether two
independent samples have been drawn from the same
population or from populations with the same
distribution.
Testing Hypotheses about
Individual Variables
Z-test for Comparing Sample Proportion against a
Standard
$z = (p - \pi)/\sigma_p$, with $\sigma_p = \sqrt{\pi(1-\pi)/n}$
where p = proportion from the sample, π = the proportion
standard to be achieved, σp = the standard error of the
proportion, and n = number of respondents in the sample.
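A sketch of this test, assuming the standard error is computed under the null hypothesis as sqrt(π(1−π)/n) and that a two-sided p-value from the standard normal distribution is wanted. The inputs (p = 0.30, π = 0.25, n = 100) are hypothetical.

```python
import math

def proportion_z_test(p, pi, n):
    """Return (z, two-sided p-value) for a sample proportion vs. a standard."""
    se = math.sqrt(pi * (1 - pi) / n)  # standard error under the null
    z = (p - pi) / se
    # Two-sided tail area of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical inputs: sample proportion 0.30, standard 0.25, n = 100.
z, p_value = proportion_z_test(0.30, 0.25, 100)
print(f"z = {z:.2f}, p-value = {p_value:.3f}")
print("reject H0" if p_value < 0.05 else "do not reject H0")
```

Here the p-value exceeds 0.05, so the null hypothesis is not rejected at that significance level.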
Testing Hypotheses about
Individual Variables
t-test for Comparing Sample Mean against a
Standard (Small Sample, n ≤ 30)
$t = (\bar{x} - \mu)/s_{\bar{x}}$, with $s_{\bar{x}} = s/\sqrt{n}$
where $\bar{x}$ = sample mean, μ = the population standard,
$s_{\bar{x}}$ = the standard error of the mean, s = sample standard
deviation, and n = sample size.
Testing Hypotheses about
Individual Variables
z-test for Comparing Sample Mean against a
Standard (Large Sample, n > 30)
$z = (\bar{x} - \mu)/s_{\bar{x}}$, with $s_{\bar{x}} = s/\sqrt{n}$
where $\bar{x}$ = sample mean, μ = the population standard,
$s_{\bar{x}}$ = the standard error of the mean, s = sample standard
deviation, and n = sample size.
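A sketch of the mean-against-standard test statistic, (sample mean − μ) / (s / sqrt(n)). It reuses the car-owner sample (mean 4.0, s = 1.9, n = 100) and a hypothetical standard of μ = 3.5. The p-value shown applies only to the large-sample z-test; for n ≤ 30 the same statistic would be compared against a t distribution with n − 1 degrees of freedom, which is not computed here.

```python
import math

def mean_test_statistic(x_bar, mu, s, n):
    """Test statistic for a sample mean vs. a standard: (x_bar - mu) / (s / sqrt(n))."""
    standard_error = s / math.sqrt(n)
    return (x_bar - mu) / standard_error

def two_sided_normal_p(z):
    """Two-sided p-value from the standard normal distribution (large samples)."""
    return math.erfc(abs(z) / math.sqrt(2))

# Sample values from the car-owner example; mu = 3.5 is a hypothetical standard.
z = mean_test_statistic(4.0, 3.5, 1.9, 100)
p_value = two_sided_normal_p(z)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print("reject H0" if p_value < 0.05 else "do not reject H0")
```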