Chapter 1 PowerPoint slides

Chapter 1
Sampling and Data
What Is (Are?) Statistics?
Statistics (a discipline) is the science of dealing with
data. It consists of tools and methods to collect data,
organize data, and interpret the information or draw
conclusions from data.
Note: The word statistics (plural) sometimes refers to
particular calculations made from data. For
instance, the mean, median, and percentage are
statistics, since these are numbers calculated
from a set of collected sample data.
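As a small illustration of "statistics" in this plural sense, the sketch below computes a mean, a median, and a percentage from one sample using Python's standard-library `statistics` module. The commute times are made-up values used only for illustration.

```python
from statistics import mean, median

# Hypothetical sample data: commute times (minutes) for 8 sampled employees
commute_minutes = [12, 25, 18, 30, 22, 15, 40, 20]

sample_mean = mean(commute_minutes)      # 182 / 8 = 22.75
sample_median = median(commute_minutes)  # average of the two middle values, 20 and 22
# Percentage of the sample with a commute of 25 minutes or more
pct_long = 100 * sum(t >= 25 for t in commute_minutes) / len(commute_minutes)

print(sample_mean, sample_median, pct_long)
```

Each printed number is a statistic: it is calculated from the sample, not from the whole population.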
Basic Terms
• Population: A collection, or set, of individuals or objects
or events whose properties are to be analyzed.
• Sample: A subset of the population.
• Parameter: A numerical value summarizing all the data
of an entire population, for instance, a population mean.
• Statistic: A numerical value summarizing the sample
data, for instance, a sample mean.
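The difference between a parameter and a statistic can be sketched in a few lines of Python. The population here is a hypothetical list of the values 1 to 100; its mean is a parameter and stays fixed, while the mean of each random sample is a statistic that usually changes from one sample to the next.

```python
import random
from statistics import mean

# Hypothetical population: the values 1, 2, ..., 100
population = list(range(1, 101))
parameter = mean(population)  # population mean (a parameter): 50.5

rng = random.Random(42)  # seeded so the run is repeatable
sample_a = rng.sample(population, 10)
sample_b = rng.sample(population, 10)

# Each sample mean is a statistic computed from sample data only
statistic_a = mean(sample_a)
statistic_b = mean(sample_b)
print(parameter, statistic_a, statistic_b)
```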
Two Areas of Statistics
Two areas of statistics:
• Descriptive Statistics: collection, presentation,
and description of sample data.
• Inferential Statistics: making decisions and
drawing conclusions about populations based on sample data.
What is a Variable?
• Variables are characteristics recorded
about each individual or thing.
• Each variable should have a name that
identifies what has been measured.
What is an Observational Unit?
The person or thing on which the variable
is observed or measured, such as a
student in the class, is called the
observational (or experimental) unit, or simply
a case.
What Are Data?
• Data can be numbers, names, or
other labels recorded for each observational
unit.
• Not all data represented by numbers are
numerical data (e.g., coding 1 = male and 2 = female,
where 1 and 2 are only labels indicating
gender).
Data Tables
• The following data table clearly shows the
context of the data presented:
• Notice that this data table tells us the
variables (columns) and observational units
(rows) for these data.
What is Statistics Really About?
Statistics is about variation. Different
observational units may have different
data values for a variable. Statistics helps
us to deal with variation in order to make
sense of data.
Two Kinds of Variables
• Qualitative, or Attribute, or Categorical, Variable:
A variable that places each case into one of several categories, for example,
gender.
Note: Arithmetic operations, such as addition and averaging, are not
meaningful for data resulting from a qualitative variable.
• Quantitative, or Numerical, Variable: A variable that records
measurements or amounts of something and must have units of
measurement, for example, height measured in inches.
Note: Arithmetic operations, such as addition and averaging, are meaningful
for data resulting from a quantitative variable.
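The two notes above can be seen by running them. In the sketch below (hypothetical data for five cases), Python will happily average the gender codes 1 = male and 2 = female, but the result is not a meaningful "average gender"; counting cases per category is the sensible summary for a qualitative variable, while averaging heights is meaningful.

```python
from collections import Counter
from statistics import mean

# Hypothetical data for five cases
heights_in = [70, 64, 66, 72, 63]  # quantitative: height in inches
gender_codes = [1, 2, 2, 1, 2]     # qualitative: 1 = male, 2 = female (labels only)

print(mean(heights_in))       # 67: a meaningful average height, in inches
print(mean(gender_codes))     # 1.6: computable, but meaningless as a gender
print(Counter(gender_codes))  # counts per category: a meaningful summary
```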
Subdividing Variables Further
• Qualitative and quantitative variables may be
further subdivided:
Variable
– Qualitative: Nominal or Ordinal
– Quantitative: Discrete or Continuous
Key Definitions
• Nominal Variable: A qualitative variable that categorizes (or describes, or names) an
element of a population, for example, the color of a car purchased.
• Ordinal Variable: A qualitative variable that incorporates an ordered position, or
ranking, for instance, the variable Age recorded as one of three ordered categories:
young, middle, and old.
• Discrete Variable: A quantitative variable that can assume a countable number of
values. Typically, the values are counts, for example, the number of cars owned. So, a
discrete variable can assume values corresponding to integer values along a number
line.
• Continuous Variable: A quantitative variable whose values are measurements, such as
height or weight. The precision of the recorded values depends on the measuring
scale used. For example, a recorded weight of 120 lb may actually be 120.1 lb,
120.14 lb, or 120.143 lb if a more precise scale is used. Therefore, a continuous
variable can assume any value in an interval along a number line, including every
possible value between any two values.
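The point about measuring precision can be mimicked by rounding. In this sketch, a hypothetical "true" weight of 120.143 lb is recorded by imaginary scales that read to 0, 1, 2, or 3 decimal places. Python floats are themselves binary approximations, so this is only an illustration of the idea, not a model of a real scale.

```python
# Hypothetical true weight in pounds (for illustration only)
true_weight_lb = 120.143

# Each imaginary scale records the weight to a different number of decimals
recorded = {decimals: round(true_weight_lb, decimals) for decimals in (0, 1, 2, 3)}
for decimals, value in recorded.items():
    print(f"scale reading to {decimals} decimal place(s): {value} lb")
```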
Important Reminders!
• In many cases, discrete and continuous variables
can be distinguished by determining whether the
variable is related to a count or a measurement.
• Discrete variables are usually associated with
counting.
• Continuous variables are usually associated with
measurements.
Example
• Example: In a student evaluation of instruction at a large
university, one question asks students to evaluate the statement
“The instructor was generally interested in teaching” on the
following scale:
1 = Disagree Strongly;
2 = Disagree;
3 = Neutral;
4 = Agree;
5 = Agree Strongly.
• Question: Is interest in teaching categorical or quantitative?
Example (cont.)
• Question: Is interest in teaching categorical or
quantitative?
• There is an order to these ratings, but adding or
subtracting two ratings has no meaning.
• We conclude that variables like interest in teaching are
categorical; more specifically, they are ordinal variables.
Just because your variable’s values are numbers, don’t assume that it’s
quantitative.
Data Collection
• The first problem a statistician faces is how to obtain
the data.
• Usually the data are sample data collected from a
portion of the population. It is important to obtain good,
representative sample data.
• Statistical inferences about the population are based
on statistics computed from the collected sample data.
Biased Sampling
Biased Sampling Method: A sampling method that produces data
that systematically differ from the sampled population.
An unbiased sampling method is one that is not biased.
Sampling methods that often result in biased samples:
• Convenience sample: a sample selected from elements of
a population that are easily accessible
• Volunteer sample: a sample collected from those elements
of the population that chose to contribute the needed
information on their own initiative
Process of Data Collection
1. Define the objectives of the survey or experiment
– Example: Estimate the average length of time for anesthesia to
wear off
2. Define the variable and population of interest
– Example: Length of time for anesthesia to wear off after surgery
3. Define the data-collection and data-measuring schemes. This
includes sampling procedures, sample size, and the data-measuring
device (questionnaire, scale, ruler, etc.)
4. Determine the appropriate descriptive or inferential data-analysis
techniques
Methods Used to Collect Data
Data can be collected by performing an experiment, a
survey, or a census:
Experiment: The investigator controls or modifies the
environment and observes the effect on the variable under
study.
Survey: Data are obtained by sampling some of the
population of interest. The investigator does not modify the
environment.
Census: A 100% survey. Every element of the population is
listed. Seldom used: difficult and time-consuming to compile,
and expensive.
Sampling Frame: A list of the elements belonging to the
population from which the sample will be drawn
Note: It is important that the sampling frame be representative
of the population
Sample Design: The process of selecting sample elements
from the sampling frame
Note: There are many different types of sample designs.
Usually they all fit into two categories: judgment
samples and probability samples.
Two types of sample designs
Judgment Samples: Samples that are selected on the
basis of being “typical”
– Items are selected that are representative of the
population. The validity of the results from a
judgment sample reflects the soundness of the
collector’s judgment.
Probability Samples: Samples in which the elements to
be selected are drawn on the basis of probability. Each
element in a population has a certain probability of being
selected as part of the sample.
Probability Sampling
Probability sampling includes random
sampling, systematic sampling, stratified
sampling, proportional sampling, and
cluster sampling.
Random Sampling
Random Sample: A sample selected in such a way that every
element in the population has an equal probability of being chosen.
For a simple random sample, a stronger condition also holds: all
samples of size n have an equal chance of being selected. Random
samples are obtained either by sampling with replacement from a
finite population or by sampling without replacement from an
infinite population.
Notes:
• Inherent in the concept of randomness: the next result
(or occurrence) is not predictable.
• Proper procedure for selecting a random sample: use a
random number generator or a table of random numbers.
Example
Example: An employer is interested in the time it takes
each employee to commute to work each morning. A
random sample of 35 employees will be selected and
their commuting times will be recorded.
1. There are 2712 employees.
2. Each employee is numbered: 0001, 0002, 0003, etc., up to
2712.
3. Using four-digit random numbers, a sample is identified:
1315, 0987, 1125, etc. (Random numbers above 2712, and
repeats, are skipped.)
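The random-number procedure in this example can be sketched in Python, with the standard `random` module standing in for a table of random numbers. The function name `random_number_sample` is hypothetical; it draws four-digit numbers, keeps those that match an employee number from 0001 to 2712, and skips repeats until 35 employees are chosen.

```python
import random
from typing import List, Optional

def random_number_sample(population_size: int, n: int,
                         seed: Optional[int] = None) -> List[str]:
    """Select n distinct IDs from 0001..population_size by drawing four-digit
    random numbers and skipping any that are out of range or already chosen."""
    if not 0 < n <= population_size <= 9999:
        raise ValueError("need 0 < n <= population_size <= 9999")
    rng = random.Random(seed)
    chosen: List[int] = []
    while len(chosen) < n:
        r = rng.randint(0, 9999)  # a four-digit random number, 0000-9999
        if 1 <= r <= population_size and r not in chosen:
            chosen.append(r)
    return [f"{r:04d}" for r in chosen]  # format as employee numbers like 0987

ids = random_number_sample(2712, 35, seed=7)
print(ids)

# The standard library can also draw a simple random sample directly:
direct = random.sample(range(1, 2713), 35)
```

Both approaches give every possible set of 35 employees the same chance of selection; the loop simply spells out what "use a table of random numbers" means.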
Systematic Sampling
Systematic Sample: A sample in which every kth item of the
sampling frame is selected, starting from the first element
which is randomly selected from the first k elements
Note: The systematic technique is easy to execute.
However, it has some inherent dangers when the
sampling frame is repetitive or cyclical in nature. In
these situations the results may not approximate a
simple random sample.
Example
Suppose you want to obtain a systematic sample
of 8 houses from a street of 120 houses.
• First, since 120/8 = 15, choose a random starting
point between 1 and 15. Let's say, 11.
• Then, choose every 15th house after the 11th
house.
The houses selected are
11, 26, 41, 56, 71, 86, 101, and 116.
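A minimal Python sketch of systematic sampling, assuming the sampling interval is k = frame size // sample size (exactly 15 in the house example; when the division is not exact, this sketch simply rounds k down). The function name `systematic_sample` is hypothetical. Passing `start=11` reproduces the houses listed above; omitting it picks a random start among the first k positions.

```python
import random
from typing import List, Optional

def systematic_sample(frame_size: int, n: int,
                      start: Optional[int] = None,
                      rng: Optional[random.Random] = None) -> List[int]:
    """Return the 1-based positions of a systematic sample of size n."""
    k = frame_size // n                       # sampling interval, e.g. 120 // 8 = 15
    if start is None:
        rng = rng or random.Random()
        start = rng.randint(1, k)             # random start among the first k elements
    if not 1 <= start <= k:
        raise ValueError("start must be between 1 and k")
    return [start + i * k for i in range(n)]  # every kth element from the start

print(systematic_sample(120, 8, start=11))    # [11, 26, 41, 56, 71, 86, 101, 116]
```

Because only the starting point is random, a frame with a repeating pattern of period k would give a badly unrepresentative sample, which is the danger noted in the definition.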
Stratified Sampling
Stratified Random Sample: A sample obtained by
stratifying or grouping the sampling frame and then
selecting a fixed number of items from each of the
strata/groups by means of a simple random sampling
technique.
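A stratified random sample with a fixed number of items per stratum can be sketched as follows. The frame of 100 students grouped by class year is hypothetical, as is the function name `stratified_sample`; within each stratum, `random.Random.sample` draws a simple random sample without replacement.

```python
import random
from typing import Dict, List, Optional, Sequence

def stratified_sample(strata: Dict[str, Sequence[int]], per_stratum: int,
                      seed: Optional[int] = None) -> Dict[str, List[int]]:
    """Draw a simple random sample of the same fixed size from every stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(list(members), per_stratum)
            for name, members in strata.items()}

# Hypothetical sampling frame of 100 students grouped by class year
frame = {
    "first year": range(1, 41),
    "second year": range(41, 71),
    "third year": range(71, 91),
    "fourth year": range(91, 101),
}
sample = stratified_sample(frame, per_stratum=5, seed=1)
for year, ids in sample.items():
    print(year, sorted(ids))
```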
Proportional Sampling
Proportional Sample (or Quota Sample): A sample
obtained by stratifying the sampling frame and then
selecting a number of items from each stratum in proportion to
its size (or by quota) by means of a simple random sampling
technique
Example
Suppose that a company has 180 staff members, grouped as follows:
Male, full time: 90
Male, part time: 18
Female, full time: 9
Female, part time: 63
We are asked to take a proportional sample of 40 staff, stratified according to the
above categories.
• The first step is to calculate the percentage of staff in each group:
% male, full time = (90/180) x 100 = 0.5 x 100 = 50%
% male, part time = (18/180) x 100 = 0.1 x 100 = 10%
% female, full time = (9/180) x 100 = 0.05 x 100 = 5%
% female, part time = (63/180) x 100 = 0.35 x 100 = 35%
• This tells us that of our sample of 40, 50% should be male, full time; 10% should
be male, part time; 5% should be female, full time; and 35% should be female, part
time. Therefore,
50% of 40 is 20.
10% of 40 is 4.
5% of 40 is 2.
35% of 40 is 14.
We need to select 20 full-time males, 4 part-time males, 2 full-time females,
and 14 part-time females.
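The allocation arithmetic in this example can be checked with a short Python sketch. The staff counts come from the example; the function name `proportional_allocation` and the staff ID numbering are hypothetical. Floor division gives exact shares here because every share is a whole number; in general the shares may need rounding and adjusting so they still add up to the sample size.

```python
import random
from typing import Dict

# Staff counts from the example
staff_counts: Dict[str, int] = {
    "Male, full time": 90,
    "Male, part time": 18,
    "Female, full time": 9,
    "Female, part time": 63,
}

def proportional_allocation(sizes: Dict[str, int], n: int) -> Dict[str, int]:
    """Allocate a sample of size n to strata in proportion to stratum size."""
    total = sum(sizes.values())
    return {name: n * size // total for name, size in sizes.items()}

allocation = proportional_allocation(staff_counts, 40)
print(allocation)  # 20, 4, 2, and 14 staff from the four groups

# Hypothetical staff IDs: members of each stratum are numbered 1..size, and a
# simple random sample of the allocated size is drawn from each stratum
rng = random.Random(0)
sample = {name: sorted(rng.sample(range(1, size + 1), allocation[name]))
          for name, size in staff_counts.items()}
```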
Cluster Sampling
Cluster Sample: A sample obtained by dividing the
sampling frame into clusters first and then randomly
selecting some clusters. Finally, the sample will
include either all elements or a simple random sample of
some of the elements in each of the clusters selected.
Note: The difference between stratified and cluster sampling:
all strata are represented in the sample, but only a subset of the clusters
is in the sample.
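Both versions of cluster sampling described above (take every element of each selected cluster, or take a simple random sample within each selected cluster) fit in one small Python sketch. The frame of six city blocks with four households each is hypothetical, as is the function name `cluster_sample`.

```python
import random
from typing import Dict, List, Optional

def cluster_sample(clusters: Dict[str, List[str]], num_clusters: int,
                   per_cluster: Optional[int] = None,
                   seed: Optional[int] = None) -> Dict[str, List[str]]:
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)  # randomly select clusters
    result: Dict[str, List[str]] = {}
    for name in chosen:
        members = clusters[name]
        if per_cluster is None:
            result[name] = list(members)                     # every element of the cluster
        else:
            result[name] = rng.sample(members, per_cluster)  # SRS within the cluster
    return result

# Hypothetical frame: 6 city blocks (clusters), each with 4 households
blocks = {f"block {b}": [f"{b}-{h}" for h in range(1, 5)] for b in "ABCDEF"}
print(cluster_sample(blocks, num_clusters=2, seed=5))
print(cluster_sample(blocks, num_clusters=2, per_cluster=2, seed=5))
```

Note that only 2 of the 6 blocks appear in either sample, in contrast to stratified sampling, where every stratum contributes.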
Guideline for Planning a Statistical
Study
1. Identify the individuals or objects involved.
2. Determine the variables and methods of measuring.
3. Decide whether to collect data from an entire population or a sample. If using a
sample, decide on a sampling method.
4. Address issues of ethics, privacy, and confidentiality in
planning for data collection.
5. Collect data.
6. Apply descriptive statistics methods (Chapters 1, 2, 3) to the data
collected, and draw conclusions using appropriate inferential
statistics methods (Chapters 9, 10, 11).
7. Discuss the results and make recommendations for future studies.
Probability & Statistics
• Probability is the science of making statements
about what will occur when samples are drawn
from a known population.
• Statistics is the science of organizing sample
data and making inferences about the unknown
population from which the sample is drawn.
Probability (Chapters 4, 5, 6, 7, 8) is the vehicle of statistics: it lets us
attach a chance of occurring to statistical inferences drawn from sample data
about a population, and so justify how reliable those inferences are. That is,
we want to know the chance that a similar result would occur if the study were
repeated many more times.
Comparison of Probability & Statistics
Probability: Properties of the population are
assumed known. Answer questions about
the sample based on these properties.
Statistics: Use information in the sample to
draw a conclusion about the population
Example
Example: A jar of M&M's contains 100 candy pieces, 15 of which
are red. A handful of 10 is selected.
Probability question:
What is the probability that 3 of the 10 selected are red?
Example: A handful of 10 M&M's is selected from a jar
containing 1000 candy pieces. Three M&M's in
the handful are red.
Statistics question:
What is the proportion of red M&M's in the entire jar?
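The two questions can be contrasted in a short Python sketch. For the probability question, it assumes the handful is a random draw of 10 pieces without replacement, in which case one standard model is the hypergeometric distribution, computed here with `math.comb` (Python 3.8+). For the statistics question, the sample proportion 3/10 serves as a point estimate of the unknown proportion in the jar.

```python
from math import comb

# Probability question: population known (100 pieces, 15 red).
# Assumes a random handful of 10 drawn without replacement (hypergeometric model).
N, red, n, x = 100, 15, 10, 3
p_three_red = comb(red, x) * comb(N - red, n - x) / comb(N, n)
print(f"P(exactly 3 red) = {p_three_red:.4f}")

# Statistics question: population unknown; 3 of the 10 in the handful are red.
observed_red, handful = 3, 10
p_hat = observed_red / handful  # sample proportion, a point estimate for the jar
print(f"estimated proportion of red pieces = {p_hat}")
```

The probability is about 0.13, computed from known properties of the jar; the estimate 0.3 goes in the opposite direction, from the sample to the jar.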