Statistics for Science (Psy) - United International College
Statistics for Business
Instructor: Prof. Ken Tsang
Room E409-R11
Email: [email protected]
1
TA information
Mr. ZHOU, Min
周敏
Room E409 Tel:3620620
[email protected]
2
Web-page for this class
• Watch for announcements about this class and download lecture notes from
http://www.uic.edu.hk/~kentsang/stat2012/stat2012.htm
• Or from this page: http://www.uic.edu.hk/~kentsang/
• Or from Ispace
3
Tutorials
• One hour each week
• Time & place to be announced later (we need
your input)
• More explanations
• More examples
• More exercises
4
How is my final grade determined?
• Quizzes: 20%
• Mid-term exam: 20%
• Assignments: 10%
• Final Examination: 50%
5
Some requirements on this Course
Assignments must be handed in before the
deadline.
We will tell you your scores for the mid-term test
and quizzes so that you know your progress.
However, for the final examination, we cannot
tell you the score before the AR releases the
official results.
6
UIC Score System
7
Grade Distribution Guidelines
8
General Information
• Textbook
Business Statistics in Practice, 5th Edition,
Bowerman, O’Connell, Murphree,
McGraw-Hill International Edition (2009)
9
Statistics for the Behavioral Sciences
Frederick J Gravetter
and Larry B. Wallnau
Wadsworth Publishing; 8th edition (December 10, 2008)
10
Chapter 1
An Overview of Statistics
11
Chapter Summary
1.1 Populations and Samples
1.2 Ratio, Interval, Ordinal, and Nominative
Scales of Measurement
1.3 An Introduction to Survey Sampling
12
What is statistics?
Statistics is the science of collecting,
organizing, presenting, analyzing, and
interpreting numerical data in order to
gain more knowledge and
make more effective decisions.
13
What is Statistics?
• Statistics is the branch of mathematical
science to make effective use of numerical
data relating to a population (groups of
individuals or experiments).
• It deals with all aspects of the collection,
analysis, interpretation (or explanation) and
presentation of such data, as well as the
planning of the collection of data (i.e. the
design of surveys and experiments).
14
Data collection and statistical analysis
• Once a sample that is representative of the
population is determined, data is collected for
the sample members in an observational or
experimental setting.
• This data can then be subjected to statistical
analysis, serving two related purposes:
– description
– inference
15
Why do I have to learn Statistics?
• Social policy, medical practice, and business
decisions all rely on the proper use of statistics.
• Misuse of statistics can produce subtle but serious
errors in the description and interpretation of data,
which lead to wrong decisions.
• Even when statistics are correctly applied, the
results can be difficult to interpret for those lacking
expertise.
• The set of basic statistical skills (and skepticism)
that people need to deal with information in their
everyday lives properly is referred to as statistical
literacy.
16
Who uses statistics?
Statistical techniques are used in many areas:
Government
Marketing
Quality control
Medicine
Research (sports, education, politics, psychology…)
17
Examples in business statistics
• Consumer price index (CPI) is a measure
estimating the average price of consumer
goods and services purchased by households
(a constant basket of goods and services
within the same area).
• Gross domestic product (GDP) is the market
value of all final goods and services made
within the borders of a country in a year.
18
Recent developments
• There are more and more data around us,
because
– It is cheap to obtain & store
• Computational tools are widely available. They
are cheap and effective.
19
The McKinsey Global Institute:
20
How Companies Learn Your Secrets
By CHARLES DUHIGG
Published: February 16, 2012
21
Basic Terminology
• Measurement, Data
• Variables, Value
• Quantitative, Qualitative
• Population, Sample
• Census
• Descriptive Statistics
• Inferential Statistics
22
Measurement
The process of determining the extent,
quantity, or amount of the variable of interest
for a particular item of the population.
• Produces data
• For example, collecting the starting salaries of
graduates from last year’s MBA program
23
Data
• Data can be viewed as the raw material from
which information is obtained, just as trees
are the raw material from which paper is
produced.
• In fact, a good definition of data is "facts or
figures from which conclusions can be
drawn".
24
Variables
• A variable is a characteristic that may assume
any of a set of values.
• Height, age, amount of income, province or
country of birth, grades obtained at school
and type of housing are all examples of
variables.
25
Value
The result of measurements from a variable.
• The specific measurement for a particular unit
in the population
• For example, the starting salaries of graduates
from last year’s MBA Program
26
Quantitative
Values that can be expressed as
quantities/numbers. (For example, “how
much” or “how many.”)
• Annual starting salary of college graduate
• Age and weight of a person
27
Qualitative
A descriptive category to which the value can
belong (a descriptive attribute of a population
unit)
• A person’s gender
• A person’s hair color
28
Population
A population is the set of all the individuals of
interest in a particular study.
• For example, if we want to know the starting
salaries of all UIC graduates then the
population of interest is the totality of all UIC
graduates.
29
Census
The procedure of systematically acquiring and
recording information (taking measurements)
about all the members of a given population.
• A census is usually too expensive, too
time-consuming, and too much effort for a large
population
30
Sample
A sample is a set of individuals selected from a
population, usually intended to represent the
population in a research study.
• For example: 1,000,000 Chinese college
students graduated in 2010
• This is too large for a census
• So, we select a sample of these graduates and
study their annual starting salaries
31
32
Population – the object of statistical study
• In applying statistics to a scientific, industrial,
or societal problem, it is necessary to begin
with a population to be studied.
• Populations can be diverse groups, such as "all
persons living in a city/country" or "all past
and present students of UIC".
33
Parameter & Statistic
A parameter is a value, usually a numerical value, that
describes a population.
A parameter may be obtained from a single
measurement, or it may be derived from a set of
measurements from the population.
A statistic is a value, usually a numerical value, that
describes a sample.
A statistic may be obtained from a single measurement,
or it may be derived from a set of measurements from
the sample.
34
Sampling error
• Sampling error is the discrepancy, or amount
of error, that exists between a sample statistic
and the corresponding population parameter.
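A small simulation can make this definition concrete. The sketch below is illustrative only: the population of starting salaries is made up, and the names `population`, `parameter`, and `statistic`, the seed, and the sample size of 50 are assumptions chosen for the example.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable (arbitrary choice)

# Hypothetical population: 10,000 made-up starting salaries
population = [random.gauss(50_000, 8_000) for _ in range(10_000)]
parameter = statistics.mean(population)   # population parameter (the mean)

# Draw a sample of 50 individuals and compute the sample statistic
sample = random.sample(population, 50)
statistic = statistics.mean(sample)

# Sampling error: discrepancy between the statistic and the parameter
sampling_error = statistic - parameter
print(f"population mean = {parameter:.2f}")
print(f"sample mean     = {statistic:.2f}")
print(f"sampling error  = {sampling_error:.2f}")
```

Drawing a different sample would generally give a different sample mean, and so a different sampling error.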
35
Example of Sampling error
36
Descriptive Statistics are procedures to
organize, summarize, and present data in an informative
way.
EXAMPLE 1:
The average test score for the
students in a class, to give a
descriptive sense of the typical
scores.
EXAMPLE 2:
According to Consumer
Reports, there were 2.5
problems per copying
machine reported during
2009.
37
Descriptive statistics
• Descriptive statistics summarize/characterize the
population data by describing what was observed
in the sample numerically (tabular) or graphically.
• Numerical descriptors include mean and standard
deviation for continuous data types (like heights or
weights), while frequency and percentage are more
useful in terms of describing categorical data (like
race, gender…).
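As a minimal sketch of these descriptors, the example below computes a mean and standard deviation for a continuous variable, and frequencies and percentages for a categorical one. The data values are made up for illustration, and the variable names are assumptions.

```python
import statistics
from collections import Counter

# Hypothetical sample data (made up for illustration)
heights_cm = [160.0, 165.0, 170.0, 175.0, 180.0]   # continuous variable
genders = ["F", "M", "F", "F", "M"]                # categorical variable

# Numerical descriptors for continuous data
mean_height = statistics.mean(heights_cm)
sd_height = statistics.stdev(heights_cm)   # sample standard deviation (n - 1)

# Frequency and percentage for categorical data
freq = Counter(genders)
pct = {g: 100 * n / len(genders) for g, n in freq.items()}

print(mean_height, round(sd_height, 2))
print(dict(freq), pct)
```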
38
Descriptive Statistics
To describe the important aspects of a set of
measurements.
• For example, for a set of starting salaries, we
want to know:
– How much to expect (mean)
– What is a high versus low salary
• If the population is small, we could take a census
and would not need to make statistical inferences
• But if the population is too large, then …
39
Inferential Statistics
The science that allows us to study samples
and then make generalizations about the
population from which they were selected (i.e.
to determine [in statistical sense] the population
parameters from sample statistics).
• For example, use a sample of starting salaries
to estimate the important aspects of the
population of starting salaries.
40
Inferential statistics
• Inferential statistics (or inductive statistics) uses
patterns in the sample data to draw inferences about
the population represented.
• These inferences may take the form of:
– answering yes/no questions about the data (hypothesis
testing),
– estimating numerical characteristics of the data
(estimation),
– describing associations within the data (correlation),
– modeling relationships within the data (regression).
41
Examples of inferential statistics
Example 1:
Each month, 1000
families are chosen
at random.
A popularity index for
TV channels is
computed from
the data obtained from
these families.
Example 2:
The accounting department of
a large firm will select a
sample of the invoices to
check for accuracy for all the
invoices of the company.
42
Difference between descriptive & inferential
statistics
• Descriptive statistics are distinguished from
inferential statistics in that descriptive statistics aim
to quantitatively summarize a data set, rather than
being used to support inferential statements about
the population that the data are thought to
represent.
• Descriptive statistics: get a “feel” (characterization)
for the data
• Inferential statistics: draw conclusions from the data
43
Example: Descriptive & Inferential statistics
44
45
Data and Variables
• Variables are qualitative or quantitative
attributes that characterize a population/
sample.
• Data (plural of "datum", which is seldom used) are
typically the results of measurements of a set
of variables.
46
Types of Variables
For a Qualitative or Attribute Variable the
characteristic being studied is nonnumeric.
Gender
Eye color
Type of car
Types of Variables
In a Quantitative Variable information is
reported numerically.
Balance in your checking account
Final score for the students in a class
Number of children in a family
Types of Variables
Quantitative variables can be classified as either
Discrete or Continuous.
Discrete Variable consists of separate, indivisible values.
There are “gaps” between possible values of the variable.
Example: the number of
bedrooms in a house, or the
number of hammers sold at
the local hardware store
(1,2,3,…,etc).
Types of Variables
A Continuous Variable can assume any value
within a specified range. There are an infinite number of
possible values between any 2 observed values.
The pressure in a tire
The weight of a pork chop
The height of students in a class.
Summary of Types of Variables
DATA
• Qualitative or attribute (type of car owned)
• Quantitative or numerical
– discrete (number of children)
– continuous (time taken for an exam)
Scales of Measurement
There are four scales of data
Nominal
Ordinal
Interval
Ratio
52
Nominal data
Nominal Scales: Data are
classified into categories, but
the ordering of categories is
not meaningful. These are:
–Identifier or name
–Unranked categorization
•Example: gender, eye or skin
color
53
Scales of Measurement
The categories of a nominal-scale variable must be:
Mutually exclusive
Each individual (or object or measurement)
must appear in ONLY ONE category.
Exhaustive
Each individual (or object or
measurement) must appear in AT
LEAST ONE of the categories.
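The two conditions can be checked mechanically. The following is a minimal sketch: the eye-color categories, the individuals' names, and the helper functions `is_mutually_exclusive` and `is_exhaustive` are all hypothetical and introduced only for illustration.

```python
# Hypothetical categories and the set of individuals placed in each one
categories = {
    "brown": {"Ann", "Bob"},
    "blue": {"Cat"},
    "green": {"Dan"},
}
individuals = {"Ann", "Bob", "Cat", "Dan"}

def is_mutually_exclusive(cats):
    """True if no individual appears in more than one category."""
    seen = set()
    for members in cats.values():
        if seen & members:        # someone was already placed elsewhere
            return False
        seen |= members
    return True

def is_exhaustive(cats, everyone):
    """True if every individual appears in at least one category."""
    covered = set().union(*cats.values())
    return everyone <= covered

print(is_mutually_exclusive(categories), is_exhaustive(categories, individuals))
```

Together, the two conditions mean that each individual falls into exactly one category.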
54
Scales of Measurement
Ordinal Scale: Orders are meaningful in ordinal
scale, but differences are not.
During a taste test of 4
soft drinks, Coca Cola
was ranked number 1,
Sprite number 2, Pepsi
number 3, and Root Beer
number 4. Can we say
Coca Cola is 2 better than
Pepsi?
55
Ordinal data
• Ordinal data
– All characteristics of nominal data plus…
– Rank-order categories
– Ranks are relative to each other
• Example: Low (1), moderate (2) or high (3) risk
56
Scales of Measurement
Interval Scales
Both the orders and differences are meaningful but
the ratio is not.
Temperature on the
Fahrenheit scale.
57
Interval data
• All of the characteristics of ordinal data plus…
• Measurements are on a numerical scale with
an arbitrary zero point
– The “zero” is assigned: it is nonphysical and not
meaningful
– Zero does not mean the absence of the quantity
that we are trying to measure
58
Interval data
Continued
• Can only meaningfully compare values by the
interval between them
– Cannot compare values by taking their ratios
– “Interval” is the arithmetic difference between the
values
• Example: temperature
– 0 °F means “cold,” not “no heat”
– 80 °F is not twice as warm as 40 °F
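One way to see why ratios are not meaningful on an interval scale is to express the same two temperatures in other units. The sketch below uses the standard Fahrenheit-to-Celsius and Celsius-to-Kelvin conversions; the helper names `f_to_c` and `f_to_k` are assumptions for this example, and the Kelvin comparison is an addition not taken from the slides.

```python
def f_to_c(f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9

def f_to_k(f):
    """Convert degrees Fahrenheit to kelvins (a scale with a true zero)."""
    return f_to_c(f) + 273.15

warm, cool = 80.0, 40.0

# The ratio changes with the unit, so "twice as warm" has no fixed meaning
ratio_f = warm / cool                      # 2.0 in Fahrenheit
ratio_c = f_to_c(warm) / f_to_c(cool)      # about 6 in Celsius
ratio_k = f_to_k(warm) / f_to_k(cool)      # about 1.08 in kelvins

# The difference is meaningful: 40 Fahrenheit degrees is always 40 * 5/9 Celsius degrees
diff_c = f_to_c(warm) - f_to_c(cool)

print(ratio_f, round(ratio_c, 2), round(ratio_k, 2), round(diff_c, 2))
```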
59
Scales of Measurement
Ratio Scales: Orders, differences, and ratios are
meaningful for this level of measurement.
Miles traveled by sales
representative in a month
Monthly income
of surgeons
Ratio data
• All the characteristics of interval data plus…
• Measurements are on a numerical scale with a
meaningful zero point
– Zero means “none” or “nothing”
• Values can be compared in terms of their
interval and ratio
– $30 is $20 more than $10
– $30 is 3 times as much as $10
– $0 means no money
61
Ratio data
Continued
• In business and finance, most quantitative
variables are ratio variables, such as anything
to do with money
– Examples: Earnings, profit, loss, age, distance,
height, weight
62
Qualitative Variables
• Descriptive categorization of population or
sample units
• Two types:
– Nominal
– Ordinal
63
Quantitative Variables
• Numerical values represent quantities
measured with a fixed or standard unit of
measure
• Two types:
– Interval
– Ratio
64
Summary of Types of Variables
DATA
• Qualitative or attribute (type of car owned)
– Nominal
– Ordinal
• Quantitative or numerical
– discrete (number of children) or continuous (time taken for an exam)
– Interval
– Ratio
How to choose a sample?
• For a sample to be used as a guide to an entire
population, it is important that it is truly
representative of that overall population.
• Representative sampling assures that inferences and
conclusions can be safely extended from the sample
to the population as a whole.
• A major problem lies in determining the extent to
which the sample chosen is actually representative.
66
Sampling
• Sampling is that part of statistical practice
concerned with the selection of individual
observations intended to yield accurate
knowledge about a population of concern,
especially for the purposes of statistical
inference.
67
Representative Sample
• A representative sample is not easy to obtain
because of random and non-random
variations in the sample.
• Statistics offers methods for designing
experiments to choose a representative
sample of the overall population,
strengthening its capability to discern truths
about the population.
68
Random sampling
• Random sampling is a sampling technique to select a
sample for study from a population. Each individual is
chosen entirely by chance, hence unpredictably, and each
member of the population has a known, but possibly
non-equal, chance of being included in the sample.
• By using random sampling, the likelihood of bias (being
non-representative) is reduced.
• Simple random sampling is the basic sampling technique
in which each individual is chosen entirely by chance with
an equal probability of being included in the sample, i.e.
each member of the population is equally likely to be
chosen at any stage in the sampling process.
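Simple random sampling can be sketched with Python's standard `random` module. The population of ID numbers, the sample size of 10, and the seed below are assumptions chosen for illustration.

```python
import random

# Hypothetical population: ID numbers of 1,000 graduates
population = list(range(1, 1001))

rng = random.Random(2012)  # seeded generator; the seed is arbitrary

# Simple random sample of 10: every member is equally likely to be chosen,
# and no member can be chosen twice (sampling without replacement)
sample = rng.sample(population, 10)
print(sample)
```

Using a seeded generator only makes the example repeatable; in practice the selection should not be predictable in advance.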
69
Random process
• A random process is a repeating process
whose outcomes follow no describable
deterministic pattern, but follow a probability
distribution.
70
Probability and Mathematical statistics
• The fundamental mathematical concept
employed in understanding randomness is
probability.
• Mathematical statistics (statistical theory) is
the branch of applied mathematics that uses
probability theory and analysis to examine the
theoretical basis of statistics.
71
Chapter 1: GOALS
When you have completed this chapter, you will be able to:
ONE Understand why we study statistics.
TWO Explain what is meant by descriptive statistics and inferential statistics.
THREE Distinguish between qualitative and quantitative variables.
FOUR Distinguish between discrete and continuous variables.
FIVE Distinguish among the nominal, ordinal, interval, and ratio levels
of measurement.
SIX Define the terms mutually exclusive and exhaustive.
SEVEN Describe basic methods in sampling.
72