Transcript terminology

TERMINOLOGY:
Statistics
“is the development and application of
methods to the collection, analysis, and
interpretation of observed information
(data)/drawing inferences based on the
analysis from planned investigations”.
The term statistics is used to mean
either statistical data (the layman's usage) or
statistical methods.
Statistical data:
When the term means statistical data, it refers to numerical
descriptions of things. These descriptions may take
the form of counts or measurements. Thus statistics
of malaria cases include the number of fever cases, the number of
positives obtained, the sex and age distribution of
positive cases, etc.
Descriptive Statistics
These are the techniques for tabular and graphical
presentation of data, together with the methods used to
summarize a body of data with one or two
meaningful figures. This organization,
presentation and summarization of data is called
descriptive statistics.
Inferential Statistics
The branch of modern statistics that is most relevant
to public health and clinical medicine is statistical
inference. This branch of statistics deals with
techniques for drawing conclusions about a
population from a sample. Inferential statistics builds upon
descriptive statistics. The inferences are drawn from
particular properties of the sample to particular
properties of the population. These are the types of
statistics most commonly found in research
publications.
Biostatistics
Definition: When statistical methods are
applied to biological, medical and public health data,
they constitute the discipline of Biostatistics.
Public Health
• Public Health is the science of protecting and
improving the health of communities through
education, promotion of healthy lifestyles, and
research for disease and injury prevention.
• Public health professionals analyze the effect
on health of genetics and the environment in
order to develop programs that protect the
health of your family and community.
• Overall, public health is concerned with
protecting the health of entire populations.
These populations can be as small as a local
neighborhood, or as big as an entire country.
Overview of Descriptive Statistics
1. Summarizing data sets numerically
• Measures of location
• Measures of variation
• Min, max, quartiles
2. Summarizing data sets graphically
• Histograms
• Frequency tables
• Stem and leaf plots
• Boxplots
3. Summarizing bivariate data sets
• Scatter plots
• Correlation
Where is statistics used?
• Sample size calculations
You use, on average, 15 grams of a particular chemical each time
you perform an experiment. This chemical is very expensive and
takes three months to be imported from America. You need to run
10 successful experiments for your Honors thesis/research.
Question: How much of the chemical should you
order?
• Hypothesis testing
The standard treatment for rheumatoid arthritis produces a measurable
improvement in 63% of patients (its success rate). A new drug has been
trialed on 100 people, and 68 of them recorded a measurable
improvement.
Question: Is the new drug better than the standard
treatment?
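One possible way to approach the question is sketched below. It assumes that patients respond independently and asks how likely 68 or more improvements out of 100 would be if the new drug were no better than the standard 63%. This is a one-sided exact binomial calculation; the function name `binomial_upper_tail` is illustrative, and the binomial model is an assumption that these notes have not yet introduced.

```python
from math import comb


def binomial_upper_tail(n: int, k: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Numbers from the example: 100 patients, 68 improved, standard success rate 63%.
# Assumption: each patient improves independently with the same probability.
p_value = binomial_upper_tail(n=100, k=68, p=0.63)
print(f"P(68 or more improvements if the true rate is 63%) = {p_value:.3f}")
```

A small value would suggest the new drug is genuinely better, while a large value would mean that 68 out of 100 is compatible with the standard success rate.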
• Business
You own a pharmacy. You think you sell more boxes of
tissues in winter than in summer. You have monthly sales data
for tissues for the past three years.
Question: How many boxes of tissues should you order
each month?
• Medicine
In the past, doctors did not wash their hands (or their surgical
instruments) between patients. Florence Nightingale observed
that fewer patients died in wards where nurses washed their
hands.
Question: Does washing your hands save lives?
Descriptive Statistics
Numerical Summary
• Measures of Location: arithmetic mean, median, mode
• Measures of Scale/Spread: range, standard deviation, median absolute deviation, interquartile range
• Other: min and max, quartiles, correlation coefficient
Graphical Summary
• Categorical Data: pie chart, bar plot, frequency table, dot plot
• Quantitative Data: histogram
• Bivariate Data: scatter plot
Individuals, variables, and data
Individuals are simply the objects measured in a
statistical problem. A variable is a characteristic that we
would like to measure on individuals. The actual
measurements recorded on individuals are called data.
Table 1.1: Relief times (in minutes) for five
patients after taking three different medicines.
Patient   Medicine 1   Medicine 2   Medicine 3
1         6.8          5.5          6.5
2         6.5          5.3          7.2
3         6.4          5.4          5.5
4         4.2          6.1          6.3
5         5.5          5.8          7.2
The individuals are the patients; the variable is the relief time.
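As a small illustration of turning the data in Table 1.1 into a numerical summary, the sketch below computes the mean relief time for each medicine. The dictionary layout and variable names are illustrative choices, not part of the original notes.

```python
from statistics import mean

# Relief times (minutes) from Table 1.1, one list per medicine.
relief_times = {
    "Medicine 1": [6.8, 6.5, 6.4, 4.2, 5.5],
    "Medicine 2": [5.5, 5.3, 5.4, 6.1, 5.8],
    "Medicine 3": [6.5, 7.2, 5.5, 6.3, 7.2],
}

# One summary figure (the arithmetic mean) per medicine.
mean_relief = {name: mean(times) for name, times in relief_times.items()}

for name, value in mean_relief.items():
    print(f"{name}: mean relief time = {value:.2f} minutes")
```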
TERMINOLOGY : In a statistical problem, the
population is the entire group of individuals that
we want to make some statement about. A sample
is a part of the population that we actually observe.
NOTE :
Usually, we cannot examine all members of a
population due to time, cost, and other constraints.
So, we examine only a portion of the population
and try to draw conclusions about the whole. This
smaller portion is the sample. The process of
generalizing the results in our sample to that of the
entire population is known as statistical inference.
We'll study this more formally later in the course.
Introduction:
An investigator usually wants to make a
statement about a large group of individuals.
This group is called the population.
• all eligible voters in KPK (for a study of an election)
• all employees at a company (for a study of income levels)
• all students of a university (for a study of the exam system)
• all HIV patients (for a study of risk factors of HIV)
The way we select a sample from the population
is called the sampling design.
Example (Population):
The set of all enrolled students across the
University of Peshawar.
Definition (Sample)
A subset of the population of interest that is
collected during the course of a study.
Example (Sample)
A set of 100 University of Peshawar students
interviewed while walking across TCC Road
on Monday morning.
Why bother with samples?
Wouldn't it be better to work with populations?
• Populations are typically very large.
• It can be prohibitively expensive to survey a
population.
• It can take a long time to survey a population.
Question
Can you think of any examples where the
population is surveyed? (Census)
• Samples need to be representative of the
population.
• The best way to ensure a representative sample is
to use “random sampling” (example: a survey of university
teachers' income levels).
A simple random sample (SRS) is a sampling
design.
• We choose individuals so that our sample
will hopefully be representative.
• Each individual in the population has an equal
chance of being selected.
• We use the terms “random sample” and “simple
random sample” interchangeably.
• Drawing a random sample can be thought of
as picking numbers at random from a hat. This
is practical for small populations, but not for large ones.
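For larger populations, the "numbers from a hat" idea is usually carried out by a computer. The sketch below is one way to draw a simple random sample without replacement; the population of 1,000 numbered student IDs is hypothetical, and the helper name `simple_random_sample` is illustrative.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct individuals, each equally likely to be chosen."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(population, n)


# Hypothetical population: students numbered 1 to 1000.
student_ids = list(range(1, 1001))
sample = simple_random_sample(student_ids, n=100, seed=42)
print(sorted(sample)[:10])  # first few sampled IDs
```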
TERMINOLOGY : A parameter is a number that
describes a population. Since it characterizes a
population, and we usually cannot contact every member
of the population, a parameter is generally unknown.
Parameters are fixed values (they are what they are,
even if we don't know them!). Greek letters are often used
for parameters.
TERMINOLOGY : A statistic is a number that
describes a sample. When we take a sample, we can
compute the value of the statistic. Of course,
different samples may produce different values of
the statistic!
IMPORTANT : We use sample statistics to estimate
population parameters.
Example. The mean prescription amount of
Methylphenidate/Ritalin across all doctors in
KPK is an example of a parameter.
Example. The sample mean prescription
amount of Methylphenidate prescribed by
doctors in Peshawar Medical Centre is an
example of a statistic.
General notation for writing observations:
For a general sample of size n we write the
observations as
x1, x2, …, xn
In other words, the ith observation is denoted
as “xi” for i = 1, 2, …, n.
A Fox News poll, taken on June 29, 2006,
reported the results from an SRS of n = 900
adults nationwide. Interviewers asked the
following question:
“Do you approve or disapprove of the way
George W. Bush is handling his job as
president?"
• Of the 900 adults in the sample, 369
responded by stating they approve of the
President's handling of his job.
• What does this information tell us about the
population of US adults?
Let p = proportion of US adults who approve of
President Bush.
• The value of p is a parameter since it describes
the population. The parameter p is called the
population proportion.
• To know p, we would have to poll every American
adult! Since this is not done, we do not get to
know p.
• On the other hand, let p̂ = proportion of
individuals in our sample who approve of
President Bush.
• Since p̂ is computed from the sample of
individuals, we call it the sample proportion.
Recall that, in the sample, 369 of the 900 adults
approved of the way President Bush is handling his
job.
Thus p̂ = 369/900 = 0.41
We might state that
“Our sample results indicate that about 41
percent of US adults favor the way
that President Bush is handling his
job.”
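The calculation of the sample proportion (the fraction of the 900 sampled adults who approve) can be written as a tiny helper. The function name `sample_proportion` and the variable `p_hat` are illustrative.

```python
def sample_proportion(successes: int, n: int) -> float:
    """Return the fraction of the sample that has the attribute of interest."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return successes / n


# Poll figures from the example: 369 of 900 adults approve.
p_hat = sample_proportion(369, 900)
print(f"sample proportion = {p_hat:.2f}")
```

A different sample of 900 adults would generally give a slightly different value, which is why the sample proportion is a statistic and not the parameter itself.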
Variables: Definition (Variable)
A variable is a value or characteristic that can
differ between individuals/observations.
Example
A study might include variables such as gender,
blood pressure, quantity of drug prescribed,
age, time since diagnosis, . . .
Definition (Quantitative)
Quantitative variables are numerical values
whose magnitude is meaningful.
Definition (Qualitative)
Qualitative variables are used to distinguish
between categories.
Qualitative/Categorical
• Marital Status
• Race
• Gender
Quantitative/Numerical
Discrete
• Number of children
• Score in a multiple choice quiz
• Number of attempts at a driving test
Continuous
• Height of an individual
• Weight of an individual
• Time since infection
Qualitative or Quantitative, Discrete or Continuous?
• Blood pressure reading from a patient.
Probabilistic problems confront everyone,
from the business person considering plant
expansion (gain and loss), to the scientist
testing a new wonder drug, to the individual
deciding whether to carry an umbrella to
work.
All involve an element of randomness.
At the foundation of sound decision-making
lies the ability to make accurate estimates of
the probabilities of future events, that is,
to predict future events.
• You are the campaign manager for the Republican
candidate for President of the United States. You
have the results from a recent poll taken in New
Hampshire. You want to know
the chance that your candidate would win (the favorable event) in
New Hampshire if the election were held today.
• You are the manager and part owner of a small
construction company. You own 20 trucks. The
chance that any one truck will break down on any
given day is about one in ten. You want to know
the chance that on a particular day (tomorrow)
four or more of them will be out of action.
• Three horses A, B and C are in a race. You want
to know which horse is going to win. Which
horse is more likely to win? Will there be a
tie?
• Past history shows that a particular medicine
has a success rate of 0.80 for giving relief
from fever. If a doctor gives a dose to a new
patient, what are the chances that the patient
will get relief? If 5 persons are given the
medicine, what are the chances that 4 will get
relief?
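Two of the questions above (the trucks and the fever medicine) can be answered numerically under one extra assumption: that trucks, and likewise patients, behave independently of one another with the stated probability. Under that assumption the counts follow a binomial distribution, which these notes have not yet introduced. The sketch below reads "4 will get relief" as "exactly 4"; the helper name `binomial_pmf` is illustrative.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k successes in n independent trials, each with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Trucks: 20 trucks, each breaks down with probability 0.1 (assumed independent).
# P(4 or more out of action) = 1 - P(0, 1, 2 or 3 out of action).
p_four_or_more_trucks = 1 - sum(binomial_pmf(k, 20, 0.1) for k in range(4))

# Medicine: success rate 0.80 per patient (assumed independent).
p_one_patient_relief = 0.80
p_exactly_four_of_five = binomial_pmf(4, 5, 0.80)

print(f"P(4 or more trucks out of action) = {p_four_or_more_trucks:.4f}")
print(f"P(exactly 4 of 5 patients get relief) = {p_exactly_four_of_five:.4f}")
```

If "4 will get relief" is instead read as "at least 4", the probability of all 5 getting relief would be added to the second result.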
The core of all these problems, and of the
others that we deal with, is that you want to
know the “chance” or “probability” (different
words for the same idea) that some event will
or will not happen, or that something is true
or false.
To put it another way, we want to answer
questions of the form
“What is the probability that...?”, given the
body of information that you have in hand.
Eventually, a person wants to use the estimated
probability to help make a decision concerning
some action one might take.
These are the kinds of decisions, related to the
questions about probability stated above, that
ultimately we would like to make:
1. Should you (the researcher) advise doctors to
prescribe medicine CCC for patients, or, should
you (the researcher) continue to study CCC before
releasing it for use? A related matter: should you
and other research workers feel sufficiently
encouraged by the results of medicine CCC so that
you should continue research in this general
direction rather than turning to some other
promising line of research? These are just two of
the possible decisions that might be influenced by
the answer to the question about the probability
that medicine CCC cures cancer.
2. Should you advise the Republican presidential
candidate to go to New Hampshire to campaign? If
the poll tells you conclusively that he or she will
not win in New Hampshire, you might decide that
it is not worthwhile investing effort to campaign
there.
Similarly, if the poll tells you conclusively that he
or she surely will win in New Hampshire, you
probably would not want to campaign further
there.
But if the poll is not conclusive in one direction
or the other, you might choose to invest the
effort to campaign in New Hampshire. Analysis of
the chances of winning in New Hampshire based
on the poll data can help you make this decision
sensibly.
3. Should your firm buy more trucks? Clearly the
answer to this question is affected by
the probability that a given number of your trucks
will be out of action on a given day.
But of course this estimated probability will be
only one part of the decision.
The kinds of questions to which we wish to find
probabilistic and statistical answers may be found
throughout the social, biological and physical
sciences; in business; in politics; in engineering
(concerning such spectacular projects as the flight
to the moon); and in most other forms of human
endeavor.
1. Experiments whose outcomes can be
predicted before they are performed, e.g.,
experiments in a physics lab, chemistry lab,
etc.
2. Experiments (random experiments) whose
outcomes cannot be predicted before the
experiment is performed, e.g.,
tossing a coin, rolling a die, a race
competition, the outcome of taking a dose of
medicine, etc.
Definition (Sample Space)
The sample space, often denoted by S , of an experiment
or random trial is the set of all possible outcomes.
Definition (Event)
An event, denoted by A or B or C, etc., is a set of
outcomes (a subset of the sample space) to which a
probability is assigned.
Example (Rolling a die)
The sample space is S = {1, 2, 3, 4, 5, 6} because either
a 1 or a 2 or a 3 or a 4 or a 5 or a 6 must land face
up.
If we are interested in rolling an even number, the event
of interest is A = {2, 4, 6}.
Definition (Mutually exclusive)
Two (or more) events are mutually exclusive if they
cannot occur at the same time.
Example (Tossing a die)
The events rolling a 2 and rolling a 3 are mutually
exclusive because you cannot roll a 2 and a 3 at the same
time.
Definition (Collectively exhaustive)
A set of events is collectively exhaustive if it
encompasses all possible outcomes.
Example (Tossing a die)
The events 1, 2, 3, 4, 5 and 6 are collectively exhaustive
because one of these must occur in each roll of the die.
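The definitions of sample space, event, mutually exclusive and collectively exhaustive map directly onto sets. The following is a minimal sketch for a single roll of a die; the helper names are illustrative.

```python
# Sample space for one roll of a die, and some events as subsets of it.
sample_space = {1, 2, 3, 4, 5, 6}
even = {2, 4, 6}
roll_2 = {2}
roll_3 = {3}


def mutually_exclusive(a: set, b: set) -> bool:
    """Two events are mutually exclusive if they share no outcome."""
    return a.isdisjoint(b)


def collectively_exhaustive(events, space: set) -> bool:
    """A set of events is collectively exhaustive if together they cover the sample space."""
    return set().union(*events) == space


print(even <= sample_space)                      # an event is a subset of S
print(mutually_exclusive(roll_2, roll_3))        # rolling a 2 and rolling a 3
print(mutually_exclusive(even, roll_2))          # these overlap on the outcome 2
print(collectively_exhaustive([{1}, {2}, {3}, {4}, {5}, {6}], sample_space))
```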
Union: Definition
If event A occurs, or event B occurs, or both
occur, this is called the union of the
events A and B. It is denoted as A ∪ B.
A ∪ B is sometimes read as “A or B”, but
remember that A ∪ B really means “A or B or
both A and B”.
Union of Mutually Exclusive Events
Recall if two events are mutually exclusive then
if one occurs, the other cannot occur. We can
represent this in a Venn diagram where there's
no overlap between the two events.
Mathematically, if two events A and B
are mutually exclusive then:
P(A ∪ B) = P(A) + P(B)
Intersection: Definition
If both event A and event B occur at the same
time, this is called the intersection of events A
and B. It is denoted as A ∩ B:
A ∩ B is sometimes read as “A and B."
Venn diagram representation
Union of Non-Mutually Exclusive Events
If events are not mutually exclusive then it is
possible for them to both occur at the same time.
Mathematically, if two events A and B
are not mutually exclusive then:
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
The sum P(A) + P(B) counts the overlapping section P(A ∩ B)
twice, so it is subtracted once.
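The addition rule can be checked on a small example. The sketch below uses one roll of a fair die with two overlapping events chosen purely for illustration (A = "even number", B = "greater than 3") and exact fractions so that no rounding is involved.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}


def prob(event: set) -> Fraction:
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))


A = {2, 4, 6}  # even number (illustrative event)
B = {4, 5, 6}  # greater than 3 (illustrative event)

p_union = prob(A | B)                            # direct count of A ∪ B
p_by_rule = prob(A) + prob(B) - prob(A & B)      # addition rule

print(p_union, p_by_rule)
```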
If all the outcomes are equally likely then the probability of event A is
the number of outcomes in A, M(A), divided by the number of all
outcomes, M:
P(A) = M(A) / M
Example: If a coin is fair then probability of H is ½ and probability of
T is ½
Example: If a die is fair then probability of {1} is 1/6
Since random experiments can be repeated as many times as we wish
under identical conditions (in theory), we can measure the relative
frequency of the occurrence of an event. If the number of trials is m
and the number of occurrences of A is m(A), then according to the
frequency definition the probability of A is the limit:
P(A) = lim (m → ∞) m(A) / m
According to the law of large numbers this limit exists. When the
number of trials is small there might be strong fluctuations.
As the number of trials increases, the fluctuations tend to decrease.
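The frequency definition can be illustrated by simulation. The sketch below tosses a simulated fair coin m times and reports the relative frequency of heads for increasing m. The use of a pseudo-random generator with a fixed seed is an assumption made so that the run is reproducible.

```python
import random


def relative_frequency_of_heads(num_tosses: int, seed: int = 0) -> float:
    """Toss a simulated fair coin num_tosses times and return m(A)/m for A = heads."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(num_tosses))
    return heads / num_tosses


# The relative frequency fluctuates for small m and settles near 1/2 for large m.
for m in (10, 100, 1_000, 10_000, 100_000):
    print(m, relative_frequency_of_heads(m))
```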
Other (subjective) definitions of probability
Degree of belief. How much a person believes in an event. In that
sense one person’s probability would be different from another
person’s.
Probability is defined as a function from subsets of the outcome space Ω
to the real line R that satisfies the following conditions:
1. Non-negativity: P(A) ≥ 0
2. Additivity: if A ∩ B = ∅ then P(A ∪ B) = P(A) + P(B)
3. Probability of the whole space is 1: P(Ω) = 1
All the above definitions obey these rules, so any property that can be
derived from these axioms is valid for all definitions.
Show that: P(∅) = 0 (Hint: Ω ∪ ∅ = Ω)
Show that: 0 ≤ P(A) ≤ 1 (Hint: A and Ã = Ω − A do not intersect).
Probability: Definition
Probability is a way of expressing knowledge or
belief about the likelihood of an event occurring.
Mathematically, the probability that some event,
let's call it E, occurs is expressed as: P (E)
• The probability of an event occurring must be
between 0 and 1, inclusive:
0 ≤ P(E) ≤ 1
• If an event cannot happen, it has probability
P(E) = 0.
• If an event is certain to happen, its probability
is P(E) = 1.
Independence: Definition
Two events A and B are independent if the occurrence of
event A makes it neither more nor less probable that
event B occurs. Mathematically, A and B are independent if
and only if
P(A ∩ B) = P(A) × P(B)
For dependent events
P(A ∩ B) = P(A) × P(B|A)
where P(B|A) is the probability of B given that A has occurred.
Example (Rolling a die)
The event of getting a 4 the first time a die is rolled and
the event of getting a 4 the second time are independent.
The probability of rolling a 4 on the first roll and a 4 again
on the second roll:
P(4 on first roll ∩ 4 on second roll) = P(4) × P(4)
= 1/6 × 1/6 = 1/36
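The product rule for this example can be confirmed by listing all 36 equally likely outcomes of two rolls of a fair die. This is a minimal sketch; the helper `prob` is illustrative.

```python
from fractions import Fraction
from itertools import product

# All ordered pairs (first roll, second roll) for a fair die.
outcomes = list(product(range(1, 7), repeat=2))


def prob(condition) -> Fraction:
    """Fraction of the equally likely outcomes that satisfy the condition."""
    return Fraction(sum(1 for o in outcomes if condition(o)), len(outcomes))


p_first_is_4 = prob(lambda o: o[0] == 4)
p_second_is_4 = prob(lambda o: o[1] == 4)
p_both_are_4 = prob(lambda o: o[0] == 4 and o[1] == 4)

# Independence: the joint probability equals the product of the two probabilities.
print(p_both_are_4, p_first_is_4 * p_second_is_4)
```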
Example I:
An urn contains four blue balls and five
red balls. What is the probability that a
ball chosen from the urn is blue?
Solution:
There are nine possible outcomes, and the
event “blue ball is chosen” comprises four
of these outcomes. Therefore, the
probability of this event is 4/9 or
approximately 44.44%.
Theorem
If P(A) is the probability that event A will occur,
then the probability that A will not occur is:
P(notA) = 1- P(A)
Theorem
If A and B are two mutually exclusive events
(the occurrence of one event makes the other
event impossible), then the probability that
either event A or event B will occur is the sum
of their respective probabilities:
P(A or B) = P(A) +P(B)
This is the “additive law of probability”.
Theorem
If event A and event B are not mutually exclusive,
then the probability of either event A or event B
or both is given by:
P(A ∪ B) = P(A or B or both) = P(A) + P(B) − P(A ∩ B)
Events that are not mutually exclusive have some
outcomes in common
Theorem
The sum of the probabilities of a set of mutually exclusive and
collectively exhaustive events is equal to 1:
P(A) + P(B) + … + P(N) = 1
Theorem
If A and B are independent events (the
occurrence of one has no influence on the probability of
the other), then the probability of
both A and B occurring is the product of their
respective probabilities:
P(A and B) = P(A) × P(B)
Theorem 7
If A and B are dependent events, the probability
of both A and B occurring is the probability of A
multiplied by the probability that B occurs given
that A has occurred:
P(A and B) = P(A) × P(B|A)
P(B|A) is defined as the probability of event B,
given that event A has occurred.
Example
A coin is tossed twice. What is the probability of
having
a) two heads?
b) no head?
c) at least one head?
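One way to work this example is to list the four equally likely outcomes of two tosses and count. The sketch below assumes a fair coin; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space for two tosses of a fair coin: HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))


def prob(condition) -> Fraction:
    """Fraction of the equally likely outcomes that satisfy the condition."""
    return Fraction(sum(1 for o in outcomes if condition(o)), len(outcomes))


p_two_heads = prob(lambda o: o.count("H") == 2)        # a)
p_no_heads = prob(lambda o: o.count("H") == 0)         # b)
p_at_least_one_head = prob(lambda o: "H" in o)         # c)

print(p_two_heads, p_no_heads, p_at_least_one_head)
```

Part c) also follows from the complement rule: 1 minus the probability of no heads.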
Example I: A sequence of 10 bits is randomly
generated. What is the probability that at least
one of these bits is zero?
Solution: There are 2^10 = 1024 possible
outcomes of generating such a sequence, all equally likely. The
complementary event not E, “none of the bits is zero”, includes
only one of these outcomes, namely the
sequence 1111111111.
Therefore, p(not E) = 1/1024.
Now p(E) can easily be computed as
p(E) = 1 − p(not E) = 1 − 1/1024 = 1023/1024
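The complement argument can be cross-checked by brute force, since 1024 sequences are few enough to enumerate. The sketch assumes every 10-bit sequence is equally likely.

```python
from fractions import Fraction
from itertools import product

# Every possible sequence of 10 bits.
sequences = list(product((0, 1), repeat=10))

# Direct count of sequences containing at least one zero.
with_a_zero = sum(1 for seq in sequences if 0 in seq)
p_direct = Fraction(with_a_zero, len(sequences))

# Complement rule: only the all-ones sequence has no zero.
p_complement = 1 - Fraction(1, 2**10)

print(p_direct, p_complement)
```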
Example I: A die is biased so that the number 3
appears twice as often as each other number.
What are the probabilities of all possible
outcomes?
Solution: There are 6 possible outcomes s1, …, s6.
p(s1) = p(s2) = p(s4) = p(s5) = p(s6)
p(s3) = 2p(s1)
Since the probabilities must add up to 1, we have:
5p(s1) + 2p(s1) = 1
7p(s1) = 1
p(s1) = p(s2) = p(s4) = p(s5) = p(s6) = 1/7, p(s3) =
2/7
Example II: For the biased die from Example I,
what is the probability that an odd number
appears when we roll the die?
Solution:
Eodd = {s1, s3, s5}
Remember the formula p(E) = Σ p(s), summed over all outcomes s in E.
p(Eodd) = p(s1) + p(s3) + p(s5)
p(Eodd) = 1/7 + 2/7 + 1/7 = 4/7 ≈ 57.14%
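The two biased-die examples can be reproduced with exact fractions. In the sketch below, each face gets a relative weight (3 counts twice), and the weights are normalized so that the probabilities add up to 1. The names `weights` and `p` are illustrative.

```python
from fractions import Fraction

# Relative weights: the number 3 appears twice as often as each other number.
weights = {1: 1, 2: 1, 3: 2, 4: 1, 5: 1, 6: 1}
total = sum(weights.values())

# Normalize so the probabilities of all outcomes sum to 1.
p = {face: Fraction(w, total) for face, w in weights.items()}

# Probability of an event = sum of the probabilities of its outcomes.
p_odd = sum(p[face] for face in p if face % 2 == 1)

print(p)
print(p_odd, float(p_odd))
```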
Exercise 4.
P(Anita receives an A in the statistics exam) = P(stats) = 0.4
P(Anita receives an A in the physics exam) = P(physics) = 0.5
P(Anita receives an A in either statistics or physics) = P(stats ∪ physics) = 0.86
What is the probability that
a) Anita receives an A in neither stats nor physics?
b) Anita receives A's in both stats and physics?
Solution
P(stats) = 0.4, P(physics) = 0.5, P(stats ∪ physics) = 0.86
a) P(Anita receives an A in neither stats nor physics)
= P((stats ∪ physics)^c) = 1 − P(stats ∪ physics)
= 1 − 0.86 = 0.14
b) P(stats ∩ physics) = P(stats) + P(physics) − P(stats ∪ physics)
= 0.4 + 0.5 − 0.86 = 0.04
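The arithmetic in this solution is a direct use of the complement rule and the addition rule rearranged for the intersection. A short sketch with exact fractions (to avoid floating-point rounding) is given below; the variable names are illustrative.

```python
from fractions import Fraction

p_stats = Fraction("0.4")
p_physics = Fraction("0.5")
p_either = Fraction("0.86")  # P(stats ∪ physics)

# a) Complement rule: neither = 1 - P(at least one A).
p_neither = 1 - p_either

# b) Addition rule rearranged: P(A ∩ B) = P(A) + P(B) - P(A ∪ B).
p_both = p_stats + p_physics - p_either

print(float(p_neither), float(p_both))
```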
Problem
Among 32 dieters following a similar routine, 18
lost weight, 5 gained weight and 9 remained at
the same weight. If one of these dieters is
randomly chosen, find the probability that he or
she
a) Gained weight
b) Lost weight
c) Neither lost nor gained weight
Solution
a) P(gained) = 5/32
b) P(lost) = 18/32 = 9/16
c) P(neither) = 1 − P(gained ∪ lost) = 1 − 23/32 = 9/32