Lect 9 Graphic representations in statistics


Graphic representation and graphic analysis



Graphic representations are used to present statistical quantities visually,
which makes a deeper analysis of them possible.
A graph can be built from either absolute or relative quantities.
When using the graphic method, it is important to remember that the type of
graph must strictly correspond to the content of each index.
Source data and graph types
Relative quantities:
• intensive indices
• extensive indices
• ratio indices (indices of correlation)
• indices of visual comparison


Intensive quantities: four types of diagrams:
• column
• linear
• mapgram
• mapdiagram
Extensive quantities (which characterize structure): sector diagram or
intra-column (stacked column) diagram.
Ratio indices (indices of correlation): the same diagrams as for intensive
quantities (column and linear diagrams, mapgram, mapdiagram).

Indices of visual comparison: the principles of graphic representation are
the same as for intensive quantities.
Column diagrams: illustrate homogeneous but unrelated indices. They
represent the statics of phenomena (their state at a given moment).



Linear diagrams: show the dynamics of a phenomenon (typical examples are a
temperature curve or changes in the birth rate or death rate).
Radial diagram: built in polar coordinates to represent a phenomenon over a
closed cycle of time (day, week, year). Its circular layout resembles the
sector diagram, in which, for example, the structure of morbidity or of
causes of mortality is shown by giving each cause a sector of the circle
proportional to its percentage.
A mapgram represents statistical quantities on a geographical map (or a
schematic map). Absolute as well as other indices can be marked on it.
A mapdiagram places diagrams of various types on a geographical map.

Common rules for constructing graphs
• every graph must have a title stating its content, time and place;
• it must be built to a definite scale;
• every graph must include a legend explaining the colors used (as
conventional symbols or shading).
Types of graphs


A linear diagram illustrates how the frequency of a phenomenon changes over
time, i.e. it represents the dynamics of the phenomenon.
It is based on a rectangular coordinate system. For example, segments
(e.g. time periods) are laid off to scale along the abscissa (X axis) and
morbidity indices along the ordinate (Y axis); the axes are drawn in the
proportion X : Y = 4 : 3.


A column (rectangular) diagram illustrates homogeneous intensive indices that
are not connected with each other. It can represent either the dynamics or
the statics of phenomena.
To construct it, columns are drawn whose heights correspond to the values of
the indices on the chosen scale. The width of all columns, and the gaps
between them, must be identical; their size itself can be chosen arbitrarily.
Columns can be vertical or horizontal. Example: growth in the number of beds
in inpatient institutions from 1990 to 2003.


A sector diagram illustrates extensive indices, which characterize the
structure of a phenomenon, i.e. they show the share of each part in the
whole.
The circle is taken as 100% (if the indices are expressed in percent), so
1% corresponds to 3.6° of the circumference. Using a protractor, sectors
corresponding to the size of each index are marked off on the circle.
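The percent-to-degrees conversion can be sketched in Python. This is a minimal illustration: the function name `sector_angles` and the cause-of-death shares are hypothetical, not data from the lecture. It returns the start and end angle of each sector, i.e. the marks one would set with a protractor.

```python
def sector_angles(percentages):
    """Return (start, end) angles in degrees for each part of a structure.

    The whole circle (360 degrees) is taken as 100%, so 1% = 3.6 degrees.
    Sectors are laid off one after another, starting from 0 degrees.
    """
    total = sum(percentages.values())
    if abs(total - 100) > 1e-6:
        raise ValueError(f"percentages must add up to 100, got {total}")
    sectors = {}
    start = 0.0
    for name, pct in percentages.items():
        end = start + pct * 3.6
        sectors[name] = (start, end)
        start = end
    return sectors


# Hypothetical structure of causes of death, in percent (illustrative only)
causes = {"circulatory": 55.0, "neoplasms": 20.0, "injuries": 10.0, "other": 15.0}
for cause, (start, end) in sector_angles(causes).items():
    print(f"{cause}: {start:.1f} to {end:.1f} degrees ({end - start:.1f} degrees)")
```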
A radial diagram is a type of linear diagram built in polar coordinates.
When constructing it, the role of the abscissa (X axis) is played by a
circle divided into equal parts according to the time intervals of the
given cycle (e.g. the 12 months of a year).
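A minimal sketch of how the points of a radial diagram could be placed, assuming one value per period and a layout that starts at 12 o'clock and runs clockwise (that orientation is an assumption, not a rule from the lecture). The function `radial_points` and the monthly numbers are hypothetical.

```python
import math

def radial_points(values):
    """Drawing coordinates (x, y) for a radial diagram of one closed cycle.

    The circle is divided into len(values) equal parts (e.g. 12 months or 7 days);
    each value is plotted as a distance from the centre along its own radius.
    Assumed layout: the first period points straight up and periods run clockwise.
    """
    n = len(values)
    points = []
    for i, r in enumerate(values):
        angle = math.pi / 2 - 2 * math.pi * i / n  # 12 o'clock, then clockwise
        points.append((r * math.cos(angle), r * math.sin(angle)))
    return points


# Hypothetical number of cases per month, January to December (not real data)
monthly_cases = [30, 28, 22, 15, 10, 8, 6, 7, 12, 18, 24, 29]
for month, (x, y) in enumerate(radial_points(monthly_cases), start=1):
    print(f"month {month:2d}: x = {x:6.1f}, y = {y:6.1f}")
```

Joining the points in order, and closing back to the first point, gives the curve of the radial diagram.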


Linear diagrams are drawn in a rectangular coordinate system: the horizontal
scale is laid off from left to right along the abscissa (X), and the vertical
scale from bottom to top along the ordinate (Y). A mandatory requirement for
constructing any graph is a scale, i.e. the quantities in the drawing are
shown reduced in a fixed proportion to the corresponding figures.
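A scale is a fixed factor converting data units into drawing units. The sketch below (the function name `to_drawing_coords` and the data are hypothetical) maps a time series onto a drawing whose X axis is 12 cm and whose Y axis is 9 cm long, keeping the 4 : 3 proportion mentioned for linear diagrams; the 12 cm width is an arbitrary assumption.

```python
def to_drawing_coords(xs, ys, width_cm=12.0):
    """Convert data points to drawing coordinates in cm, one fixed scale per axis.

    X runs left to right from min(xs); Y runs bottom to top from zero.
    The Y axis is made 3/4 as long as the X axis (X : Y = 4 : 3).
    """
    height_cm = width_cm * 3 / 4
    x_min, x_max, y_max = min(xs), max(xs), max(ys)
    if x_max == x_min or y_max <= 0:
        raise ValueError("need at least two distinct x values and a positive maximum y")
    x_scale = width_cm / (x_max - x_min)   # cm per unit of X (e.g. per year)
    y_scale = height_cm / y_max            # cm per unit of Y (e.g. per case per 1000)
    return [((x - x_min) * x_scale, y * y_scale) for x, y in zip(xs, ys)]


# Hypothetical morbidity per 1000 population, 2000-2004 (illustrative only)
years = [2000, 2001, 2002, 2003, 2004]
morbidity = [45.0, 50.0, 60.0, 55.0, 40.0]
print(to_drawing_coords(years, morbidity))
```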

Column (bar) diagrams are used to compare several quantities. The rectangles
that represent the quantities can also be placed horizontally instead of
vertically; the result is a strip (horizontal bar) diagram (Fig.4). In some
cases it is more convenient to show quantities as strips than as columns,
because each strip is easier to label with a horizontal inscription.
Quantitative types of connection
• functional: a connection in which each value of one characteristic
corresponds to a strictly defined value of the other (for example, each
radius of a circle corresponds to a definite area of the circle)
• correlation: a connection in which each value of one characteristic
corresponds to several values of another, related characteristic, which
vary around their average (for example, a person's height and body mass are
known to be related: in a group of people of the same height body mass
differs, but it varies within certain limits around its average value)
Correlative connection


A correlative connection is a dependence between phenomena that does not
have a strict functional character.
A correlative connection shows up only in a large number of observations,
that is, in the population as a whole. Establishing a correlative connection
should be followed by identifying the causal connection that would confirm
the dependence of one phenomenon on the other; a correlation by itself does
not prove causation.


By direction (character), a correlative connection can be direct or inverse.
A correlation coefficient that characterizes a direct connection carries a
plus sign (+), and one that characterizes an inverse connection carries a
minus sign (-).
By strength, a correlative connection can be complete, strong, average or
weak, or it can be absent.
Estimation of correlation by the correlation coefficient
Strength of connection   Direct (+)           Inverse (-)
Complete                 +1                   -1
Strong                   from +1 to +0.7      from -1 to -0.7
Average                  from +0.7 to +0.3    from -0.7 to -0.3
Weak                     from +0.3 to 0       from -0.3 to 0
No connection            0                    0
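The coefficient and its interpretation can be sketched in Python. This is an illustrative sketch, not the lecture's own procedure: `pearson_r` and `describe_correlation` are hypothetical names, the height and mass values are invented, and values falling exactly on a boundary (0.7 or 0.3) are assigned to the stronger class because the table does not say where they belong.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length (at least 2 values)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sample has no correlation")
    return sxy / math.sqrt(sxx * syy)


def describe_correlation(r):
    """Direction and strength of r using the cut-offs 1, 0.7, 0.3 and 0.

    Boundary values (exactly 0.7 or 0.3) go to the stronger class (assumption).
    """
    if math.isclose(r, 0.0, abs_tol=1e-12):
        return "no connection"
    direction = "direct" if r > 0 else "inverse"
    a = abs(r)
    if math.isclose(a, 1.0):
        strength = "complete"
    elif a >= 0.7:
        strength = "strong"
    elif a >= 0.3:
        strength = "average"
    else:
        strength = "weak"
    return f"{strength} {direction}"


# Hypothetical heights (cm) and body masses (kg) of five people
height = [160, 165, 170, 175, 180]
mass = [55, 60, 66, 70, 78]
r = pearson_r(height, mass)
print(f"r = {r:.2f}: {describe_correlation(r)}")
```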
Types of correlative connection
By direction:
• direct (+): as one characteristic increases, the mean value of the other
increases;
• inverse (-): as one characteristic increases, the mean value of the other
decreases.
By form:
• rectilinear: relatively uniform changes in the mean values of one
characteristic are accompanied by equal changes in the other (e.g. minimal
and maximal arterial pressure);
• curvilinear: with a uniform change in one characteristic, the mean values
of the other may increase over one range and decrease over another.
Terms Used To Describe The Quality Of Measurements
• Reliability is the variability between subjects divided by the sum of the
between-subject variability and the measurement error.
• Validity refers to the extent to which a test or surrogate measures what
we think it is measuring.
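Written as a formula (the symbols are introduced here for illustration), the definition of reliability reads as follows; values near 1 mean that measurement error is small compared with the true differences between subjects.

```latex
R = \frac{\sigma^2_{\text{between}}}{\sigma^2_{\text{between}} + \sigma^2_{\text{error}}}
```

For example, with a between-subject variance of 9 and an error variance of 1, R = 9 / (9 + 1) = 0.9.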

Measures Of Diagnostic Test Accuracy




Sensitivity is defined as the ability of the test to identify
correctly those who have the disease.
Specificity is defined as the ability of the test to identify
correctly those who do not have the disease.
Predictive values are important for assessing how
useful a test will be in the clinical setting at the individual
patient level. The positive predictive value is the
probability of disease in a patient with a positive test.
Conversely, the negative predictive value is the
probability that the patient does not have the disease when the
test result is negative.
The likelihood ratio indicates how much a given diagnostic
test result raises or lowers the odds of having the disease
relative to the pre-test odds.
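All of these measures come from the same 2x2 table of test result against true disease status. A minimal Python sketch; the function names and the counts are hypothetical. The second function applies the likelihood ratio as "post-test odds = pre-test odds x LR".

```python
def diagnostic_accuracy(tp, fp, fn, tn):
    """Accuracy measures from a 2x2 table of test result vs. true disease status.

    tp: diseased, test positive     fp: not diseased, test positive
    fn: diseased, test negative     tn: not diseased, test negative
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),  # probability of disease given a positive test
        "npv": tn / (tn + fn),  # probability of no disease given a negative test
        "lr_positive": sensitivity / (1 - specificity) if specificity < 1 else float("inf"),
        "lr_negative": (1 - sensitivity) / specificity,
    }


def post_test_probability(pretest_probability, likelihood_ratio):
    """Post-test odds = pre-test odds x likelihood ratio, converted back to a probability."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)


# Hypothetical study: 100 patients with the disease, 200 without
m = diagnostic_accuracy(tp=90, fp=20, fn=10, tn=180)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
print(post_test_probability(100 / 300, m["lr_positive"]))  # matches the PPV (about 0.818)
```

When the pre-test probability equals the prevalence in the study table, the post-test probability after a positive result reproduces the PPV, which is a useful consistency check.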
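Expressions Used When Making Inferences About Data

Confidence Intervals
- The results of any study sample are an estimate of the true value
in the entire population. The true value may actually be greater or
less than what is observed. A confidence interval gives a range of values,
calculated from the sample, that is expected to contain the true population
value with a stated level of confidence (e.g. 95%).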
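A minimal sketch of a 95% confidence interval for a mean, using the normal approximation (an assumption that suits large samples; small samples would call for a t-based interval). The function name and the blood-pressure values are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, level=0.95):
    """Normal-approximation confidence interval for the population mean.

    Reasonable for large samples; small samples would call for a t-based interval.
    """
    n = len(sample)
    m = mean(sample)
    se = stdev(sample) / math.sqrt(n)             # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)     # about 1.96 for a 95% interval
    return m - z * se, m + z * se


# Hypothetical systolic blood pressures (mmHg) of a study sample
sample = [118, 125, 132, 121, 140, 128, 135, 119, 127, 130]
low, high = mean_confidence_interval(sample)
print(f"mean = {mean(sample):.1f}, 95% CI {low:.1f} to {high:.1f}")
```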



Type I error (alpha) is the probability of incorrectly
concluding there is a statistically significant difference in
the population when none exists.
Type II error (beta) is the probability of incorrectly
concluding that there is no statistically significant
difference in a population when one exists.
Power is a measure of the ability of a study to detect a
true difference.
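Alpha, beta and power are linked through the sample size. The sketch below approximates the power of a two-sided z-test comparing two means; it assumes normally distributed outcomes with a known common standard deviation and equal group sizes, and the function name and numbers are hypothetical.

```python
import math
from statistics import NormalDist

def two_sample_power(delta, sigma, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two group means.

    Assumes normally distributed outcomes, a known common SD (sigma) and equal
    group sizes; the negligible chance of rejecting in the wrong direction is ignored.
    The type II error rate is beta = 1 - power.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)           # cut-off set by the type I error rate
    se = sigma * math.sqrt(2 / n_per_group)     # standard error of the difference
    return z.cdf(abs(delta) / se - z_crit)


for n in (20, 40, 63, 100):
    power = two_sample_power(delta=5, sigma=10, n_per_group=n)
    print(f"n = {n:3d} per group: power = {power:.2f}, beta = {1 - power:.2f}")
```

Larger samples raise power (lower beta), while a stricter alpha lowers power for the same sample size.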
Multivariable Regression Methods


Multiple linear regression is used when the
outcome data is a continuous variable such as
weight. For example, one could estimate the
effect of a diet on weight after adjusting for the
effect of confounders such as smoking status.
Logistic regression is used when the outcome
data is binary such as cure or no cure. Logistic
regression can be used to estimate the effect of
an exposure on a binary outcome after adjusting
for confounders.
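Adjusting for a confounder amounts to including it as an extra predictor. The pure-Python sketch below fits a multiple linear regression by ordinary least squares; the function name `ols`, the 0/1 coding of diet and smoking, and the data are hypothetical, and in practice a statistics package would normally be used.

```python
def ols(rows, y):
    """Ordinary least squares fit: returns [intercept, b1, b2, ...].

    rows: one list of predictor values per subject; y: continuous outcome.
    Solves the normal equations (X^T X) b = X^T y by Gauss-Jordan elimination.
    """
    X = [[1.0] + [float(v) for v in row] for row in rows]  # add intercept column
    n, k = len(X), len(X[0])
    # Augmented matrix [X^T X | X^T y]
    A = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)]
         + [sum(X[i][a] * y[i] for i in range(n))]
         for a in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))  # partial pivoting
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear: X^T X is singular")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [u - f * w for u, w in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical data: diet (1 = on the diet), smoker (1 = smokes), weight change in kg
rows = [[0, 0], [1, 0], [0, 1], [1, 1], [1, 0], [0, 1]]
weight_change = [2.0, -1.0, 3.5, 0.5, -1.0, 3.5]
intercept, b_diet, b_smoking = ols(rows, weight_change)
print(f"diet effect adjusted for smoking: {b_diet:.2f} kg")
```

The coefficient for diet is the estimated effect of the diet with smoking status held constant.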
Survival Analysis


Kaplan-Meier analysis calculates the ratio of surviving subjects (or those
without an event) to the total number of subjects at risk for the event.
Every time a subject has an event, the ratio is recalculated, and the
successive ratios are multiplied together to generate a curve that
graphically depicts the probability of survival over time.
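The product-limit calculation just described can be sketched in a few lines of Python. The function name and data are hypothetical; subjects censored at the same time as an event are counted as still at risk at that time, which is the usual convention.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier (product-limit) estimate of the survival function.

    times:  follow-up time for each subject
    events: 1 if the subject had the event at that time, 0 if censored
    Returns a list of (time, survival probability) at each time an event occurs.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        while i < len(data) and data[i][0] == t:  # group subjects with tied times
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            survival *= (at_risk - deaths) / at_risk  # ratio recalculated at each event
            curve.append((t, survival))
        at_risk -= deaths + censored  # both leave the risk set after time t
    return curve


# Hypothetical follow-up (months) for six patients; 0 = censored
print(kaplan_meier([2, 3, 3, 5, 8, 9], [1, 1, 0, 1, 0, 1]))
```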
Cox proportional hazards analysis is similar to
the logistic regression method described above
with the added advantage that it accounts for
time to a binary event in the outcome variable.
Thus, one can account for variation in follow-up
time among subjects.
Kaplan-Meier Survival Curves
Why Use Statistics?
[Figure: Cardiovascular mortality in males; standardized mortality ratio (SMR) by age group, '35-'44 to '75-'84, Bangor vs. Roseto]
Descriptive Statistics
• Identifies patterns in the data
• Identifies outliers
• Guides choice of statistical test

Percentage of Specimens Testing Positive for RSV (respiratory syncytial virus)
Region      Jul  Aug  Sep  Oct  Nov  Dec  Jan  Feb  Mar  Apr  May  Jun
South         2    2    5    7   20   30   15   20   15    8    4    3
Northeast     2    3    5    3   12   28   22   28   22   20   10    9
West          2    2    3    3    5    8   25   27   25   22   15   12
Midwest       2    2    3    2    4   12   12   12   10   19   15    8
[Figure: Percentage of specimens testing positive for RSV, 1998-99; monthly values from July to July for the South, Northeast, West and Midwest]
[Figure: Distribution of course grades; number of students by grade, A to F]
Describing the Data with Numbers
Measures of Dispersion
• RANGE: the spread from the lowest to the highest value (highest minus lowest)
• STANDARD DEVIATION: how closely the values cluster around the mean value
• SKEWNESS: refers to the symmetry of the curve
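The three measures can be computed with Python's standard library. The skewness formula chosen here, the adjusted Fisher-Pearson coefficient, is one of several in use and is an assumption of this sketch; the function name and data are hypothetical.

```python
import math
from statistics import mean, stdev

def dispersion(values):
    """Range, sample standard deviation and skewness of a list of numbers (n >= 3)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 values")
    m = mean(values)
    m2 = sum((v - m) ** 2 for v in values) / n   # second central moment
    m3 = sum((v - m) ** 3 for v in values) / n   # third central moment
    if m2 == 0:
        raise ValueError("all values are equal; skewness is undefined")
    g1 = m3 / m2 ** 1.5                          # moment coefficient of skewness
    return {
        "range": max(values) - min(values),      # highest minus lowest value
        "sd": stdev(values),                     # spread around the mean (n - 1 divisor)
        "skewness": math.sqrt(n * (n - 1)) / (n - 2) * g1,  # adjusted Fisher-Pearson
    }


print(dispersion([2, 4, 4, 5, 6, 7, 9]))      # roughly symmetric
print(dispersion([1, 1, 2, 2, 3, 4, 15]))     # long right tail: positive skewness
```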
The Normal Distribution




• Mean = median = mode
• Skew is zero
• About 68% of values fall within ±1 SD of the mean
• About 95% of values fall within ±2 SDs of the mean
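The figures 68% and 95% are rounded. A quick check with Python's `statistics.NormalDist` (standard library, Python 3.8+) shows that 95% corresponds more exactly to ±1.96 SD; the helper name `share_within` is hypothetical.

```python
from statistics import NormalDist

def share_within(k, mu=0.0, sigma=1.0):
    """Proportion of a normal distribution lying within k SDs of its mean."""
    d = NormalDist(mu, sigma)
    return d.cdf(mu + k * sigma) - d.cdf(mu - k * sigma)


for k in (1, 1.96, 2):
    print(f"within {k} SD of the mean: {share_within(k):.1%}")
# within 1 SD of the mean: 68.3%
# within 1.96 SD of the mean: 95.0%
# within 2 SD of the mean: 95.4%
```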
Mean, Median, Mode