Statistics and Informatics
www.vfu.cz/statistics
Statistics
is the science that allows us to formulate and describe
complex data (measurements, observations) in a short
form that is easily understood by all professionals.
Statistics is especially needed in more probabilistic and
less predictive sciences such as biology and applied
biology (medicine).
In a predictive science (such as math or physics) one only has to insert
data into an appropriate formula to obtain an accurate answer
(e.g. 1 + 1 = 2).
In biology and applied biology (medicine) we deal with living
organisms, which are very complex in their reactions and description.
There is a high level of uncertainty.
Every individual is unique, so the data obtained may be very
different and variable (genetic variability) – they need specific
(statistical) methods for their evaluation.
Only statistical methods can take this great
variability of biological data into account, evaluate the data and give correct
inferences about the biological objects studied.
When we use statistics for biological issues we can use the term
biostatistics.
Biostatistics
= statistics applied to biological problems (in particular in
research: how to design experiments and
evaluate their results)
- analyzes biological characters whose values differ from
one entity to another; these are termed variables
Different kinds of variables give different types (categories) of data, which may be treated differently depending on their
exactness
Types of Biological Data
• Data on Nominal Scale (Categorical Data) – are classified by some quality
(2 possibilities: present or not present –
disease, anomaly, death, vaccination … )
• Data on Ordinal Scale (Rank Data) – consist of an arrangement of
measurements based on a subjective scale
(classification by grades, points in competitions)
• Data on Numerical Scale – exact numeric values (obtained by
objective measurement with a device)
(body temperature, weight, length, volume etc.)
Categories of Data
Different methods of statistical examination
(different exactness)
Statistical methods suited to numerical or ordinal data
are more exact and are generally not applicable to nominal
data (too little information for exact methods).
The reverse is possible: less exact methods for nominal (or
ordinal) data can also be used for numerical data (e.g. for
preliminary analyses).
Formal viewpoint:
• Continuous Data - variables that can take any conceivable
value within the observed range
(height, length, weight, temperature)
• Discrete Data (discontinuous) - variables that can take only
certain values – integer numbers
(number of animals, patients, eggs, cells etc.)
Numerical- and ordinal-scale data may be continuous or discrete.
Nominal-scale data are discrete by their nature.
Statistical Sets
(groups of individuals – animals, plants, cells, items, etc.)
• Population (Universe) – N (number of members)
- "all items" that could show the studied variable
- is often very large (e.g. cattle in Europe, dogs in the Czech Republic or in the world)
- "endless" number of entities
In practice we are not able to obtain all possible measurements from the
population, so we analyze a small subset and draw inferences
about the population from it (the aim of statistics).
Statistical Sets
• Sample (Subset) – n (number of members)
- limited number of individuals from the population (which implies some inaccuracy
in evaluation compared with the whole population)
- to reach the most valid conclusions about a population, the sample must be
a representative subset of the population. This means:
• random sample (no subjective choice)
- drawing lots for registration numbers of animals,
table of random numbers, etc.
• appropriate size of the sample (the larger the better, but there are
practical limits - time, money etc.)
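A random selection of this kind can be sketched in a few lines of Python. The registration numbers, the sample size and the seed below are made-up values for illustration only; they do not come from the lecture.

```python
import random

# Hypothetical population: registration numbers of 500 animals (assumed values).
population = list(range(1001, 1501))
N = len(population)

# A fixed seed is used only so that the illustration is repeatable.
rng = random.Random(42)

# Draw n animals without replacement; every animal has the same chance
# of being chosen, so there is no subjective choice.
n = 20
sample = rng.sample(population, n)

print(f"N = {N}, n = {len(sample)}")
print(sorted(sample))
```

Because the draw is without replacement, no animal can appear in the sample twice.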
Characteristics of Variables
- Discrete
- Continuous
Statistical variables can be described by means of some specific terms:
Variant Sequence – listing of all observed values (variants)
- arranged in ascending or descending order
e.g.: 2,3,4,4,5,5,5,6,6,7,7,8 (discrete data - number of young in a litter)
Frequency of Variant – how many times each value is observed
Frequency Distribution – graphically presented distribution of all
observed frequencies in the sample
Frequency Distribution – Discrete Data:
(Bar Graph: x = number of pups, 1 to 8; y = frequency, 0 to 3)
Discrete data - number of pups in a litter: 2,3,4,4,5,5,5,6,6,7,7,8
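As a small illustration, the variant sequence and the frequency of each variant for the litter data above can be computed in Python. This is only a sketch; the text bar chart stands in for the bar graph.

```python
from collections import Counter

# Number of pups in a litter (the sample from the slide).
pups = [2, 3, 4, 4, 5, 5, 5, 6, 6, 7, 7, 8]

# Variant sequence: all observed values arranged in ascending order.
variant_sequence = sorted(pups)

# Frequency of variant: how many times each value is observed.
frequencies = Counter(variant_sequence)

# Text version of the bar graph, for x = 1 to 8 pups.
for value in range(1, 9):
    count = frequencies.get(value, 0)
    print(f"{value}: {'#' * count} ({count})")
```

The value 5 occurs three times, so it has the tallest bar; the value 1 was not observed and has frequency 0.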
Frequency Distribution – Continuous Data:
(histogram)
Continuous data: we create classes = equal-width intervals of data.
Number of classes: according to the sample size (up to 100 items: 6-9 classes;
up to 500 items: 10-15 classes)
(Figure: histogram with polygon (empirical curve, specific for one sample); x = weight, marked at the midpoint of each class; y = frequency)
All data in the interval get the same value = midpoint of the class
Number of items (individuals) in the interval = frequency of the class
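The grouping of continuous data into classes can be sketched as follows. The weights are invented for illustration, and the choice of 6 classes simply follows the rule of thumb for small samples given above.

```python
# Hypothetical body weights in kg (made-up values for illustration).
weights = [3.1, 3.4, 3.6, 3.8, 3.9, 4.0, 4.1, 4.1, 4.3, 4.4,
           4.5, 4.6, 4.8, 4.9, 5.0, 5.2, 5.3, 5.6, 5.9, 6.3]

k = 6  # number of classes (6-9 is suggested for up to 100 items)
low, high = min(weights), max(weights)
width = (high - low) / k  # all classes have the same width

# Frequency of the class = number of items that fall in the interval.
frequencies = [0] * k
for w in weights:
    # The maximum lies exactly on the upper limit, so it goes to the last class.
    index = min(int((w - low) / width), k - 1)
    frequencies[index] += 1

# Every item in a class is represented by the midpoint of that class.
midpoints = [low + (i + 0.5) * width for i in range(k)]

for mid, freq in zip(midpoints, frequencies):
    print(f"midpoint {mid:.2f} kg: frequency {freq}")
```

Plotting the frequencies against the midpoints and joining the points gives the polygon (empirical curve) for this one sample.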
Frequency (Probability) Distribution
(Figure: empirical curves (samples) and theoretical curve (population); x = weight; y = P(x), probability (proportion of cases))
Empirical curves for different samples (obtained from one population) lie
along a single theoretical (continuous) curve that describes
the probability distribution of the variable in the population.
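A short simulation can illustrate this relation between samples and the population. It assumes, purely for illustration, a normally distributed population of weights with mean 70 kg and standard deviation 10 kg; these numbers are not from the lecture.

```python
from statistics import NormalDist

# Hypothetical population: weights with mean 70 kg and SD 10 kg (assumed).
population = NormalDist(mu=70, sigma=10)

# Theoretical proportion of individuals weighing 65 to 75 kg.
theoretical = population.cdf(75) - population.cdf(65)


def empirical_proportion(n, seed):
    """Proportion of a random sample of size n that falls in 65-75 kg."""
    sample = population.samples(n, seed=seed)
    return sum(65 <= x <= 75 for x in sample) / n


print(f"theoretical P = {theoretical:.3f}")
for seed in (1, 2, 3):
    print(f"sample {seed} (n = 50): {empirical_proportion(50, seed):.3f}")
print(f"large sample (n = 20000): {empirical_proportion(20000, 0):.3f}")
```

The small samples scatter around the theoretical proportion, while the large sample comes close to it.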
Shapes of Probability Distributions
a) Normal (Gaussian) – symmetric bell curve (most common in biological data)
b) Asymmetric (right-skewed, left-skewed)
ad b) Extreme (decreasing, increasing)
c) Nonnormal (unknown, irregular, 2 or more peaks)
Proportions of Distribution
For each distribution we can define measures (quantiles) that divide a
group of ordered data into 2 parts (portions):
- values that are smaller than the quantile
- values that are larger than the quantile
50% quantile – x0.5 (median) divides a group into 2 halves
(Figure: distribution divided at x0.5 into two portions of 50% each)
Quartiles – divide a group of data into four equal parts
(Figure: distribution divided at x0.25, x0.5 and x0.75 into four portions of 25% each)
Deciles – divide a group of data into 10 equal parts
Percentiles – divide a group of data into 100 equal parts
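These quantiles can be computed with Python's standard `statistics` module. The sketch below reuses the litter-size sample from the earlier slides. Several interpolation conventions exist for sample quantiles, so other software may give slightly different quartiles and deciles for the same data.

```python
import statistics

# Number of pups in a litter (the sample used on the earlier slides).
pups = [2, 3, 4, 4, 5, 5, 5, 6, 6, 7, 7, 8]

# 50% quantile x0.5: half of the values lie below it, half above.
median = statistics.median(pups)

# Quartiles: three cut points (x0.25, x0.5, x0.75), four equal parts.
quartiles = statistics.quantiles(pups, n=4)

# Deciles: nine cut points, ten equal parts.
deciles = statistics.quantiles(pups, n=10)

print("median:", median)
print("quartiles:", quartiles)
print("deciles:", deciles)
```

The middle quartile is the same value as the median.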
Quantiles – Use in Statistics:
• Important quantiles and their corresponding proportions for the most
common distribution curves are tabulated in statistical tables
• They are used:
- as critical values in statistical hypothesis testing
- as coefficients in calculations (assessment of confidence
intervals of statistical parameters)
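As a rough sketch of the second use, the 97.5% quantile of the standard normal distribution can serve as the coefficient in an approximate 95% confidence interval for a mean. The normal quantile is used here only because it is available in Python's standard library; this is an assumption of the example, and for a sample this small a quantile of Student's t distribution from statistical tables would normally be used instead.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Number of pups in a litter (the sample used on the earlier slides).
pups = [2, 3, 4, 4, 5, 5, 5, 6, 6, 7, 7, 8]

# 97.5% quantile of the standard normal distribution (about 1.96);
# it leaves 2.5% of the distribution in each tail.
z = NormalDist().inv_cdf(0.975)

m = mean(pups)   # sample mean
s = stdev(pups)  # sample standard deviation
n = len(pups)

# Approximate 95% confidence interval for the population mean.
half_width = z * s / sqrt(n)
lower, upper = m - half_width, m + half_width

print(f"z = {z:.3f}")
print(f"mean = {m:.2f}, 95% CI approx. {lower:.2f} to {upper:.2f}")
```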