
The Scientific Method
Data Recording & Transformation
Recording Data
Field or experimental data must be recorded in a planned way
Variables under investigation may be:
direct measurements, e.g. weight, length, amount, pH
category frequencies, e.g. numbers of individuals by species, length range or colour
derived measurements, e.g. %, number / area, amount / time, amount / area / time
Datalogging tables must:
record all values needed to derive the value of a variable
permit derivation calculations to be recorded alongside the raw data
have clear headings and units for each value, e.g. Length (mm)
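As a minimal sketch (the quadrat size and counts are hypothetical), a recording table can hold each raw count together with the value needed to derive density, with the derived value stored alongside the raw data and a unit in every heading:

```python
# Hypothetical quadrat survey: raw counts, the value needed to derive density,
# and the derived density, all kept in the same row.
QUADRAT_AREA_M2 = 0.25  # assumed 0.5 m x 0.5 m quadrat

raw_counts = {"Q1": 12, "Q2": 7, "Q3": 0}  # hypothetical counts

rows = []
for quadrat, count in raw_counts.items():
    density = count / QUADRAT_AREA_M2  # derived value: number per m2
    rows.append((quadrat, count, QUADRAT_AREA_M2, density))

# Clear headings, each with its unit
print(f"{'Quadrat':<8}{'Count (n)':>10}{'Area (m2)':>11}{'Density (n/m2)':>16}")
for quadrat, count, area, density in rows:
    print(f"{quadrat:<8}{count:>10}{area:>11.2f}{density:>16.1f}")
```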
Lecture 3: Data Recording and Transformation
Recording Data
Frequency data are normally recorded as a tally, with each fifth stroke crossing out the previous four to allow quick mental addition!
Areas to be sampled can be measured out on a grid system (e.g. at 0.5 m intervals for 0.5 m × 0.5 m quadrats), and sample quadrat positions chosen from random number tables
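Map references give the Easting first (the distance east along the horizontal axis, which identifies the column), then the Northing second (the distance north up the vertical axis, which identifies the row)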
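A minimal sketch of choosing random quadrat positions on such a grid, assuming a hypothetical 10 m × 10 m area; Python's pseudo-random generator stands in for a printed random number table, and each position is quoted easting first:

```python
import random

# Assumptions for illustration only: a 10 m x 10 m sampling area marked out
# at 0.5 m intervals for 0.5 m x 0.5 m quadrats.
GRID_STEP_M = 0.5
AREA_SIDE_M = 10.0
CELLS_PER_SIDE = int(AREA_SIDE_M / GRID_STEP_M)  # 20 grid positions per side


def choose_quadrats(count, seed=None):
    """Pick `count` distinct quadrat corners as (easting, northing) in metres."""
    rng = random.Random(seed)
    cells = [(e, n) for e in range(CELLS_PER_SIDE) for n in range(CELLS_PER_SIDE)]
    chosen = rng.sample(cells, count)  # sampling without replacement: no repeats
    # Corners run from 0 to 9.5 m, so every quadrat lies inside the area;
    # the easting is always quoted first
    return [(e * GRID_STEP_M, n * GRID_STEP_M) for e, n in chosen]


for easting, northing in choose_quadrats(5, seed=1):
    print(f"E {easting:4.1f} m, N {northing:4.1f} m")
```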
Recording Data
Fixed (controlled) variables, e.g. temperature, humidity and container size, that have the potential to affect the value of the dependent variable must also be recorded
Preliminary experiments may be needed to set values for these
e.g. a temperature at which a bacterial culture will grow well
Laboratory instruments must be calibrated before variable values are recorded, e.g.
pH meters are checked and reset against buffer solutions
spectrophotometers must be zeroed against a blank solution containing the reagents but no product, then read against a range of known concentrations of the product; these readings are used to plot a calibration curve for the instrument
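A minimal sketch of using a calibration curve, assuming the absorbance of the standards is linear in concentration over the range used; the standards and the helper names `fit_line` and `concentration_from_absorbance` are hypothetical. A straight line is fitted by least squares and then inverted to read off an unknown sample:

```python
# Hypothetical standards: known product concentrations (mmol/L) and the
# absorbance read for each after zeroing against the blank.
standards_conc = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
standards_abs = [0.00, 0.11, 0.21, 0.30, 0.41, 0.50]


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(standards_conc, standards_abs)


def concentration_from_absorbance(a):
    """Read an unknown sample's concentration off the calibration line."""
    return (a - intercept) / slope


print(f"unknown sample: {concentration_from_absorbance(0.25):.3f} mmol/L")
```

Readings outside the range of the standards should not be extrapolated from the line.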
Consistent rounding of decimal numbers (up or down!) and a choice of significant figures that reflects the precision of the measurements are very important
By convention, a final digit of 5 is rounded up in scientific work (e.g. 3.45 becomes 3.5 to two significant figures)
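A minimal sketch of rounding to a chosen number of significant figures with 5s rounded up, using Python's `decimal` module; `round_sig` is a hypothetical helper. Values are passed as strings so they are not first distorted by binary floating point, and Python's built-in `round()` is avoided because it rounds halves to the nearest even digit:

```python
from decimal import Decimal, ROUND_HALF_UP


def round_sig(value: str, sig: int) -> Decimal:
    """Round a recorded value to `sig` significant figures, rounding 5s up."""
    d = Decimal(value)
    if d == 0:
        return d
    # Exponent of the last significant digit to keep
    exp = d.adjusted() - sig + 1
    return d.quantize(Decimal(1).scaleb(exp), rounding=ROUND_HALF_UP)


print(round_sig("3.45", 2))      # 3.5
print(round_sig("0.004567", 2))  # 0.0046
print(round(2.5))                # 2: built-in round() rounds halves to even
```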
Data Transformation
Transformation summarises the data and highlights trends, e.g.
Totals – sum all the data values for a variable; useful for comparison and as the basis for further calculations such as percentages
Percentages – describe the proportion of data falling into particular
categories
Rates – show how a variable changes with time and allow
comparison of data recorded over different time periods
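Reciprocals (1 ÷ variable) – reverse the order of magnitude of a variable (large values become small and vice versa) and can help data interpretation, e.g. converting the time taken to reach an end point into a rate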
Relative values – express data in relation to a standard value, providing context or helping application, e.g. egg output per 1000 hens per month, energy requirement per kg body weight
problem?
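A minimal sketch of the transformations listed above, applied to small hypothetical data sets (all values are invented for illustration):

```python
# Totals and percentages: hypothetical seed counts in three categories
counts = {"germinated": 45, "dormant": 30, "dead": 25}
total = sum(counts.values())
percentages = {k: 100 * v / total for k, v in counts.items()}

# Rate: hypothetical gas volume collected over a recording period
volume_ml, minutes = 18.0, 6.0
rate = volume_ml / minutes  # mL per minute

# Reciprocal: time to reach an end point converted to a rate (1 / time)
time_s = 40.0
reciprocal_rate = 1 / time_s  # per second

# Relative value: energy requirement per kg body weight
energy_kj, body_mass_kg = 8400.0, 70.0
energy_per_kg = energy_kj / body_mass_kg  # kJ per kg

print(total, percentages, rate, reciprocal_rate, energy_per_kg)
```

Expressing rates and relative values per standard unit is what allows data recorded over different periods, or from groups of different size, to be compared directly.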
Now attempt the two data transformation exercises in your workbook!
Descriptive Statistics
Three important mathematical descriptions of the distribution of data
Empirical frequency distributions
Measures of location
Measures of dispersion
Frequency Distributions
Show the frequency of occurrence of observations in a data set
Qualitative, non-numerical and discrete data (for at least one
variable) are usually depicted in a bar chart
Descriptive Statistics
The data are discontinuous, so the bars do not touch
[Figure: bar chart, "Mean frequency of vole jawbones in owl pellets"; horizontal axis Habitat (Woodland, Grassland, Cornfield, Riverbank); vertical axis 0–10%]
Data values may be entered on or above the bars, and multiple data sets can be displayed using differently coloured or hatched bars side by side
[Figure: pie diagram, "Mean frequency of vole jawbones in owl pellets"; segments labelled Woodland 4%, Grassland 7%, Cornfield 2%, Riverbank 9%]
A pie diagram displays the relative frequency of data in each category (shown numerically above the bars in the bar chart)
Absolute values may be entered alongside the segments
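A minimal sketch of how pie segments are sized: each category's angle is its share of the total, scaled to 360°. The values are the segment labels of the vole jawbone pie diagram:

```python
# Values taken from the pie diagram's segment labels
habitat_pct = {"Woodland": 4, "Grassland": 7, "Cornfield": 2, "Riverbank": 9}

total = sum(habitat_pct.values())
# Each segment's angle is its share of the total, scaled to 360 degrees
angles = {h: 360 * v / total for h, v in habitat_pct.items()}
for habitat, angle in angles.items():
    print(f"{habitat:<10}{angle:6.1f} deg")
```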
Descriptive Statistics
Continuous data are usually depicted in a histogram
[Figure: histogram, "Size Frequency in a Sample of Perch"; horizontal axis class intervals 0–1.9, 2.0–3.9, 4.0–5.9, 6.0–7.9, 8.0–9.9, 10.0–11.9, 12.0–13.9, 14.0–15.9; vertical axis frequency, 0–60]
The bars touch to depict continuity
The X (horizontal) axis usually records the class interval
This distribution is skewed to the left
Class intervals must be of equal width and clearly defined so that an observation can fall INTO ONE CLASS ONLY
e.g. 0–0.99, 1.00–1.99, 2.00–2.99, 3.00–3.99
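A minimal sketch of grouping continuous observations into equal-width classes, assuming half-open classes (lower bound included, upper bound excluded) so that every observation falls into exactly one class; `class_frequencies` is a hypothetical helper:

```python
import math
from collections import Counter


def class_frequencies(values, width, start=0.0):
    """Count observations in equal-width, half-open classes [lower, lower + width)."""
    counts = Counter(math.floor((v - start) / width) for v in values)
    first, last = min(counts), max(counts)
    # Include empty classes so the histogram has no missing bars
    return [
        (start + k * width, start + (k + 1) * width, counts[k])
        for k in range(first, last + 1)
    ]


for lower, upper, freq in class_frequencies([0.5, 1.0, 1.99, 2.0, 3.5, 3.9], 1.0):
    print(f"{lower:.2f} to <{upper:.2f}: {freq}")
```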
Descriptive Statistics
When comparing two or more frequency distributions in which the total numbers of observations differ, it is sometimes helpful to calculate relative frequency or cumulative relative frequency distributions
A plot of cumulative (relative) frequency against the upper class boundaries is called an ogive
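A minimal sketch of converting class counts into relative and cumulative relative frequencies, using two hypothetical samples of different size with the same classes. Plotting each cumulative value against its class's upper boundary would give the ogive (plotting is not shown):

```python
from itertools import accumulate


def relative_and_cumulative(counts):
    """Relative and cumulative relative frequencies for ordered class counts."""
    n = sum(counts)
    relative = [c / n for c in counts]
    # Accumulate the integer counts first, then divide, to avoid float drift
    cumulative = [c / n for c in accumulate(counts)]
    return relative, cumulative


# Hypothetical samples of different sizes, same four classes
sample_a = [2, 5, 3, 0]    # n = 10
sample_b = [8, 20, 10, 2]  # n = 40
print(relative_and_cumulative(sample_a))
print(relative_and_cumulative(sample_b))
```

Once expressed as proportions, both samples are on the same 0 to 1 scale and can be compared class by class.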
Measures of Location (Averages)
Average refers to several measures of the central tendency of a data set
► arithmetic mean ($\bar{x}$)
If x is a continuous variable and there are n observations in the sample, then the sample mean is
$\bar{x} = \frac{\sum x}{n}$
where Σ (sigma) means "sum of"
The mean is a good measure of central tendency when the data are distributed symmetrically, but it will be distorted by a few extremely small or large values of x (outliers)
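A minimal sketch of the sample mean (sum of x divided by n) and of how a single extreme value pulls it away from the centre of the data; both data sets are hypothetical:

```python
def arithmetic_mean(xs):
    """Sample mean: x-bar = (sum of x) / n."""
    return sum(xs) / len(xs)


symmetric = [2, 3, 4, 5, 6]
with_outlier = [2, 3, 4, 5, 36]  # one extreme observation replaces the 6

print(arithmetic_mean(symmetric))     # 4.0
print(arithmetic_mean(with_outlier))  # 10.0, larger than four of the five values
```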
Measures of Location (Averages)
► median
– the central value in a set of n observations arranged in rank order, with as many observations above it as below it
If n is an odd number, the median is the $\frac{n+1}{2}$th observation, counting from the smallest
If n is an even number, the median is half-way between the two central values (their mean)
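A minimal sketch of the median rule for odd and even n, with ranks counted from 1 as in the text and converted to Python's 0-based indices:

```python
def median(xs):
    """Median by the (n + 1) / 2 rank rule."""
    ranked = sorted(xs)
    n = len(ranked)
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)th observation; subtract 1 for 0-based indexing
        return ranked[(n + 1) // 2 - 1]
    # Even n: half-way between the two central values
    return (ranked[n // 2 - 1] + ranked[n // 2]) / 2


print(median([7, 1, 3]))         # 3
print(median([4, 1, 3, 2]))      # 2.5
print(median([2, 3, 4, 5, 36]))  # 4: unaffected by the outlier
```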
► mode
– the most commonly occurring observation in a data set
The modal class is the class into which most observations fall in a histogram
In a perfectly symmetrical, unimodal data set, the mean, median and mode have the same value
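A minimal sketch using Python's `statistics` module on a hypothetical symmetrical, single-peaked data set, where mean, median and mode coincide. A symmetrical data set with two peaks shows why a single peak matters: its mean and median still agree, but it has two modes:

```python
import statistics

# Hypothetical symmetrical data set with a single peak at 3
data = [1, 2, 2, 3, 3, 3, 4, 4, 5]
mean = statistics.mean(data)
med = statistics.median(data)
modes = statistics.multimode(data)  # every value tied for most frequent
print(mean, med, modes)

# Symmetrical but bimodal: the modes sit either side of the mean and median
bimodal = [1, 1, 2, 3, 3]
print(statistics.mean(bimodal), statistics.median(bimodal), statistics.multimode(bimodal))
```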
Measures of Dispersion
Four main expressions of the spread of data
►Range
– the difference between the largest and the smallest observations
►Interquartile range – the range of values enclosing the central 50% of the observations when they are arranged in order of magnitude (ranked)
►Variance – based on the deviation of each observation from the arithmetic mean
The variance is a very useful measure of data dispersion. Because some deviations are negative and some positive, the raw deviations from the mean always sum to zero, so each deviation is squared to make it positive, and the variance (s²) is calculated as:
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
Measures of Dispersion
s² is used to denote the sample variance and to distinguish it from the population variance, which is given the symbol σ² and is calculated by dividing by n rather than (n – 1)
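An equivalent form that is quicker to calculate by hand is $s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$; with n as the divisor (the population variance), this reduces to the mean of the squares minus the square of the mean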
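A minimal sketch checking that the definitional and shortcut formulas for the sample variance agree, and computing the population variance as the mean of the squares minus the square of the mean; the data set is hypothetical:

```python
def sample_variance(xs):
    """s^2: sum of squared deviations from the mean, divided by (n - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def sample_variance_shortcut(xs):
    """Equivalent form: (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)


def population_variance(xs):
    """sigma^2 (divisor n): mean of the squares minus the square of the mean."""
    n = len(xs)
    return sum(x * x for x in xs) / n - (sum(xs) / n) ** 2


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
print(sample_variance(data), sample_variance_shortcut(data), population_variance(data))
```

The shortcut form suits hand calculation, but with large values that vary only slightly it subtracts two nearly equal numbers, so the definitional form is the safer choice in software.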
Measures of Dispersion
►Standard deviation (s or SD) is the square root of the variance and the most popular measure of dispersion
It summarises the typical size of the deviations of the observations from the arithmetic mean, in the same units as the data
The population standard deviation (σ) is calculated using n rather than (n – 1), just as for the population variance σ², with deviations taken from the population mean, which is given the symbol µ (mu)
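A minimal sketch computing the four measures of dispersion for a hypothetical data set with Python's `statistics` module. Several conventions exist for quartiles; this uses `statistics.quantiles` with its default "exclusive" method, so other software may report a slightly different interquartile range:

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations

data_range = max(data) - min(data)
q1, _, q3 = statistics.quantiles(data, n=4)  # quartile cut points Q1, Q2, Q3
iqr = q3 - q1

s = statistics.stdev(data)       # sample SD: (n - 1) divisor
sigma = statistics.pstdev(data)  # population SD: n divisor
print(data_range, iqr, round(s, 3), sigma)
print(math.isclose(s, math.sqrt(32 / 7)))  # s is the square root of s^2
```

The sample SD is always slightly larger than the population form computed from the same values, because dividing by (n – 1) compensates for measuring deviations from the sample mean rather than from µ.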