Data Transformations
 For some data sets, it may be necessary to
transform variables
– e.g. change units (lb to kg, ˚C to ˚F, etc.)
• This is simply a change in the scale, and such
transformations are called ‘Linear’.
• Linear transformations consist of (1) multiplying all
the observations by a constant, (2) adding a constant
to all observations, or (3) both.
Data Transformations
 Multiplicative transformation example
– Y = weight in kg
– Y’ = weight in lb
– Y’ = 2.2Y
 Additive transformation example
– Measurements of nitrate (mg/l) → Y
• Y = 0.3, 0.35, 0.5, 0.42, 0.38, 0.56…
– Add 1 to each number → Y’
• Y’ = 1.3, 1.35, 1.5, 1.42, 1.38, 1.56…
Data Transformations
 Additive and Multiplicative example
– Body temperature measurements in ˚C (Y)
were taken for 47 women; if we convert to ˚F
(Y’):
• Y’ = 1.8Y + 32
 Multiplicative transformations affect S in
the same way that they affect the mean:
– e.g., if Y’ = 2.2Y, then mean Y’ = 2.2 × mean Y
(a mean Y of 22 gives a mean Y’ of 48.4)
– and likewise SY’ = 2.2SY
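A minimal sketch of how a linear transformation carries through to the mean and S, using the conversion Y’ = 1.8Y + 32 from the slide. The five temperatures are made-up values for illustration, not the measurements from the 47 women. The constant 32 shifts the mean but drops out of S, while the factor 1.8 rescales both.

```python
import statistics

# Hypothetical body temperatures in degrees Celsius (made-up values).
celsius = [36.4, 36.6, 36.8, 37.0, 37.2]

# Linear transformation Y' = 1.8Y + 32 (Celsius to Fahrenheit).
fahrenheit = [1.8 * y + 32 for y in celsius]

mean_c = statistics.mean(celsius)
sd_c = statistics.stdev(celsius)      # sample standard deviation S
mean_f = statistics.mean(fahrenheit)
sd_f = statistics.stdev(fahrenheit)

# The mean follows the full transformation; S only picks up the multiplier.
print(f"mean: {mean_c:.2f} C -> {mean_f:.2f} F (1.8 * mean + 32 = {1.8 * mean_c + 32:.2f})")
print(f"S:    {sd_c:.3f} C -> {sd_f:.3f} F (1.8 * S = {1.8 * sd_c:.3f})")
```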
Data Transformations
 Additive transformations, however, don’t affect S
        Original                    Transformed
        observations   Deviations   observations   Deviations
        0.36           -0.03        1.36           -0.03
        0.40            0.01        1.40            0.01
        0.42            0.03        1.42            0.03
        0.38           -0.01        1.38           -0.01
Mean    0.39                        1.39
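The same check can be run in a few lines. This is only a sketch: it takes the four original observations from the table, adds 1 to each, and recomputes the deviations from the mean and S. The helper name `deviations` is introduced here for illustration.

```python
import statistics

original = [0.36, 0.40, 0.42, 0.38]
transformed = [y + 1 for y in original]   # additive transformation Y' = Y + 1


def deviations(values):
    """Deviation of each observation from the mean, rounded to 2 decimals."""
    mean = statistics.mean(values)
    return [round(v - mean, 2) for v in values]


dev_original = deviations(original)
dev_transformed = deviations(transformed)

print(dev_original)      # [-0.03, 0.01, 0.03, -0.01]
print(dev_transformed)   # [-0.03, 0.01, 0.03, -0.01]

# Identical deviations mean an identical S (up to floating-point rounding).
sd_original = statistics.stdev(original)
sd_transformed = statistics.stdev(transformed)
print(f"S before: {sd_original:.4f}, S after: {sd_transformed:.4f}")
```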
Data Transformations
 Additive transformations thus effectively
move probability distributions to the left or
the right – but the shape of the histogram is
unchanged.
 Multiplicative transformations shrink or
stretch the probability distribution
Nonlinear Transformations
 These sorts of
transformations affect
data in more complex
ways.
 Examples:
Y’ = log(Y)
Y’ = √Y
Y’ = 1/Y
Y’ = Y²
Nonlinear Transformations
 These transformations do change the
essential shape of frequency distributions
 They are thus used to try and make
distributions more symmetric – i.e., are
tools to achieve normality.
Transformations to achieve normality
 If the distribution is skewed
to the right (the most
common problem) then each
of the following
transformations will help
produce a more symmetric
distribution.
 The transformations are
listed in order of how much
they will pull in a right-skewed distribution
(mildest first):
√Y
log10(Y) or ln(Y)
1/√Y
1/Y
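To see the ordering at work, the sketch below computes a skewness coefficient (third central moment divided by the 1.5 power of the second) for one sample before and after each transformation. Everything here is an assumption made for illustration: the sample is made up (exp(z) for evenly spaced z, so its logarithm is exactly symmetric), and the two reciprocal transformations are negated because a reciprocal reverses the order of the data, which would otherwise flip the direction of the skew and hide the comparison.

```python
import math


def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Made-up right-skewed sample: Y = exp(z) for z = -2.0, -1.75, ..., 2.0.
z_values = [i / 4 for i in range(-8, 9)]
y = [math.exp(z) for z in z_values]

# Ladder of transformations, mildest first. The reciprocals are negated
# so that larger Y still maps to larger Y' (an illustrative choice).
transforms = {
    "Y": lambda v: v,
    "sqrt(Y)": math.sqrt,
    "ln(Y)": math.log,
    "-1/sqrt(Y)": lambda v: -1 / math.sqrt(v),
    "-1/Y": lambda v: -1 / v,
}

skews = {name: skewness([f(v) for v in y]) for name, f in transforms.items()}

for name, value in skews.items():
    print(f"{name:>10}  skewness = {value:+.3f}")
```

For this particular sample the log transformation gives a skewness of essentially zero, and the two stronger transformations overshoot into negative (left) skew, so a stronger transformation is not automatically a better one.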
Transformations to achieve normality
 Percentage or
proportion data is a
special case – it often
appears binomially
distributed
– e.g., 0-100%, 0-1
 Here the appropriate
transformation is:
Y’ = arcsin(√Y)
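A short sketch of the arcsine square-root transformation. It assumes Y is expressed as a proportion between 0 and 1, so percentages are divided by 100 first. The function name `arcsine_sqrt` and the percentages are made up for illustration, and the result is in radians.

```python
import math


def arcsine_sqrt(proportion):
    """Return arcsin(sqrt(p)) in radians for a proportion p between 0 and 1."""
    if not 0 <= proportion <= 1:
        raise ValueError("proportion must lie between 0 and 1")
    return math.asin(math.sqrt(proportion))


# Made-up percentages; convert to proportions before transforming.
percentages = [5, 20, 50, 80, 95]
transformed = [arcsine_sqrt(p / 100) for p in percentages]

for p, t in zip(percentages, transformed):
    print(f"{p:3d}%  ->  {t:.3f} rad")
```

The transformed values run from 0 (for 0%) to π/2 (for 100%), and the scale is stretched near the two ends relative to the middle.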
Results
 Tables and figures - must have a purpose
Results: Tables
 When to use:
– Present numerical values
– Large amounts of information
 Rules
– Numbered consecutively
– Must be able to stand alone
– Vertical arrangement
– Title goes above the table
– Definitions/’explanations’ go below the table
“Bad Table”
Table 6. Growth rate of cell cultures and activity of ornithine
decarboxylase (ODC) and succinate dehydrogenase (SDH)
in Pseudomonas aeruginosa in response to various carbon
sources
Carbon Source                       Glucose   Sucrose   Mannitol
Growth rate (generations/h)         0.93      0.21      0.47
Activity of ODC (mol CO2/h)         12.6      6.9       1.5
Activity of SDH (mmol fumarate/h)   137.7     19.3      50.9
“Good Table”
Table 7. Growth rate of cell cultures and activity of ornithine decarboxylase
(ODC) and succinate dehydrogenase (SDH) in Pseudomonas aeruginosa in
response to various carbon sources
                                Enzyme activity
Carbon     Growth rate          ODC            SDH
Source     (generations/h)      (mol CO2/h)    (mmol fumarate/h)
Glucose    0.93                 12.6           137.7
Sucrose    0.21                 6.9            19.3
Mannitol   0.47                 1.5            50.9
Table 4. Response of male fighting fish (Betta splendens) to
their image in a mirror (a)
(a) Prior to the experiment, fish had been visually isolated from one another for
2 wk. Observation period for each fish was 30 s.
Results: Figures
 Use to illustrate important points
– summarize your data
 Number graphs consecutively
– separately from tables
 Must be able to stand alone
 Titles go below figure or on separate “Figure
Legends” page
 Know when to use specific types of graphs
– Bar graph vs histogram
– Scatter plot vs line graph
Bar graph (refer to page 57)
[Figure: bar graph of mean # of flowers/plant (y-axis, 0 to 25) by species (x-axis: C. rap., E. ang., H. aur.)]
Problems?
Bar graph (refer to page 57)
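[Figure: bar graph of mean # of flowers/plant (y-axis, 0 to 25) by species (x-axis: C. rap., E. ang., H. aur.), with separate bars for cleared and control quadrats]
[Figure: histogram of seed frequency (y-axis, 0 to 25) by distance from parent plant in cm (x-axis labels: 0 to 2, 4 to 6, 8 to 10, 12 to 14, 16 to 18, 20 to 22)]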
Results: Graphs
 Do not forget to include error bars
– Are your data significant?
– Are there differences?
 Complete figure legend
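The slides do not say which kind of error bar to plot. As one common choice, the sketch below computes the standard error of the mean (S divided by the square root of n) for each group. The flower counts are made-up numbers for eight quadrats per treatment, and the helper name `mean_and_se` is introduced here for illustration.

```python
import math
import statistics

# Hypothetical mean flower counts per plant in eight quadrats (made-up data).
cleared = [18, 22, 20, 19, 24, 21, 17, 23]
control = [12, 15, 11, 14, 13, 16, 10, 13]


def mean_and_se(values):
    """Return the mean and the standard error of the mean (S / sqrt(n))."""
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return mean, se


cleared_mean, cleared_se = mean_and_se(cleared)
control_mean, control_se = mean_and_se(control)

# Each bar would be drawn at the mean, with an error bar of +/- 1 SE.
print(f"Cleared: {cleared_mean:.1f} +/- {cleared_se:.2f}")
print(f"Control: {control_mean:.1f} +/- {control_se:.2f}")
```

Whichever measure is plotted (standard error, standard deviation, or a confidence interval), the figure legend should say which one it is.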
[Figure: bar graph of mean # of flowers/plant (y-axis, 0 to 25) by species (x-axis: C. rap., E. ang., H. aur.), with separate bars for cleared and control quadrats]
Figure 2. Production of flowers by three species of plants in the
absence of interspecific competition and under natural conditions
[Figure: bar graph of mean # of flowers/plant (y-axis, 0 to 25) by species (x-axis: C. rap., E. ang., H. aur.), with separate bars for cleared and control quadrats]
Figure 2. Production of flowers by three species of plants in the
absence of interspecific competition (cleared quadrats) and under
natural conditions (control quadrats). The plants were Campanula
rapunculoides, Epilobium angustifolium, and Hieracium aurantiacum.
Plotted are means for eight randomly chosen quadrats, each 1 m x 1 m.
 Text
– Data summary
– Do not discuss or draw conclusions
 Statistics
– Incorporate statistics into the verbal text
– Be careful when using the word “significant”
– Refer to appropriate tables and figures
• When do you use “Figure” and when do you use
“Fig.”?
As shown in Figure 1, the shoreline of Hicks
Pond was generally predominated by
grasses and sedges.
Observed frequencies of turtles obtaining food
differed significantly from expected
frequencies (χ² = 58.19, df = 8, P < 0.001; Fig.
2).