Basic reading, writing and informatics skills for biomedical research
Segment 6. Developing and presenting your project
27 June 2008
Copyright: Ganesha Associates
Contents
• Writing a project proposal
• Experimental design
• Making a presentation
Outline of a research proposal - 1
• Title
• Abstract
• Specific Aims
• Background & Significance
• Preliminary Data
• Methods
• Resources
Outline of a research proposal - 2
• Title
– What is the problem?
• Abstract
– Write it last
– State the problem and the specific aims of the
project
– Describe the main methodologies to be used
– State the significance of the work
– May be the only thing some reviewers read
Outline of a research proposal - 3
• Specific Aims
– One page
– Short conceptual narrative followed by well-defined
objectives and success criteria
– Relationship to experimental plan should be clear
• Background & Significance
– Helps reviewer understand why you have chosen this
particular problem and how it builds on previous work
– Shows you know what the important issues are and
why
• Preliminary Data
– Proof that the project is realistic and feasible
Outline of a research proposal - 4
• Methods
– Presents a detailed plan of attack for each specific aim
– Should support the costs proposed in the budget
– Describes how you will evaluate success in achieving your aims
– Provides a flow chart of logic for each experiment's results and the subsequent steps in the research plan
– Addresses sub-optimal methodologies and offers a rationale for their use
• Resources
– Includes a timetable, often at the end of the section, to make organizational and resourcing requirements apparent
– Budget
Methods - choose your model system carefully
• In vivo, in vitro, in silico
• Pharmacological, surgical, genetic
• Example: Fetal malnutrition and metabolic
syndrome
– Species: rat, mouse, human?
– Diet: global under-nutrition, low maternal protein, high-fat diet during pregnancy
– Single-generation or multigenerational study?
– Pharmacological, genetic or surgical model
– Disease: diabetes, hypertension, cardiovascular
– See review Brit. Med. Bull. 2001, 60, 103-121
Methodology – make sure you understand the
variables you are measuring
• What is the normal range of variation in measurement values?
• Do you know what causes this variation?
• What is the time course of the effect?
Project proposal – quick check list
• Why is the problem under study important?
– What is its economic or medical significance?
– What are the underlying key issues of basic scientific significance?
– Does the proposal establish strong links to the consensus view?
• How is the problem to be addressed experimentally ?
– Has an appropriate model system been chosen ?
– What information needs to be collected ?
– Which methods have been chosen for this purpose and why ?
• Limitations
– Have the most-likely reasons for failure been identified ?
– What is the ‘Fail early’ strategy ?
• Literature review
– Is it up-to-date ?
– Are all key points of logical development in the text backed by an
appropriate reference ?
Experimental design
• Hypothesis
• Assumptions, expectations
• Statistics
• Experiment 1
• Results
• Test assumptions
• Experiment 2
• Results...
Experimental design
• An experimental strategy, often involving
specialist statistical techniques, used to test
hypotheses involving independent and
dependent variables by means of manipulation
of variables, controls and randomization.
• A true experiment involves the random allocation
of participants to experimental and control
groups, manipulation of the independent
variable, and the use of a control group for
comparison purposes.
Early example of experimental design
• In 1747, while serving as surgeon on HMS Salisbury, James Lind carried out a controlled experiment to compare proposed cures for scurvy.
• Lind selected 12 men from the ship, all suffering from scurvy, and divided them into six pairs, giving each pair a different addition to their basic diet for a period of two weeks. The treatments were all remedies that had been proposed at one time or another.
Early example of experimental design
• They were
– A quart of cider per day
– Twenty-five gutts (drops) of elixir vitriol three times a day on an empty stomach
– Half a pint of seawater every day
– A mixture of garlic, mustard and horseradish,
in a lump the size of a nutmeg
– Two spoonfuls of vinegar three times a day
– Two oranges and one lemon every day.
Early example of experimental design
• The men who had been given citrus fruits
recovered dramatically within a week. One
of them returned to duty after 6 days and
the other became nurse to the rest. The
others experienced some improvement,
but nothing was comparable to the citrus
fruits, which were proved to be
substantially superior to the other
treatments.
Early example of experimental design
• In this study his subjects' cases "were as similar as I could have them"; that is, he imposed strict entry criteria to reduce extraneous variation.
• The men were paired, which provided
replication. From a modern perspective,
the main thing that is missing is
randomized allocation of subjects to
treatments.
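The missing randomization step can be sketched in a few lines. This is a minimal illustration using Python's standard `random` module, not Lind's procedure: the subject IDs, the shortened treatment labels, the `allocate` helper, and the seed are all hypothetical. Shuffling the subjects before cutting them into groups of two means the experimenter no longer chooses who receives which remedy.

```python
import random

# Hypothetical short labels for Lind's six remedies
TREATMENTS = ["cider", "elixir vitriol", "seawater",
              "garlic-mustard-horseradish", "vinegar", "citrus"]

def allocate(subjects, treatments, per_group, seed=None):
    """Randomly allocate subjects to treatments, per_group subjects each."""
    if len(subjects) != len(treatments) * per_group:
        raise ValueError("number of subjects must equal treatments x per_group")
    shuffled = list(subjects)
    # A seeded generator makes the allocation reproducible and auditable
    random.Random(seed).shuffle(shuffled)
    # Consecutive slices of the shuffled list become the treatment groups
    return {t: shuffled[i * per_group:(i + 1) * per_group]
            for i, t in enumerate(treatments)}

if __name__ == "__main__":
    sailors = [f"S{i:02d}" for i in range(1, 13)]  # hypothetical subject IDs
    for treatment, pair in allocate(sailors, TREATMENTS, per_group=2, seed=1747).items():
        print(treatment, pair)
```

In a real study the allocation list would be generated and held by someone other than the person assessing outcomes.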
Statistics
• There are many types of statistical tests
• Most can be carried out in Excel or with a
specialist statistics package
• The problems include:
– Selecting the right test (preferably before you do the
experiment)
– Understanding the assumptions on which the test is
based (which may have an impact on your
experimental design)
– Making sure the power of the test is adequate
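Checking power is easiest before the experiment, by estimating how many subjects each group needs. The sketch below uses the common normal-approximation formula for comparing two group means, n = 2(z₁₋α/₂ + z₁₋β)²σ²/δ², where δ is the smallest difference worth detecting and σ is the expected standard deviation (for example from a pilot study). It assumes equal group sizes and a common standard deviation; a t-distribution-based calculator will give a slightly larger n. The function name `n_per_group` is illustrative.

```python
import math
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate subjects per group for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * (z_{1-alpha/2} + z_{power})^2 * sd^2 / delta^2,
    assuming equal group sizes and a common standard deviation.
    """
    z = NormalDist()                    # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)           # about 0.84 for 80% power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2)

if __name__ == "__main__":
    # e.g. detect a 10 mmHg change when the SD is 10 mmHg
    print(n_per_group(delta=10, sd=10))
```

Halving the detectable difference roughly quadruples the required sample size, which is why the expected variation should be known before the experiment is planned.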
Variables
• The independent variables are the ones that the
researcher expects to be the cause of an outcome of
interest.
• The dependent variable is the outcome variable. In
experimental research, this variable is expected to
depend on a predictor (or independent) variable.
• For example, if a researcher wants to examine the effect
of a drug on blood pressure, the drug is the independent
variable, the blood pressure response the dependent
variable.
• An experiment can have more than one independent or dependent variable; e.g. multivariate ANOVA (MANOVA) analyses several dependent variables together
Some definitions
• For a data set, the mean is the sum of the
observations divided by the number of
observations.
• The mean is often quoted along with the
standard deviation which describes the spread
of the data about the mean.
• The standard error (of the mean) is the standard deviation of the distribution of sample means; it is estimated as the standard deviation divided by √n
• The variance is a measure of statistical
dispersion, the average of the squared
differences between sample values and the
expected value (mean).
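These quantities can be computed with Python's standard `statistics` module, as in the minimal sketch below (the `describe` helper is illustrative). One caveat: the variance as defined above (the average squared deviation, dividing by n) is what `statistics.pvariance` returns; when estimating from a sample, `statistics.variance` and `statistics.stdev` divide by n − 1 instead, and that is the convention used here.

```python
import math
import statistics

def describe(sample):
    """Summary statistics for one sample of measurements."""
    n = len(sample)
    mean = statistics.mean(sample)
    variance = statistics.variance(sample)  # sample variance, divides by n - 1
    sd = statistics.stdev(sample)           # square root of the sample variance
    sem = sd / math.sqrt(n)                 # standard error of the mean
    return {"n": n, "mean": mean, "variance": variance, "sd": sd, "sem": sem}

if __name__ == "__main__":
    print(describe([2, 4, 4, 4, 5, 5, 7, 9]))
```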
Measurement - analysis of variance
Measurement - 1
• Repeated measurements are rarely the same
• This variation can be displayed as a frequency histogram
• The variation may be due to experimental error or to natural variation in the variable being measured
• The standard deviation about the mean is a statistic that quantifies this variation
Measurement - 2
• When many observations are made, the histogram approaches a smooth curve
• In many cases this curve can be described by a mathematical equation, the ‘normal distribution’
• The normal distribution is fully defined by its mean and its standard deviation
• Note: biological data at best only approximate the normal curve
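For reference, the normal density with mean μ and standard deviation σ is the standard formula below. It shows why those two parameters define the curve completely; about 95% of values fall within μ ± 1.96σ.

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,
       \exp\!\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right),
\qquad
P(\mu - 1.96\,\sigma \le X \le \mu + 1.96\,\sigma) \approx 0.95
```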
Measurement - 3
• If you take a sample of n measurements of a variable that has a normal distribution (blue), you can calculate estimates of its mean and standard deviation
• If you repeat this sampling many times, the sample means form a second, narrower normal distribution (green: samples of n; red: samples of 4n)
• The standard deviation of this distribution of sample means is known as the standard error; it equals the standard deviation divided by √n, so quadrupling the sample size halves it
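The narrowing of the green and red curves can be reproduced by simulation. This is a minimal sketch using only the standard library: it draws many samples of size n from a normal population, records each sample mean, and reports the standard deviation of those means. The population values (mean 120, SD 15) and the helper name are hypothetical.

```python
import math
import random
import statistics

def simulated_standard_error(mu, sigma, n, repeats=4000, seed=1):
    """Standard deviation of `repeats` sample means, each from n normal draws."""
    rng = random.Random(seed)
    means = [statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
             for _ in range(repeats)]
    return statistics.stdev(means)

if __name__ == "__main__":
    for n in (10, 40):  # n and 4n
        print(n, round(simulated_standard_error(120, 15, n), 2),
              "theory:", round(15 / math.sqrt(n), 2))
```

The simulated values should sit close to σ/√n, and the value for 4n should be roughly half the value for n.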
Measurement - 4
• Imagine that the green curve is the distribution of possible means of n blood pressure measurements in control animals, and the purple curve is the corresponding distribution for animals receiving the drug
• The actual mean recorded for the test animals is shown by the grey arrow on the left, that for the controls by the arrow on the right
• Can I tell from these measurements whether the drug had an effect?
Measurement - 5
• No!
• All I can do is calculate the probability of obtaining a difference at least this large if both sets of measurements came from the same normal distribution, i.e. under H0, the null hypothesis ‘there is no effect’
• If this probability is sufficiently low, usually p < 0.05, I may choose to reject H0. But I could still be wrong...
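A t-test is the usual parametric choice for this comparison. To show the underlying idea with the standard library alone, the sketch below swaps in a different technique, an exact permutation test: if H0 is true, the group labels are arbitrary, so the p-value is the fraction of all possible relabelings of the pooled data that give a difference in means at least as large as the one observed. The blood pressure values and the helper name are hypothetical.

```python
from itertools import combinations

def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test for a difference in means.

    Returns the fraction of all splits of the pooled data into groups of the
    original sizes whose |difference in means| is at least the observed one.
    Only practical for small samples (C(n_a + n_b, n_a) splits are enumerated).
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    extreme = total = 0
    for chosen in combinations(range(len(pooled)), n_a):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen_set]
        diff = abs(sum(a) / len(a) - sum(b) / len(b))
        total += 1
        if diff >= observed - 1e-9:  # small tolerance for floating-point rounding
            extreme += 1
    return extreme / total

if __name__ == "__main__":
    control = [120, 118, 125, 122, 119]  # hypothetical systolic pressures (mmHg)
    treated = [110, 108, 112, 109, 111]
    print(permutation_p_value(control, treated))
```

With these made-up values every treated reading is below every control reading, so only 2 of the 252 possible splits are as extreme as the observed one (p ≈ 0.008). Even so, a small p-value does not prove the drug works; it only says the data would be unusual if H0 were true.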
Statistical tests
• Most statistical tests begin with the assumption that each data sample (control, test, etc.) was drawn from the same population, i.e. that there is no treatment effect
• Parametric tests also assume that the individual measurements are normally distributed (or can be transformed so that they approximate a normal distribution)
Assumptions
• Controls and test subjects must come from identical populations
– Age, gender, medical history, genetics...
• Data are independent
• Effects of multiple testing have been accounted
for
• Sources of human bias have been controlled for
• The power of the statistical test is sufficient to
detect the change predicted. Use a positive
control
Assumptions - controls
• Suppose a farmer wishes to evaluate a new
fertilizer. She uses the new fertilizer on one field
of crops (A), while using her current fertilizer on
another field of crops (B).
• The irrigation system on field A has recently
been repaired and provides adequate water to
all of the crops, while the system on field B will
not be repaired until next season.
• She concludes that the new fertilizer is far superior, but the fields also differ in irrigation, so the comparison is confounded.
• Examples from clinical genetics
Assumptions – independence (1)
• Statistical tests are based on the assumption that
each subject was sampled independently of the rest.
• Consider the following three situations:
– You are measuring blood pressure in animals.
– You have five animals in each group, and measure the blood
pressure three times in each animal.
– You do not have 15 independent measurements, because
the triplicate measurements in one animal are likely to be
closer to each other than to measurements from the other
animals.
– You should average the three measurements in each
animal.
– Now you have five mean values that are independent of
each other.
Assumptions – independence (2)
– You have done a biochemical experiment three times,
each time in triplicate. You do not have nine
independent values, as an error in preparing the
reagents for one experiment could affect all three
triplicates. If you average the triplicates, you do have
three independent mean values.
– You are doing a clinical study, and recruit ten patients
from an inner-city hospital and ten more patients from
a suburban clinic. You have not independently
sampled 20 subjects from one population. The data
from the ten inner-city patients may be closer to each
other than to the data from the suburban patients. You
have sampled from two populations, and need to
account for this in your analysis.
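The first two situations come down to the same step: collapse repeated (technical) measurements to one value per independent unit before running the test. A minimal sketch, assuming the data arrive as (unit, value) pairs; the animal IDs, readings, and the helper name are hypothetical.

```python
from collections import defaultdict
from statistics import fmean

def average_replicates(measurements):
    """Collapse repeated measurements to one mean per experimental unit.

    measurements: iterable of (unit_id, value) pairs, e.g. (animal, blood pressure).
    Returns {unit_id: mean value}, i.e. one independent value per unit.
    """
    by_unit = defaultdict(list)
    for unit, value in measurements:
        by_unit[unit].append(value)
    return {unit: fmean(values) for unit, values in by_unit.items()}

if __name__ == "__main__":
    readings = [("rat1", 121), ("rat1", 119), ("rat1", 123),
                ("rat2", 131), ("rat2", 129), ("rat2", 130)]
    print(average_replicates(readings))
```

The statistical test is then run on the per-animal means, so n is the number of animals, not the number of readings. The third situation (two recruitment sites) needs more than averaging, e.g. including site as a factor in the analysis.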
Assumptions – multiple tests
• If you test several independent null hypotheses, and
leave the threshold at 0.05 for each comparison, there is
greater than a 5% chance of obtaining at least one
"statistically significant" result by chance
• For example, if you test three null hypotheses and use
the traditional cutoff of p<0.05 for declaring each p value
to be significant, there would be a 14% chance of
observing one or more significant p values, even if all
three null hypotheses were true.
• To keep the overall chance at 5%, you need to lower the
threshold for significance to 0.0170.
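The figures above follow from simple arithmetic, assuming the tests are independent. The chance of at least one false positive among k tests at level α is 1 − (1 − α)^k, and the threshold of 0.0170 corresponds to the Šidák correction, 1 − (1 − α)^(1/k). The simpler Bonferroni correction, α/k, is slightly more conservative. The function names below are illustrative.

```python
def familywise_error(alpha, k):
    """Chance of at least one false positive across k independent tests at level alpha."""
    return 1 - (1 - alpha) ** k

def sidak_threshold(alpha, k):
    """Per-test threshold that keeps the familywise error at alpha (independent tests)."""
    return 1 - (1 - alpha) ** (1 / k)

def bonferroni_threshold(alpha, k):
    """Simpler, slightly more conservative per-test threshold."""
    return alpha / k

if __name__ == "__main__":
    print(round(familywise_error(0.05, 3), 3))      # the 14% figure
    print(round(sidak_threshold(0.05, 3), 4))       # the 0.0170 threshold
    print(round(bonferroni_threshold(0.05, 3), 4))
```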
Assumptions - bias
• Double blind experiments
• A research design in which neither the experimenter nor the subjects know which subjects are in the treatment group (drug) and which are in the control group (placebo)
Types of statistical test - 1
• Number of independent variables
– Drug, diet...
• Number of dependent variables
– Blood pressure, heart rate, glucose levels...
• Type of data
– Suited to parametric or non-parametric tests
Types of statistical test - 2
• Student's t-test
• Chi-square test
• Analysis of variance (ANOVA)
• Mann-Whitney U
• Regression analysis
• Factor analysis
• Correlation
• Pearson product-moment correlation coefficient
• Spearman's rank correlation coefficient
• Time series analysis
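The difference between the two correlation coefficients listed above can be shown directly: Spearman's coefficient is Pearson's coefficient applied to the ranks of the data, so it measures monotonic rather than strictly linear association. This is a minimal standard-library sketch; the helper names are illustrative. Python 3.10+ also provides `statistics.correlation` for Pearson's r.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i..j
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's rank correlation: Pearson's coefficient on the ranks."""
    return pearson(ranks(x), ranks(y))

if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    y = [v ** 3 for v in x]  # monotonic but not linear
    print(round(pearson(x, y), 3), spearman(x, y))
```

For this curved but strictly increasing relationship, Spearman's coefficient is exactly 1 while Pearson's is about 0.94.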
Types of statistical test - 3
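• Interval (continuous data): parametric tests
– 0.32, 1052, etc.
– Often approximately normally distributed
• Nominal (categorical data): non-parametric tests
– Male, pregnant, red
– Counts in categories (binomial when there are two)
• Ordinal (ranked data): non-parametric tests
– First, third
– Ordered by rank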
Types of statistical test - 4
Types of statistical test - 5
When we have more than two groups, it is inappropriate to
simply compare each pair using a t-test because of the
problem of multiple testing.
The correct way to do the analysis is to use a one-way
analysis of variance (ANOVA) to evaluate whether there
is any evidence that the means of the populations differ.
If the ANOVA leads to a conclusion that there is evidence
that the group means differ, we might then be interested
in investigating which of the means are different.
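The core of a one-way ANOVA is the F statistic, the ratio of the between-group mean square to the within-group mean square. The sketch below computes F and its degrees of freedom with the standard library; the function name is illustrative. Turning F into a p-value requires the F distribution, which the standard library does not provide, so in practice you would use a statistics package (for example, `scipy.stats.f_oneway` returns both F and p).

```python
from statistics import fmean

def one_way_anova_f(*groups):
    """F statistic for a one-way ANOVA, with its two degrees of freedom."""
    all_values = [v for g in groups for v in g]
    grand_mean = fmean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean
    ss_within = sum((v - fmean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

if __name__ == "__main__":
    print(one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```

A large F means the group means differ more than the scatter within groups would explain. It says nothing about which groups differ, which is where post-hoc tests come in.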
Types of statistical test - 6
Tukey's multiple comparison test is one of several tests that
can be used to determine which means amongst a set of
means differ from the rest.
The results are presented as a matrix showing the result for
each pair, either as a P-value or as a confidence interval.
Like the t-test, Tukey's multiple comparison test assumes that the data from the different groups come from populations where the observations have a normal distribution and the standard deviation is the same for each group.
“Why most published research findings
are false”
• There is increasing concern that most current published research
findings are false.
• A research finding is less likely to be true:
– when the studies conducted in a field are smaller
– when effect sizes are smaller
– when there is a greater number and lesser pre-selection of tested
relationships
– where there is greater flexibility in designs, definitions, outcomes, and
analytical modes
– when there is greater financial and other interest and prejudice
– when more teams are involved in a scientific field in chase of statistical
significance.
• For many current scientific fields, claimed research findings may
often be simply accurate measures of the prevailing bias.
John Ioannidis, PLoS Medicine, 30 Aug 2005
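Part of Ioannidis's argument is arithmetic. In the no-bias case of that paper, the positive predictive value (PPV, the probability that a "significant" finding is true) is (1 − β)R / (R − βR + α), where R is the pre-study odds that a tested relationship is real, α is the significance threshold, and 1 − β is the power. The minimal sketch below evaluates that expression; the function name and example values are illustrative, and bias is ignored.

```python
def positive_predictive_value(prior_odds, alpha=0.05, power=0.80):
    """Probability that a statistically significant finding is true (no bias).

    prior_odds (R): ratio of true to false relationships among those tested.
    Computes PPV = (1 - beta) * R / (R - beta * R + alpha), with beta = 1 - power.
    """
    beta = 1 - power
    return (1 - beta) * prior_odds / (prior_odds - beta * prior_odds + alpha)

if __name__ == "__main__":
    for r in (1.0, 0.1, 0.01):  # from even odds to a speculative field
        print(r, round(positive_predictive_value(r), 2),
              round(positive_predictive_value(r, power=0.2), 2))
```

Lower pre-study odds and lower power (small studies, small effects) both pull the PPV down, which matches the list of risk factors above.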
Learning points
• If you aren’t certain how much variation to
expect in your experiment, try a small
scale preliminary version.
• The more measurements you take, the
greater the precision, but
• First try to identify and eliminate some of
the sources of variation
Collecting data – keep a notebook
Collecting data – make a spreadsheet
Collecting data – check key
assumptions
[Figure: colonization (%) by sub-area (16 I-III, 12 I-III, 8 I-III, 4 I-III, Mata (forest) I-III), Jan-05 and Aug-05 series, each with a linear trend line]
Beware, in biology there are many unknowns
“As we know,
There are known knowns.
There are things we know we know.
We also know
There are known unknowns.
That is to say
We know there are some things
We do not know.
But there are also unknown unknowns,
The ones we don't know
We don't know.”
Donald Rumsfeld, US Secretary of Defense (sic)
Feb. 12, 2002, Department of Defense news briefing
from "The Poetry of D.H. Rumsfeld"
http://slate.msn.com/id/2081042/
Presenting your ideas
• Create a slide show that is an outline, not a
script
• Use the slide show...
– to select important information and visuals
– to organize content
– to create a hierarchy
• Many of the subsequent slides were adapted from work done by the Cain Project in Engineering & Professional Communication
• www.owlnet.rice.edu/~cainproj
Selecting Content
• Consider your audience – not everyone
will have your knowledge of the problem!
• State problem/question clearly, early and
repeat (in the title, in the introduction)
• Explain the significance, context
• Include background:
organism/system/model
• State the point of departure for work
precisely
Displaying Text
• Remember that your audience...
– skims each slide
– looks for critical points, not details
– needs help reading/seeing text
– So keep to an outline only
• Help your audience by…
– Projecting a clear font
– Using bullets
– Using content-specific headings
– Using short phrases
– Using grammatical parallelism
Project a clear font
• Serif: easy to read in printed documents
– Times New Roman, Palatino, Garamond
• Sans serif: easy to see projected across
the room
– Arial, Helvetica, Geneva
Use bullets – but not too many
• Bullets help your audience
– to skim the slide
– to see relationships between information
– to organize information logically
• For example, this is Main Point 1, which leads
to...
– Sub-point 1
• Further subordinated point 1
• Further subordinated point 2
– Sub-point 2
Use content-specific headings
• “Results” suggests the content area
for a slide
• “Substance X up-regulates gene Y”
(with data shown below) shows the
audience what is observed
Use short phrases
• Be clear, concise, accurate
– Difficult to read: “DNA polymerase catalyzes elongation of DNA chains in the 5’ to 3’ direction”
– Better: “DNA polymerase extends 5’ to 3’”
• Write complete sentences only in certain cases:
– Hypothesis / problem statement
– Quote
– ???
Use grammatical parallelism
• Use same grammatical form in lists
• Not Parallel:
– Cells were lysed in buffer
– 5 minute centrifuging of lysate
– Removed supernatant
• Parallel:
– Lysed cells in buffer
– Centrifuged lysate for 5 minutes
– Removed supernatant
Use grammatical parallelism
How would you revise this list?
Telomeres
• Contain non-coding DNA
• Telomerases can extend telomeres
• Cells enter senescence/apoptosis when telomeres
are too short
Use grammatical parallelism
One possible revision…
Telomeres
• Contain non-coding DNA
• Are extended by telomerase
• Cause senescence/apoptosis when shortened too
much
Displaying visuals
• Select visuals that enhance understanding
– Figures from your work: evidence for
argument
– Figures from other sources (web; review
articles):
• Model a process or concept
• Help explain background, context
• Design easy-to-read visuals
– Are the visuals easy to read by all members of
your audience?
• Draw attention to aspects of visuals
Simplify and draw attention
http://www.indstate.edu/thcme/mwking/tca-cycle.html
Cite others’ visuals
Harvey et al. (2005) Cell 122:407-20
http://www.bioc.rice.edu/~shamoo/shamoolab.html
Samples
Features to consider:
• Text
– Fonts, use of phrases, parallelism
• Visuals
– Readability, drawing attention
• Slide design
• Organization/ hierarchy
– Titles, bullets, arrangement of information, font size
The Calcium Ion
Calcium is a crucial cell-signaling molecule
– Calcium is toxic at high intracellular concentrations because of the phosphate-based energy system
–Intracellular concentrations of calcium are
kept very low, which allows an influx of
calcium to be a signal to alter transcription
Microarrays
Phillips G. (2004) Iowa State University College of Veterinary Medicine.
Presenting
• Delivery
• Handling questions
Delivery
• Physical Environment
• Stance
– Body language
– Handling notes
• Gestures
• Eye contact
• Voice quality
– Volume
– Inflection
– Pace
Handling Questions
• LISTEN
• Repeat or rephrase
• Watch body language
• Don’t pretend to know
Practical activity 6a - Developing and
presenting your project
• Total duration - ca. 2 hours.
• Identify the five most important research articles that frame your
hypothesis, i.e. the fundamental facts and assumptions upon which
your idea is based.
• Describe the basis for your hypothesis in a paragraph of no more
than seven sentences.
• Read the article by Peter Norvig on experimental design.
• What alternative experimental approaches are available to answer your question?
• How do you intend to verify your hypothesis?
• Identify the journal in which you want to publish the results of your research, and justify your choice.
• Give a 5-slide presentation to justify your choices at the next
session.
Practical activity 6b - Thinking about
probability and statistics
• Total duration - ca. 3 hours.
• First read the series of articles published recently by Wai-Ching Leung in the British Medical Journal. Although intended for a medical audience, these articles provide a useful primer for most fields of biomedical research. The articles are:
– Why and when do we need medical statistics
– Measuring chances
– Summarising information
– Testing hypotheses
• Now answer the following questions:
• I have a plant extract which I believe has an effect on blood pressure. I measure its effects by injecting the substance into rats and measuring their blood pressure before and after the injection. The statistical test I use tells me that the probability of collecting this sample of results is less than 0.05. What does this mean?
• 1% of women aged forty who participate in routine mammography screening have breast cancer. 80% of the women with breast cancer get a positive result. 9.6% of women without breast cancer will also get a positive result. So, if a woman from this group gets a positive result, what is the probability that she has breast cancer?
• In the UK, car registration plates typically consist of a string of 6 or 7 alphanumeric characters (A, B, C, etc., 1, 2, 3, etc.). So the probability of a specific sequence of characters (e.g. DB1979) is less than 1 in 2 billion. I send a small group of people out into a car park and ask them to look for a registration plate that has personal significance for them. What is the likelihood of this happening?
• A friend of mine has consistently predicted the results of 5 of the football matches leading up to today's final. He is offering to sell me his prediction for the final match so that I can place a bet and make some money. What are the odds that he will predict the outcome of the last match correctly?
• A murder is committed. Traces of your fingerprints are found on the murder weapon. What is the probability that you are guilty?
Practical activity 6c - Presenting data
• Total duration - ca. 1 hour.
• Read Mary Purugganan's presentation
about data visualisation. Identify some
examples of illustrations used in recent
primary research papers which illustrate
some of the points she makes.