Powerpoint for lesson


Introduction to Gene Mining
Part A: BLASTn-off!
After Part A you will demonstrate your ability to:
Use the bioinformatics NCBI Gene and BLASTn
tools to search for a human gene of interest in a
plant model.
Evaluate the significance of your search results
to see how similar human and plant
genes might be.
1
The Arabidopsis Information Portal is funded by a grant from
the National Science Foundation (#DBI-1262414)
and co-funded by a grant from the Biotechnology and
Biological Sciences Research Council (BB/L027151/1).
These lessons were developed during the summer of 2015 as
education outreach for the www.Araport.org portal in
conjunction with the J. Craig Venter Institute, Rockville, MD,
20850, USA.
Contact information
General information: [email protected]
Jason Miller, Grant Co-Principal Investigator, JCVI
[email protected]
This lesson was prepared by Andrea Cobb, Ph.D.
([email protected])
with the help of Margot Goldberg
([email protected])
The images below are all examples of….?
3
What science models do you recall?
Lipid bilayer model
Lock and key model of enzymes
Stickleback model of evolution
Computer models
Experimental model of osmosis
4
Why use models instead of the “real thing”?
To simplify a complex system
Example: Study an enzyme reaction
in a test tube rather than in the
whole organism which contains many enzymes.
To better manipulate and measure an effect
Example: Treat Drosophila with drug X and measure the
drug’s effect on Drosophila life span.
To predict (test the model)
Example: Use a computer model to find protein coding
regions in the DNA of a newly sequenced genome.
Other ideas?
5
Thanks for volunteering for our study. Your chart
says you have problems eating, facial weakness
and overall poor muscle tone. Looks like your
mother had the same symptoms.
Your diagnosis is nemaline myopathy. I am sad to
tell you that no known treatment exists, but my
researchers and I are working hard to find a
treatment.
Thanks for your help, Doctor!
You can find information on this genetic disorder
in a website called Online Mendelian Inheritance
in Man http://www.OMIM.org
The OMIM database shows that you might have
a mutation in your Actin alpha 1 gene.
We won’t experiment on you! It is much faster,
kinder and less expensive to use a plant model.
Which plant will you use to
study a version of my actin alpha 1
(ACTA1) gene?
https://www.youtube.com/watch?v=foHiKrlY9Qc explains why
scientists use a certain plant for a model
7
https://www.arabidopsis.org/portals/education/aboutarabidopsis.jsp
8
Can plants really be used as models
for studying human diseases?
9
Xiang Ming Xu and Simon Geir Møller, Current Opinion in
Biotechnology, 2011, 22, 300-307.
10
Before we find out whether plants have
human muscle genes, it would be
important to know if plants move!
• http://www.bbc.co.uk/programmes/p00lx6cl
• https://www.youtube.com/watch?v=eDA8rmUP5ZM
http://aboutlifting.com/music-helps-plantsgrow-and-will-help-muscles-grow/
11
Why don’t you rest? I am going to search the OMIM
database to find out more about your possible gene
mutation.
Use your computer and go to:
http://www.OMIM.org
and find out more about nemaline
myopathy and the ACTA1 gene that
may be involved.
After you answer questions on
your handout, type in any human
disease that interests you and
examine the results.
12
• Use your computer to find:
http://www.OMIM.org and learn more about
nemaline myopathy and the ACTA1 gene that
may be involved.
• After you answer questions on your handout,
search for any human disease that interests
you and examine the information.
13
Use your textbook, open access textbooks, videos
and databases to begin to find information about
muscle genes and proteins.
https://www.boundless.com/biology/
14
Usually, a general search engine will give
you too many hits for the question below!
15
108 results
Even a broad scientific database may provide too many unrelated hits!
Why are there SO MANY results?
16
“BIG DATA”
https://en.wikipedia.org/wiki/List_of_RNAs
Biologists can increasingly generate enormous amounts of data quickly, but
analyzing that data may take weeks or even years. Data transfer protocols are not
interchangeable, data storage is expensive, and queries can crash!
17
What scientific approach
finds better information?
• Bioinformatics is an interdisciplinary approach which uses
computational, mathematical, and engineering methods to analyze
and make discoveries from enormous data sets.
18
To address the problem of BIG DATA, scientists can share data and
analysis with other scientists. This speeds analysis and adds expertise.
Scientists can share their data in research-specific portals.
These research-specific portals usually have customized
bioinformatics tools.
19
A few examples of how bioinformatics is used….
Use | Questions addressed
Basic research | How is DNA organized in chromosomes? Are genes related to other genes? Given sequence data, how do we find a gene? How are genes expressed in response to the environment?
Biomedicine | Will this drug work on this patient? Can we cure genetic diseases? Which genetic variations are associated with heart disease? Which pathogen proteins are best for vaccine development?
Microbiology | Can microbes remove pollution? Can microbes decrease the impact of climate change? Where did a disease originate?
Agriculture | Can drought resistant plants be identified, bred or engineered? Can insect resistant plants improve food supplies? Can more healthful food sources be developed?
20
Scientists are more likely to find useful information in
bioinformatics portals that support their particular research.
21
Araport
https://www.araport.org/
National Center for Biotechnology Information
http://www.ncbi.nlm.nih.gov/gene
FLOR-ID
http://www.phytosystems.ulg.ac.be/florid/
Examples of increasingly specific research-centered portals
22
For our plant model to be useful for my
research, I must find a similar plant
version of the ACTA1 gene involved in
nemaline myopathy.
Since plants and animals both move, do
they use the same types of proteins to
move?
Do they have the same genes coding for
these proteins?
23
Begin your search on the NCBI portal to find
names of human muscle genes.
Go to http://www.ncbi.nlm.nih.gov/, enter the information shown, and use the pull-down menu to
select Gene. (Note: Araport.org and similar genome browsers will also allow you to search
for genes and proteins of interest.)
24
Could plant and animal versions of this gene
have a function in common?
25
Actin subunits self-assemble to form filaments which
have a role in cell structure.
Check the “Inner Life of the Cell” video.
https://www.youtube.com/watch?v=FzcTgrxMzZk
(2:20 until 3:15)
This is how your actin should work.
https://www.youtube.com/watch?v=VVgXDW_8O4U
is a video showing polymerization of G-actin,
a protein similar to Alpha Actin.
26
If it is reasonable that plants might have a gene similar
to human ACTA1, you will need to find the ACTA1 gene
sequence.
Click on FASTA to obtain the human ACTA1 gene sequence.
27
Copy, then paste the ACTA1 gene sequence to a new Word
document or clipboard—we will use this to look for an
Arabidopsis thaliana version of this gene. Save the Word
document as “human ACTA1 DNA sequence”.
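Before pasting the sequence into BLAST, it can help to check what was actually copied. The following is a minimal Python sketch, assuming the copied text is a single FASTA record (one header line starting with ">" followed by sequence lines). The function name and the sample record are illustrative placeholders; the sample is not the real ACTA1 sequence.

```python
def parse_fasta(text):
    """Split one FASTA record into (header, sequence)."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("FASTA text must start with a '>' header line")
    header = lines[0][1:]                   # drop the leading '>'
    sequence = "".join(lines[1:]).upper()   # join wrapped sequence lines
    return header, sequence


# Hypothetical placeholder record, NOT the real ACTA1 sequence.
example = """>example_record placeholder sequence
ATGTGCGACG
aagacgag
"""

header, sequence = parse_fasta(example)
print(header)
print(len(sequence), "nucleotides")
```

If the reported length is far shorter than expected, part of the sequence was probably missed when copying.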
28
I want to search for a version of the human ACTA1
gene in Arabidopsis thaliana.
What bioinformatics tool
could I use?
29
30
BLAST Types
BLASTn compares 2 or more DNA sequences
BLASTp compares 2 or more protein sequences
BLASTx translates a DNA sequence in the 6
possible reading frames, then compares the translations to a
protein sequence database
tBLASTx compares 2 or more DNA sequences
after translating each in 6 reading frames
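The "6 reading frames" idea can be sketched in Python: three frames on the given strand and three on its reverse complement. This is only an illustration of the concept, assuming the standard genetic code and a toy input sequence. It is not how the BLAST programs are implemented, and the function names are made up for this example.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the opposite strand, read 5' to 3'."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons; unknown codons become 'X', stops are '*'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Translate frames +1..+3 (given strand) and -1..-3 (reverse complement)."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


# Toy sequence chosen only for illustration.
frames = six_frame_translation("ATGGCC")
for name, protein in frames.items():
    print(name, protein)
```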
31
32
If I have a known DNA sequence,
how can I use BLASTn to look for an
unknown similar sequence?
http://www.ncbi.nlm.nih.gov/
There are several ways to
access NCBI BLAST. Start at
the URL and page, then
select BLAST.
Or just go to the
BLAST page URL
below.
http://blast.ncbi.nlm.nih.gov/Blast.cgi
Select nucleotide blast
33
You found a human gene to compare…
Click on FASTA to obtain the human ACTA1 gene sequence.
34
And you’ve already copied and pasted the ACTA1
gene sequence to a Word document or clipboard—we
will use this to look for an Arabidopsis thaliana
version of this gene.
35
Steps to use BLASTn
1. Paste in your copied ACTA1 sequence.
2. Enter the name of the organism in which we are looking for the
same gene (Arabidopsis thaliana).
3. Select the program: use “Somewhat similar sequences” for the
broadest search.
4. Check “Show results in a new window”, then click on the BLAST button.
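The same choices made on the web form (query sequence, organism, program) could in principle be written down as request parameters. The sketch below only builds and prints a parameter string; it does not contact NCBI. The parameter names (CMD, PROGRAM, DATABASE, QUERY, ENTREZ_QUERY) follow NCBI's URL API for BLAST as far as I recall it and should be checked against the current NCBI documentation. The database name "nt" and the placeholder query are assumptions, and whether this matches the form's “Somewhat similar sequences” option exactly is not confirmed here.

```python
from urllib.parse import urlencode

# BLAST page address from the slides (https form).
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_blastn_request(query_sequence, organism):
    """Build (but do not send) URL-encoded parameters for a BLASTn search."""
    params = {
        "CMD": "Put",                              # submit a new search
        "PROGRAM": "blastn",                       # nucleotide vs nucleotide
        "DATABASE": "nt",                          # assumed database name
        "QUERY": query_sequence,                   # the pasted DNA sequence
        "ENTREZ_QUERY": f"{organism}[Organism]",   # limit hits to one organism
    }
    return urlencode(params)


# Placeholder query, NOT the real ACTA1 sequence.
body = build_blastn_request("ATGTGCGACGAAGACGAG", "Arabidopsis thaliana")
print(BLAST_URL)
print(body)
```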
36
What information is provided in an NCBI BLASTn report?
The Graphics Section shows the query sequence in the red bar (green
arrow) and aligned sequences are shown in colored tracks below.
Each “track” represents a sequence
that the BLASTn tool discovered in
the database that is similar to your
query sequence. The colored
sections in each track are blocks of
DNA which align with varying
similarity (score), shown by the
colored bar above. The black lines
connecting the colored blocks are
poorly aligned sequences (less than
40% identity).
Move the mouse over a
block to see the definition
and score for that sequence
result (also called “hit”).
By clicking on a colored box,
you will jump to the actual
DNA alignment farther down
the page.
37
What information is provided in an NCBI BLASTn report?
The Descriptions Section lists the aligned sequence names and provides
information about the alignment. In this search, we are using one gene
sequence to find a similar gene sequence. Look at the results that end in
“gene”.
38
What is gene alignment?
What BLASTn values tell us whether the
alignment is meaningful?
39
https://www.youtube.com/watch?v=6Udqou3vmng
Go to 31:13-40:15 for a more detailed explanation of alignment.
Query: starting and ending nucleotides of your query
Subject (database used for search): starting and ending nucleotide
coordinates for this sequence in its database
40
BLASTn seeks to maximize the score for aligning shorter stretches of the
query with sequences in the database. Because this is a local alignment,
the entire query does not have to align.
Matching nucleotides are given a score of +1, mismatches are given a
negative score, and there are penalties for gaps. There are different
algorithms, but this is the general idea.
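The scoring idea can be illustrated with a small local-alignment scorer. This sketch uses Smith-Waterman dynamic programming, which is a different (slower, exhaustive) method from the heuristic search BLASTn actually performs. The mismatch score of -1 and gap penalty of -2 are assumed values for illustration; only the +1 match score comes from the slide.

```python
def local_alignment_score(query, subject, match=1, mismatch=-1, gap=-2):
    """Best local alignment score between two sequences (Smith-Waterman)."""
    previous = [0] * (len(subject) + 1)   # scores for the previous query row
    best = 0
    for q in query:
        current = [0]
        for j, s in enumerate(subject, start=1):
            diagonal = previous[j - 1] + (match if q == s else mismatch)
            up = previous[j] + gap        # gap in the subject
            left = current[j - 1] + gap   # gap in the query
            score = max(0, diagonal, up, left)  # 0 lets an alignment restart
            current.append(score)
            best = max(best, score)
        previous = current
    return best


# Toy sequences: only the middle stretch "ACGTACG" is shared.
score = local_alignment_score("TTTACGTACGTTT", "GGGACGTACGGGG")
print(score)
```

Because the score can never drop below zero, the unmatched ends of each sequence do not count against the shared stretch, which is what makes the alignment "local".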
41
42
“Query cover” tells what percentage of your input sequence (query)
is included in the alignment.
Note that the query is more than 2750 nucleotides long.
43
The query coverage is low here (20%) because you are comparing 2
DNA sequences which contain exons (conserved, thus aligned) and
introns (not highly conserved, thus non-aligned or poorly aligned).
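As a rough illustration of how a coverage percentage can arise, the sketch below counts how many query positions fall inside at least one aligned block and divides by the query length. The query length and block coordinates are invented toy numbers, not values from the actual ACTA1 search, and NCBI's exact calculation may differ in detail.

```python
def query_cover(query_length, aligned_ranges):
    """Percent of query positions covered by at least one aligned block.

    aligned_ranges holds (start, end) query coordinates, 1-based and inclusive.
    Overlapping blocks are counted only once.
    """
    covered = 0
    last_end = 0
    for start, end in sorted(aligned_ranges):
        start = max(start, last_end + 1)   # skip positions already counted
        if end >= start:
            covered += end - start + 1
            last_end = end
    return 100 * covered / query_length


# Toy example: two overlapping blocks on a 1000-nucleotide query.
cover = query_cover(1000, [(101, 200), (151, 300)])
print(f"{cover:.0f}%")
```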
44
Although only 20% of the query aligns to a sequence in the Arabidopsis
database, 80% of the aligned part is identical to the query (see the
“Ident” value of 80% and the color-coded portions of the result track).
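Percent identity is calculated only over the part that aligned. A minimal sketch, assuming the two aligned strings have equal length and use "-" for gaps; the sequences are toy examples rather than real BLAST output.

```python
def percent_identity(aligned_query, aligned_subject):
    """Percent of alignment columns where both sequences have the same base."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(
        1
        for q, s in zip(aligned_query, aligned_subject)
        if q == s and q != "-"
    )
    return 100 * matches / len(aligned_query)


# Toy alignment: one mismatch and one gap in 10 columns.
identity = percent_identity("ACGTACGTAC", "ACGTTCG-AC")
print(f"{identity:.0f}%")
```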
45
Access more info about the sequence by clicking on the sequence ID.
“Alignments” provides details about nucleotide locations, matches,
gaps or mismatches.
46
The E-value indicates the number of alignments with an
equivalent or better score from this database that would be
expected just by chance. For example, a one-in-a-million
(1/1,000,000) chance is a very small chance and would be
written 1e-6.
The lower the E-value, the more significant the score (less likely
to be due just to chance).
E-values are in scientific notation, e.g. 3e-80 = 3 x 10^-80
In general, an E-value of 1 x 10^-5 or smaller is considered
significant (not just aligned by chance).
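The "e" notation used in BLAST reports can be read directly by most programming languages. A short Python sketch, using the 1e-5 rule of thumb from this slide as the cutoff; the function name and the list of example values are illustrative.

```python
SIGNIFICANCE_CUTOFF = 1e-5   # rule-of-thumb threshold from the slide


def is_significant(e_value_text, cutoff=SIGNIFICANCE_CUTOFF):
    """Return True if an E-value written like '3e-80' is at or below the cutoff."""
    return float(e_value_text) <= cutoff


# Example E-values written the way a BLAST report shows them.
for text in ["3e-80", "1e-6", "0.5"]:
    verdict = "significant" if is_significant(text) else "could be chance"
    print(text, "->", verdict)
```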
47
This is from the Alignments
Section and shows the details
48
Click on the accession number for more information about the gene
that had the most significant alignment.
Results are arranged in a default setting from lowest E-value to
highest. Compare the E-value, Query cover and % identity for the
checked “hits”.
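The default ordering can be reproduced by sorting hits on their E-values. In this sketch the hit names and numbers are invented for illustration and are not results from the actual search.

```python
# Hypothetical hits; names and values are made up for this example.
hits = [
    {"name": "hit_A", "e_value": 2e-10, "query_cover": 12, "identity": 75},
    {"name": "hit_B", "e_value": 1e-80, "query_cover": 20, "identity": 80},
    {"name": "hit_C", "e_value": 0.3, "query_cover": 3, "identity": 90},
]

# Lowest E-value first, as in the default BLAST report ordering.
ranked = sorted(hits, key=lambda hit: hit["e_value"])

for hit in ranked:
    print(hit["name"], hit["e_value"], f'{hit["query_cover"]}%', f'{hit["identity"]}%')
```

Note that the toy hit with the highest percent identity ranks last: a short stretch can match well and still be likely to occur by chance, which is why the three values are compared together.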
49
Which GENE is most similar to the human ACTA1 sequence query?
Link for
more info!
Amino acid
sequence
50
51
Describe the process you used to find a version of the
human ACTA1 gene in Arabidopsis thaliana.
What information did you use to indicate that the plant
version was a meaningful find?
52
1. Pick a human gene which you think is highly conserved
between plants and animals.
2. Follow the procedure you just learned to see if a similar
Arabidopsis version exists.
3. Record your info on the scorecard.
4. Repeat for a gene that you predict is unique to humans.
53
Gene Discovery Scorecard
Human Gene Name | Human Gene ID | Human Gene Function | Arabidopsis Gene Name | Arabidopsis Gene ID | Arabidopsis Gene Function | Outcome evidence: Score, E-value, Similar Function | Prediction?
Actin alpha 1 | ACTA1 | Cytoskeletal structure | ACT7 | Actin 7 | Cytoskeletal structure | E value was 1e-80, not random, both have similar functions…. | Yes
54
• What information so far indicates whether or not plants have
animal muscle genes?
• What additional information might you need to be more certain
whether ACT7 is a plant version of human ACTA1?
55