Public data and tool repositories
Section 2
Survey of analysis tools and tutorials
Problems from last section
1. Query Entrez Gene with the following two queries separately
and then explain the differences between the two results using a
logical NOT operation:
a) tyrosine kinase[Gene Ontology] AND human[Organism]
b) cd00192[Domain Name] AND human[Organism]
2. Retrieve the APP gene record from NCBI and use the Display
dropdown menu to display Conserved Domain Links. Use the
ids of the listed domains to query Entrez Gene for records with
the same domains.
3. Use the SNP Geneview link at NCBI to identify coding SNPs in
the APP gene. Which SNP is missing from this display which
was present in the Ensembl APP protein record?
4. Use the Homologene link at NCBI to identify possible functional
orthologs for human APP. How does this list compare to the
Ensembl list of orthologs that we reviewed previously?
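
For problem 1, the logical NOT comparison can be set up by combining the two queries into a single Entrez term in each direction. The sketch below only builds the query strings and the matching NCBI E-utilities `esearch` URLs; it does not contact NCBI. The helper names `entrez_not` and `esearch_url` are made up for this illustration, and `retmax=20` is an arbitrary choice.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities esearch service.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

QUERY_A = "tyrosine kinase[Gene Ontology] AND human[Organism]"
QUERY_B = "cd00192[Domain Name] AND human[Organism]"


def entrez_not(keep, exclude):
    """Return an Entrez term matching records in `keep` but not in `exclude`."""
    return f"({keep}) NOT ({exclude})"


def esearch_url(term, db="gene", retmax=20):
    """Build (but do not fetch) an esearch URL for the given term."""
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Records found by query a) only, and by query b) only.
only_in_a = entrez_not(QUERY_A, QUERY_B)
only_in_b = entrez_not(QUERY_B, QUERY_A)

if __name__ == "__main__":
    print(only_in_a)
    print(esearch_url(only_in_a))
    print(only_in_b)
    print(esearch_url(only_in_b))
```

Pasting either combined term into the Entrez Gene search box should give the same set as fetching the URL; the two directions together show which records each query finds that the other misses.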
Review of last section
example: human APP gene
1. NCBI Entrez databases
   a) Constructing queries
   b) Gene, Nucleotide and Protein
   c) RefSeq
2. UCSC Genome Browser
   a) Finding genes
   b) Displaying data tracks
   c) Comparing data from different sources
3. EBI/Ensembl
   a) Finding genes
   b) Viewing Genes, Transcripts, Exons, Proteins and SNPs
4. Common id and data formats
This section
1. Protein structure visualization/analysis example
2. Promoter/enhancer analysis example
3. More information
Amyloid Precursor Protein (APP)
G-protein coupled receptor that binds heparin and laminin
[Figure labels: β-secretase, γ-secretase, amyloid fibril, amyloid plaque]
DAEFRHDSGYEVHHQKLVFFAEDVGSNKGAIIGLMVGGVVIA
Ex: Viewing the structure of an amyloid fibril
Other structure tools
1. Structure visualization. Free applications:
a) RasMol
b) Cn3D
c) VMD
2. Structure prediction servers/applications
a) CASP: Critical Assessment of Techniques for Protein Structure Prediction
b) General method:
   i. Sequence similarity search to identify closest homolog with known structure
   ii. Fit to homolog's known structure, minimizing some constraint
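
Step i of the general method can be illustrated with a toy template search. This is a minimal sketch, assuming the query has already been aligned to each candidate (real pipelines would use a search tool such as BLAST for this). The function names, template ids and sequences are all made up for illustration.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue  # skip gap columns
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def closest_template(alignments):
    """Pick the template with the highest identity to the query.

    `alignments` maps a template id to an (aligned_query, aligned_template) pair.
    """
    scored = {
        name: percent_identity(query, template)
        for name, (query, template) in alignments.items()
    }
    best = max(scored, key=scored.get)
    return best, scored[best]


# Hypothetical pre-aligned pairs; ids and sequences are invented.
toy_alignments = {
    "template_1": ("MKVLAAGT", "MKVLSAGT"),
    "template_2": ("MKVLAAGT", "MRILSA-T"),
}

if __name__ == "__main__":
    print(closest_template(toy_alignments))
```

The chosen template would then be the starting point for step ii, where the query is fitted to the template's known structure.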
APP Upstream Region (15 kb)
Ex: Extracting and aligning human and mouse APP upstream regions
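
Once the two upstream regions are aligned, conserved stretches can be located by scoring identity in a sliding window. The sketch below assumes a pairwise alignment is already available as two equal-length gapped strings. The function names, window size, threshold and toy sequences are illustrative assumptions, not real APP data.

```python
def window_identity(aln_a, aln_b, window=10, step=5):
    """Return (start, identity) pairs for windows along a pairwise alignment.

    Identity is the fraction of columns in the window where both sequences
    carry the same base; gap columns count as mismatches.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    results = []
    for start in range(0, len(aln_a) - window + 1, step):
        columns = zip(aln_a[start:start + window], aln_b[start:start + window])
        same = sum(1 for x, y in columns if x == y and x != "-")
        results.append((start, same / window))
    return results


def conserved_windows(aln_a, aln_b, window=10, step=5, threshold=0.8):
    """Keep only the windows whose identity reaches the threshold."""
    return [
        (start, identity)
        for start, identity in window_identity(aln_a, aln_b, window, step)
        if identity >= threshold
    ]


# Toy alignment: first half identical, second half completely different.
toy_human = "ACGTACGTAC" + "TTTTTTTTTT"
toy_mouse = "ACGTACGTAC" + "GGGGGGGGGG"

if __name__ == "__main__":
    print(window_identity(toy_human, toy_mouse, window=10, step=10))
    print(conserved_windows(toy_human, toy_mouse, window=10, step=10))
```

Smaller windows give finer resolution but noisier scores, so the window size and threshold would need tuning for a real 15 kb region.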
Promoter/enhancer analysis approaches
1. Same gene, multiple species
   a) Assumed evolutionary conservation of non-coding regions
   b) Can use pairwise or multiple alignment method
   c) Examples:
      i. Precomputed: UCSC conservation tracks
      ii. Dynamic: e.g., rVista
2. Different genes, same species
   a) Typical output as co-expressed clusters from microarray data
   b) Looking for over-represented, small binding sites
   c) Much better results if looking for a pattern or clustering of multiple sites
   d) Motif-finding algorithm, e.g., MEME
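
The idea of looking for over-represented, small binding sites in approach 2 can be sketched with simple k-mer counting. This is not how MEME works; it is a much cruder stand-in that compares how many cluster sequences contain each k-mer against a background set. The function names, the pseudocount scheme, the `min_ratio` cutoff and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter


def kmer_presence(seqs, k):
    """Count, for each k-mer, how many sequences contain it at least once."""
    counts = Counter()
    for seq in seqs:
        seen = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        counts.update(seen)
    return counts


def enriched_kmers(cluster, background, k=6, min_ratio=2.0):
    """Return (kmer, ratio) pairs for k-mers over-represented in the cluster.

    The ratio compares the fraction of cluster sequences containing the k-mer
    with a pseudocount-smoothed fraction of background sequences.
    """
    fg = kmer_presence(cluster, k)
    bg = kmer_presence(background, k)
    hits = []
    for kmer, n in fg.items():
        fg_frac = n / len(cluster)
        bg_frac = (bg.get(kmer, 0) + 1) / (len(background) + 1)
        ratio = fg_frac / bg_frac
        if ratio >= min_ratio:
            hits.append((kmer, ratio))
    # Highest ratio first; ties broken alphabetically for stable output.
    return sorted(hits, key=lambda h: (-h[1], h[0]))


# Toy upstream sequences with one shared 7-mer planted in each cluster member.
toy_cluster = ["AAATGACTCAAA", "CCTGACTCACC", "GGTGACTCAGG"]
toy_background = ["AAAAAAAAAAAA", "CCCCCCCCCCCC", "GGGGGGGGGGGG"]

if __name__ == "__main__":
    print(enriched_kmers(toy_cluster, toy_background, k=7))
```

With real data, single short sites turn up by chance all the time, which is why looking for patterns or clusters of multiple sites gives much better results.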
Tutorials
1. NCBI
   • Field Guide
   • Information and tutorials
   • Science Primer
2. EBI
   • 2Can Tutorials
3. UCSC
   • Genome Browser User's Guide
Next week's sections
John Major
1. Genome Browsers
   • genome build process, ongoing and complete genome projects
   • genome browsers of Ensembl, UCSC and NCBI Mapviewer
2. Bulk downloads
   • how bulk bioinformatics data might be useful
   • common data formats
   • retrieving data
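
As a preview of working with common data formats from bulk downloads, here is a minimal FASTA reader. It is a sketch that assumes plain, well-formed FASTA text; the function name `parse_fasta` and the record ids in the toy input are made up.

```python
import io


def parse_fasta(lines):
    """Parse FASTA lines into a dict mapping record id to sequence.

    The id is the first whitespace-delimited word after '>'.
    """
    records = {}
    name = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            header = line[1:].split()
            name = header[0] if header else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


# Toy input; a real file would be opened with open(path) instead.
toy_fasta = io.StringIO(
    ">example_1 first toy record\n"
    "ACGTAC\n"
    "GTAC\n"
    "\n"
    ">example_2\n"
    "TTGGCC\n"
)
toy_records = parse_fasta(toy_fasta)

if __name__ == "__main__":
    for record_id, seq in toy_records.items():
        print(record_id, len(seq), seq)
```

Because the function accepts any iterable of lines, the same code works on an open file handle, which matters when a bulk download is too large to hold in memory as one string.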