Bioinformatics Shared Resource Homepage
Bioinformatics : How to…
BIMR
Bioinformatics Shared Resource
http://bsrweb.burnham.org
By:
Kutbuddin Doctor, PhD
Overview
Homology Defined
Searching for Homologs
Homology Exceptions & Warnings
Alignments
BSR Services
Homology Defined
Homologs: “Proteins/genes that share a
common ancestral protein/gene.”
They may share function.
Homology is inferred based on similarity of
sequences.
A statistical model decides if similarity is
sufficient to infer homology.
(never say “% homology” – this is wrong)
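What an alignment lets you measure is percent identity; homology itself is a yes/no inference. A minimal sketch, assuming two sequences that are already aligned with "-" marking gaps (the function name and example sequences are made up for illustration):

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of gap-free alignment columns in which both residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither sequence has a gap.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


# Hypothetical aligned peptides: 7 of 8 gap-free columns match.
print(percent_identity("MKT-AYIAK", "MKTQAYLAK"))  # 87.5
```

The number describes similarity only. Whether it is enough to infer homology is a separate statistical question.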
Statistical Models
BLAST : Provides a very reliable model.
Low false positives.
Fast, heuristic method.
Heuristic methods make assumptions
at the risk of missing some alignments.
Smith-Waterman : optimal* alignment.
Fewer false negatives.
Slow - NOT heuristic.
Protein-based homology : more sensitive than
nucleotide sequence searches.
(Multiple) Sequence Alignments : NO!
An alignment alone is not a statistical test of homology.
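To make "NOT heuristic" concrete: Smith-Waterman fills in every cell of a query-by-subject scoring table, which is why it finds the best local alignment and why it is slow. A minimal sketch that returns only the best local score, assuming simple match/mismatch scores and a linear gap penalty (real protein searches use a substitution matrix such as BLOSUM62 and affine gaps; the parameter values here are illustrative):

```python
def smith_waterman_score(a: str, b: str,
                         match: int = 2, mismatch: int = -1,
                         gap: int = -2) -> int:
    """Best local alignment score between a and b (score only, no traceback)."""
    prev = [0] * (len(b) + 1)   # previous row of the scoring table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so no cell drops below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# The shared "ACGT" core is found even though the flanks differ.
print(smith_waterman_score("XXACGTXX", "YYACGTYY"))  # 8
```

Every cell is computed, so run time grows with (query length) x (subject length) for each database sequence. BLAST avoids most of that work by extending only from short seed matches.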
BLAST
Choose specific method
Enter query sequence
Choose reference DB to search
nr : non-redundant
refseq : curated reference sequences by NCBI
SwissProt : curated reference seq by EMBL / SIB
PDB : experimental 3-D Structure database
env_nr : environmental sequences (unknown organism)
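The same three choices (method, query, database) apply when running BLAST from a script instead of the web form. A sketch that only builds the command line, assuming the NCBI BLAST+ command-line tools are installed; the file name query.fasta and the helper name are hypothetical, and database names other than nr should be checked against the NCBI list:

```python
import shlex


def build_blastp_command(query_fasta: str, database: str = "nr",
                         evalue: float = 1e-5, remote: bool = True) -> list:
    """Return an argument list for a BLAST+ blastp protein search."""
    cmd = [
        "blastp",                 # method: protein query vs protein database
        "-query", query_fasta,    # FASTA file holding the query sequence
        "-db", database,          # reference database to search
        "-evalue", str(evalue),   # report only hits at or below this E-value
        "-outfmt", "6",           # tab-separated output, one hit per line
    ]
    if remote:
        cmd.append("-remote")     # run the search on NCBI servers
    return cmd


print(shlex.join(build_blastp_command("query.fasta")))
```

The list can be passed to subprocess.run once the tools and the query file are in place.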
BLAST
Domains
Actual results in separate window
Searched against this database
Overview of results
(switch to browser)
BLAST - output
This alignment was MUCH BETTER than we expect
for a random occurrence.
The alignment is due to common ancestry rather
than a random chance occurrence.
2e-109 = 2 x 10^-109
0.000000000….. 2
Expectation value (“E-value”): The number
of RANDOM alignments you can expect
to have with that score or higher given
the size of the database searched.
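The "given the size of the database" part can be seen in the standard relation between a normalized (bit) score S' and the E-value, E = m x n x 2^(-S'), where m is the query length and n the database length. A minimal sketch under that formula, assuming raw lengths (BLAST itself applies effective-length corrections, so its reported values differ somewhat); the numbers are illustrative:

```python
def expect_value(bit_score: float, query_len: int, db_len: int) -> float:
    """Expected number of chance alignments scoring at least bit_score."""
    search_space = query_len * db_len      # m x n
    return search_space * 2.0 ** (-bit_score)


# Same score, same query: a database 10x larger gives an E-value 10x larger.
small = expect_value(bit_score=50, query_len=300, db_len=10**8)
large = expect_value(bit_score=50, query_len=300, db_len=10**9)
print(small, large)
```

This is why the same alignment can look significant against a small curated database and unremarkable against nr.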
We expect ~1 false positive (non-homolog) alignment
at this score or better… is this the false positive?
What false positive rate can you tolerate?
Based on that, you can choose an “E-value cutoff”
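Applying a cutoff is a simple filter on the results. A sketch, assuming BLAST+ tabular output (-outfmt 6), whose default columns put the subject ID in column 2 and the E-value in column 11; the two hits shown are made-up examples, not real results:

```python
def filter_hits(tabular_text: str, evalue_cutoff: float = 1e-5) -> list:
    """Return (subject_id, evalue) for hits at or below the E-value cutoff."""
    kept = []
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue                          # skip blank and comment lines
        fields = line.split("\t")
        subject_id = fields[1]                # column 2: subject sequence ID
        evalue = float(fields[10])            # column 11: E-value
        if evalue <= evalue_cutoff:
            kept.append((subject_id, evalue))
    return kept


# Hypothetical hits: one strong, one that is likely a chance match.
rows = [
    ["query1", "subjA", "98.5", "200", "3", "0", "1", "200", "1", "200", "2e-109", "395"],
    ["query1", "subjB", "31.0", "120", "80", "3", "10", "129", "5", "124", "0.8", "32.0"],
]
text = "\n".join("\t".join(r) for r in rows)
print(filter_hits(text, evalue_cutoff=1e-5))
```

A stricter cutoff means fewer false positives but more missed homologs; a looser one trades the other way.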
Expectation value (“E-value”): The number
of alignments you can expect to have
with that score or higher given the size
of the database searched.
BLAST - output
Better statistical models are able to move this “FALSE NEGATIVE”
to a “TRUE POSITIVE” prediction.
Lasergene
• Install on Windows : \\windows_server\
• Mac : (link from IS institute software page)
http://homepage/resources/services/sharedresources/installation/lg7_mac.pdf
Lasergene
• Protein search – use EditSeq program
• Load (or create) a query
• NetSearch , choose blastp on “nr”
database.
Lasergene results
(Results same as that from online BLAST tool)
BLAST (cont’d)
Protein vs DNA BLAST
Searches at the protein level are better (more sensitive).
Translated DNA BLAST (TBLASTX):
both query and reference database are
translated to protein (6 frames). Then
BLAST is run as a protein vs protein
search.
(TBLASTN translates only the reference database; the query is already protein.)
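"6 frames" means three reading-frame offsets on the forward strand plus three on the reverse complement. A minimal sketch of six-frame translation, assuming the standard genetic code (NCBI translation table 1) and plain A/C/G/T input; any codon containing another character is translated as "X":

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def reverse_complement(dna: str) -> str:
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna: str) -> str:
    """Translate complete codons only; '*' marks a stop codon."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translation(dna: str) -> dict:
    """Return translations keyed by frame: +1, +2, +3, -1, -2, -3."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


print(six_frame_translation("ATGGCCTAA")["+1"])  # MA*
```

A translated search then compares each of these protein frames with protein-level scoring.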
Protein BLAST varieties
Customizes Statistical
model based on protein
family (reliable homologs)
Protein BLAST varieties
Finding “known” protein in the
database (>90% identical)
Protein BLAST varieties
When the presence of a
complete domain is
required.
Some homologs will be
missed.
BLASTp vs. PSI-BLAST
[Diagram, built up over three slides]
ORIGINAL BLAST : Query → Alignments against REFERENCE DATABASE (RefSeq) → Fixed Statistical Model → Homologs
ITERATIVE BLAST : PSI-BLAST : Query → Alignments against REFERENCE DATABASE (RefSeq) → Modified Statistical Model (M) → (more) Homologs; the modified model is fed back into the search over iterations
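The "modified statistical model" is a position-specific scoring profile built from the homologs found so far. A toy sketch of that one step, assuming ungapped, equal-length aligned hits, a uniform background frequency, and a fixed pseudocount (PSI-BLAST's actual sequence weighting and pseudocount scheme are more elaborate; the sequences are made up):

```python
import math
from collections import Counter

AMINO = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(aligned_hits: list, pseudocount: float = 1.0) -> list:
    """One log-odds score per amino acid for each alignment column."""
    background = 1.0 / len(AMINO)            # assumption: uniform background
    total = len(aligned_hits) + pseudocount * len(AMINO)
    profile = []
    for pos in range(len(aligned_hits[0])):
        counts = Counter(seq[pos] for seq in aligned_hits)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO
        })
    return profile


def profile_score(profile: list, segment: str) -> float:
    """Score a candidate segment against the profile, column by column."""
    return sum(column[aa] for column, aa in zip(profile, segment))


# Hypothetical hits from a first search round.
hits = ["MKV", "MKL", "MRV"]
profile = build_profile(hits)
print(profile_score(profile, "MRL"), profile_score(profile, "WWW"))
```

"MRL" is not among the hits, yet it scores well because each of its residues was seen at that position. That is how later iterations reach more distant homologs.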
BLAST - Resources
NCBI BLAST on WEB:
(shown)
Local BLAST:
Specialized DB
Lots of queries.
BLAST as service:
BSR – Cluster : 132 CPU
Example : Smith-Waterman 1000 siRNA
workstation : 4,000 hrs
cluster : 8 hrs
“Small jobs” : single families, multi-domain families
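"Lots of queries" parallelize well because each query can be searched independently. A sketch of splitting a query list into roughly equal batches, one per CPU; the batching scheme and the function name are assumptions for illustration, not a description of how the BSR cluster schedules jobs:

```python
def split_into_batches(items: list, n_batches: int) -> list:
    """Split items into up to n_batches round-robin batches of near-equal size."""
    n_batches = max(1, min(n_batches, len(items) or 1))
    return [items[i::n_batches] for i in range(n_batches)]


# 1000 hypothetical query IDs spread over 132 CPUs.
queries = [f"siRNA_{i}" for i in range(1000)]
batches = split_into_batches(queries, 132)
print(len(batches), min(len(b) for b in batches), max(len(b) for b in batches))
```

Each batch can then be written to its own FASTA file and submitted as a separate job.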
ALIGNMENTS
• BLAST based alignments are
specialized.
• Global vs Local alignments.
• Multiple Sequence Alignments.
• Structure-based Alignments.
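Global vs local in practice: a global alignment must span both sequences end to end, so unmatched flanks are charged as gaps. A minimal Needleman-Wunsch-style sketch that returns only the global score, assuming simple match/mismatch scores and a linear gap penalty (illustrative values). A local variant would differ by clamping every cell at zero and taking the best cell anywhere in the table.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 2, mismatch: int = -1,
                           gap: int = -2) -> int:
    """Best end-to-end alignment score between a and b (score only)."""
    # First row: aligning a prefix of b against nothing costs one gap each.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


# A short motif embedded in a longer sequence scores badly end to end,
# because the eight unmatched flanking residues are all charged as gaps.
print(global_alignment_score("ACGT", "TTTTACGTTTTT"))  # -8
```

Use global alignment when the sequences are expected to match over their full length, and local alignment when only a domain or motif is shared.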
Parting TIPS..
• Biobar - great browser bar for biologists:
• BSR website does (eventually) work .. Press RELOAD.
Contact
• http://bsrweb.burnham.org/
• Myself: Kutbuddin Doctor, PhD
• Bldg 10, Rm 1205 (downstairs)
• x3488 ; [email protected]