Genome & Protein “Sequence Analysis Programs”
Application in Establishing Epidemiology and Variability
RAJESH KUMAR
Ph.D. 1st Year
Dairy Microbiology Division
N.D.R.I.
Introduction
Bio-informatics/Computational Biology:-
Proteomics:- Large-scale study of proteins.
Genomics:- study of an organism’s genome and the use of its genes.
Comparative Genomics:- comparison of genomes.
Structural Genomics:- determination of the three-dimensional structure of all proteins of a given organism.
Major research efforts of bio-informatics:-
Sequence analysis / alignment.
gene finding.
genome assembly.
protein structure alignment.
protein structure prediction.
prediction of gene expression and protein-protein interactions.
modeling of evolution.
Sequence Analysis
Encompasses the use of various bioinformatic methods to determine the biological function and structure of genes and of the proteins they encode.
Workflow:- DNA sequences are decoded and stored in electronic databases, then analyzed, e.g. to build phylogenetic trees or for comparative genomics.
Shotgun Sequencing
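Used in genetics for sequencing long DNA strands.
The DNA is broken into small segments, which are sequenced individually.
Computer programs then assemble the overlapping segment sequences back into the full sequence.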
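The reassembly step can be illustrated with a toy greedy overlap assembler. This is a minimal sketch under simplifying assumptions (error-free reads from one strand, hypothetical reads, an assumed minimum overlap of 3 bases); real assemblers use overlap or de Bruijn graphs and must cope with sequencing errors and repeats.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that is also a prefix of b (0 if < min_len)."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Greedy assembly: repeatedly merge the pair of contigs with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    # Drop reads that lie entirely inside another read.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no remaining overlaps: leave the contigs separate
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Hypothetical reads sampled from the sequence ATGGCGTACCGAT.
print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCGAT"]))  # ['ATGGCGTACCGAT']
```

Greedy merging is simple but can mis-assemble repeated regions, which is one reason production assemblers use graph-based methods.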
Sequence Alignment:- arrangement of two or more sequences to highlight their similarity.
tcctctgcctctgccatcat---caaccccaaagt
|||| ||| ||||| |||||   ||||||||||||
tcctgtgcatctgcaatcatgggcaaccccaaagt
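A match line like the one above can be generated mechanically. In this small sketch a `|` marks an identical, non-gap column, and percent identity is taken as identical columns over all alignment columns, gaps included; that is only one of several identity conventions in use.

```python
def match_line(top: str, bottom: str) -> str:
    """'|' for each identical, non-gap column, a space otherwise."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    return "".join("|" if a == b and a != "-" else " " for a, b in zip(top, bottom))


def percent_identity(top: str, bottom: str) -> float:
    """Identical columns as a percentage of all columns (gap columns included)."""
    return 100.0 * match_line(top, bottom).count("|") / len(top)


top = "tcctctgcctctgccatcat---caaccccaaagt"
bottom = "tcctgtgcatctgcaatcatgggcaaccccaaagt"
print(top)
print(match_line(top, bottom))
print(bottom)
print(f"{percent_identity(top, bottom):.1f}% identity")  # 82.9% identity (29 of 35 columns)
```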
Structural Alignment
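Compares proteins by their three-dimensional structures rather than by sequence alone.
More reliable than sequence alignment over long evolutionary distances.
Useful in identifying structurally conserved regions.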
Multiple Alignment
Extension of pairwise alignment to align more than two sequences at once.
Helps identify regions common to all the sequences.
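One simple way to summarize the common regions of a multiple alignment is a column-by-column majority consensus. A minimal sketch with hypothetical aligned sequences; the 0.5 threshold and the use of `N` for columns without a clear majority are assumptions for illustration.

```python
from collections import Counter


def consensus(alignment: list[str], min_fraction: float = 0.5) -> str:
    """Majority-rule consensus of aligned rows; gaps ('-') count as symbols.

    A column whose most common symbol covers less than min_fraction of the
    rows is reported as 'N' (no clear consensus).
    """
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned rows must have the same length")
    out = []
    for column in zip(*alignment):
        symbol, count = Counter(column).most_common(1)[0]
        out.append(symbol if count / len(column) >= min_fraction else "N")
    return "".join(out)


# Hypothetical aligned DNA sequences.
rows = ["ATGC-TAGC", "ATGCATAGC", "ATGGATTGC", "ATCCATAGC"]
print(consensus(rows))  # ATGCATAGC
```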
Programs
Clustal is used in cladistics to build phylogenetic trees
Framesearch
It is an extension of the Smith-Waterman algorithm for pairwise alignment between a protein sequence and a nucleotide sequence.
It dynamically considers every possible single-nucleotide
insertion or deletion to generate the translation that
best matches the protein sequence.
Software:- SSEARCH
Smith-Waterman remains the gold standard for protein-protein or nucleotide-nucleotide pairwise alignment.
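The Smith-Waterman recurrence can be sketched as a small dynamic program. This is a simplified illustration: it scores with fixed match/mismatch values and a linear gap penalty, whereas tools such as SSEARCH score with substitution matrices and affine gap penalties.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Best local alignment of a and b with a linear gap penalty.

    Returns (score, aligned_a, aligned_b).
    """
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: a cell never drops below 0 (a new alignment may start here).
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best-scoring cell until a cell with score 0.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


# Textbook parameters: +3 match, -3 mismatch, -2 per gap position.
print(smith_waterman("TGTTACGG", "GGTTGACTA", match=3, mismatch=-3, gap=-2))
# (13, 'GTT-AC', 'GTTGAC')
```

The full score matrix makes the method exact but costs time proportional to the product of the sequence lengths, which is why heuristics such as BLAST and FASTA are used for database searches.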
BLAST (Basic Local Alignment Search Tool)
An algorithm for comparing biological sequences.
One of the most widely used tools for searching protein and DNA databases for sequence similarities.
It answers questions such as:-
Which bacterial species have a protein that is related in lineage to a certain protein whose amino-acid sequence I know?
Where does the DNA that I've just sequenced come from?
What other genes encode proteins that exhibit structures or motifs such as the one I've just determined?
To run, BLAST requires two inputs:-
a query sequence (also called the target sequence)
a sequence database.
It then searches for high-scoring alignments between the query and the database sequences.
Three stages of BLAST:-
1st stage: BLAST searches for exact matches (seeds) of a small fixed length W between the query and the sequences in the database.
2nd stage: BLAST tries to extend each match in both directions, starting at the seed.
If a high-scoring ungapped alignment is found, the database sequence is passed on to the 3rd stage.
3rd stage: BLAST performs a gapped alignment between the query sequence and the database sequence.
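Stages 1 and 2 can be sketched for DNA as follows. This is a simplified illustration, not NCBI BLAST's implementation: the word length `w`, the +1/-3 match/mismatch scores and the X-drop cutoff of 5 are assumed values, protein BLAST additionally seeds with similar ("neighbourhood") words scored by a substitution matrix, and stage 3 (gapped alignment) is not shown.

```python
def find_seeds(query: str, subject: str, w: int) -> list[tuple[int, int]]:
    """Stage 1: every exact word match of length w, as (query_pos, subject_pos)."""
    index: dict[str, list[int]] = {}
    for i in range(len(query) - w + 1):
        index.setdefault(query[i:i + w], []).append(i)
    return [
        (i, j)
        for j in range(len(subject) - w + 1)
        for i in index.get(subject[j:j + w], [])
    ]


def _extend(query, subject, qi, si, step, match, mismatch, x_drop):
    """Walk from (qi, si) in direction step (+1 or -1); return (best gain, its length)."""
    best = best_len = run = k = 0
    while 0 <= qi + k * step < len(query) and 0 <= si + k * step < len(subject):
        run += match if query[qi + k * step] == subject[si + k * step] else mismatch
        k += 1
        if run > best:
            best, best_len = run, k
        elif best - run > x_drop:
            break  # score fell too far below the best seen so far (X-drop)
    return best, best_len


def extend_seed(query, subject, seed, w, match=1, mismatch=-3, x_drop=5):
    """Stage 2: ungapped extension of a seed in both directions.

    Returns (score, q_start, q_end, s_start, s_end) with exclusive ends.
    """
    i, j = seed
    right, r_len = _extend(query, subject, i + w, j + w, 1, match, mismatch, x_drop)
    left, l_len = _extend(query, subject, i - 1, j - 1, -1, match, mismatch, x_drop)
    score = w * match + left + right
    return score, i - l_len, i + w + r_len, j - l_len, j + w + r_len


query, subject = "GATTACAGGT", "CCGATTACAGGTCC"
seeds = find_seeds(query, subject, w=4)
print(seeds)
print(extend_seed(query, subject, seeds[0], w=4))  # (10, 0, 10, 2, 12)
```

Every seed on the same diagonal extends to the same ungapped hit, so a real implementation avoids re-extending seeds already covered by an earlier extension.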
An alternative to BLAST is BLAT (BLAST-Like Alignment Tool).
FASTA:-
Slower but more sensitive than BLAST.
DNA and protein sequence alignment software package.
The original FASTP program was designed for protein
sequence similarity searching.
FASTA provided a more sophisticated shuffling program
for evaluating statistical significance.
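The idea behind shuffle-based significance estimates can be sketched as: score the real pair, re-score the query against many shuffled copies of the subject (same composition, random order), and see how far the real score sits above the shuffled scores. This is a simplified sketch: it uses a plain z-score and a basic Smith-Waterman score with assumed +2/-1/-2 match/mismatch/gap values, whereas the FASTA shuffle programs fit an extreme-value distribution to the shuffled scores.

```python
import random
import statistics


def local_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Smith-Waterman local alignment score only (linear gaps, two rows of memory)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def shuffle_z_score(query: str, subject: str, n: int = 200, seed: int = 1) -> float:
    """How many standard deviations the real score lies above shuffled-subject scores."""
    rng = random.Random(seed)  # fixed seed for reproducible shuffles
    real = local_score(query, subject)
    shuffled_scores = []
    for _ in range(n):
        letters = list(subject)
        rng.shuffle(letters)  # keeps composition, destroys order
        shuffled_scores.append(local_score(query, "".join(letters)))
    mean = statistics.mean(shuffled_scores)
    sd = statistics.pstdev(shuffled_scores)
    return float("inf") if sd == 0 else (real - mean) / sd


z = shuffle_z_score("ACDEFGHIKLMNPQRSTVWY", "ACDEFGHIKLMNPQRSTVWY")
print(f"z = {z:.1f}")
```

Shuffling preserves residue composition, so a high score caused merely by biased composition is not mistaken for real similarity.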
The name is pronounced "FAST-Aye" and stands for "FAST-All", because it works with any alphabet; it extends the original programs:-
"FAST-P" (protein) alignment.
"FAST-N" (nucleotide) alignment.
Current FASTA package contains programs for:-
protein:protein
DNA:DNA
protein:translated DNA
ordered or unordered peptide searches.
Recent versions of the FASTA package include special
translated search algorithms that correctly handle
frameshift errors when comparing nucleotide to protein
sequence data.
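A translated search starts from the protein translations of the DNA. The sketch below translates all six reading frames with the standard genetic code; it does not implement the frameshift-tolerant alignment of the FASTA translated-search programs, which additionally allow an alignment to move between frames.

```python
BASES = "TCAG"
# Standard genetic code: amino acids for codons in TCAG order at positions 1, 2, 3
# ('*' marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def reverse_complement(dna: str) -> str:
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna: str, frame: int = 0) -> str:
    """Translate one reading frame (offset 0, 1 or 2); unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[p:p + 3], "X") for p in range(frame, len(dna) - 2, 3))


def six_frames(dna: str) -> list[str]:
    """Translations of the three forward and three reverse-complement frames."""
    rc = reverse_complement(dna)
    return [translate(dna, f) for f in range(3)] + [translate(rc, f) for f in range(3)]


seq = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
print(translate(seq))  # MAIVMGR*KGAR*
```

A single inserted or deleted base in the DNA moves the rest of the gene into another of these frames, which is why frameshift-aware comparison matters for error-prone sequence data.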
Clustal
Clustal is a widely used multiple alignment computer
program.
i) ClustalW
ii) ClustalX
Sequence Analysis Programmes:-
EMBOSS
European Molecular Biology Open Software Suite (EMBOSS) is a
program suite for nucleic acid and protein sequence analysis.
EMBOSS programs manipulate, analyze, and display nucleic acid and
protein sequences.
Similar in functionality to the commercial GCG Wisconsin Software.
PhyloGibbs
Designed to identify where regulatory molecules, such as transcription factors, bind to DNA.
PhyloGibbs compares DNA from multiple species to identify regions whose sequences are statistically similar, and picks out the segments most likely to be of interest to scientists.
AutoEditor : Automated correction of sequencing and
basecaller errors
A tool for correcting sequencing and basecaller errors using sequence alignment and chromatogram data.
On average, AutoEditor corrects 80% of erroneous base calls.
It also greatly improves the ability to discover SNPs between closely related strains and isolates of the same species.
MUMmer
A system for aligning whole genome sequences. Using an efficient data structure called a suffix tree, it can rapidly align sequences containing millions of nucleotides.
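To make the idea of a maximal unique match (MUM) concrete, here is a brute-force sketch. It examines every pair of positions, which is only practical for toy inputs; MUMmer finds this kind of match far faster with a suffix tree. The sequences and the minimum length are assumptions for illustration.

```python
def count_occurrences(text: str, pattern: str) -> int:
    """Number of (possibly overlapping) occurrences of pattern in text."""
    count, pos = 0, text.find(pattern)
    while pos != -1:
        count += 1
        pos = text.find(pattern, pos + 1)
    return count


def maximal_unique_matches(a: str, b: str, min_len: int = 20) -> list[tuple[int, int, int]]:
    """Brute-force MUMs as (start_in_a, start_in_b, length), 0-based.

    A MUM is an exact match that cannot be extended to the left or right and
    whose sequence occurs exactly once in a and exactly once in b.
    The default min_len is chosen for illustration only.
    """
    mums = []
    for i in range(len(a)):
        for j in range(len(b)):
            if a[i] != b[j]:
                continue
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue  # not left-maximal: this match starts further left
            length = 0
            while i + length < len(a) and j + length < len(b) and a[i + length] == b[j + length]:
                length += 1  # extending until a mismatch or an end makes it right-maximal
            match = a[i:i + length]
            if length >= min_len and count_occurrences(a, match) == 1 and count_occurrences(b, match) == 1:
                mums.append((i, j, length))
    return mums


a = "ACGTTAGCCAT"
b = "TTACGTTGCCATG"
print(maximal_unique_matches(a, b, min_len=4))  # [(0, 2, 5), (6, 7, 5)]
```

Requiring uniqueness in both genomes discards matches inside repeats, leaving reliable anchors from which a whole-genome alignment can be built.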
MUMmer 3.0
Open source.
Improved efficiency.
Ability to find non-unique, repetitive matches as well as unique
matches.
New graphical output modules.
Applications:-
MUMmer 1.0 was used to detect numerous large-scale inversions in bacterial genomes.
MUMmer 2.1 was used to align all human
chromosomes to one another and to detect numerous
large-scale.
PROmer was used to compare the human and mouse malaria parasites P. falciparum and P. yoelii.
Current uses of MUMmer 3.0:-
1) Identifying SNPs and other mutations in a large collection of Bacillus anthracis strains.
2) Comparing different assemblies of the same genome at different stages of sequencing and finishing.
Example genome comparisons:-
E. coli K12 vs. E. coli O157:H7
S. cerevisiae vs. S. pombe
A. fumigatus vs. A. nidulans
P. falciparum vs. P. yoelii
PSORT WWW Server
PSORT is a computer program for the prediction of protein localization
sites in cells.
PSORT versions:-
WoLF PSORT
PSORT II (recommended for animal/yeast sequences)
PSORT (old version; for bacterial/plant sequences)
PSORT-B (recommended for Gram-negative bacteria)
PSORT prediction form:-
Source of input sequence: Gram-positive bacterium, Gram-negative bacterium, yeast, animal, or plant.
Sequence ID (default is MYSEQ).
Amino acid sequence, entered by copy and paste; characters other than the 20 standard amino acid codes are removed.
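The input rule of the PSORT form (only the 20 standard amino acid codes are kept) can be mimicked as follows. This is a hypothetical helper, not PSORT's own code; it also assumes that FASTA header lines (starting with `>`) should be skipped first, since header text such as `MYSEQ` consists of valid amino acid letters and would otherwise leak into the sequence.

```python
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def clean_protein(text: str) -> str:
    """Keep only the 20 standard one-letter amino acid codes.

    Assumption: lines starting with '>' are FASTA headers and are skipped first.
    """
    body = "".join(line for line in text.splitlines() if not line.startswith(">"))
    return "".join(ch for ch in body.upper() if ch in STANDARD_AMINO_ACIDS)


print(clean_protein(">MYSEQ\nmktayiakqr X 12\nqisfvk*"))  # MKTAYIAKQRQISFVK
```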
PHIRE
This Visual Basic program performs an algorithmic string-based search on bacteriophage genome sequences, discovering and extracting blocks that display sequence similarity without any prior experimental or predictive knowledge.
MB Advanced DNA Analysis
MB is a relatively small, easy-to-use program.
Main features of MB are:
restriction analysis
amino acids analysis
multiple sequence alignment tool
dot plot
calculation of molecular weights and chemical properties of proteins
prediction of 3D structures for short amino acid sequences.
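A dot plot, one of MB's listed features, can be sketched as text output. This is a generic illustration, not MB's implementation; the window and threshold parameters are assumptions. With a window larger than 1, a dot is drawn only when enough positions in the window match, which suppresses random single-base matches.

```python
def dot_plot(a: str, b: str, window: int = 1, min_matches: int = 1) -> list[str]:
    """Text dot plot of a (rows) against b (columns).

    Cell (i, j) is '*' when the windows a[i:i+window] and b[j:j+window] agree
    at >= min_matches positions, '.' otherwise.
    """
    rows = []
    for i in range(len(a) - window + 1):
        row = ""
        for j in range(len(b) - window + 1):
            same = sum(a[i + k] == b[j + k] for k in range(window))
            row += "*" if same >= min_matches else "."
        rows.append(row)
    return rows


for line in dot_plot("GATTACA", "GATTACA"):
    print(line)
```

Matching stretches appear as diagonal runs of `*`; a sequence plotted against itself always shows the main diagonal, and repeats show up as off-diagonal lines.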
UniPro DPview
This is a tool for finding and analyzing matches between
genomes.
SEQtools
Program package for routine handling and analysis of DNA
and protein sequences.
The package includes general facilities for sequence and
contig editing, restriction enzyme mapping, translation, and
repeat identification.
DNA Club
DNA analysis software.
Features:- remove vector sequence, find ORF, sequence
editing, translate to protein sequence, protein sequence
editing, RE Map, RE Map with translation, PCR primer
selection, primer or probe evaluation.
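Finding open reading frames (one of DNA Club's features) can be sketched as a scan for ATG followed by an in-frame stop codon. This simplified version is not DNA Club's algorithm: it looks only at the forward strand, starts each ORF at the first ATG after the previous stop, and uses an assumed minimum length; real ORF finders usually also scan the reverse complement and may accept alternative start codons.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 100) -> list[tuple[int, int, int]]:
    """Forward-strand ORFs as (frame, start, end), 0-based, end exclusive.

    An ORF runs from an ATG to the next in-frame stop codon (stop included);
    min_codons (an assumed cutoff) counts codons before the stop.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for p in range(frame, len(dna) - 2, 3):
            codon = dna[p:p + 3]
            if start is None and codon == "ATG":
                start = p
            elif start is not None and codon in STOP_CODONS:
                if (p - start) // 3 >= min_codons:
                    orfs.append((frame, start, p + 3))
                start = None
    return orfs


print(find_orfs("CCATGAAATTTGGGTAACC", min_codons=3))  # [(2, 2, 17)]
```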
ZCURVE
A new, highly accurate system for recognizing protein-coding genes in bacterial and archaeal genomes, based on the Z curve theory of DNA sequences.
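The Z curve itself is easy to compute: each base moves a point one step along three axes that separate purines from pyrimidines, amino from keto bases, and weak from strong hydrogen bonding. The sketch only builds the cumulative curve; ZCURVE's gene recognition derives features from this representation and classifies them with a trained model, which is not shown.

```python
def z_curve(dna: str) -> list[tuple[int, int, int]]:
    """Cumulative Z-curve points (x_n, y_n, z_n), one per base.

    With A_n, C_n, G_n, T_n the base counts in the first n bases:
      x_n = (A_n + G_n) - (C_n + T_n)   purine vs pyrimidine
      y_n = (A_n + C_n) - (G_n + T_n)   amino vs keto
      z_n = (A_n + T_n) - (G_n + C_n)   weak vs strong hydrogen bonds
    Bases other than A, C, G, T raise KeyError.
    """
    step = {"A": (1, 1, 1), "C": (-1, 1, -1), "G": (1, -1, -1), "T": (-1, -1, 1)}
    x = y = z = 0
    points = []
    for base in dna.upper():
        dx, dy, dz = step[base]
        x, y, z = x + dx, y + dy, z + dz
        points.append((x, y, z))
    return points


print(z_curve("ATGC"))  # [(1, 1, 1), (0, 0, 2), (1, -1, 1), (0, 0, 0)]
```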
DNA for Windows
A compact, easy-to-use DNA analysis program, ideal for small-scale sequencing projects.
Webcutter
A free online tool for restriction mapping of nucleotide sequences.
Features:-
a simple, customizable interface
worldwide, platform-independent accessibility via the web
seamless interfaces to NCBI's GenBank DNA sequence database and to a restriction enzyme database.
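Locating recognition sites, the core of restriction mapping, can be sketched with a regular-expression scan. The three enzymes are common textbook examples chosen for illustration, not Webcutter's enzyme list. Their sites are palindromic, so scanning the forward strand finds every site; non-palindromic sites would also need a reverse-complement scan.

```python
import re

# Illustrative enzymes with well-known palindromic recognition sites (5'->3').
ENZYMES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}


def restriction_map(dna: str, enzymes: dict[str, str] = ENZYMES) -> dict[str, list[int]]:
    """0-based start positions of each enzyme's recognition site in dna."""
    dna = dna.upper()
    # The zero-width lookahead (?=...) also reports overlapping sites.
    return {
        name: [m.start() for m in re.finditer(f"(?={site})", dna)]
        for name, site in enzymes.items()
    }


print(restriction_map("TTGAATTCAAGGATCCGAATTC"))
# {'EcoRI': [2, 16], 'BamHI': [10], 'HindIII': []}
```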
Multilocus sequence typing (MLST)
Compares sequence variation in multiple (typically seven) housekeeping gene targets.
Developed for Neisseria gonorrhoeae, Streptococcus
pneumoniae, and S. aureus.
Based on the classic multilocus enzyme electrophoresis
(MLEE) method used to study the genetic variability of a
species.
Drawbacks:- labor-intensive, time-consuming, and costly.
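In MLST, each distinct sequence at a locus is assigned an allele number, and the combination of allele numbers across loci (the allelic profile) defines a sequence type (ST). A minimal sketch of that lookup; the loci, the 8-bp "alleles" and the ST table are all hypothetical, whereas real schemes use curated databases and full-length gene fragments.

```python
# Hypothetical 3-locus scheme: allele tables and profile -> ST table are made up.
ALLELES = {
    "locusA": {"ATGCCGTA": 1, "ATGCCGTT": 2},
    "locusB": {"GGCTTAAC": 1, "GGCTTGAC": 2},
    "locusC": {"TTACGGAT": 1, "TTACGGCT": 2},
}
SEQUENCE_TYPES = {(1, 1, 1): 1, (1, 2, 1): 2, (2, 2, 2): 3}
LOCI = ("locusA", "locusB", "locusC")


def allelic_profile(isolate: dict[str, str]) -> tuple:
    """Allele number at each locus; None marks an allele not yet in the table."""
    return tuple(ALLELES[locus].get(isolate[locus]) for locus in LOCI)


def sequence_type(isolate: dict[str, str]):
    """Return (profile, ST); ST is None for a profile not in the table."""
    profile = allelic_profile(isolate)
    return profile, SEQUENCE_TYPES.get(profile)


isolate = {"locusA": "ATGCCGTA", "locusB": "GGCTTGAC", "locusC": "TTACGGAT"}
print(sequence_type(isolate))  # ((1, 2, 1), 2)
```

Because isolates are compared as discrete allele numbers rather than raw sequences, results from different laboratories can be matched against a shared database, which is what makes MLST portable.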
Single-locus sequence typing(SLST)
Compares sequence variation of a single target.
Provides an inexpensive, rapid, objective, and portable genotyping method to subspeciate bacteria.
The single-target approach depends on finding a region for sequencing that is sufficiently polymorphic to provide useful strain resolution.
Loci with short sequence repeat (SSR) regions may have
suitable variability for discriminating outbreaks.
Two S. aureus genes conserved within the species, protein A (spa) and
coagulase (coa), have variable SSR regions constructed from closely
related 24- and 81-bp tandem repeat units, respectively.
The genetic alterations in SSR regions include both point mutations and
intragenic recombination that arise by slipped-strand mispairing during
chromosomal replication and that result in a high degree of
polymorphism.
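Typing by SSR regions such as those of spa and coa amounts to reading the succession of repeat units. A hypothetical sketch: it assumes the repeat region has already been extracted and starts exactly at a repeat boundary, splits it into fixed-length units (24 bp for spa, 81 bp for coa), and gives each distinct unit a code, so isolates can be compared by the number and order of their repeats. The synthetic 24-bp units below are not real spa repeats.

```python
def repeat_units(region: str, unit_len: int) -> list[str]:
    """Split a repeat region into consecutive units of unit_len bp (partial tail ignored)."""
    return [region[p:p + unit_len] for p in range(0, len(region) - unit_len + 1, unit_len)]


def repeat_profile(region: str, unit_len: int, codebook: dict[str, str]) -> list[str]:
    """Encode each unit as a short code; unseen units get the next code r1, r2, ...

    The codebook is updated in place so codes stay consistent across isolates.
    """
    profile = []
    for unit in repeat_units(region, unit_len):
        if unit not in codebook:
            codebook[unit] = f"r{len(codebook) + 1}"
        profile.append(codebook[unit])
    return profile


# Synthetic 24-bp units (not real spa repeats) differing at their 13th base (a point mutation).
u1 = "AAAAAAAAAAAACCCCCCCCCCCC"
u2 = "AAAAAAAAAAAAGCCCCCCCCCCC"
codes: dict[str, str] = {}
print(repeat_profile(u1 + u1 + u2, 24, codes))       # ['r1', 'r1', 'r2']
print(repeat_profile(u1 + u2 + u2 + u2, 24, codes))  # ['r1', 'r2', 'r2', 'r2']
```

The two isolates differ both in repeat number (three vs. four units) and in unit order, the two kinds of change that slipped-strand mispairing and point mutation produce in these regions.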