Bioinformatics and Phylogenetic Analysis

Transcript Bioinformatics and Phylogenetic Analysis

Edgar Scott
Multicampus Bioinformatics
Education Specialist
What is Bioinformatics?

An interdisciplinary field that applies principles and
techniques from computer science, probability and
statistics, and linguistics to the study of
genomic and proteomic sequences.


Biological databases for storing and
organizing DNA and protein sequences
Computational tools for analyzing
sequences
Phylogenetic Analysis and
Bioinformatics




Phylogenetics – study of evolutionary
relationships
Phylogenetic trees used to represent
evolutionary relationships
Uses protein or DNA sequences, rather than
morphological characters, to detect relationships
Bioinformatics provides both sequence
repositories and sequence analysis software.
Overview

Acquiring Data Set




Text searching at the National Center for
Biotechnology Information (NCBI)
Sequence similarity and homology
Sequence similarity searching with Basic Local
Alignment Search Tool (BLAST)
Analyzing Data Set

Phylogenetic Analysis with Molecular Evolutionary
Genetics Analysis (MEGA) 3.1 software


Build multiple sequence alignments using
ClustalW
Build phylogenetic trees
Text Searching at NCBI

NCBI maintains and provides molecular
information and bioinformatics tools to
the scientific community



GenBank – an archival DNA and protein
sequence database
RefSeq – a curated DNA and protein
sequence database
Entrez Gene – a gene-centered database
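As a minimal sketch of scripted text searching, the Python below only builds a query URL for NCBI's E-utilities esearch service, which runs Entrez text searches programmatically and returns matching record IDs; it does not send the request. The database names ("gene" for Entrez Gene, "nucleotide" or "protein" for sequence records), the helper name build_esearch_url, and the example query are illustrative assumptions, and NCBI's usage guidelines (rate limits, API keys) apply if you actually fetch results.

```python
from urllib.parse import urlencode

# Base URL of NCBI's E-utilities esearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Return an esearch URL for a text query against one Entrez database.

    Hypothetical helper: db is an Entrez database name (e.g. "gene",
    "nucleotide", "protein"); term uses Entrez query syntax with field tags.
    """
    params = {"db": db, "term": term, "retmax": retmax}
    # urlencode percent-encodes brackets and turns spaces into '+'.
    return ESEARCH_URL + "?" + urlencode(params)


url = build_esearch_url("gene", "BRCA1[Gene Name] AND human[Organism]")
print(url)
```

Building the URL separately from fetching it keeps the example testable offline; in practice the URL would be passed to an HTTP client and the returned IDs used to retrieve records.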
Sequence Similarity and
Homology

Homology – the relationship between sequences that share
a common ancestral sequence





Paralogs – arise via gene duplication
Orthologs – arise via speciation event
Xenologs – arise via gene transfer
Evolutionarily related genes have similar
sequences.
Sequence differences correspond to the amount
of change that has occurred since the sequences last
shared a common ancestral sequence.
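The link between sequence differences and amount of change can be made concrete with the p-distance (the fraction of aligned sites that differ) and the Jukes-Cantor correction, a standard nucleotide model that accounts for multiple substitutions hitting the same site. Both are swapped in here as illustrations; the slide does not name a specific distance measure. The sketch assumes gaps are written as '-' and skips gapped columns.

```python
import math


def p_distance(seq1: str, seq2: str) -> float:
    """Fraction of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq1, seq2):
        if a == "-" or b == "-":
            continue  # ignore gapped columns
        compared += 1
        differences += a != b
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differences / compared


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance (estimated substitutions per site)."""
    if p >= 0.75:
        raise ValueError("p >= 0.75: distance undefined under Jukes-Cantor")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


p = p_distance("ACGTACGTAC", "ACGTTCGTAA")
print(f"p = {p:.3f}, Jukes-Cantor d = {jukes_cantor(p):.3f}")
```

The corrected distance is always at least the raw p-distance, and the gap between them widens as sequences diverge, because more of the true changes are hidden by repeated substitutions.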
Sequence Alignments

Sequence Alignment – a process that identifies a
series of characters or character patterns that are in
the same order in both sequences.




Pairwise Global alignment
Pairwise Local alignment
Optimal alignment – an alignment between
sequences in which the number of matching
characters is maximized and the number of mismatching
characters (and gaps) is minimized.
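Optimal global and local pairwise alignments are usually found by dynamic programming: Needleman-Wunsch for global and Smith-Waterman for local alignment. This minimal sketch returns only the optimal score (no traceback) under an assumed scoring scheme of +1 for a match, -1 for a mismatch, and -2 per gap position; the local version differs only in flooring cell scores at zero and taking the best cell anywhere in the matrix.

```python
def global_score(a: str, b: str, match=1, mismatch=-1, gap=-2) -> int:
    """Needleman-Wunsch: best score for aligning a and b end to end."""
    prev = [j * gap for j in range(len(b) + 1)]  # row 0: b against leading gaps
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]


def local_score(a: str, b: str, match=1, mismatch=-1, gap=-2) -> int:
    """Smith-Waterman: best score of any pair of substrings of a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Flooring at 0 lets a local alignment start fresh anywhere.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


print(global_score("GGACGTCC", "TTACGTAA"), local_score("GGACGTCC", "TTACGTAA"))
```

For these two sequences the shared core ACGT gives the local alignment a positive score, while the global alignment is pulled down by the mismatched ends it is forced to include.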
Quantifying alignments



Alignment score of the optimal alignment
Percent identity scores
Percent similarity scores
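Percent identity and percent similarity can be computed directly from a pairwise alignment. The sketch below assumes gaps are written as '-', uses the full alignment length as the denominator (programs differ on this convention), and treats two residues as similar when they fall in the same hypothetical group of related amino acids; BLAST, for example, instead counts "positives" as aligned pairs with a positive substitution-matrix score.

```python
# Hypothetical groups of physicochemically similar amino acids (an assumption
# for illustration; real tools derive similarity from a substitution matrix).
SIMILAR_GROUPS = ["ILMV", "FWY", "KRH", "DE", "ST", "NQ", "AG"]


def _similar(a: str, b: str) -> bool:
    """True if residues are identical or share a similarity group."""
    return a == b or any(a in g and b in g for g in SIMILAR_GROUPS)


def percent_scores(aln1: str, aln2: str) -> tuple[float, float]:
    """Return (percent identity, percent similarity) for two aligned rows.

    Assumption: the denominator is the alignment length including gap columns.
    """
    if len(aln1) != len(aln2):
        raise ValueError("aligned sequences must have equal length")
    identical = similar = 0
    for a, b in zip(aln1, aln2):
        if a == "-" or b == "-":
            continue  # a gap is neither identical nor similar
        identical += a == b
        similar += _similar(a, b)
    n = len(aln1)
    return 100.0 * identical / n, 100.0 * similar / n


ident, sim = percent_scores("MKVLA-TE", "MRILAGSE")
print(f"identity {ident:.1f}%, similarity {sim:.1f}%")
```

Similarity is always at least as high as identity, since every identical pair also counts as similar.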
Sequence Similarity Searching

Basic Local Alignment Search Tool (BLAST)



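blastp (protein vs. protein), blastn (nucleotide vs. nucleotide),
blastx (translated nucleotide query vs. protein database),
tblastn (protein query vs. translated nucleotide database),
& tblastx (translated nucleotide vs. translated nucleotide)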
Local alignments are reported
Expectation Value (E-value) – the number of alignments an
investigator can expect to find by chance, in a search of a
database of that size, with an alignment score as good as or
better than the alignment score under consideration.
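The E-value can be made concrete with the Karlin-Altschul formula that underlies BLAST statistics, E = K·m·n·e^(-λS), where S is the raw alignment score, m the query length, n the total database length, and λ and K are constants set by the scoring system. The sketch assumes placeholder values for λ and K (they are not BLAST's defaults) and ignores the effective-length corrections BLAST applies; it also converts S to a bit score S′, for which E = m·n·2^(-S′).

```python
import math


def evalue(raw_score: float, m: int, n: int, lam: float, k: float) -> float:
    """Karlin-Altschul E-value: expected number of chance alignments scoring
    at least raw_score for a query of length m against a database of total
    length n."""
    return k * m * n * math.exp(-lam * raw_score)


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalized score in bits: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


# Placeholder statistical parameters (assumed, NOT BLAST defaults): real
# values depend on the scoring matrix and gap penalties.
LAM, K = 0.3, 0.1
QUERY_LEN, DB_LEN = 250, 50_000_000

for s in (40, 60, 80):
    e = evalue(s, QUERY_LEN, DB_LEN, LAM, K)
    print(f"S={s}  bits={bit_score(s, LAM, K):.1f}  E={e:.2e}")
```

Two properties follow directly from the formula: E falls exponentially as the score rises, and it grows linearly with the size of the database searched, so the same alignment is less significant in a larger search.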
Steps to Build a Tree


Build a multiple sequence alignment of the
data set.
Analyze the multiple sequence alignment
using either distance-based methods or
character-based methods.
Molecular Evolutionary
Genetics Analysis (MEGA) 3.1



Phylogenetic Analysis program
Constructs multiple sequence alignments using
ClustalW
Provides tree-building methods:
Distance-based methods – UPGMA, Neighbor-joining method,
Minimum Evolution
Character-based method – Maximum Parsimony
Provides a great help document!
Multiple Sequence Alignment



Multiple Sequence Alignment – an alignment
of three or more sequences.
Finding an optimal multiple sequence alignment is
computationally NP-hard, so programs rely on heuristics
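One common way to score a multiple sequence alignment is the sum-of-pairs (SP) score, which adds up the pairwise scores of every pair of sequences in every column. Finding the SP-optimal alignment exactly by dynamic programming takes time that grows exponentially with the number of sequences, which is why heuristic programs such as ClustalW are used in practice. The sketch below only scores a given alignment; the match, mismatch, and gap values are assumptions, and gap-gap pairs score 0 by convention here.

```python
from itertools import combinations


def sum_of_pairs(msa: list[str], match=1, mismatch=-1, gap=-2) -> int:
    """Sum-of-pairs score: add the pairwise score of every pair of rows in
    every column. Gap-gap pairs contribute 0 (an assumed convention)."""
    length = len(msa[0])
    if any(len(row) != length for row in msa):
        raise ValueError("all rows must have the same length")
    total = 0
    for col in range(length):
        for a, b in combinations((row[col] for row in msa), 2):
            if a == "-" and b == "-":
                continue
            if a == "-" or b == "-":
                total += gap
            elif a == b:
                total += match
            else:
                total += mismatch
    return total


print(sum_of_pairs(["ACG-T", "ACGAT", "A-GAT"]))
```

A program can compare candidate alignments of the same sequences by this score; a higher SP score means the columns agree better overall under the chosen scoring scheme.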
Programs




ClustalW – fast, applies a progressive method
T-Coffee – slower, applies an advanced
progressive method
Dialign – slow, applies an iterative method
Combine – combines multiple sequence
alignments
Tree-Building Methods

UPGMA, Neighbor-Joining, Minimum Evolution –
distance-based methods
Analyze the multiple sequence alignment to
calculate a distance matrix.
A clustering algorithm analyzes the distance matrix
to determine which sequences should be
clustered.
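A distance-based analysis can be sketched in Python: compute pairwise p-distances (the fraction of differing columns, with no evolutionary correction) from an alignment, then cluster them with UPGMA, which repeatedly joins the closest pair of clusters and averages their distances to all other clusters, weighted by cluster size. UPGMA is used here only because it is the simplest of the three methods named; it assumes a constant rate of evolution (a molecular clock). The four-sequence alignment is invented for illustration, and gaps would be treated as ordinary characters.

```python
from itertools import combinations


def p_distances(alignment: dict[str, str]) -> dict[frozenset, float]:
    """Pairwise p-distances (fraction of differing columns) between rows."""
    dist = {}
    for a, b in combinations(alignment, 2):
        s, t = alignment[a], alignment[b]
        dist[frozenset((a, b))] = sum(x != y for x, y in zip(s, t)) / len(s)
    return dist


def upgma(alignment: dict[str, str]) -> str:
    """Build a rooted, ultrametric tree with UPGMA; return it in Newick."""
    dist = p_distances(alignment)
    # Each cluster is keyed by its Newick subtree; value = (leaf count, height).
    clusters = {name: (1, 0.0) for name in alignment}
    while len(clusters) > 1:
        # Find the closest pair of current clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: dist[frozenset(p)])
        d_ab = dist[frozenset((a, b))]
        (na, ha), (nb, hb) = clusters.pop(a), clusters.pop(b)
        height = d_ab / 2  # the new node sits halfway between the two clusters
        node = f"({a}:{height - ha:.3f},{b}:{height - hb:.3f})"
        # Distance to the merged cluster is the size-weighted average.
        for c in clusters:
            dist[frozenset((node, c))] = (
                na * dist[frozenset((a, c))] + nb * dist[frozenset((b, c))]
            ) / (na + nb)
        clusters[node] = (na + nb, height)
    (root,) = clusters
    return root + ";"


alignment = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTTC",
    "C": "ACGAACGATC",
    "D": "TCGAACGATG",
}
print(upgma(alignment))
```

Neighbor-joining follows the same join-and-update loop but chooses pairs with a corrected criterion and does not assume a clock, so it can give unequal branch lengths to sister sequences.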
Maximum Parsimony – character-based method
Analyze the multiple sequence alignment to find
the tree with the minimum tree length (the fewest
character-state changes needed to explain the data).
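Tree length under parsimony can be computed with Fitch's algorithm, a standard method for counting the minimum number of changes a given tree needs at each alignment column; it is named here as an illustration, not as MEGA's implementation. The sketch scores all three possible pairings of four hypothetical sequences and keeps the shortest tree. Trees are nested Python tuples of leaf names, an assumed representation; real programs search tree space heuristically when enumerating every tree is too costly.

```python
def fitch_site(tree, states):
    """Return (possible ancestral states, changes) for one column on a subtree.

    tree is a leaf name (str) or a (left, right) tuple; states maps leaf -> char.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # No shared state: take the union and count one change.
    return left_set | right_set, left_cost + right_cost + 1


def tree_length(tree, alignment: dict[str, str]) -> int:
    """Total parsimony changes summed over all alignment columns."""
    length = len(next(iter(alignment.values())))
    total = 0
    for i in range(length):
        column = {name: seq[i] for name, seq in alignment.items()}
        total += fitch_site(tree, column)[1]
    return total


alignment = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTTC",
    "C": "ACGAACGATC",
    "D": "TCGAACGATG",
}
candidates = [(("A", "B"), ("C", "D")), (("A", "C"), ("B", "D")), (("A", "D"), ("B", "C"))]
for t in candidates:
    print(t, tree_length(t, alignment))
best = min(candidates, key=lambda t: tree_length(t, alignment))
print("most parsimonious:", best)
```

With four sequences there are only three unrooted topologies, so exhaustive scoring is trivial here; the number of possible trees grows explosively with more sequences.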
Tree Reliability


Bootstrapping – method for assessing
the reliability of trees.
Steps



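The columns of the original alignment are resampled
with replacement many times (e.g., 1,000 replicates).
For each resampled data set, a tree is built.
The trees created from the resampled data sets
are compared to the original tree; the percentage of
replicate trees that contain each branch of the original
tree is reported as that branch's bootstrap value.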
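Bootstrap column resampling can be sketched as follows. To stay short, this assumed simplification does not build full trees: a replicate counts as supporting a grouping of two sequences only if they are the closest pair by p-distance, which is the first join UPGMA would make. A real analysis rebuilds the whole tree for every replicate with the same method and scores every branch. The function names, seed, and example alignment are hypothetical, and ties in distance are broken by input order.

```python
import random
from itertools import combinations


def resample_columns(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """One bootstrap replicate: draw alignment columns with replacement."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


def closest_pair(alignment: dict[str, str]) -> frozenset:
    """Pair of rows with the smallest p-distance (UPGMA's first join)."""
    def p(a, b):
        s, t = alignment[a], alignment[b]
        return sum(x != y for x, y in zip(s, t)) / len(s)
    return frozenset(min(combinations(alignment, 2), key=lambda ab: p(*ab)))


def bootstrap_support(alignment, clade, replicates=1000, seed=1) -> float:
    """Percentage of replicates in which `clade` is the closest pair."""
    rng = random.Random(seed)
    hits = sum(
        closest_pair(resample_columns(alignment, rng)) == frozenset(clade)
        for _ in range(replicates)
    )
    return 100.0 * hits / replicates


alignment = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTTC",
    "C": "ACGAACGATC",
    "D": "TCGAACGATG",
}
print(f"support for (A, B): {bootstrap_support(alignment, ('A', 'B')):.1f}%")
```

Because each replicate keeps the alignment length but reweights columns at random, a grouping backed by many independent columns keeps high support, while one resting on a few columns is easily lost.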
Review

Acquiring Data Set




Text searching at the National Center for
Biotechnology Information (NCBI)
Sequence similarity and homology
Sequence similarity searching with Basic Local
Alignment Search Tool (BLAST)
Analyzing Data Set

Phylogenetic Analysis with Molecular Evolutionary
Genetics Analysis (MEGA) 3.1 software


Build multiple sequence alignments using
ClustalW
Build phylogenetic trees
