341- INTRODUCTION TO BIOINFORMATICS Overview of the …

Download Report

Transcript 341- INTRODUCTION TO BIOINFORMATICS Overview of the …

341- INTRODUCTION TO
BIOINFORMATICS
Overview of the Course Material
1
Introduction to Biology
• Biological Terms: DNA, RNA, Gene,
Gnome, Codon, Protein, Proteome,
Amino acids
• Central Dogma:
– The transcription of DNA to mRNA.
– Translation of the mRNA to a protein
2
Sequencing and Sequence
Alignment
• Sequences, Mutations, and Evolution
• Sequencing technologies:
– Sanger Sequencing
– Next Generation Sequencing
• Sequence Alignment:
– Why is it needed?
– Sequence Similarity Scoring Matrices; e.g. BLOSUM
– BLAST Algorithm
– Accuracy, Sensitivity and Selectivity of sequence searching
– Dynamic Programming based Sequence Alignment
• Algorithms: Needleman-Wunsch (global alignment) ,
• Smith-Waterman(local alignment)
3
Functional Genomic and
Microarray Analysis
• What is gene expression?
• What are microarrays?
• What are the steps of microarray experiments?
• Statistical significance tests that are used for evaluating
the results of microarray experiments; e.g., t -test.
• Data clustering / classification
– Distance Measures
– Hierarchical clustering
– Decision trees
– K-means/K-medoids clustering
– K-nearest neighbourhood clustering
4
Introduction to Graph Theory
• Basic definitions on graphs and graph types; e.g. bi -partite graphs, cycles,
trees, clique, path, subgraphs, connected component, isomorphism,
automorphism, automorphism orbits.
• Minimum spanning trees
– Kruskal’s Algorithm
– Prim’s Algorithm
• Graph representations:
– Adjacency Matrix
– Adjacency List
• Running times of Algorithms; upper bounds, lower bounds.
• Complexity classes of problems: P, NP, NP-Hard
• Graph Traversal Algorithms:
– Breadth-first search
– Depth-first search
• Shortest Path Algorithms
– Dijsktra’s Algorithm
– Floyd-Warshall algorithm
5
Introduction to Biological Networks
• Different types of biological networks:
– Metabolic Networks
– Transcription Regulation Networks
– Cell Signalling networks
– Protein-protein interaction Networks
– Genetic interaction Networks
– Protein Structure Networks
• What do they represent?
• What are the relations among these network types?
• What are the available sources of data for obtaining these
networks?
• What are the possible sources of bias in these networks?
6
Network Properties
• The global network properties:
– Degree / Degree Distribution
– Clustering Coefficient
– Diameter / Shortest Path Length Distribution
– Centrality Measures:
• Degree Centrality
• Betweenness Centrality
• Closeness Centrality
• Eccentricity Centrality
• The local network properties:
– Network motifs
– Graphlets / Graphlet Degree Distributions
7
Network Models
• Different types of network models:
– Erdos-Renyi Networks (ER)
– Small-world Networks
– Scale-free Networks (SF)
– Hierarchical Networks
– Geometric Networks (GEO)
– Generalized Random Networks (ERDD)
– Geometric Networks with Gene Duplication / Divergence
Events(GEOGD)
– Scale-free Networks with Gene Duplications/Divergence
Events (SFGD)
– Stickiness-index based Networks (STICKY)
8
Network Comparison / Alignment
• Paralogy / Orthology
• Network: alignment, integration, querying
• Network Alignment Algorithms:
– Global / Local Alignment
– Pairwise Alignment / Multiple Alignment
– Functional vs. topological information
• Alignment quality evaluations measures:
o
o
o
o
Edge Correctness,
Size of Common Connected Subgraphs,
attributed to chance,
biological quality
9
Network Comparison / Alignment
• Key algorithmic components:
– Similarity between nodes
• Functional
• Topological
– Search (identification of high-scoring alignments)
• Seed-and-extend
• Finding an optimal alignment
10
10
Protein 3D Structure
• Four levels of protein structure
• Available data
• The structure function paradigm
• Rigid and flexible structure comparison
– Algorithms
11
Network Data Integration
• Today’s lecture
12