Slide1 - upatras eclass
Bioinformatics
Lecturer:
Christos Makris
Office:
Π502 (ΠΡΟΚΑΤ)
e-mail:
[email protected]
Teaching:
Friday
09:00-11:00
Introduction
Bioinformatics can be defined as: “the application of
computational techniques and methods in an effort to
understand and organize data and information that is related
to biological macromolecules...”.
Bioinformatics is conceptualizing biology in terms
of molecules (in the sense of Physical Chemistry) and applying
"informatics techniques" (derived from disciplines such as applied
maths, computer science and statistics) to understand and organise
the information associated with these molecules, on a large scale.
In short, bioinformatics is a management information system for
molecular biology and has many practical applications.
Research Areas
Efficient storage and organization of biological data
Development of tools that permit the efficient analysis of
biological data.
Development of tools for data analysis of experimental
results.
Course contents
General Introduction
Part I
-Introduction to algorithms for the efficient storage and management of
strings and biological sequence data.
-Algorithms for exact pattern matching (Boyer-Moore, Knuth-Morris-Pratt, Shift-Or, multiple pattern matching).
-Introduction to suffix trees and their applications.
-Algorithms for approximate pattern matching and sequence alignment.
-Algorithms for searching in sequence databases (FASTA, BLAST,
PROSITE).
Part II
-- The theoretical base of molecular design.
-- Molecular models and biochemical information
-- Structure based drug design
-- Open problems
Part III
Clustering and categorization techniques for biological data, in order to
predict the behavior of biological molecules.
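As an illustration of the exact pattern matching algorithms listed in Part I, the following is a minimal sketch of Knuth-Morris-Pratt search over a DNA string. The function names and the example sequence are assumptions made for this illustration and are not taken from the course material.

```python
def build_failure(pattern):
    """failure[i] is the length of the longest proper prefix of
    pattern[:i + 1] that is also a suffix of it."""
    failure = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k
    return failure


def kmp_search(text, pattern):
    """Return the start positions of all occurrences (overlaps included)."""
    if not pattern:
        return []
    failure = build_failure(pattern)
    hits = []
    k = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = failure[k - 1]  # fall back without re-reading the text
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = failure[k - 1]
    return hits


# Hypothetical example sequence
print(kmp_search("ACGTACGTAC", "ACGTAC"))
```

The text is scanned once from left to right, so the search takes time linear in the combined length of the text and the pattern.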
Course Exam
The course assessment consists of:
1. Delivering a project by 1-2 students → 30% of the total grade, based
on the book: Dan Gusfield, Algorithms on Strings, Trees, and
Sequences: Computer Science and Computational Biology,
Cambridge University Press
2. Presentation and oral exam on the class notes and one more project
→ 70% of the total grade
Basic Notions (1)
Bioinformatics is the conceptualization of biology in terms of molecules (physical
chemistry) and the application of “informatics techniques” (applied
mathematics, computer science and statistics) for comprehending and
organizing the information related to these molecules on a large scale.
Figure from “Bioinformatics: From Genomes to Drugs”, T. Lengauer
Basic Notions (2)
DNA consists of a double strand of bases. The bases
are joined in a specific sequence and store the genetic
information of every organism: A (adenine), T
(thymine), C (cytosine), G (guanine).
Every DNA molecule can be considered as a string over the
alphabet {A, C, T, G}.
Double helix: the two strands are complementary, so knowing
one strand determines the other (A-T, C-G).
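The base pairing rule (A-T, C-G) can be sketched in a few lines of code. This is a minimal illustration that assumes the strand is given as an uppercase string over {A, C, T, G}; the function name and the example strand are hypothetical.

```python
# Watson-Crick base pairing: A pairs with T, C pairs with G
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand):
    """Return the opposite strand, read in its own 5'->3' direction."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))


# Hypothetical example strand
print(reverse_complement("ATGC"))
```

Applying the function twice returns the original strand, which is the sense in which one strand fully determines the other.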
Genomes
The term genome refers to the DNA sequence of
a living organism.
The human genome is organized in 46 chromosomes (23 pairs).
Every cell contains the whole genome of the
organism (with differences between prokaryotes and
eukaryotes).
Proteins
Proteins are molecules that consist of one or more
polypeptides.
A polypeptide is a polymer consisting of amino acids.
Cells produce their proteins from 20 different amino
acids.
A protein sequence can be considered as a string over an
alphabet of 20 characters, Σ = {Ala, Arg, Asp, Asn, Cys,
Glu, Gln, Gly, His, Ile, Leu, Lys, Met, Phe, Pro, Ser, Thr,
Trp, Tyr, Val}.
Genes
Genes are the basic units of inheritance.
A gene encodes a protein, since it stores the information for its
construction.
The human genome consists of ~40,000 genes (an early estimate; more recent estimates are closer to 20,000 protein-coding genes).
Molecular Biology Dogma
DNA (RNA polymerase) -> pre-RNA (T -> U (uracil))
pre-RNA (spliceosome) -> RNA
RNA (ribosome) -> protein
(the first step for eukaryotes)
-- The process concerns the part of the DNA that codes for a
gene.
-- The term gene expression refers to the production of RNA
from a gene.
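The DNA -> RNA -> protein pipeline can be sketched as two string operations. This is a simplified illustration under stated assumptions: the input is the coding strand (so transcription reduces to replacing T with U), splicing is skipped, translation starts at position 0, and the standard genetic code is used. The function names and the example sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order;
# "*" marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon).replace("T", "U"): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def transcribe(coding_strand):
    """DNA coding strand -> RNA (T is replaced by U)."""
    return coding_strand.replace("T", "U")


def translate(mrna):
    """RNA -> protein in one-letter codes, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical example: ATG (Met), GCC (Ala), TAA (stop)
mrna = transcribe("ATGGCCTAA")
print(mrna, translate(mrna))
```

The protein is written here with one-letter amino acid codes (M for Met, A for Ala), which is a common shorthand for the 20-letter alphabet.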
Central Dogma of Molecular Biology
DNA molecule (https://public.ornl.gov/site/gallery/default.cfm - U.S.
Department of Energy Genome Programs )
Central dogma of Molecular Biology
(http://www.accesexcellence.org/AB/GG)
Replication
Transcription
Synthesis
Genetic code
(http://www.accesexcellence.org/AB/GG)
In a nutshell
(http://www.doegenomes.org - U.S. Department of Energy Genome
Programs )
Small changes imply variations
(http://www.doegenomes.org - U.S. Department of Energy Genome
Programs )
Genomics and proteomics
Notions
Genomics (studying genes)
Proteomics (studying proteins)
Molecular Biology targets
Sequencing and comparison of the genomes of different
organisms (evolution, correlation).
Gene recognition and determination of the functionality that genes
define (recognition of contact areas and, from there, gene
recognition).
Understanding gene expression (every gene is activated
when its product is expressed; the activation
procedure is studied).
Understanding the role of genes in several diseases (gene
mutation).
The data that we use can come from experiments or from biological
databases.
Solving Computational Problems using
Bioinformatics Tools
Combining gene information (sequencing and linking).
Comparing sequences (information retrieval algorithms
using schematic similarities).
Protein categorization.
Information extraction from gene expression.
Representing cells as transcription networks.
Bioinformatics research areas
Designing and implementing computational tools for
automatic knowledge extraction from biological databases
Analysing Biological Data Sequences
Biological data categorization
Molecular modeling
Protein analysis
Drug design with Computers
Computational Tools
Managing molecular biology data is increasingly
demanding, and the relational database model does
not seem adequate. The target is to design and
implement a model that permits automated knowledge
discovery from large-scale data.
Recognizing common structural characteristics not only
at the sequence level but also at the two-dimensional (2D) or
three-dimensional (3D) level.
Similarity tracking between 2D or 3D shapes.
Sequence Analysis
Exact matching
Approximate matching
Alignment (local or global)
Searching for maximal common subsequences
Databases of biological macromolecules (compression)
Algorithms for searching, indexing and cross-referencing
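One of the tasks above, searching for a common subsequence of maximal length, can be sketched with dynamic programming. This sketch reads the task as the classic longest common subsequence (LCS) problem, which is an assumption; the function name and the example strings are hypothetical.

```python
def longest_common_subsequence(a, b):
    """Return one longest common subsequence of strings a and b."""
    m, n = len(a), len(b)
    # table[i][j] = LCS length of the prefixes a[:i] and b[:j]
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Trace back from the bottom-right corner to recover the subsequence
    result = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            result.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(result))


# Hypothetical example strings
print(longest_common_subsequence("AGGTAB", "GXTXAYB"))
```

The same table-filling pattern, with match, mismatch and gap scores in place of the simple counter, underlies global and local sequence alignment.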
Categorization/Clustering Biological
Data
Categorization takes place using common structural
or functional motifs.
We are interested in applications that integrate different
data types (data integration: sequences, 3D coordinates,
functional knowledge).
Molecular Modeling
Selection of a proper model that sufficiently describes
the intramolecular correlations of the studied biological
system.
Computing the energy state of the system and its
minimization.
Analysis of these calculations and checking of the final
formulation so that the conditions and restrictions that
have been set are fulfilled.
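The energy computation and minimization step can be illustrated with a toy example. The sketch below assumes a system of just two atoms interacting through a Lennard-Jones potential, in reduced units (epsilon = sigma = 1), and minimizes the energy over their distance with plain gradient descent. The choice of potential, the parameters and the function names are assumptions made for illustration; real molecular modeling uses far richer energy functions.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones energy of two atoms at distance r."""
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (s6 * s6 - s6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative of the Lennard-Jones energy with respect to r."""
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * s6 * s6 + 6.0 * s6) / r


def minimize_distance(r0, step=0.01, iterations=2000):
    """Gradient descent on the interatomic distance, starting from r0."""
    r = r0
    for _ in range(iterations):
        r -= step * lj_gradient(r)
    return r


r_min = minimize_distance(1.5)
# The analytic minimum is at r = 2**(1/6) * sigma with energy -epsilon
print(r_min, lj_energy(r_min))
```

For this potential the minimum is known in closed form, so the numerical result can be checked against it. For real molecules the minimization runs over thousands of coordinates and has no closed-form solution.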
Protein Analysis
Determining the three-dimensional structure of a protein and its
amino acid sequence
Studying the Protein Docking Problem and the DNA-Protein Docking Problem
Computer Based Drug Design
Computer systems store useful information related to: 1) the three-dimensional
architecture of molecules, 2) physical and chemical properties, 3) comparisons of one
molecule with other molecules, 4) complexes of micromolecules and macromolecules,
5) predictions for new molecules.
The first target of scientists engaged in computer-based drug design is
the effective representation of the structure of normal and pathological
molecules, which are subsequently compared to pathogenic enzymes and active
receptors, and the target of the molecular design is set.
So, if we know the protein structure and the way that the receptor at the active
region acts, we can “build” and simulate their docking, saving time and space.
Computer Based Drug Design
Graphic Algorithms
Geometric calculations
Arithmetic methods
Graph-theoretic methods
In silico drug design
Design drug compound
Optimizing drug compound
In vitro and in vivo tests
Toxic tests
Human tests
Performance tests
Design
Genome sequences
Gene finding
Protein sequences
Structure prediction
Protein structure
Geometry calculation
Protein surface
Molecular modeling
Force fields
Molecular docking
Molecular docking
Problems
The lack of a general and unified tool for designing molecular structures
that can cover the whole set of biological molecules.
The increasing computational complexity, expressed in the time and
resources needed, which grows with the size of the molecule.
Selection of the proper representation model (analogous to the biological
molecule) and definition of the crucial parameters (e.g. geometrical
coordinates) that should be checked, especially at the level of
computational geometry.
Robustness to errors in the input data and rebuilding of a
three-dimensional model from incomplete data.
Simultaneous representation of a set of physicochemical properties
(entropy, energy) and the means by which this information can be made
comprehensible and interpretable by a researcher.
Phylogenetics records the genetic relationships between species and the history of living
organisms.
A phylogenetic tree depicts the evolution of different species from a common ancestor.
Leaves: species/biological sequences
Internal nodes: hypothetical common ancestors
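The structure of a phylogenetic tree, with observed species at the leaves and hypothetical ancestors at the internal nodes, can be sketched with nested tuples. The species names and the tree shape below are made up for illustration and do not come from the slides.

```python
# A leaf is a string (a species or sequence name);
# an internal node is a tuple of children (a hypothetical common ancestor).
tree = (("human", "chimp"), ("mouse", "rat"))


def leaves(node):
    """List the leaves of the tree from left to right."""
    if isinstance(node, str):
        return [node]
    result = []
    for child in node:
        result.extend(leaves(child))
    return result


def count_internal(node):
    """Count the internal nodes, i.e. the hypothetical ancestors."""
    if isinstance(node, str):
        return 0
    return 1 + sum(count_internal(child) for child in node)


print(leaves(tree), count_internal(tree))
```

A rooted binary tree with n leaves always has n - 1 internal nodes, so four species imply three hypothetical ancestors.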
Projects (1)
Handbook of Computational Molecular Biology
Sequence alignment (pairwise sequence alignment, spliced
alignment and similarity based gene recognition, multiple
sequence alignment, parametric sequence alignment)
String data structures (lookup tables, suffix trees and suffix
arrays, suffix tree applications, enhanced suffix tree and
applications)
Genome assembly and EST clustering (Computational
methods for genome assembly, assembling the human genome,
comparative methods for sequence assembly, information
theoretic approach to genome reconstruction, expressed
sequence tags clustering and applications, algorithms for large
scale clustering and assembly of biological sequence data)
Projects (2)
Genome scale computational methods (comparison of long
genomic sequences: algorithms and applications, chaining
algorithms and applications in comparative genomics,
computational analysis of alternative splicing, human genetic
linkage analysis, haplotype inference)
Phylogenetics (phylogenetic reconstruction, consensus trees and
supertrees, large scale phylogenetic analysis, high performance
phylogeny reconstruction)
Microarrays and gene expression analysis (microarray data:
annotation retrieval, storage and communication, computational
methods for microarray design, clustering algorithms for gene
expression analysis, biclustering algorithms, identifying gene
regulatory networks from gene expression data, modeling and
analysis of gene networks using feedback control analysis)
Projects (3)
Computational Structural Biology (predicting protein structure
and supersecondary structure, protein structure prediction with
lattice models, protein structure determination via NMR spectral
data, geometric and signal processing of reconstructed 3D maps of
molecular complexes, in search of remote homologs, biomolecular
modeling using parallel supercomputers)
Bioinformatic databases and data mining (string search in
external memory, index structures for approximate matching in
sequence databases, algorithms for motif search, data mining in
computational biology)
Interesting articles - Books
J. Cohen, Bioinformatics-An Introduction for Computer
Scientists, ACM Computing Surveys, 36(2), 2004, 122-158
N. Luscombe, D. Greenbaum, M. Gerstein, What is
Bioinformatics? A proposed definition and overview of the
field, Methods Inf Med, 2001
Thomas Lengauer, Raimund Mannhold, Hugo Kubinyi, and
Hendrik Timmerman, Bioinformatics: From Genomes to Drugs
Thomas Lengauer, Bioinformatics: From Genomes to Therapies
(two volumes)
Neil Jones, Pavel Pevzner, An Introduction to Bioinformatics
Algorithms
S. Aluru, Handbook of Computational Molecular Biology
Dan Gusfield, Algorithms on Strings, Trees, and Sequences:
Computer Science and Computational Biology, Cambridge
University Press
Main Journals
Bioinformatics, Oxford University Press
IEEE/ACM Transactions on Computational
Biology and Bioinformatics (TCBB)
Journal of Computational Biology
Science, Nature, Nucleic Acids Research, Journal of
Molecular Biology, Proceedings of the National
Academy of Sciences (PNAS)
Main Conferences
RECOMB (Research in Computational Molecular
Biology)
IEEE Computer Society Bioinformatics Conference
PSB (Pacific Symposium on Biocomputing)
ISMB (Intelligent Systems for Molecular Biology)
Medical Informatics
Medical Informatics is an interdisciplinary field that draws on
information science, informatics and medical science. It deals
with the resources, devices and techniques needed to
improve the storage, retrieval, and use of information in health
and biomedical science.
The combination of health informatics and bioinformatics is
termed biomedical informatics.
Biomedical informatics
Medical Data Bases and genomics
Proteomics and analysis
Selection of proper therapies
Gene expression in medical diagnosis and prediction
Modeling and simulating biological structures and procedures
Medical annotation of biological data bases
Functional and molecular image processing
Embedding molecular data of patients in electronic health records
Medical Decision Systems
Interfaces for molecular information in patients
Semantic interoperability and ontologies in bioinformatics
Technologies for biomedical information integration
Multilevel modeling and vertical integration of information
Data extraction in biomedical informatics
Data interoperability and standards
Biomedical informatics in Public Health
Connecting biobanks with large-scale databases in order to extract
knowledge
Records that relate molecular, family and clinical data
Biodefence systems and networks
Patient life management
Personal identification and personal genomics
Informatics for supporting pharmacogenetic and stratified clinical trials
Informatics that enables the development of medical devices and biosensors
Applied pharmaceutical precision
Clinical and ethical facts concerning biomedical data processing
Medical Decision Support Systems
Interfaces for molecular information to doctors
Semantic Interoperability and ontologies in Biomedicine
Technologies for Biomedical Data Integration
Multilevel modeling and vertical integration of information
Knowledge extraction in biomedical informatics