Bioinformatics: a Data Centric Perspective

Bioinformatics:
a Multidisciplinary Challenge
Ron Y. Pinter
Dept. of Computer Science
Technion
March 12, 2003
What is Bioinformatics?
• The application of information technology
to life sciences research
– modeling (abstraction)
– analysis and collection
– data integration and information retrieval
• Enables the discovery and analysis of biomolecules
and their properties (structure, function, interactions),
e.g. for:
– pharmaceutical research
– medical diagnosis
– agriculture
• AKA computational or dry or in silico biology
Computational Sciences:
Analytic and Predictive
• Physics
– Universal: mechanics, electricity, particle physics
– Started in the 17th Century
• Chemistry
– Specific materials
– 19th Century
• Biology
– The study of living organisms
– Its metamorphosis into a computational science coincides
with the huge increase in data acquisition capabilities
and computational power
Biological Revolution Necessitates
Bioinformatics
• New biotechnologies (automatic sequencing, DNA chips,
protein identification, mass spectrometry, etc.) produce large
quantities of biological data.
• It is impossible to analyze this much data by manual inspection.
• Bioinformatics: Development of algorithms that enable the
analysis of the data (from experiments or from databases).
[Diagram: Data produced by biologists and stored in databases → Bioinformatics algorithms and tools → New information for biological and medical use]
Central Dogma
of Molecular Biology
Gene (DNA) → Transcription → mRNA → Translation → Protein
Cells express different subsets of their genes in
different tissues and under different conditions
The Genetic Code
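The transcription and translation steps of the central dogma can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the function names `transcribe` and `translate` are hypothetical, the input is assumed to be the coding (sense) strand of a gene, the sequence is made up, and the codon table holds only the few entries of the standard genetic code that the example needs.

```python
# Partial codon table from the standard genetic code (only the codons
# needed for this example; the full table has 64 entries).
CODON_TABLE = {
    "AUG": "M",  # methionine, also the start codon
    "GCU": "A",  # alanine
    "UGG": "W",  # tryptophan
    "AAA": "K",  # lysine
    "UAA": "*",  # stop
    "UAG": "*",  # stop
    "UGA": "*",  # stop
}


def transcribe(coding_dna):
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """Translate mRNA codon by codon until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        # "X" marks a codon that is missing from the partial table above.
        amino_acid = CODON_TABLE.get(codon, "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


gene = "ATGGCTTGGAAATAA"  # made-up sequence, for illustration only
mrna = transcribe(gene)
protein = translate(mrna)
print(mrna)     # AUGGCUUGGAAAUAA
print(protein)  # MAWK
```

A real tool would use the full 64-codon table and handle reading frames, both strands, and introns, none of which this sketch attempts.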
Central Paradigm of Bioinformatics
Genetic Information → Molecular Structure → Biochemical Function → Symptoms
• Exponential growth of biological information:
growth of sequences, structures, and literature.
• Efficient storage and management tools are most important.
Activities
• Development of new models, algorithms
and statistical methods to assess and predict
the relationships among members of very
large data sets
• Development and implementation of tools
to efficiently access and manage different
types of information.
• Application of these methods and tools to
real problems in biology by conducting
bioinformatic experiments
Primary Areas
• Genomics
• Proteomics
• Metabolomics
• “Systems biology”
Genomics
• Sequence analysis
– Homology searches
– Assembly of ESTs
– Domain and profile identification
• Gene hunting
– Promoter identification
– Genomic maps
• Comparative genomics
– SNP detection (point mutations): among individuals
– Genomic rearrangement: among species
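The idea behind detecting point mutations among individuals can be illustrated with a position-by-position comparison. The sketch below assumes the two sequences are already aligned and of equal length; the function name `find_point_differences` and both sequences are hypothetical. It only lists candidate positions. Calling something a SNP in practice also requires alignment, sequencing-quality checks, and population frequency data, which are out of scope here.

```python
def find_point_differences(seq_a, seq_b):
    """Return (position, base_a, base_b) for every mismatch.

    Positions are 1-based. Both sequences must already be aligned
    to the same length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(seq_a, seq_b), start=1)
        if a != b
    ]


individual_1 = "ACGTTAGC"  # made-up sequences, for illustration only
individual_2 = "ACGTCAGC"
differences = find_point_differences(individual_1, individual_2)
print(differences)  # [(5, 'T', 'C')]
```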
Towards large scale genomic comparisons…
Human vs. Mouse
Proteomics
• Functional prediction
• Localization
• Expression analysis
• Structure prediction
• Docking information for biomolecules
• …
Metabolomics
and Systems Biology
• Metabolic and regulatory pathways
• Cell simulation
• Toxicological and pharmacological
parameters
Data Types
• Strings (over nucleotides and amino acids)
• 2D and 3D geometric structures
• Images
• Numeric data (expression data, mass spec, …)
• Graphs (pathways, networks, …)
• Text articles
• …
Some Queries
• What genes are connected to a disease?
• What proteins are encoded by them?
• Under what conditions are they expressed?
• What pathways do they participate in?
• Which are targets for new therapeutics?
• What will happen if we introduce a virus
into a certain environment?
• …
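Several of these queries chain into one another: from a disease to its genes, from the genes to the proteins they encode, and from the proteins to the pathways they take part in. The sketch below shows that chaining over a toy in-memory data set. All record names (`disease_X`, `gene_A`, and so on), the three lookup tables, and the function `pathways_for_disease` are hypothetical placeholders, not real biological data or an existing API.

```python
# Hypothetical toy records; every name is a placeholder, not real biology.
GENES_BY_DISEASE = {"disease_X": ["gene_A", "gene_B"]}
PROTEINS_BY_GENE = {
    "gene_A": ["protein_A1"],
    "gene_B": ["protein_B1", "protein_B2"],
}
PATHWAYS_BY_PROTEIN = {
    "protein_A1": ["pathway_1"],
    "protein_B1": ["pathway_1", "pathway_2"],
    "protein_B2": [],
}


def pathways_for_disease(disease):
    """Map each pathway to the sorted proteins that link it to the disease."""
    result = {}
    for gene in GENES_BY_DISEASE.get(disease, []):
        for protein in PROTEINS_BY_GENE.get(gene, []):
            for pathway in PATHWAYS_BY_PROTEIN.get(protein, []):
                result.setdefault(pathway, []).append(protein)
    return {pathway: sorted(proteins) for pathway, proteins in result.items()}


linked = pathways_for_disease("disease_X")
print(linked)
# {'pathway_1': ['protein_A1', 'protein_B1'], 'pathway_2': ['protein_B1']}
```

In practice each lookup table would live in a different source with its own identifiers and formats, so most of the work is in integrating the data rather than in the traversal itself.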
Data Sources
• Mostly public
– NCBI, EMBL, KEGG, Swissprot, …
• Also some commercial
– Celera, Compugen, …
• Ever changing …
Disciplines
• Life sciences
– Biology
– Biochemistry
– Medicine
– …
• Computing
– Mathematics
– Computer science
– Information management
– Information theory
The Gap
• Life sciences
– Descriptive
– Based on observations, lots of exceptions
– Constant evolution and change of paradigms
based on new discovery
• Computer science
– Analytic
– Exact and predictive
– “Linear”, synthetic evolution
Bridging the Gap
• Study both disciplines
– Start as early as possible
• Work in joint teams
– At all levels
• Learn from each other’s methods
– Increase [web] sophistication of life scientists
– Teach computer scientists to model the real
world
Example: Intro to Bioinformatics
• Grand tour of tools and methods
– Extensive web presence
– Many highly specialized tools
– Diversity in each category
– Require high skill in specific usage
– Loose integration
• Initial encounter with topic
– Prereqs: Biology 1 and Intro to CS
• Must bridge gaps among disciplines
Method
• All work is done in pairs of LS (life sciences) and CS students
– Strict enforcement
– Develop dialogue
– Complementary skills
• “Dry” labs, homework (reports) and final
project (including presentation)
• Topical presentations coupled with labs
– Delivered by Esti Yeger-Lotem, a CS/Biology
expert (speaks both languages)
– Labs run by TAs