
Introduction to Bioinformatics
9/30/2004
TCSS588A Isabelle Bichindaritz
Introduction to Class
• Syllabus
• Schedule
• Web-site http://courses.washington.edu/tcss588
• Assignments:
– An application to genetics
– An application to proteomics
–…
• Project – project teams (proposal due next week)
Introduction to Class
• 1. Biological foundations.
• 2. Machine learning algorithms and applications
to biology/life sciences.
• 3. Neural networks.
• 4. Hidden Markov Models.
• 5. Graphical models.
• 6. Case Based Reasoning.
• 7. Phylogenetic trees induction.
• 8. Microarrays and gene expression.
• 9. Image understanding and mining.
• 10. Biometrics.
Introduction to Class
Day  Date   Subject                                                Pre-reading
R    9/30   Introduction to Bioinformatics and the Life Sciences   Chapter 1
T    10/5   Probabilistic Framework                                Chapter 2
R    10/7   Probabilistic Inference                                Chapter 3
T    10/12  Machine Learning Algorithms (Part I)                   4.1-4.4
R    10/14  Machine Learning Algorithms (Part II)                  4.5-4.8
T    10/19  Neural Networks Theory                                 Chapter 5
R    10/21  Neural Networks Applications                           Chapter 6
T    10/26  Hidden Markov Models Theory                            Chapter 7
R    10/28  Hidden Markov Models Applications                      Chapter 8
T    11/2   Graphical Models (Part I)                              9.1-9.4
R    11/4   Graphical Models (Part II)                             9.5-9.6
T    11/9   Case Based Reasoning                                   Handout
R    11/11  Veterans Day Holiday
T    11/16  Future Trends Discussion / MIDTERM
R    11/18  Phylogenetic Trees Induction                           Chapter 10
T    11/23  Microarrays and Gene Expression                        Chapter 12
R    11/25  Thanksgiving Holiday
T    11/30  Image Understanding                                    Handout
R    12/2   Image Mining                                           Handout
T    12/7   Biometrics                                             Handout
R    12/9   Future Perspectives Discussion / FINAL
R    12/16  FINAL PROJECT PRESENTATIONS in CP 106, 5:00P–7:15P
Course Learning Objectives
• Understand biological concepts and the associated set of problems.
• Understand the scientific framework for bioinformatics in statistics,
complexity, and information theory.
• Understand machine learning methods for bioinformatics.
• Understand innovative algorithms and methods for
bioinformatics.
• Program using available bioinformatics tools.
• Gain familiarity with statistical learning, concept learning,
hidden Markov models, case-based reasoning, neural networks,
knowledge-based systems and ontologies, genetic algorithms,
stochastic grammars and linguistics, grid computing, and the
Semantic Web.
• Design and develop new computer systems for bioinformatics.
Outline
• Informatics / Medical Informatics /
Bioinformatics / Computational Biology
• Project examples
– Care Partner
– Telemakus
– Phylsyst
– Human Genome Project
• Introduction to biology
Informatics / Medical Informatics
• Informatics is “The science of rational and
computerized processing of information as it
supports human knowledge and
communication in scientific, technical,
economical, and social domains.”
• Often associated with health care and medical
research applications, where it is called medical informatics.
• Interdisciplinary field involving medicine,
biology, computer science, mathematics,
information science, and statistics.
Medical Informatics
• Computer Applications in Health Care, in order of increasing level of complexity:
1 communication and telematics
2 storage and retrieval
3 processing and automation
4 diagnosis and decision making
5 therapy and control
6 research and development
Bioinformatics
• Bioinformatics is the discipline that develops
technologies for supporting information
management in fields like biology.
• Target domains: biology, medicine, pharmacology,
agriculture …
• Interdisciplinary field.
• Main tasks: analyze biological sequence data, genome content, and genome
arrangement; predict the
function and structure of macromolecules.
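As a small illustration of "analyzing biological sequence data", the following Python sketch computes two elementary sequence statistics: the GC content of a DNA sequence and its reverse complement (the opposite strand read in the same direction). The function names are illustrative and not taken from any particular bioinformatics tool; the sketch assumes sequences contain only the letters A, C, G, and T.

```python
# Illustrative helpers (hypothetical names), assuming plain A/C/G/T sequences.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse, giving the opposite strand read 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    dna = "ATGCC"
    print(gc_content(dna))          # 0.6
    print(reverse_complement(dna))  # GGCAT
```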
Computational Biology
• Computational biology provides algorithms
for bioinformatics.
• Target applications:
– Genomics: DNA → genes
– Proteomics: proteins
– Phylogenetics: evolutionary classifications
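The step from DNA to genes to proteins can be made concrete with a short sketch that translates a coding DNA sequence into a protein string using the standard genetic code. The 64-letter table below follows the NCBI standard code (translation table 1), with codons enumerated in T, C, A, G order at each position. The function name and the choice to stop at the first stop codon are assumptions made for this illustration, not a description of any specific tool used in the course.

```python
# Standard genetic code (NCBI translation table 1); codons are enumerated in
# T, C, A, G order for the first, second, and third positions. '*' = stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate codon by codon from the first base; stop at the first stop codon.

    A trailing partial codon (fewer than 3 bases) is ignored.
    """
    dna = dna.upper()
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGGCCTGGTAAGGG"))  # MAW
```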
Care Partner System Description
• A decision support system for stem cell post-transplant care:
– a comprehensive knowledge base (scientific
literature, monographs, clinical guidelines,
clinical pathways, and clinical cases)
– available on the WWW
– learns from experience
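The slide does not say how Care Partner learns from experience. One common mechanism, and the topic of the course's case-based reasoning unit, is to store solved cases, retrieve the most similar past case for a new problem, and retain each newly solved case. The Python sketch below illustrates only that retrieve/retain cycle under simplifying assumptions: cases are flat attribute-value dictionaries, similarity is the fraction of matching attributes, and all class names, attributes, and values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Case:
    features: dict  # hypothetical attribute -> value description of a problem
    solution: str   # hypothetical label for the plan that was applied


@dataclass
class CaseBase:
    cases: list = field(default_factory=list)

    @staticmethod
    def similarity(a: dict, b: dict) -> float:
        """Fraction of attributes (over the union of keys) with equal values."""
        keys = set(a) | set(b)
        if not keys:
            return 0.0
        return sum(a.get(k) == b.get(k) for k in keys) / len(keys)

    def retrieve(self, query: dict) -> Optional[Case]:
        """Return the most similar stored case, or None if the case base is empty."""
        return max(self.cases, key=lambda c: self.similarity(query, c.features), default=None)

    def retain(self, case: Case) -> None:
        """Learning step: store a solved case so it can be reused later."""
        self.cases.append(case)


if __name__ == "__main__":
    cb = CaseBase()
    cb.retain(Case({"finding": "x", "lab": "high"}, "plan-A"))
    cb.retain(Case({"finding": "y", "lab": "low"}, "plan-B"))
    print(cb.retrieve({"finding": "x", "lab": "normal"}).solution)  # plan-A
```

Real systems typically weight attributes and adapt the retrieved solution; the unweighted overlap measure here is chosen only to keep the cycle visible.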
Knowledge-Base
N            LTFU CDSS   SNOMED v. 3.4
Diseases     1109        35,834
Functions    452         19,221
Labs         1152        30,723
Procedures   547         20,105
Medications  2684        14,846
Sites        460         5,875
Knowledge-Base
N
CDSS
Terms
Relations
N
Patient cases
739,439
51
CDSS
4904
Telemakus
• Goal of the Telemakus System:
– to enhance the knowledge discovery process by
developing retrieval, visual and interaction tools to mine
and map research findings from the research literature.
• Objective of the research:
– to create, test, and validate an infrastructure that automates
the creation and maintenance of a searchable database that
generates knowledge maps via query tools and concept mapping
algorithms.
– to apply natural language processing models and
information analysis methods to ultimately speed up the
scientific discovery process.
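The slide does not specify Telemakus's concept mapping algorithms. As a minimal sketch of the general idea, assuming each research finding has already been reduced to a set of concept labels (for example by a natural language processing step), a knowledge map can be represented as a weighted graph whose edge weights count how often two concepts co-occur in the same finding. The function names and concept labels below are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_concept_map(findings):
    """Count co-occurring concept pairs; each pair key is stored in sorted order."""
    edges = Counter()
    for concepts in findings:
        for a, b in combinations(sorted(set(concepts)), 2):
            edges[(a, b)] += 1
    return edges


def related_concepts(edges, concept):
    """Concepts linked to `concept`, strongest co-occurrence first (ties alphabetical)."""
    neighbors = Counter()
    for (a, b), weight in edges.items():
        if a == concept:
            neighbors[b] += weight
        elif b == concept:
            neighbors[a] += weight
    return sorted(neighbors, key=lambda c: (-neighbors[c], c))


if __name__ == "__main__":
    findings = [{"X", "Y"}, {"X", "Y", "Z"}, {"Y", "W"}]  # hypothetical concepts
    edges = build_concept_map(findings)
    print(related_concepts(edges, "Y"))  # ['X', 'W', 'Z']
```

A query tool could then answer "what is related to concept Y?" by reading the strongest edges, which is one simple way a searchable database can generate a map rather than a flat list of hits.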
Telemakus
Phylsyst
Phylsyst
• Example – Phylsyst built cladogram
clado1
Level 1 01-10 Doublon split on characters: 8 12 27
Level 1 values: 8(0) 12(1) 27(1)
Level 2 01-10 Doublon split on characters: 18 29 25
Level 2 values: 18(0) 29(1) 25(1)
Taxon Diphylleia
Level 2 values: 18(1) 29(0) 25(1)
Level 3 01-10 Doublon split on characters: 14 17
Level 3 values: 14(0) 17(1)
Taxon: Dysosma
Level 1 values: 8(1) 12(0) 27(0)
Level 2 01-10 Doublon split on characters: 16 29 30 19
Level 2 values: 16(0) 29(1) 30(0) 19(0)
Level 3 00-11 Doublon split on characters: 1 7 33 25 23 13 11
Level 3 values: 1(0) 7(0) 33(0) 25(0) 23(0) 13(0) 11(0) 10(0)
Level 4 Agglom. Split
Taxon: Berberis
Taxon: Mahonia
Level 3 values: 1(1) 7(1) 33(1) 25(1) 23(1) 13(1) 11(1) 10(1)
Taxon: Ranzania
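The output above shows a cladogram being built top-down: at each level, the taxa are split into groups according to the values (0 or 1) of selected characters. The exact Phylsyst procedure is not given here, so the Python sketch below only illustrates the general idea under simplifying assumptions: characters are binary, each group is split on the first character whose values differ within it, and groups that no character separates are left unresolved. This greedy divisive split is not a parsimony or likelihood method, and the taxa and character matrix are hypothetical, not the data shown above.

```python
def build_cladogram(taxa, matrix):
    """Recursively split taxa on binary characters (hypothetical greedy scheme).

    taxa: non-empty list of taxon names; matrix: dict mapping name -> list of 0/1 values.
    Returns a nested tuple; a single taxon is returned as a plain string.
    """
    if len(taxa) == 1:
        return taxa[0]
    n_chars = len(matrix[taxa[0]])
    for c in range(n_chars):
        group0 = [t for t in taxa if matrix[t][c] == 0]
        group1 = [t for t in taxa if matrix[t][c] == 1]
        if group0 and group1:  # character c separates this group
            return (build_cladogram(group0, matrix), build_cladogram(group1, matrix))
    return tuple(taxa)  # no character separates these taxa: unresolved group


if __name__ == "__main__":
    matrix = {  # hypothetical taxa and characters
        "A": [0, 0, 1],
        "B": [0, 1, 1],
        "C": [1, 0, 0],
        "D": [1, 0, 1],
    }
    print(build_cladogram(["A", "B", "C", "D"], matrix))  # (('A', 'B'), ('C', 'D'))
```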
Human Genome Project
• Goal of the Human Genome Project:
– identify all of the approximately 30,000 genes in human DNA,
– determine the sequences of the 3 billion chemical base
pairs that make up human DNA,
– store this information in databases,
– improve tools for data analysis,
– transfer related technologies to the private sector, and
– address the ethical, legal, and social issues (ELSI) that
may arise from the project.
• Completed in 2003
Human Genome Program, U.S. Department of Energy, Genomics and Its Impact on Medicine and Society: A 2001 Primer, 2001
The Human Genome Project
• The Human Genome Project
The Visible Human Project
• Image understanding –
the Visible Human
Project