The Era of Biognostic Machinery


Lawrence Hunter, Ph.D., Director
Center for Computational Pharmacology
http://compbio.uchsc.edu
The Ultimate Biological Irony
Human understanding
of our own genome will
require partnership with
biognostic machines
What is a Biognostic Machine?
From the Greek(life) and(knowing)
Two kinds of biognostic machines:
– Instruments that produce data about living
things in molecular detail and with genomic
breadth
– Bioinformatics systems that bring to bear
existing knowledge in the computational
analysis of data
Gene Chips as Biognostic
Instruments
Good example of the kind of instruments to come...
Gene chips read out the expression (production) of each
gene in different tissues
Gene expression is important,
but just the first step in realizing
the “blueprints” in our DNA
Overwhelming amounts of data!
Each chip measures 40,000 genes, and
each study uses dozens of chips
Other kinds of biognostic
instruments
High throughput SNP
genotyping automation
Finds millions of tiny
genetic differences among
people
Combinatorial
Chemistry robotics
Tests 50,000 potential
new drugs per day
So much wonderful data...
Growth of Protein Data Bank
Growth of Biomedical Literature
More than 11,000,000
biomedical journal
articles in Medline
600,000 new articles
per year, accelerating
at 10% per year
Genome sequencing projects
...Is Still Not Enough!
Statistics 101: Never test more hypotheses than
you have data, since you will find impressive-looking
results just by chance.
Each chip is effectively testing 40,000 hypotheses!
Run a lot of chips? Not at $1000 each!
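As a rough illustration of the point above: if each of the 40,000 genes on a chip is tested at a conventional 0.05 threshold (an assumed value, not one given in the talk), about 2,000 genes would look significant by chance alone even if nothing truly differed. The sketch below does that arithmetic and shows the classical Bonferroni per-test threshold for comparison. The function names are illustrative.

```python
# Illustrative arithmetic only. ALPHA is an assumed convention,
# not a figure from the slides; N_HYPOTHESES is the slide's gene count.
N_HYPOTHESES = 40_000
ALPHA = 0.05


def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected number of tests passing the threshold when no effect is real."""
    return n_tests * alpha


def bonferroni_threshold(n_tests: int, alpha: float) -> float:
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    print("Expected chance hits:", expected_false_positives(N_HYPOTHESES, ALPHA))
    print("Bonferroni per-test threshold:", bonferroni_threshold(N_HYPOTHESES, ALPHA))
```

The Bonferroni threshold is so strict at this scale that real effects are easily missed, which is one reason purely statistical fixes only go so far.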
So what can we do with all this data?
Invent Biognostic Computers
Take traditional statistics as far as possible, e.g.
– New corrections for multiple testing, randomization approaches
But also...
Integrate existing knowledge into computational analysis.
Our computer programs have to know about biology!
– Bayesian inference
– Knowledge-based interpretation of high throughput results
– Managing diverse sources of knowledge, including the biomedical
literature
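The slide does not say which multiple-testing corrections are meant. One widely used example is the Benjamini-Hochberg procedure, which controls the false discovery rate rather than the chance of any single false positive. The following is a minimal sketch of that procedure, offered as an assumption about the kind of method intended. The function name and the default rate of 0.05 are illustrative.

```python
from typing import List


def benjamini_hochberg(p_values: List[float], fdr: float = 0.05) -> List[bool]:
    """Return, for each p-value, whether it is declared significant at the given FDR."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Step-up rule: find the largest rank k (1-based) with p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank

    # Every hypothesis ranked at or below the cutoff is rejected.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[idx] = True
    return significant


if __name__ == "__main__":
    # Made-up p-values, for illustration only.
    example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(example))
```

In a chip study, each p-value would come from one gene's test. The procedure itself uses no biological knowledge, which is the gap the remaining bullets on this slide address.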
Bayesian Inference
$$p(\text{Hypothesis} \mid \text{Data}) = \frac{p(\text{Data} \mid \text{Hypothesis}) \, p(\text{Hypothesis})}{p(\text{Data})}$$
An old idea gaining new life
A principled way of combining
data with prior knowledge
We balance the belief in new
results against how closely
they fit with our existing ideas
Where do priors come from?
Rev. Thomas Bayes, 1701-1765
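To make the balance between new results and prior belief concrete, here is a small sketch of Bayes' rule for a yes/no hypothesis. All numbers are invented for illustration, and the function and parameter names are assumptions rather than anything from the talk. The same experimental result is combined with a weak prior and with a strong prior.

```python
def posterior(prior: float, likelihood: float, likelihood_if_false: float) -> float:
    """p(H | D) for a binary hypothesis H, via Bayes' rule.

    prior               -- p(H), belief before seeing the data
    likelihood          -- p(D | H)
    likelihood_if_false -- p(D | not H)
    """
    # p(D) = p(D | H) p(H) + p(D | not H) p(not H)
    evidence = likelihood * prior + likelihood_if_false * (1.0 - prior)
    if evidence == 0.0:
        raise ValueError("the data has zero probability under both alternatives")
    return likelihood * prior / evidence


if __name__ == "__main__":
    # Hypothetical numbers: the data is 90% likely if the hypothesis is true
    # and 5% likely if it is false.
    print("weak prior (0.01):  ", posterior(0.01, 0.9, 0.05))
    print("strong prior (0.50):", posterior(0.50, 0.9, 0.05))
```

With these made-up inputs, identical data leaves the weakly supported hypothesis still unlikely (roughly 0.15) while the well-supported one becomes very probable (roughly 0.95). This is why the source of the priors matters so much.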
A Knowledge-base of
Molecular Biology
A knowledge-base encodes facts and concepts in
a computationally useful representation
General relationships, e.g.
Part-of, Has-parts, Kind-of
Specific relationships, e.g.
Binds-to, regulates-gene
Supports many kinds of
inference (not just Bayesian)
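One simple way to picture such a representation is a store of (subject, relation, object) facts with a little inference on top. The sketch below is a toy under stated assumptions, not the Center's actual knowledge-base: the class, its methods, and the handful of example facts are all hypothetical. It follows Kind-of links transitively so that a specific relationship recorded for a general concept is inherited by its more specific kinds.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


class KnowledgeBase:
    """Toy fact store: relation name -> set of (subject, object) pairs."""

    def __init__(self) -> None:
        self._facts: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)

    def add(self, subject: str, relation: str, obj: str) -> None:
        self._facts[relation].add((subject, obj))

    def ancestors(self, concept: str, relation: str = "kind-of") -> Set[str]:
        """All concepts reachable from `concept` by following `relation` transitively."""
        found: Set[str] = set()
        frontier = [concept]
        while frontier:
            current = frontier.pop()
            for subj, obj in self._facts.get(relation, ()):
                if subj == current and obj not in found:
                    found.add(obj)
                    frontier.append(obj)
        return found

    def inherited(self, subject: str, relation: str) -> Set[str]:
        """Objects related to `subject` directly or through any kind-of ancestor."""
        concepts = {subject} | self.ancestors(subject)
        return {obj for subj, obj in self._facts.get(relation, ()) if subj in concepts}


def build_example_kb() -> KnowledgeBase:
    """A few illustrative facts; a real knowledge-base would hold far more."""
    kb = KnowledgeBase()
    kb.add("ADH1B", "kind-of", "alcohol dehydrogenase")
    kb.add("alcohol dehydrogenase", "kind-of", "enzyme")
    kb.add("enzyme", "kind-of", "protein")
    kb.add("alcohol dehydrogenase", "binds-to", "NAD+")
    return kb


if __name__ == "__main__":
    kb = build_example_kb()
    print(sorted(kb.ancestors("ADH1B")))
    print(sorted(kb.inherited("ADH1B", "binds-to")))
```

Here the binds-to fact is stored once, on the general concept, and recovered for the specific protein by inference. This is the kind of non-Bayesian reasoning such a representation can support.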
Knowledge visualization tools
(in partnership with Accenture)
[Figure: knowledge-base visualization with node types Literature, Related Gene/locus, Protein, Expert, Gene/locus, Protein Variant, Organization, Protein Family, Phenotype]
How do we create
biognostic computer programs?
Knowledge management and organization tools from other
domains (especially executive information systems)
Still takes a lot of expert human time and effort
Good community efforts in some areas (e.g. Gene Ontology
Consortium) can be leveraged effectively
Once a bootstrap knowledge-base exists, extend it by
automated information extraction from textbooks, review
articles and journals.
A special kind of supercomputer
Recent grant from IBM Life Sciences
Latest p690 “Regatta” architecture
Most important aspect is not speed!
Extraordinarily large memory
– 64,000 MB of RAM, about 1,000x the
memory of a desktop machine
– Allows us to load both all the data and
all the knowledge into memory at once
Why CU?
Talent
– World class researchers in many relevant areas:
Gene chips, proteomic mass spec, macromolecular
structure determination, high throughput genotyping
Technology
– Biognostic instrument facilities are top tier for an academic
institution. We are within reach of the very best.
– Supercomputing facilities for knowledge-driven applications
Teamwork
– Unique culture of collaboration that transcends traditional
boundaries
What can we achieve?
Cognitive Disability Applications
– Pilot application was in animal models of alcoholism, fetal
alcohol syndrome and alcohol-related dementias.
Pharmacology
– Identification of synergistic drug targets
– Relationships between individual genotype and drug
response
Development of novel biotherapies
– Stem cell differentiation signals
The Road Ahead
Three directions must be pursued simultaneously:
Bringing our instrumentation to the very first rank,
including engineering new generations of instruments.
Extending the knowledge-base and developing novel
computational methods that take full advantage of it
and the supercomputer
Close collaborations on specific bio/medical research
projects taking advantage of the latest instruments and
bioinformatics techniques. Creation of a broad
biognostic infrastructure to support that research.