Martin John Bishop
UK HGMP Resource Centre
Hinxton
Cambridge CB10 1SB
[email protected]
http://www.hgmp.mrc.ac.uk
Bioinformatics scope
• Genome sequences - DNA
• Transcripts - RNA
• Proteins
• Protein interactions
• Macromolecular assemblies
• Development and cellular function
• Genetic linkage analysis
Molecular biology needs bioinformatics
Biological data (molecules)    Computer methods (analysis)
Sequences                      Comparison
Structures                     Modelling
Gene expression                Co-regulation
Proteomes                      Mass spectrometry
Pathways                       Knowledge bases
Evolution                      Phylogenetics
Molecular biology is about information
• Central dogma (molecules, processes)
  DNA <-> RNA -> protein -> phenotype <- DNA
• Central paradigm (information processing)
  Genome repository <-> RNA world -> Protein sequence -> Protein structure -> Protein function -> Phenotype <- Fed back to genome
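The DNA <-> RNA -> protein part of the flow above can be illustrated as a chain of string transformations. The sketch below is only an illustration under stated assumptions: the input is taken to be the coding strand, the standard genetic code is used, and the function names are invented for this example rather than taken from any HGMP-RC tool.

```python
from itertools import product

# Standard genetic code, codons ordered with bases U, C, A, G
# at each position ('*' marks a stop codon).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna):
    """DNA -> RNA, assuming `dna` is the coding strand."""
    return dna.upper().replace("T", "U")


def reverse_transcribe(rna):
    """RNA -> DNA, the reverse arrow of 'DNA <-> RNA'."""
    return rna.upper().replace("U", "T")


def translate(rna):
    """RNA -> protein, reading frame 1, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    rna = transcribe("ATGGCCTAA")
    print(rna, translate(rna))
```

The later steps of the paradigm (structure, function, phenotype) have no comparably simple mapping, which is where the comparison and modelling methods listed earlier come in.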
The activities of HGMP-RC
• Bioinformatics Services
• Research: MHC, Fugu, Mouse sequencing
• Biology Services
• Technology development
• Biological materials by mail order
• Biological services including hotel facilities
• Contract R&D
• On-line service
On-line service
Services
• Mail
• Network News
• Files/Backup
Information
• Unrestricted: Data, Links
Analytical tools
• Registered users: Public Data, Private Data
HGMP-RC SERVICE
• Web menu
• X (or VNC)
• Java
• Telnet menu / Unix login
GENOME WEB
• Up to date
• Relevant
• Fully searchable
• Fully verified
• Extensive
INTEGRATED ANALYSIS
• BLAST
• NIX
• PIX
• GLUE
• PIE
• MAGI
• PINT
COMMON OPTIONS
• EMBOSS
• GCG
• PINE
• CLUSTAL
• STADEN
• PASSWORD
GENOMICS APPLICATIONS
• Linkage Analysis
• Radiation Hybrid Mapping
• Sequence Ready Clone Maps
• Genome Databases
• Polymorphisms
• Sequence Analysis
• Gene Prediction
• Expression Profiling
• Phylogenetic Analysis
• Integrated Tools - GLUE, RHYME, NIX, PIE
PROTEOMICS APPLICATIONS
• Protein Sequence Analysis
• Protein Structure Analysis
• Protein Structural Modelling
• Proteome Databases
• Tools for Peptide Sequence Determination
• Protein Cellular Localisation
• Protein Functional Studies
• Pathways and Protein Interactions
• Integrated tools and databases - PIX
NETWORK / JANET SERVICE
• LONDON
  - Currently: 34 Mbps main link
  - Future: keep 34 Mbps link for backup
• CAMBRIDGE
  - Currently: 8 Mbps redundant link
  - Future: Gigabit Ethernet
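To give a feel for what these link speeds mean in practice, the sketch below estimates the ideal time to move a dataset over each link. The link rates are the ones quoted above (Gigabit Ethernet taken at its nominal 1000 Mbps); the 1 GB dataset size is a hypothetical example, and protocol overhead and contention are ignored, so real transfers would be slower.

```python
# Nominal link rates in megabits per second, as quoted on the slide.
LINKS_MBPS = {
    "London main link": 34,
    "Cambridge redundant link": 8,
    "Cambridge Gigabit Ethernet (future)": 1000,
}


def transfer_seconds(size_bytes, link_mbps):
    """Ideal transfer time: total bits divided by bits per second."""
    return size_bytes * 8 / (link_mbps * 1_000_000)


if __name__ == "__main__":
    dataset_bytes = 1_000_000_000  # hypothetical 1 GB database release
    for name, mbps in LINKS_MBPS.items():
        minutes = transfer_seconds(dataset_bytes, mbps) / 60
        print(f"{name}: {minutes:.1f} min")
```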
SERVERS
• More than 80 servers
• 1, 4 and 8 CPU SMP
• SPARC and Intel
• Solaris and Linux
• Databases doubling every 14 months
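A doubling time of 14 months translates into a growth factor of 2^(t/14) after t months. The sketch below works through that arithmetic; the function names are illustrative, and steady exponential growth is an assumption rather than something the slide states.

```python
import math

DOUBLING_MONTHS = 14  # doubling time quoted on the slide


def growth_factor(months, doubling_months=DOUBLING_MONTHS):
    """Factor by which the data grows after `months`, assuming steady doubling."""
    return 2 ** (months / doubling_months)


def months_to_grow(factor, doubling_months=DOUBLING_MONTHS):
    """Months needed for the data to grow by `factor`."""
    return doubling_months * math.log2(factor)


if __name__ == "__main__":
    print(f"Growth per year: x{growth_factor(12):.2f}")
    print(f"Months to grow tenfold: {months_to_grow(10):.1f}")
```

Under this assumption the databases grow by roughly 80% a year and tenfold in a little under four years, which is the pressure behind the server and queue figures that follow.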
LOADS
• Load is the percentage of processes trying to run
• Interactive load 50%
• Job queues load 100%
• Jobs waiting can be 6-10 times the work being processed
PROCESSES AND QUEUES
• Menu service (hot swop)
• General analysis (overloaded)
• Sun BLAST and NIX queue
• Dell BLAST queue
• BLAST data file server
• Interactive Linkage queue
• Heavy Linkage queue
USERS’ REAL WORLD PROBLEMS
• Comparative method
• Extrapolate from known to similar
• Hints to reduce the amount of experimental work that needs to be done
SOFTWARE SYSTEMS
A variety of technical solutions are used
• BLAST
• NCBI Entrez
• SRS
• GeneCards
• NIX
• ENSEMBL
HELPING THE USER
• Information discovery – completeness
• Communication – multiple sites
• Ontology – uniformity?
• Software integration – ease of use
• Reasoning about results
• Monitoring – repeat queries
MAJOR CHALLENGES
• User interface
• Back end processing
• Cost recovery
NEW TECHNOLOGIES?
• Web services
• GRID (EMBnet)
• Object-orientated computing
• Multi-agent systems
TREASURE
• Web service with top level container
• Customise for the user
• User selects a service and opens it as an application
• An alternative view can be built around user data as the fundamental objects
IMPLEMENTATION
• EMBREO library written in Java handles the web service layer (also CORBA, XML-RPC, JDBC and other connectivity)
• Also handles file access and transfer and display of results (including use of VNC)
• Simple Object Access Protocol (SOAP)
• Browser channel uses XML format
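For readers unfamiliar with SOAP, the sketch below shows what a request in that style looks like: an XML envelope whose body names an operation and its parameters. It is not the EMBREO library (which is written in Java and whose API is not described here); the service namespace, the operation name `runBlast` and the parameter names are all hypothetical. Only the SOAP 1.1 envelope namespace is standard.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:analysis"  # hypothetical service namespace


def build_request(operation, params):
    """Wrap an operation call and its parameters in a SOAP 1.1 envelope."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def parse_request(xml_text):
    """Recover the operation name and parameters from an envelope."""
    envelope = ET.fromstring(xml_text)
    body = envelope.find(f"{{{SOAP_ENV}}}Body")
    call = body[0]  # first child of Body is the operation element
    operation = call.tag.split("}", 1)[1]  # strip the '{namespace}' prefix
    params = {child.tag: child.text for child in call}
    return operation, params


if __name__ == "__main__":
    # Hypothetical operation and parameters, for illustration only.
    request = build_request("runBlast", {"program": "blastn", "database": "embl"})
    print(request)
    print(parse_request(request))
```

Because the message is plain XML, the same payload can be carried to a browser or to another program, which fits the slide's point that the browser channel also uses XML.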
USER ACCOUNTING AND CUSTOMIZATION
• Currently very complex
  - HED
  - NIS+
  - Filesystem configuration files
• Future
  - a single database
  - Lightweight Directory Access Protocol (LDAP)
CREDITS
• Gary Williams: Menu systems and Genome Web
• Geoff Gibbs: Network and systems
• Peter Tribble: Web servers, Queues, Treasure