Martin John Bishop
UK HGMP Resource Centre
Hinxton
Cambridge CB10 1SB
[email protected]
http://www.hgmp.mrc.ac.uk
Bioinformatics scope
Genome sequences - DNA
Transcripts - RNA
Proteins
Protein interactions
Macromolecular assemblies
Development and cellular function
Genetic linkage analysis
Molecular biology needs bioinformatics
Biological data - molecules    Computer methods - analysis
Sequences                      Comparison
Structures                     Modelling
Gene expression                Co-regulation
Proteomes                      Mass spectrometry
Pathways                       Knowledge bases
Evolution                      Phylogenetics
Molecular biology is about information
Central dogma
Genome repository
<-> RNA world
-> Protein sequence
-> Protein structure
-> Protein function
-> Phenotype
<- Fed back to genome
DNA
<-> RNA
-> protein
-> phenotype
<- DNA
Molecules
Processes
Central paradigm
Information processing
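The information flow on this slide (DNA -> RNA -> protein) can be illustrated with a minimal Python sketch. The function names and the deliberately partial codon table are assumptions made for illustration only; they are not part of any HGMP-RC software.

```python
# Partial codon table (assumption: only the codons used in the example,
# plus the three stop codons; a real table has 64 entries).
CODON_TABLE = {
    "AUG": "M", "GCU": "A", "GGU": "G", "UUU": "F", "AAA": "K",
    "UAA": "*", "UAG": "*", "UGA": "*",
}


def transcribe(dna: str) -> str:
    """Turn a coding-strand DNA sequence into mRNA by replacing T with U."""
    return dna.upper().replace("T", "U")


def translate(rna: str) -> str:
    """Translate mRNA codon by codon, stopping at the first stop codon.

    Codons missing from the partial table are reported as 'X'.
    """
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE.get(rna[i:i + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    mrna = transcribe("ATGGCTTTTAAATAA")
    print(mrna, translate(mrna))
```

The feedback arrows on the slide (phenotype back to genome) are not modelled here; the sketch covers only the forward direction.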
The activities of HGMP-RC
HGMP-RC
Bioinformatics Services
MHC
Research
Fugu
Mouse sequencing
Biology Services
Technology development
Biological materials by mail order
Biological services including hotel facilities
Contract R&D
On-line service
Services
Mail
Network News
Files/Backup
Information
Unrestricted
Data
Links
Analytical tools
Registered users
Public data
Private data
HGMP-RC SERVICE
Web
menu
X (or VNC)
Java
Telnet
Telnet
menu / Unix login
GENOME WEB
Up to date
Relevant
Fully searchable
Fully verified
Extensive
INTEGRATED ANALYSIS
BLAST
NIX
PIX
GLUE
PIE
MAGI
PINT
COMMON OPTIONS
EMBOSS
GCG
PINE
CLUSTAL
STADEN
PASSWORD
GENOMICS APPLICATIONS
Linkage Analysis
Radiation Hybrid Mapping
Sequence Ready Clone Maps
Genome Databases
Polymorphisms
Sequence Analysis
Gene Prediction
Expression Profiling
Phylogenetic Analysis
Integrated Tools - GLUE, RHYME, NIX, PIE
PROTEOMICS APPLICATIONS
Protein Sequence Analysis
Protein Structure Analysis
Protein Structural Modelling
Proteome Databases
Tools for Peptide Sequence Determination
Protein Cellular Localisation
Protein Functional Studies
Pathways and Protein Interactions
Integrated tools and databases - PIX
NETWORK / JANET SERVICE
LONDON
Currently: 34 Mbps main link
Future: keep 34 Mbps link for backup
CAMBRIDGE
Currently: 8 Mbps redundant link
Future: Gigabit Ethernet
SERVERS
More than 80 servers
1, 4 and 8 CPU SMP
Sparc and Intel
Solaris and Linux
Databases doubling every 14 months
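To make the doubling figure concrete, the sketch below computes the growth factor after a given number of months. It assumes smooth exponential growth at the 14-month doubling time quoted on the slide; the function name is illustrative.

```python
def growth_factor(months: float, doubling_months: float = 14.0) -> float:
    """Factor by which data volume grows after `months`,
    assuming it doubles every `doubling_months` (exponential growth)."""
    if doubling_months <= 0:
        raise ValueError("doubling_months must be positive")
    return 2.0 ** (months / doubling_months)


if __name__ == "__main__":
    for months in (12, 14, 28, 60):
        print(f"{months:>3} months: x{growth_factor(months):.2f}")
```

Under this assumption the databases grow by roughly 80% per year, which is the rate storage and search capacity would have to match.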
LOADS
Load is the percentage of processes trying to run
Interactive load 50%
Job queues load 100%
Jobs waiting can be 6-10 times the work being processed
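The figures on this slide can be read as two simple ratios. The sketch below is one possible interpretation, assuming that "load" means runnable processes relative to available CPU slots and that the backlog is counted in jobs; both function names and both readings are assumptions, not the site's actual monitoring code.

```python
def load_percent(runnable: int, cpu_slots: int) -> float:
    """Load as runnable processes relative to CPU slots, in percent
    (assumed reading of the slide's definition)."""
    if cpu_slots <= 0:
        raise ValueError("cpu_slots must be positive")
    return 100.0 * runnable / cpu_slots


def backlog_ratio(waiting_jobs: int, running_jobs: int) -> float:
    """How many times larger the waiting work is than the work in progress."""
    if running_jobs <= 0:
        raise ValueError("running_jobs must be positive")
    return waiting_jobs / running_jobs


if __name__ == "__main__":
    print(load_percent(4, 8))     # an interactive server at half capacity
    print(load_percent(8, 8))     # a saturated job-queue server
    print(backlog_ratio(60, 10))  # queue six times the running work
```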
PROCESSES AND QUEUES
Menu service (hot swap)
General analysis (overloaded)
Sun BLAST and NIX queue
Dell BLAST queue
BLAST data file server
Interactive Linkage queue
Heavy Linkage queue
USERS’ REAL WORLD PROBLEMS
Comparative method
Extrapolate from known to similar
Hints to reduce the amount of experimental work that needs to be done
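The comparative method (extrapolating from known to similar) can be sketched as annotation transfer by sequence similarity. This is a toy illustration: it assumes pre-aligned sequences of equal length and an arbitrary 80% identity threshold, and all names are hypothetical. Real analyses would use an alignment tool such as BLAST rather than position-by-position comparison.

```python
from typing import Dict, Optional


def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions between two equal-length sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def transfer_annotation(query: str, known: Dict[str, str],
                        min_identity: float = 80.0) -> Optional[str]:
    """Return the annotation of the most similar known sequence,
    or None if nothing reaches the (assumed) identity threshold."""
    best_annotation = None
    best_identity = 0.0
    for sequence, annotation in known.items():
        if len(sequence) != len(query):
            continue  # toy version: skip sequences it cannot compare directly
        identity = percent_identity(query, sequence)
        if identity >= min_identity and identity > best_identity:
            best_identity, best_annotation = identity, annotation
    return best_annotation


if __name__ == "__main__":
    known = {"MKTAYIAKQR": "example annotation A", "GGGGGGGGGG": "example annotation B"}
    print(transfer_annotation("MKTAYIAKQK", known))
```

A transferred annotation of this kind is a hint for prioritising experiments, not a substitute for them.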
SOFTWARE SYSTEMS
A variety of technical solutions are used
BLAST
NCBI Entrez
SRS
GeneCards
NIX
ENSEMBL
HELPING THE USER
Information discovery – completeness
Communication – multiple sites
Ontology – uniformity?
Software integration – ease of use
Reasoning about results
Monitoring – repeat queries
MAJOR CHALLENGES
User interface
Back end processing
Cost recovery
NEW TECHNOLOGIES?
Web services
GRID (EMBnet)
Object-orientated computing
Multi-agent systems
TREASURE
Web service with top level container
Customise for the user
User selects a service and opens it as an application
An alternative view can be built around user data as the fundamental objects
IMPLEMENTATION
EMBREO library written in Java handles the web service layer (also CORBA, XML-RPC, JDBC and other connectivity)
Also handles file access and transfer and display of results (including use of VNC)
Simple Object Access Protocol (SOAP)
Browser channel uses XML format
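To show what a SOAP message in XML looks like, the sketch below builds a SOAP 1.1 request envelope with Python's standard library. It is not the EMBREO API (which is written in Java and not shown in these slides); the service namespace, the operation name and the parameter names are all hypothetical.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace (standard).
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical namespace for an analysis service (assumption).
SERVICE_NS = "urn:example:analysis"


def build_request(operation: str, params: dict) -> bytes:
    """Wrap an operation call and its parameters in a SOAP 1.1 envelope."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        # Parameters are emitted as unqualified child elements.
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="utf-8")


if __name__ == "__main__":
    # "runBlast", "program" and "sequence" are made-up names for illustration.
    request = build_request("runBlast", {"program": "blastn", "sequence": "ACGT"})
    print(request.decode("utf-8"))
```

In a real deployment the resulting bytes would be sent over HTTP to the service endpoint and the response parsed in the same way.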
USER ACCOUNTING AND CUSTOMIZATION
Currently: very complex
HED
NIS+
Filesystem configuration files
Future: a single database
Lightweight Directory Access Protocol (LDAP)
CREDITS
Gary Williams - Menu systems and Genome Web
Geoff Gibbs - Network and systems
Peter Tribble - Web servers, Queues, Treasure