Transcript Slide 1
EMBL-EBI
MSD Search tools
EMBL-EBI
MSDlite
EMBL-EBI
The “Atlas” Pages
EMBL-EBI
The Atlas: Ligands
EMBL-EBI
The Atlas: Sequence
EMBL-EBI
Simple search interface
Strengths:
simple, easy to use form
allows multiple search fields to be combined
relatively fast, despite performing quite complex SQL
queries
Weaknesses:
not exposing the power of a relational database
user can't specify the relationship between search fields:
"name" AND "title" AND "keyword"
"name" OR "title" OR "keyword"
( "name" OR "title" ) AND NOT "keyword"
the search form is defined by the authors of the search
system, not the author of a query
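The three relationships listed above can be written out as predicates over a record. This is only a minimal sketch: the records, the search terms and the `matches` and `search` helpers are invented for illustration and are not part of MSD.

```python
# Hypothetical records; the field names mirror the slide ("name", "title", "keyword").
records = [
    {"id": 1, "name": "lysozyme", "title": "lysozyme structure", "keyword": "hydrolase"},
    {"id": 2, "name": "lysozyme", "title": "mutant T4", "keyword": "kinase"},
    {"id": 3, "name": "myoglobin", "title": "lysozyme complex", "keyword": "oxygen"},
]


def matches(record, field, term):
    """True if the term occurs in the given field (case-insensitive)."""
    return term.lower() in record[field].lower()


def search(predicate):
    """Return the ids of all records that satisfy the predicate."""
    return [r["id"] for r in records if predicate(r)]


# "name" AND "title" AND "keyword"
all_and = search(lambda r: matches(r, "name", "lysozyme")
                 and matches(r, "title", "lysozyme")
                 and matches(r, "keyword", "hydrolase"))

# "name" OR "title" OR "keyword"
any_or = search(lambda r: matches(r, "name", "lysozyme")
                or matches(r, "title", "lysozyme")
                or matches(r, "keyword", "hydrolase"))

# ( "name" OR "title" ) AND NOT "keyword"
mixed = search(lambda r: (matches(r, "name", "lysozyme")
                          or matches(r, "title", "lysozyme"))
               and not matches(r, "keyword", "hydrolase"))

print(all_and, any_or, mixed)
```

A fixed form can offer only one of these combinations; letting the user choose between them is what the later slides address.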
EMBL-EBI
Describing complex searches
We want to allow the user to entirely control their query
Since HTML forms are inherently static, we'll use an applet
to provide a dynamic "form" that will let the user:
choose the fields to be searched
specify the relationships between search fields
choose the result fields and how results are presented
perform "complex" sub-queries e.g. SSM, FASTA
EMBL-EBI
Graphical DB search system
MSDpro uses an applet for constructing queries and a
server to execute them
Avoids the need for the user to understand a complex
database schema or know SQL
The user describes their query entirely graphically,
including logical operations such as AND, OR and
NOT
Applet generates an XML description of the user’s
query, which is sent to the MSD query server and
converted to SQL automatically
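As a rough illustration of the applet side, a query built graphically can be held as a small tree and serialised to XML. The slides do not show the MSDpro XML format, so the element and attribute names below (`query`, `and`, `or`, `not`, `field`, `op`) are assumptions made for this sketch only.

```python
import xml.etree.ElementTree as ET


def field(name, op, value):
    """Leaf node: one search field with a comparison and a value."""
    return ("field", name, op, value)


def build(node):
    """Recursively turn a tuple-based query tree into XML elements."""
    kind = node[0]
    if kind == "field":
        _, name, op, value = node
        return ET.Element("field", {"name": name, "op": op, "value": str(value)})
    elem = ET.Element(kind)  # "and", "or" or "not"
    for child in node[1:]:
        elem.append(build(child))
    return elem


# ( name OR title ) AND NOT keyword, with hypothetical search terms
tree = ("and",
        ("or",
         field("name", "contains", "kinase"),
         field("title", "contains", "kinase")),
        ("not", field("keyword", "contains", "mutant")))

root = ET.Element("query")
root.append(build(tree))
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Nesting the logical operators as elements keeps the grouping explicit, so the server never has to guess at operator precedence.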
EMBL-EBI
Automatic SQL generation
The query server is a Java servlet:
accepts a query description as XML
converts the user’s query description into a true
SQL query, which is then submitted to the search
database
Searches can include components that are executed
outside of the database, e.g. sequence similarity,
determined using FASTA or structural similarity,
determined using SSM
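The conversion step on the server can be sketched as a recursive walk over the XML that emits a parameterised WHERE clause. The real server is a Java servlet; Python is used here for brevity. The XML layout, the `entry_search` table and the column whitelist are assumptions for this sketch, and components run outside the database (FASTA, SSM) are not modelled.

```python
import xml.etree.ElementTree as ET

# Hypothetical whitelist of searchable columns; anything else is rejected.
ALLOWED_FIELDS = {"name", "title", "keyword"}


def to_sql(elem, params):
    """Convert one XML node to SQL text, appending bind values to params."""
    if elem.tag == "field":
        name = elem.get("name")
        if name not in ALLOWED_FIELDS:
            raise ValueError(f"unknown field: {name}")
        params.append(f"%{elem.get('value')}%")
        return f"{name} LIKE ?"
    if elem.tag == "not":
        (child,) = list(elem)  # NOT takes exactly one operand
        return f"NOT ({to_sql(child, params)})"
    if elem.tag in ("and", "or"):
        parts = [to_sql(child, params) for child in elem]
        return "(" + f" {elem.tag.upper()} ".join(parts) + ")"
    raise ValueError(f"unknown element: {elem.tag}")


def query_to_sql(xml_text):
    """Return (sql, params) for a <query> document with one root expression."""
    root = ET.fromstring(xml_text)
    (expr,) = list(root)
    params = []
    where = to_sql(expr, params)
    return f"SELECT DISTINCT entry_id FROM entry_search WHERE {where}", params


example = (
    '<query><and>'
    '<or><field name="name" value="kinase"/><field name="title" value="kinase"/></or>'
    '<not><field name="keyword" value="mutant"/></not>'
    '</and></query>'
)
sql, params = query_to_sql(example)
print(sql)
print(params)
```

User-supplied values travel only as bind parameters and field names are checked against a whitelist, so the generated SQL text never contains raw user input.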
EMBL-EBI
Visualisation
• The process of representing abstract data to aid in
understanding the meaning of the data.
• Not to be confused with rendering data (drawing
pictures)
• Typically, though, we render data in such a way as to
visualise the information within it.
EMBL-EBI
Introduction
Biological data comes from & is of interest to:
Chemists : reaction mechanism, drug design
Biologists : sequence, expression, homology, function.
Structural biologists : atomic structure, fold, classification,
function.
Medicine : clinical effect
Education :
Media :
Presentation of diverse information to a diverse audience.
Each has their own point of view (context).
Expert = scientist working within their own field of expertise
Non-expert = scientist using data/information outside their field
Novice = Non-scientist
EMBL-EBI
Not just presentation of results
Web pages
These are notoriously badly designed, often leaving
the information on the site unusable.
The front page should load quickly
The main point should appear on the first full screen
Clutter – not logically laid out
Too busy – cannot find the salient point
8% of men & 0.5% of women are colour blind
Google is a good design
Bad text/fonts
Too often it doesn’t work
User will go somewhere else
The latest whizz-bang stuff only works on the latest browsers
Only works in one browser – they only tested on one.
Does not conform to standard HTML
EMBL-EBI
Asking questions
Biological data is very complex
Chemistry, Biology, Physics, Statistics,
Medicine..
Most users will be from a different field
Asking the right question is difficult.
The user cannot use the correct terminology
Too many things to query (2000 attributes in
MSD)
SQL : not suitable for most users
Interface too complex
Too many check boxes, widgets etc
Trying to be too clever
The “Go” button is buried somewhere
EMBL-EBI
Result presentation
Results
Biological data is complex
Chemistry, physics, biology, statistics, medicine…
Expert users want all the detail
i.e. they want to use a specific method
They want all the details
They want (I hope) the statistical validity of the results
The non-expert wants the best-practice answer
returned within their own context.
They want comparative analysis with other fields
They want to know the results are valid
EMBL-EBI
Query design
The simple text box design is very common
Suitable for text queries
Only one logic: AND or OR
Predefined
Easy to use
Limited scope
2000 attributes -> 2000 check-boxes!
EMBL-EBI
Query design
Graphical interface
Multiple logic: AND/OR/NOT
Under the user's control
Slower
Steep learning curve
Some users just cannot get it
Intuitive once mastered
Pretty
EMBL-EBI
Query design
select distinct entry_id, ligand_id from contact_search sel
where neighbour_code_3_letter in ('SER','HIS')
and DISTANCE <= 2.0
and type_id = 1
and neighbour_substruct_code = 'side'
and MACROMOL_SEC_STRUCT_TYPE = 1
intersect
select distinct entry_id, ligand_id from contact_search sel
where neighbour_code_3_letter = 'HIS'
and ( NEIGHBOUR_ATOM_NAME = 'NE2'
and type_id = 1
and distance <= 2.0 or NEIGHBOUR_SYMBOL = 'N'
and type_id = 1
and distance <= 2.0)
and TYPE_ID != 0
group by entry_id, ligand_id having count(distinct
neighbour_residue_id) >= 2
intersect
select distinct entry_id, ligand_id from contact_search sel
where neighbour_code_3_letter = 'HIS'
and NEIGHBOUR_ATOM_NAME = 'NE2'
and DISTANCE <= 2.0 and type_id = 1
and neighbour_substruct_code = 'side'
and MACROMOL_SEC_STRUCT_TYPE = 2
intersect
select distinct entry_id, ligand_id from contact_search sel
where neighbour_code_3_letter = 'HIS'
and NEIGHBOUR_SYMBOL = 'N'
and DISTANCE <= 2.0
and type_id = 1
and neighbour_substruct_code = 'side'
and MACROMOL_SEC_STRUCT_TYPE = 3
intersect
select distinct entry_id, ligand_id from residue_contact sel
where neighbour_code_3_letter in ('HIS','SER','HIS')
and BOND_STRENGTH != 10
group by entry_id, ligand_id having count(*) >= 3;
Figurative 2D sketch for 3D query (Active
sites)
Informative – presents meaning for the question
Slower
Less error prone
HIS|SER:S/H>C2.0
HIS.ne2:S/S>C2.0
HIS.[n]:S/T>C2.0
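The long SQL listing above is a set of per-contact sub-queries joined with INTERSECT. A much reduced sketch of that pattern follows, run against an in-memory SQLite table. The rows are invented, only a few `contact_search` columns from the listing are kept, and the mapping from the shorthand notation to SQL is not reproduced because the slides do not define it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contact_search ("
    " entry_id TEXT, ligand_id INTEGER,"
    " neighbour_code_3_letter TEXT, neighbour_atom_name TEXT, distance REAL)"
)
# Invented test rows: only entry e1 has both contacts within range.
conn.executemany(
    "INSERT INTO contact_search VALUES (?, ?, ?, ?, ?)",
    [
        ("e1", 1, "HIS", "NE2", 1.9),
        ("e1", 1, "SER", "OG", 1.8),
        ("e2", 1, "HIS", "NE2", 1.9),  # no SER contact
        ("e3", 1, "SER", "OG", 1.7),   # no HIS contact
        ("e4", 1, "HIS", "NE2", 2.6),  # HIS contact too far away
        ("e4", 1, "SER", "OG", 1.9),
    ],
)

# Each constraint is (residue code, atom name, maximum distance).
constraints = [("HIS", "NE2", 2.0), ("SER", "OG", 2.0)]

SUBQUERY = (
    "SELECT DISTINCT entry_id, ligand_id FROM contact_search "
    "WHERE neighbour_code_3_letter = ? AND neighbour_atom_name = ? AND distance <= ?"
)

# One sub-query per constraint; INTERSECT keeps ligands that satisfy all of them.
sql = " INTERSECT ".join([SUBQUERY] * len(constraints))
params = [value for constraint in constraints for value in constraint]
hits = conn.execute(sql, params).fetchall()
print(hits)
```

Each sketched contact adds one sub-query, which is how a graphical query with a handful of contacts can grow into SQL of the length shown on the slide.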
EMBL-EBI
YAMGP (yet another molecular
graphics program)
Many different programs are available
AstexViewer@MSD-EBI
LigPlot
VMD
Quanta
InsightII
Bobscript
WebMol
Frodo
iMol
Chime
Grasp
Pymol
POVRay
Spock
Rasmol
Mage
Raster3D
Yasara
Molscript
Chimera
O
MolMol
Whatif
XtalView
WebLab-viewer
Swiss-PDBviewer
EMBL-EBI
Result visualisation
Multiple types of biological data
Textual data
3D structure
2D chemical sketches
1D sequence
Node linked
General/derived data
Web pages
Errors/Variance
Data provenance
EMBL-EBI
AstexViewer@MSD-EBI
Java 1.1 Applet
Should run under most
browsers
Small footprint, high speed.
Structure
Line, stick, ball & stick,
sphere, schematic, surface +
texture map.
Written by Mike Hartshorn
(Astex Therapeutics Ltd).
Multiple structures supported
EMBL-EBI
AstexViewer@MSD-EBI
Sequence
Multiple sequence
alignment
Editing,
Annotation, colours…
Consensus alignment
Pick, Brushing & Magic
lens
EMBL-EBI
Chemistry
2D flat representation
Annotation, colours…
Interaction types
Placement as a function of
contact distance
Editable
Pick, Brush and magic
lens
EMBL-EBI
Graphs
2D, 2D grid and ND
Linkage plots
Annotation, colours…
Ramachandran, etc…
Pick, Brush & Magic lens
EMBL-EBI
AstexViewer@MSD-EBI
Visualisation
Lensing
Linked views
Brushing
Picking
Flying views
Hyperbolic
distortion
Animation
Solid rendering
Depth cues
Colour, lighting
Highlighting
Etc…
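Linked views and brushing are commonly built around one shared selection that every view observes. The sketch below shows that idea only. The class names are invented, and nothing here describes how AstexViewer itself is implemented.

```python
class SelectionModel:
    """Holds the current selection and notifies every subscribed view."""

    def __init__(self):
        self._selected = frozenset()
        self._listeners = []

    def subscribe(self, listener):
        self._listeners.append(listener)

    def brush(self, residue_ids):
        """Replace the selection and push it to all linked views."""
        self._selected = frozenset(residue_ids)
        for listener in self._listeners:
            listener(self._selected)


class View:
    """Stand-in for a sequence, structure or graph panel."""

    def __init__(self, name, model):
        self.name = name
        self.highlighted = frozenset()
        model.subscribe(self.on_selection)

    def on_selection(self, selected):
        # A real view would redraw here; the sketch just records the state.
        self.highlighted = selected


model = SelectionModel()
sequence_view = View("sequence", model)
structure_view = View("structure", model)

# Brushing residues 10-12 in any one view highlights them in all views.
model.brush({10, 11, 12})
print(sorted(sequence_view.highlighted), sorted(structure_view.highlighted))
```

Keeping the selection in one place means a new view type only has to subscribe to it; the existing views do not need to know it exists.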
EMBL-EBI
Visualisation : comparative
analysis
Similarity/Difference
Data superposition
Attribute display
Colour, size…
Correlation
Attribute mapping
Sequence colour by
structure alignment
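Attribute mapping of this kind amounts to turning a per-residue number into a display property such as colour. A minimal sketch with a linear blue-to-red ramp follows. The residue labels and deviation values are invented, and the ramp is just one possible choice.

```python
def value_to_rgb(value, lo, hi):
    """Map value onto a blue (lo) to red (hi) ramp, clamping out-of-range input."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    t = min(1.0, max(0.0, (value - lo) / (hi - lo)))
    return (round(255 * t), 0, round(255 * (1 - t)))


# Hypothetical per-residue deviations taken from a structure alignment.
deviation = {"A10": 0.0, "A11": 1.0, "A12": 4.0}

lo, hi = min(deviation.values()), max(deviation.values())
colours = {residue: value_to_rgb(v, lo, hi) for residue, v in deviation.items()}
print(colours)
```

The same colours can then be applied to the sequence view and the 3D view alike, so a well-aligned region reads the same way in both.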