EMBL-EBI
MSD Search tools
MSDlite
The “Atlas” Pages
The Atlas: Ligands
The Atlas: Sequence
Simple search interface
 Strengths:
 simple, easy to use form
 allows multiple search fields to be combined
 relatively fast, despite performing quite complex SQL
queries
 Weaknesses:
 not exposing the power of a relational database
 user can't specify the relationship between search fields:
 "name" AND "title" AND "keyword"
 "name" OR "title" OR "keyword"
 ( "name" OR "title" ) AND NOT "keyword"
 the search form is defined by the authors of the search
system, not the author of a query
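A minimal sketch of what "specifying the relationship between search fields" could look like in code. The query is held as a small tree of AND/OR/NOT nodes and turned into a parameterised WHERE clause. The class names, the field whitelist and the LIKE matching are all assumptions made for illustration, not the MSD implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Term:
    """A single search field matched against a user-supplied value."""
    field: str
    value: str


@dataclass
class Not:
    operand: "Node"


@dataclass
class And:
    operands: List["Node"]


@dataclass
class Or:
    operands: List["Node"]


Node = Union[Term, Not, And, Or]

# Hypothetical whitelist of searchable columns (the real schema is not shown).
ALLOWED_FIELDS = {"name", "title", "keyword"}


def to_where(node: Node) -> Tuple[str, List[str]]:
    """Return a parameterised WHERE fragment and its bind values."""
    if isinstance(node, Term):
        if node.field not in ALLOWED_FIELDS:
            raise ValueError(f"unknown search field: {node.field}")
        return f"{node.field} LIKE ?", [f"%{node.value}%"]
    if isinstance(node, Not):
        sql, params = to_where(node.operand)
        return f"NOT ({sql})", params
    # And / Or: join the bracketed children with the matching keyword.
    joiner = " AND " if isinstance(node, And) else " OR "
    parts, params = [], []
    for child in node.operands:
        sql, child_params = to_where(child)
        parts.append(f"({sql})")
        params.extend(child_params)
    return joiner.join(parts), params


# ( "name" OR "title" ) AND NOT "keyword"
query = And([Or([Term("name", "lysozyme"), Term("title", "lysozyme")]),
             Not(Term("keyword", "mutant"))])
where_sql, where_params = to_where(query)
print(where_sql)
```

Because the tree is built by the author of the query, the same code handles all three combinations listed on the slide without a fixed form layout.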
Describing complex searches
 We want to allow the user to entirely control their query
 Since HTML forms are inherently static, we'll use an applet
to provide a dynamic "form" that will let the user:
 choose the fields to be searched
 specify the relationships between search fields
 choose the result fields and how results are presented
 perform "complex" sub-queries e.g. SSM, FASTA
Graphical DB search system
 MSDpro uses an applet for constructing queries and a
server to execute them
 Avoids the need for the user to understand a complex
database schema or know SQL
 The user describes their query entirely graphically,
including logical operations such as AND, OR and
NOT
 Applet generates an XML description of the user’s
query, which is sent to the MSD query server and
converted to SQL automatically
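The XML-to-SQL step can be sketched roughly as follows. The slides do not show the actual MSDpro XML format, so the element names (query, select, term, and, or, not), the operator codes and the table and column names here are invented for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical query document; the real MSDpro format is not given in the slides.
QUERY_XML = """
<query table="entry_search">
  <select field="entry_id"/>
  <select field="title"/>
  <and>
    <term field="resolution" op="le" value="2.0"/>
    <or>
      <term field="organism" op="eq" value="Homo sapiens"/>
      <term field="organism" op="eq" value="Mus musculus"/>
    </or>
    <not>
      <term field="method" op="eq" value="NMR"/>
    </not>
  </and>
</query>
"""

OPERATORS = {"eq": "=", "ne": "!=", "le": "<=", "ge": ">="}
# Whitelists stand in for a lookup against the real schema (assumed names).
TABLES = {"entry_search"}
FIELDS = {"entry_id", "title", "resolution", "organism", "method"}


def condition(node, params):
    """Recursively translate one element into a parameterised SQL condition."""
    if node.tag == "term":
        field, op = node.get("field"), node.get("op")
        if field not in FIELDS or op not in OPERATORS:
            raise ValueError(f"unsupported term: {field!r} {op!r}")
        params.append(node.get("value"))  # values are bound, never inlined
        return f"{field} {OPERATORS[op]} ?"
    children = [condition(child, params) for child in node]
    if node.tag == "not":
        if len(children) != 1:
            raise ValueError("<not> takes exactly one child")
        return f"NOT ({children[0]})"
    if node.tag in ("and", "or"):
        if not children:
            raise ValueError(f"<{node.tag}> needs at least one child")
        return "(" + f" {node.tag.upper()} ".join(children) + ")"
    raise ValueError(f"unknown element: {node.tag}")


def xml_to_sql(xml_text):
    """Return (sql, params) for a query document in the assumed format."""
    root = ET.fromstring(xml_text.strip())
    table = root.get("table")
    if table not in TABLES:
        raise ValueError(f"unknown table: {table!r}")
    columns = [s.get("field") for s in root.findall("select")]
    if not columns or any(c not in FIELDS for c in columns):
        raise ValueError("invalid result fields")
    params = []
    sql = f"SELECT DISTINCT {', '.join(columns)} FROM {table}"
    logic = [child for child in root if child.tag != "select"]
    if logic:
        sql += " WHERE " + " AND ".join(condition(n, params) for n in logic)
    return sql, params


sql, params = xml_to_sql(QUERY_XML)
print(sql)
print(params)
```

Checking every table and field name against a whitelist and binding values as parameters keeps the generated SQL safe even though the XML comes from the client.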
Automatic SQL generation
 The query server is a Java servlet:
 accepts a query description as XML
 converts the user’s query description into a true
SQL query, which is then submitted to the search
database
 Searches can include components that are executed
outside of the database, e.g. sequence similarity,
determined using FASTA or structural similarity,
determined using SSM
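One way to combine an external component with a relational search is to run it first, load its hits into a temporary table and join against that. The sketch below uses an in-memory SQLite database and a placeholder function in place of a real FASTA or SSM run; the table names, entry ids and values are made up, and the slides do not say how the MSD query server actually merges the two.

```python
import sqlite3


def external_similarity_hits(query_sequence):
    """Placeholder for an external FASTA or SSM run.

    A real implementation would call the external program; the ids
    returned here are invented for illustration.
    """
    return {"1abc", "2xyz", "3def"}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (entry_id TEXT PRIMARY KEY, resolution REAL)")
conn.executemany(
    "INSERT INTO entry VALUES (?, ?)",
    [("1abc", 1.8), ("2xyz", 3.1), ("4ghi", 1.5)],
)

# Step 1: run the component that lives outside the database.
hits = external_similarity_hits("MKVLAAGIV")

# Step 2: load the hits into a temporary table so SQL can join against them.
conn.execute("CREATE TEMP TABLE external_hit (entry_id TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO external_hit VALUES (?)", [(h,) for h in sorted(hits)]
)

# Step 3: combine the external hits with ordinary relational constraints.
rows = conn.execute(
    "SELECT e.entry_id FROM entry e "
    "JOIN external_hit x ON x.entry_id = e.entry_id "
    "WHERE e.resolution <= ? ORDER BY e.entry_id",
    (2.0,),
).fetchall()
result = [row[0] for row in rows]
print(result)
```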
Visualisation
• The process of representing abstract data to aid in
understanding the meaning of the data.
• Not to be confused with rendering data (drawing
pictures)
• Typically, though, we render data in such a way as to
visualise the information within that data.
Introduction
 Biological data comes from & is of interest to:
 Chemists : reaction mechanism, drug design
 Biologists : sequence, expression, homology, function.
 Structural biologists : atomic structure, fold, classification,
function.
 Medicine : clinical effect
 Education :
 Media :
 Presentation of diverse information to a diverse audience.
 Each has their own point of view (context).
 Expert = scientist working within their own field of expertise
 Non-expert = scientist using data/information outside their field
 Novice = Non-scientist
Not just presentation of results
Web pages
These are notoriously badly designed, often leaving the
information on the site unusable.
 The front page should load quickly
 The main point should appear on the first full screen
 Clutter – not logically laid out
 Too busy – cannot find the salient point
 Bad text/fonts
 8% of men & 0.5% of women are colour blind
Google is a good design
Too often it doesn’t work
 User will go somewhere else
 The latest whizz-bang stuff only works on the latest browsers
 Only works in one browser – they only tested on one.
 Does not conform to standard HTML
Asking questions
 Biological data is very complex
 Chemistry, Biology, Physics, Statistics,
Medicine..
 Most users will be from a different field
 Asking the right question is difficult.
 The user cannot use the correct terminology
 Too many things to query (2000 attributes in
MSD)
 SQL : not suitable for most users
 Interface too complex
 Too many check boxes, widgets etc
 Trying to be too clever
 The “Go” button is buried somewhere
Result presentation
Results
Biological data is complex
 Chemistry, physics, biology, statistics, medicine…
Expert users want all the detail
 i.e. they want to use a specific method
 They want (I hope) the statistical validity of the results
The non-expert wants the best-practice answer
returned within their own context.
 They want comparative analysis with other fields
 They want to know the results are valid
Query design
The simple text box design is very common
 Suitable for text
queries
 Only one logic
 AND or OR
 Predefined
 Easy to use
 Limited scope
 2000 attributes ->
2000 check-boxes !
Query design
Graphical interface
 Multiple logic
 AND/OR/NOT
 Under the user's
control
 Slower
 Steep learning
curve
 Some users just
cannot get it
 Intuitive once
mastered
 Pretty
Query design
select distinct entry_id, ligand_id
  from contact_search sel
 where neighbour_code_3_letter in ('SER','HIS')
   and DISTANCE <= 2.0
   and type_id = 1
   and neighbour_substruct_code = 'side'
   and MACROMOL_SEC_STRUCT_TYPE = 1
intersect
select distinct entry_id, ligand_id
  from contact_search sel
 where neighbour_code_3_letter = 'HIS'
   and (   NEIGHBOUR_ATOM_NAME = 'NE2' and type_id = 1 and distance <= 2.0
        or NEIGHBOUR_SYMBOL = 'N' and type_id = 1 and distance <= 2.0)
   and TYPE_ID != 0
 group by entry_id, ligand_id
having count(distinct neighbour_residue_id) >= 2
intersect
select distinct entry_id, ligand_id
  from contact_search sel
 where neighbour_code_3_letter = 'HIS'
   and NEIGHBOUR_ATOM_NAME = 'NE2'
   and DISTANCE <= 2.0
   and type_id = 1
   and neighbour_substruct_code = 'side'
   and MACROMOL_SEC_STRUCT_TYPE = 2
intersect
select distinct entry_id, ligand_id
  from contact_search sel
 where neighbour_code_3_letter = 'HIS'
   and NEIGHBOUR_SYMBOL = 'N'
   and DISTANCE <= 2.0
   and type_id = 1
   and neighbour_substruct_code = 'side'
   and MACROMOL_SEC_STRUCT_TYPE = 3
intersect
select distinct entry_id, ligand_id
  from residue_contact sel
 where neighbour_code_3_letter in ('HIS','SER','HIS')
   and BOND_STRENGTH != 10
 group by entry_id, ligand_id
having count(*) >= 3;
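The query pattern above (one sub-select per geometric constraint, combined with INTERSECT, with GROUP BY ... HAVING counting distinct residues) can be tried on a toy table. This sketch uses SQLite with a heavily cut-down contact_search table. The rows are invented and only a few of the original columns are kept, so it shows the shape of the query rather than the real MSD search.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE contact_search (
           entry_id TEXT,
           ligand_id INTEGER,
           neighbour_code_3_letter TEXT,
           neighbour_residue_id INTEGER,
           distance REAL)"""
)
conn.executemany(
    "INSERT INTO contact_search VALUES (?, ?, ?, ?, ?)",
    [
        # entry, ligand, residue type, residue id, contact distance
        ("1aaa", 1, "HIS", 10, 1.9),
        ("1aaa", 1, "HIS", 42, 2.0),
        ("1aaa", 1, "SER", 77, 1.8),
        ("2bbb", 1, "HIS", 5, 1.9),
        ("2bbb", 1, "SER", 6, 1.7),
        ("3ccc", 1, "HIS", 8, 1.9),
        ("3ccc", 1, "HIS", 9, 2.6),
    ],
)

# Constraint 1: the ligand contacts a SER within the cutoff.
# Constraint 2: the ligand contacts at least two different HIS residues
#               within the cutoff.
# INTERSECT keeps only the (entry, ligand) pairs satisfying both.
SITE_QUERY = """
SELECT DISTINCT entry_id, ligand_id
  FROM contact_search
 WHERE neighbour_code_3_letter = 'SER' AND distance <= :cutoff
INTERSECT
SELECT entry_id, ligand_id
  FROM contact_search
 WHERE neighbour_code_3_letter = 'HIS' AND distance <= :cutoff
 GROUP BY entry_id, ligand_id
HAVING COUNT(DISTINCT neighbour_residue_id) >= 2
"""

matches = conn.execute(SITE_QUERY, {"cutoff": 2.0}).fetchall()
print(matches)
```

Entry 2bbb fails the second constraint (only one HIS in range) and 3ccc fails both, which is why each constraint gets its own sub-select rather than a single WHERE clause.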
Figurative 2D sketch for 3D query (Active
sites)
 Informative – presents meaning for the question
 Slower
 Less error prone
HIS|SER:S/H>C2.0
HIS.ne2:S/S>C2.0
HIS.[n]:S/T>C2.0
YAMGP (yet another molecular
graphics program)
Many different programs are available
AstexViewer@MSD-EBI
LigPlot
VMD
Quanta
InsightII
Bobscript
WebMol
Frodo
iMol
Chime
Grasp
Pymol
POVRay
Spock
Rasmol
Mage
Raster3D
Yasara
Molscript
Chimera
O
MolMol
Whatif
XtalView
WebLab-viewer
Swiss-PDBviewer
Result visualisation
Multiple types of biological data
 Textual data
 3D structure
 2D chemical sketches
 1D sequence
 Node linked
 General/derived data
 Web pages
 Errors/Variance
 Data provenance
AstexViewer@MSD-EBI
 Java 1.1 Applet
 Should run under most
browsers
 Small footprint, high speed.
 Structure
 Line, stick, ball & stick,
sphere, schematic, surface +
texture map.
 Written by Mike Hartshorn
(Astex Therapeutics Ltd).
 Multiple structures supported
AstexViewer@MSD-EBI
Sequence
 Multiple sequence
alignment
 Editing,
 Annotation, colours…
 Consensus alignment
 Pick, Brushing & Magic
lens
Chemistry
 2D flat representation
 Annotation, colours…
 Interaction types
 Placement as a function of
contact distance
 Editable
 Pick, Brush and magic
lens
Graphs
2D, 2D grid and ND
Linkage plots
Annotation, colours…
Ramachandran, etc…
Pick, Brush & Magic lens
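A Ramachandran plot places each residue at its backbone torsion angles (phi, psi), so the data behind the graph is a dihedral angle calculation. A minimal sketch in plain Python follows. The function names and the tuple-based coordinates are assumptions for illustration, and nothing here is taken from AstexViewer.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Signed torsion angle about the p1-p2 bond, in degrees (-180..180)."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def phi_psi(prev_c, n, ca, c, next_n):
    """Backbone torsions for one residue.

    phi uses C(i-1), N, CA, C; psi uses N, CA, C, N(i+1).
    """
    return dihedral_degrees(prev_c, n, ca, c), dihedral_degrees(n, ca, c, next_n)


# Four points arranged so the torsion is a right angle.
angle = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(round(angle, 1))
```

Each residue then becomes one (phi, psi) point on the plot, which is what picking and brushing operate on.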
AstexViewer@MSD-EBI
 Visualisation
 Lensing
 Linked views
 Brushing
 Picking
 Flying views
 Hyperbolic
distortion
 Animation
 Solid rendering
 Depth cues
 Colour, lighting
 Highlighting
 Etc…
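The slide names lensing and distortion without saying which functions are used. One widely used choice for a focus-plus-context lens is the Sarkar-Brown graphical fisheye, g(x) = (d + 1)x / (dx + 1), sketched here as an assumption rather than as what AstexViewer does. The function and parameter names are hypothetical.

```python
import math


def fisheye(x, d):
    """Sarkar-Brown fisheye for a normalised distance x in [0, 1].

    d >= 0 is the distortion factor; d = 0 leaves x unchanged.
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must be in [0, 1]")
    if d < 0:
        raise ValueError("d must be non-negative")
    return (d + 1.0) * x / (d * x + 1.0)


def distort_point(point, focus, radius, d):
    """Apply a circular lens of the given radius centred on the focus.

    Points at the focus or outside the lens are returned unchanged;
    points inside are pushed outwards, magnifying the area near the focus.
    """
    dx, dy = point[0] - focus[0], point[1] - focus[1]
    dist = math.hypot(dx, dy)
    if dist == 0.0 or dist >= radius:
        return point
    scale = fisheye(dist / radius, d) * radius / dist
    return (focus[0] + dx * scale, focus[1] + dy * scale)


moved = distort_point((5.0, 0.0), (0.0, 0.0), 10.0, 3.0)
print(moved)
```

The lens boundary maps to itself (g(1) = 1), so the distorted region joins the undistorted view without a visible seam.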
Visualisation : comparative
analysis
Similarity/Difference
Data superposition
Attribute display
Colour, size…
Correlation
Attribute mapping
Sequence colour by
structure alignment