EMBL-EBI
Dimitris Dimitropoulos
MSD-mine
MSD-mine overview
Web application for online data analysis and mining
For the advanced MSDSD researcher
Flexible guidance for ad-hoc queries
Exploitation of integrated knowledge
Analysis, charts and data drill
Flexible combination of data with multiple joins
Generic but customised for the MSDSD
Characteristics
Classical systems give a list of entries for visualisation
MSD-mine returns detailed records, homogenised and ready for analysis
Allows arbitrary queries on more than 100 entities (tables)
organised in 9 sections (or marts)
with restrictions and results for 2000 attributes
combining entities based on 450 relations
Operability safeguards
Rejects queries that run too long (10 mins) or return too many results (1000 rows)
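One way such safeguards could be enforced is sketched below in Python with the standard sqlite3 module. This is not MSD-mine's implementation: the SQLite backend and the names `run_guarded` and `QueryRejected` are assumptions made for illustration, and only the two limits (10 minutes, 1000 rows) come from the slide.

```python
import sqlite3
import time

MAX_SECONDS = 600  # "long queries (10 mins)" from the slide
MAX_ROWS = 1000    # "overload of results (1000 rows)" from the slide


class QueryRejected(Exception):
    """Hypothetical error raised when one of the safeguards trips."""


def run_guarded(conn, sql, params=(), max_seconds=MAX_SECONDS, max_rows=MAX_ROWS):
    deadline = time.monotonic() + max_seconds
    # SQLite calls the handler every 1000 virtual-machine instructions;
    # a non-zero return value aborts the running statement.
    conn.set_progress_handler(lambda: int(time.monotonic() > deadline), 1000)
    try:
        # Fetch one row more than allowed so an overload can be detected.
        rows = conn.execute(sql, params).fetchmany(max_rows + 1)
    except sqlite3.OperationalError as exc:
        if time.monotonic() > deadline:
            raise QueryRejected("time limit exceeded") from exc
        raise
    finally:
        conn.set_progress_handler(None, 1000)  # remove the handler again
    if len(rows) > max_rows:
        raise QueryRejected(f"more than {max_rows} rows")
    return rows
```

Rejecting the query outright, rather than silently truncating the result, keeps the user from analysing an incomplete data set by mistake.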
Exploring MSDSD
Explores and explains MSDSD
With context sensitive help and descriptions
With links to MSDSD documentation
Helps to understand the structure of MSDSD
Helps users learn to write SQL for advanced custom queries
Filter build page
Areas on the page
Entity area (E): select entities and relations
Restriction area (R): set or view the restrictions
Filter area (F): view the nodes of the filter
Description area (D): context-sensitive documentation
MSDSD marts
MSDSD is organised in sections (marts)
Each mart is a set of entities that may start a filter
Define Restrictions
Select the attribute
Choose the operator
Type in the value or select one from a sample list
Add the new restriction
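The attribute, operator and value steps map naturally onto one SQL condition. A minimal sketch follows, assuming the restrictions end up in an SQL WHERE clause; the function names and the operator list are hypothetical and not taken from MSD-mine.

```python
ALLOWED_OPERATORS = {"=", "<>", "<", "<=", ">", ">=", "LIKE"}


def build_restriction(attribute, operator, value):
    """Return an SQL fragment and its bound parameters for one restriction."""
    if not attribute.isidentifier():
        raise ValueError(f"bad attribute name: {attribute!r}")
    if operator not in ALLOWED_OPERATORS:
        raise ValueError(f"unsupported operator: {operator!r}")
    # The value is bound as a parameter rather than pasted into the SQL text.
    return f"{attribute} {operator} ?", [value]


def add_restriction(restrictions, attribute, operator, value):
    """Append a new restriction to the list kept for the current filter."""
    restrictions.append(build_restriction(attribute, operator, value))
    return restrictions


def where_clause(restrictions):
    """AND together every restriction added so far."""
    sql = " AND ".join(fragment for fragment, _ in restrictions)
    params = [p for _, values in restrictions for p in values]
    return sql, params
```

For example, adding `("resolution", "<", 1.2)` and then `("title", "LIKE", "%HEMO%")` yields the clause `resolution < ? AND title LIKE ?` with the two values bound in order.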
Combine entities
Using one of its relations
Relations are organised per mart
Understand cardinality
User may choose the new entity as the working node and follow its relations
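Why cardinality matters when combining entities can be seen in a small join. The sketch below uses an in-memory SQLite database; the `entry` and `chain` tables, their columns and all rows are made up for illustration and are not the MSDSD schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entry (entry_id TEXT PRIMARY KEY, title TEXT);
    CREATE TABLE chain (chain_id TEXT, entry_id TEXT REFERENCES entry(entry_id));
    INSERT INTO entry VALUES ('e1', 'first entry'), ('e2', 'second entry');
    INSERT INTO chain VALUES ('A', 'e1'), ('B', 'e1'), ('A', 'e2');
""")

# Following the one-to-many relation entry -> chain makes the chain the
# working node: the result has one row per chain, not one row per entry.
rows = conn.execute("""
    SELECT e.entry_id, c.chain_id
    FROM entry AS e
    JOIN chain AS c ON c.entry_id = e.entry_id
    ORDER BY e.entry_id, c.chain_id
""").fetchall()
print(rows)
```

Two entries produce three result rows here, which is why counts and averages taken after a one-to-many join describe the joined entity rather than the starting one.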
MSD preferences
User may set preferences to specify MSDSD shortcuts for filters
All assemblies – Representative assembly – Asymmetric unit
All models – Representative model
One chain per sequence
All entries – SCOP or DALI entries – Custom set
Execute query
View-Navigate results
Load all records
Set result-based constraints
View details
Navigate relation links
Export as text, XML or script
Data analysis
Complete or Sample
Range or Value
Fully customisable
Context-sensitive chart
Data drill operations
Analysis over a base attribute
Choose base attribute
Choose grouping operation for analysis attribute
Options and data-drill operations supported
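The idea of a base attribute plus a grouping operation can be sketched in a few lines of Python. The operation names, the `analyse` function and the record layout are assumptions for illustration; the slide does not list which grouping operations MSD-mine offers.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical set of grouping operations for the analysis attribute.
GROUPING_OPERATIONS = {"count": len, "min": min, "max": max, "avg": mean}


def analyse(records, base, analysis, operation):
    """Group records by the base attribute and summarise the analysis attribute."""
    groups = defaultdict(list)
    for record in records:
        groups[record[base]].append(record[analysis])
    summarise = GROUPING_OPERATIONS[operation]
    return {key: summarise(values) for key, values in sorted(groups.items())}
```

With made-up records such as `{"helix_class": "a", "length": 10}`, calling `analyse(records, "helix_class", "length", "avg")` returns the average length per helix class.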
First Example
Find the entries with resolution < 1.2 Å
Select the “Structure” mart and choose the “Entry” table
Set the restriction on resolution
Browse the results
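Written by hand, this first example could correspond to a query like the one below. It is only a sketch against an in-memory SQLite table: the table name `entry`, its columns and the three rows are invented for illustration and do not reflect the real MSDSD schema or data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (entry_id TEXT, title TEXT, resolution REAL)")
conn.executemany(
    "INSERT INTO entry VALUES (?, ?, ?)",
    [
        ("e1", "made-up entry one", 0.95),
        ("e2", "made-up entry two", 1.20),
        ("e3", "made-up entry three", 2.50),
    ],
)

# The single restriction of the example: resolution < 1.2
high_res = conn.execute(
    "SELECT entry_id, resolution FROM entry WHERE resolution < ? ORDER BY resolution",
    (1.2,),
).fetchall()
print(high_res)
```

The comparison is strict, so the entry at exactly 1.20 is excluded.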
Filter Expressions
Find the entries that have resolution < 1.2 Å and are related to HEMOGLOBIN
Add the main restriction on the resolution
Add a sub-expression whose logical operator is “Or”
In the sub-expression, the title contains the word “HEMO”, “HAEMO” or “GLOBIN”
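A filter with a nested sub-expression is a small tree, and it can be turned into SQL recursively. The sketch below assumes that an AND at the top level joins the main restriction and the “Or” sub-expression; the tuple layout, the `compile_filter` name and the `resolution` and `title` column names are hypothetical.

```python
def compile_filter(node):
    """Compile a filter tree into an SQL condition plus bound parameters.

    A node is either ("AND" | "OR", [child nodes]) or a single
    restriction (attribute, operator, value).
    """
    if node[0] in ("AND", "OR"):
        fragments, params = [], []
        for child in node[1]:
            sql, child_params = compile_filter(child)
            fragments.append(sql)
            params.extend(child_params)
        # Parentheses keep the sub-expression grouped inside its parent.
        return "(" + f" {node[0]} ".join(fragments) + ")", params
    attribute, operator, value = node
    return f"{attribute} {operator} ?", [value]


hemoglobin_filter = (
    "AND",
    [
        ("resolution", "<", 1.2),
        (
            "OR",
            [
                ("title", "LIKE", "%HEMO%"),
                ("title", "LIKE", "%HAEMO%"),
                ("title", "LIKE", "%GLOBIN%"),
            ],
        ),
    ],
)
sql, params = compile_filter(hemoglobin_filter)
print(sql)
```

Note that a pattern such as “GLOBIN” also matches titles of other globins, so a title search of this kind is a broad first pass rather than an exact classification.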
A simple distribution chart
Find the distribution of assembly types
Use the “Assembly” table from the “Structure” mart
Execute the query
Go to the analysis page for the “Assembly type” attribute
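A distribution chart of this kind is a count per distinct value. The sketch below draws one as text bars; the `text_chart` function and the assembly type values used with it are invented for illustration, and the input is assumed to be non-empty.

```python
from collections import Counter


def text_chart(values, width=40):
    """Return one text bar per distinct value, most frequent first."""
    counts = Counter(values)
    longest = max(counts.values())
    lines = []
    for value, n in counts.most_common():
        # Scale each bar relative to the most frequent value.
        bar = "#" * max(1, round(width * n / longest))
        lines.append(f"{value:<12}{bar} {n}")
    return lines


for line in text_chart(["dimer", "monomer", "dimer", "dimer"], width=6):
    print(line)
```

In SQL terms the same counts would come from a `GROUP BY` on the chosen attribute with `COUNT(*)`.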
Relation and external links
Find entries related to “cell death” and follow their GO (gene ontology) mappings and the links to the external GO service
Use the “Entry” table where the title contains the word “death”
Follow the GO mappings for a particular entry
Follow the links to the GO database
A more complex example
Find the active site contacts of helices that are part of beta-alpha-beta motifs
Examine their linearity
Select “Motif” as the starting point and combine with “Helix” and “Residue Contacts”
Add a restriction
View results and statistics for the helix linearity
Focus (drill) on an area of interest
Saving results and exporting
Find the binding sites of “kinked” residues
Build the query by combining “Residue”, “Helix” and “Site” tables
Save the results on a local file
Export the results in XML, TAB delimited, or as a script
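Two of the export formats are easy to illustrate with the Python standard library. The sketch below is an assumption about what TAB-delimited and XML output could look like; the element names `results` and `row` are hypothetical, the column names must be valid XML tag names, and the script export is not covered.

```python
import csv
import io
import xml.etree.ElementTree as ET


def to_tab_delimited(columns, rows):
    """Header line followed by one TAB-separated line per result row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buffer.getvalue()


def to_xml(columns, rows, root_tag="results", row_tag="row"):
    """One element per row, with one child element per column."""
    root = ET.Element(root_tag)
    for row in rows:
        element = ET.SubElement(root, row_tag)
        for column, value in zip(columns, row):
            ET.SubElement(element, column).text = str(value)
    return ET.tostring(root, encoding="unicode")
```

Both functions return a string, so saving to a local file is a single `open(path, "w").write(...)` call on the result.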
Preferences and representative sets
Find the distribution of the number of crystals in experiments
Use the “XRay-data” table
View the distribution of the number of crystals
For the whole PDB
For the DALI representative set
For our own custom representative set
Custom filters and results
For helices of similar size, find the percentage of residues that take part in helix interactions
Use the “Helix interaction” table
Add a custom “normalised interaction factor” result item
Add a custom restriction “one helix is at most double the size of the other”
View the distribution of the “interaction factor”
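The slide does not define the “normalised interaction factor”, so the sketch below assumes one plausible reading: the number of interacting residues as a percentage of the combined size of the two helices. The function names, the tuple layout and the sample numbers are all hypothetical.

```python
def interaction_factor(interacting_residues, size_a, size_b):
    """Assumed definition: interacting residues as a percentage of both helices."""
    return 100.0 * interacting_residues / (size_a + size_b)


def similar_size(size_a, size_b):
    """Custom restriction: one helix is at most double the size of the other."""
    return max(size_a, size_b) <= 2 * min(size_a, size_b)


def analyse_interactions(interactions):
    """Apply the restriction, then compute the custom result item per interaction.

    Each interaction is a tuple (interacting_residues, size_a, size_b).
    """
    return [
        round(interaction_factor(n, a, b), 1)
        for n, a, b in interactions
        if similar_size(a, b)
    ]


# Made-up interactions; the second is dropped because 30 > 2 * 8.
print(analyse_interactions([(6, 10, 14), (4, 8, 30), (5, 10, 20)]))
```

The resulting list of factors is what would then be fed into a distribution view.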