
EMBL-EBI
Dimitris Dimitropoulos
MSD-mine
MSD-mine overview
 Web application for online data analysis and mining
For the advanced MSDSD researcher
Flexible guidance for ad-hoc queries
Exploitation of integrated knowledge
Analysis, charts and data drill
 Flexible combination of data with multiple joins
 Generic but customised for the MSDSD
Characteristics
 Classical systems return a list of entries for visualisation
 MSD-mine returns detailed records, homogenised and ready for analysis
 Allows arbitrary queries on more than 100 entities (tables)
organised in 9 sections (or marts)
with restrictions and results for 2000 attributes
combining entities based on 450 relations
 Operability safeguards
Rejects queries that run longer than 10 minutes and result sets larger than 1000 rows
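The two safeguards can be illustrated with a minimal sketch. It assumes an SQLite connection as a stand-in for the real database, and the names `run_guarded` and `QueryRejected` are hypothetical; only the 10-minute and 1000-row limits come from the slide.

```python
import sqlite3
import time

MAX_SECONDS = 600   # 10-minute limit quoted on the slide
MAX_ROWS = 1000     # result-size limit quoted on the slide


class QueryRejected(Exception):
    """Raised when a query trips one of the safeguards (hypothetical name)."""


def run_guarded(conn, sql, params=(), max_seconds=MAX_SECONDS, max_rows=MAX_ROWS):
    """Run sql; reject it if it runs too long or returns too many rows."""
    deadline = time.monotonic() + max_seconds
    # SQLite calls the handler every 1000 virtual-machine instructions;
    # a non-zero return value aborts the running statement.
    conn.set_progress_handler(
        lambda: 1 if time.monotonic() > deadline else 0, 1000
    )
    try:
        cursor = conn.execute(sql, params)
        rows = cursor.fetchmany(max_rows + 1)  # one extra row detects overflow
    except sqlite3.OperationalError as exc:
        # Covers the timeout abort (and, in this sketch, any other SQL failure).
        raise QueryRejected(f"query aborted or failed: {exc}") from exc
    finally:
        conn.set_progress_handler(None, 1)  # remove the handler again
    if len(rows) > max_rows:
        raise QueryRejected(f"result has more than {max_rows} rows")
    return rows
```

Fetching one row more than the limit is a cheap way to detect an oversized result without loading all of it.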
Exploring MSDSD
 Explores and explains MSDSD
With context sensitive help and descriptions
With links to MSDSD documentation
 Helps to understand the structure of MSDSD
 Helps users learn to write SQL for advanced custom queries
Filter build page
 Areas on the page
Entity area (E): select
entities and relations
Restriction area (R):
set or view the
restrictions
Filter area (F): view the
nodes of the filter
Description area (D):
context sensitive
documentation
MSDSD marts
 MSDSD is organised in sections (marts)
 Each mart is a set of entities that may start a filter
Define Restrictions
 Select the attribute
 Choose the operator
 Type in the value or
select one from a
sample list
 Add the new restriction
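One way to picture a restriction is as an (attribute, operator, value) triple that becomes a parameterised SQL condition. The sketch below is an assumption about how such a triple could be modelled, not the MSD-mine implementation; the `Restriction` class, the operator list and the attribute names are all hypothetical.

```python
from dataclasses import dataclass

# Operators offered in this sketch; the real list is not given on the slides.
ALLOWED_OPERATORS = {"=", "<>", "<", "<=", ">", ">=", "LIKE"}


@dataclass(frozen=True)
class Restriction:
    attribute: str   # column name, e.g. "resolution" (hypothetical)
    operator: str
    value: object

    def to_sql(self):
        """Return a (fragment, parameters) pair for a WHERE clause."""
        if self.operator not in ALLOWED_OPERATORS:
            raise ValueError(f"unsupported operator: {self.operator}")
        if not self.attribute.isidentifier():
            raise ValueError(f"invalid attribute name: {self.attribute}")
        # The value is passed as a bound parameter, never pasted into the SQL.
        return f"{self.attribute} {self.operator} ?", [self.value]


def where_clause(restrictions):
    """AND together a list of restrictions."""
    fragments, params = [], []
    for restriction in restrictions:
        sql, values = restriction.to_sql()
        fragments.append(sql)
        params.extend(values)
    return " AND ".join(fragments), params


clause, params = where_clause(
    [Restriction("resolution", "<", 1.2), Restriction("title", "LIKE", "%HEMO%")]
)
print(clause)   # resolution < ? AND title LIKE ?
print(params)   # [1.2, '%HEMO%']
```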
Combine entities
 Using one of its relations
 Relations are organised
per mart
 Understand cardinality
 User may choose the new
entity as the working node
and follow its relations
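Cardinality matters because following a one-to-many relation multiplies rows. The sketch below shows this with two made-up tables in an in-memory SQLite database; the table names, columns and the entry-to-assembly relation are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one entry can have several assemblies (one-to-many).
conn.executescript(
    """
    CREATE TABLE entry (entry_id TEXT PRIMARY KEY);
    CREATE TABLE assembly (
        assembly_id INTEGER PRIMARY KEY,
        entry_id TEXT REFERENCES entry(entry_id)
    );
    INSERT INTO entry VALUES ('e001'), ('e002');
    INSERT INTO assembly VALUES (1, 'e001'), (2, 'e001'), (3, 'e002');
    """
)

entry_rows = conn.execute("SELECT COUNT(*) FROM entry").fetchone()[0]
joined_rows = conn.execute(
    "SELECT COUNT(*) FROM entry JOIN assembly USING (entry_id)"
).fetchone()[0]

print(entry_rows, joined_rows)  # 2 3
```

After the join, entry `e001` appears twice, once per assembly, so any count or chart built on the combined result is per assembly rather than per entry.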
MSD preferences
 User may set preferences to
specify MSDSD shortcuts
for filters
All assemblies – Representative assembly – Asymmetric unit
All models – Representative model
One chain per sequence
All entries – SCOP or DALI entries – Custom set
Execute query
 View-Navigate results
 Load all records
 Set result based
constraints
 View details
 Navigate relation
links
 Export in
Text-XML-Script
Data analysis
 Complete or Sample
 Range or Value
 Fully customisable
 Context sensitive
chart
 Data drill operations
Analysis over a base attribute
 Choose base
attribute
 Choose grouping
operation for
analysis attribute
 Options and
data-drill operations
supported
First Example
 Find the entries with resolution < 1.2
 Select the “Structure” mart
 Choose the “Entry” table
 Set the restriction on resolution
 Browse the results
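In SQL terms, this first example is a single-table query with one condition. The sketch below runs it against an in-memory SQLite table; the table layout, column names and the three rows are made up for illustration and do not reflect the real MSDSD schema or data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the Entry table.
conn.execute(
    "CREATE TABLE entry (entry_id TEXT PRIMARY KEY, title TEXT, resolution REAL)"
)
conn.executemany(
    "INSERT INTO entry VALUES (?, ?, ?)",
    [
        ("e001", "HEMOGLOBIN EXAMPLE", 1.1),   # made-up rows
        ("e002", "LYSOZYME EXAMPLE", 1.9),
        ("e003", "MYOGLOBIN EXAMPLE", 1.0),
    ],
)

high_res = conn.execute(
    "SELECT entry_id, resolution FROM entry WHERE resolution < ? ORDER BY entry_id",
    (1.2,),
).fetchall()

print(high_res)  # [('e001', 1.1), ('e003', 1.0)]
```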
Filter Expressions
 Find the entries that have resolution < 1.2 and are related to HEMOGLOBIN
 Add the main restriction on the resolution
 Add a sub-expression whose logical operator is “Or”
 In the sub-expression, require that the title contains the word “HEMO” or “HAEMO” or “GLOBIN”
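The filter expression corresponds to a main condition combined by AND with a parenthesised OR group. A minimal sketch of the equivalent SQL, assuming a hypothetical `entry` table with made-up rows in an in-memory SQLite database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and made-up rows, for illustration only.
conn.execute("CREATE TABLE entry (entry_id TEXT, title TEXT, resolution REAL)")
conn.executemany(
    "INSERT INTO entry VALUES (?, ?, ?)",
    [
        ("e001", "DEOXY HAEMOGLOBIN", 1.1),
        ("e002", "HEMOGLOBIN MUTANT", 2.0),
        ("e003", "LYSOZYME", 1.0),
        ("e004", "MYOGLOBIN", 1.15),
    ],
)

words = ["HEMO", "HAEMO", "GLOBIN"]
# Sub-expression: one "contains" test per word, joined with OR.
sub_expression = " OR ".join("title LIKE ?" for _ in words)
sql = (
    "SELECT entry_id FROM entry "
    f"WHERE resolution < ? AND ({sub_expression}) "
    "ORDER BY entry_id"
)
params = [1.2] + [f"%{word}%" for word in words]

matches = [row[0] for row in conn.execute(sql, params)]
print(matches)  # ['e001', 'e004']
```

The parentheses are what make the OR group a sub-expression; without them AND would bind more tightly and the resolution limit would apply only to the first title test. Note that “GLOBIN” also matches titles such as MYOGLOBIN.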
A simple distribution chart
 Find the distribution of assembly types
 Use the “Assembly” table
from the “Structure” mart
 Execute the query
 Go to the analysis page for
the “Assembly type”
attribute
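A distribution over a categorical attribute amounts to counting rows per value. The sketch below does this with GROUP BY and prints a crude text chart; the `assembly` table, its column names and the five rows are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the Assembly table, with made-up rows.
conn.execute("CREATE TABLE assembly (assembly_id INTEGER, assembly_type TEXT)")
conn.executemany(
    "INSERT INTO assembly VALUES (?, ?)",
    [
        (1, "MONOMERIC"),
        (2, "DIMERIC"),
        (3, "DIMERIC"),
        (4, "TETRAMERIC"),
        (5, "DIMERIC"),
    ],
)

distribution = conn.execute(
    "SELECT assembly_type, COUNT(*) FROM assembly "
    "GROUP BY assembly_type ORDER BY COUNT(*) DESC, assembly_type"
).fetchall()

# One bar per assembly type, one '#' per row.
for assembly_type, count in distribution:
    print(f"{assembly_type:<12}{'#' * count} {count}")
```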
Relation and external links
 Find entries related to “cell death” and follow their GO (gene
ontology) mappings and the links to the external GO service
 Use the “Entry” table where
the title contains the word
“death”
 Follow the GO mappings for
a particular entry
 Follow the links to the GO
database
A more complex example
 Find the active site contacts of helices that are part of beta-alpha-beta motifs
 Examine their linearity
 Select “Motif” as the starting
point and combine with “Helix”
and “Residue Contacts”
 Add a restriction
 View results and statistics for
the helix linearity
 Focus (drill) on an area of
interest
Saving results and exporting
 Find the binding sites of “kinked” residues
 Build the query by combining
“Residue”, “Helix” and “Site”
tables
 Save the results on a local file
 Export the results
in XML
TAB delimited
as a script
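The TAB-delimited and XML exports can be sketched with the Python standard library. The function names, the XML element names and the sample rows below are assumptions; the actual export formats produced by MSD-mine are not described on the slides.

```python
import csv
import io
import xml.etree.ElementTree as ET


def to_tab_delimited(columns, rows):
    """Serialise a result set as TAB-delimited text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buffer.getvalue()


def to_xml(columns, rows, root_tag="results", row_tag="row"):
    """Serialise a result set as XML, one child element per column."""
    root = ET.Element(root_tag)
    for row in rows:
        element = ET.SubElement(root, row_tag)
        for name, value in zip(columns, row):
            ET.SubElement(element, name).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Made-up result rows for a Residue / Helix / Site query.
columns = ["residue_id", "helix_id", "site_id"]
rows = [(17, 3, "AC1"), (42, 5, "AC2")]

tab_text = to_tab_delimited(columns, rows)
xml_text = to_xml(columns, rows)
print(tab_text)
print(xml_text)
```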
Preferences and representative sets
 Find the distribution of
number of crystals in
experiments
 Use the “XRay-data” table
 View the distribution of number
of crystals
 For the whole PDB
 For the DALI representative
set
 For our own custom
representative set
Custom filters and results
 For helices of similar size, find the percentage of residues that take part in helix interactions
 Use the “Helix interaction” table
 Add a custom “normalised interaction factor” result item
 Add a custom restriction “one helix is at most double the size of the other”
 View the distribution of the
“interaction factor”
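The custom restriction and result item can be sketched as two small functions. The slides do not define the “normalised interaction factor”, so the formula below (interacting residues as a percentage of all residues in the two helices) is an assumption, as are the field names and the sample values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HelixInteraction:
    helix1_residues: int        # size of the first helix (hypothetical field)
    helix2_residues: int        # size of the second helix (hypothetical field)
    interacting_residues: int   # residues involved in the interaction


def similar_size(pair):
    """Custom restriction: one helix is at most double the size of the other."""
    smaller = min(pair.helix1_residues, pair.helix2_residues)
    larger = max(pair.helix1_residues, pair.helix2_residues)
    return larger <= 2 * smaller


def interaction_factor(pair):
    """Assumed definition: interacting residues as a percentage of all residues."""
    total = pair.helix1_residues + pair.helix2_residues
    return 100.0 * pair.interacting_residues / total


# Made-up rows for illustration.
pairs = [
    HelixInteraction(10, 12, 11),
    HelixInteraction(8, 30, 6),    # rejected: 30 is more than double 8
    HelixInteraction(20, 40, 15),  # kept: 40 is exactly double 20
]

factors = [interaction_factor(p) for p in pairs if similar_size(p)]
print(factors)  # [50.0, 25.0]
```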