Can Grids Deliver the Vision for Future Hypothesis Driven Life Science Research?
Professor Richard Sinnott
Technical Director National e-Science Centre
University of Glasgow
9th May 2006
Grids and e-Research
• Classical characteristics
– HPC, data deluge, …
• More recent push
– Security, virtual organisations, usability, …
[Figure: the Grid connecting computers, software, sensor nets, instruments, colleagues and shared data archives]
E-Health Future Drivers
• The big questions
– Why do people who eat less tend to live longer?
– Is there a genetic reason why Scotland has such a high incidence
rate of cardiovascular disease? How significant are social, cultural,
occupational factors in this?
– …
• Tailored e-Health
– Wouldn't it be wonderful to know what measures you could take to
stave off/prevent the onset of disease?
– Wouldn't it be a relief to know that you are not allergic to the drugs
your doctor just prescribed?
– Wouldn't it be a comfort to know that the treatment regimen you are
undergoing has a good chance of success because it was
designed just for you?
The Big Picture…
[Figure: GRID and SECURITY shown alongside data at every scale: nucleotide structures, gene expressions, protein structures, protein-protein interaction (pathways), tissues, physiology, organisms, populations and epidemiology, plus social, lifestyle, occupational, environmental, … factors]
NeSC in the UK
[Figure: map of UK e-Science centres and resources: Glasgow, Edinburgh (NeSC), Belfast, Newcastle, White Rose Grid, Manchester, Daresbury Lab, Oxford, Cardiff, RAL, Cambridge, Hinxton, London, Southampton, Core National Grid Service, CSAR, HPC(x)]
Challenges/Opportunities
• There are still issues to be resolved
– OGSA definition and delivery
• Standards: OGSI, WSRF, …
• Technologies: GT2, GT3, GT4, EGEE, OMII, …
– What about the science drivers?
• What data sets, what services, accessed by whom, …
• The next Grid software
– Longevity of systems…?
– If I build a Grid infrastructure for you, do you promise not to change your requirements (completely!)?
Grid Projects & Experiences
BRIDGES Project
[Figure: BRIDGES architecture. The CFG Virtual Organisation (Glasgow, Edinburgh, Oxford, London, Leicester, Netherlands, each holding private data) reaches, via VO Authorisation, an Information Integrator / data hub built on OGSA-DAI, together with the Synteny Service and the MagnaVista Service. The hub draws on publicly curated data: Ensembl, OMIM, SWISS-PROT, MGI, HUGO, RGD, …]
www.nesc.ac.uk
BRIDGES Security
• Used PERMIS (www.permis.org) to provide fine grained
security (authorisation)
– XML based policies digitally signed (tamperproof) and used to
make authorisation decisions when users invoke services
• (XACML based policies coming…)
– Use SAML callouts to transparently link Grid service and policies
• Data Policies
– Only members of CFG can access all public and local warehoused
data
– Other guest users can only access remote genome databases
• Security at DB level!
• Computational Policies
– CFG members can run BLAST across NGS, Glasgow clusters and
Condor pools
– Guest users only get access to the Condor pool
• Users do not need their own X.509 certificates – all hidden behind
portal!
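The data and computational policies above can be sketched as a simple role-to-resource lookup. This is only an illustration of the policy logic on the slide: PERMIS itself evaluates digitally signed XML policies, and every role name, resource name and function below is a hypothetical label, not part of PERMIS or BRIDGES. It also assumes that CFG members may reach the remote genome databases that guests can reach, which the slide does not state.

```python
# Hypothetical sketch of the BRIDGES policy rules as plain Python sets.
# Role and resource names are invented labels; PERMIS uses signed XML policies.

DATA_POLICY = {
    # Assumption: CFG members can also reach the remote databases open to guests.
    "cfg_member": {"public_warehouse", "local_warehouse", "remote_genome_db"},
    "guest": {"remote_genome_db"},
}

COMPUTE_POLICY = {
    "cfg_member": {"ngs", "glasgow_cluster", "condor_pool"},
    "guest": {"condor_pool"},
}


def is_permitted(role, resource, policy):
    """Return True only if the policy explicitly grants the role this resource."""
    return resource in policy.get(role, set())


def blast_targets(role):
    """List the compute resources on which this role may run BLAST."""
    return sorted(COMPUTE_POLICY.get(role, set()))


if __name__ == "__main__":
    print(blast_targets("cfg_member"))
    print(blast_targets("guest"))
    print(is_permitted("guest", "local_warehouse", DATA_POLICY))
```

Unknown roles fall through to an empty set, so the sketch denies by default; a portal would make the equivalent decision on the user's behalf, which is why users never handle X.509 certificates themselves.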
BRIDGES data
• Originally planned to have many different types of data
with different security requirements
– Public data: data from public sources
– Processed public data: public data that has additional annotation or indexing to
support the analyses needed by CFG
– Sensitive data: data about individuals in the cohorts of patients and the data
derived from animal experiments
– Special experimental data: such as quantitative trait loci (QTL) or microarray data
– Personal research data: data specific to a researcher as a result of experiments or
analyses that that researcher is performing
– Team research data: data shared by the team members at a site
– Consortium research data: data produced by one site or a combination of sites that
is now available for the whole consortium
– Personalisation data: metadata collected and used by the bioinformatics tools
pertinent to individual users
• …but scientists reluctant to share their data!
JDSS Project
• Public data resources openness
– Often cannot be queried directly, and schemas are hard or impossible
to find (and they change… often!)
– Joint Data Standards Study investigated this
• Funded by MRC, BBSRC, Wellcome Trust, JISC, NERC, DTI
and involved
– Digital Archiving Consultancy
– NeSC (Edinburgh and Glasgow)
– Bioinformatics Research Centre (Glasgow)
• Looked at technical, political, social, ethical etc issues involved in
accessing and using public life science resources
• Final report completed September 2005 and available at:
– www.mrc.ac.uk/prn/pdf-jdss_final_report.pdf
» (to also appear as a NeSC technical report)
Grid Enabled Microarray Expression
Profile Search (GEMEPS)
• 1 year project (just) started 1st March 2006
– Funded by BBSRC
• Involves Glasgow, Cornell University (US) and Riken Institute (Japan)
– Aim to provide tools for discovery, comparison and analysis of
microarray data sets
• How does my data compare to others?
• How do these experiments compare?
• Can we improve the way we establish how genes in different species are
linked?
• …
– Microarrays expensive and contain potentially important (valuable) data
sets
– Fine grained security essential (and willingness of researchers to collaborate)!
Grid Enabled Microarray Expression
Profile Search (GEMEPS)
• Why bother…?
– Major journals require experimental data to be published
– Minimal Information About a Microarray Experiment (MIAME) standard
• Does not provide sufficient information for a scientist to repeat the experiment, to
compare results, …
– Scientists often unwilling to spend time to provide additional meta-data
• …experiences from BRIDGES
– Scientists also now questioning sensitivity of microarray data results
• Gene names and expression values vs ordering of gene expression values
• Initial prototypes support both of these but issues of gene naming
– entrez, unigene, go, …
– Work on searching/mining of public repositories on-going
• including GEO, arrayExpress, …
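The difference between comparing expression values and comparing the ordering of expression values can be illustrated with two correlation measures. This is a minimal sketch, not the GEMEPS prototype: it assumes both profiles already use the same gene identifiers (the naming issue noted above is not addressed), and the gene names, values and function names are invented for illustration. Pearson correlation works on the values themselves; Spearman correlation works only on their rank order.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson(ranks(x), ranks(y))


def compare_profiles(profile_a, profile_b):
    """Compare two {gene: expression} dicts over the genes they share."""
    genes = sorted(set(profile_a) & set(profile_b))
    a = [profile_a[g] for g in genes]
    b = [profile_b[g] for g in genes]
    return pearson(a, b), spearman(a, b)


# Invented data: same gene ordering, very different value scales.
profile_a = {"geneA": 1.0, "geneB": 2.0, "geneC": 3.0, "geneD": 4.0}
profile_b = {"geneA": 10.0, "geneB": 100.0, "geneC": 1000.0, "geneD": 10000.0}

if __name__ == "__main__":
    by_value, by_order = compare_profiles(profile_a, profile_b)
    print(f"value-based: {by_value:.3f}, order-based: {by_order:.3f}")
```

In this invented example the two profiles rank the genes identically, so the order-based score is at its maximum while the value-based score is lower, which is one reason a search tool might offer both.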
Grid Enabled Occupational Data
Environment (GEODE)
• GEODE
– Funded by ESRC, led by University of Stirling with NeSC Glasgow
• Two year project aiming to develop Grid enabled portal for occupational
data
– includes integration of various existing classification schemes
– Many occupational classification schemes exist
• Used by different researchers/sociologists
– Linkage to national and international census data sets
– When is a plumber not a plumber?
– When they are a water transport technician…?
• How many plumbers had a heart attack in Scotland in the last 2 years?
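The plumber question above is, at heart, a mapping problem: the same occupation carries different titles in different classification schemes, so counts are only meaningful after harmonisation. The sketch below shows the idea with a lookup table. Every scheme name, job title, code and record in it is invented for illustration; it does not reproduce any real occupational classification or any GEODE service.

```python
# Every scheme name, job title and code here is invented for illustration.
SCHEME_TO_HARMONISED = {
    ("scheme_a", "plumber"): "PIPE_TRADES",
    ("scheme_b", "water transport technician"): "PIPE_TRADES",
    ("scheme_a", "teacher"): "EDUCATION",
}


def harmonise(scheme, title):
    """Return the shared occupation code, or None if the title is unmapped."""
    return SCHEME_TO_HARMONISED.get((scheme, title.strip().lower()))


def count_events(records, occupation, event):
    """Count records with the given event whose title maps to the occupation."""
    return sum(
        1
        for record in records
        if record["event"] == event
        and harmonise(record["scheme"], record["title"]) == occupation
    )


# Invented records, each coded under a different (hypothetical) scheme.
records = [
    {"scheme": "scheme_a", "title": "Plumber", "event": "heart_attack"},
    {"scheme": "scheme_b", "title": "Water Transport Technician", "event": "heart_attack"},
    {"scheme": "scheme_a", "title": "Teacher", "event": "heart_attack"},
    {"scheme": "scheme_b", "title": "Plumber", "event": "heart_attack"},  # unmapped in scheme_b
]

if __name__ == "__main__":
    print(count_events(records, "PIPE_TRADES", "heart_attack"))
```

Unmapped titles are left out of the count rather than guessed, so gaps in the crosswalk show up as undercounts that can be investigated.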
VOTES
• Virtual Organisations for Trials and Epidemiological Studies
– 3 year MRC (£2.8M) funded project expected to start imminently
– Plans to develop Grid infrastructure to address key components of
clinical trial/observational study
• Recruitment of potentially eligible participants
• Data collection during the study
• Study administration and coordination
– Involves Glasgow, Oxford, Leicester, Nottingham, Manchester
[Figure: Clinical Virtual Organisation Framework, used to realise CVO-1 (e.g. for data collection) and CVO-2 (e.g. for recruitment) over a Transfer Grid with nodes GLA, OX, IMP and LeiNott, drawing on disease registries, hospital databases, GPs and clinical trial data sets]
– Prototypes available now building on SCIStore, GPASS, consent DB, existing
trials repositories
Distributed Data Framework
[Figure: a user authenticates at the Portal; the Grid Server (Globus Container hosting an OGSA-DAI Service) applies access security policies, and the Data Server applies an authorisation access matrix and security policies, using a Driving DB. Glasgow data resources, each governed by local trust policies: GPASS, SCI Store 1 (SQL Server), SCI Store 2 (SQL Server), Consent DB (Oracle 10g), RCB Test Trials DB (SQL Server). Remote trust policies govern other Transfer Grid nodes.]
VOTES Data Federation Portal
Beta Prototype
Generation Scotland Scottish
Family Health Study
• Five (2+3) year proposal (£4.6M) started January 2006
– Funded by Health Department and Department for Enterprise and
Lifelong Learning
• Involves Glasgow, Dundee, Edinburgh, Aberdeen
– focus on genetics as applied to healthcare
– first two years emphasis on providing a platform for research into the genetic
basis of common complex diseases in Scotland
» Mental health, cardiovascular, …
» Plan to establish a family-based, intensively phenotyped cohort of 15,000
recruited from the East and West of Scotland
– basis for neutralising heritable (genetic) risk factors in disease surveillance,
treatment optimisation, avoidance of adverse drug events and prediction of
response to therapy, health care planning and drug discovery, …
– Recruitment process has started already!
Security Related Projects
• GLASS
– JISC funded started January 2006
• Exploring early adoption of Shibboleth
– Working with Computer Services directly
• Scenarios based upon teaching and access to NHS resources/data
• Builds upon university wide unified account management system being
rolled out (based on Novell nSure technology)
• ESP-Grid
– JISC/Oxford University funded
• Developing demonstrator to show how Grid resources can be accessed
and used via Shibboleth technology
– Initial prototypes already available
• Grid Security Report
– JISC/JCSR funded
• Focus on Grid security practices, middleware and outlook
– Contact me if you want a copy!
DyVOSE Project
[Figure: DyVOSE education scenario across Glasgow and Edinburgh. The Glasgow SoA uses the Edinburgh DIS to create new ACs for Glasgow users/roles; Glasgow and Edinburgh Education VO policies are held in LDAP and drive PERMIS based authorisation checks/decisions. A Grid BLAST Service implemented by students handles job scheduling/data management on a Condor pool. A Grid-data Client supplies data input to a Grid BLAST Data Service in front of a nucleotide + protein sequence DB; protein/nucleotide sequence data is returned based on student team and Edinburgh policy.]
Future
• The Grid is not a magic wand
– Your data quality issues won’t go away
– We can however identify what these are
• SCIStore schema incompatibilities
• Ethics and legal aspects essential
– Working closely with NHS
• Consent crucial
– Scenarios now implemented looking at patient consent via GPASS
The Future…
[Figure: GRID + Security shown alongside nucleotide structures, gene expressions, protein structures, protein-protein interaction (pathways), tissues, physiology, organisms and populations]