Transcript Slide 1

Service Oriented Bioscience Cluster
at OSC
Umit V. Catalyurek
Associate Professor
Dept. of Biomedical Informatics
Dept. of Electrical & Computer Engineering
The Ohio State University
Department of
Biomedical Informatics
Origins of caBIG
• Goal: Enable investigators and research teams
nationwide to combine and leverage their findings and
expertise in order to meet the NCI 2015 Goal:
“Relieve suffering and death due to cancer by the year 2015”
• Strategy: Create a scalable, actively managed
organization that will connect members of the NCI-supported
cancer enterprise by building a biomedical informatics network
Driving needs:
cancer Biomedical Informatics Grid
• A multitude of “legacy” information systems, most of which cannot
be readily shared between institutions
• An absence of tools to connect different databases
• An absence of common data formats
• A huge and growing volume of data must be collected, analyzed,
and made accessible
• Few common vocabularies, making it difficult, if not impossible, to
interlink diverse research and clinical results
• Difficulty in identifying and accessing available resources
• An absence of information infrastructure to share data within an
institution, or among different institutions
What is caBIG?
• Common, widely distributed infrastructure that permits
the cancer research community to focus on innovation
• Shared, harmonized set of terminology, data elements,
and data models that facilitate information exchange
• Collection of interoperable applications developed to
common standards
• Cancer research data available for mining and integration
What is caGrid?
• A grid based software infrastructure consisting of
services, toolkits, APIs, and applications
• A production grid deployment of the core services
provided by that infrastructure
• A community of developers leveraging that grid and
infrastructure to provide applications and services to the
cancer research community
What is caGrid?
• Development project of Architecture Workspace
• The Grid infrastructure for caBIG (the “G” in caBIG)
• Driven from use cases and needs of cancer research community
• Service Oriented Architecture
• Based on federation
• Model Driven
• Object-Oriented, Semantically-Annotated Data Virtualization
What is caGrid? cont…
• Builds on existing Grid technologies
• Provides additional enterprise Grid components
    • Grid Service Graphical Development Toolkit
    • Metadata Infrastructure
    • Advertisement and Discovery
    • Semantic Services
    • Data Service Infrastructure
    • Analytical Service Infrastructure
    • Identifiers
    • Workflow
    • Security Infrastructure
    • Client tooling
caGrid Community Involvement
• caGrid itself provides no real “data” or “analysis” to caBIG™; it is the
enabling infrastructure which allows the community to do so
• Community members add value to the grid as applications,
services, and processes (for example: shared workflows)
• caGrid provides the necessary core services, APIs, and tooling
• The real “value” of the grid comes from bringing this information
to the “end user”
• Community members develop end user applications which
consume the resources provided by the grid
caGrid @ OSC
• Goals:
• Create an expandable caGrid Installation at OSC
• Deploy Pilot Applications to demonstrate
Service Oriented Access to HPC resources
• Dorian, GTS and Index services are deployed
• cagrid-dorian01.osc.edu
• cagrid-gts01.osc.edu
• cagrid-index01.osc.edu
• SyncGTS is deployed along with Dorian and Index for performance
• caGrid 1.2 was released this week, and we deployed it!
Pilot Application: TMA
• Image Mining for Performing Comparative Analysis of Expression Patterns in
Tissue Microarrays
    • Project funded by NIH R01 (PI: David Foran, Co-PI: Joel Saltz)
• Development of innovative analysis methods for analysis of tissue microarrays
    • Computation of features, annotations of image data based on features
• Development of software support
    • to manage and share tissue microarray data and analysis results
    • to process large volumes of tissue microarray data on high performance systems
• Development of ability to share data and analytical resources using caGrid
• Supports the Help Defeat Cancer project, with 100,000 imaged histology specimens
originating from breast, head & neck, and colorectal cancers.
TMA Analytical Service
Implementation
• TMA Application is a pipelined workflow
• Several processing steps that need to be applied in sequence to the images
• Build a prototype workflow orchestration system
• Wraps a program execution
• Stages the data in
• Invokes the executable
• Retrieves the output files
• Uses caGrid’s bulk data transfer to move files from host to host
• Interacts with a scheduler to allocate resources for the execution
• Executable can be a parallel/distributed application
• TMA user interface
• Specify the workflow
• List with executables and parameters
• Invoke the service for the first stage
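The orchestration steps listed on this slide can be illustrated with a minimal local sketch. It is not the TMA service or any caGrid API: the names Stage, run_stage and run_pipeline, and the "{in}" / "{out}" placeholders, are hypothetical and introduced only for illustration. A plain file copy stands in for caGrid's bulk data transfer, and a direct subprocess call stands in for a scheduler submission.

```python
import shutil
import subprocess
import sys
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Stage:
    """One workflow step: an executable and its parameters (hypothetical structure)."""
    name: str
    command: list  # argv; "{in}" and "{out}" are replaced with file paths


def run_stage(stage, input_file, work_root):
    """Stage the data in, invoke the executable, and retrieve the output file."""
    workdir = Path(tempfile.mkdtemp(prefix=stage.name + "_", dir=work_root))
    in_dir, out_dir = workdir / "in", workdir / "out"
    in_dir.mkdir()
    out_dir.mkdir()

    # Stage in: a local copy stands in for a host-to-host bulk transfer.
    staged = in_dir / Path(input_file).name
    shutil.copy(input_file, staged)

    # Invoke: a real deployment would hand this to a scheduler instead.
    output = out_dir / (stage.name + ".out")
    argv = [a.replace("{in}", str(staged)).replace("{out}", str(output))
            for a in stage.command]
    subprocess.run(argv, check=True)

    # Retrieve: fail loudly if the executable produced nothing.
    if not output.exists():
        raise FileNotFoundError(f"stage {stage.name!r} produced no output")
    return output


def run_pipeline(stages, first_input, work_root):
    """Apply the stages in sequence; each output feeds the next stage."""
    current = Path(first_input)
    for stage in stages:
        current = run_stage(stage, current, work_root)
    return current


def demo():
    """Two toy stages: upper-case a text file, then count its characters."""
    root = tempfile.mkdtemp()
    source = Path(root) / "image.txt"
    source.write_text("tissue core\n")
    upper = ("import sys; "
             "open(sys.argv[2], 'w').write(open(sys.argv[1]).read().upper())")
    count = ("import sys; "
             "open(sys.argv[2], 'w').write(str(len(open(sys.argv[1]).read())))")
    stages = [
        Stage("upper", [sys.executable, "-c", upper, "{in}", "{out}"]),
        Stage("count", [sys.executable, "-c", count, "{in}", "{out}"]),
    ]
    return run_pipeline(stages, source, root)
```

Calling demo() runs both toy stages and returns the path of the final output file. The list of Stage objects plays the role of the "list with executables and parameters" that the user interface specifies.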
What is next?
• Next Pilot Application: Prof. Dan Janies’ Supramap
• http://supramap.osu.edu
• Builds a phylogenetic tree and projects onto the map of the
planet
• Computationally expensive
• Next Pilot Application(s): Your Application!?
• More Info: http://bmi.osu.edu and http://www.cagrid.org
• Contact: Umit V. Catalyurek email: [email protected]