Using OGSA-DAI in a commercial environment
Terry Sloan
EPCC
Telephone: +44 131 650 5155
Email: [email protected]
Overview
FirstDIG
INWA
Outstanding issues raised by these projects
First Data Investigation on the Grid:
FirstDIG
http://www.epcc.ed.ac.uk/firstdig/
Motivation
Few UK e-Science projects involve service
companies such as First plc
First plc
– Operates worldwide in a variety of transport sectors
– Over 10000 vehicles in the UK, 23% of the market
– UK’s largest operator
The challenge for First
– Meeting the needs of the travelling public whilst making money
– Data integration and mining may assist but huge range of
fragmented data sources
Data Sources in the Bus
Industry
Many different kinds of data involved with running
a bus company
– Mileage, revenue, customer contact, schedule, fuel consumption,
vehicle maintenance, routes…
Many means to collect data
– Manually entered data at depot
– Data collected on buses from ticket machines
– Data collected on buses from GPS systems
– GPS system notes when bus passes through a predefined
“footprint” and records the time at which this happens
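The footprint idea can be sketched in a few lines of Python. This is a minimal illustration only: the slides do not say how a footprint is defined, so the circular footprint, the coordinates, the radius and the names `distance_m` and `footprint_entries` are all assumptions made for the example.

```python
import math
from datetime import datetime

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points (haversine formula)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def footprint_entries(fixes, centre, radius_m):
    """Return the timestamps at which the bus enters the footprint.

    fixes: iterable of (timestamp, lat, lon) tuples in time order.
    centre: (lat, lon) of the assumed circular footprint.
    """
    entries = []
    inside = False
    for ts, lat, lon in fixes:
        now_inside = distance_m(lat, lon, centre[0], centre[1]) <= radius_m
        if now_inside and not inside:
            entries.append(ts)  # record only the moment of entry
        inside = now_inside
    return entries


# Hypothetical footprint centre and GPS fixes, invented for this sketch.
STOP = (53.3811, -1.4701)
fixes = [
    (datetime(2004, 3, 1, 8, 0, 0), 53.3900, -1.4701),  # well outside
    (datetime(2004, 3, 1, 8, 1, 0), 53.3813, -1.4701),  # enters footprint
    (datetime(2004, 3, 1, 8, 2, 0), 53.3810, -1.4702),  # still inside
    (datetime(2004, 3, 1, 8, 3, 0), 53.3700, -1.4701),  # has left
]
entries = footprint_entries(fixes, STOP, radius_m=50)
```

Recording only the transition from outside to inside gives one timestamp per pass, which is the kind of record a lateness comparison against the schedule would need.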
Answering Business Questions
Want to combine data from more than one
source:
– Complaints versus Lateness
– Revenue versus Lost Miles
– Complaints versus Lost Miles
Want data aggregated in some way:
– By Service
– By Day
Want to consider subsets of the data
– e.g. weekdays only
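A question such as "Complaints versus Lost Miles, by service, weekdays only" might look like the following once both data sets are reachable through SQL. This is a sketch under stated assumptions: the table names, column names and sample rows are invented, and an in-memory SQLite database stands in for the real sources.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE complaints (service TEXT, day TEXT);
CREATE TABLE mileage (service TEXT, day TEXT, lost_miles REAL);
INSERT INTO complaints VALUES
  ('52', '2004-03-01'), ('52', '2004-03-01'),
  ('52', '2004-03-06'), ('120', '2004-03-02');
INSERT INTO mileage VALUES
  ('52', '2004-03-01', 12.5), ('52', '2004-03-06', 30.0),
  ('120', '2004-03-02', 4.0), ('120', '2004-03-03', 1.5);
""")

# SQLite's strftime('%w', ...) yields '0' (Sunday) to '6' (Saturday).
WEEKDAY = "strftime('%w', day) BETWEEN '1' AND '5'"

# Aggregate each source separately, then join the per-service totals.
# Joining the raw rows first would multiply lost miles by complaint count.
query = f"""
SELECT m.service, m.lost, COALESCE(c.n, 0)
FROM (SELECT service, SUM(lost_miles) AS lost
      FROM mileage WHERE {WEEKDAY} GROUP BY service) AS m
LEFT JOIN (SELECT service, COUNT(*) AS n
           FROM complaints WHERE {WEEKDAY} GROUP BY service) AS c
  ON c.service = m.service
ORDER BY m.service
"""
rows = conn.execute(query).fetchall()
for service, lost, complaints in rows:
    print(service, lost, complaints)
```

Swapping `GROUP BY service` for `GROUP BY day` gives the "By Day" view, and changing the `WEEKDAY` filter selects a different subset.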
Disparate Databases
Data is typically stored in disparate databases
– Various reasons for this: Incremental construction of systems.
– Not a problem for day-to-day running and querying but…
Introduces challenges for Data Analysis
– Systems introduced at different times
– Different database engines
– Different front-ends
– Different operating systems
– Different physical locations
– Different ways of representing data
These issues are NOT unique to buses
OGSA-DAI
OGSA-DAI
– Open Grid Services Architecture : Data Access and Integration
– Potentially provides a solution
– Need business users to make transition from science to commerce
Grid middleware:
– Assists with the access and integration of data from separate data
sources via the Grid
– Represents databases as Grid Services
– Enables access from other machines in a secure manner
FirstDIG Achievements
Deployment at First South Yorkshire
Combined two databases to answer real
business questions
– The Customer Contact System
• Microsoft Access
• Information on customer complaints e.g. time, service, nature
– The Mileage database
• dBASE IV
• Information on bus mileage e.g. lost miles
Produced generic Grid Data Service Browser
– SQL access including joins across the databases
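To illustrate what a join across two separately stored databases involves, here is a small Python sketch. It does not use OGSA-DAI or the browser: two SQLite files stand in for the Microsoft Access and dBASE IV sources, SQLite's `ATTACH DATABASE` stands in for the middleware, and every table, column and file name is hypothetical.

```python
import os
import sqlite3
import tempfile


def build(path, statements):
    """Create a stand-in database file and run the given SQL statements."""
    conn = sqlite3.connect(path)
    try:
        for statement in statements:
            conn.execute(statement)
        conn.commit()
    finally:
        conn.close()


with tempfile.TemporaryDirectory() as tmp:
    contact_path = os.path.join(tmp, "customer_contact.db")
    mileage_path = os.path.join(tmp, "mileage.db")

    build(contact_path, [
        "CREATE TABLE complaint (service TEXT, day TEXT, nature TEXT)",
        "INSERT INTO complaint VALUES ('52', '2004-03-01', 'late')",
        "INSERT INTO complaint VALUES ('120', '2004-03-02', 'no show')",
    ])
    build(mileage_path, [
        "CREATE TABLE mileage (service TEXT, day TEXT, lost_miles REAL)",
        "INSERT INTO mileage VALUES ('52', '2004-03-01', 12.5)",
        "INSERT INTO mileage VALUES ('120', '2004-03-03', 1.5)",
    ])

    # Open one source, attach the other, and join on service and day.
    conn = sqlite3.connect(contact_path)
    conn.execute("ATTACH DATABASE ? AS mileage_db", (mileage_path,))
    joined = conn.execute("""
        SELECT c.service, c.day, c.nature, m.lost_miles
        FROM complaint AS c
        JOIN mileage_db.mileage AS m
          ON m.service = c.service AND m.day = c.day
        ORDER BY c.service, c.day
    """).fetchall()
    conn.close()

print(joined)
```

The real sources differ in engine and location, which is the gap the Grid Data Service Browser fills; the sketch only shows the shape of the resulting query.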
First Grid Data Service
Browser
Informing Business & Regional Policy:
Grid-enabled fusion of global data &
‘local’ knowledge
INWA
http://www.epcc.ed.ac.uk/~inwa/
INWA
An e-Social Science demonstrator
– Demonstrates how grid technologies can improve business
– Combining private and public data sources
– Finance and Telecommunications
Uses many grid technologies
– TOG from Sun DCG provides access to remote HPC resource
– OGSA-DAI provides access control and discovery of distributed
heterogeneous data resources
– FirstDIG grid data service browser provides SQL access to
OGSA-DAI enabled resources
– Globus Toolkit 2 and 3
INWA Grid Infrastructure
[Diagram: INWA grid infrastructure. Labels: User@Edinburgh, User@Curtin,
FirstDIG, Grid Engine, TOG, Globus Grid, EPCC, Curtin, Bank, Telco,
UK Property data service, Australian Property data service, Bank data,
Telco data]
References
EPCC
– http://www.epcc.ed.ac.uk/
FirstDIG
– http://www.epcc.ed.ac.uk/firstdig/
OGSA-DAI
– http://www.ogsadai.org.uk
INWA
– http://www.epcc.ed.ac.uk/~inwa
Sun Data & Compute Grids
– http://www.epcc.ed.ac.uk/sungrid/
Transfer-queue Over Globus (TOG)
– http://gridengine.sunsource.net/project/gridengine/tog.html
Outstanding issues raised by FirstDIG & INWA
Outstanding Issues:
Usability
OGSA-DAI is middleware; the client toolkit helps
Incorporating the demo First browser is somewhat helpful
But really want …
Interfaces to real data analysis & DBMS packages, e.g. SPSS
Otherwise users could end up building applications that
replicate these, e.g. the First Grid Data Service Browser
Want to be able to point Access, Excel, etc. at a grid data
source and examine it
Outstanding issues:
Data
CSV (comma-separated values) data sources
– are common, but current JDBC-ODBC drivers do not have
sufficient functionality (NOT an OGSA-DAI issue per se)
No support for BIT type fields
– Nor for others, e.g. BOOLEAN, BINARY, etc.
Certain characters (e.g. &, >) are not handled by
the OGSA-DAI XML parser
– Company names often have & in them
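The general reason such characters cause trouble in XML, and the usual remedy of escaping them before they go into a document, can be shown with Python's standard library. This sketch says nothing about OGSA-DAI internals; the company name is made up and the `<company>` element is a hypothetical stand-in for a result field.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

name = "Smith & Sons > Transport"  # hypothetical company name

# Placing the raw value inside an element gives ill-formed XML,
# because a bare '&' must start an entity reference.
raw = f"<company>{name}</company>"
try:
    ET.fromstring(raw)
    raw_ok = True
except ET.ParseError:
    raw_ok = False

# escape() rewrites '&', '<' and '>' as '&amp;', '&lt;' and '&gt;'.
safe = f"<company>{escape(name)}</company>"
parsed = ET.fromstring(safe).text  # the parser restores the original text

print(raw_ok, safe, parsed)
```

Escaping at the point where database values are serialised keeps the stored data unchanged while keeping the XML well formed.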
Dates from certain sources not handled properly
– First Grid Data Service has to handle this internally
Outstanding issues:
Miscellaneous
Security
– Rolemap file is not encrypted
– If one GDS accesses another GDS the user security credentials
are not passed on so it does not work
Installation & Testing
– Install & Set-up
• Well-explained but still a fair amount of user effort involved
– Lack of an example OGSA-DAI site to point at to test that your
OGSA-DAI installation works
Outstanding Issues:
Miscellaneous
Large result sets
– Can increase JVM size but this is not scalable
– This occurred on most datasets
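One common way to avoid holding a whole result set in memory is to pull rows from the database cursor in fixed-size batches and process them as they arrive. The sketch below shows the pattern with Python's `sqlite3` module; it is not how OGSA-DAI delivers results, and the table, batch size and helper name `iter_rows` are assumptions for illustration.

```python
import sqlite3


def iter_rows(cursor, batch_size=1000):
    """Yield rows one at a time while fetching them in bounded batches."""
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            return
        yield from batch


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mileage (service TEXT, lost_miles REAL)")
conn.executemany(
    "INSERT INTO mileage VALUES (?, ?)",
    ((str(i % 10), 0.5) for i in range(10_000)),
)

# Running totals need only one batch of rows in memory at any time.
total = 0.0
count = 0
cursor = conn.execute("SELECT service, lost_miles FROM mileage")
for service, lost in iter_rows(cursor, batch_size=500):
    total += lost
    count += 1

print(count, total)
```

Memory use is then governed by the batch size rather than by the size of the result set, so it does not grow with the data the way a larger JVM setting has to.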
Integration
– DQP is a start ….(Linux, OQL)
Why use OGSA-DAI ?
– Easysoft etc
– http://www.easysoft.com/products/2001/main.phtml
Why use OGSA-DAI ?
‘a RDBMS engine that appears
to client apps as a fully
conformant ODBC 3.5 data
source….can be used to
provide real-time,
heterogeneous access to
multiple target data sources.’