context mediation


MASSACHUSETTS INSTITUTE OF TECHNOLOGY
SLOAN SCHOOL OF MANAGEMENT
INFORMATION TECHNOLOGIES GROUP
SEMANTIC INTEGRATION
(COIN PROJECT)
For Dr. Bob Popp, DARPA
8 April 2003
Stuart Madnick
Michael Siegel
Richard Wang
1
COntext INterchange (COIN) Project
[Figure: data flows from Sources (Web pages, databases) through INPUT PROCESSING, CONTEXT MEDIATION, and OUTPUT PROCESSING to Receivers (applications, browsers). Other labels: TRUSTED AGENTS, ODBC Driver, Web Publishing.]
Capabilities listed on the slide:
* Automatic conflict detection and conversion
* Automatic web wrapping
- Derived data
- Source selection
- Source attribution
- Semistructured text
- Multi-source query plan and execution
APPLICATIONS: Financial services, electronic commerce, asset visibility, in-transit visibility.
2
Background on DARPA Support for Context Mediation Research
• Initial efforts funded as part of DARPA Intelligent Integration of Information (I3) Program
• Period: July 1993 - Sept 1998
• Started under: Gio Wiederhold
• then under: Dave Gunning & Bob Neches
Other related activity:
• MIT Total Data Quality Management (TDQM)
• Since 1991 (web.mit.edu/tdqm)
3
Multiple Perspectives . . .
old lady or young lady ?
4
Role Of Context
[Figure: the date strings 02-01-03, 01-02-03, and 03-02-01 and the currency symbols $, £, and ¥ appear under separate Context labels, with a "?" between them.]
CONTEXT VARIATIONS:
- GEOGRAPHIC ( US vs. UK )
- FUNCTIONAL (CASH MGMT vs. LOANS )
- ORGANIZATIONAL ( CITIBANK vs. CHASE )
Data:
Databases
Web data
E-mail
5
Example: Context Differences (from multiple web sources)
Daimler Benz (DAI) Financial Data

Source        P/E Ratio
ABC           11.6
Bloomberg     5.57
DBC           19.19
MarketGuide   7.46
6
Complementary Aggregation Example
• Q: How did CO2 emissions (total, per GDP, per capita) change over time (between 1990 and 2000) in Yugoslavia?
– User 1: YUG as a geographic region bounded before the breakup
– User 2: YUG as a legal autonomous state
Related effort:
- Laboratory for Information Globalization and Harmonization Technologies (LIGHT)
7
Many sources needed:
- GDP and population: World Bank's World Development Indicator DB; UN Statistics Division; Statistics Bureaus
- CO2 emission: Oak Ridge's CDIAC DB; WRI; GSSD; EPAs
- Exchange rates: Olsen (Web)
Meanings in sources & users might differ

GDP and population (GDP in billions local currency; Population in millions)
Country   1990 GDP   1990 Pop   2000 GDP   2000 Pop
YUG       698.3      23.7       1627.8     10.6
BIH                             13.6       3.9
HRV                             266.9      4.5
MKD                             608.7      2.0
SVN                             7162       2.0

CO2 Emission (in 1000 tons per year)
Country   1990     2000
YUG       35604    15480
BIH                1279
HRV                5405
MKD                3378
SVN                3981

Exchange rates, Olsen (Web)
From   To    1990   2000
USD    YUG   10.5   67.267
USD    BIH          2.086
USD    HRV          8.089
USD    MKD          64.757
USD    SVN          225.93

Country and currency codes
Country                  Code   Currency         CurCode
Yugoslavia               YUG    New Yug. Dinar   YUN
Bosnia and Herzegovina   BIH    Marka            BAM
Croatia                  HRV    Kuna             HRK
Macedonia                MKD    Denar            MKD
Slovenia                 SVN    Tolar            SIT

Results
              User 1            User 2
              1990     2000     1990     2000
CO2           35604    29523    35604    15480
GDP           66.5     104.8    66.5     24.2
CO2/capita    1.5      1.28     1.5      1.46
CO2/GDP       535      282      535      640
GDP/Capita    2800     4560     2800     1100

Total CO2 in 1000 tons per year; GDP in billions USD; CO2/Capita in tons per person; CO2/GDP in tons per million USD; GDP/Capita in USD per person
8
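The two readings of "Yugoslavia" above can be reproduced with a short calculation. The following is a minimal Python sketch, not the COIN implementation: the dictionary layout and the `indicators` function are hypothetical, the figures are the year-2000 values from the slide tables, and the exchange rates are assumed to be local currency units per USD (which is what makes the slide's USD GDP figures come out).

```python
# Year-2000 figures transcribed from the slide; the data layout is illustrative only.
SUCCESSORS = ["YUG", "BIH", "HRV", "MKD", "SVN"]

CO2 = {"YUG": 15480, "BIH": 1279, "HRV": 5405, "MKD": 3378, "SVN": 3981}  # 1000 tons/year
GDP_LOCAL = {"YUG": 1627.8, "BIH": 13.6, "HRV": 266.9, "MKD": 608.7, "SVN": 7162}  # billions, local currency
POP = {"YUG": 10.6, "BIH": 3.9, "HRV": 4.5, "MKD": 2.0, "SVN": 2.0}  # millions
# Assumption: rates are local currency units per 1 USD.
RATE = {"YUG": 67.267, "BIH": 2.086, "HRV": 8.089, "MKD": 64.757, "SVN": 225.93}


def indicators(countries):
    """Aggregate CO2, GDP (converted to USD), and population over the given country codes."""
    co2 = sum(CO2[c] for c in countries)                        # 1000 tons
    gdp_usd = sum(GDP_LOCAL[c] / RATE[c] for c in countries)    # billions USD
    pop = sum(POP[c] for c in countries)                        # millions
    return {
        "co2": co2,
        "gdp_usd": round(gdp_usd, 1),
        "co2_per_capita": round(co2 * 1000 / (pop * 1e6), 2),   # tons per person
        "co2_per_gdp": round(co2 * 1000 / (gdp_usd * 1000)),    # tons per million USD
    }


user1 = indicators(SUCCESSORS)  # YUG as the pre-breakup geographic region
user2 = indicators(["YUG"])     # YUG as the legal state in 2000
print(user1)
print(user2)
```

The same source rows answer both users; only the set of country codes that "YUG" expands to changes with the user's context.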
The 1999 Overture
Unit-of-measure mixup tied to loss of $125 million Mars Orbiter
“NASA’s Mars Climate Orbiter was lost because engineers did not make a simple conversion from English units to metric, an embarrassing lapse that sent the $125 million craft off course. . . .
. . . The navigators ( JPL ) assumed metric units of force per second, or newtons. In fact, the numbers were in pounds of force per second as supplied by Lockheed Martin ( the contractor ).”
Source: Kathy Sawyer, Boston Globe, October 1, 1999, page 1.
9
The Context Interchange Approach
[Figure: a Context Mediator sits between a Source and a Receiver (the Application). It draws on Shared Ontologies (Concept: Length), the Source Context and Receiver Context (Meters, Feet), and Conversion Libraries (f(): meters, feet) to perform Context Transformation on the part length value (17). Caption: Context Management.]
Application query:
Select partlength
From catalog
Where partno="12AY"
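The idea in the figure can be sketched in a few lines of Python. This is an illustration under stated assumptions, not COIN code: the names `CONVERSIONS`, `mediate_length`, and the two context dictionaries are hypothetical, and the choice of meters for the source and feet for the receiver is assumed because the slide does not say which side uses which unit.

```python
METERS_PER_FOOT = 0.3048  # exact, by the international definition of the foot

# Conversion library: (source unit, receiver unit) -> conversion function.
CONVERSIONS = {
    ("meters", "feet"): lambda v: v / METERS_PER_FOOT,
    ("feet", "meters"): lambda v: v * METERS_PER_FOOT,
}


def mediate_length(value, source_context, receiver_context):
    """Return value restated in the receiver's length unit."""
    src = source_context["length_unit"]
    dst = receiver_context["length_unit"]
    if src == dst:
        return value  # no conflict detected, nothing to convert
    return CONVERSIONS[(src, dst)](value)


# Assumed contexts: the catalog stores meters, the application expects feet.
catalog_context = {"length_unit": "meters"}
application_context = {"length_unit": "feet"}

part_length = mediate_length(17, catalog_context, application_context)
print(round(part_length, 2))
```

The application issues its query without mentioning units; the mediator compares the two declared contexts and applies a conversion only when they differ.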
10
COIN Elevation Axioms
(Ontology)
11
Another Context Example
[Figure: Users & Appl. Systems reach the sources through a Web Server and Context Mediation Services; each source marked * is accessed through Wrapper Services.]

Source         Company Name         Net Income     Sales
Datastream *   DAIMLER-BENZ         614,995        97,736,992
WorldScope *   DAIMLER-BENZ AG      346,577        56,268,168
Disclosure *   DAIMLER BENZ CORP    615,000,000    97,737,000,000

OANDA *: O&A DEM-USD Exchange Rate: 1.00 German Mark = 0.58 US Dollar as of 12/31/93
12
Some Context Differences
Context Definitions    Disclosure                        Worldscope                        DataStream
Currency Used          Country of Incorporation          USD                               Country of Incorporation
Currency Conversion    Money Amount As_Of_Date           Money Amount As_Of_Date           Money Amount As_Of_Date
Currency Symbols       3 Letters                         3 Letters                         2 Letters
Scale Factor           1                                 1000                              1000
Company Names          Disclosure Names                  Worldscope Names                  DataStream Names
Date Style             American with ‘/’ as separator    American with ‘/’ as separator    European with ‘-’ as separator

Olsen (OANDA) Web Source uses 3 Letter Currency Symbols and European Date Style with ‘/’ as a separator
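Once context definitions like these are written down as data, detecting conflicts between a source and a receiver reduces to comparing the two declarations. The following Python sketch is a hypothetical illustration of that comparison; the `CONTEXTS` layout, the key names, and `detect_conflicts` are assumptions for this example and do not reflect how COIN stores its context axioms.

```python
# Context definitions transcribed from the slide; key names are illustrative.
CONTEXTS = {
    "Disclosure": {
        "currency_used": "country of incorporation",
        "currency_symbols": "3 letters",
        "scale_factor": 1,
        "company_names": "Disclosure names",
        "date_style": "American, '/' separator",
    },
    "Worldscope": {
        "currency_used": "USD",
        "currency_symbols": "3 letters",
        "scale_factor": 1000,
        "company_names": "Worldscope names",
        "date_style": "American, '/' separator",
    },
    "DataStream": {
        "currency_used": "country of incorporation",
        "currency_symbols": "2 letters",
        "scale_factor": 1000,
        "company_names": "DataStream names",
        "date_style": "European, '-' separator",
    },
}


def detect_conflicts(source, receiver):
    """Return {aspect: (source value, receiver value)} for every aspect that differs."""
    src, dst = CONTEXTS[source], CONTEXTS[receiver]
    return {k: (src[k], dst[k]) for k in src if src[k] != dst[k]}


conflicts = detect_conflicts("Disclosure", "DataStream")
for aspect, (have, want) in conflicts.items():
    print(f"{aspect}: {have} -> {want}")
```

Each reported difference tells the mediator which conversion it would need to apply; aspects that match need no work.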
13
Domain Model
[Figure: domain model diagram with the types exchangeRate, scaleFactor, number, string, countryName, currencyType, date, companyFinancials, company, and companyName. Legend for the links: Inheritance, Attribute, Modifier.]
Some currency context possibilities:
• Currency is stated explicitly as part of record
• Currency not stated, but the same for all (e.g., US $)
• Currency not stated or constant, but inferred by country
14
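The three currency possibilities listed above can be read as a fallback chain. The Python sketch below is a hypothetical illustration: `resolve_currency`, the record and context field names, and the small `COUNTRY_CURRENCY` lookup are all assumptions made for this example.

```python
# Illustrative lookup only; a real system would use a maintained reference table.
COUNTRY_CURRENCY = {"Germany": "DEM", "United States": "USD"}


def resolve_currency(record, context):
    """Determine the currency of a record under a given source context."""
    # 1. Currency is stated explicitly as part of the record.
    if "currency" in record:
        return record["currency"]
    # 2. Currency is not stated, but is the same for every record in this source.
    if "default_currency" in context:
        return context["default_currency"]
    # 3. Currency is inferred from a country field named by the context.
    return COUNTRY_CURRENCY[record[context["country_field"]]]


explicit = resolve_currency({"amount": 10, "currency": "GBP"}, {})
constant = resolve_currency({"amount": 10}, {"default_currency": "USD"})
inferred = resolve_currency(
    {"amount": 10, "country_of_incorporation": "Germany"},
    {"country_field": "country_of_incorporation"},
)
print(explicit, constant, inferred)
```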
COIN System Architecture
[Figure: architecture diagram in three groups: SERVER PROCESSES, MEDIATOR PROCESSES, CLIENT PROCESSES.]
Components: Web Client; ODBC-compliant Apps (e.g. Microsoft Excel) with ODBC-Driver; WWW Gateway (cgi-scripts); HTTPD-Daemon; SQL Compiler; Context Mediator; Optimizer; Executioner; COIN Repository; Data Store for Intermediate Results; Wrapper; Web-site.
Flow labels: SQL Query, Datalog Query, Mediated Query, Optimized Query Plan, Results.
15
System Demonstration
Single Source Queries with Mediation
Q6. Scenario: Using Context Interchange, the financial analyst can look at the Disclosure data using Datastream Context.
Query: Find out from Disclosure what Net Income for DAIMLER-BENZ was. Use Datastream Context.
Capabilities Demonstrated:
Ability to perform Scale Factor Conversion, Date Format Conversion, Company Name Conversion.
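The three conversions named for Q6 can be sketched as follows. This is a minimal Python illustration, not the mediator's actual output: the function `disclosure_to_datastream`, the record layout, and the single-entry `NAME_MAP` are hypothetical; the date attached to the record and the exact date layouts (month/day/two-digit year for American, day-month-two-digit year for European) are assumptions.

```python
from datetime import datetime

# Company name mapping, using the names shown on the "Another Context Example" slide.
NAME_MAP = {"DAIMLER BENZ CORP": "DAIMLER-BENZ"}

DISCLOSURE_SCALE = 1     # Disclosure reports units
DATASTREAM_SCALE = 1000  # Datastream reports thousands


def disclosure_to_datastream(record):
    """Restate a Disclosure record in Datastream context."""
    as_of = datetime.strptime(record["as_of"], "%m/%d/%y")  # assumed American layout
    return {
        "company_name": NAME_MAP[record["company_name"]],                             # name conversion
        "net_income": record["net_income"] * DISCLOSURE_SCALE / DATASTREAM_SCALE,     # scale factor conversion
        "as_of": as_of.strftime("%d-%m-%y"),                                          # date format conversion
    }


# The as_of date is an assumption for illustration; the slide does not give one for this record.
record = {"company_name": "DAIMLER BENZ CORP", "net_income": 615_000_000, "as_of": "12/31/93"}
result = disclosure_to_datastream(record)
print(result)
```

No currency conversion appears because both contexts report in the currency of the country of incorporation. The converted net income of 615,000 (in thousands) is close to the 614,995 that Datastream itself reports.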
16
Demonstration – context2.mit.edu
Source
Context
17
Conflict Detection and Mediation
Mediated Query in Datalog
Date convert
Scale factor convert
Name convert
18
Mediated SQL Query & Result
Mediated SQL Query
Adjust scale factor
Date format conversion
Name conversion
Final results – from Disclosure but in Datastream context
19
The 1805 Overture
In 1805, the Austrian and Russian Emperors agreed to join forces against Napoleon. The Russians promised that their forces would be in the field in Bavaria by Oct. 20.
The Austrian staff planned its campaign based on that date in the Gregorian calendar. Russia, however, still used the ancient Julian calendar, which lagged 10 days behind.
The calendar difference allowed Napoleon to surround Austrian General Mack's army at Ulm and force its surrender on Oct. 21, well before the Russian forces could reach him, ultimately setting the stage for Austerlitz.
Source: David Chandler, The Campaigns of Napoleon, New York: Macmillan, 1966, p. 390.
20
Summary
• Tremendous opportunity to gather and integrate information from many diverse sources
• But … need to overcome many context challenges
• Context-type “metadata” plays a critical role
• COIN technology can be an important aid
21