Intelligent Information and Access Infrastructures

Intelligent Access to Digital Heritage Conference
19 Oct. 2007, Tallinn, Estonia
Intelligent Information and Knowledge Infrastructures
Daniel Olmedilla
L3S Research Center & Hannover University
Outline
• L3S Background
• Introduction & Motivation
• Personalized Search & Ranking
• Privacy & Access Control
• EU Projects Summary
L3S Background
Mission and Focus
L3S research focuses on innovative and cutting-edge methods and
technologies for three key enablers of the European
Information Society:
• Knowledge
• Information
• Learning
L3S projects focus on
• digital resources and their technological underpinnings:
  • Digital libraries and Search
  • Semantic Web and Knowledge Sharing
  • Distributed Systems, Networks and Grids
• the use of these resources in eLearning and eScience contexts
L3S Background
Area “Semantic Web & Digital Libraries”
• provide personalized access to distributed information resources and
  advanced search and recommendation functionalities
• provide enhanced search on the desktop, in companies, and on the Web
• enhance traditional libraries with digital content and personalized
  library services
Introduction & Motivation
Conference Theme
Intelligent Access to Digital Heritage
Introduction & Motivation
UNESCO E-Heritage (I)
Digital heritage consists of resources of human knowledge or
expression, whether cultural, educational, scientific and
administrative, or embracing technical, legal, medical and
other kinds of information.
Digital materials include texts, databases, still and
moving images, audio, graphics, software, and web
pages, among a wide and growing range of formats.
[ http://portal.unesco.org/ci/en/ev.php-URL_ID=1539&URL_DO=DO_TOPIC&URL_SECTION=201.html,
http://portal.unesco.org/ci/en/files/13367/10700115911Charter_en.pdf/Charter_en.pdf ]
Introduction & Motivation
UNESCO E-Heritage (II)
Born-digital heritage available on-line, including
electronic journals, World Wide Web pages or
on-line databases, is now part of the world’s cultural
heritage
Using computers and related tools, humans are creating
and sharing digital resources - information,
creative expression, ideas, and knowledge encoded for
computer processing - that they value and want to share
with others over time as well as across space
Introduction & Motivation
UNESCO E-Heritage (& III)
The purpose of preserving the digital heritage is to ensure
that it remains accessible to the public. (…) At the
same time, sensitive and personal information
should be protected from any form of intrusion.
Introduction & Motivation
Focus of this talk
Intelligent Access to Digital Heritage:
• Personalized search and ranking of media
• Access to sensitive information and resources
Introduction & Motivation
Information growth
In today's society, individuals and
organisations are, on one hand,
confronted with an ever growing
load of information and content
and, on the other, with increasing
demands for knowledge and skills.
To cope with this, we need to link
content, knowledge and learning,
making content and knowledge
more accessible, interactive and
usable over time by humans and
machines alike.
Introduction & Motivation
Not only textual resources
Introduction & Motivation
The 1 TB life (Gordon Bell)
1TB gives you 65+ years of:
100 email messages a day (5KB each)
100 web pages a day (50KB each)
5 scanned pages a day (100KB each)
1 book every 10 days (1 MB each)
10 photos per day (400 KB JPEG each)
8 hours per day of sound, e.g. telephone, voice annotations, and meeting recordings (8 Kb/s)
1 new music CD every 10 days (45 min each at 128 Kb/s)
It will take you 10 years to fill up your 160 GB drive
Want video? Buy more cheap drives (1 TB/year lets you
record 4 hours/day of 1.5 Mb/s video)
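The figures on this slide can be checked with a few lines of arithmetic. The sketch below is only a rough sanity check under stated assumptions: decimal units (1 KB = 1000 bytes, 1 TB = 10^9 KB), "Kb/s" and "Mb/s" read as kilobits and megabits per second, and 365 days per year. The names `KB_PER_DAY` and `years_to_fill` are illustrative.

```python
# Daily volume of each item on the slide, in KB (decimal units assumed).
KB_PER_DAY = {
    "email": 100 * 5,                     # 100 messages at 5 KB
    "web pages": 100 * 50,                # 100 pages at 50 KB
    "scanned pages": 5 * 100,             # 5 pages at 100 KB
    "books": 1000 / 10,                   # 1 MB book every 10 days
    "photos": 10 * 400,                   # 10 JPEGs at 400 KB
    "sound": 8 * 3600 * 8 / 8,            # 8 h at 8 kbit/s -> KB
    "music": 45 * 60 * 128 / 8 / 10,      # 45 min CD at 128 kbit/s, every 10 days
}

total_kb_per_day = sum(KB_PER_DAY.values())


def years_to_fill(capacity_gb, kb_per_day=total_kb_per_day):
    """Years needed to fill a drive of the given size at a constant daily rate."""
    return capacity_gb * 1e6 / kb_per_day / 365


# Video: 4 hours per day at 1.5 Mbit/s, expressed in GB per year.
video_gb_per_year = 4 * 3600 * 1500 / 8 * 365 / 1e6

print(f"{total_kb_per_day / 1000:.1f} MB/day")
print(f"1 TB lasts about {years_to_fill(1000):.0f} years")
print(f"160 GB lasts about {years_to_fill(160):.0f} years")
print(f"video needs about {video_gb_per_year:.0f} GB/year")
```

Under these assumptions the daily volume is about 43 MB, a 160 GB drive lasts about 10 years, and video needs close to 1 TB per year, all in line with the slide. The 1 TB lifetime comes out at roughly 63 years, in the same range as the slide's 65+; the exact value depends on whether decimal or binary units are used.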
Introduction & Motivation
Main Objectives
1. Search for textual and audiovisual content
2. Rank results according to relevance
3. Personalize search and ranking
   • Not all users are the same
   • Help users find what they are interested in
4. Protect private information and resources throughout
Personalized Search & Ranking
Representing context by SW metadata
Metadata for resources can be
created by appropriate metadata
generators
Ontologies specify context
metadata for, e.g., emails, files, web pages and publications
Metadata have to be application-independent!
• Store Metadata as RDF
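To illustrate what "store metadata as RDF" can mean for a desktop resource, here is a minimal sketch of a metadata generator that describes an email as subject-predicate-object triples and serializes them as N-Triples. The namespace `http://example.org/desktop#`, the property names (`sender`, `subject`, `attachment`) and the function names are assumptions made for this example; they are not the ontologies used at L3S. Only the `rdf:type` IRI is a standard one.

```python
EX = "http://example.org/desktop#"  # hypothetical namespace, not from the talk
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def iri(local_name):
    """Wrap a local name into a full IRI in the example namespace."""
    return f"<{EX}{local_name}>"


def literal(value):
    """Encode a plain string literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def describe_email(email_id, sender, subject, attachment_id):
    """Generate application-independent metadata triples for one email."""
    s = iri(email_id)
    return [
        (s, RDF_TYPE, iri("Email")),
        (s, iri("sender"), literal(sender)),
        (s, iri("subject"), literal(subject)),
        (s, iri("attachment"), iri(attachment_id)),
    ]


def to_ntriples(triples):
    """Serialize triples, one statement per line, in N-Triples syntax."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


print(to_ntriples(describe_email("email1", "Jack", "My paper", "doc7")))
```

Because every statement is a plain triple, any application that understands RDF can read the metadata without knowing which program produced it.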
Personalized Search & Ranking
Personalization in the SW
• gather online information, integrate heterogeneous sources,
  syndicate according to user’s preferences
• embed resources with a personalized context
• enable users to choose which kind of personalized guidance, in
  what combination, they appreciate as support (plug & learn)
Realization:
• semi-automated extraction of information from heterogeneous
  sources
• re-usable personalization algorithms reason about distributed
  data sources (user data, course descriptions, ontologies, etc.)
• personalization rules reason about resources, e.g. to make
  recommendations
[Baumgartner, Henze, Herzog. The Personal Publication Reader: Illustrating Web
Data Extraction, Personalization and Reasoning for the Semantic Web. ESWC’05 ]
Personalized Search & Ranking
User Knowledge and Interests
Competence: “an effective performance within a domain / context
at different levels of proficiency”
Can be explicitly defined by the user or inferred automatically
[Figure: a Competence, composed of a Competency, a Context and a Proficiency Level]
Personalized Search & Ranking
Expanding User Queries with Local Context
1. Take the user-related documents (desktop documents) containing the query
2. Score and extract keywords
3. Keep the top query-dependent, user-biased keywords
4. Extract query expansion or re-ranking terms
[ Chirita, Firan, Nejdl. Summarizing local context to personalize global web
search. CIKM 2006 ]
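The pipeline on this slide can be sketched in a few lines. The version below is a deliberately simple illustration that scores keywords by raw term frequency; the cited paper uses more elaborate scoring, so this should not be read as its method. The function names, the stopword list and the sample documents are assumptions made for the example.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (assumption).
STOPWORDS = {"the", "a", "of", "and", "in", "to", "for", "is", "on", "with"}


def tokenize(text):
    """Lowercase a text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def expand_query(query, desktop_docs, k=2):
    """Append the k most frequent terms of local documents matching the query."""
    query_terms = set(tokenize(query))
    counts = Counter()
    for doc in desktop_docs:
        tokens = tokenize(doc)
        if not query_terms.issubset(tokens):
            continue  # only documents containing the query contribute
        counts.update(
            t for t in tokens if t not in query_terms and t not in STOPWORDS
        )
    # Highest count first; ties broken alphabetically for a stable result.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return query.split() + [term for term, _ in ranked[:k]]


docs = [
    "Jaguar engine repair manual",
    "jaguar car engine tuning notes",
    "Recipe for apple pie",
]
print(expand_query("jaguar", docs))
```

Here the user's own documents bias the ambiguous query "jaguar" towards cars, which is the effect the slide describes: the expansion terms depend both on the query and on the user's local context.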
Personalized Search & Ranking
Data heterogeneity
Characteristics
• A lot of text (unstructured information)
• A lot of structure, e.g. title, author, creation-date, …
• Heterogeneity in structure
• Different holders (applications) use different schemas
• By nature, the structure of a domain is too complex to be given a
  clear and definite definition
Classical Data Integration
• Transform data into a clear and uniform structure before using it
• Requires intensive human intervention: very laborious and not scalable
Malleable Schema (X. Dong & A. Halevy ’05)
• Allow overlapping and vague elements to be defined in a single schema
Personalized Search & Ranking
Malleable Schemas: Example Data
[Figure: example data graph. Person and Doc entities are described with overlapping attributes, e.g. first name / sur name vs. name, author vs. writer vs. sender, title vs. subject, and body vs. contents.]
Personalized Search & Ranking
Querying Malleable Schemas
[Figure: one Person is described by first name and sur name, another Person by name]
For example, a user issues the query:
Q1: Select Person Where first_name Contains “Philip”
To obtain the complete results, we should relax the query to:
Q2: Select Person Where first_name Contains “Philip”
Or name Contains “Philip”
A query has to be relaxed to related schema elements
But how do we discover the correlations between schema elements?
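The step from Q1 to Q2 can be sketched as follows. This is a minimal illustration, assuming the correlations between schema elements are already known and given as a simple mapping; the record layout, the `CORRELATIONS` table, the function names and the sample people are all hypothetical.

```python
# Assumed, already-discovered correlations between schema elements.
CORRELATIONS = {"first_name": {"name"}}


def relax(attribute, correlations=CORRELATIONS):
    """Return the attribute together with its correlated schema elements."""
    return [attribute] + sorted(correlations.get(attribute, ()))


def select(records, entity, attributes, needle):
    """Select <entity> Where any of <attributes> Contains <needle>."""
    return [
        r for r in records
        if r["type"] == entity
        and any(needle in str(r.get(a, "")) for a in attributes)
    ]


records = [
    {"type": "Person", "first_name": "Philip", "sur_name": "Pan"},
    {"type": "Person", "name": "Philip Smith"},
    {"type": "Person", "name": "Jack"},
]

q1 = select(records, "Person", ["first_name"], "Philip")         # strict query
q2 = select(records, "Person", relax("first_name"), "Philip")    # relaxed query
print(len(q1), len(q2))
```

The strict query misses the person stored under `name`, while the relaxed query finds both, which is why the quality of the correlation table matters.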
Personalized Search & Ranking
Discover Schema Correlations (I)
Idea: find duplicates that are described with different attributes.
Observations:
1. more duplicates lead to better schema correlation discovery
2. more accurate schema correlations lead to better duplicate detection
Solution: let schema correlation discovery and duplicate detection
reinforce each other to achieve improved results
Personalized Search & Ranking
Discover Schema Correlations (& II)
[Figure: six example entities E1 to E6, described with the attributes title, subject, author, writer, pub-date and rec-date]
duplicates: {E1, E2}, {E3, E4}, {E5, E6}
attribute matches: {title, subject}, {author, writer}, {pub-date, rec-date}
[ Xuan Zhou, Julien Gaugaz, Wolf-Tilo Balke, Wolfgang Nejdl. Query Relaxation
Using Malleable Schema. SIGMOD’07 ]
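The mutual reinforcement between duplicate detection and schema correlation discovery can be illustrated with a toy loop. This is a simplified sketch and not the algorithm of the cited paper: a shared value counts 1.0 when it sits under the same or a correlated attribute and 0.5 otherwise, two entities are duplicates when their score reaches a threshold, and every duplicate pair contributes new attribute correlations. The scoring weights, the threshold and the attribute values assigned to E1 to E6 are assumptions made for the example.

```python
from itertools import combinations


def shared(e1, e2):
    """Attribute pairs (a, b) whose values agree between two entities."""
    return [(a, b) for a, va in e1.items() for b, vb in e2.items() if va == vb]


def discover(entities, threshold=1.5):
    """Iterate duplicate detection and correlation discovery to a fixpoint."""
    correlations, duplicates = set(), set()
    changed = True
    while changed:
        changed = False
        for n1, n2 in combinations(sorted(entities), 2):
            pairs = shared(entities[n1], entities[n2])
            # Known correlations make a shared value count fully.
            score = sum(
                1.0 if a == b or frozenset((a, b)) in correlations else 0.5
                for a, b in pairs
            )
            if score >= threshold and (n1, n2) not in duplicates:
                duplicates.add((n1, n2))
                changed = True
            if (n1, n2) in duplicates:
                # Duplicates reveal which differently named attributes match.
                for a, b in pairs:
                    if a != b and frozenset((a, b)) not in correlations:
                        correlations.add(frozenset((a, b)))
                        changed = True
    return duplicates, correlations


# Hypothetical values, loosely inspired by the labels in the figure.
entities = {
    "E1": {"title": "XML", "author": "Daniel", "pub-date": "Nov 2001"},
    "E2": {"subject": "XML", "writer": "Daniel", "rec-date": "Nov 2001"},
    "E3": {"title": "DB", "author": "Ullman", "pub-date": "Jul 1994"},
    "E4": {"subject": "DB", "writer": "Ullman", "rec-date": "Jan 1999"},
    "E5": {"title": "AI", "author": "Stuart", "pub-date": "Dec 2003"},
    "E6": {"subject": "Logic", "writer": "Stuart", "rec-date": "Dec 2003"},
}
duplicates, correlations = discover(entities)
print(sorted(duplicates))
```

With this data, E1 and E2 share three values and are found first; the correlations learned from them then lift E3/E4 and E5/E6, which share only two values each, over the threshold. Without the learned correlations those two pairs would score 1.0 and stay undetected.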
Privacy & Access Control
Access Control in Open Systems (I)
Privacy & Access Control
Access Control in Open Systems (& II)
Assumption: I already know you
 you have a local account!
Not a member?
Privacy & Access Control
Policy Examples
• Give customers younger than 26 a 20% discount
• Up to 15% of network bandwidth can be reserved by
  paying with an accepted credit card
• Customers can rent a car if they are 18 or older and
  present a driving license and a valid credit card
[ Bonatti, Olmedilla. Driving and Monitoring Provisional Trust Negotiation with
Metapolicies. IEEE Policies for Distributed Systems and Networks, 2005 ]
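The three example policies can be written down as executable rules. The sketch below uses plain Python predicates rather than a policy language, so it only illustrates what the rules decide, not how the cited work expresses them. The `Customer` class and the credential names are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Customer:
    age: int
    credentials: set = field(default_factory=set)  # credentials shown so far


def discount(customer):
    """Customers younger than 26 get a 20% discount."""
    return 0.20 if customer.age < 26 else 0.0


def can_reserve_bandwidth(customer, requested_share):
    """Up to 15% of bandwidth, paid with an accepted credit card."""
    return (
        requested_share <= 0.15
        and "accepted_credit_card" in customer.credentials
    )


def can_rent_car(customer):
    """18 or older, with a driving license and a valid credit card."""
    required = {"driving_license", "valid_credit_card"}
    return customer.age >= 18 and required <= customer.credentials


alice = Customer(age=24, credentials={"driving_license", "valid_credit_card"})
print(discount(alice), can_rent_car(alice), can_reserve_bandwidth(alice, 0.10))
```

Note that the decision depends on properties and credentials of the requester, not on a local account, which is the point of policy-based access control in open systems.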
Privacy & Access Control
Use Credentials
Privacy & Access Control
Negotiations
[Figure: negotiation between two parties, labelled Alice and Bob, over access to a Service]
Step 1: Alice requests a service from Amazon
Step 2: Amazon discloses its policy for the service
Step 3: Alice discloses her policy for VISA
Step 4: Amazon discloses its BBB credential
Step 5: Alice discloses her VISA card credential
Step 6: Amazon grants access to the service
[Winsborough, Seamons, Jones. Automated trust negotiation. DARPA Information
Survivability Conference and Exposition, 2000 ]
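The exchange above can be simulated with a small loop. The sketch assumes an eager strategy: each party discloses a credential as soon as the other side has shown everything that credential's release policy requires. The explicit policy-disclosure messages of steps 2 and 3 are left out for brevity. The dictionary layout, the names `negotiate`, `visa_card`, `bbb_membership` and `purchase` are assumptions made for the example.

```python
def negotiate(client, server, service):
    """Run an eager negotiation.

    `client` and `server` map each item they hold to the set of counterpart
    credentials required before it is released. `service` is a key of `server`.
    """
    disclosed = {"client": set(), "server": set()}
    transcript = []
    progress = True
    while progress:
        progress = False
        for who, other, policies in (
            ("server", "client", server),
            ("client", "server", client),
        ):
            for item, required in policies.items():
                if item == service or item in disclosed[who]:
                    continue
                if required <= disclosed[other]:  # release policy satisfied
                    disclosed[who].add(item)
                    transcript.append(f"{who} discloses {item}")
                    progress = True
        if server[service] <= disclosed["client"]:
            transcript.append(f"server grants {service}")
            return True, transcript
    return False, transcript


# Alice releases her VISA card only to parties that show a BBB credential.
alice = {"visa_card": {"bbb_membership"}}
# The shop shows its BBB credential freely and sells against a VISA card.
shop = {"bbb_membership": set(), "purchase": {"visa_card"}}

granted, transcript = negotiate(alice, shop, "purchase")
print(granted, transcript)
```

Trust is built step by step: neither side has to reveal a sensitive credential before the other side has satisfied the policy protecting it, and the negotiation fails cleanly when no further disclosure is possible.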
Privacy & Access Control
User awareness and Control
• Explain policies and system decisions
• Make rules & reasoning intelligible to the
  common user
• Use natural language?
• “Academic users can download the files in folder
  historical_data whenever their creation date
  precedes 1942”
• Suitably restricted to avoid ambiguities
• Fortunately, users spontaneously formulate rules
Privacy & Access Control
Cooperativeness & Verbalization
Suppose Alice's request is rejected
She may want to ask questions like:
• Why didn't you accept my credit card?
Other possible queries
• How-to queries
• What-if queries
• Would I get the special discount on financial
  products X if I were locally employed?
[ Bonatti, Olmedilla, Peer. Advanced policy explanations on the web. ECAI 2006 ]
Privacy & Access Control
Sample Screenshot (I)
Privacy & Access Control
Sample Screenshot (& II)
EU Projects Summary
EU IP Nepomuk: Social Semantic Desktop
- Desktop: Help individuals in managing information on their PC
- Semantic: Make content available to automated processing
- Social: Enable exchange across individual boundaries
[Figure: a person's desktop resources (Email, Topic, Document, WebSite, Event, Image) linked to other persons (friend, acquaintance, colleague)]
Personal Semantic Web: a semantically enlarged
intimate supplement to memory
Social protocols and distributed search among NEPOMUK-enabled peers
EU Projects Summary
EU IP PHAROS
PHAROS will move audiovisual search forward from a
point-solution search engine paradigm to an
integrated search platform paradigm.
PHAROS will integrate future user and search
requirements in living laboratories for innovation.
PHAROS partners come from 9 European countries
and will integrate its development with their
nationally funded projects. SMEs, academia and large
industrial players will ensure maximum impact on
the business scenario.
PHAROS will use an open approach in integrating
external experiences and contributions, and will exchange
results through the PHAROS Federation.
PHAROS will use a specifically designed
management structure, integrating the different
PHAROS “streams”.
EU Projects Summary
EU NoE REWERSE
REasoning on the WEb with Rules and SEmantics
Web reasoning languages & processing
• Define a set of reasoning languages
  • Coherent
  • Inter-operable
  • Functionality and application independent
• For Advanced Web systems and applications
Advanced Applications as testbeds for languages
• Context-adaptive Web systems
• Web-based decision support systems
EU Projects Summary
EU IP TENCompetence
EU Projects Summary
L3S Project Leaders (http://www.L3S.de)
NEPOMUK - http://nepomuk.semanticdesktop.org/
Dr. Claudia Niederee
PHAROS - http://www.pharos-audiovisual-search.eu/
Dr. Bhaskar Mehta
REWERSE - http://rewerse.net/
Prof. Dr. Nicola Henze
TENCompetence - http://www.tencompentece.org/
Dr. Daniel Olmedilla
Thanks !
Daniel Olmedilla
[email protected] - http://www.L3S.de/~olmedilla/