Recommendation Systems - IKT Group



Institute of Informatics, Slovak Academy of Sciences
Michal Laclavík
Ladislav Hluchý
Partner Description
Institute of Informatics (II SAS) is one of more than 50 scientific and research institutes of the Slovak
Academy of Sciences, Bratislava, Slovakia.
The scope of activities of II SAS includes scientific and research work in informatics, information
technology, control theory, robotics and artificial intelligence.
Organized in 7 departments:
– Parallel and distributed computing
  • Intelligent and Knowledge oriented Technologies Group
– Design and diagnostics of digital systems
– Numerical methods and algorithms
– Speech analysis and synthesis
– Electron beam lithography
– Discrete processes modeling and control
– Sensor systems
EDA Meeting
EADS, Paris, 24th February 2009
Primary Research Team & Capabilities
URL: http://ikt.ui.sav.sk
Dept. of Parallel and Distributed Computing
Research and Development Areas:
– Large-scale HPCN and Grid applications
– Intelligent and Knowledge oriented Technologies
Director & leader of PDC:
Dr. Dipl. Ing. Ladislav Hluchý
Experience from IST:
– 3 projects in FP5: ANFAS, CrossGrid, Pellucid
– 6 projects in FP6: EGEE II, K-Wf Grid, DEGREE (coordinator), EGEE, int.eu.grid, MEDIGRID
– 4 projects in FP7: Secricom, Commius, ADMIRE, EGEE III
Several National Projects (SPVV, VEGA, APVT)
IKT Group Focus:
– Information Processing
– Semantic Web
– Knowledge oriented Technologies
– Parallel and Distributed Information Processing
– Multi-agent systems
Solutions:
– Ontea: Pattern-based Semantic Annotation
– ACoMA: KM tool in Email
– EMBET: Recommendation System
– AgentOWL: Agent library, model and methodology
IISAS Expertise
• Extensive experience from European and national projects
  – FP5, FP6 and FP7 EU projects
    • Multi-agent systems: Pellucid, Secricom
    • Data integration and data mining: ADMIRE
    • Large-scale data processing: CrossGrid, EGEE, DEGREE, …
    • Simulation (flood application): CrossGrid, K-Wf Grid, …
  – National project RAPORT
    • Support for military training
  – National project NAZOU
    • Heterogeneous distributed data processing and integration, including semantics
• Very good technology background
  – Data integration (related to sensor data fusion)
  – Large-scale data processing
  – Simulation and visualization
  – Multi-agent systems
  – Semantics, social networks
  – Recommendation systems
Relevant Projects
ADMIRE
(Advanced Data Mining and Integration Research for Europe)
• FP7 programme, ICT priority, 2008/03 – 2011/02, 6 institutions from the EU
• Accelerate access to, and increase the benefits from, data exploitation
• Deliver consistent and easy-to-use technology for extracting information and knowledge
• Cope with the complexity, distribution, change and heterogeneity of services, data and processes through an abstract view of data mining and integration
• Provide power to users and developers of data mining and integration processes
UISAV's role
• Environmental-data-specific methods for data integration
• Tools applying knowledge management to data mining in an SOA environment
• Pilot application, user interface
RAPORT project
• Experience with a military project (2004-2007)
• The goal of the project was to propose a knowledge management support system for the organization of military training:
  – Intelligent process management
  – Recommendation system
• Organization of military exercises in the Center of Simulation Technologies (CST) of the National Academy of Defense (NAD) in Liptovský Mikuláš (Slovakia)
• CST organizes several overlapping military exercises during a training year, which causes:
  – problems with a large amount of documents and e-mails
  – problems with knowledge management
  – problems with time management
• Knowledge system based on e-mail communication and a web portal
[Figure: exercise organization at CST: headquarters staff, exercise management, Dept. of technical support, Dept. of military applications and Dept. of staff training across exercise outline, planning, execution and evaluation; SW support with activities, execution, context, modification, ontology, technical support and workflow execution]
Pellucid IST Project
• Title: Platform for Organizationally Mobile Public Employees
• Duration: Sep 2002 – Dec 2004
• Agent architecture based on autonomous co-operating agents
• Knowledge management to support employees
• Workflow-based administration processes
• Support for employee mobility in the organization
[Figure: Pellucid architecture with an interaction layer and a process layer: users, Pellucid interface, Pellucid "core" and workflow tracking/management system, exchanging context & actions, active hints and user feedback]
Secricom: Seamless Communication for Crisis Management
• EU FP7 project (2008-2011), FP7-218123
• Service platform for mobile devices
• Support for intelligent crisis management
• Securing legacy data
• Using multi-agent systems and SOA
• Distributed Secure Agent Platform (DSAP)
  – The core agent platform
  – Provides means for agent deployment, execution, migration and communication
• Process Execution Agents (PEA)
  – Generate a plan of activities based on the plan collected from users
  – Execute the plan
• Resource Inquire System (RIS)
  – Provides information on which system to query for specific information
• Agent Repository (AR)
  – Database of system users, agents and their certificates
  – Process of accreditation of agents
• User Communication Agent (UCA)
  – Communicates with users as a guided dialogue through an electronic device
  – Includes authentication and an interface to user authorization
[Figure: Secricom trust architecture: trusted domains (A, Z and "our trusted domain") with trusted servers, trusted administrator, trusted local contact and DS; untrusted servers (Server 1 … Server N) hosting new agents; trusted authority with public keys, servers and services register, trusted users and base of agents; process flow: event, problem, event verification, initiator, activity initialization, generic plan, context, specific requirements, specific plan, action authorisation]
K-Wf Grid Project
• Semantic Service Oriented Architecture
• Workflows of Web Services
NAZOU
• Tools for acquisition, organization and maintenance of knowledge in an environment of heterogeneous information resources
• Slovak national project, duration: 2004-2007
• Web => Documents => Text => RDF/OWL => XML + XSL => HTML
[Figure: NAZOU tool chain: Internet and Web Crawler, RIDAR, ERID (relevance), Doc Converter, Document Language, Stemmer, Lemmatizer, NALIT, RFTS (indexing documents), OnTeA, offers, RDB2Onto, JOP (job offers); corporate memory with file management, relational database management and semantic and ontological management; searching, management and presentation of HTML documents]
• OnTeA – Ontology based Text Annotation using Regular Expression Patterns
• RDB2Onto – relational database data to ontology converter
• OSID – Offer Separation for Internet Document
• RIDAR – Relevant Internet Data Resource Identification
• SETH – a software effort to deeply integrate Python with the Web Ontology Language (OWL-DL dialect)
• WEBCRAWLER – recursively downloads web pages; only relevant documents are downloaded, with relevance estimated using ERID.
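The WEBCRAWLER entry describes a focused crawl: follow links recursively, but keep only documents judged relevant. The Python sketch below illustrates that idea under loud assumptions: an in-memory link graph replaces HTTP requests, and a toy keyword-overlap score stands in for ERID, whose actual method is not described here. `WEB`, `relevance` and `focused_crawl` are hypothetical names, and for simplicity the sketch scores a page after "fetching" it.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
WEB = {
    "http://example.org/": ("job portal home", ["http://example.org/jobs", "http://example.org/news"]),
    "http://example.org/jobs": ("job offer java developer", ["http://example.org/jobs/1"]),
    "http://example.org/jobs/1": ("job offer python developer bratislava", []),
    "http://example.org/news": ("sports news", ["http://example.org/news/1"]),
    "http://example.org/news/1": ("football results", []),
}


def relevance(text, keywords):
    """Toy relevance score (assumed stand-in for ERID): fraction of keywords in the text."""
    return len(set(text.split()) & keywords) / len(keywords)


def focused_crawl(start, keywords, threshold=0.5):
    """Breadth-first crawl that stores and expands only pages scoring >= threshold."""
    kept, seen, queue = [], {start}, deque([start])
    while queue:
        url = queue.popleft()
        text, links = WEB[url]
        if relevance(text, keywords) < threshold:
            continue  # irrelevant page: not stored, its links are not followed
        kept.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return kept


relevant = focused_crawl("http://example.org/", {"job", "offer"})
```

Pruning at irrelevant pages is what keeps a focused crawler small: the whole news branch is never expanded.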
Commius project
• Commius IST FP7 project FP7-213876
• Community-based Interoperability Utility for SMEs (2008-2011)
• Pattern-based Information Extraction and Semantic Annotation
• External and Legacy System Integration
• Large-Scale Information Processing
• Information and Knowledge Management
• Social Network Extraction
• Addressing ISU-like interoperability
• Commius partners envision a future in which SMEs enjoy zero-cost entry into interoperability based on non-proprietary protocols.
• Systems Interoperability
  – Interoperability over SMTP
• Semantic Interoperability
  – Understanding the communication and documents exchanged
• Process Interoperability
  – Understanding and supporting business processes from communication activities
Prototypes and Solutions
Agent Knowledge Model
• Used in:
– Business process management
– recommendation systems for user or agent modeling
• Based on Events, Resources, Actions, Actors, Context
• Formally described using sets, description logic (compatible with OWL-DL) and graph representation
• Actor Context updating function/algorithm
(Actor Environment State)
• Resources updating function/algorithm
(result of fulfilled actor goals)
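The model above is described only at the level of its elements and two update functions. The Python sketch below shows one hypothetical way to encode them, assuming an actor's context is simply a set of resources; the class and function names (`Actor`, `Event`, `Action`, `update_context`, `update_resources`) are illustrative and not taken from AgentOWL, and the formal set and description-logic definitions are not reproduced.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Resource:
    uri: str
    kind: str  # e.g. "Document", "Location"


@dataclass
class Actor:
    name: str
    # Assumed encoding of the actor environment state: resources relevant to the actor now.
    context: set = field(default_factory=set)


@dataclass
class Event:
    actors: list
    resources: list


@dataclass
class Action:
    actor: Actor
    goal: str
    output: Resource


def update_context(event: Event) -> None:
    """Context update (hypothetical): every actor involved in the event gains its resources."""
    for actor in event.actors:
        actor.context.update(event.resources)


def update_resources(resources: set, action: Action, fulfilled: bool) -> None:
    """Resource update (hypothetical): a fulfilled goal adds the action's output resource."""
    if fulfilled:
        resources.add(action.output)


# Usage: an incoming e-mail event puts a report into Alice's context,
# and fulfilling her goal produces a new shared resource.
alice = Actor("alice")
report = Resource("urn:example:report-1", "Document")
update_context(Event(actors=[alice], resources=[report]))

shared = set()
summary = Resource("urn:example:summary-1", "Document")
update_resources(shared, Action(alice, "summarise report", summary), fulfilled=True)
```

Keeping context as a plain set makes it easy to compare with other actors' contexts or with stored resources, which is what a recommendation system needs.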
AgentOWL Library
• Agents with OWL ontology models using the JADE agent system and Jena
• Support for an OWL-based Agent Knowledge Model
• Support for XML-RPC connections to receive events and send plain XML
• Support for agent communication using FIPA ACL with OWL and RDQL as content languages
• Support for presentation of ontological knowledge (RDF/OWL => plain XML + XSL => HTML)
• JADE and Jena integration
• Available to the MAS community on the official JADE website
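The presentation chain RDF/OWL => plain XML + XSL => HTML can be illustrated with the Python sketch below. It is not AgentOWL code (the library is built on JADE and Jena in Java): hypothetical triples stand in for an RDF/OWL model, and because the Python standard library has no XSLT processor, a small Python function plays the role of the XSL stylesheet. The element names (`knowledge`, `statement`, ...) are assumptions.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical example knowledge as (subject, predicate, object) triples,
# standing in for an RDF/OWL model.
TRIPLES = [
    ("ex:Bratislava", "ex:isCapitalOf", "ex:Slovakia"),
    ("ex:Slovakia", "ex:locatedIn", "ex:Europe"),
]


def triples_to_xml(triples):
    """Step 1: serialise the triples as plain XML, one <statement> per triple."""
    root = ET.Element("knowledge")
    for subject, predicate, obj in triples:
        statement = ET.SubElement(root, "statement")
        ET.SubElement(statement, "subject").text = subject
        ET.SubElement(statement, "predicate").text = predicate
        ET.SubElement(statement, "object").text = obj
    return root


def xml_to_html(root):
    """Step 2: render plain XML as an HTML table (Python stand-in for an XSL stylesheet)."""
    rows = []
    for statement in root.iter("statement"):
        cells = "".join(
            f"<td>{escape(statement.findtext(tag, default=''))}</td>"
            for tag in ("subject", "predicate", "object")
        )
        rows.append(f"<tr>{cells}</tr>")
    return "<table>" + "".join(rows) + "</table>"


# The plain XML string is what an external system would receive.
plain_xml = ET.tostring(triples_to_xml(TRIPLES), encoding="unicode")
html_out = xml_to_html(ET.fromstring(plain_xml))
```

Separating the plain-XML step from rendering means other systems can consume the same XML while the HTML view stays replaceable.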
[Figure: AgentOWL multi-agent system: KM agents 1 and 2, agent 3 and a directory facilitator sharing a knowledge model, communicating via FIPA ACL (content: KIF, FIPA-SL, FIPA-RDF, RDF/OWL, RDQL) over IIOP, HTTP or SMTP; an external system and a graphical user interface connected via XML, XML-RPC or SOAP (user requests, displaying results); knowledge storage and querying over a knowledge base]
Ontea: Platform for Pattern Based Annotation
• Input: sensor data, log files or text
• Output: relation to a semantic model
• Example text:
  – "Bratislava is the capital of Slovakia. Slovakia is in Europe."
  • Pattern: "(in|by) +(the)? *([A-Z][a-z]+)" for Location
  • Ontea discovers the key – value pair:
    – Location – Europe
  • By transformation to the ontology knowledge base, it identifies Europe as a continent through inference (Continent is a sub-class of Location):
    – Continent – Europe
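The example above can be sketched in Python as follows. The pattern is `(in|by) +(the)? *([A-Z][a-z]+)`, i.e. "in" or "by", one or more spaces, an optional "the", then a capitalised word captured as the value. The two dictionaries are a hypothetical mini-ontology and the subclass check is a toy stand-in for real ontology inference; this is not Ontea's implementation.

```python
import re

# "in"/"by", spaces, optional "the", then a capitalised word (group 3 = value).
LOCATION = re.compile(r"(in|by) +(the)? *([A-Z][a-z]+)")

# Hypothetical mini-ontology: instance -> class, class -> superclass.
INSTANCE_OF = {"Europe": "Continent", "Slovakia": "Country"}
SUBCLASS_OF = {"Continent": "Location", "Country": "Location"}


def annotate(text):
    """Pattern step: return (key, value) pairs found by the Location pattern."""
    return [("Location", m.group(3)) for m in LOCATION.finditer(text)]


def refine(key, value):
    """Inference step (toy): use the value's class if it is a sub-class of the key."""
    cls = INSTANCE_OF.get(value)
    if cls is not None and SUBCLASS_OF.get(cls) == key:
        return (cls, value)
    return (key, value)


text = "Bratislava is the capital of Slovakia. Slovakia is in Europe."
pairs = annotate(text)
refined = [refine(k, v) for k, v in pairs]
```

Here `pairs` holds the Location – Europe pair and `refined` holds Continent – Europe. Note that the pattern has no word boundary, so "within London" would also match; a production pattern would likely add `\b`.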
Features
• Identification of concept instances from the ontology
• Automatic population of ontologies with instances
• Identifying relevance when creating instances, using information retrieval techniques
• Key-value pair transformation
• Integration with data from external systems
• Large-scale semantic annotation of documents or texts using Google's MapReduce architecture
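Pattern-based annotation parallelises naturally with MapReduce: each document is annotated independently (map), and results are grouped by key-value pair (reduce). The single-process Python sketch below only simulates the map, shuffle and reduce phases; on a real cluster (e.g. Hadoop) these would run distributed. The pattern set, the `GROUP` table and the index format are assumptions for illustration, not Ontea's actual job definition.

```python
import re
from collections import defaultdict
from itertools import chain

# Assumed pattern set; GROUP gives the capture group that holds each value.
PATTERNS = {
    "Company": re.compile(r"([A-Za-z0-9]+)[, ]+(Inc|Ltd)"),
    "Location": re.compile(r"(in|by) +(the)? *([A-Z][a-z]+)"),
}
GROUP = {"Company": 1, "Location": 3}


def map_phase(doc_id, text):
    """Map: emit ((key, value), doc_id) for every pattern match in one document."""
    for key, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            yield (key, m.group(GROUP[key])), doc_id


def reduce_phase(pair, doc_ids):
    """Reduce: collect the distinct documents annotated with one key-value pair."""
    return pair, sorted(set(doc_ids))


def run(docs):
    grouped = defaultdict(list)  # shuffle: group emitted doc ids by key-value pair
    for pair, doc_id in chain.from_iterable(map_phase(d, t) for d, t in docs.items()):
        grouped[pair].append(doc_id)
    return dict(reduce_phase(p, ids) for p, ids in grouped.items())


docs = {
    "d1": "Apple, Inc. opened an office in Europe.",
    "d2": "Slovakia is in Europe.",
}
index = run(docs)
```

The result is an inverted index from annotations to documents, which is one plausible output of large-scale annotation.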
More examples (text, extracted key – value, and the regular-expression pattern for the key):
#1 Text: Apple, Inc.
   Key – value: Company – Apple
   Pattern for Company: ([A-Za-z0-9]+)[, ]+(Inc|Ltd)
#2 Text: Mountain View, CA 94043
   Key – value: Settlement – Mountain View
   Pattern for Settlement: ([A-Z][a-z]+[ ]*[A-Za-z]*)[, ]+[A-Z]{2}[ ]*[0-9]{5}
#3 Text: [email protected]
   Key – value: Email – [email protected]
   Pattern for Email: [-_.a-z0-9]+@[-_.a-zA-Z0-9]+\.[a-z]{2,8}
#4 Text: Mr. Michal Laclavik
   Key – value: Person – Michal Laclavik
   Pattern for Person: (Mr\.|Mrs\.|Dr\.) ([A-Z][a-z]+ [A-Z][a-z]+)
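The example patterns can be checked with Python's `re` module. In this sketch the Settlement pattern uses `[, ]+` before the state code so that it matches the comma in "Mountain View, CA 94043", and the dots in the Person titles are escaped so they match only a literal period. The capture-group numbers and the e-mail address `info@example.org` are assumptions for illustration.

```python
import re

# (key, compiled pattern, capture group holding the value); group 0 = whole match.
PATTERNS = [
    ("Company", re.compile(r"([A-Za-z0-9]+)[, ]+(Inc|Ltd)"), 1),
    ("Settlement", re.compile(r"([A-Z][a-z]+[ ]*[A-Za-z]*)[, ]+[A-Z]{2}[ ]*[0-9]{5}"), 1),
    ("Email", re.compile(r"[-_.a-z0-9]+@[-_.a-zA-Z0-9]+\.[a-z]{2,8}"), 0),
    ("Person", re.compile(r"(Mr\.|Mrs\.|Dr\.) ([A-Z][a-z]+ [A-Z][a-z]+)"), 2),
]


def extract(text):
    """Return every (key, value) pair that the patterns find in text."""
    return [(key, m.group(g)) for key, rx, g in PATTERNS for m in rx.finditer(text)]


# Usage on the example texts (the e-mail address is a hypothetical placeholder).
for sample in ["Apple, Inc.", "Mountain View, CA 94043", "info@example.org", "Mr. Michal Laclavik"]:
    print(sample, "->", extract(sample))
```

Each sample yields exactly one pair, so none of the four patterns fires on another row's text.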
Semantic Graphs processing
• Spreading activation
  – Fast method for inference
  – Logic-based inference can be exponential in complexity and slow
  – Used, e.g., by IBM in the Galaxy Library
  – Experiments on e-mails
  – Experiments on Wikipedia links
    • Discovery of relations between articles
    • 72 million links in the network
• Transformation of graphs
• Exploration of related elements
• Use of Google's MapReduce architecture for large-scale data processing
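Spreading activation explores related elements by pushing "energy" from seed nodes along weighted edges, attenuating it at each hop. The Python sketch below is a generic textbook variant, not the IISAS or IBM Galaxy implementation; the parameters `decay`, `threshold` and `max_steps` and the small example graph are assumptions.

```python
from collections import defaultdict

# Hypothetical undirected semantic graph: node -> {neighbour: edge weight in [0, 1]}.
GRAPH = {
    "Slovakia": {"Bratislava": 1.0, "Europe": 0.8},
    "Bratislava": {"Slovakia": 1.0},
    "Europe": {"Slovakia": 0.8, "Continent": 0.5},
    "Continent": {"Europe": 0.5},
}


def spread_activation(graph, seeds, decay=0.5, threshold=0.1, max_steps=3):
    """Propagate activation from seed nodes.

    In each step, every node activated in the previous step passes
    value * weight * decay to each neighbour; contributions below the
    threshold are dropped, which bounds the explored part of the graph.
    """
    activation = defaultdict(float, seeds)
    frontier = dict(seeds)
    for _ in range(max_steps):
        nxt = defaultdict(float)
        for node, value in frontier.items():
            for neighbour, weight in graph.get(node, {}).items():
                passed = value * weight * decay
                if passed >= threshold:
                    nxt[neighbour] += passed
        if not nxt:
            break  # nothing left above the threshold
        for node, value in nxt.items():
            activation[node] += value
        frontier = nxt
    return dict(activation)


result = spread_activation(GRAPH, {"Bratislava": 1.0})
# Rank related elements (excluding the seed) by accumulated activation.
related = sorted((n for n in result if n != "Bratislava"), key=result.get, reverse=True)
```

The decay and threshold replace exhaustive logical inference with a bounded local search, which is why the method stays fast on large link graphs.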
Recommendation Systems
Objective:
• Recommend and provide information or knowledge to users in context
  – E-mail based
  – Web based
• Collaboration among users
• Knowledge sharing
• Active knowledge provision
• Reuse of knowledge: notes and other resources
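Context-based recommendation can be illustrated by ranking stored resources (e.g. notes) by how many concepts they share with the user's current context, such as the concepts annotated in an open e-mail. The Python sketch below uses Jaccard similarity as an assumed scoring function; it is not EMBET's algorithm, and `recommend`, `CONTEXT` and `RESOURCES` are hypothetical.

```python
def jaccard(a, b):
    """Concept overlap between two sets: 0 = disjoint, 1 = identical."""
    return len(a & b) / len(a | b) if a | b else 0.0


def recommend(context, resources, k=2, min_score=0.1):
    """Return up to k resource names whose concepts best overlap the context."""
    scored = [(jaccard(context, concepts), name) for name, concepts in resources.items()]
    ranked = sorted(
        ((score, name) for score, name in scored if score >= min_score),
        key=lambda item: (-item[0], item[1]),  # highest score first, then by name
    )
    return [name for _, name in ranked[:k]]


# Hypothetical context, e.g. concepts annotated in an e-mail about exercise planning.
CONTEXT = {"exercise", "planning", "schedule"}
RESOURCES = {
    "exercise-planning-checklist": {"exercise", "planning", "checklist"},
    "catering-contacts": {"catering", "contacts"},
    "exercise-outline-template": {"exercise", "outline"},
}

suggestions = recommend(CONTEXT, RESOURCES)
```

The minimum score keeps unrelated notes out of the hints, so users see fewer but more relevant suggestions.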
Simulation of Floods, Visualization
Technology Expertise
• We work mainly with open-source products:
• Technology
  – Linux, Java, Tomcat, Apache, PHP, JSP, XML, RDF
• Data Integration and Processing
  – MapReduce: Hadoop; Grid: Globus, gLite
  – OGSA-DAI
• Ontologies
  – Standards: RDF, OWL (OWL-DL)
  – Ontology design and definition: Protégé (with many plug-ins, e.g. OntoViz)
  – Methodology for developing knowledge management systems: CommonKADS
  – Knowledge bases:
    • Sesame (primary RDF storage)
    • Jena storage interfaces (MEM model or RDBM model)
  – Ontology interfaces:
    • Jena (SPARQL, RDQL)
    • Jena inference engine (rule-based)
• Graph Processing
  – Spreading activation
  – Semantic transformation
  – Back-edge heuristic for Steiner tree packing problem approximation
IISAS Contribution
• Strong expertise from project participation and technology background:
  – Data integration
  – Large-scale data and information processing
  – Multi-agent systems
  – Semantics and graph processing
• If it fits the proposal:
  – Recommendation systems
  – Simulation and visualization