Metadata Harvesting in Detail


www.cdac.in
P2P Framework for Community Based Creation, Semantic Annotation, Sharing and Quality Assessment of Courseware for Higher Technical Education
Dr. B.D. Chaudhury & Dr. Hemant Darbari
CSED, MNNIT-Allahabad & Applied Artificial Intelligence Group, CDAC-Pune
C-DAC/AAIG/Pune & MNNIT, Allahabad
© C-DAC & MNNIT 2010
Our Software Architecture
Layer 1: Distributed and Federated Database
It contains:
• Metadata base
• Ontology base
• Knowledge resource base
• Access log
• Base for user profiles
• Publication base
• Subscription base
• Base for event brokering
Layer 2: Publish/Subscribe, Overlay Layer
It has three sub-layers:
• Sub-layer 1: Overlay sub-layer
• Sub-layer 2: Community management sub-layer
• Sub-layer 3: Publish/subscribe sub-layer
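The publish/subscribe sub-layer can be pictured with a minimal sketch, assuming a single in-process broker and plain topic strings as subscription keys. `PubSubBroker`, `subscribe` and `publish` are hypothetical names; routing events between peers (the overlay and community-management sub-layers) is not modelled.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Callback = Callable[[str, dict], None]


class PubSubBroker:
    """Hypothetical single-node broker; a P2P overlay would forward events between peers."""

    def __init__(self) -> None:
        # Subscription base: topic -> subscriber callbacks
        self._subscriptions: DefaultDict[str, List[Callback]] = defaultdict(list)
        # Publication base: topic -> items published under that topic
        self.publications: DefaultDict[str, List[dict]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callback) -> None:
        self._subscriptions[topic].append(callback)

    def publish(self, topic: str, item: dict) -> int:
        """Store the item, notify every subscriber of the topic, return how many were notified."""
        self.publications[topic].append(item)
        subscribers = self._subscriptions.get(topic, [])
        for callback in subscribers:
            callback(topic, item)
        return len(subscribers)


inbox: List[str] = []
broker = PubSubBroker()
broker.subscribe("data mining", lambda topic, item: inbox.append(item["title"]))
notified = broker.publish("data mining", {"title": "Differentiating data- and text-mining terminology"})
print(notified, inbox)
```

Exact topic matching keeps the sketch short; a richer matcher (for example one that consults the ontology) could replace the dictionary lookup.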
Layer 3: Service Layer
It provides services for:
• Distributed ontology creation
• Metadata harvesting
• Inference engine
• Multilingual subscription/publication support
Modules Involved in the ACM Paper Simulation
• Metadata Extraction
• Metadata Harvesting
• Ontology Creation
• Knowledge Resource Creation & Semantic Net
• Inference Engine
• Multilingual Search Support
ACM Paper on Data Mining Example for
Process Simulation
When a new discipline emerges it usually takes some time
and lots of academic discussion before concepts and terms
get standardised. Such a new discipline is text mining. In a
groundbreaking paper, Untangling text data mining, Hearst
[1999] tackled the problem of clarifying text-mining
concepts and terminology. This essay aims to build on
Hearst's ideas by pointing out some inconsistencies and
suggesting an improved and extended categorisation of
data- and text-mining techniques. The essay is a
conceptual study. A short overview of the problems
regarding text-mining concepts is given. This is followed
by a summary and critical discussion of Hearst's attempt to
clarify the terminology. The essence of text mining is
found to be the discovery or creation of new knowledge
from a collection of documents. The parameters of non-novel, semi-novel and novel investigation are used to
differentiate between full-text information retrieval,
standard text mining and intelligent text mining. The same
parameters are also used to differentiate between related
processes for numerical data and text metadata. These
distinctions may be used as a road map in the evolving
fields of data/information retrieval, knowledge discovery
and the creation of new knowledge.
Authors
• Jan H. Kroeze, Department of Informatics, School of IT, University of Pretoria, Pretoria, 0002
• Machdel C. Matthee, Department of Informatics, School of IT, University of Pretoria, Pretoria, 0002
• Theo J. D. Bothma, Department of Information Science, School of IT, University of Pretoria, Pretoria, 0002
Sponsors
• Microsoft
• ACM: Association for Computing Machinery
Publisher
• South African Institute for Computer Scientists and Information Technologists, Republic of South Africa
Excerpts from ACM Paper on Data Mining: “Differentiating data- and text-mining terminology”
ACM Paper Example Simulation through Overall Architecture
[Figure: overall architecture applied to the ACM paper on data mining. Components shown: GUI (LAYER 4) for publication, subscription, search & notification; Publish, Notify and Publish/Subscribe (LAYER 2, nodes 1 to n); LAYER 3 services for metadata extraction (automatic or semi-automatic), metadata harvesting, ontology creation, the inference engine, and search & retrieval (POS tagging, phrase marking, ontological analysis, parsing and semantic analysis, producing searchable tokens); moderator validation and verification & updating; the Distributed & Federated DB (LAYER 1) holding the publication, subscription and event-brokering bases, knowledge resource, ontology base, knowledge & semantic net, user profile and user access history.]
Metadata Extraction from ACM Paper
Index: 1
File digest (hash ID): A1B2C3…..DD4
Title: Differentiating data- and text-mining terminology
Authors: 1. Jan H. Kroeze; 2. Machdel C. Matthee; 3. Theo J. D. Bothma
Keywords: IR, algorithms, database queries, documentation, full-text retrieval, information retrieval, knowledge creation, knowledge discovery, knowledge management, languages, measurement, metadata, text data mining, text mining, text-mining
Department: Department of Informatics, School of IT, University of Pretoria, Pretoria, 0002
Publisher: South African Institute for Computer Scientists and Information Technologists, Republic of South Africa
Sponsors: Microsoft; Assoc. for Computing Machinery (ACM)
No. of downloads in last 12 months: 336
Citation count: 0
Abstract: the full abstract, as quoted above
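The "File digest (hash ID)" field is the kind of metadata software can compute automatically from the file itself. The sketch below assumes SHA-256 (the slides do not name the hash function) and a hypothetical `PaperMetadata` record holding a subset of the fields above.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List


@dataclass
class PaperMetadata:
    # Illustrative subset of the extraction-table fields
    index: int
    file_digest: str
    title: str
    authors: List[str]
    keywords: List[str] = field(default_factory=list)


def file_digest(content: bytes) -> str:
    """Automatic metadata: a content hash that identifies the file independently of its name."""
    return hashlib.sha256(content).hexdigest()


def extract_metadata(index: int, content: bytes, title: str,
                     authors: List[str], keywords: str) -> PaperMetadata:
    # Keywords arrive as one comma-separated string, as in the ACM record
    kw = [k.strip() for k in keywords.split(",") if k.strip()]
    return PaperMetadata(index, file_digest(content), title, authors, kw)


record = extract_metadata(
    1,
    b"%PDF-1.4 ...",  # placeholder bytes standing in for the paper file
    "Differentiating data- and text-mining terminology",
    ["Jan H. Kroeze", "Machdel C. Matthee", "Theo J. D. Bothma"],
    "IR, algorithms, database queries",
)
print(record.file_digest[:8], record.keywords)
```

Because the digest depends only on the bytes, two peers holding the same file compute the same hash ID, which makes it usable as a key in a distributed index.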
Metadata Harvesting & Knowledge Resources Extraction from ACM Paper
• IR
  – full-text retrieval
  – information retrieval
  – database queries
• knowledge management
  – knowledge creation
  – knowledge discovery
• text mining
  – metadata
  – text data mining
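The hierarchy above pairs broader terms with narrower ones. A minimal sketch of how such knowledge resources might be harvested from a paper's text, assuming plain whole-word literal matching (the slides do not say how harvesting is done); `HIERARCHY` and `harvest` are hypothetical names.

```python
import re
from typing import Dict, List

# Broader term -> narrower terms, as in the hierarchy above
HIERARCHY: Dict[str, List[str]] = {
    "IR": ["full-text retrieval", "information retrieval", "database queries"],
    "knowledge management": ["knowledge creation", "knowledge discovery"],
    "text mining": ["metadata", "text data mining"],
}


def harvest(text: str, hierarchy: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """For each broader term, list the narrower terms that occur as whole words in the text."""
    lowered = text.lower()
    found: Dict[str, List[str]] = {}
    for broader, narrower in hierarchy.items():
        hits = [t for t in narrower
                if re.search(r"\b" + re.escape(t.lower()) + r"\b", lowered)]
        if hits:
            found[broader] = hits
    return found


excerpt = (
    "In a groundbreaking paper, Untangling text data mining, Hearst [1999] tackled the problem "
    "of clarifying text-mining concepts and terminology. ... The same parameters are also used "
    "to differentiate between related processes for numerical data and text metadata. These "
    "distinctions may be used as a road map in the evolving fields of data/information "
    "retrieval, knowledge discovery and the creation of new knowledge."
)
result = harvest(excerpt, HIERARCHY)
print(result)
```

Literal matching misses variants such as "full-text information retrieval" versus "full-text retrieval"; that gap is what the ontology and synonym links described later are meant to close.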
Knowledge Net and Semantic Net of Extracted Data from ACM Paper
[Figure: three knowledge nets, centred on Information Retrieval, Knowledge Management and Text Mining, linked into a semantic net. Other nodes shown: Data Processing, Semantic Search, Knowledge Acquisition, Knowledge Creation, Information Creation, Information Dissemination, Text Data Mining, Metadata Extraction and Text Processing.]
Ontology of Concepts Creation on ACM Paper on Data Mining
Domain Ontology Creation on ACM Paper on Data Mining
[Figure: domain ontology built from the paper, with is-a, has, has model parameter and produces relations. Concepts shown include Knowledge Discovery, Knowledge Creation, Task, Processing Task, Data Processing Task, Feature Value Computation Task, Feature Processing, Text Mining, Data Mining, Clustering Task, Descriptive Modeling Task, Subgroup Discovery Task, Probability Estimation Task, Association Discovery Task, Instance Normalization Task, Pattern Discovery Task, Support Vector Machine, SVM Model and Information Retrieval.]
Details of User Access Pattern & History: An Input for Behavior Mining for Dynamic Community Creation and Quality Assessment of Courseware on ACM Paper
User Profile & Access History on ACM Paper
• User 1 (Professor: John Smith), Comment 1: “A conceptual essay on Text Mining, a must read for beginners.”
• User 2 (Researcher: Mary Susan), Comment 2: “The paper discusses Hearst’s attempt to clarify concepts in Text Mining like text metadata, standard text mining etc.”
User | User Role | Area of Interest | Comments
1 | Professor | Data Mining | Comments1
2 | Researcher | Data Mining | Comments2
…
n
NLP Process Description for Multilingual Semantic Search and Retrieval on the ACM Paper
NLP-based Multilingual Semantic Search using Inference Engine
The example ACM paper on data mining can be searched with English or Hindi queries of three types:
• Content-level search
• Metadata search
• Ontology search
Examples of Queries on the ACM Paper on Data Mining
• Query on metadata: “Jan H Kroeze’s paper on Data Mining.”
• Query on ontology: “Paper in NLP related to Information Retrieval.”
• Query on content: “Information Retrieval paper with Information Dissemination.”
Query on Metadata
[Figure: the metadata query “Jan H Kroeze’s paper on Data Mining” is processed by the multilingual semantic search (POS tagging, phrase marking, ontology analysis, anaphora resolution, parsing, semantic analysis) and the inference engine (query attribute checking, decision support system (DSS), behavioral mining), drawing on the Distributed & Federated DB (metadata index, semantic knowledge net, ontology, user profile/access history). Query attribute checking matches Author: Jan H Kroeze and Title: Differentiating data- and text-mining terminology in the metadata index (the same record as in the metadata extraction table), and the user is notified of the ACM paper on data mining.]
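Query attribute checking for the metadata query might work roughly as follows. This is a hypothetical sketch: it handles only the “<author>’s paper on <topic>” pattern from the example, using a regular expression where the slides describe the NLP stages listed above, and `METADATA_INDEX` holds a single illustrative record.

```python
import re
from typing import Any, Dict, List

# Hypothetical metadata index with one record (fields from the extraction table)
METADATA_INDEX: List[Dict[str, Any]] = [
    {
        "title": "Differentiating data- and text-mining terminology",
        "authors": ["Jan H. Kroeze", "Machdel C. Matthee", "Theo J. D. Bothma"],
        "keywords": ["text data mining", "text mining", "information retrieval"],
    },
]


def normalise(name: str) -> str:
    """Lower-case and drop dots so that "Jan H Kroeze" equals "Jan H. Kroeze"."""
    return " ".join(name.replace(".", " ").lower().split())


def check_attributes(query: str) -> List[str]:
    """Read "<author>'s paper on <topic>" as two attributes and return the matching titles."""
    m = re.match(r"(.+?)[’']s paper on (.+?)\.?$", query.strip())
    if not m:
        return []
    author, topic = normalise(m.group(1)), m.group(2).lower()
    hits = []
    for record in METADATA_INDEX:
        authors = [normalise(a) for a in record["authors"]]
        searchable = " ".join([record["title"], *record["keywords"]]).lower()
        if author in authors and topic in searchable:
            hits.append(record["title"])
    return hits


print(check_attributes("Jan H Kroeze’s paper on Data Mining."))
```

The topic "data mining" is found inside the keyword "text data mining"; a production matcher would rely on the ontology rather than substring tests.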
Query on Ontology
[Figure: the ontology query “Paper related to IR in NLP” passes through the same NLP stages (POS tagging, phrase marking, anaphora resolution, ontology analysis, parsing, semantic analysis) and the inference engine (query attribute checking, decision support system (DSS), behavioral mining). For the domain “ACM Paper on Data Mining” the ontology is shown as a tree of concepts: Knowledge Management, Knowledge Discovery, Knowledge Creation, Text Mining, Text Data Mining, Metadata, Information Retrieval and Full Text Retrieval. The user is notified of the ACM paper on data mining.]
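An ontology query can match a paper through narrower concepts, not just the literal query term. The sketch assumes the tree's parent/child edges mirror the keyword hierarchy harvested earlier (the figure's edges are not fully recoverable) and maps the abbreviation IR to Information Retrieval, since the slides list IR as its synonym; all names are hypothetical.

```python
from typing import Dict, List, Set

# Parent -> children edges, as read (by assumption) from the ontology tree on this slide
ONTOLOGY: Dict[str, List[str]] = {
    "Knowledge Management": ["Knowledge Discovery", "Knowledge Creation"],
    "Text Mining": ["Text Data Mining", "Metadata"],
    "Information Retrieval": ["Full Text Retrieval"],
}
ABBREVIATIONS: Dict[str, str] = {"IR": "Information Retrieval"}


def expand(concept: str) -> Set[str]:
    """Return the concept together with all of its descendants (iterative depth-first walk)."""
    root = ABBREVIATIONS.get(concept, concept)
    seen: Set[str] = set()
    stack: List[str] = [root]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(ONTOLOGY.get(node, []))
    return seen


def ontology_match(concept: str, paper_concepts: Set[str]) -> bool:
    """A paper matches if it is annotated with the concept or any narrower concept."""
    return bool(expand(concept) & paper_concepts)


paper = {"Text Mining", "Full Text Retrieval", "Knowledge Discovery"}
print(sorted(expand("IR")))         # ['Full Text Retrieval', 'Information Retrieval']
print(ontology_match("IR", paper))  # True
```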
Query on Content
[Figure: the content query “Paper related to Information Retrieval in NLP” passes through the same NLP stages and the inference engine, and is matched against the knowledge net (Data Processing, IR and Information Retrieval linked as synonyms) and the semantic index of the semantic knowledge net in the Distributed & Federated DB. The user is notified of the ACM paper on data mining.]
Semantic knowledge net:
ID | Concept | Synonym | Semantic index
1 | Information Retrieval | Semantic Search | 12.0
2 | Information Retrieval | IR | 12.1
3 | Text Mining | Metadata Extraction | 16.0
4 | Text Mining | Text Processing | 16.1
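The synonym rows above can drive query expansion for content search. A minimal sketch, assuming whole-word matching of concepts and synonyms in the query; the semantic index column is carried along but not interpreted, because the slides do not explain its meaning. `SEMANTIC_NET` and `expand_query` are hypothetical names.

```python
import re
from typing import Dict, List, Tuple

Row = Tuple[int, str, str, float]

# Rows of the semantic knowledge net table: (ID, concept, synonym, semantic index)
SEMANTIC_NET: List[Row] = [
    (1, "Information Retrieval", "Semantic Search", 12.0),
    (2, "Information Retrieval", "IR", 12.1),
    (3, "Text Mining", "Metadata Extraction", 16.0),
    (4, "Text Mining", "Text Processing", 16.1),
]


def expand_query(query: str, rows: List[Row]) -> Dict[str, List[str]]:
    """Map each concept or synonym found in the query to the concept plus all its synonyms."""
    q = query.lower()
    expansions: Dict[str, List[str]] = {}
    for _id, concept, synonym, _index in rows:
        for term in (concept, synonym):
            # Whole-word match, so "IR" does not fire inside words such as "their"
            if concept not in expansions and re.search(r"\b" + re.escape(term.lower()) + r"\b", q):
                expansions[concept] = [concept] + [s for _, c, s, _ in rows if c == concept]
    return expansions


print(expand_query("Paper related to Information Retrieval in NLP", SEMANTIC_NET))
```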
Distributed Ontology Creation Details
• Domain-specific ontologies are created by ontology experts in the P2P community in a distributed fashion.
• The content domain is Computer Science research papers and research notes of different file types.
Distributed Ontology Creation
• The P2P network allows the above to be carried out on a large scale without third-party web servers that need periodic maintenance, because the P2P network is autonomous and maintenance-free.
• Creating ontologies in this distributed, collaborative manner is meant to improve search, enrich knowledge and support quality assessment.
Why Distributed Ontology Creation
• The ontology is created from the published content and resources, as well as from user profiles and usage patterns (a component of behavior mining).
• Metadata and references are fetched using portal-level, community-level and user-level information. The user-oriented (personalization) ontology is created as shown in the figure.
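Collaborative ontology creation raises the question of how contributions from different peers are combined. One simple policy, assumed here purely for illustration (the slides do not specify one), is to accept an edge once a minimum number of peers have proposed it; `merge_contributions`, `min_support` and the example edges are hypothetical.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

# (subject, relation, object); an illustrative triple format, not one defined in the slides
Edge = Tuple[str, str, str]


def merge_contributions(peer_ontologies: Iterable[Set[Edge]], min_support: int = 2) -> Set[Edge]:
    """Accept an edge once at least `min_support` distinct peers have proposed it."""
    support: Counter = Counter()
    for ontology in peer_ontologies:
        support.update(set(ontology))  # each peer counts at most once per edge
    return {edge for edge, count in support.items() if count >= min_support}


peer_a = {("Text Data Mining", "is-a", "Text Mining"),
          ("Full Text Retrieval", "is-a", "Information Retrieval")}
peer_b = {("Text Data Mining", "is-a", "Text Mining")}
peer_c = {("Full Text Retrieval", "is-a", "Information Retrieval"),
          ("Metadata", "is-a", "Text Mining")}
merged = merge_contributions([peer_a, peer_b, peer_c])
print(sorted(merged))
```

Edges below the threshold could instead be queued for the moderator validation step that appears in the overall architecture.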
[Figure: An example of distributed ontology]
[Figure: Domain ontology example in the engineering domain]
Metadata Harvesting in Detail
In interest-based communities, some experts may come forward to work collaboratively on generating and sharing knowledge resources; this is metadata harvesting.
Users may act as annotators, enriching the knowledge resources with metadata that supports advanced search functionality and quality assessment. Some community members can work as ontology experts, generating domain-specific ontologies in a distributed fashion.
[Figure: Metadata harvesting process]
Advantage of Metadata Harvesting
• A P2P network is better suited to carrying out the above activities on a large scale, because it needs no third-party Web servers, which often require considerable management and maintenance effort; P2P networks operate autonomously and spontaneously with minimal management overhead.
• The “harvesting process” relies on metadata produced by humans or by fully or semi-automatic processes supported by software.
• For example, Web editing software and selected document software automatically produce metadata such as “format,” “date of creation” and “revision date” when a resource is created or updated, without human intervention.
• Software can also support a semi-automatic approach to metadata creation by presenting a person with a “template” that guides the manual input of “keywords,” “description” and additional metadata.
• The software automatically converts the metadata to META tags (or another tagged form, depending on the document format) and places them in the resource header.
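The last two steps (template input converted to META tags and placed in the header) can be sketched for the HTML case only. This assumes the document contains a literal lowercase `<head>` tag; other formats, and the function names `to_meta_tags` and `place_in_header`, are hypothetical and outside the slides.

```python
from html import escape
from typing import Dict


def to_meta_tags(metadata: Dict[str, str]) -> str:
    """Render metadata fields as HTML META tags, escaping attribute values."""
    return "\n".join(
        f'<meta name="{escape(name)}" content="{escape(value)}">'
        for name, value in metadata.items()
    )


def place_in_header(html_doc: str, metadata: Dict[str, str]) -> str:
    """Insert the META tags directly after the opening <head> tag."""
    marker = "<head>"
    pos = html_doc.find(marker)
    if pos == -1:
        raise ValueError("document has no <head> tag")
    pos += len(marker)
    return html_doc[:pos] + "\n" + to_meta_tags(metadata) + html_doc[pos:]


# Values a person might type into a hypothetical keywords/description template
template_input = {
    "keywords": "text mining, information retrieval",
    "description": "Differentiating data- and text-mining terminology",
}
page = "<html><head><title>Paper</title></head><body></body></html>"
tagged = place_in_header(page, template_input)
print(tagged)
```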
[Figure: Different sources of metadata harvesting]
[Figure: Metadata harvesting process]
Theoretical and Architecture Details of Multilingual Semantic Search with Inference Engine (i-Quester)
Overall System Features of i-Quester: Multilingual Semantic Search with Inference Engine
• Provides semantic search for the most relevant retrieval of data from the information distributed across peers.
• Semantic search is empowered by a strong domain ontology and metadata lineage, so that even for a pragmatic (context) relation the query relates to all inter-linked information, like a network of language connotations called a semantic net.
• The inputs to semantic search depend on the inference engine and the domain ontology.
Overall System Features of i-Quester: The Multilingual Semantic Search
• Users' behavioral patterns are analyzed automatically to build a semantic index, enabling the most relevant search even across distant, remotely referenced information in the distributed peer architecture of nodes and super nodes.
• Multilingual query handling and retrieval
• Text, audio, image and video formats are supported for semantic search.
• Decision Support System (DSS) features and AI model components are integrated into the inference engine to support i-Quester.
[Figure: Architecture of i-Quester on Distributed Courseware]
i-Quester Components
i-Quester has the following components:
• Inference Engine
• Multilingual Semantic Search
[Figure: i-Quester Layers Overview]
Inference Engine Nuances
Inference Engine
• An inference engine is a computer program that tries to derive answers from a knowledge base. It is the "brain" that expert systems use to reason about the information in the knowledge base in order to formulate new conclusions.
• The inference engine is based on a domain-specific ontology and the Semantic Web.
• The inference engine is based on the behavioral analysis of users. It takes semantic-pragmatic connotations, such as usage context, into account when drawing inferences.
• Collecting and comparing user profiles is one of the intrinsic features of inference.
• It supports multilingual semantic search.
• The inference engine consists of three components: anaphora resolution, semantic annotation and an AI-based Decision Support System (DSS).
• Anaphora resolution: connects references across the metadata, the content-level knowledge net and semantic net, and the ontologies, so that the three levels of search (metadata search, ontology search and content search) work in an interconnected fashion.
[Figure: input data, anaphora resolution, semantic annotation, AI-based DSS, inference]
• Semantic annotation: semantic annotations are fetched from two layers:
  – the knowledge net and semantic net, comprising content-level information and its synonyms, hyponyms, hypernyms, etc.;
  – references connecting the metadata with the ontologies created from the contents.
Inference Engine Details
AI (Artificial Intelligence) based DSS (Decision Support System):
• Users' behavior patterns are derived from usage patterns, profile analysis and the information drawn from the other two layers of the inference engine, i.e., semantic annotation and anaphora resolution. Based on this, two significant things are achieved:
  – Dynamic community creation
  – Quality assessment of contents
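To make "derive answers from a knowledge base" concrete, here is forward chaining, one standard inference strategy; the slides do not say which strategy i-Quester uses. The facts and rules are hypothetical, loosely modelled on a user profile and the example paper.

```python
from typing import List, Set, Tuple

Rule = Tuple[Set[str], str]  # (premises, conclusion)


def forward_chain(facts: Set[str], rules: List[Rule]) -> Set[str]:
    """Repeatedly fire every rule whose premises are all known, until nothing new is derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and premises <= known:
                known.add(conclusion)
                changed = True
    return known


# Hypothetical facts about a user profile and the example paper
facts = {"role:researcher", "interest:data mining", "paper:about text mining"}
rules: List[Rule] = [
    ({"paper:about text mining"}, "paper:about data mining"),
    ({"interest:data mining", "paper:about data mining"}, "recommend:paper"),
]
derived = forward_chain(facts, rules)
print(sorted(derived - facts))  # ['paper:about data mining', 'recommend:paper']
```

The second rule only fires after the first has added its conclusion, which is why the loop repeats until a full pass derives nothing new.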
Inference Engine Details
• As described above, the inference engine uses behavior mining, which analyzes users' data patterns and usage.
• Behavior-mining data are also used in semantic search to locate content for a user's search query. The entanglement of semantic search and the inference engine is shown in the following diagram.
• The inference engine takes its input from behavior mining.
The following is an example of the inference engine working over LOM (Learning Object Metadata):
• It draws information from user queries and matches it against the LOM ontology; the semantic net and knowledge net are used for ontology and concept mapping, and finally the inference engine outputs metadata. This metadata can be searched by the user and also used to refine user queries.
Example:
• What is the most searched article for NLP with reference to search?
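The example question can be answered from the access history by counting which article users opened after queries containing the topic terms. A minimal sketch, assuming a hypothetical log of `SearchEvent` records and whole-token matching; the LOM ontology and concept mapping described above are omitted.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Optional


class SearchEvent(NamedTuple):
    user: str
    query: str
    opened_article: str


def most_searched(events: Iterable[SearchEvent], terms: List[str]) -> Optional[str]:
    """Article opened most often from queries containing every term (case-insensitive tokens)."""
    wanted = [t.lower() for t in terms]
    counts = Counter(
        e.opened_article
        for e in events
        if all(t in e.query.lower().split() for t in wanted)
    )
    return counts.most_common(1)[0][0] if counts else None


access_log = [
    SearchEvent("u1", "NLP search", "Article A"),
    SearchEvent("u2", "nlp semantic search", "Article B"),
    SearchEvent("u3", "NLP search engines", "Article A"),
    SearchEvent("u4", "data mining", "Article C"),
]
print(most_searched(access_log, ["NLP", "search"]))  # Article A
```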
[Figure: Inference Engine Architecture]
[Figure: Inference Engine Process Flow (LOM: Learning Object Metadata)]
Behavior Mining with Dynamic Community Creation and Content Quality Assessment Nuances
[Figure: The Overall Architecture of Dynamic Community Evolution for P2P with Behavior Mining]
Behavior Mining
[Figure: Behavior Mining Architecture]
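Dynamic community creation, listed above as an outcome of the AI-based DSS, could be approximated by linking users whose interests overlap and taking the connected groups as communities. This sketch assumes Jaccard similarity over interest sets and a fixed threshold (both illustrative choices, not taken from the slides); the interest sets are hypothetical, loosely based on the user-profile example.

```python
from itertools import combinations
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union (0 for two empty sets)."""
    return len(a & b) / len(a | b) if a | b else 0.0


def form_communities(profiles: Dict[str, Set[str]], threshold: float = 0.5) -> List[Set[str]]:
    """Link users whose interest sets are similar enough; communities are the connected groups."""
    parent = {user: user for user in profiles}

    def find(u: str) -> str:
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving keeps the union-find trees shallow
            u = parent[u]
        return u

    for u, v in combinations(profiles, 2):
        if jaccard(profiles[u], profiles[v]) >= threshold:
            parent[find(u)] = find(v)

    groups: Dict[str, Set[str]] = {}
    for user in profiles:
        groups.setdefault(find(user), set()).add(user)
    return list(groups.values())


profiles = {
    "John Smith": {"data mining", "text mining"},
    "Mary Susan": {"data mining", "text mining", "information retrieval"},
    "User 3": {"computer networks"},
}
communities = form_communities(profiles)
print(communities)
```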
User Access History
[Figure: Architecture of User Access History. Components shown: users 1 to n, application, logging, log files, processing data, user identification, pattern discovery, pattern analysis, and the resulting rules, patterns & statistics.]
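The user identification and pattern discovery stages can be sketched over raw log lines. The sketch assumes a hypothetical whitespace-separated "timestamp user resource" format and treats a pattern as a consecutive pair of resources seen in at least `min_support` users' histories; neither choice comes from the slides.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def identify_users(log_lines: List[str]) -> Dict[str, List[str]]:
    """User identification: group accessed resources per user, in log order."""
    sessions: Dict[str, List[str]] = defaultdict(list)
    for line in log_lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # processing data: skip malformed lines
        _timestamp, user, resource = parts
        sessions[user].append(resource)
    return dict(sessions)


def discover_patterns(sessions: Dict[str, List[str]], min_support: int = 2) -> Dict[Tuple[str, str], int]:
    """Pattern discovery: consecutive resource pairs that occur in at least min_support users' histories."""
    support: Counter = Counter()
    for resources in sessions.values():
        support.update(set(zip(resources, resources[1:])))  # count each pair once per user
    return {pair: n for pair, n in support.items() if n >= min_support}


raw_log = [
    "t1 u1 acm-paper", "t2 u1 ontology", "t3 u2 acm-paper",
    "t4 u2 ontology", "t5 u3 ontology", "corrupted line",
]
print(discover_patterns(identify_users(raw_log)))  # {('acm-paper', 'ontology'): 2}
```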
Access Pattern History & User Profile
• Access Pattern History mainly focuses on the ‘demand side’ of semantic search, i.e., interpreting user queries and studying users' information needs.
• Maintaining logs of user behavior: browsing patterns and transaction data.
• Assigning search queries to one or more predefined categories based on their topic, to provide better search results in terms of efficiency and accuracy.
• Developing tools such as pattern discovery and pattern analysis.
• Maintaining information about users' educational background and interest areas.
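Assigning a query to one or more predefined categories can be sketched with keyword overlap. The categories and their keywords below are hypothetical, and overlap counting is an illustrative stand-in for whatever classifier the system actually uses.

```python
import re
from typing import Dict, List, Set

# Hypothetical predefined categories with indicative keywords
CATEGORIES: Dict[str, Set[str]] = {
    "Data Mining": {"mining", "clustering", "association"},
    "Information Retrieval": {"retrieval", "ir", "search", "indexing"},
    "NLP": {"nlp", "parsing", "pos", "anaphora"},
}


def categorise(query: str, categories: Dict[str, Set[str]] = CATEGORIES) -> List[str]:
    """Return every category sharing a keyword with the query, highest overlap first."""
    tokens = set(re.findall(r"\w+", query.lower()))
    scores = {name: len(tokens & words) for name, words in categories.items()}
    matched = [name for name, score in scores.items() if score > 0]
    return sorted(matched, key=lambda name: -scores[name])  # stable: ties keep dict order


print(categorise("Paper related to Information Retrieval in NLP"))
```

A query can land in several categories at once, matching the "one or more" wording above.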
[Figure: Behavior Mining Attributes and Determining Factors]
[Figure: Behavior Mining on Courseware Quality Assessment]
[Figure: Courseware Quality Assessment with Behavior Mining]
[Figure: Behavior Mining on LOM]
Multi-lingual Semantic Search
Features of Semantic Search
• The NLP-based semantic search engine searches the domain ontology and the semantic annotations created by the inference engine, for apt information linking and retrieval across the distributed network, traversing the sub/super layers.
• Cross-lingual support through a sense translator for multilingual queries
• Audio, image and video retrieval are supported in addition to text.
• Relevancy ranking and linking of retrieved information through semantic-pragmatic annotations
[Figure: Multi-lingual Semantic Search Architecture]
Multilingual Search with Ontology
[Figure: Ontology-based Multi-lingual Search Process]
[Figure: Multi-lingual Audio-video Search]
[Figure: Linguistic Process in NLP with Video Search]
Multilingual Search Support
• The system takes a formal query as input.
• This query could be generated from a keyword query, a natural language query, a form-based interface where the user can explicitly select ontology classes and enter property values, or more sophisticated search interfaces.
Multilingual Search Support
The NLP-oriented semantic search with multilingual support will have the following features on the query side:
• Ontology cross-linking in English and Hindi, so that relevant content can be fetched whether the query is given in Hindi or English.
• Content-level matching using the NLP knowledge net/semantic net, which will contain words with all their possible meanings and links.
• Metadata will also have cross-linkages for multilingual queries.
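Cross-linking English and Hindi can be sketched by pointing phrases in both languages at shared, language-independent concept identifiers, so either query reaches the same indexed papers. The lexicon entries, the `C:` concept IDs and the function names are hypothetical, and the Hindi phrases are illustrative translations.

```python
from typing import Dict, List, Set

# Hypothetical bilingual lexicon: English and Hindi phrases point to shared concept IDs
LEXICON: Dict[str, str] = {
    "information retrieval": "C:InformationRetrieval",
    "सूचना पुनर्प्राप्ति": "C:InformationRetrieval",
    "text mining": "C:TextMining",
    "पाठ खनन": "C:TextMining",
}

# Papers indexed by the concept IDs attached to their metadata/ontology entries
INDEX: Dict[str, Set[str]] = {
    "Differentiating data- and text-mining terminology": {"C:TextMining", "C:InformationRetrieval"},
}


def concepts_in(query: str) -> Set[str]:
    """Concept IDs for every lexicon phrase found in the query."""
    q = query.lower()  # no effect on Devanagari, normalises English
    return {concept for phrase, concept in LEXICON.items() if phrase in q}


def search(query: str) -> List[str]:
    wanted = concepts_in(query)
    return [title for title, concepts in INDEX.items() if wanted & concepts]


print(search("पाठ खनन पर शोध पत्र"))  # Hindi: "research paper on text mining"
print(search("text mining paper"))
```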
Multilingual Search Support
The multilingual semantic search comprises semantic-level query analysis and retrieval, as described in the following diagrams:
• It consists of the stages of semantic search with all the NLP layers involved, i.e., POS tagging, phrase marking, ontology analysis, parsing and semantic analysis.
• It also receives inputs from the ontology, the metadata and the inference engine.
• The output comprises searchable tokens and annotations to be fetched from the annotated extracted information (metadata, ontology, and the content-level knowledge net and semantic net).
C-DAC/AAIG/Pune & MNNIT, Allahabad
Multilingual Search Support
NLP Process Involved
POS Tagging
www.cdac.in
Phrase Marking
Ontology Analysis
Parsing
Semantic
Analysis
72
C-DAC/AAIG/Pune & MNNIT, Allahabad
www.cdac.in
[Figure: Linguistic Process in NLP Search]
Thank You!