An Introduction to Semantic
Technology for SharePoint
Administrators
Who We Are
Leader in the development of semantic software,
used by organizations to make information management
more efficient, and to gain strategic knowledge through
the automatic comprehension of text.
Our Customers
Why are we here?
• Greater demand for information in the decision-making process
• Ever-increasing volumes of data to be considered every day
(documents, emails, web pages, social media and “Big Data”)
• Traditional technologies increasingly expected to manage and process
information
But most organizations are not taking advantage
of all of their data.
Ultimately, we are here to create
value from information
INCREASED SALES
Increase sales and customers
Increase customer satisfaction
Increase competitive advantage through the
monitoring of markets and innovations
Enhance brand value with targeted social media
analysis
REDUCED COSTS
Simplify the organization and recovery of data
Improve internal knowledge sharing
More timely and effective customer interactions
Reduce the time and costs of traditional customer
assistance
So much data, so little time
For organizations that use SharePoint, it is their primary means of collaboration.
These organizations have invested a tremendous amount of time and money to connect
their employees and their data to improve communication and workflow.
Despite this investment, users still spend massive amounts of time searching for relevant
data and content.
Search is AN ABSOLUTE FAILURE for large SharePoint deployments. The bigger the farm,
the less useful and less relevant search becomes.
WHY?
The majority of enterprise content is unstructured in the form of electronic documents,
emails, forms, etc. Searching through the textual portion of unstructured content can be a
daunting task as it is highly likely the search operation will return a large number of
possible results.
Further, people are generally searching for content inside content and for inter-relationships
between content, which complicates search even more.
Most companies are guilty of one or more of the following:
• Underutilization of features
• Lack of clear requirements or vision
• Not using metrics to gauge feature usage and adoption
• Understaffing to properly support the platform
The problems with unstructured data
Extraction and categorization are used to
structure unstructured data and make the
retrieval and management of information
more effective
Taxonomy and text mining rules are often
dependent on specific business needs and
influenced by market sector and project
objectives
Organizations need flexible solutions that are
easily integrated and customizable, and
capable of responding to specific requirements
for extraction and categorization
SharePoint technologies for managing information
Technology is a key factor in managing unstructured information.
There are different approaches for managing unstructured information:
• Keyword-based plus statistical elements
• Shallow linguistics
• Semantic technology
Keyword Based
Text is divided into single words that are inserted in an alphabetical index,
with no understanding of content:
IBM customarily places great emphasis on continuing education, so its
employees attend numerous professional courses every year. In recent
years, several members of the group have also taken part in projects
lasting several months in the United States, England and Germany, where
they carried out development work in international teams.
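The keyword-based approach can be sketched in a few lines of Python. This is a minimal illustration only, not any product's actual indexer: the function name `build_keyword_index`, the sample documents and the tokenization rule (lowercase, split on non-word characters) are all assumptions made for the example.

```python
import re
from collections import defaultdict


def build_keyword_index(documents):
    """Map each lowercase word to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        # Split into runs of word characters; no linguistic analysis is applied.
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(doc_id)
    # Sort the keys so the result reads like an alphabetical index.
    return {word: sorted(ids) for word, ids in sorted(index.items())}


# Hypothetical sample documents.
docs = {
    "doc1": "The jaguar is a big cat.",
    "doc2": "She bought a Jaguar last year.",
}
index = build_keyword_index(docs)
```

Here `index["jaguar"]` lists both documents, because the index stores character strings and has no way to tell the animal from the car.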
Shallow Linguistics
• Words in the text are recognized as either belonging to the dictionary,
or not.
• Recognized words are linked to the basic headword and a
grammatical type is assigned.
• Some logical groupings are made.
• Indexes contain headwords and keywords.
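A rough sketch of the shallow-linguistics steps, assuming a tiny hand-made dictionary: each token is looked up, and known tokens receive a headword and a grammatical type. The dictionary contents and the function name `shallow_analyze` are hypothetical and exist only for illustration.

```python
# Hypothetical toy dictionary: word form -> (headword, grammatical type).
TOY_DICTIONARY = {
    "rows": ("row", "noun"),
    "row": ("row", "noun"),
    "tables": ("table", "noun"),
    "table": ("table", "noun"),
    "are": ("be", "verb"),
    "is": ("be", "verb"),
}


def shallow_analyze(text):
    """Return (token, headword, grammatical type) triples; None marks unknown words."""
    results = []
    for raw in text.lower().split():
        token = raw.strip(".,;:!?\"'")
        if not token:
            continue
        entry = TOY_DICTIONARY.get(token)
        if entry is None:
            # The word does not belong to the dictionary.
            results.append((token, None, None))
        else:
            headword, word_type = entry
            results.append((token, headword, word_type))
    return results


analysis = shallow_analyze("There are 40 rows.")
```

Because the lookup ignores context, "rows" receives the same grammatical type in every sentence, including ones where it is used as a verb.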
Semantics
• Simulate a human’s process of text analysis.
• Morphological analysis, parsing, sentence and semantic analysis allow
the extraction of large amounts of information and work from a
conceptual point of view (thanks to the semantic network).
• Document indexing creates a set of words, headwords, concepts,
relationships, subjects and structures (cognitive/conceptual map).
Why Does Semantic Technology Excel?
The answer lies in the two primary measures of information management:
#1 – Precision (a measure of exactness)
Retrieving a high level of accurate results that are relevant to your search.
#2 – Recall (a measure of completeness)
Retrieving a high percentage of relevant documents.
Locating what applies.
Keyword and statistical (math-based) technology can achieve one, but not both.
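Both measures have standard set-based definitions, which the following minimal Python sketch computes. Precision is the share of retrieved documents that are relevant, and recall is the share of relevant documents that were retrieved. The function name and the document ids are made up for the example.

```python
def precision_and_recall(retrieved, relevant):
    """Compute (precision, recall) for one search from two collections of ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # documents that are both returned and relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical search: four documents returned, three documents actually relevant.
precision, recall = precision_and_recall(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d1", "d2", "d5"],
)
```

In this example two of the four returned documents are relevant (precision 0.5), and two of the three relevant documents were found (recall about 0.67).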
Why is Semantics Important?
...because language is too ambiguous.
Same word – Different meanings
Jaguar = car
jaguar = animal
Jaguars = football team
Different words – Same meaning
Disability Legislation = Equal Opportunity Law
Different words – Related meanings
Organization = company
Organization = trade union
Organization = charity
What Does Properly Analyzed Mean?
4 Requirements
Morphological Analysis
Definition: understand word forms
Example: dog, dog-catcher and doggy-bag are closely related.
Grammatical Analysis
Definition: understand the parts of speech
Example: "There are 40 rows in the table" uses rows as a noun, vs.
"She rows 5 times a week" uses rows as a verb.
Logical Analysis
Definition: understand how words relate to other words
Example: "Davey Jones, represented by attorney Daniel Stanley, is married
to Rebecca Carter". Rebecca is married to Davey, not Daniel.
Semantic Analysis (disambiguation)
Definition: understand the context of key words
Example: "I used chicken broth for my soup stock" uses stock in the
context of food, vs. "The company keeps lots of stock on hand" uses
stock in the context of inventory.
Using Semantic Technology
morphological analysis
parsing
sentence analysis
semantic network [concepts, domain ontologies, places, companies, products, people]
Adding value to information with syncons and
links
A syncon coincides with a node of the semantic net;
each is connected to other syncons by specific
semantic relations (= link) that develop a
hereditary hierarchical structure.
This structure allows every node to inherit
characteristics from nearby nodes, thus enriching
itself with information.
Information inherently contains different kinds of
links:
• hypernymy link (is a/type of)
• meronymy link (has a/part of)
• geographical link
• linguistic relations link (subject/verb, verb/object)
Ordering principles
Links, which identify the semantic relationships between syncons, are the
ordering principles.
Syncons may contain:
• single headwords ('set', 'vacation', 'work', 'quick', 'more')
• compounds ('non-stop', 'abat-jour', 'policeman')
• collocations ('credit card', 'university degree', 'go forward')
A syncon has the following main elements:
• word class (noun, verb, adjective, adverb)
• semantic relations (link)
• gloss (explanation of meaning)
• domain, register and frequency
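The elements listed above map naturally onto a small record type. The sketch below is one possible in-memory shape for a syncon, not the vendor's actual data model: the class name, field names and sample values are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Syncon:
    """One concept node: a set of synonyms plus the elements listed on the slide."""
    lemmas: list            # headwords, compounds or collocations for this concept
    word_class: str         # noun, verb, adjective or adverb
    gloss: str              # explanation of meaning
    domain: str = ""
    register: str = ""
    frequency: str = ""
    links: dict = field(default_factory=dict)  # link type -> list of target Syncons

    def add_link(self, link_type, target):
        """Attach a semantic relation (link) from this syncon to another one."""
        self.links.setdefault(link_type, []).append(target)


# Hypothetical sample nodes.
payment = Syncon(lemmas=["payment instrument"], word_class="noun",
                 gloss="a means of paying")
card = Syncon(lemmas=["credit card"], word_class="noun",
              gloss="a card used to buy on credit", domain="finance")
card.add_link("hypernymy", payment)
```

Keeping links in a dictionary keyed by link type lets one node carry several kinds of relation at once.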
Links
The link supernomen/subnomen concerns the relationship between a
specific concept and a more general one.
A supernomen is the more general term; it is a word that has a general
meaning compared to those that represent a specification of the same
meaning.
EXAMPLES
• Dog – hunting dog – Irish terrier
• Habitation – flat – two-roomed flat
• Computer – portable computer – palmtop
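The examples above form chains from specific to general, which can be walked programmatically. The following sketch stores each subnomen with a single supernomen, a simplification assumed for the example. The table contents and function names are hypothetical.

```python
# Hypothetical toy data: each subnomen points to its supernomen.
SUPERNOMEN = {
    "Irish terrier": "hunting dog",
    "hunting dog": "dog",
    "palmtop": "portable computer",
    "portable computer": "computer",
}


def generalizations(term):
    """Return the chain of increasingly general terms above `term`."""
    chain = []
    seen = {term}
    while term in SUPERNOMEN:
        term = SUPERNOMEN[term]
        if term in seen:  # guard against accidental cycles in the data
            break
        seen.add(term)
        chain.append(term)
    return chain


def is_kind_of(specific, general):
    """True if `general` appears anywhere above `specific` in the hierarchy."""
    return general in generalizations(specific)
```

With this data, `generalizations("Irish terrier")` walks up through "hunting dog" to "dog", so a search for "dog" could also match documents that mention only the terrier.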
Links
The link superverbum/subverbum is one of the semantic relationships
that link verb syncons together. This link is to verbs what the
supernomen/subnomen link is to nouns.
EXAMPLES
• Eat – nibble at, eat listlessly
• Sleep – doze, snooze
• Walk – limp
Links
The link omninomen/parsnomen is a “part/all” semantic relationship. A
parsnomen is a term that indicates a part of something (omninomen).
EXAMPLES
• Limb – hand – finger
• House – bathroom – washbasin
• Tree – trunk – bark
Parsing
Parsing is a complete morphological, grammatical and syntactical analysis
of a sentence, quickly applying many thousands of rules.
Parsing identifies every element of a text, assigning each to the
appropriate logical and grammatical function.
Disambiguating Text
For a human, the meaning of a word is clear because the surrounding
elements help them understand the sense in which the word is used.
Software needs an unambiguous word interpretation represented by a
reference system that is equivalent to the human world experience.
If correctly trained in human common sense, the computer can achieve
logical world comprehension and join it with its own memory and
computing power.
Disambiguating
Disambiguation analyzes single sentences or whole documents and finds the
correct meaning for each element by removing every ambiguity.
“Reasoning” takes place, which identifies the different meanings of all
elements of a text and the reference context.
Examples of Disambiguation
Let’s look at some sentences using the term “bomb”:
The disambiguator intercepts the first possible meaning of “bomb”: it is a
sport noun which means a long high forward pass.
In the second sentence, “bomb” is still a noun, but in this case it means a
commercial or artistic failure.
Examples of Disambiguation
If “bomb” appears in a sentence with the term volcano, it is interpreted as
a lump of lava.
Finally, in the following sentence, the disambiguator interprets the term
“bomb” as an explosive device.
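The idea of picking a sense from the surrounding words can be illustrated with a deliberately simple overlap count. This is a toy stand-in, not the disambiguation engine described in these slides, which relies on a semantic network and full parsing. The four sense labels come from the slides; the cue words and test sentences are invented for the example.

```python
import re

# The four senses of "bomb" from the slides, each with hypothetical cue words.
SENSES = {
    "long high forward pass": {"quarterback", "pass", "touchdown", "receiver", "threw"},
    "commercial or artistic failure": {"movie", "film", "box", "office", "critics"},
    "lump of lava": {"volcano", "lava", "eruption", "crater"},
    "explosive device": {"exploded", "detonated", "police", "planted", "defused"},
}


def disambiguate(sentence, senses=SENSES):
    """Return the sense whose cue words overlap most with the sentence, or None."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    best_sense, best_overlap = None, 0
    for sense, cues in senses.items():
        overlap = len(words & cues)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


sports = disambiguate("The quarterback threw a bomb for a touchdown.")
geology = disambiguate("The volcano hurled a bomb into the air during the eruption.")
```

A sentence with no cue words returns `None`, which shows the limit of this shortcut: without a broader reference system the program has nothing to reason from.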
Disambiguator: Text Map
Using semantics allows for the creation of a cognitive knowledge map, a
graphic view of the text elements analyzed. We will use this internet
biography of Edgar Allan Poe as an example:
Disambiguator: Classification
Classification recognizes the main categories in the text (literature,
military, publishing, etc.) and, for each semantic domain identified,
gives the corresponding percentage and the main concepts
(“allegory”, “book review”, “character” for literature;
“Tamerlane”, “United States Army”, “West Point” for military;
“book” for publishing;
“epic poem”, “Baudelaire” for poetry, and so on).
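One simple way to arrive at per-domain percentages is to count how many recognized concepts fall into each domain. The sketch below assumes the concepts have already been extracted and disambiguated. The mapping table and function name are hypothetical, and the real classifier may weight concepts differently.

```python
from collections import Counter

# Hypothetical concept-to-domain mapping, loosely following the slide's examples.
CONCEPT_DOMAIN = {
    "allegory": "literature",
    "book review": "literature",
    "character": "literature",
    "United States Army": "military",
    "West Point": "military",
    "book": "publishing",
    "epic poem": "poetry",
}


def classify(concepts):
    """Return {domain: percentage of known concepts}, largest share first."""
    counts = Counter(CONCEPT_DOMAIN[c] for c in concepts if c in CONCEPT_DOMAIN)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {domain: round(100 * n / total, 1) for domain, n in counts.most_common()}


# Concepts as they might be found in a short text (repeats count more than once).
shares = classify(["allegory", "character", "West Point", "book", "allegory"])
```

In this example three of the five recognized concepts belong to literature, so that domain receives 60 percent, with military and publishing at 20 percent each.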
Disambiguator: Main Concepts
The main concepts included in the text are listed; the colored bar
indicates how frequently each occurs in the document analyzed.
Semantic Analysis Means Disambiguation
Disambiguation is made possible by:
• A semantic network that contains the representation of concepts and
relationships between them.
• A disambiguation engine that, based on knowledge from the semantic
network, is able to associate every textual element to the meaning it
represents.
morphological analysis
parsing
sentence analysis
semantic network [concepts, domain ontologies, places, companies, products, people]
What is a Semantic Network?
A lexical database structured by a conceptual framework.
This means structuring words in groups of synonyms and words that
are identical or similar in expressed meaning (concept).
A concept in the language is named a syncon (synonymous congressus),
which is a set of synonyms representing the same lexical concept.
For More Information
Thank You!
Expert System
www.expertsystem.net