Taxonomies & Classifications for Organizing Content

What do we know about taxonomies?
Ontology comes from the Latin ontologia, built from Greek roots:
Onto = being, that which exists
Logia = study of, discourse about
Who gets credit for taxonomies?
Aristotle is the founder of taxonomy.
•His ideas represent the foundation for object-oriented systems
•He introduced a number of inference rules (syllogisms) used
in modern logic-based reasoning systems
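The syllogism mentioned above can be sketched in a few lines. This is a minimal illustration, assuming premises of the form "all X are Y" stored as pairs; it applies Aristotle's Barbara form (all A are B, all B are C, therefore all A are C) until no new conclusions appear. The function name `barbara_closure` and the example premises are hypothetical.

```python
# Minimal sketch of repeated Barbara syllogisms over "all X are Y" premises.
# Each premise is a hypothetical pair (X, Y) meaning "all X are Y".

def barbara_closure(premises: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Return every "all X are Y" statement derivable by chaining Barbara."""
    known = set(premises)
    changed = True
    while changed:
        changed = False
        for a, b in list(known):
            for b2, c in list(known):
                # all A are B, all B are C  =>  all A are C
                if b == b2 and (a, c) not in known:
                    known.add((a, c))
                    changed = True
    return known

premises = {("Greek", "human"), ("human", "mortal")}
conclusions = barbara_closure(premises)
print(("Greek", "mortal") in conclusions)  # True
```

The same chaining step is what a modern reasoner does when it follows a class hierarchy upward.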
Why is it that in the last decade (more than 2,000 years after Aristotle)
knowledge representations & ontologies have
gained importance?
•Agent communication (Automated data mining)
•Artificial Intelligence (Cyc)
•Description of content to facilitate its retrieval (Intelligent searches)
•E-commerce (Amazon)
•E-science experiments
•E-learning systems
•Information integration (Personalized newspapers & journals)
•Intelligent devices (management of remote equipment)
•Knowledge management (Corporate Intranet)
•Speech and natural language understanding
•Web Service discovery (Mobile devices)
•Etc.: whatever humankind concocts (the Matrix)
What do all of these things have in common?
•Automated data mining
•Artificial Intelligence
•Intelligent searches
•Amazon
•E-science experiments
•E-learning systems
•Personalized newspapers & journals
•Intelligent devices
•Knowledge management
•Speech and natural language understanding
•Web Service discovery
Through the use of ONTOLOGIES, they attempt to represent
knowledge in a form that a computer can process, so that the
computer can use this knowledge in real time.
What are the ontological challenges?
•Multiple groups of people are developing different ways
to represent knowledge, and the programs they write rest on different
conceptual backgrounds:
learning theory, psychology, philosophy, logic, computer science
•Ontologies can differ depending on the needs/conventions
of the producers & the consumers of the knowledge being represented.
•The word ontology is used to describe different degrees of structure
For example, the word APPLIANCE has many different meanings:
An ontology about the domain of APPLIANCE could model:
•Household Appliances (small & major) - blenders,
espresso machines, stoves, washer/dryers, etc.
•Computer Appliances - 1U, software, virtual, etc.
•Orthodontic Appliances - braces, retainers, etc.
Domain ontologies represent concepts in very specific and often
eclectic ways, so they are often incompatible with one another. Furthermore,
different ontologies in the same domain can arise from different
perceptions of the domain based on cultural background, education, or
ideology, or because a different representation language was chosen.
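One common way to keep such incompatible senses apart is to qualify each term with the ontology it belongs to. The sketch below is hypothetical: the namespace labels (`household`, `computing`, `orthodontic`), the `Concept` class, and the helper functions are illustrative, not part of any standard; the instances come from the APPLIANCE list above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Concept:
    namespace: str  # hypothetical identifier of the ontology that defines the term
    name: str       # local name, which may be shared across ontologies

    def qualified(self) -> str:
        return f"{self.namespace}:{self.name}"

household = Concept("household", "Appliance")
computing = Concept("computing", "Appliance")
orthodontic = Concept("orthodontic", "Appliance")

# Instances of each sense, taken from the APPLIANCE examples in the text
instances = {
    household: ["blender", "espresso machine", "stove", "washer/dryer"],
    computing: ["1U appliance", "software appliance", "virtual appliance"],
    orthodontic: ["braces", "retainer"],
}

def senses(local_name: str) -> list[str]:
    """All qualified concepts that share one local name."""
    return sorted(c.qualified() for c in instances if c.name == local_name)

def domain_of(instance: str) -> list[str]:
    """Which sense(s) of the term an instance belongs to."""
    return sorted(c.qualified() for c, members in instances.items() if instance in members)

print(household == computing)  # False
print(senses("Appliance"))  # ['computing:Appliance', 'household:Appliance', 'orthodontic:Appliance']
print(domain_of("retainer"))  # ['orthodontic:Appliance']
```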
•The word ontology has been used to describe artifacts with different
degrees of structure, for example:
Simple taxonomies: YAHOO
Metadata schemes: DUBLIN CORE
Logical theories: CYC
Acronyms
AI: Artificial Intelligence
DAML: DARPA Agent Markup Language
GL: Generative Lexicon
HAC: Hierarchical Agglomerative Clustering
HTML: HyperText Markup Language
IE: Information Extraction
ILP: Inductive Logic Programming
IR: Information Retrieval
JS: Jensen-Shannon divergence
KB: Knowledge Base
KM: Knowledge Management
KR: Knowledge Representation
LSI: Latent Semantic Indexing
LSA: Latent Semantic Analysis (=LSI)
MRD: Machine Readable Dictionary
MT: Machine Translation
MUC: Message Understanding Conferences
NER: Named Entity Recognition
NLP: Natural Language Processing
NP: Noun Phrase
OIL: Ontology Inference Layer
OWL: Web Ontology Language
PLSI: Probabilistic Latent Semantic Indexing
PMI: Pointwise Mutual Information
POS: Part Of Speech
PP: Prepositional Phrases
RDF(S): Resource Description Framework (Schema)
SVMs: Support Vector Machines
VP: Verb Phrase
QA: Question Answering
UML: Unified Modeling Language
XML: eXtensible Markup Language
XML-DTD: XML Document Type Definition
WSD: Word Sense Disambiguation
Regardless of these differences, in one way or another
an ontology looks at a domain in terms of:
• Classes (general things) in the many domains of interest
• The relationships that can exist among things
• The properties (or attributes) those things may have
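The three ingredients above (classes, relationships, properties) can be modeled with a small data structure. This is a minimal sketch under stated assumptions: single inheritance only, properties inherited down the class hierarchy, and relationships stored as (domain, relation, range) triples. The class names, property names, and the `manufacturedBy` relation are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Ontology:
    parent: dict[str, Optional[str]] = field(default_factory=dict)  # class -> superclass
    properties: dict[str, set[str]] = field(default_factory=dict)   # class -> own attributes
    relations: set[tuple[str, str, str]] = field(default_factory=set)  # (domain, relation, range)

    def add_class(self, name: str, parent: Optional[str] = None, props: tuple[str, ...] = ()) -> None:
        self.parent[name] = parent
        self.properties[name] = set(props)

    def add_relation(self, domain: str, relation: str, range_: str) -> None:
        self.relations.add((domain, relation, range_))

    def ancestors(self, name: str) -> list[str]:
        """Superclasses from the nearest to the most general."""
        chain = []
        current = self.parent.get(name)
        while current is not None:
            chain.append(current)
            current = self.parent.get(current)
        return chain

    def all_properties(self, name: str) -> set[str]:
        """Own properties plus everything inherited from superclasses."""
        props = set(self.properties.get(name, set()))
        for ancestor in self.ancestors(name):
            props |= self.properties.get(ancestor, set())
        return props

onto = Ontology()
onto.add_class("Appliance", props=("brand", "model"))
onto.add_class("HouseholdAppliance", parent="Appliance", props=("wattage",))
onto.add_class("Blender", parent="HouseholdAppliance", props=("jarCapacity",))
onto.add_class("Company")
onto.add_relation("Appliance", "manufacturedBy", "Company")

print(onto.ancestors("Blender"))  # ['HouseholdAppliance', 'Appliance']
print(sorted(onto.all_properties("Blender")))  # ['brand', 'jarCapacity', 'model', 'wattage']
```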
Cyc
An AI project started in Austin, Texas, by Doug Lenat at the
Microelectronics and Computer Technology Corporation (MCC). It
attempts to assemble a comprehensive ontology and database of
everyday common-sense knowledge, with the goal of enabling AI
applications to perform human-like reasoning.
The original knowledge base is proprietary, but an open version
(OpenCyc) is now available.
WordNet
A semantic lexicon for the English language.
The purpose is twofold:
•to produce a combination of dictionary and
thesaurus that is more intuitively usable
•to support automatic text analysis and AI
applications.
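WordNet organizes words into sets of synonyms (synsets) linked by more-general relations (hypernyms), which is what lets it serve both as a dictionary/thesaurus and as a resource for text analysis. The sketch below uses a tiny hand-made lexicon in that style; the entries, synset IDs, and function names are illustrative only and are not taken from WordNet itself.

```python
# Tiny hand-made lexicon in the style of WordNet (illustrative only, not WordNet data).
# Each synset groups synonymous words and points to a more general synset (hypernym).
TOY_LEXICON = {
    "blender": {"words": {"blender", "liquidizer"}, "hypernym": "kitchen_appliance"},
    "kitchen_appliance": {"words": {"kitchen appliance"}, "hypernym": "home_appliance"},
    "home_appliance": {"words": {"home appliance", "household appliance"},
                       "hypernym": "appliance"},
    "appliance": {"words": {"appliance"}, "hypernym": None},
}

def synonyms(word: str) -> list[str]:
    """Thesaurus-style lookup: other words sharing a synset with `word`."""
    return sorted(
        other
        for synset in TOY_LEXICON.values()
        if word in synset["words"]
        for other in synset["words"]
        if other != word
    )

def hypernym_chain(synset_id: str) -> list[str]:
    """Follow hypernym links upward to the most general synset."""
    chain = []
    current = TOY_LEXICON[synset_id]["hypernym"]
    while current is not None:
        chain.append(current)
        current = TOY_LEXICON[current]["hypernym"]
    return chain

print(synonyms("household appliance"))  # ['home appliance']
print(hypernym_chain("blender"))  # ['kitchen_appliance', 'home_appliance', 'appliance']
```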
The Dublin Core
The Dublin Core Metadata Element Set is a standard for cross-domain information
resource description. It provides a simple, standardized set of
conventions for describing things online in ways that make them
easier to find. Dublin Core is widely used to describe digital materials such as:
•Video
•Sound
•Images
•Text
•Composite media like web pages.
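The Dublin Core Metadata Element Set defines fifteen elements (title, creator, subject, and so on), each optional and repeatable. A common convention for embedding them in a web page is HTML `meta` elements named with a `DC.` prefix, plus a `link` declaring the schema; the sketch below assumes that convention. The function name `dc_meta_tags` and the sample record are hypothetical.

```python
from html import escape

# The fifteen elements of the Dublin Core Metadata Element Set
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description", "format",
    "identifier", "language", "publisher", "relation", "rights", "source",
    "subject", "title", "type",
}

def dc_meta_tags(record: dict[str, list[str]]) -> str:
    """Render a record (element -> list of values) as HTML head markup."""
    unknown = set(record) - DC_ELEMENTS
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    lines = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">']
    for element in sorted(record):
        for value in record[element]:  # elements are repeatable
            lines.append(f'<meta name="DC.{element}" content="{escape(value)}">')
    return "\n".join(lines)

record = {
    "title": ["Information Architecture and Design, Fall 2007"],
    "creator": ["School of Information, The University of Texas at Austin"],
    "type": ["Text"],
    "format": ["text/html"],
    "language": ["en"],
}
print(dc_meta_tags(record))
```

Rejecting unknown element names (for example `author` instead of `creator`) keeps records consistent across collections, which is the point of a shared element set.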
Suggested Upper Merged Ontology or SUMO
SUMO was originally developed by the Teknowledge Corporation and is now maintained by
Articulate Software.
SUMO originally concerned itself with meta-level concepts, which would lead
naturally to a categorization scheme for encyclopedias. It has since been considerably
expanded to include a mid-level ontology and dozens of domain ontologies. SUMO
was first released in December 2000.
Web Ontology Language
or OWL
The W3C defined OWL as a language for writing ontologies that can be used across domains and applications such as:
•Agent communication
•Artificial Intelligence
•Description of content to facilitate its retrieval
•Ecommerce
•E-science experiments
•E-learning systems
•Information integration
•Intelligent devices
•Knowledge management
•Speech and natural language understanding
•Web Service discovery
The General Formal Ontology
(GFO)
Developed by Heinrich Herre, Barbara Heller and collaborators (research
group Onto-Med in Leipzig).
Primarily, the ontology GFO:
•includes objects as well as processes and integrates both into one coherent
system
•includes levels of reality
•is designed to support interoperability by principles of ontological mapping and
reduction
•contains several novel ontological modules, in particular a module for functions
and a module for roles
•is designed for applications, firstly in medical, biological, and biomedical areas,
but also in the fields of economics and sociology.
EXAMPLES of ONTOLOGIES IN ACTION
Web Portals: a portal can define an ontology for its community
An ontology for an information science portal might include the terms
"journal paper," "publication," "person," and "author." This ontology
could include definitions that state things such as "all journal papers
are publications" or "the authors of all publications are people." When
combined with facts, these definitions allow other facts that are
necessarily true to be inferred. These inferences can, in turn, allow
users to obtain search results from the portal that are impossible to
obtain from conventional retrieval systems. Such a technique relies on
content providers using the Web Ontology Language to capture high-quality ontology relationships.
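The portal inference can be sketched as forward chaining over (subject, predicate, object) facts. This is a minimal illustration, not OWL reasoning: the two definitions from the example are hard-coded as rules, and the identifiers `paper42` and `jsmith` are hypothetical.

```python
# Hypothetical encoding of the two portal definitions:
#   1. all journal papers are publications        (subclass axiom)
#   2. the authors of all publications are people  (rule)
SUBCLASS_OF = {("JournalPaper", "Publication")}

Triple = tuple[str, str, str]

def infer(facts: set[Triple]) -> set[Triple]:
    """Apply both definitions repeatedly until no new facts appear."""
    known = set(facts)
    while True:
        new: set[Triple] = set()
        for s, p, o in known:
            if p == "type":
                for sub, sup in SUBCLASS_OF:
                    if o == sub:
                        new.add((s, "type", sup))
            if p == "author" and (s, "type", "Publication") in known:
                new.add((o, "type", "Person"))
        if new <= known:  # fixed point reached
            return known
        known |= new

facts = {("paper42", "type", "JournalPaper"), ("paper42", "author", "jsmith")}
closed = infer(facts)
people = sorted(s for s, p, o in closed if p == "type" and o == "Person")
print(people)  # ['jsmith']
```

The final query finds `jsmith` as a person even though no stored fact says so directly, which is the kind of result a plain keyword match over the facts could not return.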
EXAMPLES of ONTOLOGIES IN ACTION
Multimedia Collection
If an indexer selects the value "Late Georgian" for the style/period of an antique
chest of drawers, it should be possible to infer that the data element
"date.created" should have a value between 1760 and 1811 A.D. and that the
"culture" is British. Availability of this type of background knowledge significantly
increases the support that can be given for indexing as well as for search.
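The Late Georgian inference can be sketched as a lookup table of background knowledge that fills in implied values and checks entered ones. A minimal sketch, assuming records are plain dictionaries keyed by element names such as `date.created` and `culture`; the function names and record layout are hypothetical, while the dates and culture are the ones stated above.

```python
# Hypothetical background knowledge keyed by style/period (values from the text).
STYLE_KNOWLEDGE = {
    "Late Georgian": {"date.created": (1760, 1811), "culture": "British"},
}

def enrich(record: dict) -> dict:
    """Return a copy of an index record with values implied by its style/period.

    Values the indexer already entered are never overwritten.
    """
    enriched = dict(record)
    implied = STYLE_KNOWLEDGE.get(record.get("style/period"), {})
    for element, value in implied.items():
        enriched.setdefault(element, value)
    return enriched

def year_is_plausible(record: dict, year: int) -> bool:
    """Check an entered year against the range implied by the style/period."""
    implied = STYLE_KNOWLEDGE.get(record.get("style/period"))
    if implied is None:
        return True  # no background knowledge, so nothing to contradict
    low, high = implied["date.created"]
    return low <= year <= high

chest = {"object": "chest of drawers", "style/period": "Late Georgian"}
print(enrich(chest)["culture"])        # British
print(year_is_plausible(chest, 1785))  # True
print(year_is_plausible(chest, 1850))  # False
```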
Another feature that could be useful is support for the representation of default
knowledge. An example of such knowledge would be that a "Late Georgian chest
of drawers," in the absence of other information, would be assumed to be made
of mahogany. This knowledge is crucial for real semantic queries, e.g. a user
query for "antique mahogany storage furniture" could match with images of Late
Georgian chests of drawers, even if nothing is said about wood type in the image
annotation.
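Default knowledge differs from ordinary background knowledge in that it holds only "in the absence of other information": an explicit annotation must override it. The sketch below assumes hypothetical image annotations and a single default (Late Georgian chests of drawers are mahogany); for brevity it matches only on wood type and leaves "antique" and "storage furniture" out.

```python
# Hypothetical default: a Late Georgian chest of drawers is assumed to be
# mahogany unless its annotation says otherwise.
DEFAULTS = {("Late Georgian", "chest of drawers"): {"material": "mahogany"}}

def effective_value(annotation: dict, element: str):
    """Explicit annotation wins; otherwise fall back to a default, if any."""
    if element in annotation:
        return annotation[element]
    key = (annotation.get("style/period"), annotation.get("object"))
    return DEFAULTS.get(key, {}).get(element)

images = [
    {"id": "img1", "object": "chest of drawers", "style/period": "Late Georgian"},
    {"id": "img2", "object": "chest of drawers", "style/period": "Late Georgian",
     "material": "oak"},  # explicit value overrides the default
    {"id": "img3", "object": "bookcase", "style/period": "Victorian"},
]

hits = [img["id"] for img in images if effective_value(img, "material") == "mahogany"]
print(hits)  # ['img1']
```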
EXAMPLES of ONTOLOGIES IN ACTION
Corporate Website Management
An ontology-enabled web site may be used by:
•A salesperson looking for sales collateral relevant to a sales pursuit
• A technical person looking for pockets of specific technical expertise and
detailed past experience
• A project leader looking for past experience and templates to support a
complex, multi-phase project, both during the proposal phase and during
execution
A typical problem for each of these types of users is that they may not share
terminology with the authors of the desired content. The salesperson may not know
the technical name for a desired feature or technical people in different fields might
use different terms for the same concept. For such problems, it would be useful for
each class of user to have different ontologies of terms, but have each ontology
interrelated so translations can be performed automatically.
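The automatic translation described above can be sketched by mapping each group's terms to shared concept identifiers and translating through them. The vocabularies, concept IDs, and the `translate` function below are hypothetical.

```python
# Hypothetical vocabularies: each user group's terms map to shared concept IDs.
SALES_TERMS = {
    "fast charging": "C1",
    "weatherproof": "C2",
}
ENGINEERING_TERMS = {
    "rapid charge mode": "C1",
    "ingress protection rating": "C2",
}

def translate(term: str, source: dict[str, str], target: dict[str, str]) -> list[str]:
    """Translate a term between vocabularies via the shared concept it maps to."""
    concept = source.get(term)
    if concept is None:
        return []  # term unknown in the source vocabulary
    return sorted(t for t, c in target.items() if c == concept)

print(translate("fast charging", SALES_TERMS, ENGINEERING_TERMS))  # ['rapid charge mode']
print(translate("ingress protection rating", ENGINEERING_TERMS, SALES_TERMS))  # ['weatherproof']
```

Routing every vocabulary through shared concepts means each new user group needs one mapping table rather than a separate table for every other group.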
Moving from the World Wide Web to the Semantic Web
Ontologies figure prominently in the emerging Semantic Web
as a way of representing the semantics of documents and
enabling the semantics to be used by web applications and
intelligent agents.
There are studies on generalized techniques for merging
ontologies, but this area of research is still largely theoretical.
Information versus Knowledge
The World Wide Web is based mainly on documents written in
Hypertext Markup Language (HTML).
When you enter a search query such as "Information Architecture and Design - Fall 2007 and UT Austin",
the search engine is programmed to pull relevant documents using an
algorithm that weighs factors relevant to your query words:
•number of keywords in the page
•names of images
•number of hyperlinks entering and exiting the page
•etc.
(Slide diagram, partially recoverable. Language: expandable, language independent, machine understandable, understood by humans. Knowledge: changes rapidly, may be local to an entity. Also labeled: ambiguous.)
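The keyword-count factor listed above can be sketched to show why such ranking has no real understanding. This is a deliberately naive illustration, not how any actual search engine ranks pages: it ignores image names and hyperlinks and simply counts query words. The page texts and function names are hypothetical.

```python
import re
from collections import Counter

def tokens(text: str) -> list[str]:
    """Lowercase words and numbers; no stemming, no synonyms, no meaning."""
    return re.findall(r"[a-z0-9]+", text.lower())

def keyword_score(page_text: str, query: str) -> int:
    """Count how often each distinct query word occurs in the page."""
    counts = Counter(tokens(page_text))
    return sum(counts[word] for word in set(tokens(query)))

# Hypothetical page texts
pages = {
    "course": "Information Architecture and Design, Fall 2007, "
              "School of Information, UT Austin",
    "unrelated": "Design design design design design design, fall fall fall, "
                 "Austin Austin: fashion tips",
}
query = "Information Architecture and Design Fall 2007 UT Austin"

ranking = sorted(pages, key=lambda name: keyword_score(pages[name], query), reverse=True)
print(ranking)  # ['unrelated', 'course']
```

The keyword-stuffed page outranks the actual course page, because counting words says nothing about what a page is about.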
Information versus Knowledge
The search program pulls the INFORMATION of the page. It has no real understanding: NO KNOWLEDGE.
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>Information Architecture and Design Fall 2007</title>
<meta name="keywords" content="Information Architecture, Information Design,
Information Architecture and Design, School of Information, Web Information Seeking, Web Site Design, Information Seeking,
Information Retrieval, Fall 2006">
<meta name="description" content="Course Web Site: Information Architecture and
Design, Fall 2006">
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link href="Web_files/iabeta.css" rel="stylesheet" type="text/css">
<link rel="stylesheet" type="text/css" href="Web_files/iaprint.css" media="print">
</head>
<body>
<!--Begin page header logo and search box. This header contains an image file that will change with each class.-->
<div id="headerlogo">
School of Information, The University of Texas at Austin<br>
<span class="logoL2">385E Information Architecture and Design l</span><br>
<p class="logoL1">
<span class="logoL3">Fall 2007</span></p></div><div id="headersearch" class="noprint">
<form method="get" action="http://www.google.com/univ/utexas"><input name="q" size="30" maxlength="255" value="" align="top" type="text"><br>
<input name="btnG" value="Search" align="center" type="submit"> <a href="http://www.google.com"><img src="Web_files/GoogleLogo.gif" border="0" height="27" width="64"></a></form>
</div><!--Begin top navigation including primary (folder) and secondary nav subline. --><ul id="topnavfolders" class="noprint"><li><a href="index.html" class="selected">Overview</a></li>
<li><a href="policies.html">Policies</a></li><li><a href="schedule.html">Schedule</a></li><li><a href="assignments.html">Assignments</a></li><li><a href="resources.html">Resources</a></li>
</ul><div id="topnavsub" class="noprint"><a href="#1" class="overview">General Info</a>&nbsp;<a href="#2" class="overview">Description</a>&nbsp;<a href="#3"
class="overview">Objectives</a>&nbsp;<a href="#4" class="overview">Textbooks</a>&nbsp;<a href="#5" class="overview">Mailing List</a>&nbsp;</div><div id="content"><a
name="1"></a><h1>General Information:</h1><p>Instructor: A. Fleming Seay, PhD <br>Email: <a href="mailto:[email protected]">[email protected]</a><br>Phone: (412) 3341682<br>Office Hours: by appointment</p><p>Class Meeting Time: Tuesday 6:30&ndash;9:30pm <br>Classroom: SZB 546<br>Course Website: <a
href="http://www.ischool.utexas.edu/%7Ei385e/index.html">http://www.ischool.utexas.edu/~i385e</a><br>TA: Jade Anderson<br>
Email: <a href="mailto:jade@ischool.utexas.edu">[email protected]</a>
Information versus Knowledge
FACTS - what exists on the Web at the present time
INTERPRETATION OF FACTS
in light of:
•Truths
•Beliefs
•Perspectives
•Judgments
•Methodologies
•Know-how
ontology
=
A core glossary is a simple glossary or defining dictionary that enables the
definition of other concepts, especially for newcomers to a language or field
of study. It contains a small working vocabulary and definitions for important
or frequently encountered concepts, usually including idioms or metaphors
useful in a culture. In computer science, a core glossary is a prerequisite
to a core ontology; SUMO is an example. The search engine Google provides a
service to search only web pages belonging to a glossary, thereby providing
access to a kind of compound glossary of glossary entries found on the web.
An upper ontology (or foundation ontology) is a model of the common objects that are generally applicable
across a wide range of domain ontologies. It contains a core glossary in whose terms objects in a set of domains
can be described. There are several standardized upper ontologies available for use, including Dublin Core,
GFO, OpenCyc/ResearchCyc, SUMO, and DOLCE.
WordNet, while considered an upper ontology by some, is not strictly an ontology: it is a unique combination of a
taxonomy and a controlled vocabulary.
RDF (XML-based syntax)
RDFS (RDF Schema)
OWL (Web Ontology Language)
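The RDF, RDFS, OWL layering can be sketched by writing a few statements as N-Triples, a line-based RDF syntax that is an alternative to the XML syntax mentioned above. The `rdf:`, `rdfs:`, and `owl:` IRIs are the standard W3C namespaces; the `example.org` namespace, the class names, and the `ntriple` helper are hypothetical.

```python
# Standard W3C namespace IRIs
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
EX = "http://example.org/portal#"  # hypothetical namespace

def ntriple(subject: str, predicate: str, obj: str) -> str:
    """Format one statement whose three parts are all IRIs as an N-Triples line."""
    return f"<{subject}> <{predicate}> <{obj}> ."

statements = [
    (EX + "Publication", RDF + "type", OWL + "Class"),               # OWL: declare a class
    (EX + "JournalPaper", RDF + "type", OWL + "Class"),
    (EX + "JournalPaper", RDFS + "subClassOf", EX + "Publication"),  # RDFS: class hierarchy
    (EX + "paper42", RDF + "type", EX + "JournalPaper"),             # RDF: plain fact
]
document = "\n".join(ntriple(*s) for s in statements)
print(document)
```

Each layer reuses the one below it: every statement is an RDF triple, RDFS supplies the vocabulary for hierarchies, and OWL adds richer class definitions on top.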