Lecture 16: Knowledge Representation
SIMS 202: Information Organization and Retrieval
Prof. Ray Larson & Prof. Marc Davis
UC Berkeley SIMS
Tuesday and Thursday 10:30 am - 12:00 pm
Fall 2004
Credits to Marti Hearst and Warren Sack for some of the slides in this lecture
IS 202 - FALL 2004
2004.10.21 - SLIDE 1
Agenda
• Review of Last Time
• Knowledge Representation
– The Vocabulary Problem
– Commonsense
– CYC
• Discussion Questions
• Action Items for Next Time
Categorization
• Processes of categorization are fundamental to
human cognition
• Categorization is messier than our computer
systems would like
• Human categorization is characterized by
– Family resemblances
– Prototypes
– Basic-level categories
• Considering how human categorization functions
is important in the design of information
organization and retrieval systems
Categorization
• Classical categorization
– Necessary and sufficient conditions for
membership
– Generic-to-specific monohierarchical structure
• Modern categorization
– Characteristic features (family resemblances)
– Centrality/typicality (prototypes)
– Basic-level categories
Properties of Categorization
• Family Resemblance
– Members of a category may be related to one
another without all members having any
property in common
• Prototypes
– Some members of a category may be “better
examples” than others, i.e., “prototypical”
members
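A minimal sketch of the two properties above, assuming a toy "bird" category whose members and features are invented for illustration (they are not from the readings). Typicality is approximated here as average Jaccard feature overlap with the other members, which is only one of several ways prototype effects could be modeled.

```python
# Hypothetical feature sets for a toy category (illustrative data only).
BIRDS = {
    "robin": {"flies", "sings", "small", "builds_nests"},
    "sparrow": {"flies", "sings", "small"},
    "eagle": {"flies", "builds_nests", "large", "hunts"},
    "penguin": {"swims", "large", "hunts"},
}


def shared_by_all(category):
    """Features every member has, i.e. what a classical definition would need."""
    return set.intersection(*category.values())


def typicality(member, category):
    """Average Jaccard overlap between one member and each other member."""
    feats = category[member]
    others = [name for name in category if name != member]
    scores = [
        len(feats & category[o]) / len(feats | category[o]) for o in others
    ]
    return sum(scores) / len(scores)


def prototype(category):
    """The member with the highest typicality score (the 'best example')."""
    return max(category, key=lambda m: typicality(m, category))


if __name__ == "__main__":
    # Family resemblance: no single feature is common to all members,
    # yet each member overlaps with at least one other member.
    print(shared_by_all(BIRDS))
    for name in BIRDS:
        print(name, round(typicality(name, BIRDS), 3))
    print("prototype:", prototype(BIRDS))
```

With this toy data no feature is shared by every member, and the members still receive graded typicality scores, so the category has a "best example" without having a necessary-and-sufficient definition.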
Basic-Level Categorization
• Perception
– Overall perceived shape
– Single mental image
– Fast identification
• Function
– General motor program
• Communication
– Shortest, most commonly used and contextually neutral words
– First learned by children
• Knowledge Organization
– Most attributes of category members stored at this level
– Tends to be in the “middle” of a classification hierarchy
Information Hierarchy
Wisdom
Knowledge
Information
Data
Today’s Thinkers/Tinkerers
George Furnas
http://www.si.umich.edu/~furnas/
Marvin Minsky
http://web.media.mit.edu/~minsky/
Doug Lenat
http://www.cyc.com/staff.html
The Birth of AI
• Rockefeller-sponsored Institute at
Dartmouth College, Summer 1956
– John McCarthy, Dartmouth (->MIT->Stanford)
– Marvin Minsky, MIT (geometry)
– Herbert Simon, CMU (logic)
– Allen Newell, CMU (logic)
– Arthur Samuel, IBM (checkers)
– Alex Bernstein, IBM (chess)
– Nathan Rochester, IBM (neural networks)
– Etc.
Definition of AI
“... artificial intelligence [AI] is the science of
making machines do things that would
require intelligence if done by [humans]”
(Minsky, 1963)
The Goals of AI Are Not New
• Ancient Greece
– Daedalus’ automata
• Judaism’s myth of the Golem
• 18th century automata
– Singing, dancing, playing chess?
• Mechanical metaphors for mind
– Clock
– Telegraph/telephone network
– Computer
Some Areas of AI
• Knowledge representation
• Programming languages
• Natural language understanding
• Speech understanding
• Vision
• Robotics
• Planning
• Machine learning
• Expert systems
• Qualitative simulation
AI or IA?
• Artificial Intelligence (AI)
– Make machines as smart as (or smarter than)
people
• Intelligence Amplification (IA)
– Use machines to make people smarter
Furnas: The Vocabulary Problem
• People use different words to describe the
same things
– “If one person assigns the name of an item,
other untutored people will fail to access it on
80 to 90 percent of their attempts.”
– “Simply stated, the data tell us there is no one
good access term for most objects.”
The Vocabulary Problem
• How is it that we come to understand each
other?
– Shared context
– Dialogue
• How can machines come to understand
what we say?
– Shared context?
– Dialogue?
Vocabulary Problem Solutions?
• Furnas et al.
– Make the user memorize precise system
meanings
– Have the user and system interact to identify
the precise referent
– Provide infinite aliases to objects
• Minsky and Lenat
– Give the system “commonsense” so it can
understand what the user’s words can mean
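A minimal sketch of the aliasing idea from the Furnas et al. list above, assuming a simple in-memory index. The class name AliasIndex, its methods, and the sample commands are hypothetical and chosen only for illustration; they do not come from the paper.

```python
from collections import defaultdict


class AliasIndex:
    """Maps many user-supplied names onto the same target object."""

    def __init__(self):
        # normalized alias -> set of targets it may refer to
        self._targets = defaultdict(set)

    @staticmethod
    def _normalize(term):
        # Lowercase and collapse whitespace so trivial variants match.
        return " ".join(term.lower().split())

    def add_alias(self, term, target):
        self._targets[self._normalize(term)].add(target)

    def lookup(self, term):
        """Return every target registered under this alias, sorted."""
        return sorted(self._targets.get(self._normalize(term), set()))


# Illustrative data: several words people might use for two commands.
index = AliasIndex()
for word in ("delete", "remove", "erase"):
    index.add_alias(word, "rm")
for word in ("remove", "uninstall"):
    index.add_alias(word, "uninstall")

if __name__ == "__main__":
    print(index.lookup("Remove"))   # an alias shared by two targets
    print(index.lookup("destroy"))  # an alias nobody registered
```

Note that one alias can point at several objects. In that case the index narrows the candidates, and the user and system would still need to interact to identify the precise referent.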
Lenat on the Vocabulary Problem
• “The important point is that users will be
able to find information without having to
be familiar with the precise way the
information is stored, either through field
names or by knowing which databases
exist, and can be tapped.”
Minsky on the Vocabulary Problem
• “To make our computers easier to use, we
must make them more sensitive to our
needs. That is, make them understand
what we mean when we try to tell them
what we want. […] If we want our
computers to understand us, we’ll need to
equip them with adequate knowledge.”
Commonsense
• Commonsense is background knowledge
that enables us to understand, act, and
communicate
• Things that most children know
• Minsky on commonsense:
– “Much of our commonsense knowledge
information has never been recorded at all
because it has always seemed so obvious we
never thought of describing it.”
Commonsense Example
• “I want to get inexpensive dog food.”
• The food is not made out of dogs.
• The food is not for me to eat.
• Dogs cannot buy their own food.
• I am not asking to be given dog food.
• I am not saying that I want to understand why some dog food is inexpensive.
• The dog food is not more than $5 per can.
Engineering Commonsense
• Use multiple ways to represent knowledge
• Acquire huge amounts of that knowledge
• Find commonsense ways to reason with it
(“knowledge about how to think”)
Multiple Representations
• Minsky
– “I think this is what brains do instead: Find several
ways to represent each problem and to represent the
required knowledge. Then when one method fails to
solve a problem, you can quickly switch to another
description.”
• Furnas
– “But regardless of the number of commands or
objects in a system and whatever the choice of their
‘official’ names, the designer must make many, many
alternative verbal access routes to each.”
CYC
• Decades-long effort to build a
commonsense knowledge base
• Storied past
• 100,000 basic concepts
• 1,000,000 assertions about the world
• The validity of Cyc’s assertions is
context-dependent (default reasoning)
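A minimal sketch of what context-dependent validity with defaults can look like, assuming a tree of contexts in which a more specific context may override what it would otherwise inherit. The class ContextKB, the context names, and the statements are hypothetical; this illustrates the general idea only and is not Cyc's actual representation or inference engine.

```python
class ContextKB:
    """Toy knowledge base: assertions hold relative to a context."""

    def __init__(self):
        self._parents = {}  # context name -> more general context (or None)
        self._facts = {}    # (context, statement) -> True / False

    def add_context(self, name, parent=None):
        self._parents[name] = parent

    def assert_fact(self, context, statement, value=True):
        self._facts[(context, statement)] = value

    def holds(self, context, statement):
        """Most specific context wins; otherwise inherit the default.

        Returns True or False, or None when nothing is known.
        """
        while context is not None:
            key = (context, statement)
            if key in self._facts:
                return self._facts[key]
            context = self._parents.get(context)
        return None


# Illustrative contexts and assertions (assumptions, not Cyc content).
kb = ContextKB()
kb.add_context("RealWorld")
kb.add_context("VampireNovel", parent="RealWorld")
kb.assert_fact("RealWorld", "people need to eat", True)
kb.assert_fact("RealWorld", "vampires exist", False)
kb.assert_fact("VampireNovel", "vampires exist", True)  # local override

if __name__ == "__main__":
    print(kb.holds("RealWorld", "vampires exist"))
    print(kb.holds("VampireNovel", "vampires exist"))
    print(kb.holds("VampireNovel", "people need to eat"))  # inherited default
```

The same statement gets different answers depending on the context asked about, and anything not overridden locally falls back to the more general context.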
Cyc Examples
• Cyc can find the match between a user's query for
"pictures of strong, adventurous people" and an
image whose caption reads simply "a man climbing
a cliff"
• Cyc can notice if an annual salary and an hourly
salary are inadvertently being added together in a
spreadsheet
• Cyc can combine information from multiple
databases to guess which physicians in practice
together had been classmates in medical school
• When someone searches for "Bolivia" on the Web,
Cyc knows not to offer a follow-up question like
"Where can I get free Bolivia online?"
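A minimal sketch of how the captioned-image example above could work in principle, assuming a tiny hand-written table of background assertions. The IMPLIES table, the stopword list, and the function names are hypothetical, and the query is assumed to be already normalized to concepts ("people" to "person"). This is not how Cyc is implemented; it only shows why background knowledge lets a query match a caption with which it shares no words.

```python
# Hypothetical background assertions: concept -> concepts it implies.
IMPLIES = {
    "man": {"person"},
    "climbing": {"strong", "adventurous", "outdoors"},
    "cliff": {"outdoors"},
}

STOPWORDS = {"a", "an", "the", "of", "and"}


def tokens(text):
    """Lowercased content words of a caption, punctuation stripped."""
    words = (w.strip(",.\"'").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}


def expand(concepts):
    """Transitive closure of IMPLIES over a set of concepts."""
    closed = set(concepts)
    frontier = list(concepts)
    while frontier:
        concept = frontier.pop()
        for implied in IMPLIES.get(concept, ()):
            if implied not in closed:
                closed.add(implied)
                frontier.append(implied)
    return closed


def matches(query_concepts, caption):
    """True if every query concept is stated or implied by the caption."""
    return set(query_concepts) <= expand(tokens(caption))


if __name__ == "__main__":
    query = {"strong", "adventurous", "person"}
    caption = "a man climbing a cliff"
    print(query <= tokens(caption))   # plain keyword match
    print(matches(query, caption))    # match after commonsense expansion
```

Plain keyword matching fails here because the query and caption share no terms; the match succeeds only once the caption is expanded with what its words imply.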
Cyc Applications
• Applications currently available or in development
– Integration of Heterogeneous Databases
– Knowledge-Enhanced Retrieval of Captioned Information
– Guided Integration of Structured Terminology (GIST)
– Distributed AI
– WWW Information Retrieval
• Potential applications
– Online brokering of goods and services
– "Smart" interfaces
– Intelligent character simulation for games
– Enhanced virtual reality
– Improved machine translation
– Improved speech recognition
– Sophisticated user modeling
– Semantic data mining
Cyc’s Top-Level Ontology
• Fundamentals
• Top Level
• Time and Dates
• Types of Predicates
• Spatial Relations
• Quantities
• Mathematics
• Contexts
• Groups
• "Doing"
• Transformations
• Changes Of State
• Transfer Of Possession
• Movement
• Parts of Objects
• Composition of Substances
• Agents
• Organizations
• Actors
• Roles
• Professions
• Emotion
• Propositional Attitudes
• Social
• Biology
• Chemistry
• Physiology
• General Medicine
• Materials
• Waves
• Devices
• Construction
• Financial
• Food
• Clothing
• Weather
• Geography
• Transportation
• Information
• Perception
• Agreements
• Linguistic Terms
• Documentation
http://www.cyc.com/cyc-2-1/toc.html
OpenCYC
• Cyc’s knowledge-base is now coming
online
– http://www.opencyc.org/
• How could Cyc’s knowledge-base affect
the design of information organization and
retrieval systems?
Web KR Resources
• OpenCYC
– http://www.opencyc.org/
• OpenMind
– http://commonsense.media.mit.edu
• beingmeta
– http://www.beingmeta.com/technology.fdxml
• Semantic Web
– http://www.w3.org/2001/sw/
Discussion Questions (Furnas)
• Steve Chan on Furnas
– The Furnas results indicating the problems of
word selection would seem to be related to
the motivations behind IR systems that
support relevance feedback, as well as IR
systems that support search term synonyms;
namely, a user's search terms may not clearly
identify the desired objects. Of the two IR
approaches, which one seems closer to the
approach suggested by Furnas?
Discussion Questions (Furnas)
• Steve Chan on Furnas
– The Furnas experiments used only a small
number of target objects, but allowed a large
number of aliases. We saw in classical IR
systems that search methods that worked well
on small collections would often have
problems on larger collections. Do you
believe the aliasing would work well for larger
collections of target objects? What kinds of
applications might you want to use unlimited
aliasing for, and how do they differ from the
typical IR document retrieval system?
Discussion Questions (Lenat)
• Rupa Patel on Lenat
– Can common-sense databases like CYC help
solve Furnas's problem of vocabulary usage
in systems design?
– How can common-sense knowledge bases
lend insight into natural language
ambiguities?
Discussion Questions (Lenat)
• Rupa Patel on Lenat
– In CYC, human “knowledge enterers” are
responsible for adding and editing atomic
terms, assertions of reason, and
contexts. The assertions can be related to
one another, and each holds true only in
certain contexts.
– Based on your understanding of CYC, which
categorization effects are utilized in the
construction of the contexts: prototype effects,
classical categorization theory, or polysemy?
Discussion Questions (Minsky)
• Andrew Fiore on Minsky
– Minsky's claims about how the mind works
are not supported by cognitive psychology. In
what other useful ways might we view his
theories? As philosophy? Merely as history?
Discussion Questions (Minsky)
• Andrew Fiore on Minsky
– Humans clearly use a great deal of commonsense information, and although upon
demand we can express some of this
knowledge in terms of rules, we do not move
through the world logically applying one rule
after another. (The cognitive burden would
overwhelm.) Why, then, represent a commonsense knowledge base in terms of rules?
Discussion Questions (Minsky)
• Andrew Fiore on Minsky
– What are the benefits and deficits of this
approach compared with a connectionist or
associative model of the mind? (Efficiency,
effectiveness, model legibility, external
validity...)
Assignment 0 Check-In
• Suggested deliverables
– SIMS email address
– Focus statement
– SIMS web site
– SIMS coursework page
Next Time
• Lexical Relations and WordNet (RRL)
Homework (!)
• Course Reader
– Word Association Norms, Mutual Information,
and Lexicography (Church, Kenneth and
Hanks, Patrick)
– WordNet: An Electronic Lexical Database, Introduction & Ch. 1 (C. Fellbaum, G.A. Miller)