Knowledge Representation

Lecture 04: Knowledge Representation
SIMS 202:
Information Organization
and Retrieval
Prof. Ray Larson & Prof. Marc Davis
UC Berkeley SIMS
Tuesday and Thursday 10:30 am - 12:00 pm
Fall 2003
Credits to Warren Sack for some of the slides in this lecture
IS 202 - FALL 2003
2003.09.04 - SLIDE 1
Today
• Review of Categorization
• Knowledge Representation
– The Vocabulary Problem
– Commonsense
– Cyc
• Discussion Questions
• Phone Project Overview and Assignment 2
• Action Items for Next Time
2003.09.04 - SLIDE 2
Categorization
• Processes of categorization are fundamental to
human cognition
• Categorization is messier than our computer
systems would like
• Human categorization is characterized by
– Family resemblances
– Prototypes
– Basic-level categories
• Considering how human categorization functions
is important in the design of information
organization and retrieval systems
2003.09.04 - SLIDE 4
Categorization
• Classical categorization
– Necessary and sufficient conditions for
membership
– Generic-to-specific monohierarchical structure
• Modern categorization
– Characteristic features (family resemblances)
– Centrality/typicality (prototypes)
– Basic-level categories
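The contrast between the two views above can be sketched in a few lines of Python. This is only an illustration: the feature names, the "bird" category, and the use of Jaccard overlap as a typicality score are assumptions made for the example, not something taken from the lecture or the readings.

```python
# Hypothetical feature sets; names and values are illustrative assumptions.
BIRD_REQUIRED = {"has_feathers", "lays_eggs", "flies"}
BIRD_PROTOTYPE = {"has_feathers", "lays_eggs", "flies", "sings", "small"}


def classical_member(features: set[str]) -> bool:
    """Classical view: every required feature must be present (all or nothing)."""
    return BIRD_REQUIRED <= features


def typicality(features: set[str]) -> float:
    """Prototype view: graded membership, here the Jaccard overlap with a prototype."""
    return len(features & BIRD_PROTOTYPE) / len(features | BIRD_PROTOTYPE)


robin = {"has_feathers", "lays_eggs", "flies", "sings", "small"}
penguin = {"has_feathers", "lays_eggs", "swims", "large"}

for name, features in (("robin", robin), ("penguin", penguin)):
    print(name, classical_member(features), round(typicality(features), 2))
```

Under these assumed features the penguin fails the classical test outright, while the prototype score still treats it as a (less typical) member of the category.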
2003.09.04 - SLIDE 5
Properties of Categorization
• Family Resemblance
– Members of a category may be related to one
another without all members having any
property in common
• Prototypes
– Some members of a category may be “better
examples” than others, i.e., “prototypical”
members
2003.09.04 - SLIDE 6
Basic-Level Categorization
• Perception
– Overall perceived shape
– Single mental image
– Fast identification
• Function
– General motor program
• Communication
– Shortest, most commonly used and contextually neutral words
– First learned by children
• Knowledge Organization
– Most attributes of category members stored at this level
– Tends to be in the “middle” of a classification hierarchy
2003.09.04 - SLIDE 7
Information Hierarchy
Wisdom
Knowledge
Information
Data
2003.09.04 - SLIDE 9
Today’s Thinkers/Tinkerers
George Furnas
http://www.si.umich.edu/~furnas/
Marvin Minsky
http://web.media.mit.edu/~minsky/
Doug Lenat
http://www.cyc.com/staff.html
2003.09.04 - SLIDE 11
The Birth of AI
• Rockefeller-sponsored Institute at
Dartmouth College, Summer 1956
– John McCarthy, Dartmouth (->MIT->Stanford)
– Marvin Minsky, MIT (geometry)
– Herbert Simon, CMU (logic)
– Allen Newell, CMU (logic)
– Arthur Samuel, IBM (checkers)
– Alex Bernstein, IBM (chess)
– Nathan Rochester, IBM (neural networks)
– Etc.
2003.09.04 - SLIDE 12
Definition of AI
“... artificial intelligence [AI] is the science of
making machines do things that would
require intelligence if done by [humans]”
(Minsky, 1963)
2003.09.04 - SLIDE 13
The Goals of AI Are Not New
• Ancient Greece
– Daedalus’ automata
• Judaism’s myth of the Golem
• 18th century automata
– Singing, dancing, playing chess?
• Mechanical metaphors for mind
– Clock
– Telegraph/telephone network
– Computer
2003.09.04 - SLIDE 14
Some Areas of AI
• Knowledge representation
• Programming languages
• Natural language understanding
• Speech understanding
• Vision
• Robotics
• Planning
• Machine learning
• Expert systems
• Qualitative simulation
2003.09.04 - SLIDE 15
AI or IA?
• Artificial Intelligence (AI)
– Make machines as smart as (or smarter than)
people
• Intelligence Amplification (IA)
– Use machines to make people smarter
2003.09.04 - SLIDE 16
Furnas: The Vocabulary Problem
• People use different words to describe the
same things
– “If one person assigns the name of an item,
other untutored people will fail to access it on
80 to 90 percent of their attempts.”
– “Simply stated, the data tell us there is no one
good access term for most objects.”
2003.09.04 - SLIDE 18
The Vocabulary Problem
• How is it that we come to understand each
other?
– Shared context
– Dialogue
• How can machines come to understand
what we say?
– Shared context?
– Dialogue?
2003.09.04 - SLIDE 19
Vocabulary Problem Solutions?
• Furnas et al.
– Make the user memorize precise system
meanings
– Have the user and system interact to identify
the precise referent
– Provide infinite aliases to objects
• Minsky and Lenat
– Give the system “commonsense” so it can
understand what the user’s words can mean
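The aliasing idea listed above can be sketched as a many-to-many index from user terms to system objects. The class name, the command and folder identifiers, and the normalization rule are all hypothetical, chosen only to illustrate the idea; this is not the design evaluated by Furnas et al.

```python
from collections import defaultdict


class AliasIndex:
    """Maps many user-supplied terms to the same object (hypothetical sketch)."""

    def __init__(self) -> None:
        self._objects_by_term: dict[str, set[str]] = defaultdict(set)

    def add_alias(self, term: str, obj: str) -> None:
        # Normalize case and surrounding whitespace so trivial variants collide.
        self._objects_by_term[term.lower().strip()].add(obj)

    def lookup(self, term: str) -> set[str]:
        # .get avoids creating empty entries for unknown terms.
        return set(self._objects_by_term.get(term.lower().strip(), set()))


index = AliasIndex()
for term in ("delete", "remove", "erase", "trash"):
    index.add_alias(term, "cmd:delete_file")
index.add_alias("trash", "folder:recycle_bin")  # one term, two referents

print(index.lookup("Erase"))
print(index.lookup("trash"))
print(index.lookup("zap"))
```

Note that "trash" returns two candidates: as aliases accumulate, a single term tends to point at several objects, which is where interaction between user and system to pick the intended referent becomes necessary.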
2003.09.04 - SLIDE 20
Lenat on the Vocabulary Problem
• “The important point is that users will be
able to find information without having to
be familiar with the precise way the
information is stored, either through field
names or by knowing which databases
exist, and can be tapped.”
2003.09.04 - SLIDE 21
Minsky on the Vocabulary Problem
• “To make our computers easier to use, we
must make them more sensitive to our
needs. That is, make them understand
what we mean when we try to tell them
what we want. […] If we want our
computers to understand us, we’ll need to
equip them with adequate knowledge.”
2003.09.04 - SLIDE 22
Commonsense
• Commonsense is background knowledge
that enables us to understand, act, and
communicate
• Things that most children know
• Minsky on commonsense:
– “Much of our commonsense knowledge
information has never been recorded at all
because it has always seemed so obvious we
never thought of describing it.”
2003.09.04 - SLIDE 24
Commonsense Example
• “I want to get inexpensive dog food.”
• The food is not made out of dogs.
• The food is not for me to eat.
• Dogs cannot buy their own food.
• I am not asking to be given dog food.
• I am not saying that I want to understand
why some dog food is inexpensive.
• The dog food is not more than $5 per can.
2003.09.04 - SLIDE 25
Engineering Commonsense
• Use multiple ways to represent knowledge
• Acquire huge amounts of that knowledge
• Find commonsense ways to reason with it
(“knowledge about how to think”)
2003.09.04 - SLIDE 26
Multiple Representations
• Minsky
– “I think this is what brains do instead: Find several
ways to represent each problem and to represent the
required knowledge. Then when one method fails to
solve a problem, you can quickly switch to another
description.”
• Furnas
– “But regardless of the number of commands or
objects in a system and whatever the choice of their
‘official’ names, the designer must make many, many
alternative verbal access routes to each.”
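One loose way to picture "switch to another description when one method fails" is a list of lookup strategies tried in turn. The catalog contents, the synonym table, and the function names below are invented for illustration; the sketch is not a model of what Minsky or Furnas actually built.

```python
from typing import Callable, Optional

Solver = Callable[[str], Optional[str]]


def exact_match(query: str) -> Optional[str]:
    """First representation: the system's single 'official' name."""
    catalog = {"dog food": "aisle 7"}  # hypothetical catalog
    return catalog.get(query)


def synonym_match(query: str) -> Optional[str]:
    """Second representation: alternative verbal access routes."""
    synonyms = {"pet chow": "dog food", "kibble": "dog food"}  # hypothetical
    canonical = synonyms.get(query)
    return exact_match(canonical) if canonical else None


def solve(query: str, solvers: list[Solver]) -> Optional[str]:
    """Try each representation in order and switch when one fails."""
    for solver in solvers:
        answer = solver(query)
        if answer is not None:
            return answer
    return None


print(solve("kibble", [exact_match, synonym_match]))
```

Here the exact-name lookup fails for "kibble" and the synonym route succeeds; a query known to neither representation simply returns None.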
2003.09.04 - SLIDE 27
CYC
• Decades-long effort to build a
commonsense knowledge-base
• Storied past
• 100,000 basic concepts
• 1,000,000 assertions about the world
• The validity of Cyc’s assertions is
context-dependent (default reasoning)
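A toy illustration of context-dependent defaults: an assertion holds in a context unless a more specific context overrides it. The context names, the claims, and the lookup scheme are assumptions made for this sketch; it is not CycL and does not describe how Cyc is actually implemented.

```python
from typing import Optional

# Hypothetical contexts and claims, for illustration only (this is not CycL).
PARENT_CONTEXT = {"DraculaFiction": "EverydayWorld"}
ASSERTIONS = {
    ("EverydayWorld", "vampires_exist"): False,
    ("EverydayWorld", "people_need_sleep"): True,
    ("DraculaFiction", "vampires_exist"): True,  # overrides the default
}


def holds(context: str, claim: str) -> Optional[bool]:
    """Return the claim's truth value in the nearest enclosing context, or None."""
    current: Optional[str] = context
    while current is not None:
        if (current, claim) in ASSERTIONS:
            return ASSERTIONS[(current, claim)]
        current = PARENT_CONTEXT.get(current)
    return None


print(holds("EverydayWorld", "vampires_exist"))
print(holds("DraculaFiction", "vampires_exist"))
print(holds("DraculaFiction", "people_need_sleep"))
```

The same claim gets different answers depending on the context in which it is asked, while claims the fiction does not override are inherited from the everyday default.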
2003.09.04 - SLIDE 29
Cyc Examples
• Cyc can find the match between a user's query for
"pictures of strong, adventurous people" and an
image whose caption reads simply "a man climbing
a cliff"
• Cyc can notice if an annual salary and an hourly
salary are inadvertently being added together in a
spreadsheet
• Cyc can combine information from multiple
databases to guess which physicians in practice
together had been classmates in medical school
• When someone searches for "Bolivia" on the Web,
Cyc knows not to offer a follow-up question like
"Where can I get free Bolivia online?"
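The spreadsheet example above (an annual and an hourly salary being added together) can be mimicked in a very small way by tagging each number with its time unit. This is only a sketch under that assumption: the Rate class is hypothetical, and Cyc itself is described as catching such errors through general knowledge rather than through explicit type tags like these.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rate:
    """A pay rate tagged with the period it refers to (hypothetical sketch)."""

    amount: float
    per: str  # e.g. "hour" or "year"

    def __add__(self, other: "Rate") -> "Rate":
        # Refuse to add quantities that are measured over different periods.
        if self.per != other.per:
            raise ValueError(
                f"cannot add a per-{self.per} rate to a per-{other.per} rate"
            )
        return Rate(self.amount + other.amount, self.per)


print(Rate(20.0, "hour") + Rate(5.0, "hour"))

try:
    Rate(50_000.0, "year") + Rate(20.0, "hour")
except ValueError as error:
    print("caught:", error)
```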
2003.09.04 - SLIDE 30
Cyc Applications
• Applications currently available or in development
– Integration of Heterogeneous Databases
– Knowledge-Enhanced Retrieval of Captioned Information
– Guided Integration of Structured Terminology (GIST)
– Distributed AI
– WWW Information Retrieval
• Potential applications
– Online brokering of goods and services
– "Smart" interfaces
– Intelligent character simulation for games
– Enhanced virtual reality
– Improved machine translation
– Improved speech recognition
– Sophisticated user modeling
– Semantic data mining
2003.09.04 - SLIDE 31
Cyc’s Top-Level Ontology
• Fundamentals
• Top Level
• Time and Dates
• Types of Predicates
• Spatial Relations
• Quantities
• Mathematics
• Contexts
• Groups
• "Doing"
• Transformations
• Changes Of State
• Transfer Of Possession
• Movement
• Parts of Objects
• Composition of Substances
• Agents
• Organizations
• Actors
• Roles
• Professions
• Emotion
• Propositional Attitudes
• Social
• Biology
• Chemistry
• Physiology
• General Medicine
• Materials
• Waves
• Devices
• Construction
• Financial
• Food
• Clothing
• Weather
• Geography
• Transportation
• Information
• Perception
• Agreements
• Linguistic Terms
• Documentation
http://www.cyc.com/cyc-2-1/toc.html
2003.09.04 - SLIDE 32
OpenCYC
• Cyc’s knowledge-base is now coming
online
– http://www.opencyc.org/
• How could Cyc’s knowledge-base affect
the design of information organization and
retrieval systems?
2003.09.04 - SLIDE 33
Discussion Questions (Furnas)
• Alison Billings & Vijay Viswanathan on
Furnas
– Are unlimited alias indexes an effective design
solution to the problem of precision in "term
based" searches? Is it possible to implement
such a system that could maintain an
accurate relation (category) to the designer’s
“armchair” term with the existence of
polysemy? Would the adaptive nature of this
solution propagate an all inclusive alias
category which could include all accessible
information in a particular index?
2003.09.04 - SLIDE 35
Discussion Questions (Furnas)
• Alison Billings & Vijay Viswanathan on Furnas
– Since the publishing of this article in 1987 the
technological advances in information retrieval in the
past 16 years have been profound. Is the
Vocabulary Problem still a major issue in Human-System Communication? Furnas, et al., provide
some solutions to the Vocabulary Problem such as
“unlimited aliasing”, “keyword harvesting”, and
“adaptive indices.” But now there are WYSIWYG
interfaces such as Windows that may reduce the
need for command line word choices, search engines
that harvest the content from web pages, or services
like Google that put out “Did you mean xxxxx?” when
search results are sparse. Has the Vocabulary
Problem been solved?
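For discussion, a "Did you mean ...?" style suggestion can be approximated with fuzzy string matching from Python's standard library. The vocabulary list and the suggest helper are assumptions for this sketch, and nothing here reflects how any real search engine implements the feature. Note that string similarity only repairs misspellings; it does nothing for the case Furnas describes, where users choose an entirely different word.

```python
import difflib
from typing import Optional

# Hypothetical index vocabulary.
VOCABULARY = ["retrieval", "representation", "categorization", "commonsense"]


def suggest(term: str, cutoff: float = 0.6) -> Optional[str]:
    """Return the closest known term by string similarity, or None."""
    matches = difflib.get_close_matches(term.lower(), VOCABULARY, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(suggest("retreival"))  # a misspelling of a known term
print(suggest("lookup"))     # a different word for a similar idea
```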
2003.09.04 - SLIDE 36
Discussion Questions (Minsky)
• Joseph Hall on Minsky
– Minsky talks a lot about commonsense. How would
you define what is within the commonsense? Do you
think that commonsense would be easy or difficult to
teach to a computer? Why? Is commonsense a cross-cultural, basic-level category in the sense of what
Lakoff described? Or is it more culturally specific (like
"Don't step in front of moving traffic.") and thus harder
to define? How would culturally-dependent definitions
of "commonsense" complicate Minsky's theory?
– Are machines that learn such a good thing? For
example, I would like my computer to learn certain
things (like how to fix common errors) but not others
(like how to play the stock market with my bank
account). Are ethics (cyber and otherwise) to be
programmed into learning computers?
2003.09.04 - SLIDE 37
Discussion Questions (Minsky)
• Joseph Hall on Minsky
– What Minsky describes is all fine and dandy... but
there seems to be a rather large gap between the
machines of today and the machines he is
postulating. To learn, machines would not only have
to be able to note (and take action) when they are
deviating from "operational parameter space"
(malfunctioning, blue screen of death, etc.) but be
able to decide on and implement a solution to the
problem at hand from a different direction and/or
using a different technique, quickly.
2003.09.04 - SLIDE 38
Discussion Questions (Minsky)
• Joseph Hall on Minsky
– Do you think that building such a
commonsense-aware machine is possible
today? (That is, is Minsky's model of a
commonsense-based machine a reasonable
*goal* or just an ideal?) If not, what are some
of the impediments to the realization of one of
Minsky's machines?
– Do user expectations (reasonable or not) of
what a computer should be doing factor into
this at all?
2003.09.04 - SLIDE 39
Discussion Questions (Lenat)
• Rebecca Shapley on Lenat
– What does this article imply for best-practices in
information organization & retrieval? How would you
articulate the potential for a commonsense
knowledgebase to revolutionize information retrieval?
Does the premise of a commonsense-base feeding
efforts at machine learning or natural language
understanding make sense to you? Which potential
applications Lenat mentions are compelling to you?
– This article is from 1995 - do we hear anything more
about this CYC? Did it revolutionize things? Why
does Minsky call for a huge commonsense
knowledgebase in 2000 when CYC was nearly
complete in 1995?
2003.09.04 - SLIDE 40
Discussion Questions (Lenat)
• Rebecca Shapley on Lenat
– How would you apply the conduit metaphor &
toolmaker's paradigms to describe, or perhaps
critique, the CYC project?
– If CYC is 'automating the whitespace in documents' - capturing the context for information, how would you
describe the context it is capturing? How would you
describe where the captured context is no longer
applicable? How do you feel about the notion that 10+
people in Palo Alto CA were able to describe your
context? Do you trust them with that task? Do you
consider it necessary that some shared automated
context be created? What challenges do you see for
their ostensible goal, or limitations do you see to their
approach?
2003.09.04 - SLIDE 41
Discussion Questions (Lenat)
• Rebecca Shapley on Lenat
– Anything in particular you can imagine yourself
unwilling to have represented a particular way in the
commonsensebase? Let's say you believe in
reincarnation but the assertions in the
commonsensebase don't leave any room for this idea,
and how to interpret what you might say to a
bereaved friend. How do you feel about the ability to
'automatically' interpret your expression being left
out? Does it make you feel invisible, relieved, angry?
What would be necessary to have it be culturally
sensitive, and would that be encodable?
2003.09.04 - SLIDE 42
Discussion Questions (Lenat)
• Rebecca Shapley on Lenat
– What can you piece together about how CYC
is implemented, how it makes decisions?
What questions do you still have about how it
works?
– Do you think the tone of the article was
influenced by the fact that Lenat was writing
as President of Cycorp?
– So, can this common-sense-base 'think'? Is it
intelligent? Why and why not?
2003.09.04 - SLIDE 43
Assignment 0 Check-In
• Deliverables
– Personal web page
– Assignments page
– Email address
– Focus statement
– Online Questionnaire
2003.09.04 - SLIDE 45
Phone Project Overview
• In this project we will be creating,
sharing, and reusing mobile media
and metadata
• You and your Project Group will
design application use scenarios
and develop and refine metadata
frameworks for your photos
• Some of you may even choose to
develop retrieval applications for
the photo database in the second
half of the course
• We will be using the Nokia 3650
mobile media phone and software
developed by Garage Cinema
Research
2003.09.04 - SLIDE 46
Phone Project Overview
• In the SIMS 202 Phone Project you and your
Project Group will
– Experience the actual process of information
organization and retrieval (especially as regards
metadata creation and use)
– Work in small, focused teams performing a variety of
tasks in image acquisition, description, and
application design
– Develop an ongoing resource for SIMS (an annotated
photo database) that can be used for internal
research and teaching, as well as for external
promotional and informational purposes
2003.09.04 - SLIDE 47
Phone Project Requirements
• Create engaging and useful application
scenarios and photos for use by your team and
the entire class
– The photos you take and the applications you will
design to use them should be interesting and useful
to you and your colleagues
• Create a shared, reusable resource of annotated
photos
– Design your metadata such that all photos are
accessible not only for the needs of your particular
application, but also for the reusability of your photos
and metadata by other applications
2003.09.04 - SLIDE 48
Phone Project Assignments
• Photo Use Scenario – Application Idea (Assignment 2)
– You will brainstorm and storyboard an application for a mobile
media device that accesses a server and facilitates the creation,
sharing, and reuse of media and metadata. You will develop
user personas and scenarios of how the application works and
how the user experiences it.
• Photo Capture and Annotation (Assignment 3)
– With the goals of your application and the overall goals of the
class project in mind, each group member is required to take at
least 5 pictures relevant to the scenario you specified in the prior
assignment. You will also get hands-on experience in annotating
photos using the Mobile Media Metadata (MMM) framework, an
application available on the mobile phones. You will also identify
strengths and weaknesses of the MMM framework.
2003.09.04 - SLIDE 49
Phone Project Assignments
• Photo Metadata Design (Assignment 4)
– Having your application and the overall
project goals in mind, you will design a
suitable metadata framework to annotate the
photos in the collection. You will also
annotate more photos using your metadata
framework.
2003.09.04 - SLIDE 50
Phone Project Assignments
• Project Presentations (Assignment 6)
– In a special class session, your group will present
your application ideas, metadata frameworks, and
annotated photos to your fellow students using the
Flamenco browser. Each group will have about 10
minutes to present their innovative work.
• Metadata Consolidation (Assignment 8)
– You will consolidate your classification scheme with
those belonging to other groups. The entire class will
collaborate to create one overall metadata framework
which will be used for Phase II of the project.
2003.09.04 - SLIDE 51
Phone Project Assignments
• Phone Project Phase II – Application Selection
(Assignment 10)
– The entire class will decide on an application to implement from
among the application ideas presented by the various project
groups as well as from among any ideas you or your Project
group have come up with.
• Phone Project Phase II – Specification & Design
(Assignment 13)
– A group of class volunteers will draft specifications and designs
for the application selected in the previous assignment.
• Phone Project Phase II – Implementation & Testing
(Assignment 14)
– A group of class volunteers will implement and test the
application selected in the previous assignment.
2003.09.04 - SLIDE 52
Assignment 2: Process
• Brainstorm application ideas
• Evaluate your ideas and agree on one to pursue
• Come up with a persona and scenario for your
application idea
• Write a description of your application idea
involving one persona and one scenario
• Draw a storyboard with explanatory text
• Document the results of your brainstorming
• Create your group website
2003.09.04 - SLIDE 53
Assignment 2: Deliverables
• Brief description of the application idea
you selected
• Persona description
• Scenario description
• Annotated storyboard
• Work distribution table
• List all brainstorming ideas and reasons
for selecting or rejecting each
2003.09.04 - SLIDE 54
Assignment 2: Turning It In
• Submit an email to [email protected] with the following
information (due September 16, before
class):
– Group name
– URL of your group website
– URL to description (application, persona,
scenario), storyboard, brainstorming results,
work distribution table
– Time it took you to complete the assignment
– Any comments on assignment (optional)
2003.09.04 - SLIDE 55
Homework (!)
• Read
– Word Association Norms, Mutual Information,
and Lexicography (Church, Kenneth and
Hanks, Patrick)
– WordNet: An Electronic Lexical Database - Introduction & Ch. 1 (C. Fellbaum, G.A. Miller)
(handout)
• Assignment 2: Photo Use Scenario
– Due by Tuesday, September 16
2003.09.04 - SLIDE 57
Next Time
• Lexical Relations and WordNet (RRL)
2003.09.04 - SLIDE 58