History of Information Search and Organization

Lecture 02: Information
IS 202:
Information Organization
and Retrieval
Prof. Ray Larson & Prof. Marc Davis
UC Berkeley SIMS
Tuesday and Thursday 10:30 am - 12:00 pm
Fall 2004
http://www.sims.berkeley.edu/academics/courses/is202/f04/
IS 202 - FALL 2004
2003.09.02 - SLIDE 1
Lecture Outline
• What Is Information?
• History of Information Search and
Organization
• Discussion Questions
• Action Items for Next Time
2003.09.02 - SLIDE 2
What is Information?
• There is no “correct” definition
• Can involve philosophy, psychology,
signal processing, physics
• Cookie Monster’s definition:
– “news or facts about something”
2003.09.02 - SLIDE 4
What is Information?
• Oxford English Dictionary
– Information
• Informing, telling; thing told, knowledge,
items of knowledge, news
– Knowledge
• Knowing; familiarity gained by experience;
person’s range of information; a theoretical
or practical understanding of; the sum of
what is known
2003.09.02 - SLIDE 5
Assignment 1 - Discussion
• What is information, according to your
background or area of expertise?
2003.09.02 - SLIDE 6
What Is Information?
2003.09.02 - SLIDE 7
Some Answers from Fall 2003
• Relating data to a context
(“situational interpretation”)
• Anything that is important to
anyone (“significance”)
• World data information
knowledge
• Requires community of
interpretation
• All information is dependent
on context
• Capable of being recorded
and stored and transmitted
(also in physical form – e.g.,
fossils)
• Information must be recorded
• Information is a record of
something that can be
reused
• Information is a commodity
• Negentropy
• Potential energy to become
knowledge
• Potential for it to be built
upon
• Does information have to be
related to “true” data?
• Can information be
downgraded to data if it is
forgotten?
2003.09.02 - SLIDE 8
Types of Information
• Differentiation by form
• Differentiation by content
• Differentiation by quality
• Differentiation by associated information
2003.09.02 - SLIDE 9
Information Properties
• Information can be communicated
electronically
– Broadcasting
– Networking
• Information can be easily duplicated and
shared
– Problems of ownership
– Problems of control
Adapted from ‘Silicon Dreams’ by Robert W. Lucky
2003.09.02 - SLIDE 10
Intuitive Notion (Losee 97)
• Information must
– Be something, although the exact nature
(substance, energy, or abstract concept) is not
clear
– Be “new”: repetition of previously received
messages is not informative
– Be “true”: false or counterfactual information
is “mis-information”
– Be “about” something
• This human-centered approach
emphasizes meaning and use of message
2003.09.02 - SLIDE 11
Information from the Human Perspective
• Levels in cognitive processing
– Perception
– Observation/attention
– Reasoning, assimilating, forming inferences
• Knowledge
– “Justified true belief”
• Belief
– An idea held based on some support; an internally
accepted statement, result of inductive processes
combining observed facts with a reasoning process
2003.09.02 - SLIDE 12
Information from the Human Perspective
• Does information require a human mind?
– Communication and information transfer
among ants
– A tree falls in the forest … is there information
there?
– Existence of quarks
2003.09.02 - SLIDE 13
Meaning vs. Form
• Form of information as the information itself
• Meaning of a signal vs. the signal itself
– What aspects of a document are information?
• Representation (Norman 93)
– Why do we write things down?
• Socrates thought writing would obliterate serious thought
• Sounds and gestures fade away
– Artifacts help us to reason
– Anything not present in the representation can be
ignored
– Things left out of the representation are often what we
don’t know how to represent
2003.09.02 - SLIDE 14
Information
• Consider Borges’ infinite Library of
Babel…
– It contains all possible data (every combination of letters)
– Does it therefore contain all possible
information?
– What about all possible knowledge?
– What about wisdom?
• Is the Internet a prototype Library of
Babel?
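The size of "all possible combinations of letters" can be estimated with a little arithmetic. The sketch below is only illustrative: the figures (25 orthographic symbols, 410 pages per book, 40 lines per page, about 80 characters per line) come from Borges's story rather than from this slide, and the function name is made up for the example.

```python
import math

# Figures taken from Borges's story (assumptions, not from the slide):
# 25 orthographic symbols; each book has 410 pages of 40 lines of ~80 characters.
SYMBOLS = 25
CHARS_PER_BOOK = 410 * 40 * 80


def decimal_digits_in_book_count(symbols: int = SYMBOLS,
                                 length: int = CHARS_PER_BOOK) -> int:
    """Count the decimal digits of symbols ** length without building the integer."""
    return math.floor(length * math.log10(symbols)) + 1


digits = decimal_digits_in_book_count()
print(f"Characters per book: {CHARS_PER_BOOK:,}")
print(f"Distinct books: 25^{CHARS_PER_BOOK:,}, a number with {digits:,} decimal digits")
```

The count is finite but far beyond anything that could be stored or searched, which is what gives the slide's questions about information, knowledge, and wisdom their force.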
2003.09.02 - SLIDE 15
Information Theory
• Claude Shannon, 1940s, studying communication
• Ways to measure information
– Communication: producing the same message at its destination
as that seen at its source
– Problem: a “noisy channel” can distort the message
• Between transmitter and receiver, the message must be
encoded
• Semantic aspects are irrelevant
[Diagram: Message Source → Transmitter → Channel (with Noise) → Receiver → Destination]
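The noisy-channel idea can be tried out in a few lines. This is a minimal sketch, not Shannon's own construction: it assumes a binary symmetric channel that flips each bit with a fixed probability, and a simple 3x repetition code with majority-vote decoding. All function names, the 5% flip rate, and the random seed are chosen only for illustration.

```python
import random


def encode(bits, repeat=3):
    """Repetition code: transmit each bit `repeat` times."""
    return [b for b in bits for _ in range(repeat)]


def noisy_channel(bits, flip_prob, rng):
    """Binary symmetric channel: flip each bit independently with flip_prob."""
    return [b ^ 1 if rng.random() < flip_prob else b for b in bits]


def decode(bits, repeat=3):
    """Majority vote over each group of `repeat` received bits."""
    return [int(sum(bits[i:i + repeat]) * 2 > repeat)
            for i in range(0, len(bits), repeat)]


def bit_errors(sent, received):
    """Number of positions where the received message differs from the sent one."""
    return sum(s != r for s, r in zip(sent, received))


rng = random.Random(202)  # fixed seed so the run is repeatable
message = [rng.randint(0, 1) for _ in range(10_000)]

raw = noisy_channel(message, 0.05, rng)                           # sent unencoded
coded = decode(noisy_channel(encode(message), 0.05, rng))         # sent with repetition

print("errors without encoding:", bit_errors(message, raw))
print("errors with 3x repetition:", bit_errors(message, coded))
```

The code never looks at what the bits mean, which matches the slide's point that semantic aspects are irrelevant to the technical problem. The encoded transmission makes far fewer errors, at the cost of sending three times as many bits.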
2003.09.02 - SLIDE 16
Information Theory
• Better called “Technical Communication
Theory”
• Communication may be over time and
space
[Diagram: a Message passes from Source → Encoding → Channel (with Noise) → Decoding → Destination]
[Diagram: a Message passes from Source → Encoding (Writing/Indexing) → Storage → Decoding (Retrieval/Reading) → Destination]
2003.09.02 - SLIDE 17
Human Communication Theory?
[Diagram: a Message passes from Source → Encoding → Channel (with Noise) → Decoding → Destination]
2003.09.02 - SLIDE 18
Communication Theory
• Encompasses a vast array of disciplines
– Mass communications, literary and media
theory, rhetoric, sociology, psychology,
linguistics, law, cognitive science, information
science, engineering, etc.
• Questions
– What and how we communicate
– Why we communicate
– What happens when communication “works”
and when it doesn’t
– How to improve communication
2003.09.02 - SLIDE 19
Why Study Communication Theory?
• Our understanding of what, how, and why
we communicate informs our
– Theory of media and practice of media
production
– Analysis, design, and evaluation of
multimedia information systems and
applications
– How we work together in teams
– How we read texts and talk with one another
in this course
– Law and public policy
2003.09.02 - SLIDE 20
Etymology of “Communication”
• Communication - c.1384, from O.Fr. communicacion,
from L. communicationem (nom. communicatio), from
communicare "to impart, share," lit. "to make common,"
from communis (see common).
• Common - 13c., from O.Fr. comun, from L. communis
"shared by all or many," from L. com- "together" + munia
"public duties," those related to munia "office." Alternate
etymology is that Fr. got it from P.Gmc. *gamainiz (cf.
O.E. gemæne), from PIE *kom-moini "shared by all,"
from base *moi-, *mei- "change, exchange."
• Remuneration - c.1400, from L. remunerationem, from
remunerari "to reward," from re- "back" + munerari "to
give," from munus (gen. muneris) "gift, office, duty."
Remunerative is from 1677.
2003.09.02 - SLIDE 21
What and How Do We Communicate?
• What “gifts” do we give each other?
• What do we do with these gifts?
• How does this gift exchange bring us
together (or not)?
2003.09.02 - SLIDE 22
Metaphor of/in Communication
• It's hard to get that idea across to him.
• I gave you that idea.
• It's difficult to put my ideas into words.
• The meaning is right there in the words.
• His words carry little meaning.
• That's not what I got out of what he said.
2003.09.02 - SLIDE 23
The Conduit Metaphor
• Language functions like a conduit, transferring
thoughts bodily from one person to another
• In writing and speaking, people insert their
thoughts or feelings in the words
• Words accomplish the transfer by containing the
thoughts or feelings and conveying them to
others
• In listening or reading, people extract the
thoughts and feelings once again from the words
2003.09.02 - SLIDE 24
Conduit Metaphor: Minor Frameworks
• Thoughts and feelings are ejected by speaking
or writing into an external “idea space”
• Thoughts and feelings are reified in this external
space, so they exist independent of any need for
living beings to think or feel them
• These reified thoughts and feelings may, or may
not, find their way back into the heads of living
humans
2003.09.02 - SLIDE 25
Toolmakers’ Paradigm
2003.09.02 - SLIDE 26
Comparing Models
• Conduit Metaphor
– Repertoire Members (i.e.,
perceptions, thoughts, or
feelings) can migrate from
one mind to another
– Communication is a largely
effortless act of unpacking
the meaning in words (i.e.,
the sender’s RMs in the
Signals)
– Communication does not
involve the RMs of the
receiver of the message
• Toolmakers’ Paradigm
– Only Signals can pass
between human beings,
not RMs
– Communication requires
active engagement of both
parties and often breaks
down and needs repair
– The meanings of signals
are not contained within
them, but made out of the
constructive interaction
between the signals and
the RMs of the receiver
2003.09.02 - SLIDE 27
Semantic Pathology
• Semantic Pathology
– “Whenever two or more incompatible senses
capable of figuring meaningfully in the same
context develop around the same name”
• Example
– “This text is confusing.”
• Text(1) = The layout/font of the text is confusing.
• Text(2) = The argument of the text is confusing.
• Question: Where is Text(2)?
2003.09.02 - SLIDE 28
Lecture Outline
• What Is Information?
• History of Information Search and
Organization
• Discussion Questions
• Action Items for Next Time
2003.09.02 - SLIDE 29
Origins: Physical Representations
• Very early history of content
representation
– Mesopotamian tokens and
“envelopes”
– Alexandria - pinakes
– Indices
2003.09.02 - SLIDE 30
Origins: Mental Representations
• Rhetorical mnemonic theory and practice
(“memoria”)
• Memory palaces
– An organization and retrieval technology for concepts
that combines physical and virtual places (“loci”)
• Examples
– Simonides of Ceos
– Cicero’s “testes”
2003.09.02 - SLIDE 31
Origins: Bibliographic Representations
• Biblical indexes and concordances
– Hugo de St. Caro, 1247 A.D.: 500 monks; KWOC (keyword out of context)
– Book indexes: Nuremberg Chronicle, 1493
• Library catalogs
• Journal indexes
• “Information explosion” following WWII
– Bush and Memex
– Cranfield studies of indexing languages and
information retrieval
– Development of bibliographic databases
• Index Medicus – production and Medlars searching
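The slide labels the 1247 concordance as KWOC, an index in which each keyword is listed on its own (out of context), followed by the entries that contain it. Below is a minimal sketch of that idea applied to a few titles. The stopword list, the function name, and the sample titles are illustrative assumptions, not a reconstruction of any historical index.

```python
from collections import defaultdict

# Illustrative stopword list (an assumption, not a standard one).
STOPWORDS = {"a", "an", "and", "in", "of", "the", "to"}


def kwoc_index(titles):
    """Map each keyword to the titles that contain it, with keywords in sorted order."""
    index = defaultdict(list)
    for title in titles:
        # Lowercase each word and strip surrounding punctuation; a set avoids duplicates.
        words = {w.strip(".,;:!?\"'()").lower() for w in title.split()}
        for word in sorted(words - STOPWORDS - {""}):
            index[word].append(title)
    return dict(sorted(index.items()))


titles = [
    "The Library of Babel",
    "Information Organization and Retrieval",
    "History of Information Search",
]

for keyword, entries in kwoc_index(titles).items():
    print(keyword.upper())
    for entry in entries:
        print("   ", entry)
```

Each keyword becomes a heading with the full titles beneath it, so a reader can look up a term without knowing which work it appears in.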
2003.09.02 - SLIDE 32
How Much Information Today?
• See report by Hal Varian and Peter Lyman
http://www.sims.berkeley.edu/research/projects/
how-much-info/
• Total annual information production including
print, film, magnetic media, etc.
– Upper Bound: 2,120,539 Terabytes (10^12 bytes)
– Lower Bound: 635,480 Terabytes
– I.e., between 1 and 2 Exabytes per year (10^18 bytes)
• How do we organize THIS?
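The conversion between the two units on this slide can be checked directly. The sketch below assumes the decimal definitions given on the slide (a terabyte as 10^12 bytes, an exabyte as 10^18 bytes). The function and variable names are illustrative.

```python
TERABYTE = 10**12  # bytes, as defined on the slide
EXABYTE = 10**18   # bytes, as defined on the slide


def terabytes_to_exabytes(terabytes: float) -> float:
    """Convert a quantity in terabytes to exabytes."""
    return terabytes * TERABYTE / EXABYTE


upper_tb = 2_120_539  # upper bound from the slide
lower_tb = 635_480    # lower bound from the slide

print(f"Upper bound: {terabytes_to_exabytes(upper_tb):.2f} EB")
print(f"Lower bound: {terabytes_to_exabytes(lower_tb):.2f} EB")
```

The bounds come to roughly 0.64 and 2.12 exabytes, so "between 1 and 2 Exabytes" is best read as an order-of-magnitude summary.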
2003.09.02 - SLIDE 33
Lecture Outline
• What Is Information?
• History of Information Search and
Organization
• Discussion Questions
• Action Items for Next Time
2003.09.02 - SLIDE 34
Discussion Questions (Borges)
• Colleen Whitney on Borges
– Borges wrote “The Library of Babel” in 1941,
long before the emergence of the Internet.
How might the metaphor be recast in Web
space? How would the structure of the
“universe” be expressed? Would the
problems and metaphysical questions
touched on by the narrator differ
significantly, and in what ways?
2003.09.02 - SLIDE 35
Discussion Questions (Borges)
• Colleen Whitney on Borges
– The narrator discusses the controversy over the
purging of useless works. “They invaded the
hexagons, showed credentials which were not
always false, leafed through a volume with
displeasure and condemned whole shelves: their
hygienic, ascetic furor caused the senseless
perdition of millions of books.” However, the
narrator concludes that “…the consequences of the
Purifiers’ depredations have been exaggerated by
the horror these fanatics produced.” Again, if recast
in digital context, how might this vignette be
expressed?
2003.09.02 - SLIDE 36
Discussion Questions (Dennett)
• Jennifer Hastings on Dennett:
– Why does Dennett consider Darwin’s “idea”
dangerous?
– What is the role of “intelligent design” in the
context of the Library of Babel and the Library
of Mendel?
2003.09.02 - SLIDE 37
Discussion Questions (Reddy)
• Christina Nigro on Reddy
– Do you agree with the author’s contention
that as increased systems of communication
prevail, more information is actually lost as a
result of the conduit metaphor in the English
language?
– How much information are we losing as a
result of our increased dependence on
information storage systems? How can we
remedy this while still encouraging
technological advances?
2003.09.02 - SLIDE 38
Discussion Questions (Reddy)
• Bruce Rinehart on Reddy
– What does linguistics have to do with
information?
– What becomes entangled in the conduit
metaphor in the realm of SIMS studies?
2003.09.02 - SLIDE 39
Discussion Questions (Reddy)
• Bruce Rinehart on Reddy
– Why do Reddy's example stories, which
have particular constraints regarding the
questions he's asking, seem suspect in
validly portraying anything but the point he is
making? It seems that he could be
fabricating games to support his point. I
don't really believe this; however, at first
glance, the stories seem overly constructed.
2003.09.02 - SLIDE 40
Discussion Questions (Reddy)
• Prof. Davis on Reddy
– How can an implicit theory of communication
affect our analysis and design of information
systems?
– What are some examples of information
systems that embody the Conduit Metaphor
or the Toolmakers’ Paradigm of
communication? How might they be
redesigned to facilitate better communication?
2003.09.02 - SLIDE 41
Lecture Outline
• What Is Information?
• History of Information Search and
Organization
• Discussion Questions
• Action Items for Next Time
2003.09.02 - SLIDE 42
Next Time
• Introduction to Information Retrieval (IR)
and the Search Process
2003.09.02 - SLIDE 43
Homework (!)
• Readings
– MIR Ch. 1
– Footprints in the Snow (Munro, Hook
and Benyon)
– Berry-Picking (Bates)
– Where did you Put It? (Berlin et al.)
• Create your SIMS home page
2003.09.02 - SLIDE 44
Marc Davis Office Hours
• Wednesday, September 8
– 4:00 pm – 6:00 pm
• Tuesday, September 14
– 2:00 pm – 4:00 pm
• 314 South Hall
2003.09.02 - SLIDE 45