new-social - UC Berkeley School of Information

When Information Technology
“Goes Social”
Marti Hearst
UCB SIMS
SLA Meeting
Oct 19, 2000
1
The SLA
Hearst
2
Standard Information Retrieval
Ranking
Problem: there is a lot of useful
information available. Which pieces
in particular does the user want to
view right now?
Procedure:

– match the words that reside in
documents against the words users state
in their query
– combinations of words are proxies for
the underlying meaning of the documents
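
The word-matching procedure above can be sketched in a few lines. This is a minimal illustration, not any particular engine's ranking function: the names tokenize, score, and rank, and the sample documents, are assumptions made for the example, and the score is a plain count of query-word occurrences.

```python
from collections import Counter

PUNCTUATION = ".,;:!?\"'()"

def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]

def score(query, document):
    """Sum the document's occurrence counts of each distinct query word."""
    doc_counts = Counter(tokenize(document))
    return sum(doc_counts[w] for w in set(tokenize(query)))

def rank(query, documents):
    """Return (doc_id, score) pairs, highest score first; ties keep input order."""
    scored = [(doc_id, score(query, text)) for doc_id, text in documents.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical documents, for illustration only.
docs = {
    "d1": "Used cars for sale, cheap cars",
    "d2": "Library catalog search tips",
    "d3": "Cars and trucks reviewed",
}
ranking = rank("cheap cars", docs)
```

Because the score grows with every repetition of a query word, a ranking like this rewards pages that simply repeat words, which is the weakness the spam slides describe.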
3
Standard Web Search Engine Architecture
[Diagram] Indexing side:
– crawl the web
– check for duplicates, store the documents (DocIds)
– create an inverted index
[Diagram] Query side:
– user query goes to the search engine servers
– servers look up the inverted index
– show results to user
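
The inverted index at the center of this architecture can be sketched as a map from each word to the DocIds that contain it. This is a simplified illustration under stated assumptions: build_inverted_index and search are hypothetical names, the three documents are made up, and crawling and duplicate detection are left out.

```python
from collections import defaultdict

def build_inverted_index(documents):
    """Map each word to the set of DocIds whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return the DocIds that contain every query word (a simple AND query)."""
    words = query.lower().split()
    if not words:
        return set()
    # Copy the first posting set so the index itself is never modified.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical stored documents keyed by DocId.
documents = {1: "crawl the web", 2: "search the web", 3: "inverted index"}
index = build_inverted_index(documents)
hits = search(index, "the web")
```

The index is built once, ahead of time; at query time the servers only intersect posting sets rather than scanning the documents.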
4
Document Processing Steps
Figure from Baeza-Yates & Ribeiro-Neto
Spam

Email Spam:
– Undesired content

Web Spam:
– Content is disguised as something it is
not, in order to
» Be retrieved more often than it otherwise
would
» Be retrieved in contexts that it otherwise
would not be retrieved in
6
Web Spam

What are the types of Web spam?
– Add extra terms to get a higher ranking
» Repeat “cars” thousands of times
– Add irrelevant terms to get more hits
» Put a dictionary in the comments field
» Put extra terms in the same color as the background
of the web page
– Add irrelevant terms to get different types of
hits
» Put “free beer” in the title field in sites that are
selling cars
– Add irrelevant links to boost your link analysis
ranking

There is a constant “arms race” between
web search companies and spammers
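
One side of this arms race can be illustrated with a toy check for the first spam type, extreme term repetition. This is only a sketch of one possible heuristic, not a method attributed to any search company: top_term_share, looks_stuffed, and the 0.5 threshold are assumptions chosen for the example, and a real page would also need stop-word handling.

```python
from collections import Counter

def top_term_share(text):
    """Fraction of the page's words taken up by its single most frequent word."""
    words = text.lower().split()
    if not words:
        return 0.0
    most_common_count = Counter(words).most_common(1)[0][1]
    return most_common_count / len(words)

def looks_stuffed(text, max_share=0.5):
    """Flag a page when one word exceeds max_share of all words (assumed threshold)."""
    return top_term_share(text) > max_share

# Hypothetical pages: one repeats "cars" a thousand times, one is ordinary.
stuffed = "cars " * 1000 + "for sale"
normal = "we sell used cars and trucks in berkeley"
```

A spammer can evade a check like this by mixing in filler words, which is why the slide calls it an arms race rather than a solved problem.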
7
Information Retrieval Goes Social

A new way of
– using words as weapons?
– using words to mean other than what
they say?
– subliminal authoring?

Ranking algorithms must now adopt a
posture of “defensive searching”.
8
Information Retrieval Goes Social
Thirty years of research on ranking
algorithms had never remotely considered
what happens when IR goes social.
For the underlying assumptions of IR,
these problems are almost absurd.
9
Information Retrieval Goes Social
The IR side isn’t all innocent
– issues of ranking sites higher in return
for payoffs
10
Traditional Information Filtering

At least 20 years of research on
information filtering
– A stream of information flows by, filter
out those not of interest, or retain
those of interest

Focus: how to identify which
documents are about a particular topic
– financial news, terrorist activity
A classification problem
Usually single-user judgements only

11
Information Filtering Goes Social

ABC News calls, ~1993. They’ve heard about
categorization software. They want to
identify:
– news programming about sex and violence

With the WWW, rapid commercial adoption
of filtering software for:
– adult content

This was not on the research radar screen.
– Major use of filtering now: taste alignment.
– Major technique: pooled judgements
» Examples: Ringo, DirectHit
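
The idea of pooled judgements can be sketched as follows: items are suggested to one user by pooling the likes of other users, weighted by how often those users agree with the target. This is a minimal illustration and not the actual algorithm of Ringo or DirectHit; agreement, recommend, and the sample ratings are assumptions made for the example.

```python
def agreement(a, b):
    """Count the items that both users rated and rated the same way."""
    return sum(1 for item in a if item in b and a[item] == b[item])

def recommend(target, others):
    """Rank items the target has not rated by agreement-weighted likes."""
    scores = {}
    for ratings in others:
        weight = agreement(target, ratings)
        if weight == 0:
            continue  # ignore users with no shared taste
        for item, liked in ratings.items():
            if item not in target and liked:
                scores[item] = scores.get(item, 0) + weight
    # Highest pooled score first; ties broken alphabetically.
    return sorted(scores, key=lambda item: (-scores[item], item))

# Hypothetical like/dislike judgements (True = liked).
alice = {"jazz": True, "opera": False}
others = [
    {"jazz": True, "opera": False, "blues": True},  # agrees with alice twice
    {"jazz": False, "opera": True, "polka": True},  # never agrees
    {"jazz": True, "swing": True, "blues": True},   # agrees once
]
recs = recommend(alice, others)
```

Unlike the single-user classification setting of traditional filtering, the output here depends entirely on other people's judgements.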
12
Why Does this Happen?
Computer Scientists are not trained
to think of the social interactions in
the use of their systems
There wasn’t good reason to see this
happening soon.

– The PARC Tapestry project (CACM 35 (12), 1992)
– Collaborative Filtering, but ahead of its time
13
Domain Names
14
Hypertext Then

Proceedings of ACM Hypertext 89
– 28 papers:
» Navigation, engineering, knowledge representation,
implementation & interfaces, applications, IR,
usability of links, fiction and writing
– 9 panels:
» Interchanging hypertexts
» Narrative and Consciousness
» Lessons from ACM hypertext project
» Indexing
» Expert Systems
» Higher Education: A Reality Check
» Software Engineering
» Cognitive Aspects
» Confessions: What’s Wrong with our Systems
15
Hypertext Then

Much discussion on
– semantics of link types
– navigation paths
– not getting lost (still an issue!)
– how to author documents

What about social implications?
16
Hypertext Then

Two papers are relevant.
– Amy Pearl, Sun’s Link Service: A Protocol for
Open Linking
» discusses use of a separate link repository to
allow linking between objects that reside on
different systems
» simply assumes bidirectional linking
» concerned with technical difficulties
– Bob Glushko, Design Issues for Multi-Document
Hypertexts
» considers the question of whether links
should be allowed outside of documents
» concludes they should, but in a cautionary
manner
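
The idea of a separate link repository with bidirectional links can be sketched as a small data structure. This is an illustration only and not the protocol from the Sun Link Service paper: the LinkRepository class, its method names, and the example object names are all hypothetical.

```python
from collections import defaultdict

class LinkRepository:
    """Stores links outside the linked objects, indexed in both directions."""

    def __init__(self):
        self._outgoing = defaultdict(set)
        self._incoming = defaultdict(set)

    def add_link(self, source, target):
        """Record a link; neither the source nor the target object is modified."""
        self._outgoing[source].add(target)
        self._incoming[target].add(source)

    def links_from(self, obj):
        """Objects that obj links to, in sorted order."""
        return sorted(self._outgoing.get(obj, set()))

    def links_to(self, obj):
        """Objects that link to obj, in sorted order (the reverse direction)."""
        return sorted(self._incoming.get(obj, set()))

# Hypothetical objects living on different systems.
repo = LinkRepository()
repo.add_link("paper.txt", "author-home-page")
repo.add_link("lecture.ppt", "author-home-page")
backlinks = repo.links_to("author-home-page")
```

Note that anyone who can write to the repository can attach a link to any target, with no consent from the target's owner.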
17
Course Gedanken Experiment


What happens if bi-directional links
are possible? Required?
My naïve pre-social CS-y thoughts:
– easier to link footnotes and their
citations
– easier to link papers, lectures, to
author’s home page
– easier to find related information
18
Course Gedanken Experiment


What the socially-savvy SIMS students
said about bi-directional links
Basically, overall a negative thing.
– link “spamming”
» people who hate Microsoft overburdening them with
links
» sexual harassment
– use for false endorsements
– alliances, negotiation for cross-linking, a link
market
– inability to hide confidential information
– advertisers would be affected
– redirect unwanted links to another page
19
What’s Going On?

Before going social, most hypertext was
– within a single “document” or user group
– incompatible with outside hypertext
– seen as a useful new way of reading complex
documents

After going social, hypertext is
– seen as useful for linking information in quite
far-flung places, assembled by people who don’t
know each other or have access to each
other’s systems
– social issues follow

Without appropriate safeguards, pages might
also have to adopt “defensive linking”
20
CS and the Social Sciences


A subset of CS has long engaged with social
sciences and humanities
Artificial Intelligence (since early 60’s)
» psychology (cognitive science)
» linguistics
» philosophy

More recently, HCI
» human-computer interaction
» psychology (cognitive science, human factors)
» ethnography

But … sociology … NOT
21
What is this leading to?
I might be suggesting the topic: how
should CS research be changed?
Instead, I think these effects are
interesting in their own right.

22
Turning the Tables

The standard way to incorporate a
field (like sociology) into a CS project
would be for the purposes of building
better systems.
– recommend information better
– filter information better

Instead, what if the goal is to build
systems to better understand
society?
23
Talk Re-Cap
When information processing systems “go
social” they are used in radical, often
unexpected ways
Now that information processing systems
have gone social, it is time to use them to
help us better understand society
Let’s Turn the Tables

– Create technology to aid study of society
24