Using and Evaluating Search Engines


Using & Evaluating Search Engines
• Readings
- Using Search Engines (Google)
- Understanding Search
- Search Tasks
• Let’s do some searching
• Assignments
Browsing and Searching
• Information Seeking
• Using Models
• Understanding Navigation
• Designing Navigation
What Is Information Seeking?
• “a process in which humans purposefully engage in order to change their state of knowledge.” (p. 5)
• “a process driven by human’s need for information so that they can interact with the environment.” (p. 28)
• “begins with recognition and acceptance of the problem and continues until the problem is resolved or abandoned” (p. 49)
• Marchionini: more than just representation, storage, and systematic retrieval
Information Seeking in Context
[Diagram: Information Retrieval nested within Information Seeking, nested within Learning; search proceeds via an Analytical Strategy and a Browsing Strategy]
Search Strategies
• Analytical
- careful planning
- recall of query terms
- iterative query reformulations
- examination of results
- batched
• Browsing
- heuristic
- opportunistic
- recognizing relevant information
- interactive (as much as can be)
Study Findings
- Few participants deliberately set out to search for new sites
- Determined the modes of scanning and moves exercised by the participants
- Recurring Web behavioral patterns relate people’s browser actions (Web moves) to their browsing/searching context (Web modes)
- Modes of scanning: Aguilar (1967); Weick & Daft (1983, 1984)
- Moves in information seeking behavior: Ellis (1989); Ellis et al. (1993, 1997)
Modes of Scanning
• Undirected Viewing
- Information need: general areas of interest; specific need to be revealed
- Information use: serendipitous discovery
- Amount of targeted effort: minimal; number of sources: many
- Tactics: “Sensing”: scan broadly a diversity of sources, taking advantage of what’s easily accessible (“Touring”)
• Conditioned Viewing
- Information need: able to recognize topics of interest
- Information use: increase understanding
- Amount of targeted effort: low; number of sources: few
- Tactics: “Sensemaking”: browse in pre-selected sources on pre-specified topics of interest (“Tracking”)
• Informal Search
- Information need: able to formulate queries
- Information use: increase knowledge within narrow limits
- Amount of targeted effort: medium; number of sources: few
- Tactics: “Learning”: search is focused on an issue or event, but a good-enough search is satisfactory (“Satisficing”)
• Formal Search
- Information need: able to specify targets
- Information use: formal use of information for planning, acting
- Amount of targeted effort: high; number of sources: many
- Tactics: “Deciding”: systematic gathering of information on a target, following some method or procedure (“Retrieving”)
Web Moves
• Starting, Chaining, Browsing, Differentiating, Monitoring, Extracting
Integrated Modes & Moves Model
• Undirected Viewing
- Starting: identifying/selecting starting pages, sites
- Chaining: following links on initial pages
• Conditioned Viewing
- Browsing: browsing entry pages, headings, site maps
- Differentiating: bookmarking, printing, copying; going directly to known site
- Monitoring: revisiting ‘favorite’ or bookmarked sites for new information
• Informal Search
- Differentiating: bookmarking, printing, copying; going directly to known site
- Monitoring: revisiting ‘favorite’ or bookmarked sites for new information
- Extracting: using (local) search engines to extract information
• Formal Search
- Monitoring: revisiting ‘favorite’ or bookmarked sites for new info
- Extracting: using search engines to extract information
Behavioral Model of Web Use
• 61 identifiable episodes, confirmed in interviews:
- Undirected Viewing: 12 episodes
- Conditioned Viewing: 18 episodes
- Informal Search: 23 episodes
- Formal Search: 8 episodes
[Diagram: moves (Starting, Chaining, Browsing, Differentiating, Monitoring, Extracting) mapped onto the four modes]
Interview Highlights
• Most useful work-related sites:
1. Resource sites by associations & user groups
2. News sites
3. Company sites
4. Search engines
• Most people do not avidly search for new Web sites
• Criteria to bookmark a site are largely based on a site’s ability to provide relevant & up-to-date information
• Methods for identifying new Web sites:
1. Search engines
2. Magazines & newsletters
3. Other people/colleagues
Behavioral Model Highlights
• People who use the Web engage in 4 complementary modes of information seeking
• Certain browser-based actions & events indicate a particular mode of information seeking
• Surprises
- No Explicit Instances of Monitoring to Support Formal Searching
- Very Few Instances of “Push” Monitoring
- Extracting Involved Basic Search Strategies Only
Review of Web Searching Studies
• How robust is the field of Web Search
studies?
• What are the common approaches to studying
Web use?
• Are all Web searchers the same?
- How?
- Why not?
• “Isolating searching characteristics of search using a Web IR system via analysis of data, typically gathered from transaction logs” (p. 236)
Analysis of a Very Large Search Log
• 280 GB: Six Weeks of Web Queries
• 1 Billion Search Requests
• 285 Million User Sessions
• Web Users:
- Use Short Queries
- Mostly Look at the First Ten Results Only
- Seldom Modify Queries
• Traditional IR Isn’t Accurately Describing Web Search
• Phrase Searching Could Be Augmented
• Silverstein, Henzinger, Marais, Moricz (1998)
Analysis of a Very Large Search Log
• 2.35 Average Terms Per Query
- 0 terms = 20.6% (?)
- 1 term = 25.8%
- 2 terms = 26.0%
- (72.4% combined)
• Operators Per Query
- 0 = 79.6%
• Terms Predictable
• First Set of Results Viewed Only = 85%
• Some (Single-Term Phrase) Query Correlation
- Augmentation
- Taxonomy Input
- Robots vs. Humans
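To make these numbers concrete, here is a minimal sketch of how a terms-per-query distribution can be computed from a transaction log. It assumes a plain-text log with one query per line; the actual AltaVista log format used by Silverstein et al. is not public, and the file name is hypothetical.

```python
from collections import Counter

def query_length_stats(log_path: str) -> None:
    """Report the terms-per-query distribution for a query log.

    Assumes one query per line; an illustrative stand-in for the
    (non-public) log format analyzed by Silverstein et al.
    """
    counts = Counter()
    total_terms = total_queries = 0
    with open(log_path, encoding="utf-8") as f:
        for line in f:
            n = len(line.split())   # whitespace tokenization
            counts[n] += 1          # bucket queries by length
            total_terms += n
            total_queries += 1
    if not total_queries:
        return
    print(f"average terms/query: {total_terms / total_queries:.2f}")
    for n in sorted(counts):
        print(f"{n} terms: {100 * counts[n] / total_queries:.1f}%")

# query_length_stats("queries.log")   # hypothetical file name
```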
Real Life Information Retrieval
• 51K Queries from Excite (1997)
• Mean Search Terms per Query = 2.21
• Number of Terms
- 1 term = 31%
- 2 terms = 31%
- 3 terms = 18%
- (80% combined)
• Logic & Modifiers (by User)
- Infrequent
- AND, “+”, “-”
• Logic & Modifiers (by Query)
- 6% of Users
- Less Than 10% of Users
- Lots of Mistakes
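A sketch of how one might tally the modifiers above across a set of queries. The pattern covers AND plus leading “+”/“-” as the slide lists them; the exact matching rules used in the Excite analysis are an assumption.

```python
import re

# AND plus leading +/- modifiers, per the slide; the Excite
# study's exact matching rules are an assumption here.
OPERATOR = re.compile(r"\bAND\b|(?<!\S)[+-]\w")

def share_with_operators(queries: list[str]) -> float:
    """Percentage of queries containing a Boolean operator or modifier."""
    if not queries:
        return 0.0
    hits = sum(1 for q in queries if OPERATOR.search(q))
    return 100 * hits / len(queries)

print(share_with_operators(["star wars", "+apple -pie", "cats AND dogs"]))
```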
Real Life Information Retrieval
• Sessions
- Flawed Analysis (User ID)
- Some Revisits to Query (Result Page Revisits)
• Page Views
- Accurate, but not by User
• Use of Relevance Feedback
- Not Used Much (~11%)
• Terms Used Typical
• Mistakes
- Typos
- Misspellings
- Bad (Advanced) Query Formulation
• Jansen, B. J., Spink, A., Bateman, J., & Saracevic, T. (1998)
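The slide notes that session analysis keyed on user ID is flawed. A common workaround (not taken from the paper) is to split each user’s queries on an idle timeout. A minimal sketch, assuming (user_id, timestamp) event tuples and a conventional 30-minute cutoff:

```python
from datetime import timedelta

def split_sessions(events, timeout=timedelta(minutes=30)):
    """Group (user_id, timestamp) query events into sessions per user.

    A new session opens when the same user is idle longer than
    `timeout`; the 30-minute cutoff is a common convention, not a
    value taken from the Excite study.
    """
    sessions, last_seen = {}, {}
    for user, ts in sorted(events, key=lambda e: e[1]):
        if user not in last_seen or ts - last_seen[user] > timeout:
            sessions.setdefault(user, []).append([])  # open a new session
        sessions[user][-1].append(ts)
        last_seen[user] = ts
    return sessions
```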
“New” Searching?
• Multimedia search
- User evaluation
- New kinds of systems
• Queries
• Tags & visual search
• How good are you at searching?
- Query terms analysis
- Understanding & Completing search
• Older IR systems & habits
How can we compare?
• Lots of different systems, types of users, at
different times and subjects
• We need a framework
- How the system works
- What the (log) data is
- Who the users are
- When the study is (context)
- What the (normalized, consistent) results are
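One way to make such a framework operational is a simple record per study, so findings can be compared field by field. A minimal sketch; the field names and the example values summarizing the Excite figures quoted earlier are illustrative only.

```python
from dataclasses import dataclass

@dataclass
class SearchStudy:
    """One entry in a comparison framework for Web search studies.

    Field names are illustrative; the slide only names the
    dimensions (system, log data, users, context, results).
    """
    system: str      # how the system works
    log_data: str    # what the (log) data is
    users: str       # who the users are
    context: str     # when the study is (context)
    results: str     # the (normalized, consistent) results

excite_1997 = SearchStudy(
    system="Excite Web search engine",
    log_data="51K-query transaction log",
    users="general-public Web searchers",
    context="1997",
    results="2.21 mean terms/query; operators infrequent",
)
```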
How Do We Really Use the Web?
• Reading vs. Scanning
- Quality of Elements
- Quantity of Elements
- Purpose of Pages
• Satisficing
- Guessing with Speed
- Low Penalties (Back)
- Testing Boundaries
• Muddling and Forging Ahead
- Stick with what works
- Not concerned with understanding
• Krug 2000
Design vs. Practice
Taxonomy of Decisions & Actions
• Now – not just the taxonomies of content, but
how people work
• Purpose of the Search
• Method to Find Information
• Content of the Information Being Searched
• GVU Survey Question
- Recent instance of important information found
• Critical Incident Technique
- Complete Instances
- Known Consequences (Results)
• Morrison 2001
Taxonomy pt. 2
• Taxonomies of Web Activities
- Why people searched the Web
- How people searched the Web
- What information was searched
• Analysis of Responses from Survey into
Experiment
• Purpose Taxonomy
• Method Taxonomy
• Content Taxonomy
Human Information Behavior
• Information Seeking (Strategies)
• Information Searching (Strategies)
• Information Use
- Physical Actions
- Mental Actions
• Focus on the User
• Wilson 2001
New Models of Info Behavior pt. 3
• Problem Solving
• System Actions
• Integration of Actions
Learning and Interests (Users)
• Learning is Remembering What You’re Interested In
• Cultivating Interest
• Relevance
• Interests vs. Obligations
• Examples for Understanding
- Metaphors
- Content Presentation
• “Architecture is Making Connections”
Let’s do some searching
• Use Google or anything else on the Web
• We’ll constrain your searching for some
questions
• Think about your thinking
• How much time is it taking?
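If you want an honest answer to the timing question, here is a minimal stopwatch you could run alongside the exercise; any clock will do, this is just an illustration.

```python
import time

start = time.monotonic()
input("Press Enter when you have found your answer...")
print(f"Search took {time.monotonic() - start:.0f} seconds")
```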