Bradford's Formula Itself


WIRED Future
• Quick review of Everything
• What I do when searching, seeking and
retrieving
• Questions?
• Projects and Courses in the Fall
• Course Evaluation
WIRED Focus
• Information Retrieval: representation, storage,
organization of, and access to information
items
• Focus is on the user information need
• User information need:
- Find all docs containing information on Austin
which:
• Are hosted by utexas.edu
• Discuss restaurants
• Emphasis is on the retrieval of information
(not data, not just a keyword match)
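The example need above can be approximated as a structured filter. The following is a minimal sketch, assuming each document is a dict with hypothetical `url` and `text` fields; the sample documents are invented for illustration. It is deliberately a plain keyword match, which is exactly the limitation the last bullet warns about.

```python
from urllib.parse import urlparse


def matches_need(doc):
    """Naive check: hosted by utexas.edu, mentions Austin and restaurants."""
    host = urlparse(doc["url"]).hostname or ""
    hosted_by_utexas = host == "utexas.edu" or host.endswith(".utexas.edu")
    text = doc["text"].lower()
    return hosted_by_utexas and "austin" in text and "restaurant" in text


# Hypothetical documents, not real pages.
docs = [
    {"url": "http://www.utexas.edu/dining", "text": "Restaurants in Austin near campus"},
    {"url": "http://www.example.com/food", "text": "Austin restaurant reviews"},
    {"url": "http://www.utexas.edu/music", "text": "Live music in Austin"},
]

results = [doc["url"] for doc in docs if matches_need(doc)]
print(results)
```

A page that discusses places to eat in Austin without using the word "restaurant" would be missed, which is why the focus is on the information need and not on the keywords.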
Quick Overview of the IR Process
[Diagram labels: Documents, index, Information Need, query, match, Ranking]
Indexing and Searching
• Query models work against the index
- Find words, word counts, phrases
- Sequential search, indexed search
• Inverted Files & Other Indices
• Boolean Queries
• Sequential Searching
• Pattern Matching
• Structural Queries
• Data structures
- The infrastructure of search
- Varied per data set and query contexts
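To make the contrast between indexed and sequential search concrete, here is a minimal sketch of an inverted file with a Boolean AND query, alongside a sequential scan. The three-document collection and the function names are assumptions for illustration, and tokenization is simple whitespace splitting.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def boolean_and(index, *terms):
    """Return the ids of documents that contain every term."""
    postings = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*postings) if postings else set()


def sequential_search(docs, term):
    """Scan every document in turn; no index needed, but slower on large sets."""
    term = term.lower()
    return {doc_id for doc_id, text in docs.items() if term in text.lower().split()}


# Hypothetical three-document collection.
docs = {
    1: "Austin restaurants near campus",
    2: "Austin live music venues",
    3: "Campus restaurants and cafes",
}
index = build_inverted_index(docs)
print(boolean_and(index, "austin", "restaurants"))
print(sequential_search(docs, "campus"))
```

The index is built once and each query only touches the postings for its terms, while the sequential search rereads every document on every query.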
Personalized IR system design
• How would you design a personal IR system?
• Who would use it?
• How would you learn about them?
- Interests
- Sources
- Preferences
• How do you evaluate a personal system?
• Understanding users is the key to
personalizing search or search interfaces.
Information Seeking in Context
[Diagram labels: Learning, Information Seeking, Information Retrieval, Analytical Strategy, Browsing Strategy]
How do we search?
• Analytical
- careful planning
- recall of query terms
- iterative query reformulations
- examination of results
- batched
• Browsing
- heuristic
- opportunistic
- recognizing relevant information
- interactive (as can be)
Behavioral Model
• Recurring Web behavioral patterns
that relate people’s browser actions
(Web moves) to their
browsing/searching context (Web
modes)
• Modes of scanning: Aguilar (1967) &
Weick & Daft (1983, 1984)
• Moves in information seeking
behavior: Ellis (1989) & Ellis et al.
(1993, 1997)
ISeek Behaviors & Web Moves
What do I use?
• Starting
- Bookmarks and groups of bookmarks
- Search javascripts
• Chaining
- Tabbed windows
- Bookmarking
- Printing
• Browsing and Differentiating
- Firefox/Mozilla & recommended links
- Blogrolls and PageRank
• Monitoring
- RSS feeds with RSS reader
- (Moderated) Listservs
• Extracting
- Saving as HTML, Text, or PDF
How do we really use the Web?
• People don’t read, they scan Web pages
• We move quickly, we know we can go back
• Quick experimentation & short memory
• Behaviors that work are reinforced &
continued
• Satisficing makes measures of quality difficult
How do I use the Web?
• Set of standard, daily Web pages
• Set of “occasional” Web pages
- Fridays - movie reviews, show times, previews
- Monthly - stocks and funds
• Quick focus on a subject, build a set of
documents related to that and file for later use
• I scan quickly down the page and then back
up the page
• Site maps, other links, walk up the URL
Future: Social Issues
• Who controls the sharing?
• Who controls the controls?
• “Give to get” systems
• Anonymity vs. Community
- Community of “friends”
- People as data points
• Free riders
• Logrolling and Over-rating
Future: Filtering for IR
• How about filtering, without the
collaboration?
- Individual preferences
- Implicit and Explicit
• Text is analyzed
- Feature extraction
- Recall & precision measures
• New models for multidimensional
users/uses/ratings
• Relevance Feedback
- Faster matching, more accurate
- Metadata (use data, preferences)
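The recall and precision measures mentioned above can be computed from two sets: what the system retrieved and what the user judged relevant. A minimal sketch follows; the document ids are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved; recall = hits / relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical result set and relevance judgments.
retrieved = {1, 2, 3, 4}
relevant = {2, 4, 5}
precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here two of the four retrieved documents are relevant (precision 0.50), and two of the three relevant documents were found (recall about 0.67).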
Future: Community Centered CF
• Forming and keeping community
- Interfaces, functionality
• Helping people find new information
- Interactive search
- Group browsing
• Mapping community (prefs?)
- Daily News
• Rating Web pages
- Incenting users to share
• Providing access to stored preferences
- Fair, open data collection
- Users can tune data
WWW Documents Investigation
• How do you collect data like this?
- Web Crawler
• URL identifier, link follower
- Index-like processing
• Markup parser, keyword identifier
• Domain name translation (and caching)
• How do these facts help with indexing?
• Have general characteristics changed?
• (This would be a great project to update.)
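The crawler pieces listed above (link follower, markup parser, keyword identifier) can be sketched with the Python standard library. This is only an illustration on an inline HTML string: the page content, the URLs, and the class name `LinkAndWordParser` are assumptions, and no network fetching, politeness rules, or DNS caching are shown.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkAndWordParser(HTMLParser):
    """Collect absolute link targets and lowercase words from one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.words = []

    def handle_starttag(self, tag, attrs):
        # Link follower: resolve each href against the page URL.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))

    def handle_data(self, data):
        # Keyword identifier: naive whitespace tokenization of the text.
        self.words.extend(data.lower().split())


# Hypothetical page; a real crawler would download this over HTTP.
page = (
    "<html><body><h1>Austin Dining</h1>"
    '<a href="/menus">Menus</a> '
    '<a href="http://example.org/map">Map</a>'
    "</body></html>"
)
parser = LinkAndWordParser("http://example.com/food/index.html")
parser.feed(page)
parser.close()
print(parser.links)
print(parser.words)
```

The collected links would feed the crawl queue, and the collected words would feed the index-like processing step.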
Metadata
• Information that describes a document that is
not (necessarily) in the document
• Describes the document in relation to other
documents
• Context about the Content
• Document semantics
• Internally consistent descriptions of content
for individual documents, document sets or a
specified set of content.
• For collections or individual documents
Metadata Types
• Dublin Core elements
• MARC (machine readable cataloging)
- What isn’t machine readable?
• Semantic Web elements
• Bottom-up, derived data
• Format-based
- ASCII, EBCDIC
- RTF
- PostScript, PDF
- MIME
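Format-based metadata can often be derived without opening the document. As a small sketch, Python's standard `mimetypes` module guesses a MIME type from the file name alone; the file names below are hypothetical, and the guess depends on the extension being accurate.

```python
import mimetypes


def format_metadata(filename):
    """Return a small metadata record derived only from the file name."""
    mime_type, encoding = mimetypes.guess_type(filename)
    return {"filename": filename, "mime_type": mime_type, "encoding": encoding}


# Hypothetical files from a personal collection.
for name in ["paper.pdf", "notes.txt", "page.html"]:
    print(format_metadata(name))
```

A record like this is metadata in the sense above: it describes the document without being part of its content.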
Digital Libraries
• We all have them
- Email boxes, archives
- Papers written
- Bookmarks
• What I have
- 4GB of academic & technical papers
• Mostly PDF, HTML, text
- Indexed using Adobe Catalog, htDig, OS X Search
- Data sets from previous studies
- Program code
- Scanned documents
Big DigLib Questions
• What’s a document?
- A file or link
• How do you trace & track the information source?
- Filenames, memory, metadata
• How do you integrate the variety of documents & metadata?
- Stick to standard formats
• What kind of storage model?
- Version Control system
- Server storage
- Filenames and directories
• When do you Index?
- Continuously
- After a backup
• Mostly boolean searching with attributes
Course Evaluations (next week)
• Volunteer to get, distribute, collect and turn in
evaluations
• Overall level of class expertise relevant for you?
• Favorite readings – type of readings?
• Least favorite (obscure – difficult) readings?
• Project ideas and group organization tools?
• Assignments: Group Work vs. Papers?