Community Information Management
Nikos Sarkas
Social Information Systems Seminar
DCS, University of Toronto, Winter 2007
Community Information Management
Community: people with
Shared interests (movies, database research)
Shared purpose (intranets, governments)
Community needs to
Query
Monitor
Discover
information about entities and their
relationships
An Example
Questions that could potentially interest the database research community
Is there any interesting connection between two researchers X and Y (e.g., sharing the same advisor)?
In which course is this paper cited?
Find all citations of this paper on the Web in the past week
What is new in the database research community in the past 24 hours?
Cimple and DBLife
Cimple framework
Joint project between Univ. of Illinois and Univ. of
Wisconsin
A software platform for customized CIM systems
DBLife
http://dblife.cs.wisc.edu/
A CIM prototype aimed at the database research
community
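Deploying a Cimple Application
[Figure: data sources (Homepages, DBWorld, DBLP, Conf. Homepages, DB Group Websites, ...) supply Web pages and documents; an entity-relationship graph is extracted from them, with entities such as Raghu, Sigmod 06 and UofT DB Group and relationships such as "published at" and "gives talk"; the result is served to the community.]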
Technical Challenges
Structured information extraction
Extracted structure exploitation
Extracted structure maintenance
Mass collaboration
Uncertainty and provenance
Structure Extraction
Elements
Community expert
Seed Web data sources
ER semantic schema
Extractors
Challenges
Entity disambiguation
Execution plan optimization (efficiency &
performance)
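A minimal sketch of the entity disambiguation challenge, assuming person mentions are grouped by a crude key (first initial plus last name). The function names and the sample mentions are hypothetical and are not part of Cimple.

```python
from collections import defaultdict

def name_key(mention):
    """Reduce a person mention to (first initial, last name), lowercased."""
    parts = mention.replace(".", " ").replace(",", " ").split()
    return (parts[0][0].lower(), parts[-1].lower())

def group_mentions(mentions):
    """Group mentions that share the same crude key."""
    groups = defaultdict(list)
    for mention in mentions:
        groups[name_key(mention)].append(mention)
    return dict(groups)

# Hypothetical mentions, as different extractors might return them.
mentions = ["Jane Q. Smith", "J. Smith", "John Smith", "Wei Chen"]
groups = group_mentions(mentions)
```

Here "Jane Q. Smith", "J. Smith" and "John Smith" all land in one group. That merges two variants of one name correctly but also merges two different people, which is why disambiguation needs more evidence (co-authors, affiliation, context) than names alone.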
Structure Exploitation
Services
Keyword search
Entity profiling
Notification
ER graph browsing
Community newsletter
Structured querying
Temporal queries
Structure Maintenance
ER graph update
Rebuilding is expensive and removes temporal
dimension
Incremental maintenance instead
Extractor maintenance
Extractors can “break down”
Malfunction detection and repair
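The incremental ER graph update can be sketched as follows, assuming the graph is a dictionary from (subject, relation, object) triples to the days on which each triple was first and last extracted. The representation and the function name are assumptions made for illustration; the triples reuse labels from the deployment figure, and the exact edges are likewise assumed.

```python
def apply_snapshot(graph, extracted, day):
    """Merge one day's extracted triples into the graph in place."""
    for triple in extracted:
        if triple in graph:
            # Known fact: only refresh its last-seen day.
            graph[triple]["last_seen"] = day
        else:
            # New fact: remember when it first appeared.
            graph[triple] = {"first_seen": day, "last_seen": day}
    return graph

graph = {}
apply_snapshot(graph, {("Raghu", "published at", "Sigmod 06")}, day=1)
apply_snapshot(
    graph,
    {("Raghu", "published at", "Sigmod 06"),
     ("Raghu", "gives talk", "UofT DB Group")},
    day=2,
)

# A temporal query: what is new on day 2?
new_on_day_2 = [t for t, seen in graph.items() if seen["first_seen"] == 2]
```

Rebuilding the graph from scratch on day 2 would lose the fact that the first triple was already known on day 1, which is the temporal dimension the slide refers to.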
Mass Collaboration
Dual social aspects of CIM
For the community
From the community
Leverage users to improve quality
Persuade, trick or extort them into doing so
Personalized data spaces
Reputation incentives
Payment schemes
Personalized Data Spaces
User voting and tagging paradigm (IMDB,
YouTube, del.icio.us)
Allow users to personalize and manage
private versions of public data
Learn from private actions to improve public
data space
Examples
Reputation Incentives
Allow users to correct mistakes
Why care?
Mistakes adversely affect reputation
Helping earns “credits”
Challenges
User-entity authentication
Conflict reconciliation
Payment Schemes
It is the user that pays…
In order to access a service, she must
answer a simple question
Challenges
Merge multiple, noisy answers into a single
answer
Which questions to ask?
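One simple way to merge multiple noisy answers is a majority vote. This is a sketch under that assumption, not necessarily the scheme used in Cimple; the function name is hypothetical.

```python
from collections import Counter

def merge_answers(answers):
    """Return the most frequent answer and the fraction of users who gave it."""
    counts = Counter(a.strip().lower() for a in answers)
    winner, votes = counts.most_common(1)[0]
    return winner, votes / len(answers)

# Three users answer the same yes/no question.
winner, agreement = merge_answers(["Yes", "yes ", "no"])
```

The agreement fraction can serve as a rough confidence score. The sketch treats all users as equally reliable and does not resolve ties in any principled way.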
Uncertainty
Uncertainty sources
Information extraction
Mass collaboration
Modeling and reasoning with uncertainty
Answer ranking based on confidence
Interactive queries
Reducing uncertainty
Mass collaboration
User feedback
Provenance information
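Answer ranking based on confidence can be sketched as below. The sketch assumes each candidate answer carries one confidence value per supporting extraction and that these are independent, so they are combined with a noisy-OR. Both the independence assumption and the function names are illustrative choices, not something the slides specify.

```python
import math

def combined_confidence(confidences):
    """Noisy-OR: probability that at least one supporting extraction is correct."""
    return 1.0 - math.prod(1.0 - c for c in confidences)

def rank_answers(evidence):
    """Sort candidate answers by combined confidence, highest first."""
    scored = {answer: combined_confidence(cs) for answer, cs in evidence.items()}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)

# Hypothetical candidates with per-extraction confidences.
ranked = rank_answers({"answer A": [0.5, 0.5], "answer B": [0.6]})
```

Two weak but independent pieces of evidence for "answer A" combine to 0.75, which outranks the single stronger one for "answer B" at 0.6.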
Provenance
Use provenance to understand and reduce
uncertainty
Justify answers
Display the raw data and the operations on them
that contribute to the answer
Allow hypothetical questions
What does the answer look like under this
assumption?
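A minimal sketch of justifying answers and asking hypothetical questions, assuming every extracted fact records the set of sources it came from. The fact-to-source assignments below are invented for illustration, and the function name is hypothetical.

```python
# Each fact keeps its provenance: the sources it was extracted from (assumed).
facts = [
    {"triple": ("Raghu", "gives talk", "UofT DB Group"), "sources": {"DBWorld"}},
    {"triple": ("Raghu", "published at", "Sigmod 06"), "sources": {"DBLP", "Homepages"}},
]

def answer(facts, distrusted=frozenset()):
    """Return facts backed by at least one trusted source, with their justification."""
    result = {}
    for fact in facts:
        support = fact["sources"] - distrusted
        if support:
            result[fact["triple"]] = sorted(support)
    return result

everything = answer(facts)
# Hypothetical question: what does the answer look like if DBWorld is wrong?
without_dbworld = answer(facts, distrusted={"DBWorld"})
```

Each returned fact comes with the sources that justify it, and rerunning the query with a source marked as distrusted shows how the answer changes under that assumption.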
Conclusions
CIM is text analytics with extra dimensions
Temporal evolution
Community feedback
The End
Questions?
Ideas?