Slide - University of Toronto

Transcript Slide - University of Toronto

Community Information Management
Nikos Sarkas
Social Information Systems Seminar
DCS, University of Toronto, Winter 2007
Community Information Management
Community: people with
  Shared interests (movies, database research)
  Shared purpose (intranets, governments)
Community needs to
  Query
  Monitor
  Discover
information about entities and their relationships
An Example
Questions that could potentially interest the database research community
  Is there any interesting connection between two researchers X and Y (e.g., sharing the same advisor)?
  In which course is this paper cited?
  Find all citations of this paper on the Web in the past week
  What is new in the past 24 hours in the database research community?
Cimple and DBLife
Cimple framework
  Joint project between Univ. of Illinois and Univ. of Wisconsin
  A software platform for customized community information management (CIM) systems
DBLife
  http://dblife.cs.wisc.edu/
  A CIM prototype aimed at the database research community
Deploying a Cimple Application
[Figure: data sources (Homepages, DBWorld, DBLP, Conf. Homepages, DB Group Websites, ...) supply Web pages and documents, which feed an entity-relationship graph with nodes such as Raghu, Sigmod 06 and UofT DB Group and edges labeled "published at" and "gives talk", made available to the community.]
Technical Challenges
Structured information extraction
Extracted structure exploitation
Extracted structure maintenance
Mass collaboration
Uncertainty and provenance
Structure Extraction
Elements
  Community expert
  Seed Web data sources
  ER semantic schema
  Extractors
Challenges
  Entity disambiguation
  Execution plan optimization (efficiency & performance)
Structure Exploitation
Services
  Keyword search
  Entity profiling
  Notification
  ER graph browsing
  Community newsletter
  Structured querying
  Temporal queries
Structure Maintenance
ER graph update
  Rebuilding is expensive and removes the temporal dimension
  Incremental maintenance instead
Extractor maintenance
  Extractors can “break down”
  Malfunction detection and repair
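A rough illustration of incremental ER graph maintenance follows. It is a sketch under assumptions, not Cimple's actual implementation: the class `TemporalERGraph`, its methods, and the example triples are hypothetical. Each extraction run is applied as a delta to a set of timestamped edges, so earlier states survive and temporal queries stay possible.

```python
from dataclasses import dataclass, field


@dataclass
class TemporalERGraph:
    """Hypothetical ER graph whose edges remember when they were valid."""
    # (subject, relation, object) -> list of [first_seen, closed_on or None]
    edges: dict = field(default_factory=dict)

    def apply_snapshot(self, triples, day):
        """Apply one extraction run incrementally instead of rebuilding."""
        current = set(triples)
        for triple in current:
            intervals = self.edges.setdefault(triple, [])
            if not intervals or intervals[-1][1] is not None:
                intervals.append([day, None])  # edge (re)appears today
        for triple, intervals in self.edges.items():
            if triple not in current and intervals[-1][1] is None:
                intervals[-1][1] = day  # edge disappeared: close its interval

    def active_on(self, day):
        """Temporal query: edges that held on the given day."""
        return {
            triple
            for triple, intervals in self.edges.items()
            if any(start <= day and (end is None or day < end)
                   for start, end in intervals)
        }


# Illustrative triples only; they are not extracted from real data.
graph = TemporalERGraph()
graph.apply_snapshot([("Raghu", "gives talk", "UofT DB Group")], day=1)
graph.apply_snapshot([("Raghu", "published at", "Sigmod 06")], day=2)
```

Closing an interval instead of deleting the edge is the design choice that keeps history: a full rebuild would only retain the day-2 state.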
Mass Collaboration
Dual social aspects of CIM
  For the community
  From the community
Leverage users to improve quality
Persuade, trick or extort them into doing so
  Personalized data spaces
  Reputation incentives
  Payment schemes
Personalized Data Spaces
User voting and tagging paradigm (IMDB, YouTube, del.icio.us)
Allow users to personalize and manage private versions of public data
Learn from private actions to improve the public data space
Examples
Reputation Incentives
Allow users to correct mistakes
Why care?
  Mistakes adversely affect reputation
  Helping earns “credits”
Challenges
  User-entity authentication
  Conflict reconciliation
Payment Schemes
It is the user that pays…
In order to access a service, she must answer a simple question
Challenges
  Merge multiple, noisy answers into a single answer
  Which questions to ask?
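One simple way to merge multiple noisy answers is majority voting with a consensus threshold. The slides do not say which method is used, so the sketch below is only an assumption: the function `merge_answers` and its thresholds `min_votes` and `min_share` are hypothetical.

```python
from collections import Counter


def merge_answers(answers, min_votes=3, min_share=0.6):
    """Merge noisy user answers into one answer, or None if no consensus.

    min_votes and min_share are illustrative thresholds, not values
    taken from the slides.
    """
    if len(answers) < min_votes:
        return None  # too few answers so far: keep asking the question
    normalized = [a.strip().lower() for a in answers]
    winner, count = Counter(normalized).most_common(1)[0]
    # Accept the most frequent answer only if enough users agree on it.
    return winner if count / len(normalized) >= min_share else None


merged = merge_answers(["Yes", "yes ", "no", "YES"])
```

Returning `None` when there is no consensus also hints at the second challenge: a question whose answers still disagree is a natural candidate to ask again.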
Uncertainty
Uncertainty sources
  Information extraction
  Mass collaboration
Modeling and reasoning with uncertainty
  Answer ranking based on confidence
  Interactive queries
Reducing uncertainty
  Mass collaboration
  User feedback
  Provenance information
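Answer ranking based on confidence could look like the sketch below. It assumes each candidate answer carries a list of per-extraction confidence scores and combines them with a noisy-OR, which treats the pieces of evidence as independent. The combination rule, the function names and the scores are all assumptions made for illustration, not something the slides specify.

```python
from math import prod


def combined_confidence(evidence):
    """Noisy-OR: chance that at least one independent piece of evidence is right."""
    return 1.0 - prod(1.0 - p for p in evidence)


def rank_answers(candidates):
    """Return (confidence, answer) pairs sorted by confidence, highest first."""
    scored = [(combined_confidence(ev), ans) for ans, ev in candidates.items()]
    return sorted(scored, reverse=True)


# Made-up confidence scores for two competing answers.
candidates = {
    "advisor A": [0.6, 0.5],  # two weaker extractions that agree
    "advisor B": [0.7],       # a single stronger extraction
}
ranking = rank_answers(candidates)
```

Under these made-up scores, two agreeing weak extractions outrank a single stronger one, which is the kind of behavior a confidence model has to decide on explicitly.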
Provenance
Use provenance to understand and reduce uncertainty
Justify answers
  Display the raw data and the operations on them that contribute to the answer
Allow hypothetical questions
  What does the answer look like under this assumption?
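A minimal sketch of both uses of provenance, assuming each extracted fact is stored with the source it came from. The function `answer_with_provenance`, the `distrusted` parameter and the sample facts are hypothetical. Justifying an answer means listing its supporting sources; a hypothetical question means recomputing the answers with some sources ignored.

```python
def answer_with_provenance(facts, distrusted=frozenset()):
    """Map each answer to the sources that support it.

    facts: list of (answer, source) pairs, e.g. the output of extractors.
    distrusted: sources to ignore, which models a hypothetical question.
    """
    provenance = {}
    for answer, source in facts:
        if source not in distrusted:
            provenance.setdefault(answer, []).append(source)
    return provenance


# Made-up facts for illustration only.
facts = [
    ("X and Y share an advisor", "homepage of X"),
    ("X and Y share an advisor", "DBLP"),
    ("X gives a talk at Z", "DBWorld"),
]
justified = answer_with_provenance(facts)
# Hypothetical question: what do the answers look like if DBWorld is wrong?
what_if = answer_with_provenance(facts, distrusted={"DBWorld"})
```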
Conclusions
CIM is text analytics with extra dimensions
  Temporal evolution
  Community feedback
The End
Questions?
Ideas?
