Transcript Slides
Usability and Integration
H. V. Jagadish
Many Sources of Data
• Text
• XML/semi-structured
• Experimental measurements
• Public databases
• Some data may have time/space variation
• Need to make sense of this big mess
Find Patterns in Data
• Conventional data mining seeks patterns
that can be mathematically specified over
(usually) global extents.
• Typically assume simple data structure.
• Need new approaches to find patterns in
messy data.
Human in the Loop
• Hard for a machine to tell an interesting
pattern apart from one that is not.
• Problem exacerbated when we seek
smaller/localized patterns, or work with
large vocabularies of possible patterns.
• Need human in the loop to make this
judgment.
Computer-Assisted (Human) Analytics
• Patterns found by human and not by
computer.
• Job of computer is to make patterns easy
to find.
• So computer system must effectively
support queries and display results.
• E.g., Visual Analytics
Organize Data for Analysis
• Join multiple complex temporal data
streams into a “windowed” model suitable
for efficient analysis. [Manish Singh]
• Permit organic change to schema as
information needs evolve. [Eric Qian]
• Provide a spreadsheet interface for direct
manipulation of complex and large data.
Choose small sets of representatives
effectively. [Ben Liu]
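The first idea on this slide, joining several temporal streams into a "windowed" model, can be illustrated with a minimal sketch. It assumes fixed-width tumbling windows and streams given as lists of (timestamp, value) pairs. The names `windowed_join` and `window_means` are hypothetical, and this is not the model from the cited work. It only shows how aligning streams by window index yields a tabular structure that is easy to analyze.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple

# Hypothetical record shape: (timestamp in seconds, numeric value).
Reading = Tuple[float, float]
Joined = Dict[int, Dict[str, List[float]]]


def windowed_join(streams: Dict[str, List[Reading]], width: float) -> Joined:
    """Assign every reading to a tumbling window of `width` seconds and align
    all streams by window index, so each window holds every stream's values."""
    buckets: DefaultDict[int, DefaultDict[str, List[float]]] = defaultdict(
        lambda: defaultdict(list)
    )
    for name, readings in streams.items():
        for ts, value in readings:
            buckets[int(ts // width)][name].append(value)
    # Plain dicts ordered by window index; a stream with no readings in a
    # window is simply absent from that window.
    return {w: dict(per_stream) for w, per_stream in sorted(buckets.items())}


def window_means(joined: Joined) -> Dict[int, Dict[str, float]]:
    """Summarize each (window, stream) cell by its mean value."""
    return {
        w: {name: sum(vals) / len(vals) for name, vals in per_stream.items()}
        for w, per_stream in joined.items()
    }


if __name__ == "__main__":
    streams = {
        "temp": [(0.5, 20.0), (1.2, 22.0), (10.4, 25.0)],
        "hr": [(3.0, 70.0), (11.0, 80.0), (12.0, 90.0)],
    }
    print(window_means(windowed_join(streams, 10.0)))
```

With 10-second windows, window 0 holds the two early temperature readings and one heart-rate reading, while window 1 holds the later ones. Sliding or event-triggered windows would be natural alternatives to the tumbling windows assumed here.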
Access Data for Analysis
• Under-specified queries, particularly
keyword queries. Derive a “qunit” as the
response unit, mined from observed query
logs. [Arnab Nandi]
• Visual manipulation algebra for analyzing
large time-varying graphs with data on
nodes and edges. [Anna Shaverdian]
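The second bullet mentions manipulating large time-varying graphs with data on nodes and edges. As a rough sketch of what two operators in such an algebra might look like, the code below implements a time-slice ("snapshot") operator and an edge-selection operator. The representation (each node or edge carries a list of `(valid_from, valid_to_exclusive, attributes)` versions) and the names `snapshot` and `select_edges` are assumptions for illustration, not the algebra from the cited work.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical representation: every node or edge keeps a list of versions,
# each (valid_from, valid_to_exclusive, attributes).
Attrs = Dict[str, Any]
Version = Tuple[int, int, Attrs]
Edge = Tuple[str, str]


def _version_at(versions: List[Version], t: int) -> Optional[Attrs]:
    """Attributes of the first version whose validity interval contains t."""
    for start, end, attrs in versions:
        if start <= t < end:
            return attrs
    return None


def snapshot(
    nodes: Dict[str, List[Version]], edges: Dict[Edge, List[Version]], t: int
) -> Tuple[Dict[str, Attrs], Dict[Edge, Attrs]]:
    """Slice the time-varying graph into the static graph valid at time t.
    An edge is kept only if both of its endpoints also exist at t."""
    snap_nodes: Dict[str, Attrs] = {}
    for node, versions in nodes.items():
        attrs = _version_at(versions, t)
        if attrs is not None:
            snap_nodes[node] = attrs
    snap_edges: Dict[Edge, Attrs] = {}
    for (src, dst), versions in edges.items():
        attrs = _version_at(versions, t)
        if attrs is not None and src in snap_nodes and dst in snap_nodes:
            snap_edges[(src, dst)] = attrs
    return snap_nodes, snap_edges


def select_edges(
    edges: Dict[Edge, Attrs], pred: Callable[[Attrs], bool]
) -> Dict[Edge, Attrs]:
    """Keep only the edges whose attributes satisfy a predicate."""
    return {e: a for e, a in edges.items() if pred(a)}


if __name__ == "__main__":
    nodes = {
        "A": [(0, 10, {"role": "author"})],
        "B": [(0, 5, {"role": "author"}), (5, 10, {"role": "editor"})],
        "C": [(3, 10, {"role": "author"})],
    }
    edges = {("A", "B"): [(0, 10, {"weight": 1})], ("A", "C"): [(4, 8, {"weight": 5})]}
    _, edges_at_6 = snapshot(nodes, edges, 6)
    print(select_edges(edges_at_6, lambda a: a["weight"] > 2))
```

Because each operator returns plain dictionaries, operators compose: a visual interface could bind a time slider to `snapshot` and a filter widget to `select_edges`.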
Scientific Data Analysis
• Explain analysis results in terms of source
data, even when the source may have
been updated since. [Jing Zhang]
• Analyze gene expression microarray data,
and electronic health record data, in light
of known biomedical knowledge.
[Fernando Farfan]
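The first bullet on this slide, explaining results in terms of source data that may since have been updated, relies on recording which version of each source item a result used. The toy sketch below illustrates that idea under simple assumptions: an append-only store keeps every version of a value, and the analysis records `(key, version)` pairs as provenance. `VersionedStore`, `analyze_sum`, and `explain` are hypothetical names, and summation stands in for a real analysis. This is not the technique from the cited work.

```python
from typing import Dict, List, Tuple

Provenance = List[Tuple[str, int]]  # (source key, version used)


class VersionedStore:
    """Append-only store: an update adds a new version instead of overwriting."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[float]] = {}

    def put(self, key: str, value: float) -> int:
        versions = self._versions.setdefault(key, [])
        versions.append(value)
        return len(versions) - 1  # version number of the value just written

    def latest(self, key: str) -> Tuple[float, int]:
        versions = self._versions[key]
        return versions[-1], len(versions) - 1

    def get(self, key: str, version: int) -> float:
        return self._versions[key][version]


def analyze_sum(store: VersionedStore, keys: List[str]) -> Tuple[float, Provenance]:
    """Toy analysis: sum the current values and record which versions were read."""
    total = 0.0
    provenance: Provenance = []
    for key in keys:
        value, version = store.latest(key)
        total += value
        provenance.append((key, version))
    return total, provenance


def explain(store: VersionedStore, provenance: Provenance) -> Dict[str, float]:
    """Recover the exact source values a result was computed from,
    even if those sources have been updated since."""
    return {key: store.get(key, version) for key, version in provenance}


if __name__ == "__main__":
    store = VersionedStore()
    store.put("sample1", 2.0)
    store.put("sample2", 3.0)
    total, prov = analyze_sum(store, ["sample1", "sample2"])
    store.put("sample1", 10.0)  # the source changes after the analysis ran
    print(total, explain(store, prov))
```

Here the result 5.0 can still be explained by the values 2.0 and 3.0, although `sample1` now reads 10.0. A real system would also need to bound storage, for example by retaining old versions only while some recorded result still refers to them.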