CS5545 Data Interpretation and Communication

CS4031/CS5012
Data Mining and Visualization
Yaji Sripada
Dept. of Computing Science, University of Aberdeen
Timetable
• Lectures
– Two lectures per week
• 11:00-12:00 Tuesdays in Taylor A21
• 10:00-11:00 Wednesdays in Meston 311
• Practicals
– One two-hour practical on Mondays in Meston 311 (room changed since the last lecture)
• 15:00-17:00
Assessment
• The course is worth 15 credits
• Two components
– 25% continuous assessment
– 75% end-of-term exam
• Continuous assessment
– Issued in Week 6/7
– Due on the Friday of Week 10/11
Course Organization
• Lectures
– Discuss topics including
• Data Mining
– General data
– Time series
– Spatial data
• Information Visualization (InfoVis)
• Case Studies
– 8th week assessment
• Practicals
– 10 weeks
• Mostly using some existing software
• Developing your own software (PHP)
Reading
• Mostly lecture notes and some research papers
• I will provide all the required reading material
Introduction
Overgrowth of Data
• Humans accumulate large volumes of data in many domains
– Business
• Transactional data
– Scientific
• Complete sequence data from the Human Genome Project (about 3 billion DNA base pairs)
– Engineering
• Hundreds of sensors on a gas turbine taking measurements every second
– And many more?
Information Hidden in Data
• Data are raw facts
• Humans routinely ‘dig’ useful abstractions out of raw data
– An example abstraction ‘mined’ from past exam results: no coursework submitted => will fail the exam as well
• For small data sets (a few hundred bytes)
– Simple, manual data analysis is fine (even preferred!)
– Statistics
• For large data sets (a few gigabytes or more)
– Manual analysis is impossible
– Computer assistance is needed
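The exam-results abstraction above can be read as a rule, "no coursework => fails exam", and checked against data. The sketch below is a minimal illustration, assuming a hypothetical record layout of (coursework_submitted, passed_exam) pairs; the toy records are invented, not real exam results. It computes the rule's support (how often both sides hold) and confidence (how often the conclusion holds when the condition does), the two measures usually attached to mined rules.

```python
from typing import Iterable, Tuple

# Hypothetical record layout for illustration: (coursework_submitted, passed_exam)
Record = Tuple[bool, bool]


def rule_stats(records: Iterable[Record]) -> Tuple[float, float]:
    """Support and confidence of the rule: no coursework submitted => fails the exam."""
    records = list(records)
    if not records:
        return 0.0, 0.0
    # Records matching the rule's condition (no coursework submitted)
    condition = [r for r in records if not r[0]]
    # Records matching condition and conclusion (no coursework and failed the exam)
    both = [r for r in condition if not r[1]]
    support = len(both) / len(records)
    confidence = len(both) / len(condition) if condition else 0.0
    return support, confidence


if __name__ == "__main__":
    # Invented toy records, not real exam data
    toy = [(True, True), (True, False), (False, False), (False, False), (True, True)]
    support, confidence = rule_stats(toy)
    print(f"support={support:.2f} confidence={confidence:.2f}")
```

On the toy records this prints `support=0.40 confidence=1.00`: the rule covers two of five students, and both students without coursework failed. With a few hundred records this is easy by hand; with gigabytes of records the same counting has to be automated.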
Two views of computer assistance
• Data Mining View
– Machines can automatically (or semi-automatically) extract meaningful and useful information from heaps of raw data
• Information Visualization (InfoVis) View
– Humans themselves can make sense of data if the data are presented visually
• This course covers both views
Data Mining
• The process of automatically (or semi-automatically) discovering useful, novel and meaningful patterns in substantial quantities of data
• Tasks
– Classification
– Clustering
– Association Rule Mining
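Two of the tasks listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the specific methods covered later in the course: classification is shown as a 1-nearest-neighbour rule and clustering as plain k-means on 2-D points. The function names, the labels "A"/"B" and the starting centroids are hypothetical choices for the example.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def nearest_neighbour_label(train: Sequence[Tuple[Point, str]], query: Point) -> str:
    """Classification (1-nearest neighbour): copy the label of the closest training point."""
    _, label = min(train, key=lambda item: math.dist(item[0], query))
    return label


def kmeans(points: Sequence[Point], start: Sequence[Point], iterations: int = 10) -> List[int]:
    """Clustering (k-means): return a cluster index for each point."""
    centroids = list(start)  # copy so the caller's list is not modified
    assignment: List[int] = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid
        assignment = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its member points
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return assignment


if __name__ == "__main__":
    train = [((1.0, 1.0), "A"), ((8.0, 9.0), "B")]
    print(nearest_neighbour_label(train, (2.0, 1.5)))
    pts = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 8.5)]
    print(kmeans(pts, start=[(0.0, 0.0), (10.0, 10.0)]))
```

The contrast between the two tasks is the point: classification needs labelled examples ("A", "B") and predicts a label for new data, while clustering receives no labels and groups points purely by similarity (here, the two points near the origin versus the two near (8, 8)).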
Typical Applications
• Customer Relationship Management (CRM)
• Linking gene variations among individuals to common illnesses (e.g. cancer)
• Identifying abnormal conditions in an operational gas turbine
• More?
Information Visualization
• The process of representing data (usually visually) in a way that enables users to gain useful insights into the data
• The focus is on designing a data representation scheme that makes the underlying ‘information’ visible to the user
• To render the representation scheme
– Computer graphics technology is used
• Good InfoVis techniques are based on
– A good understanding of the information structures underlying the data
– A good understanding of human perception and cognition
– Good graphics
Typical Applications
• Newsmap
• Touchgraph
• ManyEyes
• CountryScape
SVG
• Scalable Vector Graphics
– An XML-based web language for specifying vector graphics as text
– E.g. the SVG specification of a red circle:
<svg width="300" height="200"
xmlns="http://www.w3.org/2000/svg">
<circle cx="125" cy="110" r="20" fill="red" />
</svg>
• A W3C Recommendation
• Browser support for SVG content
– Firefox provides built-in support
– IE needs an Adobe plug-in
• SVG content can be created
– Using text editors (static)
– Programmatically (dynamic)
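Creating SVG programmatically (the dynamic case) simply means having a program write the same XML text that a person would type into an editor. The course practicals use PHP; the sketch below uses Python instead, and the function name `bar_chart_svg`, the bar colour and the 80% bar width are illustrative assumptions. It draws one `<rect>` per data value, scaling values so the largest bar fills the drawing height.

```python
from typing import Sequence


def bar_chart_svg(values: Sequence[float], width: int = 300, height: int = 200) -> str:
    """Build an SVG document (as text) containing one bar per value."""
    if not values:
        raise ValueError("need at least one value")
    top = max(values)
    if top <= 0:
        raise ValueError("values must include a positive number")
    slot = width / len(values)  # horizontal space allotted to each bar
    bars = []
    for i, v in enumerate(values):
        # Linear scaling: the largest value maps to the full drawing height
        bar_height = max(v, 0) / top * height
        x = i * slot
        y = height - bar_height  # SVG's y axis points down, so bars start at the bottom
        bars.append(
            f'  <rect x="{x:.1f}" y="{y:.1f}" width="{slot * 0.8:.1f}" '
            f'height="{bar_height:.1f}" fill="steelblue" />'
        )
    return (
        f'<svg width="{width}" height="{height}" xmlns="http://www.w3.org/2000/svg">\n'
        + "\n".join(bars)
        + "\n</svg>\n"
    )


if __name__ == "__main__":
    # Redirect this output to a .svg file to view the chart in a browser
    print(bar_chart_svg([3, 7, 5]))
```

Because the SVG is generated from the data, the same code redraws the chart whenever the data change, which is what separates dynamic from static SVG content.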
SVG in an HTML page
• Three methods
• Using the <embed> tag
– <embed src="circle.svg" width="300" height="100"
type="image/svg+xml"
pluginspage="http://www.adobe.com/svg/viewer/install/" />
• Using the <object> tag
– <object data="rect.svg" width="300" height="100"
type="image/svg+xml"
codebase="http://www.adobe.com/svg/viewer/install/"></object>
• Using the <iframe> tag
– <iframe src="rect.svg" width="300" height="100">
</iframe>
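The three methods differ only in the tag and the attribute that names the SVG file (`src` for `<embed>` and `<iframe>`, `data` for `<object>`). A minimal Python sketch that generates each variant, assuming hypothetical helper names `embed_markup` and `page`; it leaves out the Adobe plug-in attributes (`pluginspage`, `codebase`) and adds a short fallback text inside `<object>`, which is an illustrative choice rather than part of the slide.

```python
from html import escape


def embed_markup(src: str, method: str, width: int = 300, height: int = 100) -> str:
    """Return HTML that places an external SVG file in a page using one of three tags."""
    s = escape(src, quote=True)  # make the file name safe inside an attribute
    size = f'width="{width}" height="{height}"'
    if method == "embed":
        return f'<embed src="{s}" {size} type="image/svg+xml" />'
    if method == "object":
        # <object> needs an explicit end tag; text inside it is shown if SVG cannot load
        return f'<object data="{s}" {size} type="image/svg+xml">SVG not supported</object>'
    if method == "iframe":
        return f'<iframe src="{s}" {size}></iframe>'
    raise ValueError(f"unknown method: {method}")


def page(src: str) -> str:
    """A minimal HTML page showing the same SVG file with all three methods."""
    body = "\n".join(embed_markup(src, m) for m in ("embed", "object", "iframe"))
    return f"<!DOCTYPE html>\n<html>\n<body>\n{body}\n</body>\n</html>\n"


if __name__ == "__main__":
    print(page("circle.svg"))
```

Saving the printed page next to `circle.svg` and opening it in a browser shows the three embeddings side by side.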
Learning SVG
• In this course, the focus is on designing effective visualizations
– We assume knowledge of SVG for rendering the visualizations
• The fundamentals of SVG are covered in lectures
• Some practicals involve creating SVG
– No practicals are exclusively for learning SVG
Detailed Course Plan
Lectures
Data
Exploratory Data Analysis
(EDA)
Practicals
Exploratory Data Analysis (EDA)
Classification – I
Classification – II
Clustering – I
Classification
Clustering
Clustering – II
InfoVis – I
InfoVis – II
HCE
Tree Maps
InfoVis – III
Association rule mining
Association Rule Mining
Time Series – I
Time Series – II
Time Series – III
Assignment
Time Searcher
Sequence Mining
Spatial Data
GIS
Time Series Representations
GIS
Spatial Data Mining
Issues with Data mining and
InfoVis
Exploring spatio-temporal Data
Users
Summary
• All modern organizations
– possess large volumes of data, and
– have users who want to understand these data
• You learn technologies to
– Extract and/or present information from large data sets
• Analytical methods
• Visualization methods