Presentation 1


Linked Data Visualization
Matt Bernier
Joey Murphy
David Coleman
Needs Analysis
Allow users to view data sets graphically using intuitive and efficient controls
Specifically, to view links among data points
Contemporary methods include diagrams, graphs, and lists
Enable users to perform analysis on their data
Market Analysis

Linked data is present in several environments:
• Search engines (page ranking)
• Social networks (recreational, academic, professional)
• Other database-driven sites
• Computer networks
Market Analysis
Market Users
• Website owners (50-100 million active domains, multiple sites per domain)
• Enterprise internal site managers
• Social network operators and users (more than 200 sites online)
• Network administrators
Background
Websites that support large numbers of users are very popular
Finding common usage patterns among those users can be very beneficial
• Purchasing similar products
• Participating in common discussions
• Common browsing habits
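One way to turn usage statistics like these into linked data is to connect two users whenever they share a behaviour. The following is a minimal sketch in JavaScript (the language the slides list for graph creation), assuming a hypothetical `purchases` record per user; the names and data are illustrative and do not come from the project.

```javascript
// Hypothetical input: the products each user has purchased.
const purchases = {
  alice: ["book", "lamp"],
  bob: ["book", "desk"],
  carol: ["lamp", "book"],
};

// Link two users when they share at least one product.
// The link weight is the number of distinct products they have in common.
function buildCoPurchaseLinks(purchasesByUser) {
  const users = Object.keys(purchasesByUser).sort();
  const links = [];
  for (let i = 0; i < users.length; i++) {
    const mine = new Set(purchasesByUser[users[i]]);
    for (let j = i + 1; j < users.length; j++) {
      const theirs = new Set(purchasesByUser[users[j]]);
      const shared = [...theirs].filter((p) => mine.has(p));
      if (shared.length > 0) {
        links.push({ source: users[i], target: users[j], weight: shared.length });
      }
    }
  }
  return links;
}

const coPurchaseLinks = buildCoPurchaseLinks(purchases);
console.log(coPurchaseLinks);
```

The same pairing approach would apply to shared discussions or browsing habits; only the input records change.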
Background

Showing Web links
• How websites link together
• Visualizing the web

Visualizing any linked data sets
Linked Data Example
An Existing Application
Create Random Nodes
Goals and Objectives
The overall goal is to create an intuitive web-based tool that allows users to see links within their data
Allow users to analyze and infer information from the links
Make it easy for web programmers to implement the graph on their site using a PHP class structure
Tools
HTML, CSS (data presentation)
PHP (data objects, processing)
JavaScript (graph creation, interaction)
• JSViz (framework for dynamic views; its force-directed algorithm creates an aesthetically pleasing graph)
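To illustrate what a force-directed algorithm does, here is a minimal sketch in JavaScript: every pair of nodes repels, every link acts like a spring, and positions are nudged repeatedly until they settle. This is a generic textbook-style version, not JSViz's implementation or API; the function name `relax` and all constants are assumptions chosen for illustration.

```javascript
// Minimal force-directed relaxation: all node pairs repel, linked pairs attract.
// All constants below are illustrative assumptions, not values taken from JSViz.
function relax(nodes, links, iterations = 200, opts = {}) {
  const { repulsion = 1000, spring = 0.05, restLength = 50, damping = 0.85, maxStep = 10 } = opts;
  for (let it = 0; it < iterations; it++) {
    const force = nodes.map(() => ({ x: 0, y: 0 }));
    // Push node i along the unit vector (ux, uy) and node j the opposite way.
    const apply = (i, j, ux, uy, f) => {
      force[i].x += ux * f; force[i].y += uy * f;
      force[j].x -= ux * f; force[j].y -= uy * f;
    };
    // Repulsion between every pair of nodes, falling off with distance squared.
    for (let i = 0; i < nodes.length; i++) {
      for (let j = i + 1; j < nodes.length; j++) {
        const dx = nodes[i].x - nodes[j].x;
        const dy = nodes[i].y - nodes[j].y;
        const dist = Math.max(Math.hypot(dx, dy), 0.01);
        apply(i, j, dx / dist, dy / dist, repulsion / (dist * dist));
      }
    }
    // Spring force along each link: positive when stretched, negative when compressed.
    for (const [a, b] of links) {
      const dx = nodes[b].x - nodes[a].x;
      const dy = nodes[b].y - nodes[a].y;
      const dist = Math.max(Math.hypot(dx, dy), 0.01);
      apply(a, b, dx / dist, dy / dist, spring * (dist - restLength));
    }
    // Update velocities with damping, cap the step size, then move the nodes.
    nodes.forEach((n, i) => {
      n.vx = ((n.vx || 0) + force[i].x) * damping;
      n.vy = ((n.vy || 0) + force[i].y) * damping;
      const speed = Math.hypot(n.vx, n.vy);
      if (speed > maxStep) { n.vx *= maxStep / speed; n.vy *= maxStep / speed; }
      n.x += n.vx; n.y += n.vy;
    });
  }
  return nodes;
}

// Links refer to nodes by array index.
const layout = relax(
  [{ x: 0, y: 0 }, { x: 40, y: 0 }, { x: 0, y: 40 }],
  [[0, 1], [1, 2]]
);
console.log(layout.map((n) => [n.x.toFixed(1), n.y.toFixed(1)]));
```

The pairwise repulsion loop is quadratic in the number of nodes, which is one reason layout time grows quickly with dataset size.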
System Diagram
Literature Review
General Ideas
Building graphs from data sets
Displaying data
Data analysis
• Examining and inferring relationships
• Prediction
• Application to the real world
Literature Review

Presenting data to users
• Tree structures, Data -> Information
• “Inducing the chosen mental model in the mind of the observer”
• Easy to understand
• Allows for more information to be absorbed by observers
Aaron Kershenbaum and Keitha Murray. In Journal of Circuit Systems and Computers
Literature Review
Many theories and techniques exist for graph analysis, but not for graph construction
Choice of nodes and links
• What is represented by a node?
• What is represented by a link?
• These choices greatly influence meaning in a linked data display
• e.g. hyperlinks, Enron email dataset
A. Badia and M. Kantardzic. In Proceedings of the 3rd International Workshop on Link Discovery (LinkKDD '05)
J. Shetty and J. Adibi. In KDD ’05
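As a concrete illustration of choosing nodes and links, the sketch below treats each email address as a node and each sender-to-recipient pair as a directed link, counting how many messages it carries. The message records and the function name `graphFromMessages` are hypothetical; this is one possible modelling choice in JavaScript, not the construction used in the cited papers.

```javascript
// Hypothetical message records: one sender, one or more recipients.
const messages = [
  { from: "a@example.com", to: ["b@example.com", "c@example.com"] },
  { from: "b@example.com", to: ["a@example.com"] },
  { from: "a@example.com", to: ["b@example.com"] },
];

// Nodes are addresses; a directed link "from -> to" counts messages sent that way.
function graphFromMessages(msgs) {
  const nodes = new Set();
  const linkCounts = new Map();
  for (const m of msgs) {
    nodes.add(m.from);
    for (const recipient of m.to) {
      nodes.add(recipient);
      const key = `${m.from} -> ${recipient}`;
      linkCounts.set(key, (linkCounts.get(key) || 0) + 1);
    }
  }
  return { nodes: [...nodes], linkCounts };
}

const emailGraph = graphFromMessages(messages);
console.log(emailGraph.nodes, [...emailGraph.linkCounts]);
```

Choosing messages as nodes instead, with links between messages in the same thread, would produce a very different picture from the same data.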
Literature Review

Link Mining – analyzing links
• Makes use of descriptive and predictive modeling (data mining)
• e.g. determining webpage relevance based on anchor text and surrounding text of incoming hyperlinks
• e.g. segregating website users into groups based on common behaviours
L. Getoor. In ACM SIGKDD Explorations Newsletter, Vol. 5, Issue 1, 2003
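A very simple stand-in for grouping users by common behaviour is to link users who share a behaviour and then take the connected components of that graph. The JavaScript sketch below does this with a union-find structure; it is a deliberately basic technique chosen for illustration, not a method taken from the cited survey, and the user ids are made up.

```javascript
// Group node ids into connected components using union-find.
// Assumes every id used in `links` also appears in `nodeIds`.
function connectedGroups(nodeIds, links) {
  const parent = new Map(nodeIds.map((id) => [id, id]));
  const find = (x) => {
    while (parent.get(x) !== x) {
      parent.set(x, parent.get(parent.get(x))); // path halving keeps trees shallow
      x = parent.get(x);
    }
    return x;
  };
  // Merge the two components joined by each link.
  for (const [a, b] of links) parent.set(find(a), find(b));
  // Collect members under their component root, in input order.
  const byRoot = new Map();
  for (const id of nodeIds) {
    const root = find(id);
    if (!byRoot.has(root)) byRoot.set(root, []);
    byRoot.get(root).push(id);
  }
  return [...byRoot.values()];
}

// Hypothetical users; a link means "these two users showed a common behaviour".
const userGroups = connectedGroups(
  ["u1", "u2", "u3", "u4", "u5"],
  [["u1", "u2"], ["u2", "u3"], ["u4", "u5"]]
);
console.log(userGroups);
```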
Literature Review

Link prediction
• Uses node proximity
• “Information about future interactions can be extracted from network topology alone”
• Predicting links that represent online social interaction can help to determine the feasibility of adding new interaction features to a site
D. Liben-Nowell and J. Kleinberg. In CIKM '03
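Common neighbours is one of the simplest node-proximity measures used for link prediction: two unlinked nodes that share many neighbours are scored as likely to link in future. The sketch below, in JavaScript, ranks unlinked pairs by that count; the function name and the sample network are hypothetical.

```javascript
// Score every unlinked pair of nodes by how many neighbours they share.
function commonNeighborScores(links) {
  const neighbors = new Map();
  const add = (a, b) => {
    if (!neighbors.has(a)) neighbors.set(a, new Set());
    neighbors.get(a).add(b);
  };
  for (const [a, b] of links) { add(a, b); add(b, a); } // undirected links

  const ids = [...neighbors.keys()].sort();
  const scores = [];
  for (let i = 0; i < ids.length; i++) {
    for (let j = i + 1; j < ids.length; j++) {
      const a = ids[i];
      const b = ids[j];
      if (neighbors.get(a).has(b)) continue; // already linked, nothing to predict
      let shared = 0;
      for (const n of neighbors.get(a)) {
        if (neighbors.get(b).has(n)) shared++;
      }
      if (shared > 0) scores.push({ a, b, shared });
    }
  }
  // Highest score first: the most likely future links.
  return scores.sort((x, y) => y.shared - x.shared);
}

const predictedLinks = commonNeighborScores([
  ["ann", "bob"], ["ann", "cat"], ["bob", "dan"], ["cat", "dan"], ["dan", "eve"],
]);
console.log(predictedLinks);
```

Here ann and dan are not linked but share two neighbours (bob and cat), so that pair ranks among the top candidates.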
Patent Analysis

Computer-implemented system and method for handling linked data views, Patent number 7,068,267, held by SAS Institute Inc.
• A first view and a second view are used to display at least a portion of the data observations contained in the data model. Conditional data that is associated with the second view specifies how the second view's display is modified based upon a selection of a data observation within the first view.
Timeline
Advantages
Design allows for customization
Custom data objects
Almost all visual aspects of the graph are easily changed or left as default settings
Disadvantages
Requires a network connection and a browser
• Or an Apache and PHP installation on a local machine
As the dataset grows larger, application performance may degrade
Possible browser compatibility issues
• These are typical web issues with HTML, JavaScript, and CSS rendering
Requirements Analysis
Functionality (performance)
Flexibility
• Allow users and developers to customize and deploy the application as they see fit
Reliability
• Provide an accurate data representation
Quality
• Provide a meaningful, visual representation of data
Requirements Analysis
Operating Environment
• Scripts:
  Run on a web server with a PHP (4.0+) installation
  Can interface with databases
• Users:
  Cross-system
  Cross-browser
Requirements Analysis

Interfaces
• A PHP class is provided, and the data to be visualized is added by the user.
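The slides do not give the PHP class's method names, so the following is only a sketch of the general shape such an interface might take: a container to which the user adds nodes and links, and which serializes them for the graph script. It is written in JavaScript, the client-side language the slides list; `LinkedGraph`, `addNode`, `addLink`, and the JSON layout are all hypothetical.

```javascript
// Hypothetical container: the user adds data, the class emits it for the graph script.
class LinkedGraph {
  constructor() {
    this.nodes = new Map(); // id -> { id, label }
    this.links = [];        // { source, target }
  }

  addNode(id, label = id) {
    this.nodes.set(id, { id, label });
    return this; // allow chaining
  }

  addLink(source, target) {
    // Reject links to nodes that were never added, so the display stays accurate.
    if (!this.nodes.has(source) || !this.nodes.has(target)) {
      throw new Error(`Unknown node in link ${source} -> ${target}`);
    }
    this.links.push({ source, target });
    return this;
  }

  // JSON.stringify calls toJSON automatically.
  toJSON() {
    return { nodes: [...this.nodes.values()], links: this.links };
  }
}

const siteGraph = new LinkedGraph()
  .addNode("home", "Home page")
  .addNode("about")
  .addLink("home", "about");
const graphPayload = JSON.stringify(siteGraph);
console.log(graphPayload);
```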

Performance Requirements
• Time required to produce the display varies with the size of the dataset (1-10 seconds)
• Restrict the size of datasets to keep the browser and computer responsive
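One possible way to restrict dataset size before display is to keep only the most connected nodes and the links among them. The JavaScript sketch below assumes that policy; the slides do not say how the project limits size, and `limitGraph` and the cap value are hypothetical.

```javascript
// Keep at most maxNodes nodes, preferring those with the most links,
// and drop any link that touches a removed node.
// Assumes every id used in `links` also appears in `nodeIds`.
function limitGraph(nodeIds, links, maxNodes) {
  const degree = new Map(nodeIds.map((id) => [id, 0]));
  for (const [a, b] of links) {
    degree.set(a, degree.get(a) + 1);
    degree.set(b, degree.get(b) + 1);
  }
  const kept = new Set(
    [...nodeIds].sort((x, y) => degree.get(y) - degree.get(x)).slice(0, maxNodes)
  );
  return {
    nodes: nodeIds.filter((id) => kept.has(id)),
    links: links.filter(([a, b]) => kept.has(a) && kept.has(b)),
  };
}

const trimmedGraph = limitGraph(
  ["a", "b", "c", "d"],
  [["a", "b"], ["a", "c"], ["a", "d"], ["b", "c"]],
  3
);
console.log(trimmedGraph);
```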
Requirements Analysis

Resources
• Design – conceived before the project was undertaken; 10 man-hours to refine
• Coding – 20 man-hours
• Testing – 15 man-hours
Demo

Example
Future

More complex display
• Hyperlinks and/or pictures as nodes
• Re-centering graph by clicking a node
• Mouse-over events for more detail
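Re-centering on a clicked node could start by measuring how many links away every other node is, so the display can place nearer nodes closer to the new center. The JavaScript sketch below computes those distances with a breadth-first search; it is one possible approach to this future feature, and `distancesFrom` and the sample links are hypothetical.

```javascript
// Breadth-first search: number of links between `center` and each reachable node.
function distancesFrom(center, links) {
  const adjacent = new Map();
  const add = (a, b) => {
    if (!adjacent.has(a)) adjacent.set(a, []);
    adjacent.get(a).push(b);
  };
  for (const [a, b] of links) { add(a, b); add(b, a); } // undirected links

  const dist = new Map([[center, 0]]);
  const queue = [center];
  for (let head = 0; head < queue.length; head++) {
    const current = queue[head];
    for (const next of adjacent.get(current) || []) {
      if (!dist.has(next)) {
        dist.set(next, dist.get(current) + 1);
        queue.push(next);
      }
    }
  }
  return dist; // unreachable nodes are simply absent
}

// Suppose the user clicks node "b".
const rings = distancesFrom("b", [["a", "b"], ["b", "c"], ["c", "d"], ["a", "e"]]);
console.log([...rings]);
```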