Spreadsheet Visualisation to Improve End
Download
Report
Transcript Spreadsheet Visualisation to Improve End
Spreadsheet Visualisation to
Improve End-user Understanding
Daniel Ballinger, Robert Biddle and James Noble
School of Mathematical and Computing Sciences
Email: {db, robert, kjx}@mcs.vuw.ac.nz
http://www.mcs.vuw.ac.nz/~db/honours.html
Motivation
Spreadsheets are a common form of end-user
programming.
Unfamiliar spreadsheets can contain daunting
amounts of information in the layout and intercell dependencies.
Visualisation can be used to aid in end-user
understanding of spreadsheets.
Working outside the spreadsheet application
allows for greater flexibility in visualisation.
We focused on Microsoft Excel due to its large
market share.
Excel’s Current Visualisation Support –
Range Finder
Invoked by clicking in
a cell and then in the
formula bar.
Components are
coloured in the bar and
sheet.
Allows for visual
manipulation.
Mainly only useful for
spatially close cells.
Range Finder
Excel’s Current Visualisation Support –
Formula Auditing Tools
Auditing Tools
Invoked using Formula
Auditing Toolbar.
Trace dependents or
precedents.
Arrows always point to
referenced cell.
Users may navigate
spatially disjoint cells.
(semantic navigation)
Complicated
spreadsheets can create a
tangle of arrows.
Related Work
Takeo Igarashi
– Spreadsheets augment “a visible tabular layout with invisible formulas”.
– Created visualisations to help reveal the hidden dataflow graphs and
superficial tabular layouts of spreadsheets.
Markus Clermont
– Most end-users are not trained programmers.
– Many spreadsheets exist beyond being simple scratch pads.
Raymond Panko
– Studies of empirical data into spreadsheet errors.
– Found error rates can be disturbingly high.
– Errors attributed to over confidence and lack of formal checking.
Margaret Burnett
– The importance of scalability in visualisations.
– Forms/3 and an embedded testing methodology.
Spreadsheet Application Toolkit
Query
Find and store spreadsheets
from the Internet.
Extract low level structures.
E.g. Cell values and
formulas.
Analyse spreadsheet
structures. Either individual
or corpus.
Conveying the findings
through visualisation.
Gobbler
Google
Fetcher
Web Servers
Extractor
BIFF Reader
URLs
XLS files
Toolkit Files
Algorithms
Analyser
Processed
Metrics
Data
Visualisation Tools
Image
Visualisations
Spreadsheet layout
Clustering
Data Dependency Flow
Data Dependency Direction
Graph Structure
Fisheye view
Formula Inspection
Corpus Analysis
Spreadsheet layout – Real-estate Utilisation 2D
Understanding layout is an
important first step in learning about
a new spreadsheet.
Actual values and formulas are only
shown as occupied cells.
The visualisation layout mimics that
of Excel, with columns along the top
of the x-axis and rows running down
the y-axis.
Cells with a higher occupancy level
are coloured towards the red end of
the colour spectrum.
Spreadsheet layout – Real-estate Utilisation 3D
Occupancy data is
projected into 3D to
create a surface map.
Discrete to continuous
data transformation helps
smooth the effects of
spikes.
Coloured to give a
Topographical terrain
effect.
Full benefit is seen with
user interaction.
BIRCH Clustering
BIRCH clustering partitions records into clusters that
are similar according to two or more attributes.
Current visualisations use the Euclidean distance
between cells as the similarity metric.
Data Dependency Flow
Visualising just average
unit vectors for each cell
can reduce the visual
clutter.
3D can be used to
separate vectors that occur
at different sheet levels.
Note the curvature back
towards the origin for this
workbook.
Data Dependency Direction
Concentrate purely on
the directions of data
flow relative to cells.
Angles are sorted into
36 buckets then feed to
Excel to create the graph.
After the four main axis
the next significant
measure occurs between
300 and 360º.
Radar Graph for
Outgoing Dependencies
Graph Structure
Disregarding spatial bounds allows some structures to
become clearer.
Spring view
Source Data
Fisheye view - Focus + Context
Addresses formula
dependencies that span large
distances or are many cells
deep.
Trees are warped over a
hyperbolic lens to achieve
both focus in the centre and
context.
An artificial red root node is
introduced to connect disjoint
trees.
Formula Inspection – Data Flow
Visualising formula components and flow direction.
Fully trace worksheets in
one view.
Basic Referencing Components
Formula Inspection - Dependency Types
Row Absolute
Excel allows for
Fully Absolute
combinations of relative
and absolute positioning. Column Absolute
Relative
Understanding the
referencing type is
important when
replicating formula and
identifying regular
patterns.
Corpus Analysis of 259 Workbooks
Spatial Centre
Demonstrations of
visualisations created from a
corpus.
With this sample corpus the
average worksheet centre is
more column centric.
Function utilisation suggests
that the logical functions,
such as IF, actually
outnumber simpler math
functions like SUM.
Number of non-empty Worksheets: 227
Number of empty Worksheets: 195
Average Row: 1.348
Average Column 18.098
Max Row: 1384
Max Col: 82
Total Occupied cells: 55491
Orphans : 51570
Root Cells : 2105
Leafs: 1031
Nodes in Cyclic References: 29
Local Formula: 108
Family Trees: 509
Max Tree Depth: 22
Max Tree Breadth: 150
Function Utilisation
Summary
Spreadsheets are significant examples of end-user
programming
Visualisation can assist end-users in better
understanding the structure of spreadsheets
– In particular, the “hidden structures” created by formula
Reviewed literature to investigate the implications
of the hidden structures.
Developed a toolkit to externally access the
spreadsheet structure and generate visualisations.
Created several sample visualisations to help
improve end-user understanding.
Current and Future Work
Detailed user studies, including usability
evaluations
Domain specific visualisations
Spreadsheet corpus analysis to find large
patterns
Visualisation scalability to larger more
complex spreadsheets
http://www.mcs.vuw.ac.nz/~db/honours.html