Transcript Lecture 31
Introduction to Data Visualization
Definition of Data Visualization
Terms related to Data Visualization
Data Mining
Data Recovery
Data Redundancy
Data Acquisition
Data Validation
Data Integrity
Data Verification
Data Aggregation
Continued….
Data mining
analytic process designed to explore data
analyzing data from different perspectives
summarizing it into useful information
Data recovery
salvaging data from damaged, failed,
corrupted, or inaccessible secondary storage media
recovery required due to physical damage to the storage
device or logical damage to the file system
Data redundancy
data stored in addition to the actual data
permits correction of errors
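As a minimal sketch of how redundant data permits error correction, the following C++ stores every bit three times and recovers it by majority vote. The triple-repetition scheme and the names encodeTriple and decodeTriple are assumptions chosen for illustration; the lecture does not name a specific scheme.

```cpp
#include <cstddef>
#include <vector>

// Encode: store every bit three times. The two extra copies are the
// redundant data (hypothetical scheme chosen for illustration).
std::vector<int> encodeTriple(const std::vector<int>& bits) {
    std::vector<int> coded;
    coded.reserve(bits.size() * 3);
    for (int b : bits) {
        coded.push_back(b);
        coded.push_back(b);
        coded.push_back(b);
    }
    return coded;
}

// Decode: a majority vote over each group of three copies corrects
// any single flipped bit inside that group.
std::vector<int> decodeTriple(const std::vector<int>& coded) {
    std::vector<int> bits;
    for (std::size_t i = 0; i + 2 < coded.size(); i += 3) {
        const int ones = coded[i] + coded[i + 1] + coded[i + 2];
        bits.push_back(ones >= 2 ? 1 : 0);
    }
    return bits;
}
```

If one copy in a group is damaged, the other two still outvote it, so the original bit is recovered. Two damaged copies in the same group would not be corrected by this simple scheme.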
Data acquisition
process of sampling signals
measure real world physical conditions
converting the resulting samples into digital numeric values
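The sampling and conversion steps can be sketched in C++ as follows. This is only an illustration under stated assumptions: the signal is modeled as a plain function of time, the converter is an ideal uniform quantizer, and the names quantize and acquire are hypothetical.

```cpp
#include <cmath>
#include <vector>

// Convert one analog value in [vMin, vMax] into an integer code in
// 0 .. 2^bits - 1. Assumes vMax > vMin and 1 <= bits <= 30.
int quantize(double v, double vMin, double vMax, int bits) {
    const int levels = (1 << bits) - 1;
    if (v < vMin) v = vMin;   // clamp values below the input range
    if (v > vMax) v = vMax;   // clamp values above the input range
    return static_cast<int>(std::lround((v - vMin) / (vMax - vMin) * levels));
}

// Sample the signal at t = 0, dt, 2*dt, ... and quantize each sample.
std::vector<int> acquire(double (*signal)(double), double dt, int count,
                         double vMin, double vMax, int bits) {
    std::vector<int> codes;
    for (int k = 0; k < count; ++k) {
        codes.push_back(quantize(signal(k * dt), vMin, vMax, bits));
    }
    return codes;
}
```

With 8 bits and an input range of 0 to 5, the value 0 maps to code 0 and the value 5 maps to code 255.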
Data validation
process of ensuring that a program operates on clean,
correct and useful data
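A minimal C++ sketch of validation is given here, assuming the incoming data are text fields that should each hold one number within a known range. The function name parseValidReading and the range limits are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <exception>
#include <stdexcept>
#include <string>

// Accept a text field only if it is a complete, finite number inside
// [minValue, maxValue]. On success the number is stored in result.
bool parseValidReading(const std::string& field, double minValue,
                       double maxValue, double& result) {
    try {
        std::size_t used = 0;
        const double value = std::stod(field, &used);
        if (used != field.size()) return false;      // trailing junk, e.g. "12abc"
        if (!std::isfinite(value)) return false;     // rejects "nan" and "inf"
        if (value < minValue || value > maxValue) return false;  // out of range
        result = value;
        return true;
    } catch (const std::exception&) {
        return false;   // not a number at all, or too large for a double
    }
}
```

The rest of the program then works only on values for which the function returned true, so it never operates on unclean input.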
Data integrity
maintaining and assuring the accuracy and consistency of
data
ensure data is recorded exactly as intended
Data verification
different types of data are checked for accuracy and
inconsistencies after data migration is done
Data aggregation
information is gathered and expressed in a summary form
to get more information about particular groups
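Aggregation can be sketched in C++ as grouping records by a key and keeping a summary per group. The Sale record, its region field and the function aggregateByRegion are assumptions made up for this example.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical input record: one sale attributed to a region.
struct Sale {
    std::string region;
    double amount;
};

// Summary form kept for each group.
struct Summary {
    int count = 0;
    double total = 0.0;
};

// Gather the individual sales and express them as one summary per region.
std::map<std::string, Summary> aggregateByRegion(const std::vector<Sale>& sales) {
    std::map<std::string, Summary> groups;
    for (const Sale& s : sales) {
        Summary& g = groups[s.region];  // creates an empty summary on first use
        g.count += 1;
        g.total += s.amount;
    }
    return groups;
}
```

The result holds one entry per group, which is usually far smaller than the raw data and easier to chart.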
Need for data visualization
Importance of data visualization
Limitation of spreadsheet
Interpretation through data visualization
identify areas that need attention or improvement
understand which factors influence system design
predict how to change system design accordingly
predict the efficiency of system
Interactive Visualization
Humans interact with computers to create graphic illustrations of
information
Process can be made more efficient
Human input
Response time
Combination of disciplines
providing a meaningful solution through data visualization
requires insights from diverse fields such as statistics, data
mining, graphic design, and information visualization
software-based information visualization adds building
blocks for interacting with and representing various kinds of
abstract data
Process of data visualization
Acquire
Parse
Filter
Mine
Represent
Refine
Interact
Acquire
Obtain the data, whether from a file on a disk or a source
over a network
Parse
Provide some structure for the data’s meaning, and order
it into categories
Filter
Remove all but the data of interest
Mine
Apply methods from statistics or data mining as a way to
discern patterns or place the data in mathematical context
Represent
Choose a basic visual model, such as a bar graph, list, or
tree.
Refine
Improve the basic representation to make it clearer and
more visually engaging.
Interact
Add methods for manipulating the data or controlling
what features are visible.
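The first five steps can be sketched as a small C++ pipeline. This is an illustration only: the "label,value" text format, the Record type and all function names are assumptions, the acquired data is simply a string passed in by the caller, and the Refine and Interact steps are left out.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

struct Record {
    std::string label;
    double value;
};

// Parse: give structure to raw "label,value" lines; malformed lines are skipped.
std::vector<Record> parse(const std::string& raw) {
    std::vector<Record> records;
    std::istringstream in(raw);
    std::string line;
    while (std::getline(in, line)) {
        const std::size_t comma = line.find(',');
        if (comma == std::string::npos) continue;
        std::istringstream number(line.substr(comma + 1));
        double v = 0.0;
        if (number >> v) records.push_back({line.substr(0, comma), v});
    }
    return records;
}

// Filter: remove all but the records whose value is at least minValue.
std::vector<Record> filter(const std::vector<Record>& in, double minValue) {
    std::vector<Record> out;
    for (const Record& r : in) {
        if (r.value >= minValue) out.push_back(r);
    }
    return out;
}

// Mine: a deliberately simple statistic, the largest value (never below 0).
double maxValue(const std::vector<Record>& records) {
    double m = 0.0;
    for (const Record& r : records) m = std::max(m, r.value);
    return m;
}

// Represent: a text bar chart, scaled so the largest bar has `width` marks.
std::string represent(const std::vector<Record>& records, int width) {
    const double m = maxValue(records);
    std::ostringstream out;
    for (const Record& r : records) {
        int n = m > 0.0 ? static_cast<int>(r.value / m * width + 0.5) : 0;
        n = std::max(0, n);  // negative values get an empty bar
        out << r.label << " | " << std::string(n, '#') << '\n';
    }
    return out.str();
}
```

A typical call chain would be represent(filter(parse(raw), 5.0), 10), where raw holds the text obtained in the Acquire step.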
Iteration and Combination of steps of data visualization
Unique requirements for each project
each data set is different
the point of visualization is to expose the fascinating aspect
of each data set and make it self-evident
readily available representation toolkits are useful starting
points
they must be customized during an in-depth study of the
task
Avoid usage of excess data
Audience of problem
Quantitative messages
Time-Series
Ranking
Part-to-Whole
Deviation
Frequency-Distribution
Correlation
Nominal Comparison
Geographic or Geospatial
Time-series:
A single variable is captured over a period of time, such as
the unemployment rate over a 10-year period. A line chart
may be used to demonstrate the trend
Ranking:
Categorical subdivisions are ranked in ascending or
descending order, such as a ranking of sales performance by
sales persons during a single period
A bar chart may be used to show the comparison across the
sales persons
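Preparing data for a ranking message amounts to sorting. A minimal C++ sketch follows, assuming a hypothetical SalesRecord type; the records come back in descending order, ready to be drawn as bars.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical record: one salesperson and the sales for a single period.
struct SalesRecord {
    std::string person;
    double sales;
};

// Return the records ranked from highest to lowest sales.
// stable_sort keeps tied records in their original relative order.
std::vector<SalesRecord> rankDescending(std::vector<SalesRecord> records) {
    std::stable_sort(records.begin(), records.end(),
                     [](const SalesRecord& a, const SalesRecord& b) {
                         return a.sales > b.sales;
                     });
    return records;
}
```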
Part-to-whole:
Categorical subdivisions are measured as a ratio to the
whole
A pie chart or bar chart can show the comparison of ratios,
such as the market share represented by competitors in a
market
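The ratios behind a part-to-whole chart can be computed as in this short C++ sketch. The function name sharesOfWhole is hypothetical, and the parts are assumed to be non-negative.

```cpp
#include <vector>

// Express each part as a fraction of the total of all parts.
// If the total is not positive, every share is reported as 0.
std::vector<double> sharesOfWhole(const std::vector<double>& parts) {
    double total = 0.0;
    for (double p : parts) total += p;

    std::vector<double> shares;
    for (double p : parts) {
        shares.push_back(total > 0.0 ? p / total : 0.0);
    }
    return shares;
}
```

For parts 1, 1 and 2 the shares are 0.25, 0.25 and 0.5, which are the slice sizes a pie chart would show.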
Deviation:
Categorical subdivisions are compared against a reference,
such as a comparison of actual vs. budget expenses for
several departments of a business for a given time period
A bar chart can show comparison of the actual versus the
reference amount
Frequency distribution:
Shows the number of observations of a particular variable
for a given interval, such as the number of years in which the
stock market return falls in intervals such as 0-10%, 11-20%, etc.
A histogram, a type of bar chart, may be used for this
analysis
A boxplot helps visualize key statistics about the
distribution, such as the median, quartiles, and outliers
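The counting behind a histogram can be sketched in C++ as follows, assuming equal-width intervals over a range chosen by the caller. The function name histogram and its parameters are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Count how many values fall into each of `bins` equal-width intervals
// covering [lo, hi). Values outside that range (and NaN) are ignored.
std::vector<int> histogram(const std::vector<double>& values,
                           double lo, double hi, std::size_t bins) {
    std::vector<int> counts(bins, 0);
    if (bins == 0 || hi <= lo) return counts;

    const double width = (hi - lo) / bins;
    for (double v : values) {
        if (!(v >= lo && v < hi)) continue;
        std::size_t i = static_cast<std::size_t>((v - lo) / width);
        if (i >= bins) i = bins - 1;  // guard against floating-point rounding
        ++counts[i];
    }
    return counts;
}
```

Each count becomes the height of one bar in the histogram.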
Correlation:
Comparison between observations represented by two
variables (X,Y) to determine if they tend to move in the
same or opposite directions
For example, plotting unemployment (X) and inflation (Y)
for a sample of months. A scatter plot is typically used for
this message
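Whether two variables tend to move in the same or opposite directions can also be summarized by a single number. The C++ sketch below computes the Pearson correlation coefficient, which is one common choice and not something the lecture prescribes; the function name pearson is hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Pearson correlation coefficient of paired samples (x[i], y[i]).
// Close to +1: the variables move together; close to -1: they move in
// opposite directions. Returns 0 when the coefficient is undefined
// (fewer than two pairs, unequal lengths, or a constant variable).
double pearson(const std::vector<double>& x, const std::vector<double>& y) {
    const std::size_t n = x.size();
    if (n < 2 || y.size() != n) return 0.0;

    double meanX = 0.0, meanY = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        meanX += x[i];
        meanY += y[i];
    }
    meanX /= n;
    meanY /= n;

    double sxy = 0.0, sxx = 0.0, syy = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double dx = x[i] - meanX;
        const double dy = y[i] - meanY;
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }
    if (sxx == 0.0 || syy == 0.0) return 0.0;
    return sxy / std::sqrt(sxx * syy);
}
```

The scatter plot shows the shape of the relationship, and the coefficient gives its direction and strength for a roughly linear pattern.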
Nominal comparison:
Comparing categorical subdivisions in no particular order,
such as the sales volume by product code
A bar chart may be used for this comparison
Geographic or geospatial:
Comparison of a variable across a map or layout, such as the
unemployment rate by state or the number of persons on the
various floors of a building
A cartogram is a typical graphic used
Characteristics of effective graphical display
show the data
avoid distorting what the data have to say
present many numbers in a small space
make large data sets coherent
encourage the eye to compare different pieces of data
reveal the data at several levels of detail, from a broad
overview to the fine structure
serve a reasonably clear purpose: description,
exploration, tabulation or decoration
be closely integrated with the statistical and verbal
descriptions of a data set
Visual perception and data visualization
Effective graphics take advantage of pre-attentive
processing and attributes and the relative strength of these
attributes
Types of information display
Tables
Graphs
Data display requires planning
Data collection
Benefits of data visualization
Visualization is so powerful and effective that it can change
someone’s mind in a flash
It encompasses various datasets quickly, effectively and
efficiently and makes them accessible to interested viewers
It gives us quick access to deep insight
It gives us the opportunity to approach huge data sets and makes
them easily comprehensible, be it in the field of entertainment,
current affairs, financial issues or political affairs
It also builds in us a deep insight, prompting us to take a
good decision and an immediate action if needed
It has emerged in the business world lately as geospatial
visualization
The popularity of geospatial visualization has grown because
many websites now provide web services that attract visitors'
interest
Data Visualization with C++
Chapter 1 “Arrays, Pointers and Structures”
Chapter 2 “Objects and Classes”
Chapter 4 “Inheritance”
Chapter 6 “Algorithm Analysis”
Chapter 1 “Arrays, Pointers and Structures”
In this chapter we examined the basics of pointers, arrays, and
structures
The pointer variable emulates the real-life indirect answer. In C++ it
is an object that stores the address where some other data reside. The
pointer is special because it can be dereferenced, thus allowing
access to those other data
The NULL pointer holds the constant 0, indicating that it is not
currently pointing at valid data
A reference parameter is an alias. It is like a pointer constant, except
that the compiler implicitly dereferences it on every access
Together with ordinary value parameters, reference parameters give three
forms of parameter passing: call by value, call by reference, and call by
constant reference
Choosing the best form for a particular application is an important
part of the design process
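The three forms of parameter passing, and explicit pointer dereferencing, can be compared side by side in a short C++ sketch. All function names are hypothetical, and nullptr is the C++11 spelling of the NULL pointer.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Call by value: the function works on its own copy, so the caller's
// variable is left untouched.
int incrementedCopy(int n) {
    n += 1;
    return n;
}

// Call by reference: n is an alias for the caller's variable, so the
// change is visible to the caller.
void incrementInPlace(int& n) {
    n += 1;
}

// Call by constant reference: no copy is made, and the function is not
// allowed to modify the argument. Suited to large objects.
std::size_t totalLength(const std::vector<std::string>& words) {
    std::size_t total = 0;
    for (const std::string& w : words) total += w.size();
    return total;
}

// Pointer: stores an address and must be dereferenced explicitly with *.
// A null pointer does not point at valid data, so it is checked first.
bool incrementThroughPointer(int* p) {
    if (p == nullptr) return false;
    *p += 1;
    return true;
}
```

A common guideline is to pass small built-in types by value, large objects that are only read by constant reference, and objects the function must change by reference.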
An array is a collection of identically typed objects
In C++ there is a primitive version with second-class semantics
A vector is also part of the standard library
With the index operator [], neither performs index range
checking, and out-of-bounds accesses can corrupt other objects
(the vector member function at does check the index). Because
primitive arrays are second-class, they cannot be copied by
using the assignment operator
Instead they must be copied element by element; however, a
vector can be copied in a single assignment statement
A vector can be expanded as needed by calling resize
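The difference between primitive arrays and vectors can be sketched in C++ as follows. The function names are hypothetical. The last function relies on the standard member function at, which, unlike the [] operator, throws std::out_of_range for a bad index.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Primitive arrays cannot be assigned as a whole, so they are copied
// element by element. The caller must guarantee both hold n elements.
void copyArray(const int* from, int* to, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        to[i] = from[i];
    }
}

// A vector is copied in a single assignment and expanded with resize.
std::vector<int> copyAndGrow(const std::vector<int>& original,
                             std::size_t newSize) {
    std::vector<int> copy;
    copy = original;       // whole-vector copy in one statement
    copy.resize(newSize);  // any new elements are initialized to 0
    return copy;
}

// Range-checked access: at() throws instead of corrupting memory.
bool isValidIndex(const std::vector<int>& v, std::size_t i) {
    try {
        (void)v.at(i);
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```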
Structures are also used to store several objects, but unlike
arrays, the objects need not be identically typed
Each object in the structure is a member, and is accessed by the
. member operator
The -> operator is used to access a member of a structure that
is accessed indirectly through a pointer
We also noted that a list of items can be stored noncontiguously by using a linked list
The advantage is that less space is used for large objects than in
the array-doubling technique
The penalty is that access of the ith item is no longer constant-time but requires examination of i structures
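A minimal C++ sketch of a structure, the two member access operators and a singly linked list is given here. The DataPoint and Node types and the function nodeAt are hypothetical names chosen for this example.

```cpp
#include <cstddef>
#include <string>

// A structure groups members that need not share a type.
struct DataPoint {
    std::string label;
    double value;
};

// One node of a singly linked list: the stored item plus the address
// of the next node (nullptr marks the end of the list).
struct Node {
    DataPoint data;
    Node* next;
};

// Reaching the i-th item means following i links, so the cost grows
// with i instead of being constant as for an array.
// Returns nullptr when the list has fewer than i + 1 nodes.
const Node* nodeAt(const Node* head, std::size_t i) {
    while (head != nullptr && i > 0) {
        head = head->next;  // -> reaches a member through a pointer
        --i;
    }
    return head;
}
```

Given a Node object n, its label is written n.data.label, while through a pointer p to that node the same member is p->data.label.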
Chapter 2 “Objects and Classes”
In this chapter we described the C++ class construct
The class is the C++ mechanism used to create new types. Through it we
can
define construction and destruction of objects,
define copy semantics,
define input and output operations,
overload almost all operators,
define implicit and explicit type conversion operations (sometimes a bad
thing)
provide for information hiding and atomicity
The class consists of two parts: the interface and the implementation
The interface tells the user of the class what the class does. The
implementation does it
The implementation frequently contains proprietary code and in some cases
is distributed only in precompiled form
Information hiding can be enforced by using the private section
in the interface
Initialization of objects is controlled by the constructor
functions, and the destructor function is called when an object
goes out of scope
The destructor typically performs clean up work, closing files
and freeing memory
Finally, when implementing a class, the use of const and
correct parameter passing mechanisms, as well as the decision
about whether to accept the defaults for the Big Three (destructor,
copy constructor, and copy assignment operator), write our own
Big Three, or completely disallow copying, are crucial not only
for efficiency but also, in some cases, for correctness
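These points can be brought together in one small C++ class. The sketch below is an illustration, not code from the chapter: the class name IntBuffer and its members are hypothetical. It owns dynamically allocated memory, so it writes its own Big Three instead of accepting the defaults, which would copy only the pointer.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>

class IntBuffer {
public:
    // Constructor: controls initialization; all elements start at 0.
    explicit IntBuffer(std::size_t n) : size_(n), data_(new int[n]()) {}

    // Big Three, part 1: the copy constructor makes a deep copy.
    IntBuffer(const IntBuffer& other)
        : size_(other.size_), data_(new int[other.size_]) {
        std::copy(other.data_, other.data_ + size_, data_);
    }

    // Big Three, part 2: copy assignment, written with copy-and-swap.
    IntBuffer& operator=(const IntBuffer& other) {
        if (this != &other) {
            IntBuffer temp(other);
            std::swap(size_, temp.size_);
            std::swap(data_, temp.data_);
        }  // temp's destructor releases the old memory here
        return *this;
    }

    // Big Three, part 3: the destructor frees the memory.
    ~IntBuffer() { delete[] data_; }

    // const member functions promise not to change the object.
    std::size_t size() const { return size_; }
    int get(std::size_t i) const { return data_[i]; }
    void set(std::size_t i, int v) { data_[i] = v; }

private:
    // Information hiding: users of the class cannot touch these directly.
    std::size_t size_;
    int* data_;
};
```

With the default copy operations, two IntBuffer objects would share one array and both destructors would try to free it, which is a correctness problem and not just an efficiency one.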