Data structure - Virginia Tech

Download Report

Transcript Data structure - Virginia Tech

Course notes
CS2606: Data Structures and
Object-Oriented Development
Ryan Richardson
John Paul Vergara
Department of Computer Science
Virginia Tech
Spring 2008
Goals of this Course
1. Reinforce the concept that costs and
benefits exist for every data structure.
2. Learn the commonly used data
structures.
– These form a programmer's basic data
structure ``toolkit.'‘
3. Understand how to measure the cost of a
data structure or program.
– These techniques also allow you to judge the
merits of new data structures that you or
others might invent.
The Need for Data Structures
Data structures organize data
 more efficient programs.
More powerful computers
 more complex applications.
More complex applications demand more
calculations.
Complex computing tasks are unlike our
everyday experience.
Organizing Data
Any organization for a collection of records
can be searched, processed in any order,
or modified.
The choice of data structure and algorithm
can make the difference between a
program running in a few seconds or many
days.
Efficiency
A solution is said to be efficient if it solves
the problem within its resource constraints.
– Space
– Time
• The cost of a solution is the amount of
resources that the solution consumes.
Selecting a Data Structure
Select a data structure as follows:
1. Analyze the problem to determine the
basic operations that must be supported.
2. Quantify the resource constraints for
each operation.
3. Select the data structure that best meets
these requirements.
Some Questions to Ask
• Are all data inserted into the data structure
at the beginning, or are insertions
interspersed with other operations?
• Can data be deleted?
• Are all data processed in some welldefined order, or is random access
allowed?
Costs and Benefits
Each data structure has costs and benefits.
Rarely is one data structure better than
another in all situations.
Any data structure requires:
– space for each data item it stores,
– time to perform each basic operation,
– programming effort.
Costs and Benefits (cont)
Each problem has constraints on available
space and time.
Only after a careful analysis of problem
characteristics can we know the best data
structure for the task.
Bank example:
– Start account: a few minutes
– Transactions: a few seconds
– Close account: overnight
Example 1.2
Problem: Create a database containing
information about cities and towns.
Tasks: Find by name or attribute or
location
• Exact match, range query, spatial query
Resource requirements: Times can be
from a few seconds for simple queries
to a minute or two for complex queries
Abstract Data Types
Abstract Data Type (ADT): a definition for a
data type solely in terms of a set of values
and a set of operations on that data type.
Each ADT operation is defined by its inputs
and outputs.
Encapsulation: Hide implementation details.
Data Structure
• A data structure is the physical
implementation of an ADT.
– Each operation associated with the ADT is
implemented by one or more subroutines in
the implementation.
• Data structure usually refers to an
organization for data in main memory.
• File structure: an organization for data on
peripheral storage, such as a disk drive.
Metaphors
An ADT manages complexity through
abstraction: metaphor.
– Hierarchies of labels
Ex: transistors  gates  CPU.
In a program, implement an ADT, then think
only about the ADT, not its implementation.
Logical vs. Physical Form
Data items have both a logical and a
physical form.
Logical form: definition of the data item
within an ADT.
– Ex: Integers in mathematical sense: +, -
Physical form: implementation of the data
item within a data structure.
– Ex: 16/32 bit integers, overflow.
Data Type
ADT:
Type
Operations
Data Items:
Logical Form
Data Structure:
Storage Space
Subroutines
Data Items:
Physical Form
Example 1.8
A typical database-style project will have
many interacting parts.
Binary Search Trees
What is good and bad about BSTs?
• Space requirements?
• Time requirements?
• Average vs. Worst Case?
• What drives worst-case behavior?
What problems can a BST solve?
Example: BST Template Interface
template <typename T> class BST {
private:
BinNodeT<T>* Root;
// additional private members not shown
public:
BST(); // create empty BST
BST(const T& D); // root holds D
BST(const BST<T>& D); // deep copy support
BST<T>& operator=(const BST<T>& D) const;
bool Insert(const T& D); // insert element
bool Delete(const T& D); // delete element
T* const Find(const T& D); // return access to D
const T* const Find(const T& D) const; // return access to D
void Clear(); // restore to empty
// state
~BST();
};
Spatial Data Structures
What if we want to search for points in 2D or 3D
space? How do we store the data?
• Could store separate data structures organizing by
x- and y-dimensions (list, BST, etc.)
• This is OK for exact-match queries, but doesn’t
handle range queries well
• Could combine
We need a spatial data structure to handle spatial
queries.
Spatial Data Structures
• Spatial data records include a sense of location as an
attribute.
• Typically location is represented by coordinate data (in 2D
or 3D).
• If we are to search spatial data using the locations as key
values, we need data structures that efficiently represent
selecting among more than two alternatives during a
search.
Spatial Data Structures
• One approach for 2D data is to employ quadtrees, in which
each internal node can have up to 4 children, each
representing a different region obtained by decomposing
the coordinate space.
• There are a variety of such quadtrees, many of which are
described in:
– The Quadtree and Related Hierarchical Data
Structures, Hanan Samet, ACM Computing Surveys,
June 1984
Spatial Decomposition
• In binary search trees, the structure of the tree depends not
only upon what data values are inserted, but also in what
order they are inserted.
• In contrast, the structure of a Point-Region Quadtree is
determined entirely by the data values it contains, and is
independent of the order of their insertion.
• In effect, each node of a PR Quadtree represents a
particular region in a 2D coordinate space.
Spatial Decomposition
• Internal nodes have exactly 4 children (some may be
empty), each representing a different, congruent quadrant
of the region represented by their parent node.
• Internal nodes do not store data.
• Leaf nodes hold a single data value.
• Therefore, the coordinate space is partitioned as insertions
are performed so that no region contains more than a single
point.
• PR quadtrees represent points in a finitely-bounded
coordinate space.
PR Quadtree Insertion
• Insertion proceeds recursively, descending until
the appropriate leaf node is found, and then
partitioning and descending until there is no more
than one point within the region represented by
each leaf.
• It is possible for a single insertion to add many
levels to the relevant subtree, if points lie close
enough together.
• Of course, it is also possible for an insertion to
require no splitting whatsoever.
• The shape of the tree is entirely independent of the
order in which the data elements are added to it.
PR Quadtree Deletion
• Deletion of elements is more complex
(naturally) and may involve collapsing of
nodes.
• Since deletion is not required for the first
project, a discussion of the details will be
deferred until a later time.
PR Quadtree Node Implementation
• Of course, the PR Quadtree will be implemented
as a C++ template.
• However, it may be somewhat less generic than
the general BST discussed earlier.
• During insertion and search, it is necessary to
determine whether one point lies NW, NE, SE or
SW of another point. Clearly this cannot be
accomplished by using the usual relational
operators to compare points.
PR Quadtree Node Implementation
Two possible approaches:
1) have the data type provide accessors for the x- and ycoordinates
2) have the type provide a comparator that returns NW, NE,
SE or SW
• Either is feasible. It is possible to argue either is better,
depending upon the value placed upon various design
goals. It is also possible to deal with the issue in other
ways.
• In any case, the PR Quadtree implementation will impose
fairly strict requirements on any data type that is to be
stored in it.
PR Quadtree Implementation
Here's a possible PR Quadtree interface:
template <typename T> class prQuadTree {
public:
prQuadTree(int xMinimum, int xMaximum,
int yMinimum, int yMaximum);
bool Insert(const T& Elem);
bool Delete(const T& Elem);
T* const Find(const T& Elem);
const T* const Find(const T& Elem) const;
void Display(std::ostream& Out) const;
private:
prQuadNode<T>* Root;
int xMin, xMax, yMin, yMax;
// . . .
};
PR Quadtree Implementation
Some comments:
• the tree must be created to organize data elements that lie
within a particular, bounded region for the partitioning
logic to be correct
• the question of how to manage different types for internal
and leaf nodes raises some fascinating design and coding
issues…
• there will, of course, be a number of private, recursive
helper functions
• how to display the tree also raises some fascinating
issues…