Lecture+20_data_structx

Data Structures
Fundamental Data Storage
Data Structures
• For sizeable programs, one problem
that can quickly arise is that of data
storage.
– What is the most efficient or effective
way to organize and utilize information
within a program?
– Quick answer – it depends on the task.
Data Structures
• For some tasks, it is helpful (at
minimum) and possibly necessary to
have sorted data.
• For other tasks, it is not necessary to
note where any given piece of data is
stored within a storage data
structure.
Data Structures
• Note: while we have seen these in
passing and as examples earlier in
the course, we will now examine
these a little more closely.
Arrays
• Possibly the most basic non-trivial
data storage structure is that of the
array.
– We’ve already seen the notion of a
“vector” that dynamically resizes.
[Figure: an array of ten slots, indexed 0 through 9.]
Beyond Arrays
• Note that the main structure being
implemented by an array is
effectively that of an ordered list.
– Just like with an array, each element
being stored has a specific location,
which implies an ordering.
Beyond Arrays
• In Java, there is an ArrayList class
in the java.util.* package.
– This class internally uses an array and
resizes it when necessary as new items
are added to the conceptual underlying
list.
– This resizing is handled internally and
automatically by the class.
Beyond Arrays
• In C++, there is a vector class as
part of the std namespace.
– Likewise, this class internally uses an
array and resizes it when necessary as
new items are added to the conceptual
underlying list.
– This resizing is also handled internally
and automatically by the class.
Beyond Arrays
• However, arrays are not the only way
to model a list.
– Another such model is that of the linked
list. (See the graphic below.)
Linked Lists
• The linked list stores each data
element separately and individually,
allocating space for new elements
as they are added into the
list.
Linked Lists
• Adding data to the end of a linked
list is trivial, as it (usually) also is for
an array.
Linked Lists
• Adding data in the middle of the list,
or at its beginning, is (relatively)
very time-consuming for an array.
• For a linked list, however, it is often a
much simpler operation.
Adding Elements
• Remember that for an array,
elements are in fixed locations.
• To insert an element into the middle
of an array requires moving all
elements at and after the point of
insertion, e.g., insert 7 at index 3.
Index:  0   1   2   3   4   5   6   7   8   9
Value:  3   8   1   2   4  13  42   9   5
Adding Elements
Index:          0   1   2   3   4   5   6   7   8   9
Before:         3   8   1   2   4  13  42   9   5
After shifting: 3   8   1       2   4  13  42   9   5
After insert:   3   8   1   7   2   4  13  42   9   5
Adding Elements
• For a linked list, however, each
element’s storage space is distinct
and separate from the others.
• New storage may be placed directly
in the middle of the chain.
Adding Elements
Linked Lists
• Naturally, there is the question of
what these “links of the chain”
actually are, or more properly, how
to represent them.
Linked Lists
• In their most basic and simple form…
template <typename T> class Node
{
public:
    T value;
    Node<T>* next;
};
Linked Lists
Remember: next is a pointer, so the class Node<T>
doesn’t actually contain another Node<T>, just the
address of the next one in line.
Linked Lists
The end of the “linked list chain” is denoted by a
null pointer (nullptr) in the last node.
The “ground” symbol at the end denotes this.
Lists
• Note that we now have two different
ways of storing data, each of which
has its own pros and cons.
– Arrays
• Good for adding items to the end of lists and
for random access to items within the list.
• Bad for cases with many additions and
removals at various places within the list.
Lists
• Note that we now have two different
ways of storing data, each of which
has its own pros and cons.
– Linked Lists
• Better for adding and removing items at
random locations within the list.
• Bad at randomly accessing items from the
list.
– Note that to use a random item within the list, we
must traverse the chain to find it.
Lists
• Note that both of these objects fulfill
the same end goal – to represent a
group of objects with some implied
ordering upon them.
• While they meet this goal differently,
their primary purpose is identical.
Templates
• Templates are integral to generic
programming in C++
– Template is like a blueprint
– Blueprint is used to instantiate function
when it is actually used in code
– “Actual” types are substituted in for the
“formal” types of the template
Why Templates?
What is the difference between the following two
functions?
int compare(const string &v1, const string &v2) {
if (v1 < v2) return -1;
if (v2 < v1) return 1;
return 0;
}
int compare(const double &v1, const double &v2) {
if (v1 < v2) return -1;
if (v2 < v1) return 1;
return 0;
}
Only the types!
Why Templates?
What if we could write the function once for any
type and have the compiler just use the right types?
template <typename T>
int compare(const T &v1, const T &v2) {
if (v1 < v2) return -1;
if (v2 < v1) return 1;
return 0;
}
Requires type T to have < operator
Exercise 1
• Implement the generic compare
function
• Implement a main() that compares
two doubles, two ints, two chars, and
two strings using the compare fcn.
• Compile and see that it is good!
What is Going On?
• Compiler checks only the template’s
structure when it is defined; the
definition (coded in a header) acts
as the blueprint
• When call to function is seen,
compiler substitutes types used in
invocation into blueprint and
generates required code
• Can’t catch many errors until
invocation is seen
Abstracting Beyond Lists
• We have this notion of a “list”
structure, which maps its stored
objects to indices.
– What if we don’t actually need to have a
lookup position for our stored objects?
• But wait! How could we possibly iterate
over the objects in a for loop?
The Iterator
• Many programming languages
provide objects called iterators for
enumerating objects contained within
data structures
– C++ and Java are no exceptions
– In C++, each container defines its own
iterator type; supporting utilities are in
the <iterator> header file
– (see 3.4 – 3.5)
The Iterator
• This iterator may be used to get each
contained object in order, one at a
time, in a controllable manner.
– It’s especially designed to work well
with for loops.
The Iterator
• Example code:
vector<int> numbers;
// omitted code initializing numbers.
vector<int>::iterator iter;
for(iter = numbers.begin();
    iter != numbers.end(); iter++)
{
    cout << *iter << ' ';
}
The Iterator
• In C++, iterators are designed to
look like and act something like
pointers.
– The * and -> operators are overloaded
to give pointer-like semantics, allowing
users of the iterator object to
“dereference” the object currently
“referenced” by the iterator.
The Iterator
• In C++, iterators are designed to
look like and act something like
pointers.
– Note the use of operator ++ to
increment the iterator to the next item
• This is another way we can interact with
pointers; it’s useful for iterating across an
array while using pointer semantics… but
keep a copy of the original around!
The Iterator
• C++11 (and every later standard)
also provides an alternate version of
the for-loop which is designed to work
with iterable structures and iterators
• Looks like “foreach” in other languages
vector<Person> structure;
for(Person &p:structure)
{
//Code.
}
The Iterator
• Both the std::vector and
std::list classes of C++
implement iterators.
– begin() returns an iterator to the list’s
first element
– end() is a special iterator “just after”
the final element of the list, useful for
checking when we’re done with iteration
– Use != to check for termination
Exercise 2
• Include <iterator> header
• Use iterator to walk through an array
you define and print out its contents
• Compile and run
• See that it is good
Abstracting Beyond Lists
• There are many, many other
techniques for storing data than the
model of a list.
– Such other data structures have
different techniques for accessing stored
data.
– You have seen one in your lab exercises
Other Data Structures
• Let’s move on from this idea of a
“list” structure.
• In particular, note how lists map their
stored objects to indices (or can map
an index to the stored object)
– What if we don’t actually need to have a
lookup position for our stored objects?
– In particular, does it really need to be
an integer?
Other Data Structures
• There are many, many other
techniques for storing data than the
model of a list.
– Such other data structures have
different techniques for accessing and
handling stored data.
– These “different techniques” are often
designed with a focus on different usage
patterns.
Other Data Structures
• A first example: arrays index their
contained objects by integers.
– Should integers be the only thing by
which we can index an item within a
collection-oriented data structure?
– Think up some examples with neighbors
(sample keys: apple, bear, A113, 42,
cake, blue, red, …)
Maps
• The interface built on this idea within
Java is the Map.
• TreeMap and HashMap are the two
prominent implementations.
– The value is the object being stored
within the map.
– The key is the data element used as an
index into the map for that value (i.e.,
how you “look up” the value)
– Key is like a key in a database, sometimes
called a “tag” in associative memory
Maps
• The classes built on this idea within
C++ are map and unordered_map.
• Sidenote – these are also not
polymorphically related.
– Map stores items in order of keys
– Unordered map does not require keys to
have order relation at all!
Maps
• How would such a map work?
– We could just use matching arrays for
the keys and values.
– However, this wouldn’t be the most
efficient idea – better techniques are
known.
Hash Maps
• Hash maps work by converting the
key to a unique integer, where
possible, through a hashing function.
– C++: hash maps are represented by
unordered_map.
– The selection of such a function is not a
simple operation.
• As such, unordered_map accepts a hashing
function (as a template parameter and an
optional constructor argument) mapping each
key to a nearly-unique integer; std::hash
is the default.
Hash Maps
• This “hash code” is then mapped into
an array for storage.
– Problem: the “hash code” can easily be
larger than the storage array’s size.
– Solution: modular arithmetic. Divide
by the array’s size and use the
remainder.
Hash Maps
New input: (“Football”, “Will”)
hash(“Football”) = -2070369658
-2070369658 mod 7 = 0
i  Key           Value
0  “Football”    “Will”
1
2
3
4
5
6
Hash Maps
New input: (“Basketball”, “Billy”)
hash(“Basketball”) = -2127646392
-2127646392 mod 7 = -4 => 3
i  Key           Value
0  “Football”    “Will”
1
2
3  “Basketball”  “Billy”
4
5
6
Hash Maps
New input: (“Gymnastics”, “Rhonda”)
hash(“Gymnastics”) = 2068792
2068792 mod 7 = 5
i  Key           Value
0  “Football”    “Will”
1
2
3  “Basketball”  “Billy”
4
5  “Gymnastics”  “Rhonda”
6
Hash Maps
New input: (“Soccer”, “Becky”)
hash(“Soccer”) = -2026118662
-2026118662 mod 7 = -1 => 6
i  Key           Value
0  “Football”    “Will”
1
2
3  “Basketball”  “Billy”
4
5  “Gymnastics”  “Rhonda”
6  “Soccer”      “Becky”
Hash Maps
• Pros:
– direct lookup of values in constant
time on average, regardless of the
key’s type.
• Cons:
– does not support sorting
– requires a specialized hashing function
for keys that produces few collisions
(ideally a unique int for each possible
key).
Map Example
#include <iostream>
#include <map>
#include <string>
using namespace std;

int main() {
    map<string, size_t> word_count;  // maps each word to its count
    string word;
    while (cin >> word) {
        ++word_count[word];  // use map to look up value
    }
    for (const auto &w : word_count) {  // range-for uses the map's iterators
        cout << w.first << " occurs " << w.second
             << ((w.second > 1) ? " times" : " time")
             << endl;
    }
    return 0;
}
Exercise 3
• Include <unordered_map> header
• Use unordered_map
– to store >= four <key, value> pairs –
your choice
– Look up values based on keys and print
– Or code up previous example
• Compile and run
• See that it is good
Maps
• What if we want to have the entries
sorted by their keys?
– It is possible to build structures that
efficiently keep their data permanently
sorted by key!
Binary Tree
• The binary tree is an example of one
structure that can accomplish this.
– Think of it as a linked list, but with two
links per node instead of one.
Binary Tree
• The corresponding Java structure is
the TreeMap class.
– It implements the SortedMap interface.
Binary Tree
• The corresponding C++ structure, on
the other hand, is the std::map
class.
Binary Tree
• The “first” node of the tree is called
the root.
– Any key smaller than the root’s key is in
the left branch.
– Any key larger than the root’s key is in
the right branch.
Binary Tree
        13  (root)
       /  \
      7    25
     / \   / \
    2   9 17  42
Binary Tree
• Binary trees require the ability to compare
the keys
– C++ assumes that operator< has been
overloaded for custom data types
Binary Tree
• Of particular note with binary trees –
operations on them tend to be highly
recursive due to their structure.
– You’ve done this in lab – twice now!
Binary Tree
• Pros:
– the items are always in an established,
sorted order! (By key)
• Pro/Con:
– accesses take logarithmic time: slower
than an unordered_map (constant time on
average), but generally faster than
searching a list (linear time).
Questions?
• You have already implemented trees
Input/Output Modeling
• Certain other structures exist to
model specialized, restricted input
and output behavior.
– Consider the usual interaction someone
might have with a stack of papers.
– Another possibility: the usual behavior
of a group of people waiting in line… in
a queue waiting to be served.
Stacks
• The data structure known as a stack
is a “Last In, First Out” (LIFO)
structure.
– That is, the last input to the structure is
the first output obtained from it.
– Consider a stack of papers – when
searching through it, one typically starts
at the top and searches downward, from
newest to oldest.
Stacks
[Figure: stack contents after each step, listed bottom to top: a; a b; a b c; a b; a b d; a b]
Stacks
• Stacks are a very good model for
function calls.
– When function A calls function B, B
must complete before A resumes
operation.
• Similarly, if B calls C, C completes before B.
– A may then call other methods before
completing.
Stacks
• Stacks are a very good model for
function calls.
– In fact, this is one reason why we’re
examining it now. Stacks are the model
of how recursion mechanically works.
– In turn, recursion is necessary for
operating upon many data structures.
Stacks
• When debugging, the stack trace (or
call stack) of a program at a given
point of execution is exactly this – a
description of the order of active
method calls within the program.
• The area of memory where function
data lives is literally called the stack
space.
Stacks + Math
• Stacks have often been used in
mathematical operations.
– Some graphing calculators use what is
called “Reverse Polish Notation” (RPN),
which is based upon postfix operators.
– Combined with a stack, this notation is
much easier to program for than infix
operations.
Stacks + Math
• Let’s consider the following
mathematical expression:
2 + 5 * 7 - 6 / 3
• In what order do we perform the
operations?
– Consider trying to code something that
would be able to interpret this!
Stacks + Math
• Using the standard order of
operations, this becomes:
2 + (5 * 7) – (6 / 3)
• The postfix notation for this:
2 5 7 * + 6 3 / -
((2 (5 7 *) +) (6 3 /) -)
Stacks + Math
2 + (5 * 7) – (6 / 3)
2 + (35) – (2)
37 – 2
35
Stacks + Math
2 5 7 * + 6 3 / -
• Let’s see how this facilitates getting
the right answer.
Stacks + Math
2 5 7 * + 6 3 / -
[Figure: stack after each step, listed bottom to top: 2; 2 5; 2 5 7; 2 35; 37; 37 6]
Stacks + Math
2 + (5 * 7) – (6 / 3)
2 + (35) – (6 / 3)
37 – (6 / 3)
Stacks + Math
2 5 7 * + 6 3 / -
37 6 3 / -
[Figure: stack after each step, listed bottom to top: 37; 37 6; 37 6 3; 37 2; 35]
Stacks + Math
2 + (5 * 7) – (6 / 3)
2 + (35) – (6 / 3)
37 – (6 / 3)
37 – 2
35
Stacks + Math
• Math done in “standard” (i.e., infix)
notation is typically first converted
to postfix notation for actual
computation.
– This conversion can be performed by the
Shunting-yard algorithm. It’s up on
Wikipedia, so feel free to take a look.
Stacks
• C++ provides the std::stack class.
– This implementation is something of a
“wrapper class” that uses a vector,
list, or deque internally, limiting it to
stack-like behavior.
• We’ll see deques in a moment.
• The stack’s own methods are push(), pop(),
and top(); internally they call the
container’s push_back(), pop_back(), and
back().
Questions?
• Home exercise – implement and use
a stack
Queues
• The data structure known as a queue
is a “First In, First Out” (FIFO)
structure.
– That is, the first input to the structure is
the first output obtained from it.
– Consider a line of people – the person in
front has priority to whatever the line is
waiting on… like buying tickets at the
movies or gaining access to a sports
event.
Queues
• Queues are significantly like lists,
except that we have additional
restrictions placed on them.
– Additions may only happen at the list’s
end.
– Removals may only happen at the list’s
beginning.
• As a result, standard array-based
behavior may not be optimal.
Queues
[Figure: a queue over time: a, b, and c join at the back in that order, and a leaves first from the front.]
Queues
• In C++, the std::queue class is provided.
– This implementation is also something
of a “wrapper class” that uses a list
or deque internally, limiting it to
queue-like behavior.
• list works well as a queue, as linked-lists
can easily be altered from both ends.
Stacks + Queues
• The “deque”, or double-ended queue,
combines the behaviors of stacks
and queues into a single structure.
– Items may be added or removed at
either end of the structure.
– This allows for either LIFO or FIFO
behavior – it’s all in how you use the
structure.
• Mixed behavior is also possible, so beware!
Deques
• C++ defines the deque class for such
uses.
– This is a full-fledged object in its own
right, and is array-based.
• It may use multiple arrays and modular
arithmetic, to allow efficient additions at the
front for example.
– It is the default object used internally
by both stack and queue.
Questions?
• Home exercise – implement and use
a queue and a deque