Data Structures and Algorithm Analysis in C++

Download Report

Transcript Data Structures and Algorithm Analysis in C++

Data Structures and Algorithms
http://sist.sysu.edu.cn/~qiaohy/DS2012
Lectures
• 3 hour lecture every Tuesday afternoon
• 2 hour lab session every Tuesday afternoon
• A new approach to DS: interactions, discussions, learning
by doing.
• Check course page regularly:
http://sist.sysu.edu.cn/~qiaohy/DS2012/
• Important to attend both the lectures and lab sessions,
and to finish assignments on time.
References
• Text book: “Data Structures and Algorithm Analysis in
C++”, 3rd edition, Mark A. Weiss.
• References:
– The art of computer programming, V.1, V. 3, Donald
Knuth.
– Data Structures and Program Design in C++, Robert L.
Kruse and Alexander J. Ryba
– Introduction to Algorithms, Thomas H. Cormen, Charles
E. Leiserson, Ronald L. Rovest and Clifford Stein.
– 数据结构,严蔚敏,吴伟民,清华大学出版社。
– Tons of learning materials on the internet.
Grading Scheme
• Grading is based on
– Home work (online exercises, written assignments or
programming assignments) and attendance 20%
– Midterm examination 30 %
– Final examination 50%
– Bonus: active participation
• Scores are given for both “theories” and “labs”.
Assignments
• Everything will be available online, and check course
page regularly
• Written assignments and on-line exercises(Trakla2)
– Due by time specified
– Finished independently
• Programming assignments
– Due by time specified
– Run on PC
– group collaboration of 2 people is encouraged both
for the programming assignments and lab sessions.
Plagiarism Policy
• st1st Time: both get 0
•1 : both get negative;
2nd Time:
need to terminate
•• cannot
be trusted;
Midterm
Final:
FAIL
•• used
to beorfailed
in an
theautomatic
final.
You are encouraged to collaborate in study groups.
But, you cannot copy or slightly change other students’
solutions or codes.
Course Overview
• A fundamental computer science course
- Essential for programming
- Essential for advanced courses
• A challenging course, which concerns problem solving by
programming and needs
– Programming
– Mathematical thinking and Algorithmic thinking.
Why Data Structures Matter?
• Problem: Write a program which can search a person’s
telephone number from a pile of name cards?
From problem space to solution space
(左建卫,84458290)
(Zuo jianwei, 84458290)
(Liu Ning,
84138355)
…
(Gao Min,
39943222)
From problem space to solution space
Abstraction:
• Represent physical items by data, for example, a card is
represented by its name and phone number (“Gao Ming”,
“40088888”).
• A pile of cards -> a set of data of the form (name, phone
number);
• Now the problem is: given a set of the pair (name, phone
number) and a name target, which you want to look up,
for example, “Gao Ming”, is there a pair whose first
component is target? If yes, return its second component.
Otherwise, fail.
From Problem Space to Solution Space
Solutions: think about how you do it in practice
• We want to organize the set of data, for example,
organize them into a sequence, and then you look at
each data one by one.
• We say that the set of data has a linear logical structure
(Wang Bingbing, 98876544)
(Zuo jianwei,
84458290)
(Liu Ning,
84138355)
…
(Gao Min,
39943222)
From Problem Space to Solution Space
Design the Algorithm, which is a list of well-defined
instructions that solve a problem.
• How to represent data? Use data type.
• How to represent each pair (name, phone number)?
– Use record type
• How to represent the set of pairs, or how to store the set
of data?
– Using array, for example,
• The algorithm: given an array of records and the target,
compares every record in the array with target until the
name of the record equals to target.
• Implement the algorithm, or write the program.
Solution 1
• Input is a (linear) sequence of records
and a target
• Storage: using arrays
• Algorithm: sequential search
C++:
struct Card {
string name;
string phone;
};
int sequentialSearch(Card st[], int n, const string &target) {
/*search a record whose name is target and returns the position of the record if
search is successful, returns n otherwise.
*/
int i;
for (i = 0; i<n && st[i].name != target;i++);
return i;
}
Evaluation of the algorithm
• Is the algorithm correct?
– Proving
• Is the implementation correct?
– Testing
• How efficient is the search algorithm?
– Algorithm analysis
Solution 2
• Input is a (linear) sequence of records
and a target
• Storage: using linked lists
• Algorithm: sequential search
用C++表示:
struct Node{
Card data;
Node * next;
};
Node* sequentialSearchLinked(Node *head, const string &target)
{
/*search a record whose name is target and returns the pointer to the record if
search is successful, returns NULL otherwise.
*/
Node *p=head;
while (p!=NULL&&(p->data).name!=target) {
p=p->next;
};
return p;
Solution 3 & 4
• Using binary search: input is ordered by name; storage
using arrays and binary search;
• Using binary search trees: input is organised as a
ordered tree, tree is stored using linked representation
and using binary tree search.
struct TreeNode{
Card data;
TreeNode *llink,*rlink;
};
Solution 5
• Using ADT (abstract data type) map
int main() {
map<const char*, string,ltstr> phones; //initialize a map container
phones[“Dai Yun"] = “34035188”; //insert a record
phones[“Liang Hao"] = “84133365”;
phones["Zhang Hen"] = “34567890”;
phones[“Wang Ming"] = “66908143”;
phones[“Luo Feng"] = “84134566”;
map<const char*, string, ltstr>::iterator cur = phones.find("Zhang Min");
if (cur != phones.end())
//cur points to the end of the map and to the record othereise.
cout << "Zhang Min's phone is " << (*cur).second << endl;
else
cout << "Zhang Min doesn't exist in phones." <<endl;
}
From Problem Space to Solution Space
• Abstraction: what are the input data, the operations and
output data?
• One needs to organise the data or design the logic
structure to be able realize the operations;
• The data and operations can be packed into an
ADT(Abstract Data Type);
• Decide how to store the data or design physical
structures of the data;
• Design the algorithm;
• Evaluate the solution: running time and storage;
• Some solutions are much better than others. Some takes
O(log n) running time while other take O(n).
• Implement the algorithm, and test it again its specification.
Course Outline
• C++ review
• Algorithmic asymptotic analysis
– Big-Oh, big-Theta, and big-Omega
• ADTs: Lists, stacks, and queues
• Sorting
– Insertion, mergesort, quicksort, heapsort, radix, etc.
– Lowest bound on sorting
• Search trees
– Binary search tree
– AVL tree
– B- tree
• Hashing
• Graphs
– Breadth-first search and Depth-first search
– Shortest paths
– Minimal spanning trees
19
Overall Goals of the Course
• Learn to solve problems by programming
• Learn to analyze your solutions
• From programmer to architect
C++ reviews: Pointers and References
• Pointers
– Pointers store addresses and aim to realize dynamic memory
allocation;
– Use new operator to allocate memory and delete to deallocate it.
Make sure no memory leak.
– Problem: if two pointers point to the same address and the
memory is reclaimed by deleting one pointer, then the other
pointer is not valid any more.
– Used to change values in C. For example, void swap(int *p, int
*q). In C++ one can also use pass by reference which is more
efficient.
• References
– A reference is a synonym for a variable and always refers to the
same variable. It must be initialized when it is declared.
– Used in parameter passing for output or for efficiency (const
references).
Parameter Passing
• When parameters are passed by value (call by value), the actual
parameters are copied. Not efficient for large data.
• When parameters are to be changed, pass by reference can be used
instead.
• When parameters are not to be changed, pass by const reference can
be used.
• Example:
double avg(const vector<int>& arr, int n, bool & errorFlag);
/*returns the average of first n numbers in arr, errorFlag is true if
successful, false otherwise. */
– The first one uses pass by const reference, because it is a big
data and should not be changed in the function, the second one is
a primitive type and uses pass by value, and the third one uses
pass by reference because it outputs some information.
The Big Three
• The destructor is called whenever an object goes out of scope or is
subjected to a delete to free up any resources that were allocated
during the use of the object.
• The copy constructor is called when an object is initialized to a copy
of some object.
– IntCell b =c; //or IntCell b( c), but not b=c.
– Parameter passing by value foo(IntCell a);
– An object returned by value.
• The assignment operator = is called when = is applied to two objects
which have been constructed previously. Lhs=rhs is to copy the state
of rhs into lhs.
• The big three, destructor, copy constructor and operator =, are
provided by the compiler when they are not provided by the class.
The default applies destructor for each data member. The default
applies copy constructor to each data member. The default operator
= is implemented by applying operator = to each data member.
Problems with the big three
class IntCell
{
public:
explicit IntCell( int initialValue = 0 );
int read( ) const;
void write( int x );
private:
int *storedValue;
};
IntCell::IntCell( int initialValue =0)
{
storedValue = new int( initialValue );
}
What is missing there?
What will be the effect?
See Fig.1.15.
Problems with the big three
int foo(){
IntCell vitalData;
for (int i=1;i<100000;i++){
IntCell localV;//local object
localV = vitalData; //default ‘=‘ is used
}
/*
Two problems:
1)memory leak by localV if default destructor is applied because default
destructor doesn’t release the memory taken by localV’s member
storedValue;
2) destruction of localV results in vitalData destroyed if user-defined
destructor is applied, which releases memory shared by localV’s
member and vitalData’s member.
*/
Problems with the big three
void destroyData(IntCell copy){//see how actual parameter is
passed.
}
destroyData( vitalData);
/*
default copy used, when the copy is destroyed (if there is a
user-defined destructor), memory allocated to the copy is
reclaimed although vitalData is still alive.
*/
Problems with the big three
• Default are not adequate when a class has a data
member which is a pointer.
• Memory allocated to b remains allocated because the
default destructor does nothing for storedPointer and
results in memory leak.
• The default copy constructor and operator = simply copy
the value of the pointer (shallow copy, shallow semantics,
reference semantics). The copy shares the same IntCell
object with the original object. When the original is
changed or destructed, the result is not predictable.
• What we need is a deep copy, that is, we want to copy
the object, not the reference to the object.
IntCell with the big-three
class IntCell
{
public:
explicit IntCell( int initialValue = 0 );
IntCell( const IntCell & rhs );
~IntCell( );
const IntCell & operator=( const IntCell & rhs );
int read( ) const;
void write( int x );
private:
int *storedValue;
};
Deep copy is needed
IntCell::IntCell(const IntCell &rhs){
storedValue = new int(*rhs.storedValue);
}
IntCell::~IntCell(){
delete storedValue;
}
const IntCell & IntCell::operator=(const IntCell &rhs){
if(this !=&rhs)
*storedValue = *rhs.storedValue;
return *this;
}
Reading and assignments
• Weiss chapter 1
• Understand the following terminologies: memory leak, pass by value,
pass by reference, shallow copy, shallow semantics, deep copy,
deep semantics, automatic objects, dynamic objects, accessors,
mutators, algorithms.
• Try to answer the following questions:
– What are the differences between points and references?
– When the three parameter passing mechanisms are used?
– When default copy constructors are not applicable?
– Where the modifier const should be used?
• Exercises: 1.7a, 1.11a.b, 1.12 a