Transcript dsIntro

Algorithms and Data Structures
Objectives: To review the fundamental
algorithms and data structures that are
commonly used in programs. To see how to
use and implement these algorithms and data
structures in different languages and to see
what language and library support exists for
them.
Topics
Arrays and Vectors
Lists
Linear/Binary Search
Quicksort
Hash Tables - Dictionaries
C, C++, Java, Perl
Arrays
Sequence of items
Indexable (can refer to any element
immediately, by its index number)
(Conceptually) contiguous chunks of
memory
0
1
2
3
4
5
6
Arrays
Access: constant-time (θ(1))
Searching:


Sorted array - θ(lg(n))
Unsorted - θ(n)
Inserting/removing

Unordered - θ(1)
 (Add to end, or swap last guy w/deleted guy)

Ordered - θ(n)
 Need to make (or fill in) a hole, move n/2 elements
(on average) to maintain relative order
Resizing Arrays
If the number of elements in an array is
not known ahead of time it may be
necessary to resize the array.
Involves dynamic memory allocation and
copying
To minimize the cost it is best to resize in
chunks (some factor of its current size)
Libraries - Arrays
Arrays (lists) in Perl, Bash, Python, Awk,
resize automatically, as do C++ vectors,
and the Array in Java
This does not mean that it is free.
Depending on the task, what goes on
underneath the hood may be important
We shall create our own machinery in C
Some C memory things…
void* malloc(int n) – allocates n contiguous bytes from heap,
returns address of first byte (NULL if it failed)
free( void* p ) – returns memory addressed by p to the heap
(does nothing to p itself)
void* memmove( void* d, void* s, size_t n ) – moves n
bytes from s to (possibly overlaping region at) d
void* memcpy( void* d, void* s, size_t n ) – copies n
bytes from s to (non-overlaping region at) d
sizeof() – actually an operator, returns size, in bytes, of given
object or type
void* realloc( void* src, size_t n ) – attempts to resize
array for you, and copy the stuff over. Returns ptr to new location
(might be the same), NULL if it fails.
Growing Arrays in C
enum { INIT_SIZE=1, GROW_FACTOR=2 };
int curr_size = INIT_SIZE;
int nr_elems = 0; /* # of useful elements */
int *a = (int*)malloc( INIT_SIZE * sizeof( int ));
… /* some stuff here */
/* attempt to insert 24 */
if( nr_elems >= curr_size ) { /* need to grow */
int *t = realloc( a, curr_size*GROW_FACTOR*sizeof( int ))
if( t != NULL ) { /* success! */
curr_size *= GROW_FACTOR;
a = t;
a[nr_elems++] = 24;
}
else
/* FAILURE! */
}
Lists
A sequence of elements
Not indexable (immediately)
Space is allocate for each new element
and consecutive elements are linked
together with a pointer.
Note, though, that the middle can be
modified in constant time.
head
NULL
data 1
data 2
data 3
data 4
Lists
Access (same as searching) – linear, θ(n)
Modifying anywhere – constant time, θ(1)
θ(1) time to insert at front, θ(n) to append
unless pointer to last element kept
Lists in Python
The list-type in Python is really an array (I
think. Somebody experiment? How would
you?).
We can make a linked-list in a very LISP
way; build it out of simple pairs


1st element – the data at that node
2nd element – the rest of the list
Lists in Python
L = []
# add 24 to front
L = [ 24, L ]
print L
# add 3 to the front
L = [ 3, L ]
print L
Would output:
[ 24, [] ]
[ 3, [ 24, [] ]]
Lists in Python – append to end
append the number 86:
t = L # start at beginning
# get hands on last cell
while t != [] :
t = t[1] # move to next "cell", or node
t.extend( [ 86, [] ] )
Lists in Python – searching
def search( L, t ) :
"""Return cell of L that contains t,
None if not found"""
while L != [] :
if L[0] == t :
return L
L = L[1] # move to next "cell"
return None
# didn't find it
Apply function to elements
def apply( l, fn ) :
while l != [] :
fn( l )
l = l[1]
fn is any function that takes a single cell, does
something
Lists in Python – apply
Print list:
def printCell( cell ) :
print cell[0],
apply( L, printCell )
Add 2 to each element in list:
def add2( cell ) :
cell[0] += 2
apply( L, add2 )
Lists in C
typedef struct sNode sNode;
struct sNode {
/* a node (cell) in a singly-link list */
int
data; /* the payload */
sNode*
next;
/* next node in list */
};
/* newitem: create new item from name and value */
sNode* newNode( int d ) {
sNode *newp;
newp = (sNode*) malloc( sizeof( sNode ));
if( newp != NULL ) {
newp->data = d;
newp->next = NULL;
}
return newp;
}
Lists in C
/* addfront: add newp to front of listp *
*
return ptr to new list */
sNode* addfront( sNode *listp, sNode *newp )
{
newp->next=listp;
return newp;
}
list = addfront( list, newitem( 13 ));
Append Element to Back of
List
/* append: add newp to end of listp *
*
return ptr to new list */
sNode* addend( sNode *listp, sNode *newp )
{
sNode *p;
if( listp == NULL )
return newp;
for( p=listp; p->next!=NULL; p=p->next )
;
p->next = newp;
return listp;
}
Lookup Element in List
/* lookup: linear search for t in listp *
*
return ptr to node containing t, or NULL */
sNode* lookup( sNode *listp, int t )
{
for( ; listp != NULL; listp = listp->next )
if( listp->data = t )
return listp;
return NULL;
/* no match */
}
Apply Function to Elements in
List
/* apply: execute fn for each element of listp */
void apply( sNode *listp, void (*fn)(sNode ),
void* arg )
{
for ( ; listp != NULL; listp = listp->next)
(*fn)( listp, arg ); /* call the function */
}
void (*fn)( sNode* ) is a pointer to a void function of one
argument – the first argument is a pointer to a sNode and the
second is a generic pointer.
Print Elements of a List
/* printnv: print name and value using format in
arg */
void printnv( sNode *p, void *arg )
{
char *fmt;
fmt = (char*) arg;
printf( fmt, p->data );
}
apply( list, printnv, “%d ” );
Count Elements of a List
/* inccounter: increment counter *arg */
void incCounter( sNode *p, void *arg )
{
int *ip;
/* p is unused – we were called,
there's a node */
ip = (int *) arg;
(*ip)++;
}
int n = 0;
/* n is an int, &n is its address (a ptr) */
apply( list, inccounter, &n );
printf( “%d elements in list\n”, n );
Free Elements in List
/* freeall: free all elements of listp */
void freeall( sNode *listp )
{
sNode *next;
for ( ; listp != NULL; listp = next) {
next = listp->next;
free(listp)
}
}
?
?
for ( ; listp != NULL; listp = listp->next)
free( listp );
Delete Element in List
/* delitem: delete first t from listp */
sNode *delitem( sNode *listp, int t ) {
sNode *p, *prev = NULL;
for( p=listp; p!=NULL; p=p->next ) {
if( p->data == t ) {
if( prev == NULL ) /* front of list */
listp = p->next;
else
prev->next = p->next;
free(p);
break;
}
prev = p;
}
return listp;}
Linear Search
Just exhaustively examine each element
Works on any list (sorted or unsorted)
Only search for a linked-list
Examine each element in the collection,
until you find what you seek, or you've
examined every element

Note that order of examination doesn't matter
Need θ(n), worst and average
Linear Search - example
L = range( 1, 11 ) # fill w/numbers 1-10
random.shuffle( L ) # mix it up
target = 7 # we're looking for 7
i = 0
while i < len( L ) and not bFound :
if L[i] == target :
break
i += 1
if i < len( L ) : # we found it!
print str(target) + \
" was found at index " + str(i)
Binary Search
Only works on sorted collections
Only works on indexed collections (arrays,
vectors)
Start in the middle:

Find it?
 Less than? Look in lower ½
 Greater than? Look in upper ½
Cut search space in ½
Need θ(lg(n)) time, worst (and avg.)
Binary Search
# from previous example
L.sort() # put into order
t = 7 # our target
i = -1
l = 0 ; h = len(L)-1
while l<=h and i==-1 :
m = (l+h)/2
# find middle element
if( L[m] == t ) : # found
i = m
elif t < L[m] : # look in left ½
h = m – 1
else :
# look in right ½
l = m + 1
if i != -1 :
print "Found " + str(t) + " at " + str(i)
Quicksort
pick one element of the array (pivot)
partition the other elements into two
groups


those less than the pivot
those that are greater than or equal to the
pivot
Pivot is now in the right place
recursively sort each (strictly smaller)
group
Partition
p
unexamined
last i
e
p
b
<p
b+1
<p
b
>= p
last
unexamined
i
p
last
e
>= p
e
Quicksort – the recursive
algorithm
def qsort( L, b, e ) :
# base case
if b >= e : # nothing to do - 0 or 1 element
return;
# partition returns index of pivot element
piv = partition( L, b, e )
# Note that the pivot is right where it belongs
# It does not get sent out in either call
qsort( L, b, piv-1 ) # elements in [ b, piv )
qsort( L, piv+1, e ) # elements in ( piv, e ]
Quicksort – the Partition
def partition( L, b, e ) :
"Return: index of pivot"
# move pivot element to L[0]
swap( L, b, random.randint( b, e ))
last = b
# Move <p to left side, marked by `last'
for i in range( b+1, e+1 ) :
if L[i] < L[b] :
last += 1
swap( L, last, i )
swap( L, b, last ) # restore pivot
return last
Quicksort - Swap
def swap( l, i, j ) :
"swaps elements at indices i, j in list l"
tmp = l[i]
l[i] = l[j]
l[j] = tmp
Libraries - sort
C: qsort (in stdlib.h)
C++: sort (algorithm library from STL)
Java:

java.util.collections.sort
Perl: sort
Python: list.sort
qsort – standard C library
qsort( void *a, int n, int s,
int(*cmp)(void *a, void *b) );
Sorts array a, which has n elements
Each element is s bytes
cmp is a function that you must provide
 compares 2 single elements, *a & *b
 qsort must pass void pointers (addresses), since
it doesn't know the type
 cmp does, since you provide it for a given sort
 returns integer -1 if a<b, 0 if a==b, and 1 if a>b
 For sorting in descending order, return 1 if a < b
qsort (int)
int arr[N];
qsort(arr, N, sizeof(arr[0]), icmp);
/* icmp: integer compare of *p1 and *p2 */
int icmp( const void *p1, const void *p2 ) {
int v1, v2;
v1 = *((int*) p1);
v2 = *((int*) p2);
if( v1 < v2 )
return –1;
else if( v1 == v2 )
return 0;
else
return 1;
}
qsort (strings)
char *str[N];
qsort(str, N, sizeof(str[0]), scmp);
/* scmp: string compare of *p1 and *p2 */
/* p1 is a ptr to a string, or char*, so is a */
/* ptr to a ptr, or
a char**
*/
int scmp(const void *p1, const void *p2) {
char *v1, *v2;
v1 = *((char**) p1);
v2 = *((char**) p2);
return strcmp(v1, v2);
}
Dictionary (Map)
Allows us to associate satellite data w/a
key

e.g., phone book, student record (given a
student ID as key), error string (given an error
# as key), etc.
Operations:



insert
find
remove
Dictionary
The key-value pairs may be stored in any
aggregate type (list, struct, etc)
Using an unordered list:


constant-time insertion
linear lookup
Ordered list


linear insertion
lg(n) lookup
Other dictionaries
Binary Search Tree

lg(n) insertion, lookup (if balanced)
Hash table

constant-time insertion, lookup (if done well)
Trees
A binary tree is either NULL or contains a node
with the key/value pair, and a left and right child
which are themselves trees.
In a binary search tree (BST) the keys in the left
child are smaller than the keys at the node and
the values in the right child are greater than the
value at the node. (Recursively true through all
subtrees.)
O(log n) expected search and insertion time
in-order traversal provides elements in sorted
order.
Examples
In the following examples each node
stores the key/value pair:


name – a string
value – a hex #, that represents the character
(MBCS)
As well as a reference (ptr) to each of the
2 subtrees
Trees in Python
We can use a list of 3 elements:
0. The key/value pair
1. The left subtree
2. The right subtree
The following is a tree w/one node
(empty subtrees)
T = [ ["Smiley", 0x263A], [], [] ]
BST Lookup - Python
def lookup( T, name ) :
"""lookup: look up name in tree T, return the
cell, None if not found"""
if T == [] : # T is the empty tree
return None
if T[0][0] == name :
return T
elif name < T[0][0] : # look in left subtree
return lookup( T[1], name );
else : # look in right subtree
return lookup( T[2], name );
}
BST in C
We will use a struct to hold the key, value,
and pointers to the subtrees
Binary Search Tree
typedef struct bNode bNode;
struct Node {
char *name;
int
value;
“Aacute”
“smiley”
sNode *left;
0x263A
0x00c1
sNode *right;
};
“smiley”
0x263A
“smiley”
“AElig”
0x263A
0x00c6
“smiley”
“Acirc”
0x263A
0x00c2
NULL NULL
NULL NULL
“smiley”
“zeta”
0x263A
0x03b6
NULL NULL
Binary Search Tree Lookup
/* lookup: look up name in tree treep */
sNode *lookup( sNode *treep, char *name )
{
int cmp;
if( treep == NULL )
return NULL;
cmp = strcmp(name, tree->name);
if( cmp == 0 )
return treep;
else if( cmp < 0 )
return lookup( treep->left, name );
else
return lookup( treep->right, name );
}
Hash Tables
Provides key lookup and insertion with constant
expected cost
Used to create lookup table (with m slots),
where it is not usually possible to reserve a slot
for each possible element
hash function maps key to index (should evenly
distribute keys)

H(k) -> [ 0, m-1 ]
duplicates stored in a chain (list) – other
mechanisms are possible.
Dictionaries in Python
Python (as well as Perl and Awk) has a
built-in dictionary (associative array)
Type is dict
d = {} # empty dictionary
# initialise
nvList = { 'Smiley': 0x263A, 'Aacute': 0x00c1 }
print nvList[ 'Smiley' ]
Dictionaries - Python
Add an entry:
nvList[ 'Zeta', 0x03b6 ]
Test for existence:
if nvList.has_key( 'AElig' ) :
Remove an entry:
nvList.pop( 'Smiley', None )
Iterate over keys:
for i in nvList.keys() :
Hash Table - C
typedef struct sNode sNode;
struct sNode {
char
*name;
NULL
int
value;
NULL
sNode *next; /* in chain */
NULL
};
NULL
name 1
data 1
NULL
sNode *symtab[NHASH];
/* a symbol table */
NULL
name 2
data 2
NULL
name 3
data 3
Hash
Table
Lookup
/* lookup: find name in symtab, with optional create */
sNode *lookup(char *name, int create, int value)
{
int h;
sNode *sym;
h = hash(name);
for (sym = symtab[h]; sym != NULL; sym = sym->next)
if (strcmp(name, sym->name) == 0)
return sym;
if (create) {
sym = (sNode *) emalloc(sizeof(sNode));
sym->name = name; /* assumed allocated elsewhere */
sym->value = value;
sym->next = symtab[h];
symtab[h] = sym;
}
return sym;
}
Hash Function
enum { MULTIPLIER = 31};
/* hash: compute hash value of string */
unsigned int hash(char* str)
{
unsigned int h;
unsigned char *p;
h = 0;
for (p = (unsigned char *) str; *p != ‘\0’; p++)
h = MULTIPLIER * h + *p;
return h % NHASH;
}