Lecture slides - College of Computer and Information Science

Download Report

Transcript Lecture slides - College of Computer and Information Science

COM1721: Freshman Honors Seminar
A Random Walk Through Computing
Rajmohan Rajaraman
Tuesdays, 5:20 PM, 149 CN
Introduction




Explore a potpourri of concepts in
computing
1: a mixture
of flowers, herbs,
and spices that
Theory,
examples,
and applications
is usually kept in a jar and used for scent
Readings:
Handouts
2: a miscellaneous
collectionand WWW
Etymology: French pot pourri, literally rotten pot
Grading: Quizzes, homework, and class
participation
Sample Concepts







Abstraction
Modularity
Randomization
Recursion
Representation
Self-reference
…
Sample Topics







Dictionary search
Structure of the Web
Self-reproducing programs
Undecidability
Private communication
Relational databases
Quantum computing, bioinformatics,…
Abstraction


A view of a problem that extracts the
essential information relevant to a particular
purpose and ignores inessential details
Driving a car:


Building a house:


We are provided a particular abstraction of the car
in which we only need to know certain controls
Different levels of abstraction for house owner,
architect, construction manager, real estate agent
Related concepts: information hiding,
encapsulation, representation
Modularity



Decomposition of a system into
components, each of which can be
implemented independent of the others
Foundation for good software
engineering
Design of a basic processor from
scratch
Representation




To portray things or relationship
between things
Knowledge representation: model
relationship among objects as an edgelabeled graph
Data representation: bar graphs,
histograms for statistics
Querying a dictionary; Web as a graph
Randomization




An algorithmic technique that uses
probabilistic (rather than deterministic)
selection
A simple and powerful tool to provide efficient
solutions for many complex problems
Has a number of applications in security
Cryptography and private communication
Recursion

A way of specifying a process by means
of itself



Complicated instances are defined in terms
of simpler instances, which are given
explicitly
Closely tied to mathematical induction
Fibonacci numbers
Self-reference


A statement/program that refers to itself
Examples:







“This
“This
“This
“This
statement
statement
statement
statement
contains five words”
contains six words”
is not self-referential”
is false”
Important concept in computing theory
Undecidability of the halting problem, selfreproducing programs
Gödel Escher Bach: an Eternal Golden Braid,
Douglas Hofstader
Illustration: Representation


Problem: Derive an expression for the
sum of the first n natural numbers
1 + 2 + 3 + … + n-2 + n-1 + n = ?
Sum of First n Natural Numbers
1 + 2 + 3 + … + 98 + 99 + 100 = S
100 + 99 + 98 + … + 3 + 2 + 1 = S
101 + 101 + 101 + … + 101 + 101 = 2S
S = 100*101/2
S = n(n+1)/2
A Different Representation
1
2
3
A “Geometric Derivation”
 45
2S  n(n  1)
Other Equalities

Sum of first n odd numbers


1 + 3 + 5 + … + 2n-1 = ?
Sum of first n cubes

1 + 4 + 9 + 16 + … + n^3 = ?
Representation and Programming

Representation is the essence of
programming
Brooks, “The Mythical Man-Month”

Data structures
Dictionary

A collection of words with a specified
ordering



Dictionary of English words
Dictionary of IP addresses
Dictionary of NU student names
Searching a Dictionary


Suppose we have a dictionary of
100,000 words
Consider different operations




Search for a word
List all anagrams of a word
Find the word matching the largest prefix
What representation (data structure)
should we choose?
Search for a Word


Store the words in sorted order in a
linear array
Unsuccessful search:


compare with 100,000 words
Successful search:

on average, compare with 50,000 words
Twenty Questions







Compare with 50,000th word
If match, then done
If further in dictionary order, search right half
If earlier in dictionary order, search left half
Until word found, or search space empty
Recursion
Binary search
How Many Questions?
ajuma
alderaan
alpheratz
amber
dali
escher
picasso
reliable
renoir
yukon
vangogh
How Many Questions?
Question #
0
1
2
3
5
10
15
17
Search space
100,000
50,000
25,000
12,500
3,125
100
4
1
Anagrams
An anagram of a word is another word
with the same distribution of letters,
placed in a different order
 Input: deposit
Output: posited, topside, dopiest
 Anagrams: subessential suitableness

Detecting Anagrams

How do you determine whether two
words X and Y are anagrams?



Compare the letter distributions
Time proportional to number of letters in
each word
Suppose this subroutine anagram(X,Y)
is fast
Listing Anagrams of a Word




Dictionary of 100,000 English words
List all anagrams of least
How should we represent the
dictionary?
Linear array


Loop through dictionary: if
anagram(X,least), include X in list
Running time = 100,000 calls to anagram()
A Different Data Structure


If X and Y are anagrams of each other,
they are equivalent; the list of
anagrams of X is same as the list for Y
This indicates an equivalence class of
anagrams!
deposit posited topside dopiest
race care acre
adroitly dilatory idolatry
Anagram Signatures



Would like to store anagrams in the same
class together
How do we identify a class?
Assign a signature!
Sort all the letters in the anagram
 Same for each word in a class!
acre race care:
deposit posited topside dopiest:
subessential suitableness:

word(s)
acer
deiopst
abeeilnssstu
Anagram Program
acre
acer: acre
acer: acre
pots
opst: pots
acer: care
stop
sign
opst: stop
sort
anps:snap
care
acer: care
opst: pots
post
opst: post
opst: stop
snap
anps:snap
opst: post
Anagram Program
acer: acre
acer: care
anps:snap
opst: pots
opst: stop
opst: post
acer: acre care
merge
anps: snap
opst: pots stop post
Listing Anagrams for Given Word X


Compute sign(X) and lookup sign(X) in
dictionary using binary search
List all words in list adjacent to sign(X)
post
sign
opst
lookup
acer: acre care
anps: snap
opst: pots stop post
Efficiency of Anagram Program

Once dictionary has been stored in new
representation:



What about the cost of new representation?



Lookup takes at most 17 queries
Listing time is proportional to number of anagrams
in the class
Sign each word, sort, and merge
Expensive, but need to do it only once!
Preprocessing
References

Programming Pearls, by Jon Bentley,
Addison-Wesley

Great Ideas in Theoretical Computer
Science, Steven Rudich
A course at CMU