Transcript Lecture1

Introduction
(Outline)
• The Software Development Process
• Performance Analysis: the Big Oh.
• Abstract Data Types
• Introduction to Data Structures
CS 103
The Software Development Process
Software Development
• Requirement analysis, leading to a
specification of the problem
• Design of a solution
• Implementation of the solution (coding)
• Analysis of the solution
• Testing, debugging and integration
• Maintenance and evolution of the system.
Specification of a problem
• A precise statement/description of the
problem.
• It involves describing the input, the
expected output, and the relationship
between the input and output.
• This is often done through preconditions
and postconditions.
Design
• Formulation of a method, that is, of a sequence of
steps, to solve the problem.
• The design “language” can be pseudo-code,
flowcharts, natural language, any combinations of
those, etc.
• A design expressed this way is called an algorithm.
• A good design approach is a top-down design
where the problem is decomposed into smaller,
simpler pieces, where each piece is designed into a
module.
Implementation
• Development of actual C++ code that will
carry out the design and solve the problem.
• The design and implementation of data
structures, abstract data types, and classes,
are often a major part of design
implementation.
Implementation
(Good Principles)
• Code Re-use
– Re-use of other people’s software
– Write your software in a way that makes it (re)usable
by others
• Hiding of implementation details: emphasis on the
interface.
• Hiding is also called data encapsulation
• Data structures are a prime instance of data
encapsulation and code re-use
Analysis of the Solution
• Estimation of how much time and memory an
algorithm takes.
• The purpose is twofold:
– to get a ballpark figure of the speed and memory
requirements to see if they meet the target
– to compare competing designs and thus choose the
best before any further investment in the application
(implementation, testing, etc.)
Testing and Debugging
• Testing a program for syntactical correctness (no
compiler errors)
• Testing a program for semantic correctness, that is,
checking if the program gives the correct output.
This is done by
– having sample input data and corresponding, known output data
– running the program against the sample input
– comparing the program output to the known output
– in case there is no match, modifying the code to achieve a perfect
match.
• One important tip for thorough testing: Fully exercise
the code, that is, make sure each line of your code is
executed.
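The testing procedure above can be sketched as a small table-driven test. This is only an illustrative sketch: the function `clamp_to_range`, the struct `TestCase`, and the helpers `count_mismatches` and `sample_cases` are hypothetical names, not part of the lecture. The sample inputs are chosen so that every branch of the function under test runs at least once.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical function under test: limits v to the interval [lo, hi].
// Precondition: lo <= hi.
int clamp_to_range(int v, int lo, int hi) {
    assert(lo <= hi);
    if (v < lo) return lo;  // branch 1: value below the range
    if (v > hi) return hi;  // branch 2: value above the range
    return v;               // branch 3: value already inside the range
}

// One test case: sample input plus the known (expected) output.
struct TestCase {
    int v, lo, hi;
    int expected;
};

// Runs the function against every sample input and returns the
// number of cases whose output differs from the known output.
std::size_t count_mismatches(const std::vector<TestCase>& cases) {
    std::size_t mismatches = 0;
    for (const TestCase& t : cases) {
        if (clamp_to_range(t.v, t.lo, t.hi) != t.expected) {
            ++mismatches;
        }
    }
    return mismatches;
}

// Sample data chosen so that each of the three branches is executed.
std::vector<TestCase> sample_cases() {
    return {
        {-5, 0, 10, 0},   // exercises branch 1
        {42, 0, 10, 10},  // exercises branch 2
        {7, 0, 10, 7},    // exercises branch 3
    };
}
```

A result of zero from `count_mismatches(sample_cases())` means every program output matched the known output; any other value signals that the code needs to be revisited.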
Integration
• Gluing all the pieces (modules) together to
create a cohesive whole system.
Maintenance and Evolution of a
System
• Ongoing, on-the-job modifications and
updates of the programs.
Preconditions and Postconditions
• A semi-formal, precise way of specifying
what a function/program does, and under
what conditions it is expected to perform
correctly
• Purpose: Good documentation, and better
communication, over time and space, with
other programmers and users of your code
Precondition
• It is a statement of what must be true before the function is
called.
• This often means describing the input:
– the input type
– the conditions that the input values must satisfy.
• A function may take data from the environment
– Then, the preconditions describe the state of that environment
– that is, the conditions that must be satisfied, in order to guarantee
the correctness of the function.
• The programmer who calls the function is responsible for
ensuring that the precondition holds at the time of the call.
Postcondition
• It is a statement of what will be true when the
function finishes its work.
• This is often a description of the function output,
and the relationship between output and input.
• A function may modify data from the environment
(such as global variables, or files)
– the postconditions describe the new values of those data
after the function call is completed, in relation to what
the values were before the function was called.
Example of Pre/Post-Conditions
void get_sqrt( double x)
// Precondition: x >= 0.
// Postcondition: The output is a number
// y = the square root of x.
C++ Way of Asserting
Preconditions
• Use the library call
assert (condition)
• You have to add #include <cassert> to your program
• It makes sure that condition is satisfied (= true), in
which case the execution of the program
continues.
• If condition is false, the program execution
terminates, and an error message is printed out,
describing the cause of the termination.
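As a minimal sketch of this mechanism, the precondition of the earlier square-root example can be checked with `assert`. The name `checked_sqrt` is hypothetical, and the function returns its result (instead of being `void`) only to keep the example easy to try out.

```cpp
#include <cassert>  // provides the assert macro
#include <cmath>    // provides std::sqrt

// Precondition: x >= 0.
// Postcondition: returns y, the square root of x.
double checked_sqrt(double x) {
    assert(x >= 0);       // terminates the program if the precondition fails
    return std::sqrt(x);  // reached only when the precondition holds
}
```

Calling `checked_sqrt(9.0)` returns 3.0, while `checked_sqrt(-1.0)` aborts the program with a diagnostic that typically names the failed condition, the source file, and the line number. Defining the macro `NDEBUG` before including `<cassert>` turns all assertions off, so they should not be relied on for checks that must remain in release builds.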
Performance Analysis and Big-O
Performance Analysis
• Determining an estimate of the time and
memory requirement of the algorithm.
• Time estimation is called time complexity
analysis
• Memory size estimation is called space
complexity analysis.
• Because memory is cheap and abundant, we
rarely do space complexity analysis
• Since time is “expensive”, analysis now
defaults to time complexity analysis
Big-O Notation
• Let n be a non-negative integer representing
the size of the input to an algorithm
• Let f(n) and g(n) be two positive functions,
representing the number of basic
calculations (operations, instructions) that
an algorithm takes (or the number of
memory words an algorithm needs).
Big-O Notation (contd.)
• f(n)=O(g(n)) iff there exist a positive
constant C and a non-negative integer n0 such
that
f(n) ≤ C·g(n) for all n ≥ n0.
• g(n) is said to be an upper bound of f(n).
Big-O Notation
(Examples)
• f(n) = 5n+2 = O(n) // g(n) = n
– f(n) ≤ 6n, for n ≥ 3 (C=6, n0=3)
• f(n) = n/2 – 3 = O(n)
– f(n) ≤ 0.5n for n ≥ 0 (C=0.5, n0=0)
• n^2 – n = O(n^2) // g(n) = n^2
– n^2 – n ≤ n^2 for n ≥ 0 (C=1, n0=0)
• n(n+1)/2 = O(n^2)
– n(n+1)/2 ≤ n^2 for n ≥ 0 (C=1, n0=0)
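The constants C and n0 in the examples above can be sanity-checked numerically. The sketch below is an illustration only: `bound_holds` and the small functions `f1` to `f4`, `linear`, and `quadratic` are hypothetical names. A check over a finite range can refute a proposed pair (C, n0), but it cannot prove that the bound holds for all n; that still requires the algebraic argument.

```cpp
#include <cassert>
#include <functional>

// Returns true if f(n) <= C * g(n) for every integer n in [n0, n_max].
// Precondition: 0 <= n0 <= n_max.
bool bound_holds(const std::function<double(double)>& f,
                 const std::function<double(double)>& g,
                 double C, int n0, int n_max) {
    assert(n0 >= 0 && n0 <= n_max);
    for (int n = n0; n <= n_max; ++n) {
        if (f(n) > C * g(n)) {
            return false;  // found an n that violates the bound
        }
    }
    return true;
}

// The four example functions f(n) from the list above.
double f1(double n) { return 5 * n + 2; }
double f2(double n) { return n / 2 - 3; }
double f3(double n) { return n * n - n; }
double f4(double n) { return n * (n + 1) / 2; }

// The candidate upper bounds g(n).
double linear(double n) { return n; }
double quadratic(double n) { return n * n; }
```

For instance, `bound_holds(f1, linear, 6, 3, 1000)` checks the first example with C=6 and n0=3. Lowering n0 to 0 in that call makes the check fail, because 5n+2 exceeds 6n for n = 0 and n = 1.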
Big-O Notation
(In Practice)
• When computing the complexity,
– f(n) is the actual time formula
– g(n) is the simplified version of f
• Since f(n) often stands for time, we use T(n)
instead of f(n)
• In practice, the simplification of T(n) occurs
while it is being computed by the designer
Simplification Methods
• If T(n) is the sum of a constant number of
terms, drop all the terms except for the most
dominant (biggest) term;
• Drop any multiplicative factor of that term
• What remains is the simplified g(n).
• a_m n^m + a_(m-1) n^(m-1) + ... + a_1 n + a_0 = O(n^m).
• n^2 – n + log n = O(n^2)
Big-O Notation
(Common Complexities)
• T(n)=O(1) // constant time
• T(n)=O(log n) // logarithmic
• T(n)=O(n) // linear
• T(n)=O(n^2) // quadratic
• T(n)=O(n^3) // cubic
• T(n)=O(n^c), c ≥ 1 // polynomial
• T(n)=O(log^c n), c ≥ 1 // polylogarithmic
• T(n)=O(n log n)
Big-O Notation
(Characteristics)
• The big-O notation is a simplification
mechanism of time/memory estimates.
• It loses precision, trading precision for
simplicity
• It retains enough information to give a
ballpark idea of speed/cost of an algorithm,
and to be able to compare competing
algorithms.
Common Formulas
• 1+2+3+…+n = n(n+1)/2 = O(n^2).
• 1^2+2^2+3^2+…+n^2 = n(n+1)(2n+1)/6 = O(n^3)
• 1+x+x^2+x^3+…+x^n = (x^(n+1) – 1)/(x–1) = O(x^n), for x > 1.
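These sums typically show up when counting the steps of loops. The sketch below, with hypothetical function names, counts the steps of a triangular nested loop and compares the count with the closed form n(n+1)/2; it does the same for the geometric sum with an integer ratio x > 1. It assumes the values are small enough to fit in a `long long`.

```cpp
#include <cassert>

// Counts how often the innermost statement of a triangular nested
// loop runs: 1 + 2 + ... + n times in total.
// Precondition: n >= 0.
long long triangular_loop_steps(int n) {
    assert(n >= 0);
    long long steps = 0;
    for (int i = 1; i <= n; ++i) {
        for (int j = 1; j <= i; ++j) {
            ++steps;
        }
    }
    return steps;
}

// Closed form of the same sum: n(n+1)/2.
long long triangular_closed_form(int n) {
    assert(n >= 0);
    return static_cast<long long>(n) * (n + 1) / 2;
}

// Geometric sum 1 + x + x^2 + ... + x^n, added up term by term.
// Precondition: x > 1 and n >= 0.
long long geometric_loop(long long x, int n) {
    assert(x > 1 && n >= 0);
    long long sum = 0;
    long long term = 1;  // holds x^i at the start of iteration i
    for (int i = 0; i <= n; ++i) {
        sum += term;
        term *= x;
    }
    return sum;
}

// Closed form of the same sum: (x^(n+1) - 1) / (x - 1).
long long geometric_closed_form(long long x, int n) {
    assert(x > 1 && n >= 0);
    long long power = 1;  // becomes x^(n+1) after the loop
    for (int i = 0; i <= n; ++i) {
        power *= x;
    }
    return (power - 1) / (x - 1);
}
```

For n = 100 the nested loop performs 5050 inner steps, which grows quadratically with n, as the O(n^2) bound predicts.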
Example of Time Complexity
Analysis and Big-O
• Pseudo-code for finding the maximum of an array x of n elements:
double M=x[0];
for i=1 to n-1 do
    if (x[i] > M)
        M=x[i];
    endif
endfor
return M;
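One possible C++ rendering of the pseudo-code above is sketched here. The function name `find_max` and the choice of `std::vector<double>` for the array are assumptions made for illustration; the pseudo-code itself does not fix either.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Precondition: x is not empty.
// Postcondition: returns the largest element of x.
double find_max(const std::vector<double>& x) {
    assert(!x.empty());
    double M = x[0];  // largest value seen so far
    for (std::size_t i = 1; i < x.size(); ++i) {
        if (x[i] > M) {
            M = x[i];
        }
    }
    return M;
}
```

In production code the standard library algorithm `std::max_element` from `<algorithm>` does the same job; the hand-written loop is kept here because it mirrors the pseudo-code line by line.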
Complexity of the algorithm
• T(n) = a+(n-1)(b+a) = O(n) in the worst case, where
every comparison is followed by an assignment
• Where “a” is the time of one assignment,
and “b” is the time of one comparison
• Both “a” and “b” are constants that depend
on the hardware
• Observe that the big O spares us from
– Relatively unimportant arithmetic details
– Hardware dependency
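The formula for T(n) can be checked by instrumenting the maximum-finding algorithm to count its assignments and comparisons. This is a sketch with hypothetical names (`OpCounts`, `count_max_ops`); like the analysis above, it ignores the cost of the loop control itself.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Tallies of the two basic operations used in the analysis.
struct OpCounts {
    std::size_t assignments = 0;  // each costs time "a"
    std::size_t comparisons = 0;  // each costs time "b"
};

// The maximum-finding algorithm, instrumented to count operations.
// Precondition: x is not empty.
OpCounts count_max_ops(const std::vector<double>& x) {
    assert(!x.empty());
    OpCounts c;
    double M = x[0];
    ++c.assignments;  // the initial M = x[0]
    for (std::size_t i = 1; i < x.size(); ++i) {
        ++c.comparisons;  // the test x[i] > M
        if (x[i] > M) {
            M = x[i];
            ++c.assignments;  // the update M = x[i]
        }
    }
    return c;
}
```

On an ascending input of n elements every comparison succeeds, giving 1+(n-1) assignments and n-1 comparisons, which is exactly a+(n-1)(b+a). On a descending input only the initial assignment happens, yet the n-1 comparisons remain, so the running time is still linear in n.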
Abstract Data Types
Abstract Data Types
• An abstract data type is a mathematical set
of data, along with operations defined on
that kind of data
• Examples:
– int: it is the set of integers (up to a certain
magnitude), with operations +, -, /, *, %
– double: it’s the set of decimal numbers (up to a
certain magnitude), with operations +, -, /, *
Abstract Data Types (Contd.)
• The previous examples belong to what is
called built-in data types
• That is, they are provided by the
programming language
• But new abstract data types can be defined
by users, using arrays, enums, structs,
classes (in object-oriented programming),
etc.
Introduction to Data Structures
Data Structures
• A data structure is a user-defined abstract data
type
• Examples:
– Complex numbers: with operations +, -, /, *,
magnitude, angle, etc.
– Stack: with operations push, pop, peek, isempty
– Queue: enqueue, dequeue, isempty …
– Binary Search Tree: insert, delete, search.
– Heap: insert, min, delete-min.
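As an illustration of one of the examples above, here is a minimal sketch of a stack with the operations push, pop, peek, and isempty. The class name `IntStack`, the restriction to `int` elements, and the use of `std::vector` as the underlying representation are assumptions made for this sketch.

```cpp
#include <cassert>
#include <vector>

// A minimal stack of ints. The vector holding the elements is private,
// so users see only the interface: push, pop, peek, is_empty.
class IntStack {
public:
    // Postcondition: value is the new top element.
    void push(int value) { items_.push_back(value); }

    // Precondition: the stack is not empty.
    // Postcondition: the top element is removed and returned.
    int pop() {
        assert(!is_empty());
        int top = items_.back();
        items_.pop_back();
        return top;
    }

    // Precondition: the stack is not empty.
    // Postcondition: returns the top element; the stack is unchanged.
    int peek() const {
        assert(!is_empty());
        return items_.back();
    }

    // Returns true when the stack holds no elements.
    bool is_empty() const { return items_.empty(); }

private:
    std::vector<int> items_;  // representation, hidden from users
};
```

Pushing 1 and then 2 leaves 2 on top, so `peek()` returns 2 and two successive `pop()` calls return 2 and then 1. The C++ standard library already offers `std::stack` in `<stack>`; the class above only shows how such a type can be put together.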
Data Structure Design
• Specification
– A set of data
– Specifications for a number of operations to be
performed on the data
• Design
– A layout (organization) of the data
– Algorithms for the operations
• Goals of Design: fast operations
Implementation of a Data
Structure
• Representation of the data using built-in
data types of the programming language
(such as int, double, char, strings, arrays,
structs, classes, pointers, etc.)
• Language implementation (code) of the
algorithms for the operations
Object-Oriented Programming (OOP)
And Data Structures
• When implementing a data structure in non-OOP
languages such as C, the data representation and
the operations are separate
• In OOP languages such as C++, both the data
representation and the operations are aggregated
together into what is called objects
• The data types of such objects are called classes.
• Classes are blueprints, objects are instances.
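The contrast between the two styles can be sketched with complex numbers. All names below (`ComplexData`, `magnitude`, `Complex`) are hypothetical, and the non-OOP half is written in C++ syntax for convenience, so it is only an approximation of what the C code would look like.

```cpp
#include <cmath>

// Non-OOP style (as in C): the data representation is a plain struct,
// and the operation is a separate free function.
struct ComplexData {
    double re;
    double im;
};

double magnitude(const ComplexData& z) {
    return std::sqrt(z.re * z.re + z.im * z.im);
}

// OOP style: the data and the operations are aggregated in one class.
class Complex {
public:
    Complex(double re, double im) : re_(re), im_(im) {}

    // Operations are member functions that act on the object's own data.
    double magnitude() const { return std::sqrt(re_ * re_ + im_ * im_); }

    Complex operator+(const Complex& other) const {
        return Complex(re_ + other.re_, im_ + other.im_);
    }

    double real() const { return re_; }
    double imag() const { return im_; }

private:
    double re_;  // representation, hidden behind the interface
    double im_;
};
```

Here `Complex` is the class (the blueprint), and a declaration such as `Complex z(3.0, 4.0);` creates an object (an instance) whose `z.magnitude()` is 5. The standard library provides `std::complex` in `<complex>` for real use.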