02_searching
Download
Report
Transcript 02_searching
Searching
Chapter 2
Ceng-112 Data Structures I
Serap Atay, Ph.D. 2007
1
Outline
• Linear List Searches
• Sequential Search
» The sentinel search,
» The probability search,
» The ordered search.
• Binary Search
• Hashed List Searches
– Collision Resolution
Ceng-112 Data Structures I
Serap Atay, Ph.D. 2007
2
Linear List Searches
•
We study searches that work with arrays.
Figure 2-1
Ceng-112 Data Structures I
Serap Atay, Ph.D. 2007
3
Linear List Searches
There are two basic searches for arrays
1. The sequential search.
–
It can be used to locate an item in any array.
2. The binary search.
–
It requires an ordered list.
Ceng-112 Data Structures I
Serap Atay, Ph.D. 2007
4
Linear List Searches
Sequential Search
•
The list is not ordered!
•
We will use this technique only for small arrays.
We start searching at the beginning of the list and
continue until we find the target entity.
a.
Eighter we find it,
b. or we reach the end of the list!
Ceng-112 Data Structures I
Serap Atay, Ph.D. 2007
5
Locating data in unordered list.
Ceng-112 Data Structures I
Serap Atay, Ph.D. 2007
Figure 2-2
6
Linear List Searches
Sequential Search Algorithm
•
RETURN: The algorithm must be tell two
things to calling algorithm;
1. Did it find the data ?
2. If it did, what is the index (address)?
Ceng-112 Data Structures I
Serap Atay, Ph.D. 2007
7
Linear List Searches
Sequential Search Algorithm
•
The searching algorithm requires five parameters:
1. The list.
2. An index to the last element in the list.
3. The target.
4. The address where the found element’s index location is
to be stored.
5. The address where the found or not found boolean is to
be stored.
Ceng-112 Data Structures I
Serap Atay, Ph.D. 2007
8
Sequential Search Algorithm
algorithm SeqSearch (val list <array>, val last <index>,
val target <keyType>, ref locn <index>)
Locate the target in an unordered list of size elements.
PRE list must contain at least one element.
last is index to last element in the list.
target contains the data to be located.
locn is address of index in calling algorithm.
POST if found – matching index stored in locn & found TRUE
if not found – last stored in locn & found FALSE
RETURN found <boolean>
Ceng-112 Data Structures I
Serap Atay, Ph.D. 2007
9
Sequential Search Algorithm
1.
2.
looker = 1
loop (looker < last AND target not equal list(looker))
1.
3.
4.
locn = looker
if (target equal list(looker))
1.
5.
looker = looker + 1
found = true
else
1.
found = false
6. return found
end SeqSearch
Ceng-112 Data Structures I
Big-O(n)
Serap Atay, Ph.D. 2007
10
Variations On Sequential Search
There are three variations of sequential search algorithm:
1.
The sentinel search,
2.
The probability search,
3.
The ordered search.
Ceng-112 Data Structures I
Serap Atay, Ph.D. 2007
11
Sequential Search Algorithm
The Sentinel Search
• If the target will be found in the list, we can eliminate the test for the end of
list.
algorithm SentinelSearch (val list <array>, val last <index>,
val target <keyType>, ref locn <index>)
Locate the target in an unordered list of size elements.
PRE list must contain element at the end for the sentinel.
last is index to last element in the list.
target contains the data to be located.
locn is address of index in calling algorithm.
POST if found – matching index stored in locn & found TRUE
if not found – last stored in locn & found FALSE
RETURN found <boolean>
Ceng-112 Data Structures I
Serap Atay, Ph.D. 2007
12
Sequential Search Algorithm
The Sentinel Search
1.
2.
3.
list[last+1] = target
looker = 1
loop (target not equal list(looker))
1.
4.
if (looker <= last)
1.
2.
5.
looker = looker + 1
found = true
locn = looker
else
1.
2.
found = false
locn = last
Big-O(n)
6.
return found
end SentinelSearch
Ceng-112 Data Structures I
Serap Atay, Ph.D. 2007
13
Sequential Search Algorithm
The Probability Search
algorithm ProbabilitySearch (val list <array>, val last <index>,
val target <keyType>, ref locn <index>)
Locate the target in a list ordered by the probability of each element being the
target – most probable first, least probable last.
PRE list must contain at least one element.
last is index to last element in the list.
target contains the data to be located.
locn is address of index in calling algorithm.
POST if found – matching index stored in locn & found TRUE and element
moved up in priority.
if not found – last stored in locn & found FALSE
RETURN found <boolean>
Ceng-112 Data Structures I
Serap Atay, Ph.D. 2007
14
Sequential Search Algorithm
The Probability Search
1.
2.
looker = 1
loop (looker < last AND target not equal list[looker])
1.
3.
looker = looker + 1
if (target = list[looker])
1.
2.
found = true
if (looker > 1)
1.
2.
3.
4.
4.
temp = list[looker-1]
list[looker-1] = list[looker]
list[looker] = temp
looker = looker - 1
else
1.
Big-O(n)
found = false
5.
locn = looker
6.
return found
end ProbabilitySearch
Ceng-112 Data Structures I
Serap Atay, Ph.D. 2007
15
Sequential Search Algorithm
The Ordered List Search
• If the list is small it can be more efficient to use a sequential search.
• We can stop search loop, when the target becomes less than or equal to the
testing element of the list.
algorithm OrderedListSearch (val list <array>, val last <index>,
val target <keyType>, ref locn <index>)
Locate the target in a list ordered on target.
PRE list must contain at least one element.
last is index to last element in the list.
target contains the data to be located.
locn is address of index in calling algorithm.
POST if found – matching index stored in locn & found TRUE
if not found – last stored in locn & found FALSE
RETURN found <boolean>
Ceng-112 Data Structures I
Serap Atay, Ph.D. 2007
16
Sequential Search Algorithm
The Ordered List Search
1.
if (target <= list[last])
1.
2.
looker = 1
loop (target > list[looker])
1.
2.
else
1.
3.
looker = last
if (target equal list[looker]
1.
4.
looker = looker + 1
found = true
else
1.
found = false
Big-O(n)
5.
locn = looker
6.
return found
end OrderedListSearch
Ceng-112 Data Structures I
Serap Atay, Ph.D. 2007
17
Sequential Search
• The sequential search algorithm is very slow for the big lists.
• Big-O(n)
• If the list is ordered, we can use a more efficient algorithm
called the binary search.
Ceng-112 Data Structures I
Serap Atay, Ph.D. 2007
18
Binary Search
Test the data in the element at the middle of the array.
If it is in the first half!
Test the data in the element
at the middle of the array.
If it is in the
first half!
..
.
Ceng-112 Data Structures I
If it is in the second half!
Test the data in the element
at the middle of the array.
If it is in the
second half!
If it is in the
first half!
..
.
..
.
Serap Atay, Ph.D. 2007
If it is in the
second half!
..
.
19
mid=(first+last)/2
target > mid
first = mid +1
target < mid
last = mid -1
Figure 2-4
Ceng-112 Data Structures I
Serap Atay, Ph.D. 2007
20
first becomes
larger than last!
Figure 2-5
Ceng-112 Data Structures I
Serap Atay, Ph.D. 2007
21
Binary Search Algorithm
algorithm BinarySearch(val list <array>, val last <index>,
val target <keyType>, ref locn <index>)
Search an ordered list using binary search.
PRE list is ordered:it must contain at least one element.
last is index to the largest element in the list.
target is the value of element being sought.
locn is address of index in calling algorithm.
POST Found : locn assigned index to target element.
found set true.
Not found: locn = element below or above target.
found set false.
RETURN found <boolean>
Ceng-112 Data Structures I
Serap Atay, Ph.D. 2007
22
Binary Search Algorithm
1.
2.
3.
first = 1
last = end
loop (first <= last)
1.
2.
mid = (first + last)/2
if (target > list[mid])
1.
3.
last = mid – 1
(Look it lower halt).
first = last + 1
(Found equal: force exit)
else
1.
4.
5.
(Look in upper half).
else if (target < list[mid]
1.
4.
first = mid + 1
locn = mid
if (target equal list[mid])
1.
6.
found = true
else
1.
found = false
Big-O(log2n)
7.
Return
end BinarySearch
Ceng-112 Data Structures I
Serap Atay, Ph.D. 2007
23
Comparison of
binary and sequential searches
Size
Binary
Sequential
(Average)
16
4
8
16
50
6
25
50
256
8
128
256
1.000
10
500
1.000
10.000
14
5.000
10.000
100.000
17
50.000
100.000
1.000.000
20
500.000
1.000.000
Ceng-112 Data Structures I
Serap Atay, Ph.D. 2007
Sequential
(Worst case)
24
Hashed List Searches
• In an ideal search, we would know exactly where the data are
and go directly there.
• We use a hashing algorithm to transform the key into the index
of array, that contains the data we need to locate.
Ceng-112 Data Structures I
Serap Atay, Ph.D. 2007
25
•It is a key-to-address transformation!
Figure 2-6
Ceng-112 Data Structures I
Serap Atay, Ph.D. 2007
26
•We call set of keys that hash to the same location in our list synonymns.
•A collision is the event that occurs when a hashing algorithm produces
an address for an insertion key and that address is already occupied.
•Each calculation of an address and test for success is known as a probe.
Figure 2-7
Ceng-112 Data Structures I
Serap Atay, Ph.D. 2007
27
Hashing Methods
Figure 2-8
Ceng-112 Data Structures I
Serap Atay, Ph.D. 2007
28
Direct Hashing Method
• The key is the address without any algorithmic manipulation.
• The data structure must contain an element for every possible key.
• It quarantees that there are no synonyms.
• We can use direct hashing very limited!
Ceng-112 Data Structures I
Serap Atay, Ph.D. 2007
29
Direct Hashing Method
Direct hashing of
employee numbers.
Figure 2-9
Ceng-112 Data Structures I
Serap Atay, Ph.D. 2007
30
Subtraction Hashing Method
• The keys are consecutive and do not start from one.
Example:
• A company have 100 employees,
• Employee numbers start from 1000 to 1100.
x=1001
x=1002
x=1100
x – 1000
1
Ali Esin
2
Sema Metin
1
2
100
99
100
Ceng-112 Data Structures I
Serap Atay, Ph.D. 2007
Filiz Yılmaz
31
Modulo Division Hashing Method
• The modulo-division method divides the key by the array size
and uses remainder plus one for the address.
address = key mod (listSize) + 1
• If a list size selected a prime number, that produces fewer
collisions than other list sizes.
Ceng-112 Data Structures I
Serap Atay, Ph.D. 2007
32
Modulo Division Hashing Method
121267 / 307 = 395 and
remainder = 2
hash(121267)= 2 +1 = 3
We have 300 employees, and the first prime greater that 300 is 307!.
Figure 2-10
Ceng-112 Data Structures I
Serap Atay, Ph.D. 2007
33
Digit Extraction Method
• Selected digits are extracted from the key and used as the
address.
Example:
379452 394
121267 112
378845 388
526842 568
Ceng-112 Data Structures I
Serap Atay, Ph.D. 2007
34
Midsquare Hashing Method
• The key is squared and the address selected from the middle of
the squared number.
• The most obvious limitation of this method is the size of the
key.
Example:
9452 * 9452 = 89340304 3403 is the address.
Or
379452 379 * 379 = 143641 364
Ceng-112 Data Structures I
Serap Atay, Ph.D. 2007
35
Folding Hashing Method
Figure 2-11
Ceng-112 Data Structures I
Serap Atay, Ph.D. 2007
36
Pseudorandom Hashing Method
•
The key is used as the seed in a pseudorandom number
generator and resulting random number then scaled in to a
possiple address range using modulo division.
Use a function such as: y = (ax + b (mod m))+1
1. x is the key value,
2. a is coefficient,
3. b is a constant.
4. m is the count of the element in the list.
5. y is the address.
Ceng-112 Data Structures I
Serap Atay, Ph.D. 2007
37
Pseudorandom Hashing Method
y = (ax + b (mod m)) + 1 y = (17x + 7 (mod 307)) + 1
1.
•
x = 121267 is the key value,
•
a = 17
•
b=7
•
m =307
y = ((( 17 * 121267) + 7) mod 307) + 1
2. y = ((2061539 +7) mod 307) + 1
3.
y = 2061546 mod 307 + 1
4.
y = 41 + 1
5. y = 42
Ceng-112 Data Structures I
Serap Atay, Ph.D. 2007
38
Rotation Hashing Method
Rotation is often used in combination with folding and psuedorandom
hashing.
Figure 2-12
Ceng-112 Data Structures I
Serap Atay, Ph.D. 2007
39
Collision Resolution Methods
All above methods of handling collision are independent of the
hashing algorithm.
Figure 2-13
Ceng-112 Data Structures I
Serap Atay, Ph.D. 2007
40
Collision Resolution Concepts
“Load Factor”
• We define a full list, as a list in which all elements except one
contain data.
• Rule: A hashed list should not be allowed to become more than
%75 full!
the number of filled elements in the list
Load Factor = ------------------------------------------------------ x 100
total number of elements in the list
k
α=
--------- x 100
the number of elements
n
Ceng-112 Data Structures I
Serap Atay, Ph.D. 2007
41
Collision Resolution Concepts
“Clustering”
• Some hashing algorithms tend to couse data to group
within the list. This is known as clustering.
• Clustering is created by collision.
• If the list contains a high degree of clustering, then
the number of probes to locate an element grows and
the processing efficiency of the list is reduced.
Ceng-112 Data Structures I
Serap Atay, Ph.D. 2007
42
Collision Resolution Concepts
“Clustering”
• Clustering types are:
– Primary clustering; clustering around a home address in our
list.
– Secondary clustering; the data are widely distributed across
the whole list so that the list appears to be well distributed,
however, the time to locate a requested element of data can
become large.
Ceng-112 Data Structures I
Serap Atay, Ph.D. 2007
43
Collision Resolution Methods
Open Addressing
• When a collision occurs, the home area addresses are searched
for an open or unoccupied element where the new data can be
placed.
• We have four different method:
– Linear probe,
– Quadratic probe,
– Double hashing,
– Key offset.
Ceng-112 Data Structures I
Serap Atay, Ph.D. 2007
44
Open Addressing
“Linear Probe”
•
When data cannot be stored in the home address, we
resolve the collision by adding one to the current address.
Advantage:
•
Simple implementation!
•
Data tend to remain near their home address.
Disadvantages:
• It tends to produce primary clustering.
.
• The search algorithm may become more complex
especially after data have been deleted!
Ceng-112 Data Structures I
Serap Atay, Ph.D. 2007
45
Open Addressing
“Linear Probe”
15532 / 307 = 50 and
remainder = 2
hash(15532)= 2 +1 = 3
New address = 3+1 =4
Ceng-112 Data Structures I
Serap Atay, Ph.D. 2007
46
Open Addressing
“Linear Probe”
Figure 2-14
Ceng-112 Data Structures I
Serap Atay, Ph.D. 2007
47
Open Addressing
“Quadratic Probe”
• Clustering can be eliminated by adding a value other than one
to the current address.
• The increment is the collision probe number squared.
–
–
–
–
For the first probe 12
For the second probe 22
For the third collision probe 32 ...
Until we eighter find an empty element or we exhoust the possible
elements.
– We use the modulo of the quadratic sum for the new address.
Ceng-112 Data Structures I
Serap Atay, Ph.D. 2007
48
Open Addressing
“Quadratic Probe”
Increase by two
Fore each probe!
Probe*Probe
&
New
Increment
Next
Increment Address
Factor
Increment
Probe
Number
Collision
Location
1
1
1*1=1
2
1
2
2
2*2=4
6
3
6
3*3=9
15
3+
5
4
15
4*4=16
31
7
5
31
5*5=25
56
9
25
6
56
6*6=36
92
11
36
7
92
7*7=49
41
13
49
Ceng-112 Data Structures I
Serap Atay, Ph.D. 2007
1
+
+4
9
16
49
Open Addressing – Double Hashing
“Pseudorandom Collision Resolution”
In this methot, rather than using an arithmetic probe functions,
the address is rehashed.
y = ((ax + c) mod listSize) +1
y = ((3.2 +(-1) mod 307) +1
y=6
Ceng-112 Data Structures I
Serap Atay, Ph.D. 2007
Figure 2-15
50
Open Addressing – Double Hashing
“Key Offset Collision Resolution”
• Key offset is another double hashing method and, produces
different collision paths for different keys.
• Key offset calculates the new address as a function of the old
address and the key.
Ceng-112 Data Structures I
Serap Atay, Ph.D. 2007
51
Open Addressing – Double Hashing
“Key Offset Collision Resolution”
offSet = [key / listSize]
address = ((offSet + old address) mod listSize) + 1
offSet = [166702 / 307] = 543
1. Probe : address = ((543 + 2) mod 307) + 1 = 239
2. Probe : address = ((543 + 239) mod 307) + 1 = 169
Key
Home
Address
Key Offset
Probe 1
Probe 2
166702
2
543
239
169
572556
2
1865
26
50
67234
2
219
222
135
Ceng-112 Data Structures I
Serap Atay, Ph.D. 2007
52
Collision Resolution
Open Addressing Resolution
• A major disadvantage to open addressing is that each
collision resolution increases the probability of future
collisions!
Ceng-112 Data Structures I
Serap Atay, Ph.D. 2007
53
Collision Resolution
Linked List Resolution
Link head pointer.
A link list is an ordered
collection of data in which
each element contains the
location of the next element.
Figure 2-16
Ceng-112 Data Structures I
Serap Atay, Ph.D. 2007
54
Collision Resolution
Bucket Hashing Resolution
Figure 2-17
Ceng-112 Data Structures I
Serap Atay, Ph.D. 2007
55
Hw #2
1.
2.
Create an array which includes the random integer
100 numbers between 0 and 150.
This should be an unordered list.
1.
2.
3.
Use Linear sentinel search algorithm and find the target
value in the array.
Use the Probability search algorithm and find the target
value in the array.
Create an ordered list which includes the 100
numbers between 0 and 150.
1.
2.
Use ordered list search algorithm and find the target value in
the array.
Use binary search algorithm and find the target value in the
array.
Load your HW-2 to FTP site until 15 Mar. 06 at 17:00.
Ceng-112 Data Structures I
Serap Atay, Ph.D. 2007
56
Hw #2
–
–
Run the each search algorithm 10 times and report
these performance values for each of them.
Write your comments about the result table.
Sentinel
Search
Probability
Search
Ordered
Search
Binary
Search
Number of Completed Searches
Number of Successful Searches
Avarage number of tests per search
Ceng-112 Data Structures I
Serap Atay, Ph.D. 2007
57