Tries Data Structure

Tries



A trie is a special structure for representing sets of character strings.
It can also represent data types that are strings of objects of any type, e.g. strings of integers.
The word “trie” is derived from the middle letters of the word “retrieval”.
Tries: Example
One way to implement a spelling checker is:
• Read a text file.
• Break it into words (character strings separated by blanks and new lines).
• Find those words not in a standard dictionary of words.
• Print the words that are in the text but not in the dictionary as possible misspellings.
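The steps above can be sketched in a few lines of Python. This is only an illustration: the function name, the sample dictionary, and the sample text are made up here, a built-in Python set stands in for the dictionary, and punctuation is not handled.

```python
def possible_misspellings(text, dictionary):
    """Return the words of `text` that are not in `dictionary`, in first-seen order."""
    seen = set()
    result = []
    for word in text.split():            # split on blanks and new lines
        w = word.upper()                 # assumption: the dictionary holds upper-case words
        if w not in dictionary and w not in seen:
            seen.add(w)
            result.append(w)
    return result


# Hypothetical dictionary and text, for illustration only.
dictionary = {"THE", "THEN", "THIN", "TIN", "SIN", "SING"}
text = "the tin\nthen thn sing sng"
print(possible_misspellings(text, dictionary))   # ['THN', 'SNG']
```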
Tries: Example
It can be implemented by a set with the operations:
• INSERT
• DELETE
• MAKENULL
• PRINT
A trie structure supports these set operations when the elements of the set are words.
Tries: Example
[Figure: trie whose edges are labeled with letters and $, representing the words THE, THEN, THIN, THIS, TIN, SIN, SING]
Tries: Example
• Tries are appropriate when many words begin with the same sequence of letters, i.e. when the number of distinct prefixes among all words in the set is much less than the total length of all the words.
• Each path from the root to a leaf corresponds to one word in the represented set.
• Nodes of the trie correspond to the prefixes of words in the set.
Tries: Example
• The symbol $ is added at the end of each word so that no prefix of a word can be a word itself.
• The trie corresponds to the set {THE, THEN, THIN, THIS, TIN, SIN, SING}.
• Each node has at most 27 children, one for each letter and $.
• Most nodes will have many fewer than 27 children.
• A leaf reached by an edge labeled $ cannot have any children.
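To make these properties concrete, the sketch below builds a trie for the words THE, THEN, THIN, THIS, TIN, SIN, SING. It uses nested Python dictionaries as nodes, which is an assumption of this example and not the representation used in the slides. The helper names are hypothetical.

```python
END = "$"   # end-of-word marker, as in the slides


def build_trie(words):
    """Build a trie as nested dicts: each key is an edge label, each value a child node."""
    root = {}
    for word in words:
        node = root
        for ch in word + END:
            node = node.setdefault(ch, {})   # reuse the child if the prefix already exists
    return root


def words_in(node, prefix=""):
    """Yield one word per root-to-leaf path, in sorted order."""
    for ch in sorted(node):
        if ch == END:
            yield prefix                      # a $ edge ends a word; its leaf has no children
        else:
            yield from words_in(node[ch], prefix + ch)


def count_nodes(node):
    """Count this node and all nodes below it."""
    return 1 + sum(count_nodes(child) for child in node.values())


trie = build_trie(["THE", "THEN", "THIN", "THIS", "TIN", "SIN", "SING"])
print(list(words_in(trie)))   # ['SIN', 'SING', 'THE', 'THEN', 'THIN', 'THIS', 'TIN']
print(count_nodes(trie))      # 21: the root, 13 prefix nodes, and 7 leaves reached by $
```

The seven words contain 25 letters in total but only 13 distinct non-empty prefixes, which is the situation in which a trie saves space.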
Trie Nodes as an ADT
• A node in a trie can be viewed as a mapping whose domain is {A, B, …, Z, $} and whose value set is the type “pointer to trie node”.
• A trie can be identified with its root.
=> The ADTs TRIE and TRIENODE have the same data type.
• However, their operations are different.
Operations on Trie Nodes
• ASSIGN(node, c, p): Assigns value p (a pointer to a node) to character c in node.
• VALUEOF(node, c): Produces the value associated with character c in node.
• GETNEW(node, c): Makes the value of node for character c a pointer to a new node.
• MAKENULL(node): Makes node the null mapping.
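A minimal sketch of these node operations, assuming each node is stored as an array of 27 child references (one per symbol). The class name, the array representation, and the insert and member helpers built on top of it are assumptions of this example. Python object references play the role of pointers.

```python
import string

ALPHABET = string.ascii_uppercase + "$"    # the 27-symbol domain {A, ..., Z, $}


class TrieNode:
    """A mapping from each symbol in ALPHABET to a child node or None."""

    def __init__(self):
        self.makenull()

    def makenull(self):
        # MAKENULL(node): every symbol maps to "no node".
        self.child = [None] * len(ALPHABET)

    def assign(self, c, p):
        # ASSIGN(node, c, p): raises ValueError if c is outside the domain.
        self.child[ALPHABET.index(c)] = p

    def valueof(self, c):
        # VALUEOF(node, c)
        return self.child[ALPHABET.index(c)]

    def getnew(self, c):
        # GETNEW(node, c): attach a fresh node under c and return it.
        p = TrieNode()
        self.assign(c, p)
        return p


def insert(root, word):
    """INSERT on the trie, written only in terms of the node operations."""
    node = root
    for c in word + "$":
        nxt = node.valueof(c)
        node = nxt if nxt is not None else node.getnew(c)


def member(root, word):
    """Return True if word (followed by $) is a path from the root."""
    node = root
    for c in word + "$":
        node = node.valueof(c)
        if node is None:
            return False
    return True


root = TrieNode()
for w in ["THE", "THEN", "THIN"]:
    insert(root, w)
print(member(root, "THE"), member(root, "TH"))   # True False
```

An array of 27 references gives constant-time VALUEOF but wastes space when most nodes have few children. A per-node list or dictionary is a common alternative.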
Sets
• A set is a collection of members (or elements).
• Each member of a set is either itself a set or a primitive element called an atom.
• All elements of a set are different.
Sets
• Atoms can be integers, characters, or strings.
• All elements of a set are usually of the same type.
• Atoms in a set can be linearly ordered.
• A linear order (denoted by <, read “less than” or “precedes”) on a set S satisfies two properties:
  • For any a and b in S, exactly one of a < b, a = b, or b < a is true.
  • For all a, b, and c in S, if a < b and b < c, then a < c (transitivity).
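For a small finite set, the two properties can be checked by brute force. The function below is a hypothetical helper written for this illustration. It takes the relation as a two-argument function.

```python
from itertools import product


def is_linear_order(S, lt):
    """Check both linear-order properties of the relation `lt` on the finite set S."""
    for a, b in product(S, repeat=2):
        # Exactly one of a < b, a = b, b < a must hold.
        if [lt(a, b), a == b, lt(b, a)].count(True) != 1:
            return False
    for a, b, c in product(S, repeat=3):
        # Transitivity: a < b and b < c must imply a < c.
        if lt(a, b) and lt(b, c) and not lt(a, c):
            return False
    return True


S = {1, 4, 7}
print(is_linear_order(S, lambda a, b: a < b))    # True
print(is_linear_order(S, lambda a, b: a <= b))   # False: a <= a and a = a both hold
```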
Set Notation
• A set of atoms is generally exhibited by putting curly brackets around its members.
• Example: {1,4} denotes the set whose members are 1 and 4.
• A set is not a list, since the order of elements in a set is not important.
• {4,1} is the same set as {1,4}.
Operations on Sets
• UNION: If A and B are sets, then A ∪ B is the set of elements that are members of A or B or both.
• INTERSECTION: A ∩ B is the set of elements that are members of both A and B.
• DIFFERENCE: A – B is the set of elements that are members of A but not members of B.
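Python's built-in set type provides these three operations directly, which makes for a quick illustration. The sets A and B below are made-up examples.

```python
A = {1, 2, 3}
B = {3, 4}

union = A | B          # members of A or B or both
intersection = A & B   # members of both A and B
difference = A - B     # members of A that are not members of B

print(sorted(union))         # [1, 2, 3, 4]
print(sorted(intersection))  # [3]
print(sorted(difference))    # [1, 2]
```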
Abstract Data Types Based on Sets
The Set ADT can incorporate some other operations as well.
• MERGE(A,B,C): Assigns to the set variable C the value A ∪ B. The operator is not defined if A ∩ B ≠ Ø.
• MEMBER(x,A): Returns true if x ∈ A and false if x ∉ A.
• MAKENULL(A): Makes the null set the value of set variable A.
Abstract Data Types Based on Sets
• INSERT(x,A): x is an element of the type of A’s members. Makes x a member of A: A = A ∪ {x}.
• DELETE(x,A): Removes x from A: A = A – {x}.
• ASSIGN(A,B): Sets the value of set variable A to be equal to the value of set variable B.
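One possible rendering of MERGE, MEMBER, MAKENULL, INSERT, DELETE, and ASSIGN, assuming a set variable is a mutable Python set that the operations modify in place. Raising ValueError when MERGE is given overlapping sets is a choice made for this sketch. The slides only say the operator is undefined in that case.

```python
def makenull(A):
    A.clear()                       # A becomes the null set


def member(x, A):
    return x in A


def insert(x, A):
    A.add(x)                        # A = A ∪ {x}; no effect if x is already a member


def delete(x, A):
    A.discard(x)                    # A = A – {x}; no effect if x is not a member


def assign(A, B):
    members = set(B)                # copy first, so assign(A, A) is safe
    A.clear()
    A.update(members)


def merge(A, B, C):
    if A & B:                       # MERGE is only defined for disjoint sets
        raise ValueError("MERGE is undefined when A and B share members")
    members = A | B
    C.clear()
    C.update(members)


A, B, C = {1, 2}, {3}, set()
merge(A, B, C)
insert(4, C)
delete(1, C)
print(sorted(C))                    # [2, 3, 4]
print(member(3, C), member(1, C))   # True False
```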
Abstract Data Types Based on Sets
• MIN(A): Returns the least element in set A. This operator is applicable only when the members of A are linearly ordered.
• MAX(A): Returns the largest element in set A. This operator is applicable only when the members of A are linearly ordered.
• EQUAL(A,B): Returns true if and only if sets A and B consist of the same elements.
• FIND(x): Works for a collection of disjoint sets. Returns the name of the unique set of which x is a member.
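A short sketch of MIN, MAX, EQUAL, and FIND. FIND in the slides takes only x, so the collection of disjoint sets is implicit. Here it is passed explicitly as a dictionary from set name to set, which is an assumption of this example, as are the function names and sample data. The linear scan is the simplest possible FIND, not an efficient one.

```python
def set_min(A):
    return min(A)        # requires the members of A to be linearly ordered


def set_max(A):
    return max(A)        # requires the members of A to be linearly ordered


def equal(A, B):
    return A == B        # same elements, regardless of order


def find(x, collection):
    """Return the name of the set containing x.

    `collection` maps a set name to a set; the sets are assumed pairwise disjoint.
    """
    for name, members in collection.items():
        if x in members:
            return name
    raise KeyError(x)    # x is in none of the sets


# Hypothetical collection of disjoint sets.
collection = {"evens": {2, 4, 6}, "odds": {1, 3, 5}}
print(find(3, collection))                                          # odds
print(set_min(collection["evens"]), set_max(collection["evens"]))   # 2 6
print(equal({4, 1}, {1, 4}))                                        # True
```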
Reference

“Data Structures and Algorithms” by A. V.
Aho, J. E. Hopcroft, J. D. Ullman.