DB2 Net Search Extenders

DB2 Net Search Extender
Presenter: Sudeshna Banerji (CIS 595: Bioinformatics)
Topics to discuss:
– Information retrieval
– Text-indexing
– DB2 Text Extenders
– DB2 Net Search Extender
– References
– Questions
Sudeshna Banerji (CIS 595: Bioinformatics)
A Little Background…
Information Retrieval (IR):
• Extraction of “relevant” information from large
volumes of data scattered across different databases.
• Examples: text search, image search, video search,
etc.
• The efficiency (time and speed) of IR depends on the
indexing technology used.
• Indexing improves the performance of the system.
• An example of an indexing technology: text indexing,
used for textual search.
A Little Background…
Text-Indexing:
• Process of deciding what will be used to represent a
given document.
• A text index consists of significant terms extracted from
the text documents, each term stored together with
information about the document that contains it.
• The search is then handled as a query to look up the
index.
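The idea of a text index can be sketched in a few lines of Python. This is only an illustration of the general concept (an inverted index), not of how DB2 stores its indexes; the function name `build_index` and the two sample documents are assumptions made for the example.

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            term = word.strip(".,;:!?")
            if term:  # skip tokens that were pure punctuation
                index[term].add(doc_id)
    return index

# Hypothetical two-document collection.
docs = {
    1: "A man was running down the street.",
    2: "The cat hunts some mice.",
}
index = build_index(docs)

# A search becomes a lookup in the index instead of a scan of every document.
print(sorted(index["cat"]))
print(sorted(index["the"]))
```

Looking up a term touches only the index entry for that term, which is why an index speeds up retrieval compared with scanning all documents.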
A Little Background…
Text-Indexing (continued):
• Involves the following:
– Parsing the documents to recognize their structure,
e.g., title, date, and other fields.
– Scanning for word tokens, which means handling numbers, special
characters, hyphenation, capitalization, etc.
– Stopword removal, based on a short list of common
words like “the”, “and”, “or”.
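The three steps above can be sketched as a small Python pipeline. The document layout ("field: value" header lines, a blank line, then the body), the stopword list, and the function names are all assumptions chosen for illustration; a real indexer uses far more elaborate rules.

```python
import re

# Illustrative stopword list; real systems ship longer, language-specific lists.
STOPWORDS = {"the", "and", "or", "a", "was", "some"}

def parse_document(raw):
    """Split hypothetical 'field: value' header lines from the body text."""
    header, _, body = raw.partition("\n\n")
    fields = dict(line.split(": ", 1) for line in header.splitlines())
    return fields, body

def significant_terms(text):
    """Lowercase, tokenize (keeping hyphenated words whole), drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

raw = "title: Hunting\ndate: 2000\n\nThe cat hunts some mice."
fields, body = parse_document(raw)
print(fields["title"], fields["date"])
print(significant_terms(body))
```

Only the terms that survive stopword removal would be written to the index.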
Indexing only
Significant Terms
DB2 Extenders
– A family of IBM products that provide support for data
beyond the traditional character and numeric data types.
– Extenders available for images, voice, video, complex
documents (full-text search), spatial objects etc.
– Trial and beta versions available for testing.
– Link for extenders:
http://www-3.ibm.com/software/data/db2/extenders/index.html
DB2 Text Extenders
– To meet the increasing demands of content management,
IBM has introduced three full-text retrieval applications
for DB2 Universal Database (DB2 UDB):
• DB2 Net Search Extender
• DB2 Text Information Extender
• DB2 Text Extender
– When to use what?
• Link for comparisons of the above:
http://www-3.ibm.com/software/data/db2/extenders/fulltextcomparison.html
DB2 Net Search Extender
Replaces DB2 Text Information Extender Version 7.2.
Some important features:
– Indexing speed of about 1 GB per hour.
– Support for different text formats: plain ASCII text, HTML, XML,
and GPP.
– Base support for 37 languages, including English, Spanish,
French, Japanese, and Chinese.
– Sub-second search response times.
– No decrease in search performance with up to 1000
concurrent queries per second.
DB2 Net Search Extender
Some text-search capabilities:
– Searches are expressed in SQL (a fourth-generation
language whose queries read almost like English).
– Searches can include:
• Boolean operations.
• Proximity search for words in the same sentence or
paragraph (for HTML, XML, and GPP documents).
• “Fuzzy” search for words spelled similarly to
the search term, e.g., “Andrew” and “Andru”.
• Thesaurus-related search.
• Restricting the search to sections within documents.
• Limiting the search results with a “hit count” and
specifying how the results are to be sorted.
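To make a few of these capabilities concrete, here is a minimal Python sketch of Boolean and fuzzy lookups over a toy index, with a hit-count limit. It uses the standard-library `difflib` module for similarity, which is a stand-in technique and not the matching algorithm used by Net Search Extender; the index contents and function names are hypothetical.

```python
import difflib

# Hypothetical term -> document-id index.
index = {
    "andrew": {1, 3},
    "cat": {2},
    "mice": {2, 3},
}

def lookup(term):
    """Exact lookup of one term."""
    return index.get(term.lower(), set())

def fuzzy_lookup(term):
    """Lookup of all indexed terms spelled similarly to the search term."""
    similar = difflib.get_close_matches(term.lower(), list(index), n=3)
    hits = set()
    for candidate in similar:
        hits |= index[candidate]
    return hits

def limit(hits, hit_count):
    """Return at most hit_count results in sorted order."""
    return sorted(hits)[:hit_count]

print(lookup("cat") & lookup("mice"))    # Boolean AND
print(lookup("cat") | lookup("andrew"))  # Boolean OR
print(fuzzy_lookup("Andru"))             # finds documents indexed under "andrew"
print(limit(lookup("mice"), 1))
```

Boolean operators map naturally onto set intersection and union, which is one reason index-based search stays fast for combined conditions.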
DB2 Net Search Extender
System requirements:
– DB2 Version 8.1
– Java Runtime Environment (JRE) Version 1.3.1
Windows installation:
– Administrative rights are required.
– Run db2text start to start the DB2 Net Search
Extender instance services.
DB2 Net Search Extender
A simple example with SQL queries
– The following steps are required to do a basic textual search in DB2 Net Search Extender:
1. Creating a database
2. Enabling a database for text search
3. Creating a table
4. Creating a full-text index
5. Loading sample data
6. Synchronizing the text index
7. Searching with the text index
DB2 Net Search Extender
1. Creating a database:
db2 "create database sample"
2. Enabling a database for text search:
• To start Net Search Extender Service
db2text "START"
• To prepare the database for use with DB2
Net Search Extender:
db2text "ENABLE DATABASE FOR TEXT
CONNECT TO sample"
DB2 Net Search Extender
3. Creating a table:
db2 "CREATE TABLE books (isbn VARCHAR(18) not
null PRIMARY KEY, author VARCHAR(30), story
LONG VARCHAR, year INTEGER)"
4. Creating a full-text index:
db2text "CREATE INDEX db2ext.myTextIndex FOR
TEXT ON books (story) CONNECT TO sample"
DB2 Net Search Extender
5. Loading sample data:
db2 "INSERT INTO books VALUES ('0-13-0867551', 'John', 'A man was running down the street.', 2001)"
db2 "INSERT INTO books VALUES ('0-13-086755-2', 'Mike', 'The cat hunts some mice.', 2000)"
6. Synchronizing the text index:
db2text "UPDATE INDEX db2ext.myTextIndex FOR TEXT
CONNECT TO sample"
DB2 Net Search Extender
7. Searching with the text index:
• Using CONTAINS scalar search function:
db2 "SELECT author, story FROM books WHERE
CONTAINS (story, '\"cat\"') = 1 AND year >= 2000"
The following result table is returned:
AUTHOR   STORY
Mike     The cat hunts some mice.

NOTE:
– To create a text index, the text column must have one of
the following data types:
CHAR, VARCHAR, LONG VARCHAR, CLOB.
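Why step 6 exists can be illustrated with a toy model in Python. The sketch assumes, as the step order above suggests, that rows inserted after the index was created become searchable only once the index is synchronized. The class `ToyTextIndex` and its methods are hypothetical stand-ins for the table, the UPDATE INDEX command, and the CONTAINS query; nothing here calls DB2.

```python
class ToyTextIndex:
    """In-memory stand-in for a table plus a separately synchronized text index."""

    def __init__(self):
        self.rows = {}     # isbn -> (author, story, year)
        self.pending = []  # keys inserted since the last synchronization
        self.index = {}    # term -> set of isbn keys

    def insert(self, isbn, author, story, year):
        """Step 5: store the row; the text index is not touched yet."""
        self.rows[isbn] = (author, story, year)
        self.pending.append(isbn)

    def update_index(self):
        """Step 6: index the stories of all rows added since the last update."""
        for isbn in self.pending:
            story = self.rows[isbn][1]
            for word in story.lower().split():
                self.index.setdefault(word.strip(".,"), set()).add(isbn)
        self.pending.clear()

    def search(self, term, min_year):
        """Step 7: index lookup combined with an ordinary column predicate."""
        keys = self.index.get(term.lower(), set())
        return [(self.rows[k][0], self.rows[k][1])
                for k in sorted(keys) if self.rows[k][2] >= min_year]

books = ToyTextIndex()
books.insert("0-13-0867551", "John", "A man was running down the street.", 2001)
books.insert("0-13-086755-2", "Mike", "The cat hunts some mice.", 2000)

before = books.search("cat", 2000)  # index not yet synchronized
books.update_index()
after = books.search("cat", 2000)
print(before)
print(after)
```

In the toy model the first search finds nothing and the second returns Mike's row, mirroring the result table above.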
DB2 Net Search Extender
Thesaurus Support:
– A thesaurus is structured like a network of nodes linked
together by relations:
• Associative relations: RELATED_TO
• Synonym relations: SYNONYM_OF
• Hierarchical relations: LOWER_THAN,
HIGHER_THAN
– Creating and compiling a thesaurus:
1. Create a thesaurus definition file (explained below).
2. Compile the definition file into a thesaurus dictionary using
the DB2EXTTH utility.
DB2 Net Search Extender

Create a thesaurus definition file:
– Define its content in a definition file using a text editor.
Example of some definition groups:
:WORDS
football
.RELATED_TO goal
.SYNONYM_OF soccer
:WORDS
chapel
.LOWER_THAN skyscraper
.HIGHER_THAN house
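As a rough illustration of how such definition groups encode relations, the following Python sketch reads the simplified layout shown above into (term, relation, target) triples. It is not the DB2EXTTH compiler and does not claim to cover the full definition-file syntax; the function name `parse_thesaurus` is hypothetical.

```python
def parse_thesaurus(text):
    """Turn ':WORDS' groups into (term, relation, target) triples."""
    relations = []
    head = None
    expect_head = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == ":WORDS":
            expect_head = True           # the next line names the group's term
        elif expect_head:
            head = line
            expect_head = False
        elif line.startswith(".") and head is not None:
            relation, _, target = line[1:].partition(" ")
            relations.append((head, relation, target.strip()))
    return relations

definition = """\
:WORDS
football
.RELATED_TO goal
.SYNONYM_OF soccer
:WORDS
chapel
.LOWER_THAN skyscraper
.HIGHER_THAN house
"""

for triple in parse_thesaurus(definition):
    print(triple)
```

Each group contributes one triple per relation line, so the two groups above yield four triples.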
DB2 Net Search Extender
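An example of the structure of a thesaurus:
[Figure: thesaurus network with the nodes Game, Ball Game, Tennis, Football, and Soccer, linked by HIGHER_THAN relations from Game down through Ball Game to the individual sports, and a SYNONYM_OF relation between Football and Soccer.]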
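A thesaurus of this shape can be used to broaden a query. The Python sketch below walks HIGHER_THAN links downward to collect narrower terms and looks up synonyms. The edges are an assumed reading of the example (Game above Ball Game, Ball Game above Tennis, Football, and Soccer, with Football and Soccer as synonyms), and the function names are hypothetical; it does not reflect how Net Search Extender evaluates thesaurus searches internally.

```python
# Assumed edges, written in lowercase for lookup.
HIGHER_THAN = {
    "game": ["ball game"],
    "ball game": ["tennis", "football", "soccer"],
}
SYNONYM_OF = {
    "football": ["soccer"],
    "soccer": ["football"],
}

def narrower_terms(term):
    """Collect the term plus everything reachable through HIGHER_THAN links."""
    found, stack = [], [term.lower()]
    while stack:
        current = stack.pop()
        if current not in found:
            found.append(current)
            stack.extend(HIGHER_THAN.get(current, []))
    return sorted(found)

def synonyms(term):
    """Return the term together with its direct synonyms."""
    return sorted({term.lower(), *SYNONYM_OF.get(term.lower(), [])})

print(narrower_terms("Game"))
print(synonyms("Football"))
```

A search for "game" expanded this way would also match documents that mention only "tennis" or "soccer".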
DB2 Net Search Extender
References:
- http://www-3.ibm.com/cgibin/db2www/data/db2/udb/winos2unix/support/document.d2w/report?fn=desu9m03.htm#ToC
- Information retrieval site containing good lecture slides:
http://ciir.cs.umass.edu/cmpsci646/
- Net Search Extender Administration and User’s Guide,
Version 8.1 (can be downloaded with the software)
Any questions?