Improving Intranet Search
Arembis
HyperLogic
Intranet Search
partnering with
Session id: 40185
Improving Intranet Search with the Arembis HyperLogic Engine built on Oracle Database-Backed Technology
Agenda
Issues with Enterprise Search
Oracle’s products behind the Arembis engine
– Infrastructure: Oracle Text
– Solution: Oracle Ultra Search
Looking into the details
Overview of main features
Conclusions
Current Problems with Intranet Search
An enterprise intranet is very different from a typical Internet website
– Users are different
– Tasks are different
– Amount and quality of information are different
Searching is also different
Main Issues with Intranet Search
Multiple repositories
– Different data sources (websites, files, email, etc.)
Performance
– Sub-second query response time needed, not minutes
Quality
– Precise search results, not thousands of irrelevant hits
Ease of Use
– One single search engine, not one engine per data source
Bad search is very easy to do
Good search is very difficult
What is a Bad Search?
No search box
Too many hits
– Return 10,000 hits when the average user looks at the top 10 only
The most relevant item is not at the top of the list
– Bad scoring
Too many similar documents
– Poor duplicate detection
Inability to judge user intent
– No misspelling interpretation
– No context disambiguation (cricket the game or cricket the bug?)
– No recommendation system
What is a Bad Search? (Cont.)
Inability to understand why a document has been returned
– No KWIC (Key Word In Context)
Lack of categorization
– Similar documents in the same list
Documents change behind your back
– No cache
Missing meta information
– Size, format, date, feedback, etc.
Some Examples - I
Where is the search box?
Some Examples - II
“ultra seek” or “ultraseek”?
Some Examples - III
Looking for “k-means” in lotus.com
Oracle Products behind the Arembis Search Technology
Oracle Text
– Complete API for building any type of search application
– Features range from basic keyword searching to advanced techniques like classification and information visualization
Oracle Ultra Search
– Out-of-the-box solution that requires minimal coding
– Searches across OCS components, websites, databases, files, email, and Portals, provided OCS is already implemented
– Built on top of the Arembis-enhanced Oracle Text
The Arembis-Oracle Solution
Looking into the details
– Quality
– Performance
– Ease of Use
– Personalization
– Advanced features: classification and visualization
Quality
Link awareness
– Popular pages and hubs
– Website structure
– Page structure
Duplicate elimination
– Remove URLs with duplicate or near-duplicate content
Spelling correction
– Component that uses a dictionary and data from query logs
– Did you mean …?
KWIC (Key Word In Context)
– Highlights relevant parts of the document
– No need to open the URL if it doesn’t look relevant
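As an illustration of the spelling-correction bullet, here is a minimal Python sketch of a "Did you mean …?" component that combines a dictionary with query-log data. It is not the Arembis or Oracle implementation: the function name did_you_mean, the log format (one raw query string per entry), and the similarity cutoff are all assumptions made for this example. Similarity comes from Python's standard difflib module.

```python
from collections import Counter
from difflib import get_close_matches


def did_you_mean(term, dictionary, query_log, cutoff=0.75):
    """Suggest a correction for `term`, or return None if none is needed.

    Hypothetical helper: `dictionary` is a list of known words and
    `query_log` is a list of raw query strings typed by earlier users.
    """
    term = term.lower()
    known = {word.lower() for word in dictionary}
    if term in known:
        return None  # already a dictionary word, nothing to suggest

    # How often each word appears in past queries.
    log_counts = Counter(w for q in query_log for w in q.lower().split())

    # Dictionary words that look similar to the typed term,
    # ordered from most to least similar.
    candidates = get_close_matches(term, sorted(known), n=5, cutoff=cutoff)
    if not candidates:
        return None

    # Prefer the candidate users search for most often; max() keeps the
    # first (most similar) candidate when the counts are tied.
    return max(candidates, key=lambda w: log_counts[w])
```

The dictionary decides which words count as valid, and the query log only breaks ties between equally plausible corrections, so a frequent misspelling in the log cannot become a suggestion.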
Performance
Oracle Text enhanced by Arembis integrates with and benefits from features like
– Data partitioning
– RAC (Real Application Clusters)
– Query optimization
Common and rare queries
– Small index on URL and title for common queries
– Large index on document content for rare queries
Query Relaxation
– Enables the user to execute the most restrictive query first
– Then relaxes the search
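The query relaxation idea can be sketched in a few lines of Python. This is a toy, in-memory illustration and not Oracle Text syntax: the three stages (exact phrase, all terms, any term), the function name relaxed_search, and the min_hits threshold are assumptions chosen for the example.

```python
def _contains_phrase(words, terms):
    # True if `terms` appear as consecutive words in the document.
    n = len(terms)
    return any(words[i:i + n] == terms for i in range(len(words) - n + 1))


def _contains_all(words, terms):
    return all(t in words for t in terms)


def _contains_any(words, terms):
    return any(t in words for t in terms)


# Ordered from most restrictive to least restrictive.
STAGES = [
    ("phrase", _contains_phrase),
    ("all_terms", _contains_all),
    ("any_term", _contains_any),
]


def relaxed_search(query, documents, min_hits=1):
    """Run the strictest query first and relax it until enough hits appear.

    `documents` maps a document id to its text. Returns the name of the
    stage that produced the result and the matching ids.
    """
    terms = query.lower().split()
    if not terms:
        return "none", []
    tokenized = {doc_id: text.lower().split() for doc_id, text in documents.items()}
    for stage, matches in STAGES:
        hits = [doc_id for doc_id, words in tokenized.items() if matches(words, terms)]
        if len(hits) >= min_hits:
            return stage, hits
    return "none", []
```

Because the loop stops at the first stage with enough hits, a precise query never pays for the broader and more expensive stages.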
Ease of Use
Users want a simple and easy-to-use search interface
Hide all the complexity and expose a simple interface
Ultra Search enhanced by Arembis
Three search modes, all from one search box
– Precision: focused results from editor-selected data shown at the top of the page (the precision power of Arembis)
– Basic: simple search where results are sorted by relevance
– Advanced: interface with more options where the user has more control over the collection
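One way to picture three modes behind a single search box is the Python sketch below. It is only an interpretation of the bullets above, not the product's behavior: the function run_search, the hit tuple layout (url, score, collection), and the editor_picks mapping from a query to hand-selected URLs are hypothetical.

```python
def run_search(query, scored_hits, editor_picks, mode="basic", collection=None):
    """Return result URLs for one query under the chosen mode.

    scored_hits: list of (url, score, collection) tuples already matched
    for the query (assumed to come from the underlying engine).
    editor_picks: dict mapping a normalized query to editor-selected URLs.
    """
    # Basic mode: results sorted by relevance score, best first.
    ranked = sorted(scored_hits, key=lambda hit: hit[1], reverse=True)

    # Advanced mode: the user narrows the search to one collection.
    if mode == "advanced" and collection is not None:
        ranked = [hit for hit in ranked if hit[2] == collection]

    urls = [hit[0] for hit in ranked]

    # Precision mode: editor-selected results are pinned at the top,
    # followed by the remaining relevance-sorted results.
    if mode == "precision":
        pinned = editor_picks.get(query.lower().strip(), [])
        urls = pinned + [url for url in urls if url not in pinned]

    return urls
```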
Ease of Use (Cont.)
Personalization
Know user search patterns
– What do they search?
– When do they search?
Search query log analysis
– Which queries were made?
– Which queries were successful?
– How many times was each query made?
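The questions above map directly onto a small log-analysis routine. The Python sketch below assumes a hypothetical log format of (ISO timestamp, query, clicked) tuples and treats a query as successful when the user clicked a result; both are assumptions for illustration, not a description of the product's logs.

```python
from collections import Counter
from datetime import datetime


def analyze_query_log(entries):
    """Summarize a search log.

    entries: iterable of (iso_timestamp, query, clicked) tuples, where
    `clicked` is True if the user opened a result (assumed success signal).
    Returns (per-query report, searches per hour of day).
    """
    counts = Counter()      # how many times each query was made
    successes = Counter()   # how many of those led to a click
    by_hour = Counter()     # when users search

    for timestamp, query, clicked in entries:
        key = " ".join(query.lower().split())  # normalize case and spacing
        counts[key] += 1
        if clicked:
            successes[key] += 1
        by_hour[datetime.fromisoformat(timestamp).hour] += 1

    report = {
        query: {"count": n, "success_rate": successes[query] / n}
        for query, n in counts.items()
    }
    return report, by_hour
```

Queries with a high count and a low success rate are the natural candidates for editor-selected results or new content.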
Advanced Features
Classification
– Supervised classification of content
– Two ways: rules or training sets
– You can group a number of categories into a taxonomy
– Very useful for defining a common vocabulary in an enterprise
Clustering (ONLY for sites with Oracle Infrastructure in place)
– Unsupervised classification of patterns into groups
– The engine analyzes the document collection and outputs a set of clusters with the documents in each
– Very useful for discovering patterns or nuggets in collections
– Could be used as a starting point when there is no taxonomy present
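To make the rules-based route concrete, here is a minimal Python sketch of keyword rules that assign categories, with the categories grouped into a taxonomy. The rule format, the sample categories, and the function names classify and taxonomy_nodes are invented for this example; the engine's own rule language is not shown in the slides.

```python
import re

# Hypothetical rules: a document belongs to a category when it contains
# at least `min_matches` of that category's keywords.
RULES = {
    "Databases": {"oracle", "sql", "index"},
    "Networking": {"router", "vpn", "firewall"},
}

# Hypothetical taxonomy: categories grouped under a parent node.
TAXONOMY = {"IT": ["Databases", "Networking"]}


def classify(text, rules, min_matches=1):
    """Return the sorted list of categories whose rules match `text`."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return sorted(
        category
        for category, keywords in rules.items()
        if len(words & keywords) >= min_matches
    )


def taxonomy_nodes(categories, taxonomy):
    """Return the parent nodes that contain any of the given categories."""
    return sorted(
        parent
        for parent, children in taxonomy.items()
        if any(category in children for category in categories)
    )
```

The training-set route would replace the hand-written RULES with a model learned from labeled documents, while the taxonomy grouping stays the same.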
Advanced Features (Cont.)
Information Visualization
Very useful for
– Navigation through large data sets
– Discovering relationships and associations between items
– Focus + context tasks
A number of visualizations are available
– StretchViewer
– Interactive Viewer (ThemeMap, Cluster visualization)
– Integration with third-party vendors
Conclusions
Search is hitting a plateau
– Bad search is easy to implement, good search is difficult
Correcting deficiencies
– Quality, performance, and other features help
Moving to the next level
– Classification and clustering
– Text mining
– Information Visualization
– Content structure awareness
Oracle Database 9i or 10g provides the complete solution for enterprise search, turbo-charged by the precision Arembis HyperLogic engine, integrating:
– Oracle Text: complete API where you have total control
– Ultra Search: out-of-the-box solution that requires little coding
Links
Oracle Text page
http://otn.oracle.com/products/text
Ultra Search page
http://otn.oracle.com/products/ultrasearch
Java library for Text visualization
http://otn.oracle.com/software/products/workspace_mgr/text_visualizer.html
Arembis
http://www.Arembis.com
Arembis+Oracle
A Powerful Partnership
QUESTIONS
ANSWERS