Improving Intranet Search


Arembis HyperLogic Intranet Search
partnering with
Session id: 40185
Improving Intranet Search with the Arembis HyperLogic Engine built on Oracle Database-Backed Technology
Agenda
 Issues with Enterprise Search
 Oracle’s products behind the Arembis engine
– Infrastructure: Oracle Text
– Solution: Oracle Ultra Search
 Looking into the details
 Overview of main features
 Conclusions
Current Problems with Intranet Search
 An enterprise intranet is very different from typical Internet websites
– Users are different
– Tasks are different
– Amount and quality of information are different
 Searching is also different
Main Issues with Intranet Search
 Multiple repositories
– Different data sources (websites, files, email, etc.)
 Performance
– Sub-second query response time needed, not minutes
 Quality
– Precise search results, not thousands of irrelevant hits
 Ease of Use
– One single search engine, not an engine per data source
 Bad search is very easy to do
 Good search is very difficult
What is a Bad Search?
 No search box
 Too many hits
– Return 10,000 hits when the average user looks at the top 10 only
 The most relevant item is not at the top of the list
– Bad scoring
 Too many similar documents
– Poor duplicate detection
 Inability to judge user intent
– No misspelling interpretation
– No context disambiguation (cricket the game or cricket the bug?)
– No recommendation system
What is a Bad Search? (Cont.)
 Inability to understand why a document has been returned
– No KWIC
 Lack of categorization
– Similar documents in the same list
 Documents change behind your back
– No cache
 Meta information
– Size, format, date, feedback, etc.
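KWIC (Key Word In Context) shows each hit with a few characters of surrounding text, so a user can see why a document was returned. A minimal sketch in Python, assuming plain-text documents and a single search term; the function name `kwic` and the `width` parameter are illustrative and are not part of Oracle Text or Ultra Search:

```python
import re


def kwic(text, term, width=30):
    """Return one snippet per occurrence of `term`, with the hit in brackets."""
    snippets = []
    for match in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
        # Keep up to `width` characters of context on each side of the hit.
        start = max(0, match.start() - width)
        end = min(len(text), match.end() + width)
        prefix = "..." if start > 0 else ""
        suffix = "..." if end < len(text) else ""
        left = text[start:match.start()]
        right = text[match.end():end]
        snippets.append(f"{prefix}{left}[{match.group(0)}]{right}{suffix}")
    return snippets


# Hypothetical document text, used only for illustration.
doc = "Cricket is a bat-and-ball game. A cricket is also an insect known for chirping."
for snippet in kwic(doc, "cricket", width=20):
    print(snippet)
```

A production engine would cut snippets at word boundaries and rank them; this sketch only shows the idea of returning context around each hit.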
Some Examples - I
Where is the search box?
Some Examples - II
“ultra seek” or “ultraseek”?
Some Examples - III
Looking for “k-means” in lotus.com
Oracle Products behind the Arembis Search Technology
 Oracle Text
– Complete API for building any type of search application
– Features range from basic keyword searching to advanced techniques like classification and information visualization
 Oracle Ultra Search
– Out-of-the-box solution that requires minimal coding
– Searches across OCS components, websites, databases, files, email, and Portals, provided OCS is already implemented
– Built on top of the Arembis-enhanced Oracle Text
The Arembis-Oracle Solution
Looking into the details
– Quality
– Performance
– Ease of Use
– Personalization
– Advanced features
 Classification and visualization
Quality
 Link awareness
– Popular pages and hubs
– Website structure
– Page structure
 Duplicate elimination
– Remove URLs with duplicate or near-duplicate content
 Spelling correction
– Component that uses a dictionary and data from query logs
– Did you mean …?
 KWIC (Key Word In Context)
– Highlights relevant parts of the document
– No need to open the URL if it doesn’t look relevant
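The spelling-correction component above combines a dictionary with query-log data. One way to sketch that idea in Python uses the standard library's `difflib.get_close_matches` to find similar known terms, then prefers the one users have typed most often. The function name `did_you_mean`, the sample dictionary, the log counts, and the 0.8 similarity cutoff are all assumptions for illustration, not the Arembis or Oracle implementation:

```python
import difflib


def did_you_mean(query, dictionary, query_log_counts, cutoff=0.8):
    """Suggest a known term close to `query`, or None if no suggestion is needed."""
    q = query.strip().lower()
    # Known vocabulary: dictionary words plus queries seen in the log.
    vocabulary = sorted(set(dictionary) | set(query_log_counts))
    if q in vocabulary:
        return None  # the query is already a known term
    # Candidates come back ordered from most to least similar.
    candidates = difflib.get_close_matches(q, vocabulary, n=5, cutoff=cutoff)
    if not candidates:
        return None
    # Prefer the candidate seen most often in the log; on a tie,
    # max() keeps the earlier (more similar) candidate.
    return max(candidates, key=lambda c: query_log_counts.get(c, 0))


# Hypothetical data, for illustration only.
dictionary = ["ultraseek", "oracle", "search"]
query_log_counts = {"ultraseek": 40, "ultra search": 12}

suggestion = did_you_mean("ultra seek", dictionary, query_log_counts)
if suggestion:
    print(f"Did you mean {suggestion}?")
```

Using log frequency as the tie-breaker means the suggestion drifts toward the vocabulary that users of a given intranet actually type, which a generic dictionary alone cannot provide.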
Performance
 Oracle Text enhanced by Arembis integrates with and benefits from features like
– Data partitioning
– RAC
– Query optimization
 Common and rare queries
– Small index on URL and title for common queries
– Large index on document content for rare queries
 Query Relaxation
– Enables the user to execute the most restrictive query first
– Then relaxes the search
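Query relaxation, as described above, runs the most restrictive form of the query first and loosens it only if nothing is found. The sketch below illustrates that over a small in-memory collection, assuming three steps (exact phrase, all terms, any term). The step names, the `relaxed_search` function, and the sample documents are hypothetical, and this is not how Oracle Text implements the feature:

```python
def relaxed_search(query, documents):
    """Try progressively looser matches; return the first step that finds hits.

    `documents` maps a document id to its text.
    Returns a (step_name, [doc_ids]) tuple.
    """
    terms = query.lower().split()
    if not terms:
        return "none", []
    phrase = " ".join(terms)

    def words(text):
        return text.lower().split()

    steps = [
        # Step 1: the terms appear next to each other, in order.
        ("phrase", lambda text: f" {phrase} " in f" {' '.join(words(text))} "),
        # Step 2: every term appears somewhere in the document.
        ("all terms", lambda text: all(t in words(text) for t in terms)),
        # Step 3: at least one term appears.
        ("any term", lambda text: any(t in words(text) for t in terms)),
    ]
    for name, matches in steps:
        hits = [doc_id for doc_id, text in documents.items() if matches(text)]
        if hits:
            return name, hits
    return "none", []


# Hypothetical documents, for illustration only.
docs = {
    1: "Oracle Text query relaxation",
    2: "relaxation of a text query",
    3: "intranet search",
}
print(relaxed_search("query relaxation", docs))
print(relaxed_search("text relaxation", docs))
```

Because the loop stops at the first step with hits, the user sees the most precise results available without having to rewrite the query by hand.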
Ease of Use
 Users want a simple and easy-to-use search interface
 Hide all the complexity and expose a simple interface
 Ultra Search enhanced by Arembis
 Three search modes, all from one search box
– Precision: focused results from editor-selected data shown at top of page (the precision power of Arembis)
– Basic: simple search where results are sorted by relevance
– Advanced: interface with more options where the user has more control over the collection
Personalization
 Know user search patterns
– What do they search for?
– When do they search?
 Search query log analysis
– Which queries were made?
– Which queries were successful?
– How many times was each query made?
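The three log-analysis questions above can be answered with simple counting. The sketch below assumes each log entry is an `(hour, query, clicked)` tuple and treats a query as "successful" when the user clicked a result. Both the log format and that definition of success are assumptions made for illustration:

```python
from collections import Counter


def analyze_query_log(entries):
    """Summarize a query log.

    `entries` is an iterable of (hour, query, clicked) tuples.
    Returns (report, by_hour): per-query counts and success rates,
    and the number of queries made in each hour.
    """
    counts = Counter()
    successes = Counter()
    by_hour = Counter()
    for hour, query, clicked in entries:
        q = query.strip().lower()  # treat "Vacation Policy" and "vacation policy" alike
        counts[q] += 1
        by_hour[hour] += 1
        if clicked:
            successes[q] += 1
    report = {
        q: {"count": n, "success_rate": successes[q] / n}
        for q, n in counts.items()
    }
    return report, by_hour


# Hypothetical log entries, for illustration only.
log = [
    (9, "vacation policy", True),
    (9, "Vacation Policy", False),
    (14, "expense form", True),
    (14, "k-means", False),
]
report, by_hour = analyze_query_log(log)
for query, stats in report.items():
    print(query, stats)
print(dict(by_hour))
```

Queries with a high count and a low success rate are natural candidates for editor-selected results or new synonyms.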
Advanced Features
 Classification
– Supervised classification of content
– Two ways: rules or training sets
– You can group a number of categories into a taxonomy
– Very useful for defining a common vocabulary in an enterprise
 Clustering - ONLY for sites with Oracle Infrastructure in place
– Unsupervised classification of patterns into groups
– The engine analyzes the document collection and outputs a set of clusters of documents
– Very useful for discovering patterns or nuggets in collections
– Could be used as a starting point when there is no taxonomy present
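Of the two classification routes named above, the rule-based one is the easier to illustrate. The sketch below assumes each category in a small taxonomy is defined by a set of keywords and assigns a document to every category whose rule matches. The taxonomy, the keywords, and the `classify` function are hypothetical and stand in for whatever rule language a real engine provides:

```python
import re

# Hypothetical two-level taxonomy: "Area/Topic" -> keywords that define the rule.
TAXONOMY = {
    "HR/Benefits": {"vacation", "insurance", "pension"},
    "HR/Recruiting": {"interview", "candidate", "resume"},
    "IT/Support": {"password", "laptop", "vpn"},
}


def classify(text, taxonomy=TAXONOMY, min_hits=1):
    """Return matching categories, best match first.

    A category matches when at least `min_hits` of its keywords
    appear in the text.
    """
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    matched = []
    for category, keywords in taxonomy.items():
        hits = len(words & keywords)
        if hits >= min_hits:
            matched.append((hits, category))
    # Most keyword hits first; category name breaks ties.
    matched.sort(key=lambda pair: (-pair[0], pair[1]))
    return [category for _, category in matched]


print(classify("Reset your VPN password from the laptop"))
print(classify("Candidate interview scheduled after vacation"))
```

Keyword rules are transparent and easy for editors to maintain. The training-set route replaces hand-written rules with examples, and clustering needs neither.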
Advanced Features (Cont.)
 Information Visualization
 Very useful for
– Navigating through large data sets
– Discovering relationships and associations between items
– Focus + context tasks
 A number of visualizations are available
– StretchViewer
– Interactive Viewer (ThemeMap, Cluster visualization)
– Integration with third-party vendors
Conclusions
 Search is hitting a plateau
– Bad search is easy to implement, good search is difficult
 Correcting deficiencies
– Quality, performance, and other features help
 Moving to the next level
– Classification and clustering
– Text mining
– Information Visualization
– Content structure awareness
 Oracle Database 9i or 10g provides the complete solution for enterprise search, turbo-charged by the precision Arembis HyperLogic engine, integrating:
– Oracle Text: complete API where you have total control
– Ultra Search: out-of-the-box solution that requires little coding
Links
 Oracle Text page
http://otn.oracle.com/products/text
 Ultra Search page
http://otn.oracle.com/products/ultrasearch
 Java library for Text visualization
http://otn.oracle.com/software/products/workspace_mgr/text_visualizer.html
 Arembis
http://www.Arembis.com
Arembis+Oracle
A Powerful Partnership
QUESTIONS
ANSWERS