CMPUT 391 Lecture Notes


LogicSQL-based Enterprise Archive and Search System
Li-Yan Yuan
How to organize the information and make it accessible and useful?
Oct 30, 2006

Projects
- How to develop an enterprise search engine based on a database management system
  - Challenge: implementation of the inverted index

Projects
- How to implement the top-K query
  - Ranking formula
  - Inverted indexes are created with respect to frequencies

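One possible reading of "inverted indexes created with respect to frequencies" is sketched below: each term maps to the documents that contain it, along with how often it occurs there, and postings are listed with the highest frequency first. This is a minimal in-memory illustration, not the LogicSQL implementation. The names tokenize, build_inverted_index and postings, and the sample documents, are assumptions made for this example.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to {doc_id: term frequency in that document}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, freq in Counter(tokenize(text)).items():
            index[term][doc_id] = freq
    return index


def postings(index, term):
    """Return (doc_id, frequency) pairs for a term, highest frequency first."""
    entries = index.get(term, {})
    return sorted(entries.items(), key=lambda pair: (-pair[1], pair[0]))


# Hypothetical sample documents, keyed by document id.
docs = {
    1: "search engine search",
    2: "database engine",
}
index = build_inverted_index(docs)
print(postings(index, "search"))
print(postings(index, "engine"))
```

Keeping the frequency next to each document id lets a ranking formula read its inputs straight from the index without rescanning the documents.
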
Internet search
- Search for relevant web pages
  - Good answers: relevant, popular
  - Public domain knowledge
- Search engines are critical to Internet use
  - Internal workings are secret
  - Tremendous political, economic, and cultural power

Enterprise search
- Search the enterprise information systems for the right information
- Enterprise information
  - Internal web pages
  - Internal documentation systems
  - File systems
  - Databases
  - Email servers
- The Internet and enterprise domains differ fundamentally
  - Contents
  - User behavior
  - Economic motivations

Top-K Query
- Objective
  - How to determine the top K objects that are most likely (approximately) related to the given query
- Applications
  - Information retrieval
  - Internet and enterprise searches
  - Multimedia similarity search
  - Scheduling large-scale on-demand data broadcast
  - ……

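The objective above can be illustrated with a naive evaluation: score every document that matches at least one query term, then keep only the K best. This is a sketch under stated assumptions, not the algorithm used in the course project. The score here is simply the sum of term frequencies, and the index layout (term -> {doc_id: frequency}) and the function name top_k are hypothetical.

```python
import heapq
from collections import defaultdict


def top_k(index, query_terms, k):
    """Return the k best (doc_id, score) pairs for the query.

    The score is an assumed ranking formula: the sum of the
    frequencies of the query terms in the document.
    """
    scores = defaultdict(int)
    for term in query_terms:
        for doc_id, freq in index.get(term, {}).items():
            scores[doc_id] += freq
    # Highest score first; ties are broken by the smaller doc_id.
    return heapq.nsmallest(k, scores.items(), key=lambda pair: (-pair[1], pair[0]))


# Hypothetical index: term -> {doc_id: frequency}.
index = {
    "search": {1: 2, 3: 1},
    "engine": {1: 1, 2: 1},
}
print(top_k(index, ["search", "engine"], 2))
```

This version touches every posting of every query term. Practical top-K algorithms try to stop early, which is where postings ordered by frequency become useful.
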
Development of Enterprise Search Systems
LogicSQL Enterprise Information Archive and Search System
- LogicSQL: an object-relational database management system
- New
  - Concurrency control algorithm
  - Staged database architecture
- Developed at the University of Alberta
- Commercialized by Shanghai Shifang Software Co.

Enterprise Archive and Search System
- To archive all the enterprise information contents
  - File systems
  - Web pages
  - Emails
  - Internal documents
  - Database records?
- To provide a web-styled search engine
- To support user-specified ranking algorithms
  - Focus on the platform of archive and search
  - Easy implementation and testing of various ranking algorithms

Enterprise Archive and Search System
- Extend the database functionalities
  - Security model
    - Users, roles + security handle
    - Security primary key
  - New database objects
    - Inverted indexes
      - CREATE INVERTED INDEX
      - DROP INVERTED INDEX
      - Automatic population, similar to that of index
      - ORDER BY clause
    - User-specified aggregate functions
      - CREATE AGGREGATE FUNCTION
  - Top-K query evaluation
  - Specified crawlers

Enterprise Archive and Search System
- User configuration
  - Set up crawlers
  - Create a list of inverted indexes
  - Create one aggregate function for object ranking
- Extend the query languages
  - Implement the top-K query algorithm
  - Web-based query pages

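The idea of one user-supplied aggregate function for object ranking can be sketched as a ranking routine that takes the aggregate as a parameter, so different ranking algorithms can be swapped in and tested. This is an illustration only and does not reproduce the CREATE AGGREGATE FUNCTION syntax, which the slides do not show. The function rank, the index layout (term -> {doc_id: frequency}), and the choice of sum and min as example aggregates are all assumptions.

```python
def rank(index, query_terms, aggregate):
    """Rank documents by aggregate(per-term frequencies), best first.

    aggregate is any callable that takes a list of frequencies
    (one per query term, 0 if the term is absent) and returns a score.
    """
    doc_ids = set()
    for term in query_terms:
        doc_ids.update(index.get(term, {}))
    scored = []
    for doc_id in doc_ids:
        freqs = [index.get(term, {}).get(doc_id, 0) for term in query_terms]
        scored.append((doc_id, aggregate(freqs)))
    # Highest score first; ties are broken by the smaller doc_id.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical index: term -> {doc_id: frequency}.
index = {
    "search": {1: 2, 3: 1},
    "engine": {1: 1, 2: 1},
}
terms = ["search", "engine"]
print(rank(index, terms, sum))  # rewards total occurrences
print(rank(index, terms, min))  # rewards documents matching every term
```

Passing sum favors documents with many occurrences overall, while min favors documents that contain every query term. Either can be replaced without touching the index or the query code.
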