Database/Record Structure


Dialog Databases
Structure & Indexing
Dr. Dania Bilal
IS 530
Fall 2009
Definition
A database is a collection of information
organized in a way that a computer
program can quickly retrieve desired
pieces of data.
Database Components
Fields
Records
Files
Database Fields
Pieces of information a user can access






Author
Title
Journal name
Abstract
Descriptors
Other
Field Attributes
Numeric
(e.g., accession number)
Textual
(e.g., author name)
Data Structure
A scheme for organizing related pieces of
information.
Basic types of data structures




Files
Records
Trees
Tables
Files
File


A collection of records
In Dialog, a file also refers to a specific
database
Every file/database has a number and/or a
name

Example: ERIC is file no. 1 in Dialog.
Records
Record

A collection of fields which constitutes a
complete set of information
Author, title, journal name, abstract, etc.

A collection of records constitutes a file.
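The relationship between fields, records, and files can be sketched in Python. This is only an illustrative model, not how Dialog stores data: the class names, the file number 999, and the author and journal values are hypothetical placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One record: a complete set of fields describing a document."""
    accession_number: int                 # numeric field
    author: str                           # textual field
    title: str
    journal_name: str
    abstract: str
    descriptors: list[str] = field(default_factory=list)


@dataclass
class File:
    """A file: a numbered and/or named collection of records."""
    number: int
    name: str
    records: list[Record] = field(default_factory=list)


# Hypothetical file used only for illustration (in Dialog, ERIC is file no. 1).
demo_file = File(number=999, name="DEMO")
demo_file.records.append(
    Record(
        accession_number=101,
        author="Placeholder Author",       # not given in the slides
        title="The origins of Don Giovanni",
        journal_name="Placeholder Journal",  # not given in the slides
        abstract="Discusses the history and sources Mozart used in his opera Don Giovanni.",
        descriptors=["Mozart", "Opera", "Historical Analysis"],
    )
)

for rec in demo_file.records:
    print(demo_file.name, rec.accession_number, rec.title)
```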
Trees
Data is organized in a hierarchical
structure




Each element is attached to one or more
elements that are directly beneath it.
Connections between elements are called branches.
Elements at the bottom of a tree with no elements
below them are called leaves.
Example: Yahoo directory.
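A minimal sketch of a tree, assuming a small made-up subject directory (the category names are hypothetical, not taken from any real directory). Each node holds its branches as a list of child nodes, and a leaf is a node with no children.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list["Node"] = field(default_factory=list)  # branches to elements beneath

    def is_leaf(self) -> bool:
        # A leaf has no elements below it.
        return not self.children


def leaves(node: Node) -> list[str]:
    """Collect the labels of all leaves under a node, left to right."""
    if node.is_leaf():
        return [node.label]
    found = []
    for child in node.children:
        found.extend(leaves(child))
    return found


# Hypothetical directory hierarchy.
directory = Node("Directory", [
    Node("Arts", [Node("Music"), Node("Theater")]),
    Node("Science", [Node("Biology")]),
])

print(leaves(directory))  # ['Music', 'Theater', 'Biology']
```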
Tables
Data is organized in rows and columns

Example: Excel spreadsheet
Relational database management systems
store data in the form of related tables

Aleph system (Hodges online catalog) is based on a
relational database management system called Oracle.
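To illustrate related tables, here is a small sketch using Python's built-in sqlite3 module. SQLite stands in for a relational system purely for illustration; it is not Oracle or Aleph, and the table and column names are assumptions made for this example.

```python
import sqlite3

# In-memory database, used only for this demonstration.
conn = sqlite3.connect(":memory:")

# Two related tables: one row per document, one row per descriptor.
conn.execute("CREATE TABLE documents (doc_no INTEGER PRIMARY KEY, title TEXT)")
conn.execute(
    "CREATE TABLE descriptors ("
    "doc_no INTEGER REFERENCES documents(doc_no), term TEXT)"
)

conn.execute("INSERT INTO documents VALUES (101, 'The origins of Don Giovanni')")
conn.executemany(
    "INSERT INTO descriptors VALUES (?, ?)",
    [(101, "Mozart"), (101, "Opera"), (101, "Historical Analysis")],
)

# The shared doc_no column relates the two tables.
rows = conn.execute(
    "SELECT d.title, t.term "
    "FROM documents d JOIN descriptors t ON t.doc_no = d.doc_no "
    "ORDER BY t.term"
).fetchall()

for title, term in rows:
    print(title, "|", term)
```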
Dialog Database
Documents or surrogates are stored in a
linear file


An example of linear organization is a cassette
tape
Access to songs on the tape is sequential, not “direct” or
“random” in nature.
Linear file is transformed into an inverted
file (in Dialog)
Dialog Database Structure
Linear file

Composed of document surrogates
(abstracts) stored in their full, original form.
Inverted file

Composed of all words included in document
surrogates excluding stop words.
Problem with Linear File
Documents or surrogates will have to be
searched in their entirety to locate specific
information needed.



Slow
Inefficient
Access to information may cause frustration
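The cost of a linear file can be sketched as a scan that reads every surrogate from start to finish for each query. The second surrogate and the function name are hypothetical, and simple substring matching is used only to keep the sketch short.

```python
# A linear file: surrogates stored one after another in their full form.
linear_file = [
    (101, "The origins of Don Giovanni. Discusses the history and sources "
          "Mozart used in his opera Don Giovanni."),
    (102, "Hypothetical second surrogate about library catalogs."),
]


def linear_search(term, surrogates):
    """Return document numbers whose text contains the term.

    Every surrogate is read in its entirety on every search, so the
    work grows with the total amount of text in the file.
    """
    term = term.lower()
    hits = []
    for doc_no, text in surrogates:
        if term in text.lower():
            hits.append(doc_no)
    return hits


print(linear_search("Mozart", linear_file))  # [101]
```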
Inverted File
Words in all document surrogates can be
searched instead of the whole text of the
documents themselves



A music CD is an analogy for an inverted
structure.
Divided into tracks
Random and direct access to each track is
easy
Faster access to information
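The direct access described above can be sketched with a dictionary that maps each word to the documents containing it. The second surrogate and the short stop word list are assumptions for illustration; Dialog's actual stop list is not reproduced here.

```python
from collections import defaultdict

# Hypothetical surrogates (titles only, to keep the sketch short).
surrogates = {
    101: "The origins of Don Giovanni",
    102: "Hypothetical surrogate about opera staging",
}

STOP_WORDS = {"the", "a", "an", "of"}  # assumed, partial stop list

# Build the index once: word -> set of document numbers.
index = defaultdict(set)
for doc_no, text in surrogates.items():
    for word in text.lower().split():
        if word not in STOP_WORDS:
            index[word].add(doc_no)


def lookup(word):
    """Go straight to the entry for a word, like jumping to a CD track."""
    return sorted(index.get(word.lower(), set()))


print(lookup("Giovanni"))  # [101]
print(lookup("opera"))     # [102]
```

A lookup touches only the entry for the requested word, rather than the text of every surrogate.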
Dialog Inverted File
A list of words in each document surrogate
is made.
Each word is numbered, including phrases
and excluding stop words (the, a, an, etc.).
Words that are numbered are
alphabetized (numbers precede letters).
Dialog Inverted File
Each alphabetized entry is followed by
the document number (based on the order of its acquisition
and addition to the database)
the field or fields in which the entry appears
Author field
Title field
Abstract field
Descriptor field
Other fields, as applicable
Linear File: Example
101. The origins of Don Giovanni.
Discusses the history and sources Mozart used in
his opera Don Giovanni.
DE: Mozart, Opera, Historical Analysis.
Inverted File
101. The origins of Don Giovanni.
Discusses the history and sources Mozart used in his opera Don Giovanni.
DE: Mozart, Opera, Historical Analysis.
Word        Doc no.   Field   Word sequence
Origins     101       Ti      2
Don         101       Ti      4
Giovanni    101       Ti      5
Discusses   101       Ab      1
History     101       Ab      3
Sources     101       Ab      5
Mozart      101       Ab      6
Used        101       Ab      7
Inverted File Cont’d.
101. The origins of Don Giovanni.
Discusses the history and sources Mozart used in his opera Don Giovanni.
DE: Mozart, Opera, Historical Analysis.
Word                  Doc no.   Field   Word sequence
Mozart                101       DE      1
Opera                 101       DE      2
Historical            101       DE      3
Analysis              101       DE      4
Historical Analysis   101       DE      3,4
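The inverted file entries for document 101 can be reproduced with a short Python sketch. It is not Dialog's actual indexing program: the function name, the record layout, and the stop word list (in particular treating "in" and "his" as stop words) are assumptions. Word sequence numbers count every word in a field, including stop words, which matches the numbers in the slide tables. The slide table for the abstract stops at "Used"; the sketch also emits the remaining abstract words.

```python
import re

STOP_WORDS = {"the", "a", "an", "of", "and", "in", "his"}  # assumed stop list

record = {
    "doc_no": 101,
    "fields": {
        "Ti": "The origins of Don Giovanni.",
        "Ab": "Discusses the history and sources Mozart used in his opera "
              "Don Giovanni.",
    },
    "descriptors": ["Mozart", "Opera", "Historical Analysis"],
}


def build_inverted_file(rec):
    """Return sorted (word, doc_no, field, word_sequence) entries."""
    entries = []
    # Free-text fields: number every word, keep only non-stop words.
    for field_code, text in rec["fields"].items():
        words = re.findall(r"[A-Za-z0-9]+", text)
        for position, word in enumerate(words, start=1):
            if word.lower() not in STOP_WORDS:
                entries.append((word.lower(), rec["doc_no"], field_code, (position,)))
    # Descriptors: index each word, and each multi-word descriptor as a phrase.
    position = 1
    for phrase in rec["descriptors"]:
        words = phrase.split()
        positions = tuple(range(position, position + len(words)))
        for word, pos in zip(words, positions):
            entries.append((word.lower(), rec["doc_no"], "DE", (pos,)))
        if len(words) > 1:
            entries.append((phrase.lower(), rec["doc_no"], "DE", positions))
        position += len(words)
    # Alphabetize the entries; digits sort before letters.
    return sorted(entries)


inverted = build_inverted_file(record)
for word, doc_no, field_code, positions in inverted:
    print(word, doc_no, field_code, ",".join(str(p) for p in positions))
```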
Indexing
Words (keywords)


Every important word in a document is
indexed
Example: Historical analysis
Indexed as 2 separate words and as a phrase
Historical (word)
Analysis (word)
Historical analysis (phrase)
Google Indexing
Example 1. Google
Phrase/Sentence Indexing.
Example 2. Google
Phrase/Keywords Indexing.
Example 3. Google Natural
Language Search and Retrieval???
Demos
Dialog - ERIC database
EBSCO - ERIC database
Discussion of differences in interface
features