databases and markup languages

LIS508 lecture 2
Thomas Krichel
2003-10-07
Today's lecture
• Recap on what we did last week.
• Encoding mark-up
• Databases
Recap
• Computers deal with on/off signals called
bits.
• Collections of these bits are binary
numbers.
• Texts are (basically) strings of characters.
To represent text, we need to represent
characters.
• To make characters understandable to a
computer we associate a number with
each character. The result is a character
set.
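As a small illustration of this recap, the Python sketch below looks up the number associated with each character of a short text and shows the bits that store it. It assumes the Unicode character set that Python uses, which agrees with ASCII for these letters; the variable names are made up for the example.

```python
# Map each character of a short text to its number in the character set.
# (Python uses Unicode, which agrees with ASCII for these letters.)
text = "Class"
codes = [ord(ch) for ch in text]

# Write each number as eight on/off signals (bits).
bits = [format(code, "08b") for code in codes]

for ch, code, b in zip(text, codes, bits):
    print(ch, code, b)

# Going back from numbers to characters recovers the text.
restored = "".join(chr(code) for code in codes)
```

The text is stored as nothing more than the list of numbers; the character set is the agreement that tells us which character each number stands for.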
Beyond characters
• There is more to text than a string of
characters.
• There is layout
– titles
– abstracts
– mathematical formula spacing
Layout
• Layout can be conveyed by additional text that
has special meaning. Examples
– LaTeX
– HTML
– PostScript
• Another way is to do non-textual layout by
adding some other digital signals. Examples
– DVI
– MS Word
– MS Powerpoint
These cannot be shown in these slides!
Example: LaTeX
\bigskip\textbf{Class structure}
Classes will be held in the computer lab in the
Palmer School between 18:15 and 20:45. An
optional practice session will last until 21:15.
\begin{tabular}{@{}llll@{}}
0&2003--09--23&introduction to the course &\\
1&2002--09--30&bits bytes and characters &\\
2&2003--10--07&databases and markup
languages&\\
Example: HTML
<p><strong>Class structure</strong><p>Classes will be
held in the computer lab in the Palmer School between
18:15 and 20:45. An optional practice session will last
until 21:15.<p>Class details:
<p><center><table width=100% border=1>
<tr><td align=left> 0 </td><td align=left>
2003&#8211;09&#8211;23 </td><td align=left><a
href="lis508w03a-00.ppt">introduction to the course</a>
</td></tr><tr><td align=left> 1 </td><td align=left>
2002&#8211;09&#8211;30 </td><td align=left><a
href="lis508w03a-01.ppt">bits bytes and characters</a>
</td>
Example: PostScript
Fc(Class)g(structur)o(e)-104 3956 y
Fd(Classes)26b(will)g(be)e(held)g(in)h(the)f(co
mputer)f(lab)i(in)f(the)h(P)o(almer)f(School)g(be
tween)f(18:15)h(and)g(20:45.)36 b(An)25
b(optional)e(practice)h(session)-104 4055
y(will)d(last)g(until)f(21:15.)-104 4155
y(Class)i(details:)-104 4307 y(0)141
b(2003\22609\22623)94b(introduction)18
b(to)i(the)h(course)-104 4407 y(1)141
b(2002\22609\22630)94 b(bits)21
b(bytes)f(and)g(characters)-104 4507 y(2)141
b(2003\22610\22607)94 b(databases)20
b(and)g(markup)e(languages)-
DVI (rendition, "class structure")
1659: fntnum27 current font is ptmb8t
1660: setchar67 h:=-820459+473168=-347291, hh:=-22
1661: setchar108 h:=-347291+182183=-165108, hh:=-10
1662: setchar97 h:=-165108+327680=162572, hh:=11
1663: setchar115 h:=162572+254928=417500, hh:=27
1664: setchar115 h:=417500+254928=672428, hh:=43
1665: right3 163840 h:=672428+163840=836268, hh:=53
1669: setchar115 h:=836268+254928=1091196, hh:=69
1670: setchar116 h:=1091196+218232=1309428, hh:=83
1671: setchar114 h:=1309428+290976=1600404, hh:=101
1672: setchar117 h:=1600404+364376=1964780, hh:=124
1673: setchar99 h:=1964780+290976=2255756, hh:=142
1674: setchar116 h:=2255756+218232=2473988, hh:=156
1675: setchar117 h:=2473988+364376=2838364, hh:=179
1676: setchar114 h:=2838364+290976=3129340, hh:=197
1677: right2 -11792 h:=3129340-11792=3117548, hh:=196
1680: setchar101 h:=3117548+290976=3408524, hh:=214
Databases
• Databases are collections of data with
some organization to them.
• The classic example is the relational
database.
• But not all databases need to be relational
databases.
Relational databases
• A relational database is a set of tables.
There may be relations between the
tables.
• Each table has a number of records. Each
record has a number of fields.
• When the database is being set up, we fix
– the size of each field
– relationships between tables
Example: Movie database
ID | title               | director          | date
M1 | Gone with the wind  | F. Ford Coppola   | 1963
M2 | Room with a view    | Coppola, F Ford   | 1985
M3 | High Noon           | Woody Allan       | 1974
M4 | Star Wars           | Steve Spielberg   | 1993
M5 | Alien               | Allen, Woody      | 1987
M6 | Blowing in the Wind | Spielberg, Steven | 1962
• Single table
• No relations between tables, of course
Problem with this database
• All the data are wrong, but this is just for
illustration.
• Names are recorded inconsistently. There is no
way to find films by Woody Allen without
having to go through all spelling variations.
• Mistakes are difficult to correct. We have
to wade through all records, a masochist’s
pleasure.
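The problem can be sketched in a few lines of Python. The records below copy the single-table slide as it stands, wrong data included; the function name films_by is hypothetical and only serves the illustration.

```python
# The single-table movie database from the slide: (ID, title, director, date).
movies = [
    ("M1", "Gone with the wind", "F. Ford Coppola", 1963),
    ("M2", "Room with a view", "Coppola, F Ford", 1985),
    ("M3", "High Noon", "Woody Allan", 1974),
    ("M4", "Star Wars", "Steve Spielberg", 1993),
    ("M5", "Alien", "Allen, Woody", 1987),
    ("M6", "Blowing in the Wind", "Spielberg, Steven", 1962),
]

def films_by(director_spelling):
    """Return the titles whose director field matches this exact spelling."""
    return [title for _, title, director, _ in movies
            if director == director_spelling]

# Each spelling variation finds a different subset of the films.
print(films_by("Woody Allan"))
print(films_by("Allen, Woody"))
print(films_by("Woody Allen"))
```

Every spelling has to be tried separately, and a spelling that nobody happened to type in finds nothing at all.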
Better movie database
ID | title               | director | year
M1 | Gone with the wind  | D1       | 1963
M2 | Room with a view    | D1       | 1985
M3 | High Noon           | D2       | 1974
M4 | Star Wars           | D3       | 1993
M5 | Alien               | D2       | 1987
M6 | Blowing in the Wind | D3       | 1962

ID | director name         | birth year
D1 | Ford Coppola, Francis | 1942
D2 | Allan, Woody          | 1957
D3 | Spielberg, Steven     | 1942
Relational database
• We have a one-to-many relationship
between directors and films
– Each film has one director
– Each director may have directed many films
• Here it becomes possible for the computer
– To know which films have been directed by
Woody Allen
– To find which films have been directed by a
director born in 1942
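A minimal Python sketch of these two lookups, using the two tables of the better movie database as they appear on the slide. The dictionary layout and the function names are assumptions made for the illustration, not part of any database product.

```python
# Directors table: one record per director, keyed by ID.
directors = {
    "D1": {"name": "Ford Coppola, Francis", "birth_year": 1942},
    "D2": {"name": "Allan, Woody", "birth_year": 1957},
    "D3": {"name": "Spielberg, Steven", "birth_year": 1942},
}

# Movies table: each record stores a director ID rather than a name.
movies = [
    {"id": "M1", "title": "Gone with the wind", "director": "D1", "year": 1963},
    {"id": "M2", "title": "Room with a view", "director": "D1", "year": 1985},
    {"id": "M3", "title": "High Noon", "director": "D2", "year": 1974},
    {"id": "M4", "title": "Star Wars", "director": "D3", "year": 1993},
    {"id": "M5", "title": "Alien", "director": "D2", "year": 1987},
    {"id": "M6", "title": "Blowing in the Wind", "director": "D3", "year": 1962},
]

def films_by_director(director_id):
    """Titles of all films that point at the given director record."""
    return [m["title"] for m in movies if m["director"] == director_id]

def films_by_director_born(year):
    """Titles of all films whose director was born in the given year."""
    return [m["title"] for m in movies
            if directors[m["director"]]["birth_year"] == year]

print(films_by_director("D2"))
print(films_by_director_born(1942))
```

Because the name is stored once, in the directors table, a misspelled name is corrected in one record and every film that refers to it follows.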
Many-to-many relationships
• Each film has one director, but many
actors star in it. The relationship between
actors and films is a many-to-many
relationship.
• Here are a few actors
ID | sex | actor name      | birth year
A1 | f   | Brigitte Bardot | 1972
A2 | m   | George Clooney  | 1927
A3 | f   | Marilyn Monroe  | 1934
Actor/Movie table
actor id | movie id
A1       | M4
A2       | M3
A3       | M2
A1       | M5
A1       | M3
A2       | M6
A3       | M4
… as many lines as required
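The actor/movie table can be read in both directions, which is what makes the relationship many-to-many. A minimal Python sketch, using the actors and the pairs listed on the slides; the function names are hypothetical.

```python
# Actors table from the slide, keyed by ID.
actors = {
    "A1": {"sex": "f", "name": "Brigitte Bardot", "birth_year": 1972},
    "A2": {"sex": "m", "name": "George Clooney", "birth_year": 1927},
    "A3": {"sex": "f", "name": "Marilyn Monroe", "birth_year": 1934},
}

# Actor/movie table: one line per (actor, movie) pair.
actor_movie = [
    ("A1", "M4"), ("A2", "M3"), ("A3", "M2"), ("A1", "M5"),
    ("A1", "M3"), ("A2", "M6"), ("A3", "M4"),
]

def movies_of(actor_id):
    """All movie IDs in which the given actor appears."""
    return [movie for actor, movie in actor_movie if actor == actor_id]

def actors_in(movie_id):
    """Names of all actors who appear in the given movie."""
    return [actors[actor]["name"]
            for actor, movie in actor_movie if movie == movie_id]

print(movies_of("A1"))
print(actors_in("M4"))
```

Neither the actors table nor the movies table needs a field for the other; the pairs table carries the whole relationship.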
SQL
• Once we have the relational database, we
can ask sophisticated questions:
– Which director has had the most female
actors working for him?
– In which years were films shot that
starred actors born between 1926 and 1935?
• Such questions can be encoded in a
language known as “structured query
language” or SQL. All relational database
vendors implement a dialect of SQL.
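As a sketch of how the two questions might look in SQL, the example below loads the slide data into SQLite through Python's standard sqlite3 module. The table and column names (directors, movies, actors, actor_movie) are assumptions made here, and the SQL is written in the SQLite dialect; other vendors may need small changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
CREATE TABLE directors (id TEXT PRIMARY KEY, name TEXT, birth_year INTEGER);
CREATE TABLE movies (id TEXT PRIMARY KEY, title TEXT,
                     director TEXT REFERENCES directors(id), year INTEGER);
CREATE TABLE actors (id TEXT PRIMARY KEY, sex TEXT, name TEXT,
                     birth_year INTEGER);
CREATE TABLE actor_movie (actor_id TEXT REFERENCES actors(id),
                          movie_id TEXT REFERENCES movies(id));
""")
conn.executemany("INSERT INTO directors VALUES (?, ?, ?)", [
    ("D1", "Ford Coppola, Francis", 1942),
    ("D2", "Allan, Woody", 1957),
    ("D3", "Spielberg, Steven", 1942),
])
conn.executemany("INSERT INTO movies VALUES (?, ?, ?, ?)", [
    ("M1", "Gone with the wind", "D1", 1963),
    ("M2", "Room with a view", "D1", 1985),
    ("M3", "High Noon", "D2", 1974),
    ("M4", "Star Wars", "D3", 1993),
    ("M5", "Alien", "D2", 1987),
    ("M6", "Blowing in the Wind", "D3", 1962),
])
conn.executemany("INSERT INTO actors VALUES (?, ?, ?, ?)", [
    ("A1", "f", "Brigitte Bardot", 1972),
    ("A2", "m", "George Clooney", 1927),
    ("A3", "f", "Marilyn Monroe", 1934),
])
conn.executemany("INSERT INTO actor_movie VALUES (?, ?)", [
    ("A1", "M4"), ("A2", "M3"), ("A3", "M2"), ("A1", "M5"),
    ("A1", "M3"), ("A2", "M6"), ("A3", "M4"),
])

# Question 1: which director has had the most female actors?
top_director = conn.execute("""
    SELECT d.name, COUNT(DISTINCT a.id) AS n_female
    FROM directors d
    JOIN movies m ON m.director = d.id
    JOIN actor_movie am ON am.movie_id = m.id
    JOIN actors a ON a.id = am.actor_id
    WHERE a.sex = 'f'
    GROUP BY d.id
    ORDER BY n_female DESC
    LIMIT 1
""").fetchone()

# Question 2: in which years were films shot that starred
# actors born between 1926 and 1935?
years = [row[0] for row in conn.execute("""
    SELECT DISTINCT m.year
    FROM movies m
    JOIN actor_movie am ON am.movie_id = m.id
    JOIN actors a ON a.id = am.actor_id
    WHERE a.birth_year BETWEEN 1926 AND 1935
    ORDER BY m.year
""")]

print(top_director)
print(years)
```

Each question becomes a single query that joins the tables along their relationships; no record has to be inspected by hand.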
Databases in libraries
• Relational databases dominate the world of
structured data
• But they are not so popular in libraries
– Slow on very large databases (such as
catalogs)
– Library data has nasty ad-hoc relationships, e.g.
• Translation of the first edition of a book
• CD supplement that comes with the print version
These are difficult to deal with in a system where all
relations and fields have to be set up at the start
and cannot be changed easily later.
http://openlib.org/home/krichel
Thank you for your attention!