Transcript 118_21.1

Ashish Sharma
CS-257
ID:118
 Databases are created independently, even if they later
need to work together.
 The use of databases evolves, so
we cannot design a database to
support every possible future use.
 We will illustrate information integration with the
example of a university database.
 Earlier, the university had separate databases for
different functions, such as:
 Registrar Database for keeping data about courses and
student grades for generating transcripts.
 Bursar Database for keeping data about the tuition
payments by students.
 Human Resource Department Database for recording
employees including those students with teaching
assistantship jobs.
 Applications were built on these databases, such as
generation of payroll checks and calculation of taxes and
social security payments to the government.
 But these databases were of limited use on their own,
because a change in one database was not reflected in
the others and had to be propagated manually. For
example, we want to make sure that the Registrar does
not record grades for a student who has not paid the
fees at the Bursar's office.
 Building a whole new database for the system is
a very expensive and time-consuming process.
 In addition to paying for very expensive
software, the University would have to run both the
old and the new databases together for a long
time to check that the new system works properly.
 A solution is to build a layer of abstraction,
called middleware, on top of all the legacy databases,
without disturbing the original databases.
 Now we can query this middleware layer to retrieve or
update data.
 Often this layer is defined by a collection of classes and
queried in an object-oriented language.
 New applications can be written to access this
layer for data, while the legacy applications
continue to run using the legacy database.
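The middleware idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name UniversityMiddleware, the table names Grades and Payments, and their columns are hypothetical, and two in-memory SQLite databases stand in for the legacy Registrar and Bursar systems.

```python
import sqlite3


class UniversityMiddleware:
    """Hypothetical middleware class; the legacy databases are left untouched."""

    def __init__(self, registrar_conn, bursar_conn):
        self.registrar = registrar_conn
        self.bursar = bursar_conn

    def has_paid(self, student_id):
        # Ask the Bursar database whether the student's fees are paid.
        row = self.bursar.execute(
            "SELECT paid FROM Payments WHERE student_id = ?", (student_id,)
        ).fetchone()
        return bool(row and row[0])

    def record_grade(self, student_id, course, grade):
        # Record a grade in the Registrar database only if the fees are paid.
        if not self.has_paid(student_id):
            return False
        with self.registrar:  # commits on success
            self.registrar.execute(
                "INSERT INTO Grades VALUES (?, ?, ?)", (student_id, course, grade)
            )
        return True


def make_demo_middleware():
    # Assumed stand-ins for the legacy databases, with made-up schemas and data.
    registrar = sqlite3.connect(":memory:")
    registrar.execute(
        "CREATE TABLE Grades (student_id INTEGER, course TEXT, grade TEXT)"
    )
    bursar = sqlite3.connect(":memory:")
    bursar.execute("CREATE TABLE Payments (student_id INTEGER, paid INTEGER)")
    bursar.executemany("INSERT INTO Payments VALUES (?, ?)", [(1, 1), (2, 0)])
    return UniversityMiddleware(registrar, bursar)
```

A new application would call record_grade on the middleware object, while the legacy applications keep talking to their own databases directly.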
 When we try to connect information sources that were
developed independently, we invariably find that the
sources differ in many ways. Such sources are called
heterogeneous, and the problem of integrating them is
referred to as the heterogeneity problem. There are
different levels of heterogeneity:
1. Communication Heterogeneity.
2. Query-Language Heterogeneity.
3. Schema Heterogeneity.
4. Data type differences.
5. Value Heterogeneity.
6. Semantic Heterogeneity.
 Today, it is common to allow access to your
information over HTTP. However, some
dealers may not make their databases available on the Web,
but instead accept remote access via anonymous
FTP.
 Suppose there are 1000 dealers of the Aardvark
Automobile Co., of which 900 use HTTP while the
remaining 100 use FTP, so there might be problems
communicating with all of the dealers' databases.
 The manner in which we query or modify a dealer’s
database may vary.
 For example, dealers may use different kinds of
database: some might use a relational database with
SQL, while others might use a non-relational database,
Excel spreadsheets, or some other system.
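One common way to hide query-language differences is to give every source a small wrapper with the same interface. The sketch below assumes one dealer with an SQL database and one whose spreadsheet has been exported as CSV text; the class names, the Cars table, the column names, and the sample rows are all hypothetical.

```python
import csv
import io
import sqlite3


class SqlSource:
    """Wrapper for a dealer that answers SQL queries."""

    def __init__(self, conn):
        self.conn = conn

    def cars_by_color(self, color):
        rows = self.conn.execute(
            "SELECT serial FROM Cars WHERE color = ? ORDER BY serial", (color,)
        )
        return [r[0] for r in rows]


class CsvSource:
    """Wrapper for a dealer whose spreadsheet was exported as CSV (an assumption)."""

    def __init__(self, text):
        self.rows = list(csv.DictReader(io.StringIO(text)))

    def cars_by_color(self, color):
        return sorted(r["serial"] for r in self.rows if r["color"] == color)


def demo_sources():
    # Made-up sample data for two dealers.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Cars (serial TEXT, color TEXT)")
    conn.executemany(
        "INSERT INTO Cars VALUES (?, ?)", [("123", "red"), ("124", "black")]
    )
    return [SqlSource(conn), CsvSource("serial,color\n789,red\n790,blue\n")]


def find_cars(sources, color):
    # The caller uses one interface and never sees the underlying query language.
    found = []
    for source in sources:
        found.extend(source.cars_by_color(color))
    return found
```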
 Even assuming that all the dealers use a relational DBMS
supporting SQL as the query language, there might
still be heterogeneity. At the highest level, the
schemas can differ.
 For e.g. one dealer might store cars in a single relation
while the other dealer might use a schema in which
options are separated out into a second relation.
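The two schema styles can be compared side by side with a small sketch. The relation names (Cars, Autos, Options), their columns, and the sample rows are assumptions made for illustration; in-memory SQLite databases stand in for the two dealers.

```python
import sqlite3


def build_dealers():
    # Dealer 1 (assumed schema): one relation, with the option as a column.
    d1 = sqlite3.connect(":memory:")
    d1.execute(
        "CREATE TABLE Cars (serialNo TEXT, model TEXT, color TEXT, autoTrans INTEGER)"
    )
    d1.execute("INSERT INTO Cars VALUES ('123', 'sedan', 'red', 1)")
    d1.execute("INSERT INTO Cars VALUES ('124', 'sedan', 'black', 0)")

    # Dealer 2 (assumed schema): options separated out into a second relation.
    d2 = sqlite3.connect(":memory:")
    d2.execute("CREATE TABLE Autos (serial TEXT, model TEXT, color TEXT)")
    d2.execute("CREATE TABLE Options (serial TEXT, feature TEXT)")
    d2.execute("INSERT INTO Autos VALUES ('456', 'sedan', 'blue')")
    d2.execute("INSERT INTO Options VALUES ('456', 'autoTrans')")
    return d1, d2


def cars_with_auto_trans(d1, d2):
    # The same question needs a different query for each schema.
    rows = d1.execute(
        "SELECT serialNo, model, color FROM Cars WHERE autoTrans = 1"
    ).fetchall()
    rows += d2.execute(
        "SELECT a.serial, a.model, a.color FROM Autos a "
        "JOIN Options o ON a.serial = o.serial WHERE o.feature = 'autoTrans'"
    ).fetchall()
    return sorted(rows)
```

The integration layer has to know both shapes: a simple selection for the first dealer and a join for the second.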
 Serial numbers might be represented by character
strings of varying length at one source and fixed length
at another. The fixed lengths could differ, and some
sources might use integers rather than character
strings.
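A minimal sketch of reconciling such type differences is to convert every serial number to one agreed representation. The function name, the target width of 10, and the rule that leading zeros are not significant are all assumptions for illustration.

```python
def normalize_serial(value, width=10):
    """Convert an integer or string serial number to a fixed-width string.

    Assumes leading zeros and surrounding spaces carry no meaning.
    """
    text = str(value).strip()
    if len(text) > width:
        raise ValueError(f"serial {text!r} is longer than {width} characters")
    return text.zfill(width)  # pad on the left with zeros
```

With this rule, the integer 123 and the padded string "00123 " both map to the same global serial number.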
 The same concept might be represented by different
constants at different sources. The color Black might
be represented by an integer code at one source, the
string BLACK at another, and the code BL at a third.
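Value differences of this kind are often handled with a per-source translation table. In the sketch below the source names and the integer code 7 are made up; only the constants BLACK and BL come from the example above.

```python
# Per-source mapping from local constants to one global constant.
COLOR_MAPS = {
    "dealer1": {7: "black"},        # integer code (7 is a made-up value)
    "dealer2": {"BLACK": "black"},  # full string
    "dealer3": {"BL": "black"},     # short code
}


def to_global_color(source, raw):
    """Translate a source-specific color constant to the global one."""
    try:
        return COLOR_MAPS[source][raw]
    except KeyError:
        raise ValueError(f"no mapping for {raw!r} at source {source!r}") from None
```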
 Terms might be given different interpretations at
different sources. One dealer might include trucks in
its Cars relation, while another puts only automobile
data in its Cars relation. One dealer might distinguish
station wagons from minivans, while another
doesn't.
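Semantic differences cannot be fixed by renaming alone; the integrator has to decide on one meaning and map each source onto it. The sketch below assumes that the global Cars relation means automobiles only, that each row carries a hypothetical kind attribute, and that station wagons and minivans are merged into one coarser category because some sources cannot tell them apart.

```python
# Assumed coarse category used when sources disagree on wagon vs. minivan.
COARSE_KIND = {"station wagon": "wagon/minivan", "minivan": "wagon/minivan"}


def to_global_cars(rows):
    """Map one dealer's Cars rows onto the assumed global meaning of Cars."""
    result = []
    for row in rows:
        if row["kind"] == "truck":
            continue  # global Cars excludes trucks (an assumption)
        kind = COARSE_KIND.get(row["kind"], row["kind"])
        result.append({"serial": row["serial"], "kind": kind})
    return result
```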