Week1-DatabaseIntroduction - Cardiff Biodiversity Informatics

MET282
Information Systems in Bioinformatics
Databases – the Story So Far
Dr. Richard White
The Knowledge Pyramid
What a database is
• Data is stored separately from any
application programs which might use it
• Multiple uses of the data are envisaged
• Designed for retrieval in various anticipated
and unanticipated ways
What databases are not
• unstructured piles of data (including heaps of
web pages in web sites, wikis, blogs etc.)
• directories full of text files
• data collected and stored to be read by a single
program or for one kind of analysis
• spreadsheets
Spreadsheets versus databases (1)
• A spreadsheet is typically viewed as an entire
table of cells which may contain
– numbers (data)
– text (labels)
– formulae (calculations producing results)
• A database may be structured in various ways,
usually so that a small subset of the data is
presented as the result of a search
Spreadsheets versus databases (2)
Compared with a spreadsheet, a database
• requires planning
• keeps its data "hidden" until retrieved
• may require a program to help enter data
• may require a program to help retrieve data
• allows integrity checking to be performed (Week 4)
• can be multi-user
• can be available on the Web
Uses in bioinformatics
Contents of databases in bioinformatics:
• species names
• nucleotide sequence databases
• protein sequence databases
• protein structure databases
• phenotypic effects
• bibliographic data
• special-purpose databases
Uses in biodiversity informatics
Uses of databases in biodiversity:
• information about species names
• data about species
• data about biological specimens
• data about areas, places, sampling sites, etc.
(sometimes stored in Geographical Information
Systems (GIS))
Database architecture
• There are several very different ways to
organise data in databases, sometimes called
database architectures
• In the first part of this module we shall focus on
relational databases, widely used for scientific
data
• Later, we shall investigate other types of
database architecture
Database system components
A relational database management system
(DBMS) has the following essential components:
• Data tables (the data itself)
• “Storage engine” (stores data to and retrieves
data from the tables)
• User interface software (for programs and
humans to enter, view and edit data)
Some commercial general-purpose DBMSs, such as
Microsoft Access, make the engine and the interface
appear as one
Database system software
A DBMS usually also includes, in order of increasing user-friendliness:
1. Database “drivers”
2. APIs (application program interface modules, so that the
driver(s) can be called from, say, Perl, Python or Java)
3. Other import & export modules, etc. (to make it easier for
programs to store, retrieve and alter data)
4. Application programs (using the above to make it easier
for people to store, retrieve and alter data, and do useful
things with it, sometimes called “business logic”)
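To illustrate the layering listed above, here is a minimal sketch in Python. It assumes SQLite as the DBMS, only because its API module, sqlite3, is part of the Python standard library. The species table and the species_names function are hypothetical, invented for this example: sqlite3 plays the part of the API and driver, and the function plays the part of a very small application program.

```python
import sqlite3  # Python's built-in API module for the SQLite engine


def species_names(conn, genus):
    """Tiny piece of application logic: list the species in one genus."""
    cur = conn.execute(
        "SELECT name FROM species WHERE name LIKE ? ORDER BY name",
        (genus + " %",),
    )
    return [row[0] for row in cur]


# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table and sample rows, for illustration only.
conn.execute("CREATE TABLE species (name TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO species VALUES (?)",
    [("Quercus robur",), ("Quercus petraea",), ("Fagus sylvatica",)],
)

oaks = species_names(conn, "Quercus")
print(oaks)
```

The caller of species_names never writes SQL, which is the sense in which each layer is more user-friendly than the one beneath it.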
Database application programs
Application programs allow users to store, retrieve and
alter data, and do useful things with it, sometimes called
“business logic”, including
• data analysis
• report writing
• utilities for database managers for
– backup
– integrity checking
– etc.
Example 1
• Imagine a database of your digital photo file
collection
– Table of photo file names (with title, location,
date, exposure details, tags)
– Table of locations (holidays, visits, etc.) with
dates, coordinates, etc.
– Index of tags
– Might include physical slides and prints
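One possible way to lay out the photo collection as tables is sketched below, again assuming SQLite through Python's sqlite3 module. All table names, column names and sample rows are made up for illustration. The "index of tags" is modelled here as a separate photo_tag table with an index on the tag column, which is only one of several reasonable designs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: one table per kind of thing on the slide.
conn.executescript("""
CREATE TABLE location (
    id         INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    visit_date TEXT,
    latitude   REAL,
    longitude  REAL
);
CREATE TABLE photo (
    file_name   TEXT PRIMARY KEY,
    title       TEXT,
    location_id INTEGER REFERENCES location(id),
    taken_on    TEXT,
    exposure    TEXT
);
CREATE TABLE photo_tag (
    file_name TEXT REFERENCES photo(file_name),
    tag       TEXT NOT NULL,
    PRIMARY KEY (file_name, tag)
);
CREATE INDEX idx_photo_tag_tag ON photo_tag(tag);
""")

# Invented sample data.
conn.execute(
    "INSERT INTO location VALUES (1, 'Summer holiday', '2010-07-03', 51.9, -3.4)"
)
conn.executemany("INSERT INTO photo VALUES (?, ?, ?, ?, ?)", [
    ("IMG_0001.jpg", "Summit view", 1, "2010-07-03", "1/250 s, f/8"),
    ("IMG_0002.jpg", "Picnic", 1, "2010-07-03", "1/125 s, f/5.6"),
])
conn.executemany("INSERT INTO photo_tag VALUES (?, ?)", [
    ("IMG_0001.jpg", "landscape"),
    ("IMG_0001.jpg", "mountains"),
    ("IMG_0002.jpg", "family"),
])

# Retrieve the titles of all photos carrying a given tag.
landscape_titles = [row[0] for row in conn.execute(
    """
    SELECT photo.title
    FROM photo
    JOIN photo_tag ON photo.file_name = photo_tag.file_name
    WHERE photo_tag.tag = ?
    """,
    ("landscape",),
)]
print(landscape_titles)
```

Keeping tags in their own table means a photo can have any number of tags without changing the photo table.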
Example 2
• Imagine a database of your CD or music file
(e.g. MP3) collection
– Table of CDs or files (with track titles,
performers, record companies)
– Table of tracks (linked to CD or file)
– Table of performers
– Table of record companies
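The four tables of the music example might be declared as in the following sketch, which assumes SQLite and Python's sqlite3 module. The table names, column names and sample rows are hypothetical. In this version the track titles live only in the track table, and each track row is linked to its CD through the cd_id column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks REFERENCES links only when this is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema for the four tables on the slide.
conn.executescript("""
CREATE TABLE record_company (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE performer (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE cd (
    id                INTEGER PRIMARY KEY,
    title             TEXT NOT NULL,
    performer_id      INTEGER REFERENCES performer(id),
    record_company_id INTEGER REFERENCES record_company(id)
);
CREATE TABLE track (
    cd_id        INTEGER NOT NULL REFERENCES cd(id),
    track_number INTEGER NOT NULL,
    title        TEXT NOT NULL,
    PRIMARY KEY (cd_id, track_number)
);
""")

# Invented sample rows; only the performer's name comes from the slides.
conn.execute("INSERT INTO record_company VALUES (1, 'Example Records')")
conn.execute("INSERT INTO performer VALUES (1, 'Nigel Kennedy')")
conn.execute("INSERT INTO cd VALUES (1, 'Example CD', 1, 1)")
conn.executemany(
    "INSERT INTO track VALUES (?, ?, ?)",
    [(1, 1, "First track"), (1, 2, "Second track")],
)

# Follow the link from a CD to its tracks.
track_titles = [row[0] for row in conn.execute(
    "SELECT title FROM track WHERE cd_id = ? ORDER BY track_number", (1,)
)]
print(track_titles)
```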
Data retrieval from a database
• A relational database consists of one or more
tables
• Data retrieved from a relational database can be
thought of as consisting of another (usually
smaller) table
• So how is this smaller table specified?
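The idea that a query result is itself a (usually smaller) table can be seen in a short sketch, assuming SQLite and Python's sqlite3 module. The specimen table and its rows are invented for illustration. The result has column names and rows of its own, just like the stored table, but fewer of each.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical stored table: four columns, four rows.
conn.execute(
    "CREATE TABLE specimen ("
    "id INTEGER PRIMARY KEY, species TEXT, site TEXT, year INTEGER)"
)
conn.executemany(
    "INSERT INTO specimen (species, site, year) VALUES (?, ?, ?)",
    [
        ("Bellis perennis", "Site A", 2008),
        ("Bellis perennis", "Site B", 2009),
        ("Quercus robur", "Site A", 2009),
        ("Fagus sylvatica", "Site C", 2010),
    ],
)

# The result of a query is another table: it has columns and rows.
cur = conn.execute(
    "SELECT species, year FROM specimen WHERE site = ? ORDER BY year",
    ("Site A",),
)
columns = [d[0] for d in cur.description]  # column names of the result
rows = cur.fetchall()                      # rows of the result

print(columns)
for row in rows:
    print(row)
```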
Specifying the result table
• By “selecting” rows, by some property such as
performer = “Nigel Kennedy”
• By “projecting” (choosing a subset of) the
columns required, as in title, performer, label
• By “joining” two tables together, by means of a
linking column such as performer
• SQL (Structured Query Language), which you met
briefly in the Computing module, is a commonly
used language in which to make these requests
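The three operations above can each be written as a short SQL request. The sketch below runs them through Python's sqlite3 module, assuming SQLite as the DBMS. The cd and performer tables, the instrument column and all sample rows are hypothetical; only the column names title, performer and label and the value "Nigel Kennedy" come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical tables with invented rows.
conn.executescript("""
CREATE TABLE cd (title TEXT, performer TEXT, label TEXT, year INTEGER);
CREATE TABLE performer (performer TEXT PRIMARY KEY, instrument TEXT);
""")
conn.executemany("INSERT INTO cd VALUES (?, ?, ?, ?)", [
    ("Example CD 1", "Nigel Kennedy", "Label A", 1989),
    ("Example CD 2", "Nigel Kennedy", "Label B", 1992),
    ("Example CD 3", "Another Performer", "Label A", 1995),
])
conn.executemany("INSERT INTO performer VALUES (?, ?)", [
    ("Nigel Kennedy", "violin"),
    ("Another Performer", "piano"),
])

# Selecting: keep only the rows with a given property.
selected = conn.execute(
    "SELECT * FROM cd WHERE performer = ? ORDER BY title",
    ("Nigel Kennedy",),
).fetchall()

# Projecting: keep only the columns required.
projected = conn.execute(
    "SELECT title, performer, label FROM cd ORDER BY title"
).fetchall()

# Joining: combine two tables through the linking column performer.
joined = conn.execute(
    """
    SELECT cd.title, performer.instrument
    FROM cd
    JOIN performer ON cd.performer = performer.performer
    ORDER BY cd.title
    """
).fetchall()

print(selected)
print(projected)
print(joined)
```

In practice the three are usually combined in a single SELECT statement, with the WHERE clause selecting rows, the column list projecting, and JOIN linking the tables.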
End
This presentation is available on Learning Central and on
my web pages at
http://users.cs.cf.ac.uk/R.J.White/InfoSystemsInBioinfo/
http://biodiversity.cs.cf.ac.uk/teaching/InfoSystemsInBioinfo/
as file Week1-DatabaseIntroduction.ppt
No trees were harmed in the production of this presentation. However, a large
number of electrons were terribly inconvenienced.