Part II: Creating a Relational Database to Model Biological Data

Download Report

Transcript Part II: Creating a Relational Database to Model Biological Data

A Short Introduction to Analyzing Biological
Data Using Relational Databases
Part II: Creating a Relational Database to Model Biological Data
Alex Ropelewski
[email protected]
Pittsburgh Supercomputing Center
National Resource for Biomedical Supercomputing
Bienvenido Vélez
[email protected]
University of Puerto Rico at Mayaguez
Department of Electrical and Computer Engineering
1
Learning Objectives
• The SQL relational database language
• Creating relational DB tables using SQL
• Inserting tuples into DB tables using SQL
3
SQL
• The language of relational databases
– Data definition/schema creation
– Implements relational algebra operations
– Data manipulation
•
•
•
•
Insertion
Manipulation
Updates
Removals
– A standard (ISO) since 1987
4
An Improved Relational Database Design
Sequences
Accession
Description
Species
P14555
Group IIA Phospholipase A2
Human
P81479
Phospholipase A2 isozyme IV Indian Green Tree Viper
P00623
Phospholipase A2
Eastern Diamondback Rattlesnake
Matches
Runs
Accession RunNum
eValue
P14555
1
4.18 E-32
RunNum
Date
P81479
2
2.68 E-52
1
7/21/07 Pam70
P14555
2
3.47 E-33
2
7/20/07 Blosom80
P81479
1
1.20 E-54
P00623
2
1.21 E-08
Matrix
5
SQL: Create Statement
• Describes data to be placed in a table
CREATE TABLE Sequences(
Accession varchar(32),
Description varchar(256),
Species varchar(256) )
Sequences
Accession
Description
Species
6
SQL: Create Statement
• Describes data to be placed in a table
CREATE TABLE Matches(
Accession varchar(32),
RunNum int,
eValue float )
Matches
Accession RunNum
eValue
7
SQL: CREATE TABLE Statement
• Describes data to be placed in a table
CREATE TABLE Runs(
RunNum int,
Matrix varchar(32),
DateRun date )
Runs
RunNum
Matrix
DateRun
8
Frequently Used Relational Data Types
TYPE NAME
DESCRIPTION
BOOLEAN
True or False Value
INT
Integer Number
FLOAT
Floating Point Number
DATE
Date (Year, Month and Day)
TIMESTAMP
Data and Time of an Event
CHAR(n)
Fixed Size Character String
VARCHAR(n)
Variable Size Character
BLOB
Large Segment of Text or Data
ENUM(v1,…,vn)
One of the values v1 … vn
BEWARE: Available Datatypes May Vary Significantly
Across DBMS Implementations
9
Frequently Used Data Type Modifiers
TYPE NAME
DESCRIPTION
DEFAULT v
Default value is v
NOT NULL
Field cannot be empty
AUTO_INCREMENT
On every insert column
automatically assigned n+1 where
n is the value assigned to the last
row inserted
PRIMARY KEY
This column is the primary key for
the table. Value must be unique
among all rows.
BEWARE: Behavior of Data Type Modifiers May Vary Significantly
Across DBMS Implementations
10
SQL: CREATE TABLE Statement
• Using Data Type Modifiers
CREATE TABLE Runs(
RunNum int AUTO_INCREMENT,
Matrix varchar(32) DEFAULT
“BLOSUM62”,
DateRun date NOT NULL)
Runs
RunNum
Matrix
DateRun
11
SQL: INSERT INTO Statement
• Places data into the table for the first time
INSERT INTO Sequences
(Accession, Description, Species)
VALUES
(’P14555’,’Group IIA Phospholipase A2’, ’Human’)
Sequences
Accession
Description
Species
P14555
Group IIA Phospholipase A2
Human
12
Key Concepts
• SQL (usually pronounced “sequel”) is the most common
language for manipulating relational data
• SQL’s CREATE statement can be used to specify and
create relational tables
• CREATE can be used to add attributes of several data
types
• SQL’s INSERT statement can be used to insert rows into
a table.
• Each INSERT statement inserts a single row into a
relational table
• Data can also be imported from flat files and
spreadsheets using importing tools available in most
13
relational database servers