Lecture 3: Molecular database development - BIDD

Download Report

Transcript Lecture 3: Molecular database development - BIDD

LSM3241: Bioinformatics and
Biocomputing
Lecture 3: Molecular database development
Prof. Chen Yu Zong
Tel: 6516-6877
Email: [email protected]
http://bidd.nus.edu.sg
Room 07-24, level 7, SOC1,
National University of Singapore
Development of a protein inhibitor database
• Step 1: Start from a protein (alternative
names from Swissprot)
• Step 2: Search for inhibitors using protein
name and alternative names (from Medline,
or Google)
• Step 3: Find additional information about
each of the inhibitors (from ChemIDplus
database)
• Step 4: Database construction
2
Assignment of a protein
• Each student is given an Id number, use this
number to get your protein from the protein
list posted on the module home page.
• In case that you wish to work on a different
protein instead of the one assigned to you,
get another one from a list of back-up
proteins posted on the module home page.
• Earn extra credit by working on extra
number of proteins
3
Inhibitor search
• Search Medline by using keywords (protein name + inhibitor).
• In case no inhibitor or too few inhibitors can be found from medline,
search google or yahoo as an alternative source.
• Make sure the protein name and its alternative names (get from
Swissprot) are used in the search.
• Record inhibitor name and reference source (journal issue, page
and publication year; internet source etc.).
• Record the number of references checked (including those found
from the keyword-search but without inhibitors)
• Try to find as many inhibitors as you can. The more known
inhibitors you get, the more credit you can earn.
4
Finding chemical information
• Search ChemIDPlus database (http://chem.sis.nlm.nih.gov/chemidplus/)
for additional information of each inhibitor
• Before using ChemIDPlus, download Chime from MDL
Click here to search
Input inhibitor name
Select “Equals” for
inhibitor name
Select “Contains”
for other keywords
Click here to select Chime option in
ChemIDPlus
(Chime is needed to view structure)
More Info at:
http://sis.nlm.nih.gov/Chem/ChemMain.html
5
ChemIDPlus database
Additional information of each inhibitor may be found from this database
How to use this database?
Click here to
get full record
Click here to
get structure
6
ChemIDPlus database
Additional information of each inhibitor may be found from this database
How to use this database?
7
ChemIDPlus database
Additional information of each inhibitor may be found from this database
How to use this database?
8
ChemIDPlus database
Additional information of each inhibitor may be found from this database
How to use this database?
9
ChemIDPlus database
Additional information of each inhibitor may be found from this database
How to use this database?
10
ChemIDPlus database
Additional information of each inhibitor may be found from this database
How to use this database?
11
ChemIDPlus database
Additional information of each inhibitor may be found from this database
How to use this database?
12
ChemIDPlus database
Additional information of each inhibitor may be found from this database
How to use this database?
Right-click in this field to
get 3D structure file
13
ChemIDPlus database
Additional information of each inhibitor may be found from this database
How to use this database?
Steps for saving the file:
• Open a word or text file
• Select all text in the source
file
• Copy (using Ctrl+C) and
paste (using Ctrl+V) to the
word or text file
• Save the word or text file as
chem_name.doc or
chem_name.txt
14
Generating 3D structure from 2D structure
Use of ViewerPro software
Save 2D structure
as a MDL file
15
ViewerPro software
Load 2D structure MDL file (viracept.mol)
Save 2D structure
as a MDL file
16
ViewerPro software
Save to 3D structure: select File => Save as (pdb format)
Save 2D structure
as a MDL file
17
ViewerPro software
Re-load pdb file and clean structure:
select Modify => Clean Structure (click here 10 times)
Save 2D structure
as a MDL file
18
ViewerPro software
Save as pdb file: select File => Save as (pdb file)
Save 2D structure
as a MDL file
19
Database construction
Information needed to construct database
Inhibitor Name Aspirin
Protein Target Cox I (SwissProt accession number) [1]
CAS Number 50-78-2 [2]
Formula C9-H8-O4 [2]
Systematic Name O-Acetylsalicylic acid, Salicylic acid, acetate
Synonyms 2-(Acetyloxy)benzoic acid, 2-Acetoxybenzoic acid [2], A.S.A. empirin [2]
Classification Analgesic, Anti-inflammatory agents, non-steroidal
Notes:
The prototypical analgesic used in the treatment of mild to moderate pain. It has
anti-inflammatory and antipyretic properties and acts as an inhibitor of
cyclooxygenase which results in the inhibition of the biosynthesis of prostaglandins.
Aspirin also inhibits platelet aggregation and is used in the prevention of arterial
and venous thrombosis. (From Martindale, The Extra Pharmacopoeia, 30th ed,p5)
References:
1:OMIM 176805 (http://www.ncbi.nlm.nih.gov/htbin-post/Omim/dispmim?176805)
2:ChemIDplus Database (http://chem.sis.nlm.nih.gov/chemidplus/)
3D structure aspirin.pdb
20
Database construction
Database development
21
Database construction
Database development
22
Database construction
Database development
23
Database Construction
Database Definition:
• A Database is a shared collection of Inter-Related data
designed to meet the needs of multiple types of users
and applications.
This implies that multiple user VIEWS can be defined
• Data stored is independent of the programs which use it
• Data is structured to provide a basis for future
applications
DATABASE = Stored Collection of Related Data
May be physically distributed
24
Database construction
Database development
25
Database Construction
Data/Information
A General Definition:
DATA - raw (unprocessed or partly processed) facts which
represent the state of entities (things) which have
occurred
INFORMATION - data which has been processed into a form
USEFUL TO THE USER
What is Information to one user may be Data to another user.
26
Database construction
Database development
27
Database Models - Hierarchical
owner/parent
owner
child /parent
member
child
child/parent
28
Database Models - Network
‘set’ of data
owner
member
owner
member
‘set’ of data
Note: Only linked sets can be accessed
29
Database Models - Relational
table
A
table
B
table
C
table
D
table
E
Any table(s) can be joined to any other table(s), provided
there is a means of effecting the join
Primary key / Foreign key concept. Data redundancy
No fixed linkages
30
Example: 2 Relations in a Database
EMPNUM
3
7
11
15
18
NAME
JONES
SMITH
ADAMS
NGUYEN
PHAN
Date of Birth
16-05-1956
23-09-1965
11-08-1972
23-10-1964
16-11-1976
DEPTNUM
605
432
201
314
201
Relation (Table) Name : EMP
Relation Schema: EMP(empnum,name,date of birth,deptnum)
DEPTNUM
201
314
432
605
DEPTNAME
Production
Finance
Information Systems
Administration
Relation (Table) Name : DEPT
Relation Schema: DEPT(deptnum, deptname)
31
Definition of a Relational Database
• A relational database is a collection of
relations or two-dimensional tables.
Database
Table Name: EMP
EMPNO
ENAME
JOB
7839
7698
7782
7566
KING
BLAKE
CLARK
JONES
EXECUTIVE
MANAGER
MANAGER
MANAGER
DEPTNO
10
30
10
20
Table Name: DEPT
DEPTNO
10
20
30
40
DNAME
ACCOUNTING
RESEARCH
SALES
OPERATIONS
LOC
CLAYTON
CAULFIELD
PENINSULA
GIPPSLAND
Data Models
Model of
system
in client’s
mind
Entity model of
client’s model
Table model
of entity model
Server
Tables on disk
Relational Database Management System
Server
User tables
Data
dictionary
String Matching
Based on
1. Analysis of query pattern
2. Auxiliary table
3. Character mapping
1 2 3 4 5 6 7 8 9 10 11
LECTUREROOM
L occurs in position 1
E occurs in positions 2 and 7
C occurs in position 3
T occurs in position 4
U occurs in position 5
R occurs in positions 6 and 8
O occurs in positions 9 and 10 M occurs in position 11
Matching starts <Right> character and progressively shifts left on match
36
Database construction
Database development: data format
37
Database construction
Database development: data languages
38
Database construction
Database development: An example of data language
39
Communicating with a RDBMS Using SQL
SQL statement
is entered
SQL> SELECT loc
FROM dept;
Statement is sent to
database
Database
Data is displayed
LOC
------------CLAYTON
CAULFIELD
PENINSULA
GIPPSLAND
Database construction
Data Definition Language (DDL)
used to describe data at the database level
Schema level - complete database description
Sub-Schema level - user views (restricted)
Data Manipulation Language (DML)
Provides for
Create
Update
Delete
Modify
Report
Insert
Retrieve (extract)
Drop
Calculation
capabilities
41
Using Oracle
• SQL
– Structured Query Language: DDL, DML, DCL
– Universally supported by database servers
• SQL*Plus
– Oracle command-line SQL interpreter
– Account (schema) = independent working
space
42
SQL data types
Type description
Oracle SQL
variable-length char. string
VARCHAR2(l)a
fixed-length char. string
CHAR(l) a
number
NUMBER(p,s)b
currency
NUMBER(10,2)
date
DATE
a: length.
b: length, scale.
43
General Data Types
• Used to augment an attribute description and to provide a
means of Integrity
’Normal’ data types are :
Character (or text)
Numeric - Integer, Decimal, Money, Float
(in Access Long and Short Integer,
Decimal with options of a number of ‘decimal’
places)
Date - Standard date format - Access has various forms
Logical - Yes/No True/False Exists/Does Not Exist
44
Data Attribute Size
In many cases this is set by the Data Definition facilities
e.g.. Date, Short Integer, Long Integer, Logical,Number
Others are set by the Designer:
Number of Characters, Size of a ‘Decimal Number’
Access allows for
a Default value
Duplicates/ No Duplicates allowed
Primary Key nomination
Indexing
‘Required’ Status of an Attribute
45
DDL Example
CREATE TABLE books (
title VARCHAR2(60),
author VARCHAR2(60),
isbn NUMBER(10,0)
CONSTRAINT pk_books PRIMARY KEY,
pub_date DATE DEFAULT SYSDATE
);
PRIMARY KEY: column serves as a unique identifier
SYSDATE: returns the system’s current date
“;” terminates a SQL statement
46
DDL Example
CREATE TABLE book_reviews (
isbn NUMBER(10,0)
CONSTRAINT fk_books_booksrev REFERENCES
books(isbn),
reviewer VARCHAR2(30),
comments VARCHAR2(150)
);
• REFERENCES: value in this column must be one of the values in
the isbn column of the table “books”. This is called referential
integrity.
47
DML Example
INSERT INTO books VALUES(
’The Importance of Being Earnest’,
’Oscar Wilde’, -- this is a comment
9876543210,
’14-FEB-1895’
);
--: Comment. The part from this symbol till the end of the line will be
ignored by the database server.
’14-FEB-1895’: Date format. DD-MMM-YYYY or DD-MMM-YY
48
DML Example
SELECT * FROM books;
SELECT * FROM books
WHERE author = ’Oscar Wilde’;
*: every column
’Oscar Wilde’: strings are quoted by ‘. If ‘ is part of the string, it is
escaped by double it (e.g. string a’b should be ‘a’’b’ in SQL)
49
DML Example
SELECT title FROM books
WHERE author LIKE ’Oscar%’;
You will get: The Importance of Being Earnest
LIKE: fuzzy matching (in contrast to =)
%: wild character:
% represents any string of any length (Osc%r matches Oscar,
Osceer…)
_ represents any single character (Osc_r matches Oscar, Oscer…)
50
DML Example
COMMIT;
ROLLBACK;
if you have modified the data in database, such as inserted or deleted
an entry, you can either:
COMMIT to make this change permanent, or
ROLLBACK to cancel your operation (the data will look as if you just
finished your last COMMIT).
51
Further readings
• Oracle TechNet
– http://otn.oracle.com/index.html
– Free subscription
• Online tutorials
– http://oldweb.uwp.edu/academic/mis/baldwin/sqlplus.htm
– http://www.ilook.fsnet.co.uk/index/oracle.htm
– Google for more!
52
Database construction
Database development webpage:
http://bidd.nus.edu.sg/group/teach/submit/welcome.asp
53
Database construction
Database development webpage:
http://bidd.nus.edu.sg/group/teach/submit/welcome.asp
54
Database construction
Database development webpage:
http://bidd.nus.edu.sg/group/teach/submit/welcome.asp
55
Database construction
Database development webpage:
http://bidd.nus.edu.sg/group/teach/submit/welcome.asp
56
Summary of Today’s lecture
• Molecular database development:
– Protein inhibitor search
– Getting additional info about selected inhibitors (using
ChemIDPlus database)
– Generating 3D structure (using DS ViewerPro)
– Database construction
57