Data Management for Clinical Research

Download Report

Transcript Data Management for Clinical Research

Database
Fundamentals
Vanderbilt University
September 24, 2009
Copyright 2009 - P.Harris
In this lecture, you will learn …

Overview of Relational Database concepts
We will use MS-Access for convenience in this course
(many principles scale to other relational databases)

Overview of MS-Access environment (objects)

Table Design Principles
- Table Relationships (Cascading)
- Primary keys, indexes, validation principles
Copyright 2009 - P.Harris
What is a Relational Database?
A relational database can be thought of as a software
“engine” that provides an interface between table-based
data (strong spreadsheets) and user application queries.
Examples of RDB applications include:
•MS Access *** (we’ll use this one for convenience)
•Oracle
•MySQL
•SQL Server (Microsoft)
•DB2 (IBM)
•Informix
Copyright 2009 - P.Harris
Why choose RDB
instead of SPSS / Excel / …
Although there is always overlap, the following rules might
help when deciding when / when not to use a RDB:
•RDB is best used for long-term data storage and/or data
sharing.
•MS Excel is best used for minor data collection,
manipulation, and especially visualization.
•SPSS is best used for minor data collection and especially
data analysis.
It is easy to export data from RDB to Excel  SPSS
Copyright 2009 - P.Harris
Why choose MS-Access over
other DBM systems?
Cheap, readily available (packaged with MS-Office
Premium).
Easy to use (relative to some systems – Oracle may require
one person to maintain the server as a database administrator
and another person to serve as an application developer).
Includes front-end tools for rapid application development
(RAD). This also makes MS-Access a good prototype
environment.
Copyright 2009 - P.Harris
Why choose other DBM
systems over MS-Access?
MS-Access can handle a large number of records, but is
somewhat slow compared to high-end platforms.
Multiple users may use the database simultaneously, but MSAccess is known to become unstable with greater than 3-5
users.
There is a “snob factor”. I personally recommend the use of
other systems (Oracle, SQL Server, MySQL, etc) when
writing grant proposals. If not, at least omit mentioning the
MS-Access software by name – (we will use a relational
database to …)
Copyright 2009 - P.Harris
What is in an MS-Access file - 1?
Although the term “database” typically refers to a collection of
related data tables, an Access database includes more than just
data. In addition to tables, you can add:
•Saved queries (stored procedures) - organizing and/or
manipulating data
•Forms – GUI interaction with data, event programming
•Reports – customized results for printing (~ static forms)
•Macros and VB programs for extending functionality
Microsoft provides some logical integration of these tools
through “wizards”. However, these are pretty basic - most
developers must pick and choose the best approach when
implementing applications.
Copyright 2009 - P.Harris
What is in an MS-Access file - 2?
Unless advanced
techniques are employed,
all entities are stored in
one *.mdb file. When
running, a locking file
(*.ldb) is also visible.
Only the mdb file needs
to be copied to transfer
the database to another
computer or location.
Copyright 2009 - P.Harris
Demonstration

Open MS-Access here and lead demonstration
of the overall environment (tables, forms,
queries, reports, modules)

Demonstrate overall application from user view,
but explain during demonstration which
underlying database object(s) involved.

Finish this demonstration by emphasizing that
the most important piece is the data going into
the tables. Tables are the root of any RDB
system.
Copyright 2009 - P.Harris
What is in an MS-Access file - 3?
VB + Macros – Event Driven Automation, etc.
Forms (Active)
Reports (Static)
Queries
Tables
Demographics
Ethnicity
Labs
Copyright 2009 - P.Harris
H&P
Advanced – Splitting
Front-End File - Contains all Application Entities (Forms,
Queries, etc.) and links to data tables in back-end file. Note
you may have more than one FE to accommodate
different user types (nurses, laboratory, etc).
VB + Macros – Event Driven Automation, etc.
Software
Application
Forms (Active)
Reports (Static)
Database
Queries
Tables
Demographics Ethnicity
Labs
H&P
Back-End File - Contains all Data Tables
Copyright 2009 - P.Harris
Front-End
Example
Copyright 2009 - P.Harris
Microsoft Access – Module 1 Summary
MS-Access is a powerful relational database
program. Table and query concepts are similar for
any RDB package. MS-Access also has an
integrated application development environment for
creation of end-user software that can be
customized to fit most personal/departmental needs
for data collection and storage.
Most of what you learn using Microsoft Access is
applicable to any RDBMS.
Copyright 2009 - P.Harris
Creating / Working with Tables
Copyright 2009 - P.Harris
Tables – Glucose Measurement
Database
We wish to construct a database to track waking
glucose measurements for an indefinite amount of
time on 100 patients receiving 3 possible drug
combinations.
Why would this be difficult in MS-Excel or SPSS?
Copyright 2009 - P.Harris
 Think of Access as a collection of
spreadsheets that are relationally linked.
Demographics
Patient_ID
Fname
Lname
Address
Phone
Gender
Race
DOB
Height
Glucose
Glucose_ID
Patient_ID
Date
Weight
Med_ID
Glucose
Copyright 2009 - P.Harris
Meds
Med_ID
DrugCombonatio
n
STORE DATA ONE TIME / ONE PLACE
DO NOT STORE CALCULATED DATA
Tables Overview
Demonstration
Stop here and build these three tables
 Show data validation rules
 Show relationships
 Stress primary keys
 Demonstration cascading update/delete

Copyright 2009 - P.Harris
Table Demonstration - Live
General Setup for Tables
Describe General Options
Show Validation Rule
Relationships
Lookup Option
Copyright 2009 - P.Harris
Table Relationships - Live
Table Relationships
Describe Cascade Features
Copyright 2009 - P.Harris
MS-Access Power Trick
It is much faster to create an Excel template
spreadsheet and import than to create table fields
column-by-column in Access.
Take advantage of this when needing a table by:
1. Creating spreadsheet with column headers.
2. Create one row of representative dummy data
(every field needs a value)
3. Import spreadsheet to Access
4. Delete dummy row and check field types for
accuracy
Copyright 2009 - P.Harris
Summary
Data storage principles
1. Attempt to store data 1 time / 1 place;
2. Do not store data that may be calculated from other fields (utilize
queries); and
3. Strive for very discrete data storage (no ambiguity – garbage in /
garbage out).
4. Choose real or arbitrary (autonumber) unique identifier for each
record.
Relationships
Use table relationships to automatically cascade delete and update
records.
Other Data Sources
Import = Copy; Link = Live Connect.
Copyright 2009 - P.Harris