Transcript Document

Visual Basic and Perl Applications for Genome Project Management.
N.Yurgel* ,
Brenda K.
Schroeder ,
and Michael L.
Kahn .
Institute of Biological Chemistry1, School of Electrical Engineering and Computer Science2 and School of Molecular Biosciences3
Washington State University, Pullman WA 99164-6340
Genomics research requires coordinating data analysis with
experimental manipulations. Cloning more than 6300 Open
Reading Frames (ORFs) from the bacterium, Sinorhizobium
meliloti (Schroeder et al. 2005) required a database that could be
used to schedule an asynchronous workflow to maintain a high
throughput with minimum clerical effort. A Microsoft® Access®
database operating on linked Microsoft® Excel® spreadsheets
was implemented to track each ORF through the cloning
process. The familiar Excel spreadsheet format decreased the
time needed to train students to input and analyze data and
increased the level of comfort and enthusiasm for automated
tracking. For similar reasons, exchanging spreadsheets with
vendors was relatively simple.
Using Microsoft® Access®
together with Visual Basic applications embedded in Microsoft®
Excel®, allowed us to rapidly develop new utilities as they were
needed. Visual Basic and Perl routines were mainly used for
extracting or manipulating data on a global scale, to change
formats or to check for the presence of certain patterns. The
query language allowed us to select data using criteria specific to
the particular stage of the procedures and to output the results
as worksheets for subsequent experimental manipulation. Finally,
data stored in a Microsoft® Access® database can be transferred
easily to more powerful databases (mySQL or Oracle) for further
analysis. This relatively simple database format was adequate for
this task and we offer it as an example of customizable software
that can significantly accelerate the manipulation of data and
materials in a complex project.
The utilization of a relational database in Microsoft®Access®
using linked Microsoft®Excel® spreadsheets provided several
advantages compared to other database applications. First, the
startup time was immediate, no database design was required.
Second, working in the format of familiar Excel spreadsheets
decreased the time required to learn how to input and manipulate
the data. Third, using Microsoft®Access® tools like the query
language facilitated the data search process, allowing us to select
data using criteria appropriate to the particular stage of the
process and to configure the data output into formats useful for
subsequent experimental manipulations.
Fourth, manually
executed routines written in Visual Basic were used for data
manipulation, generation of data sheets, and analysis. Finally, the
storage format for data in a Microsoft®Access® database is
directly transferable to more powerful databases such as mySQL.
Design, Ordering, and Organization of PCR Primers
There were numerous steps in the cloning process which were
routinely followed with the database. These included generating
primer orders, selecting primers for polymerase chain reaction
(PCR), completing the PCR, determining which clones were cloned
and tracking storage locations for each step in the process. Visual
Basic routines were utilized in numerous ways including
constructing the primer sequences, checking the sequence of the
primers received, and generating worksheets that could be taken
into the laboratory.
This poster describes the use of a
Microsoft®Access® database in which tables are linked to
Microsoft®Excel® spreadsheets.
A Perl routine checked the
position of primers in the genome and extracted DNA sequences
flanked by the primers for further manipulation. This approach
allowed simple questions to be answered in order to move forward
in the laboratory in a very efficient way. The partnering of a
spreadsheets and the complementary Visual Basic routines
generally served as an “electronic notebook” for the project.
Cloning Process
Plasmid Isolation
Electroporation and plating
The design of our nested PCR required that all the gene specific primers be 20 base pairs long
with 12 base pair tags attached to the forward and reverse primers. This was accomplished using
the Visual Basic program ‘Primer Design’. Once the primers were constructed in silico, they were
used to confirm the start and stop information for each in the database using a Perl routine. This
information was then entered into the Microsoft® Access® Database. ‘Primer Compare’ is a Visual
Basic program used to compare the file received after primers were constructed to confirm which
primers were received from which order and if the primer sequences were correct. This
information was then entered into the Access Database using the Visual Basic program ‘Update
Table’. This program allowed an asynchronous process for ordering primers, selecting primers
etc. to be carried out as the separate requirements for each step of each ORF was completed.
Primer Design
Precipitate PCR Product
Complete BP Clonase
Update Table
PCR Confirmation of plasmid DNA
++++++++++++++++ ++++++++++++++++ ++++++++++++++++
Score data
Primer Compare
Update Table
The organization of the database schema is based on the data
collected for the genomic project in the “electronic notebook” and
the data was directly uploaded from the “electronic notebook” to the
web accessible MYSQL database. The website design was based on
how the pieces of data are interrelated. Below is an example of how
to move through the different pages of the website based on their
interrelated features to obtain information about specific ORFs that
were cloned in various destination vectors. This information can
include the sequence of the PCR amplified ORF, the sequence of the
different destination vectors and strain information about the
different constructs that have been constructed.
++- +++++++++++++ ++++++++++++++++ +++- +++++++++- ++
Score Data
Use previously generated
worksheet for this step.
Transfer of data from “electronic notebook” to Web
accessible database
PCR Amplification and Confirmation
After PCR, the PCR products were PEG precipitated and resuspended in buffer. A 3 µl aliquot was
removed and placed in a new set of strip cap tubes. The BP Clonase reaction buffer and entry
vector were added to each tube and the reaction was incubated ath room temperature. A 3 µl
aliquot was then used for electroporation to introduce the cloned plasmid DNA into an E.coli host.
Cells carrying the plasmid were then cultured and the plasmid DNA was isolated. This isolated
DNA was then used as template for a PCR reaction to confirm that each plasmid was carrying the
appropriate size insert. The amplified DNA was run on an electrophoresis gel and the gel was
scored for the presence of the correct size bands using the information found on the previously
generated worksheet for this set. The data was recorded and entered into the Excel file used to
store the data. The data was then added to the database using the Visual Basic program ‘Update
Table’. The database can then be queried for success with an individual ORF at each step. This
information can then be used to start the process over at the appropriate step if necessary.
Update Table
Host strain
Parental strain
Grandparental strain
Create Gel Form
Database schema
The list of the next 94 ORFs to be PCR amplified was obtained by querying the Access® Database, asking for ORFs that had primers
diluted and ready to use. These diluted primers were then aliquoted into strip cap tubes and used to amplify the DNA of the
respective ORFs using PCR. ‘Create Gel Form’ is a Visual Basic program which utilizes the list of 94 ORFs and their nucleotide start
and stop information to construct forms to be used for recording data in the laboratory. This list of 94 ORFs remained fixed during
the cloning process. Data about each ORF at each step in the process was entered into the Database using ‘Table Update’.
Website design