An Introduction to CAMERA and Underlying
Technologies
Philip Papadopoulos
University of California, San Diego
San Diego Supercomputer Center
California Institute of Telecommunications and
Information Technology (Calit2)
PI Larry Smarr
Announced 17 Jan 2006. Public Release 13 March 2007
$24.5M Over Seven Years
DNA Basics for Non-Biologists
• Nucleotide bases of DNA
– A, C, G, T (Adenine, Cytosine, Guanine, Thymine)
– A Sequence of Bases Forms One Side of a DNA Strand
– Complementary Bases Form the Other Side of DNA
– A matches T (pair)
– C matches G (pair)
• During cell replication, DNA is “unzipped”. The complementary side can then be replicated perfectly
• Human DNA is about 3 billion base pairs on 23 pairs of chromosomes
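The pairing rule above (A with T, C with G) can be sketched in a few lines of Python. This is only an illustration: the function names `complement` and `reverse_complement` are made up for this example and are not part of any CAMERA tool.

```python
# Base pairing from the slide: A pairs with T, C pairs with G.
PAIRING = str.maketrans("ACGT", "TGCA")


def complement(strand: str) -> str:
    """Return the base-by-base complement of one side of a DNA strand."""
    return strand.upper().translate(PAIRING)


def reverse_complement(strand: str) -> str:
    """Complement read in the opposite direction, as the other strand is usually written."""
    return complement(strand)[::-1]


if __name__ == "__main__":
    side = "GGGAAACCC"
    print(side, "pairs with", complement(side))
```

Because each base has exactly one partner, one side of the strand fully determines the other, which is what lets the "unzipped" strand be copied.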
Bases to Amino Acids
• Triplets of nucleotide bases are called codons and define amino acids.
– Amino acids are the basic building blocks of proteins
– There are 20 amino acids, but 4^3 = 64 nucleotide combinations.
– Many amino acids have multiple codons
– Special codons (called start and stop codons) mark where translation into protein begins and ends.
• Reading Frames of: GGGAAACCC
– This raw sequence could be read as
– GGGAAACCC (GGG AAA CCC) (Glycine, Lysine, Proline)
– GGAAACCC (GGA AAC) (Glycine, Asparagine)
– GAAACCC (GAA ACC) (Glutamic Acid, Threonine)
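The three reading frames above can be reproduced with a short Python sketch. It builds the standard genetic code table from the usual compact T, C, A, G ordering and translates each forward frame into one-letter amino acid codes. The names `CODON_TABLE`, `translate_frame` and `reading_frames` are illustrative only, and the three frames of the reverse strand are left out to keep the example small.

```python
from itertools import product

# Standard genetic code in the conventional T, C, A, G ordering.
# One-letter amino acid codes; "*" marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_frame(seq: str, offset: int) -> str:
    """Translate seq starting at offset, ignoring any trailing partial codon."""
    codons = [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]
    return "".join(CODON_TABLE[c] for c in codons)


def reading_frames(seq: str) -> list[str]:
    """Return the translations of the three forward reading frames."""
    return [translate_frame(seq.upper(), offset) for offset in range(3)]


if __name__ == "__main__":
    for offset, protein in enumerate(reading_frames("GGGAAACCC")):
        print(f"frame {offset}: {protein}")
```

Here G, K, P, N, E and T are the one-letter codes for Glycine, Lysine, Proline, Asparagine, Glutamic Acid and Threonine.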
Sequencing Tidbits
• The Institute for Genomic Research (TIGR) sequenced the genome of the bacterium Haemophilus influenzae in 1995 using shotgun sequencing
– 1.8 Million Base Pairs (Human: 3 Billion)
• Sequencing does NOT tell you what function a particular gene plays
• It is believed that only ~1.5% of the human genome codes for expressed characteristics
– The non-coding portions contain our genetic history
– Unknown what function the rest of our DNA plays
Most of Evolutionary Time Was in the Microbial World
You Are Here
Tree of Life Derived from 16S rRNA Sequences
Source: Carl Woese, et al
Marine Genome Sequencing Project –
Measuring the Genetic Diversity of Ocean Microbes
Need Ocean Data
Sorcerer II Data Will Double Number of Proteins in GenBank!
Some CAMERA Goals
• Provide an infrastructure where scientists from around the world can perform analysis on genetic communities
– Global Ocean Sampling (GOS) is the initial large data set
– ~8.5 Billion base pairs of raw Reads
– Metadata is available for samples
– Salinity, Temperature, Geographic Location, Water Depth, Time of Day …
– Other metadata will be correlated with samples (e.g. MODIS Satellite)
• Allow others to search and compare input sequences against CAMERA data.
• Overall provide a resource dedicated to metagenomics
– Support new datasets
– Support new analysis tools and web services
Global Ocean Sampling (GOS) Sequences are Largely Bacterial
~3 Million Previously Known Sequences
~5.6 Million GOS Sequences
Source: Shibu Yooseph, et al. (PLOS Biology in press 2006)
Reason for CAMERA
• The Global Ocean Sampling (GOS) expedition brings a huge influx of
sequence data
• Factors that interrelate microbes and microbial
communities are not well known
• Significant analysis requires large resources
– All-to-all comparisons
– Integration of other environmental (meta) data (weather,
temperature, salinity,…) is essential
• Raw Sequence Data sets are mid-sized
– Current set of GOS Raw Reads is about 100GB (FASTA
Files)
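To give a feel for why all-to-all comparisons need large resources, the sketch below counts the distinct pairs in a naive all-against-all comparison of n sequences, n(n-1)/2. It uses the ~5.6 million GOS sequences and ~3 million previously known sequences quoted earlier in the deck. The function name `all_to_all_pairs` is made up for this example, and real tools such as BLAST use indexing and heuristics rather than scoring every pair, so treat the count as an upper bound for intuition only.

```python
def all_to_all_pairs(n: int) -> int:
    """Number of distinct unordered pairs among n sequences: n * (n - 1) / 2."""
    return n * (n - 1) // 2


GOS_SEQUENCES = 5_600_000     # "~5.6 Million GOS Sequences" from the deck
KNOWN_SEQUENCES = 3_000_000   # "~3 Million Previously Known Sequences"

if __name__ == "__main__":
    gos_only = all_to_all_pairs(GOS_SEQUENCES)
    combined = all_to_all_pairs(GOS_SEQUENCES + KNOWN_SEQUENCES)
    print(f"GOS vs GOS pairs:      {gos_only:.3e}")
    print(f"GOS + known, all pairs: {combined:.3e}")
```

The quadratic growth is the point: adding the previously known sequences to the comparison more than doubles the number of pairs.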
Calit2 CAMERA Production
Compute and Storage Complex is On-Line
512 Processors
~5 Teraflops
~ 200 Terabytes Storage
User Map – 03 May 2007
• Site in production on 13 March 2007
• More than 500 Registered users from
around the globe (~10 new users/day)
Calit2’s Direct Access Core Architecture
CAMERA’s Metagenomics Server Complex
[Diagram labels]
Data sources: Sargasso Sea Data; Moore Marine Microbial Project; NASA and NOAA Satellite Data; Community Microbial Metagenomics Data; JGI Community Sequencing Project; Sorcerer II Expedition (GOS)
Server complex: Web Portal + Web Services; Database Farm; Flat File Server Farm; Dedicated Compute Farm (100s of CPUs); 10 GigE Fabric
Access paths: Traditional User (Request / Response over the Web); Direct Access Lambda Connections to Local Environment and Local Cluster; Web (other service)
TeraGrid: Cyberinfrastructure Backplane (scheduled activities, e.g. all by all comparison) (10000s of CPUs)
Source: Phil Papadopoulos, SDSC, Calit2
Calit2 CAMERA Production
Compute and Storage Complex is On-Line
Web, Application, DB Servers
200 TB File Storage
10 Gbit/s Network
1 and 10 Gbit/s Switching
Compute Nodes
Global Elements
• Data location – Storage Resource Broker Metadata catalog
• Data-type aggregation, cross-correlation, integration
– BIRN Data Mediator
• Identity Management
– Use Grid Security Infrastructure (GSI) Public Key System
– Integrated Grid Account Management Architecture (GAMA) from SDSC for ease-of-use and Single Sign On
• Portal Services
– Based on GridSphere
– Small Dedicated Compute Cluster (32 nodes)
Logical Layout of Servers
[Diagram labels]
Single Sign On Layer
Web Server
Portal Server (Tomcat)
Single Sign-on Server
Public Net / Private Net
Cluster Frontend
Blast Master (Jboss)
Postgres Database
Cluster Nodes and File Servers
GAMA Server
An Incomplete List of Software Components
• Postgres Database
• Apache Tomcat
• JBoss Servlet Container
• Google Web Toolkit
• Sun Grid Engine
• GAMA (Grid Account Management Architecture)/GSI from Globus
• OPAL (Grid/Web Services Wrapper)
• GridSphere Portlet Container
• CAMERA Registration Portal
• Venter Application Portal
• NCBI BLAST, mpiBLAST, ClustalW, MrBayes, CD-HIT, and a host of other Bio Software
• Ergatis Workflow Engine
• JForum
• Drupal
• All Integrated with Rocks … Single Person Deployment
OptIPortal– Another Rocks Cluster
Termination Device for the OptIPuter Global Backplane
• 20 Dual CPU Nodes, 20 24” Monitors, ~$50,000
• 1/4 Teraflop, 5 Terabyte Storage, 45 Megapixels. Nice PC!
• Scalable Adaptive Graphics Environment (SAGE), Jason Leigh, EVL-UIC
Source: Phil Papadopoulos, SDSC, Calit2
Use of OptIPortal
to Interactively View Microbial Genome
15,000 x 15,000 Pixels
Acidobacteria bacterium Ellin345 (NCBI)
Soil Bacterium 5.6 Mb
Source: Raj Singh, UCSD
A Look at Networking
Introduction to Quartzite
An Experimental Network
Sunlight (10 Gigabit) Campus/WAN
Using a Lambda Network for CAMERA
• Many community databases
– Protein Data Bank (PDB)
– GenBank
– Swiss-Prot
• Support only web or web services interfaces
– New analysis/programs need access to raw databases/files
– Usually, groups make a point-in-time copy of the database
– We call this a data “fork”
– Updates are not processed
– Papers published with point-in-time data are out of date by months or years
• CAMERA “Direct Connect” will allow us to provide a high-speed connection to the backend servers
– Try to eliminate data forking
– Copies of CAMERA data are inevitable
– Need mechanisms that allow others to keep their copies in sync with CAMERA
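The slides do not say which mechanism CAMERA would use to keep copies in sync. One common approach, sketched here purely as an assumption, is for the data provider to publish a manifest of file checksums that a mirror compares against its own copy to find what needs to be fetched again. Every name below (`file_digest`, `build_manifest`, `stale_files`, the example file names) is hypothetical and not a CAMERA interface.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large FASTA files fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def stale_files(upstream: dict[str, str], local: dict[str, str]) -> list[str]:
    """Files that are missing locally or whose digest differs from upstream."""
    return sorted(name for name, digest in upstream.items()
                  if local.get(name) != digest)


if __name__ == "__main__":
    # Hypothetical manifests: the local copy is a "fork" taken before an update.
    upstream = {"gos/reads_01.fa": "aaa", "gos/reads_02.fa": "bbb", "gos/meta.csv": "ccc"}
    local = {"gos/reads_01.fa": "aaa", "gos/reads_02.fa": "old"}
    print(stale_files(upstream, local))
```

A mirror that periodically re-fetches only the stale files stays a copy rather than drifting into a fork.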
UCSD Quartzite Core at Completion (Year 5 of OptIPuter)
• Funded 15 Sep 2004
• Physical HW to Enable OptIPuter and Other Campus Networking Research
• Hybrid Network Instrument
[Diagram labels]
Quartzite Communications Core Year 3 (DWDM)
Quartzite Core: Wavelength Selective Switch (Lucent); GlimmerGlass 128 port OOO; Force10 E1200; 32 10GigE
GigE Switches with Dual 10GigE Uplinks, to cluster nodes
To 10GigE cluster node interfaces and other switches
Juniper T320 to CalREN-HPR Research Cloud and Campus Research Cloud
Reconfigurable Network and Endpoints
Link legend: GigE; 10GigE; 4 GigE, 4 pair fiber
4x4 Wavelength Cross-Connect:
• All integrated optics (except optical amplifiers)
– 4 1x4 WSS modules
– 4 4x1 passive optical combiners
• 4 x 40λ x 40Gbps = 6.4Tbps switching capacity
– currently using central 8λ
[Photo labels: 4x4 WXC rack; WSSs (four 1x4 WSS modules); combiners; Optical Amps]
25 | AT&T Labs, October 2007
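The capacity figure on this slide is a straight product of ports, wavelengths per port and line rate per wavelength. The small Python check below reproduces it. The second figure, for the central 8 wavelengths, assumes the same 40 Gbps per wavelength, which the slide does not state. The function name is illustrative.

```python
def switching_capacity_gbps(ports: int, wavelengths: int, gbps_per_wavelength: int) -> int:
    """Aggregate capacity: every wavelength on every port carries the full line rate."""
    return ports * wavelengths * gbps_per_wavelength


if __name__ == "__main__":
    full = switching_capacity_gbps(ports=4, wavelengths=40, gbps_per_wavelength=40)
    # Assumption: the 8 wavelengths in use run at the same 40 Gbps rate.
    in_use = switching_capacity_gbps(ports=4, wavelengths=8, gbps_per_wavelength=40)
    print(f"full design: {full / 1000} Tbps")
    print(f"central 8 wavelengths (assumed 40 Gbps each): {in_use / 1000} Tbps")
```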
WXC performance demonstration:
[Figure: WXC test setup. ASE source and laser wavelengths λ1 through λ8 into four 1x4 WSS modules (WSS1 to WSS4); external 4x1 switch to OSA; table of WSS1 to WSS4 port states (1 to 4, with 3/1 and 1/3 alternates)]
8 lasers at centre of C-Band at 100GHz s
use ASE source to illustrate wide bandw
1.use external 4x1 switch to scan WXC p
2.alter switch states of WSS1 and WSS3
What Does it Cost to Drive the Network
• Dominant cost is DWDM optics
• Construction of Multiplexers is Simple, and not expensive: ~$250/Channel/End
Layer 1 – Four Channel DWDM (Channels 31, 32, 33, 34 over 1 Fiber Pair)
– 10Gbps Switch X 4 Per Side (optional)
– XFP Switch Module X 4 Per Side (optional)
– XFP DWDM Optics X 4 Per Side, Used in Host or Switch
– DWDM Mux Transmit X 1 Per Side
– DWDM DeMux Receive X 1 Per Side
– SC to LC Fiber 2M X 5 Per Side
– Corning 1U Rack Containing DWDM Mux / DeMux + SC to SC couplers, 1 Per Side
1) Optics
SFP/XFP Optics Costs: DWDM Optics from AACTelecom
– 10Gbps: Luminent XFP DWDM (ZR 80Km), OC-192 and 10GE compatible; 3500 US per unit
– 10Gbps: Luminent (assembled in US) XFP DWDM (ER 40Km), OC-192 and 10GE compatible; 2900 US per unit
– 1 Gbps: SFP DWDM (80Km model), OC-48; 1220 US per unit
2) Optional - Layer 2 Switch (10Gbps capable)
10Gbps capable switch SMC8748L2 (A0707505) + EXP MOD10G (A0707506) from Dell
– Switch, 2 x 10Gbps XFP ports, 48 x 1Gbps Copper: 1700 US
– 10 Gbps module (holds XFP): 300 US
3) DWDM Mux DeMux
DWDM Mux DeMux (SC connector type), 4, 8, 16 channel = DWDM-100, from oemarket.com
– 4 Channel (31, 32, 33, 34): 560 US
– 8 Channel: 880 US
– 16 Channel: 1600 US (approx)
4) Corning Rack Mount, Couplers, Fiber
– Corning Mux DeMux container, 1U rack mount (Corning PCH01U from Ed Carlin Graybar), sufficient for 4, 8 or 16 channel: 200 US
– 2 sets of SC to SC adaptors: 100 US (approx)
– Fiber Patch Cables, Single Mode, 2M, SC to LC connector type (from Ed Carlin Graybar): 30 US (approx) each
Complete Solution
5) Optional - DWDM Media Converter
DWDM to Copper Media Converter, from Carl Stelling at Aaxeon.com
– SFP pluggable DWDM to copper media converter: 150 US each, not including DWDM optics (just converter)
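The price list can be tallied to check the two claims made earlier: that the passive multiplexer hardware comes to roughly $250 per channel per end, and that the DWDM optics dominate. The Python sketch below does this for one side of a four-channel link using the quoted prices. It assumes the 560 US four-channel price covers the mux/demux needed on one side, and it leaves out the optional switch and media converter. All names are illustrative.

```python
CHANNELS = 4

# Passive parts for one side, prices as quoted in the list (US dollars).
MUX_DEMUX_4CH = 560          # assumption: covers mux + demux for one side
RACK_MOUNT_1U = 200
SC_COUPLERS = 100            # approximate
PATCH_CABLE = 30             # approximate, each
PATCH_CABLES_PER_SIDE = 5

# 10 Gbps XFP DWDM optics, per unit.
XFP_ER_40KM = 2900
XFP_ZR_80KM = 3500


def passive_cost_per_side() -> int:
    """Mux/demux, rack mount, couplers and patch cables for one side."""
    return (MUX_DEMUX_4CH + RACK_MOUNT_1U + SC_COUPLERS
            + PATCH_CABLE * PATCH_CABLES_PER_SIDE)


def side_total(optic_price: int) -> int:
    """One side of the link: passive parts plus one DWDM optic per channel."""
    return passive_cost_per_side() + CHANNELS * optic_price


if __name__ == "__main__":
    passive = passive_cost_per_side()
    print(f"passive hardware per channel per end: {passive / CHANNELS:.2f} US")
    for label, price in (("ER 40Km", XFP_ER_40KM), ("ZR 80Km", XFP_ZR_80KM)):
        total = side_total(price)
        share = CHANNELS * price / total
        print(f"{label}: {total} US per side, optics are {share:.0%} of the cost")
```

Under these assumptions the passive parts come to 1010 US per side, or 252.50 US per channel, which lines up with the "~$250/Channel/End" figure.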
Quartzite State Nov 2007
• Core Packet Switch with 68 10 GigE ports (More than ½ Terabit)
• Approximately 30 Channels Lit
• 64-port All-Optical Glimmerglass Switch - All Fiber into Quartzite is switchable
• 4 port x 8 Lambda DWDM switch at Lucent (On site at Calit2 in Dec)
• 4 Channel DWDM Between Calit2 and SDSC
– One channel is used for 10Gigabit Production to BIRN Data Racks.
• Ordered, but waiting for fulfillment:
• 20 Mux/Demux (8 C-band DWDM Channels + 1 1310 (LR) Passband)
• 32 DWDM XFPs (Channel 40-43 – will fill out rest of channels in 2008)