Vertica to HDFS
Capstone Project
Tharanga Gamaethige,
Engineer, Data Management, Vertica
University of Pittsburgh, August 30th, 2013
© Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
Agenda
• What is Vertica
• Bridge from Vertica to HDFS
• Success criteria
• Benefits to you
What Is Vertica
• Founded in 2005 by database researcher Michael Stonebraker and a small group of engineers.
• Acquired by Hewlett-Packard in March 2011.
What Is Vertica
• SQL Database for Real-time Analytics
• Runs on x86 hardware
• MPP Columnar Architecture – scales to PBs!
• Reduced footprint via Advanced Compression
• Extensible analytics capabilities
• Easy to set up and use
• Elastic: grow or shrink as needed
• Extensive Ecosystem of analytic tools
Speed, Scale, Simplicity
Bridge from Vertica to HDFS
[Diagram: Vertica database cluster → HDFS cluster]
• Use as a database-to-database export tool.
• Export data from Vertica tables into external targets, e.g. HDFS.
• Extensible to support different data formats, storage formats and data targets.
Bridge from Vertica to HDFS
[Diagram: Vertica database cluster → Formatter → Prism → Target → HDFS cluster]
Formatter > Tuples to Blocks
• Pipe delimited
• ORC file
• Etc.
Prism > Blocks to Blocks
• Zip
• TAR
• Etc.
Target > Blocks to Storage
• HDFS
• File system
• Etc.
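The three stages on this slide can be sketched as composable generators. This is a minimal illustration in Python, not the plugin's actual API: every function name, the `rows_per_block` parameter, and the in-memory sink are hypothetical, and gzip is used only as a stand-in for the Zip/TAR prisms named above.

```python
import gzip
import io
from typing import BinaryIO, Iterable, Iterator, List, Tuple

Row = Tuple[object, ...]


def pipe_delimited_formatter(rows: Iterable[Row], rows_per_block: int = 2) -> Iterator[bytes]:
    """Formatter stage (tuples to blocks): emit blocks of pipe-delimited lines."""
    lines: List[str] = []
    for row in rows:
        # NULLs are written as empty fields (an assumption for this sketch).
        lines.append("|".join("" if v is None else str(v) for v in row))
        if len(lines) == rows_per_block:
            yield ("\n".join(lines) + "\n").encode("utf-8")
            lines = []
    if lines:
        yield ("\n".join(lines) + "\n").encode("utf-8")


def gzip_prism(blocks: Iterable[bytes]) -> Iterator[bytes]:
    """Prism stage (blocks to blocks): compress each block as its own gzip member."""
    for block in blocks:
        yield gzip.compress(block)


def write_target(blocks: Iterable[bytes], sink: BinaryIO) -> int:
    """Target stage (blocks to storage): write blocks to a sink, return bytes written."""
    total = 0
    for block in blocks:
        sink.write(block)
        total += len(block)
    return total


def export(rows: Iterable[Row], sink: BinaryIO, rows_per_block: int = 2) -> int:
    """Chain Formatter -> Prism -> Target; only one block is held in memory at a time."""
    blocks = pipe_delimited_formatter(rows, rows_per_block)
    return write_target(gzip_prism(blocks), sink)


if __name__ == "__main__":
    out = io.BytesIO()  # stands in for an HDFS or file-system target
    written = export([(1, "a"), (2, None), (3, "c")], out)
    print(written, gzip.decompress(out.getvalue()).decode("utf-8"))
```

Because each stage consumes and produces an iterator, swapping the formatter, prism, or target does not affect the other two stages.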
Success criteria
a) A plugin that can read data from Vertica tables and export it to an external target, e.g. an HDFS cluster.
b) Design the plugin to be scalable to export terabytes of data.
c) Design the plugin to be extensible to support different data formats (pipe delimited, ORC files, etc.), storage formats (zip, tar, plain data, etc.) and data targets (HDFS, QFS, etc.).
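One common way to meet the extensibility criterion is a name-to-implementation registry, so that a new storage format can be added without touching the export code. The sketch below is only an illustration under that assumption: `PRISMS`, `register_prism`, and `get_prism` are hypothetical names, not part of any Vertica interface, and zlib stands in for a real storage format.

```python
import zlib
from typing import Callable, Dict, Iterable, Iterator

# A storage-format stage maps a stream of byte blocks to another stream of byte blocks.
Prism = Callable[[Iterable[bytes]], Iterator[bytes]]

# Hypothetical registry: storage-format name -> implementation.
PRISMS: Dict[str, Prism] = {}


def register_prism(name: str) -> Callable[[Prism], Prism]:
    """Decorator that makes a storage format selectable by name."""
    def decorator(fn: Prism) -> Prism:
        PRISMS[name] = fn
        return fn
    return decorator


@register_prism("plain")
def plain_prism(blocks: Iterable[bytes]) -> Iterator[bytes]:
    """Plain data: pass blocks through unchanged."""
    yield from blocks


@register_prism("zlib")
def zlib_prism(blocks: Iterable[bytes]) -> Iterator[bytes]:
    """Compress each block independently (stand-in for zip or tar)."""
    for block in blocks:
        yield zlib.compress(block)


def get_prism(name: str) -> Prism:
    """Look up a storage format, failing clearly for unknown names."""
    try:
        return PRISMS[name]
    except KeyError:
        raise ValueError(f"unknown storage format {name!r}; known: {sorted(PRISMS)}") from None
```

The same pattern would apply to data formats and data targets, each with its own registry.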
Benefits to you
• Get hands-on experience in using Vertica and HDFS.
• Learn to provide real-life design and implementation for extensibility, in the face of big data and distributed processing.
• Recognition of being part of the open source community.
• Potential recognition from Vertica’s 1000s of customers.
• Most importantly: free espressos, T-shirts, and a coffee mug.
Thanks!
Tharanga Gamaethige : [email protected]
Sennott Square 5404