Vertica Database - vertica

Download Report

Transcript Vertica Database - vertica

Vertica Database
A Brief Intro to Vertica Database – Navin Chakraborty
Topics
1. What is Vertica
2. Vertica concepts
a. Column storage
b. Hybrid Storage
c. Projections vs / Tables
d. Types of Projections
3. Vertica Objects
a. Projections
b. Views
c. Tables
d. UDF SQL functions
e. Sequences
4. Vertica Operations
a. DML operations ( Simple SQL statements )
b. Bulk Data Loading
c. Bulk Updating ( MERGE statement )
d. Exporting Data
5. Comparison with Teradata
What is Vertica
• Vertica is a True Columnar, MPP, Analytic database
• Vertica is just a DBMS like other DBMS( Oracle, Teradata,
MySQL)
• Vertica is made for storing Structured and semistrucutred data.
• It supports Standard SQL like other Databases
Release and Versions
• The most latest release of Vertica( launched in December
2013) worldwide is Vertica 7.0 and is called Crane
• The version used in our working environments by the Client is
Vertica 6.1.2 which is called Bulldozer( Funny but they seem
to use the heavy lifting automotive names for every major
release )
Vertica Concepts – Column Storage
• Unlike other RDBMS, Vertica reads the columns from database objects
called projections. No resources are wasted by reading large numbers of
unused columns. Every byte of data is used by the execution engine. For
example, consider this simple two-table schema:
Suppose you want to run this query:
SELECT A, C, N
FROM Table1 JOIN Table2
ON H = J;
• A row store must read 16 columns (A through H and J through Q) from
physical storage for each record in the result set. A column store with a
query-specific projection reads only three columns: A, C, and N.
Vertica Concepts – Hybrid Storage
•
To support Data Manipulation Language (DML) commands (INSERT, UPDATE, and
DELETE) and bulk load operations (COPY), intermixed with queries in a typical data
warehouse workload, Vertica implements the storage model shown in the
illustration below. This model is the same on each Vertica node.
•
•
•
WOS – Write optimised Store (memory)
ROS – Rad Optimised Store(Disk)
Tuple Mover – Operation to move data
Vertica Concepts –Projection vs Tables
• Projection are nothing but materialized views
same like in Oracle or any other database. The
only difference in Vertica is Data is stored in
Projections and not tables.
• When it comes to understanding this term
quickly, My tagline will be
“Projections are physical and Tables are logical
in Vertica”
Continued..
• So all the data that you store in Tables are
physically stored in Projection for those tables.
and the projections are automatically created
for you by vertica when you create your
tables.
Vertica Concepts – Types of Projections
• Super Projections - A superprojection is a
projection for a single table that contains all
the columns in the table.
• PreJoin Projection - Projections can contain
joins between tables that are connected by
PK/FK constraints. These projections are called
pre-join projections. Pre-join projections can
have only inner joins between tables on their
primary and foreign key columns.
Continued..
• Query specific Projections - You can optimize
your queries by creating one or more
projections that contain only the subset of
table columns required to process the query.
These projections are called query-specific
projections.
Vertica Objects - Projections
• For each table in the database, Vertica requires a
minimum of one projection, called a
superprojection. A superprojection is a projection
for a single table that contains all the columns in
the table.
• To get your database up and running quickly,
Vertica automatically creates a default
superprojection for each table created through
the CREATE TABLE and CREATE TEMPORARY
TABLE statements.
Anatomy of Projection
•
•
When you need to create a projection manually for queries to optimize you can
use CREATE PROJECTI)ON statement
The CREATE PROJECTIONS statement
Vertica Objects - Views
• Views in Vertica are just like any other
database views they store the query on base
tables and when you select from views the
query is executed on the base table and the
data is retrieved.
• Syntax
CREATE [ OR REPLACE ] VIEW viewname [ (
column-name [, ...] ) ] AS query ]
Continued..
• Example
CREATE VIEW myview AS
SELECT SUM(annual_income), customer_state
FROM public.customer_dimension
WHERE customer_key IN
(SELECT customer_key
FROM store.store_sales_fact)
GROUP BY customer_state
ORDER BY customer_state ASC;
• Dropping A View
DROP VIEW myview;
Vertica Object - Tables
• Tables are logical in Vertica. You can create a
table and drop a table and alter a table , but it
just changes the logical schema of the
database and not the physical schema.
• A single table in vertica can have 1600
Columns
• The data is stored in the projections
corresponding to a particular table and not in
the table.
Continued..
• Syntax
CREATE TABLE [ IF NOT EXISTS ] [[db-name.]schema.]table-name
{
... ( column-definition (table) (page 779) [ , ... ] )
... | [ column-name-list (create table) (page 780) ] AS [COPY] [ [ AT EPOCH LATEST ]
... | [ AT TIME 'timestamp' ] ] [ /*+ direct */ ] query
... | [ LIKE [[db-name.]schema.]existing-table [ INCLUDING PROJECTIONS | EXCLUDING
PROJECTIONS ] ]
} ... [ ORDER BY table-column [ , ... ] ]
... [ ENCODED BY column-definition [ , ... ]
... [ hash-segmentation-clause (page 750)
... | range-segmentation-clause (page 790)
... | UNSEGMENTED { NODE node | ALL NODES } ]
... [ KSAFE [k_num] ]
... [ PARTITION BY partition-clause ]
Continued..
Example
CREATE TABLE Public.Employee_Dimension (
Employee_key integer PRIMARY KEY NOT NULL,
Employee_gender varchar(8) ENCODING RLE,
Employee_title varchar(8),
Employee_first_name varchar(64),
Employee_middle_initial varchar(8),
Employee_last_name varchar(64), )
ORDER BY Employee_gender, Employee_last_name, Employee_first_name;
Example
CREATE TABLE time( ..., date_col date NOT NULL, ...)
PARTITION BY extract('year' FROM date_col);
Vertica Objects - SQL functions
• SQL function is Vertica is where you can embed small sql
functions to use it in your SQL statements and it also
facilitates code re-usability.
• Lets you store SQL expressions as functions in Vertica for
use in queries. These functions are useful for executing
complex queries or combining Vertica built-in functions.
You simply call the function name you assigned.
• A SQL function can be used anywhere in a query where an
ordinary SQL expression can be used, except in the table
partition clause or the projection segmentation clause.
Conitnued..
• Syntax
CREATE [ OR REPLACE ] FUNCTION
... [[db-name.]schema.]function-name ( [ argname argtype [, ...] ] )
... RETURN rettype
... AS
... BEGIN
...... RETURN expression;
... END;
•
•
Example
This following statement creates a SQL function called myzeroifnull that accepts an
INTEGER argument and returns an INTEGER result.
CREATE FUNCTION myzeroifnull(x INT) RETURN INT
AS BEGIN
RETURN (CASE WHEN (x IS NOT NULL) THEN x ELSE 0 END);
END;
Conitnued..
You can use the new SQL function (myzeroifnull) anywhere you use an ordinary SQL expression. For example, create a simple table:
=> CREATE TABLE tabwnulls(col1 INT);
=> INSERT INTO tabwnulls VALUES(1);
=> INSERT INTO tabwnulls VALUES(NULL);
=> INSERT INTO tabwnulls VALUES(0);
=> SELECT * FROM tabwnulls;
a
--1
0
(3 rows)
Use the myzeroifnull function in a SELECT statement, where the function calls col1 from table tabwnulls:
=> SELECT myzeroifnull(col1) FROM tabwnulls;
myzeroifnull
-------------1
0
0
(3 rows)
Use the myzeroifnull function in the GROUP BY clause:
=> SELECT COUNT(*) FROM tabwnulls GROUP BY myzeroifnull(col1);
count
------2
1
(2 rows)
If you want to change a SQL function's body, use the CREATE OR REPLACE syntax. The following command modifies the CASE expression:
Vertica Objects - Sequences
•
The sequence is a feature by some database products which just creates unique
values. It just increments a value and returns it. The special thing about it is: there
is no transaction isolation, so several transactions can not get the same value, the
incrementation is also not rolled back. Without a database sequence it is very hard
to generate unique incrementing numbers.
Example
• Create a simple sequence:
CREATE SEQUENCE my_seq MAXVALUE 5000 START 150;
CREATE SEQUENCE
Vertica Operation – DML operations
•
SQL data manipulation language (DML) commands INSERT, UPDATE, and DELETE
perform the same functions in Vertica as they do in row-oriented databases. These
commands follow the SQL-92 transaction model and can be intermixed.
•
DML statements acquire write locks to prevent other READ COMMITTED
transactions from modifying the same data. SELECT statements do not acquire
locks, which prevents read and write statements from conflicting.
•
All the DML operations (Insert, Update, Delete) are similar to other row oriented
databases.
Insert
•
Inserts values into all projections of a table. You must insert one complete tuple at a
time. By default, Insert first uses the WOS. When the WOS is full, the INSERT overflows
to the ROS.
• Syntax
INSERT [ /*+ direct */ ] [ /*+ label(label-name)*/ ]
... INTO [[db-name.]schema.]table
... [ ( column [, ...] ) ]
... { DEFAULT VALUES
... | VALUES ( { expression | DEFAULT } [, ...] )
... | SELECT...
Examples
• => INSERT INTO t1 VALUES (101, 102, 103, 104);
• => INSERT INTO customer VALUES (10, 'male', 'DPR', 'MA', 35);
• => INSERT INTO retail.t1 (C0, C1) VALUES (1, 1001);
• => INSERT INTO films SELECT * FROM tmp_films WHERE date_prod < '2004-05-07';
Update
•
Replaces the values of the specified columns in all rows for which a specific condition is true. All
other columns and rows in the table are unchanged. By default, UPDATE uses the WOS and if the
WOS fills up, overflows to the ROS.
• SelfJoins are not permitted in Updates in Vertica
• UPDATE inserts new records into the WOS and marks the old records for deletion.
• You cannot UPDATE columns that have primary key or foreign key referential integrity constraints.
Syntax
UPDATE [ /*+ direct */ ] [ /*+ label(label-name)*/ ]
... [[db-name.]schema.]table-reference (on page 882) [AS] alias
... SET column =
... { expression | DEFAULT } [ , ... ]
... [ FROM from-list ]
... [ WHERE clause
Example
UPDATE Result_Table r
SET address=n.new_address
FROM New_Addresses n
WHERE r.cust_id = n.new_cust_id;
Delete
•
Unlike most databases, the DELETE command in Vertica does not delete data; it marks records as
deleted.
• Marks tuples as no longer valid in the current epoch, marking the records for deletion in the WOS,
rather than deleting data from disk storage. By default, delete uses the WOS and if the WOS fills up,
overflows to the ROS. You cannot delete records from a projection.
• DELETE statements support subqueries and joins, which is useful for deleting values in a table
based on values that are stored in other tables.
Syntax
DELETE [ /*+ direct */ ] [ /*+ label(label-name)*/ ]
... FROM [[db-name.]schema.]table
... WHERE clause
Example
• The following command truncates a temporary table called temp1:
DELETE FROM temp1;
• The following command deletes all records from base table T where C1 = C2 - C1.
DELETE FROM T WHERE C1=C2-C1;
• The following command deletes all records from the customer table in the retail schema where the
state attribute is in MA or NH:
DELETE FROM retail.customer WHERE state IN ('MA', 'NH');
Vertica Operations – Bulk Data
Loading
• The COPY command is designed for bulk load operations and can
load data into the WOS or the ROS.
• In Vertica, the COPY statement is designed for bulk loading data into
the database. COPY reads data from text files or data pipes and
inserts it into WOS (memory) or directly into the ROS (disk). COPY
automatically commits itself and any current transaction but is not
atomic; some rows could be rejected. Note that COPY does not
automatically commit when copying data into temporary tables.
• You can use the COPY statement's NO COMMIT option to prevent
COPY from committing a transaction when it finishes copying data.
You often want to use this option when sequentially running several
COPY statements to ensure the data in the bulk load is either
committed or rolled back at the same time. Also, combining
multiple smaller data loads into a single transaction allows Vertica
to more efficiently load the data.
Conitnued..
Syntax
COPY [ [db-name.]schema-name.]table
... [ ( { column-as-expression | column }
...... [ FILLER datatype ]
...... [ FORMAT 'format' ]
...... [ ENCLOSED BY 'char' ]
...... [ ESCAPE AS 'char' | NO ESCAPE ]
...... [ NULL [ AS ] 'string' ]
...... [ TRIM 'byte' ]
...... [ DELIMITER [ AS ] 'char' ]
... [, ... ] ) ]
... [ COLUMN OPTION ( column
...... [ FORMAT 'format' ]
...... [ ENCLOSED BY 'char' ]
...... [ ESCAPE AS 'char' | NO ESCAPE ]
...... [ NULL [ AS ] 'string' ]
...... [ DELIMITER [ AS ] 'char' ]
... [, ... ] ) ]
FROM { STDIN
...... [ BZIP | GZIP | UNCOMPRESSED ]
...| 'pathToData' [ ON nodename | ON ANY NODE ]
...... [ BZIP | GZIP | UNCOMPRESSED ] [, ...]
...| LOCAL STDIN | 'pathToData'
...... [ BZIP | GZIP | UNCOMPRESSED ] [, ...]
}
...[ NATIVE | NATIVE VARCHAR | FIXEDWIDTH COLSIZES (integer [, ....]) ]
...[ WITH ]
...[ WITH [ SOURCE source(arg='value')] [ FILTER filter(arg='value') ] [ PARSER parser(arg='value') ]]
...[ DELIMITER [ AS ] 'char' ]
...[ TRAILING NULLCOLS ]
...[ NULL [ AS ] 'string' ]
...[ ESCAPE AS 'char' | NO ESCAPE ]
...[ ENCLOSED BY 'char' ]
...[ RECORD TERMINATOR 'string' ]
...[ SKIP records ]
...[ SKIP BYTES integer ]
...[ TRIM 'byte' ]
...[ REJECTMAX integer ]
...[ EXCEPTIONS 'path' [ ON nodename ] [, ...] ]
...[ REJECTED DATA 'path' [ ON nodename ] [, ...] ]
...[ ENFORCELENGTH ]
...[ ABORT ON ERROR ]
...[ AUTO | DIRECT | TRICKLE ]
Conitnued..
•
Examples
The following examples load data with the COPY statement using the FORMAT, DELIMITER, NULL and ENCLOSED BY
string options, as well as a DIRECT option. =>
COPY public.customer_dimension (customer_since FORMAT 'YYYY')
FROM STDIN
DELIMITER ','
NULL AS 'null'
ENCLOSED BY '"'
=> COPY a
FROM STDIN
DELIMITER ','
NULL E'\\\N'
DIRECT;
=> COPY store.store_dimension
FROM :input_file
DELIMITER '|'
NULL ''
RECORD TERMINATOR E'\f'
Vertica Operations - Merge
•
•
•
•
•
Lets you load a batch of new records while simultaneously updating existing
records by internally combining INSERT and UPDATE SQL operations in one
statement. In a MERGE operation, Vertica replaces the values of the specified
columns in all rows of the target table for which a specific condition is true. All
other columns and rows in the table are unchanged. By default MERGE uses the
WOS, and if the WOS fills up data overflows to the ROS.
When you write a MERGE statement, you specify both a target and source table.
You also provide a search condition (ON clause), which Vertica uses to evaluate
each row in the source table in order to update or insert into the target table
records from the source table.
You can use the optional WHEN MATCHED or WHEN NOT MATCHED clauses to
further refine results. For example, if you use:
WHEN MATCHED THEN UPDATE: Vertica updates (replaces) the values of the
specified columns in all rows when it finds more than one matching row in the
target table for a row in the source table. All other columns and rows in the table
are unchanged. If Vertica finds more than one matching row in the source table for
a row in the target table, it returns a run-time error.
WHEN NOT MATCHED THEN INSERT: Vertica inserts into the target table all rows
from the source table that do not match any rows in the target table.
Continued..
• Syntax
MERGE [/*+ direct */] INTO [[db-name.]schema.]target-table-name [alias]
... USING [[db-name.]schema.]source-table-name [alias ] ON ( condition )
... [ WHEN MATCHED THEN UPDATE SET column1 = value1 [, column2 = value2 ... ] ]
... [ WHEN NOT MATCHED THEN INSERT ( column1 [, column2 ...])
VALUES ( value1 [, value2 ... ] ) ]
Vertica Operation -Exporting Data
• Vertica supports spooling of data as in Oracle where you
can retrieve all the results of a select query into a file as
specified.
• \o parameter start the spooling in Vertica after this option
any thing you select in the database will be redirected onto
the file and not on the standard output.
• Again \o will end spooling for that session
Syntax
\o /home/tmp/output.txt
Select * from store.store_sales_fact ;
\o
Comparision with Teradata
• In Teradata there is a provision of Control Flow of
statements ( IF, GOTO) , Vertica does not support
these statements
• Update query in Vertica is different then in
Teradata
• There is no provision for Qualify function in
Vertica as in Teradata
• Volatile tables in Teradata, Local Temporary tables
in Vertica.
• Update queries cannot have a self join in Vertica
as in Teradata
Conitnued..
• There are no templates/utilites in vertica for
loading,exporting data as in Teradata (
FastLoad,Fastexport), Vertica uses the vsql
terminal for all its operations
Thankyou