Greenplum Database


Greenplum Database Fundamentals Training
Wei.Li
Senior Data Architect
Alpine Solution
2010/05
Greenplum Database
• Massively Parallel Processing (MPP) DBMS
• Based on PostgreSQL 8.2
• Same client functionality
• Adds technology for parallel processing
• Adds data warehousing and BI features:
– External tables / parallel loading
– Resource management
– Query optimizer enhancements
S1002 Network Configuration
[Diagram: a Catalyst 2960 switch joins the local LAN/WAN (192.168.x, as locally defined) to the X4200 master host, an X4500 segment host 1 (also the standby master), and an X4500 segment host 2. Each host exposes an iLOM port and network interfaces e1000g0-e1000g5; two SMC 8748M interconnect switches carry the private subnets 172.16.0-172.16.3.]
Greenplum Basic Architecture
[Diagram: clients on the LAN connect to the master host, which communicates with multiple segment hosts over the interconnect, a Gigabit Ethernet switch.]
Client Interfaces and Programs
• psql
• pgAdmin III
• ODBC/DataDirect
• JDBC
• Perl DBI
• Python
• libpq
• OLE DB
Master Host
• Entry point into the system
• Database listener process (postgres)
• Handles all user connections
• Builds query plans
• Coordinates query execution
• Management tools
• Holds the system catalog tables and metadata (data dictionary)
• Stores no user data

Segment Host
• Each segment stores a portion of the user data
• A system can have many segments
• Users cannot access segments directly; all access to segments goes through the master
• A database listener process (postgres) accepts connections from the master
Interconnect - Gigabit Ethernet Switch
• The connection layer between Greenplum database instances
• Inter-process coordination and management
• Built on Gigabit Ethernet
• Configured as a private internal network
• Supports two protocols: TCP or UDP
Greenplum High-Availability Architecture
[Diagram: clients connect over Gigabit Ethernet (private LAN) to the master host running the master instance; a synch process keeps a standby master current on the standby master host. Each segment host runs primary segments plus mirror segments of primaries hosted elsewhere.]
Master/Standby Mirror Protection
[Diagram: a synchronization process replicates the system catalogs and transaction logs from the primary master host to the standby master host.]
• The standby node provides master service when the master node fails
• The standby stays synchronized with the master's catalog and transaction logs in real time
Data Redundancy - Segment Mirror Protection
[Diagram: the Greenplum master holds the global catalog; segment hosts 1..n each hold primary segments (Segment 1..n) plus mirror segments of primaries hosted on other hosts.]
• Each segment's data is stored redundantly on another segment and synchronized in real time
• When a primary segment fails, its mirror segment takes over automatically
• After the primary segment is back to normal, run gprecoverseg -F to resynchronize the data
Table Distribution Policy - the Basis of Parallel Computing
• Hash distribution
– CREATE TABLE … DISTRIBUTED BY (column [, …])
– Rows with the same value are assigned to the same segment
[Diagram: the master hashes each input row (A, B, C, …) and assigns it to a segment; all C rows land on S1, all A rows on S2, all B rows on S3.]
• Round-robin distribution
– CREATE TABLE … DISTRIBUTED RANDOMLY
– Rows with the same value are not necessarily on the same segment
(See the sketch after the diagram below.)
[Diagram: the master assigns each input row to the next segment in turn, so equal values scatter across S1, S2, and S3.]
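As a minimal sketch of the two policies (the table and column names here are illustrative, not from the slides):

-- Hash distribution: rows with the same cust_id always land on the same
-- segment, so joins and aggregations on cust_id can run segment-locally.
CREATE TABLE sales_hash (
    cust_id integer,
    amount  numeric(11,2),
    dt      date
) DISTRIBUTED BY (cust_id);

-- Round-robin distribution: rows spread evenly, with no co-location guarantee.
CREATE TABLE sales_rand (LIKE sales_hash) DISTRIBUTED RANDOMLY;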
Query Execution - Distributed Storage
[Diagram: the master holds the global catalog; each of segments 1-3 holds one part of the sale, customer, product, and vendor tables. Column definitions shown: sale (cn integer, vn integer, pn integer, dt date, qty integer, prc float), customer (cn integer, cname text), vendor (vn integer, vname text, loc text), product (pn integer, pname text).]
• Worked discussion: pros and cons of distributing by card number, customer number, or branch
Query Execution
[Diagram: the client submits a query to the master, which dispatches the query plan to all segments.]
SQL Query Processing Mechanism
[Diagram: on the master, a QD (query dispatcher) process runs slice 3; on segment1 and segment2, gangs of QE (query executor) processes run slice 1 and slice 2 in parallel.]
Parallel Query Plan
SELECT customer, amount FROM sales JOIN customer USING (cust_id) WHERE date='2008-04-30';
[Diagram: on each of segment 1 and segment 2 the plan runs three slices. Slices 1 and 2 each do a table scan feeding a hash or a redistribute motion; slice 3 performs the hash join, and a gather motion returns the result to the master.]
Table Partitioning Concepts
• Logically divides a large table into parts; a query that filters on the partition condition scans less data, improving system performance
• Speeds up queries against specific classes of data
• Also makes database maintenance and updates easier
• Two types:
– Range partitioning (date or numeric ranges), e.g. by date or price
– List partitioning, e.g. by region or product
• In Greenplum, a partitioned table uses inheritance from the parent table, with CHECK constraints identifying each child table
• Each child partition is still distributed across the segments according to the distribution policy
• Partitioning is a very useful optimization: with a year of transactions partitioned by trade date, a query over a single day scans roughly 1/365 of the data
(A DDL sketch follows.)
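A minimal range-partitioning sketch under the conventions above; the orders table and its monthly ranges are illustrative, not from the slides:

CREATE TABLE orders (
    order_id   integer,
    order_date date,
    amount     numeric(11,2)
) DISTRIBUTED BY (order_id)
PARTITION BY RANGE (order_date)
(
    START (date '2005-01-01') INCLUSIVE
    END   (date '2006-01-01') EXCLUSIVE
    EVERY (INTERVAL '1 month')   -- one child table per month
);

A query filtering on order_date then scans only the matching child partitions.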
Data Distribution & Partitioning
• Table partitioning narrows the range of data searched, improving query performance
• Each partition's data is spread evenly across the nodes
[Diagram: monthly partitions Jan 2005 - Dec 2005 spread across segments 1A-3D.]
Full Table Scan vs. Partition Pruning
SELECT COUNT(*)
FROM orders
WHERE order_date >= 'Oct 20 2005'
AND order_date < 'Oct 27 2005'
[Diagram: with hash distribution alone the query scans every segment (1A-3D); with hash distribution plus table partitioning it scans only the pruned partitions on each segment.]
Table Partitioning Diagram
Compressed Storage
• Supports ZLIB and QuickLZ compression; compression ratios up to 10:1
• Compressed tables must be append-only
• Compression does not necessarily reduce performance: it spends CPU to cut I/O
• Syntax:
CREATE TABLE foo (a int, b text)
WITH (appendonly=true, compresstype=zlib,
compresslevel=5);
Row and Column Storage
• Greenplum supports row- or column-oriented storage
• Column orientation currently requires append-only tables
• If typical queries read only a few of a table's columns, column orientation is more efficient; if they read most columns, row orientation is more efficient
• Syntax:
CREATE TABLE sales2 (LIKE sales) WITH (appendonly=true,
orientation=column);
• Timing comparison:
– Test 1: reading all columns of the table; row storage is faster.
select * from dw_ods.s1_sema_scmcaccp_row where crdacct_nbr='4033930000166380411';   -- 41 s
select * from dw_ods.s1_sema_scmcaccp_col where crdacct_nbr='4033930000166380411';   -- 116 s
– Test 2: reading only a few columns; column storage is faster.
select crdacct_status from dw_ods.s1_sema_scmcaccp_row where crdacct_nbr='4033930000166380411';   -- 35 s
select crdacct_status from dw_ods.s1_sema_scmcaccp_col where crdacct_nbr='4033930000166380411';   -- 3 s
External Table Loading
• External table characteristics:
– Read-only
– Data resides outside the database
– Supports SELECT, JOIN, SORT, etc., just like operations on a regular table
• External table advantages:
– Parallel loading
– ETL flexibility
– Error tolerance for malformed rows
– Multiple data sources
• Two kinds:
– External tables: file-based
– Web tables: based on URLs or commands
High-Speed Data Loading via External Tables
• Using its parallel dataflow engine, Greenplum can operate on external tables directly with SQL
• Loading is fully parallel
[Diagram: gpfdist processes on the ETL servers serve external table files over the internal network; the master host and the segment hosts pull the data in parallel through the Gigabit Ethernet switch.]
Characteristics of External Table Loading
• Parallel loading gives the best performance
• Can process remotely stored files
• Uses the HTTP protocol
• 200 MB/s data distribution rate per gpfdist
• Starting the gpfdist file distribution daemon:
gpfdist -d /var/load_files/expenses -p 8080 -l /home/gpadmin/log &
• External table definition:
CREATE EXTERNAL TABLE ext_expenses
( name text, date date, amount float4, description text )
LOCATION ('gpfdist://etlhost:8081/*', 'gpfdist://etlhost:8082/*')
FORMAT 'TEXT' (DELIMITER '|')
ENCODING 'UTF-8'
LOG ERRORS INTO ext_expenses_loaderrors
SEGMENT REJECT LIMIT 10000 ROWS;
(A hedged load example follows.)
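To run an actual load through such a definition, the usual pattern is an INSERT … SELECT into a regular table; the target table expenses below is a hypothetical example:

CREATE TABLE expenses (
    name        text,
    date        date,
    amount      float4,
    description text
) DISTRIBUTED BY (name);

-- Fully parallel: every segment pulls its share of the files
-- from the gpfdist daemons at the same time.
INSERT INTO expenses SELECT * FROM ext_expenses;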
External Table Load Error Handling
Load good rows and catch poorly formatted rows, such as:
– rows with missing or extra attributes
– rows with attributes of the wrong data type
– rows with invalid client encoding sequences
Does not apply to constraint errors:
– PRIMARY KEY, NOT NULL, CHECK or UNIQUE constraints
Optional error handling clause for external tables:
[LOG ERRORS INTO error_table] SEGMENT REJECT LIMIT count [ROWS | PERCENT]
( PERCENT based on gp_reject_percent_threshold parameter )
Example
CREATE EXTERNAL TABLE ext_customer
(id int, name text, sponsor text)
LOCATION ( 'gpfdist://filehost:8081/*.txt' )
FORMAT 'TEXT' ( DELIMITER '|' NULL ' ')
LOG ERRORS INTO err_customer SEGMENT REJECT LIMIT 5 ROWS;
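Rejected rows can then be inspected in the error table. A sketch, assuming the error-table columns Greenplum creates for LOG ERRORS INTO (verify against your version's documentation):

-- Each rejected input line is stored together with the reason for rejection.
SELECT cmdtime, filename, linenum, errmsg, rawdata
FROM err_customer
ORDER BY cmdtime DESC;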
Static Statistics for External Tables
 Data resides outside the database
 No database statistics for external table data
 Not meant for frequent or ad-hoc access
 Can manually set rough statistics in pg_class:
– UPDATE pg_class
SET reltuples=400000, relpages=400
WHERE relname='myexttable';
COPY SQL Command
• PostgreSQL command
• Supports loading and unloading
• Optimized for loading a large number of rows
• Loads all rows in one command (not parallel)
• Loads data from a file or from standard input
• Supports error handling, as do external tables
• EXAMPLE
COPY mytable FROM '/data/myfile.csv' WITH CSV HEADER;   (file resides on the master)
\COPY mytable FROM '/data/myfile.csv' WITH CSV HEADER;  (file resides on the client)
COPY country FROM '/data/gpdb/country_data'
WITH DELIMITER '|' LOG ERRORS INTO err_country
SEGMENT REJECT LIMIT 10 ROWS;
Data Loading Performance Tips
• Drop indexes and recreate them after the load
• Increase the maintenance_work_mem parameter to speed up CREATE INDEX operations
• Run ANALYZE after the load
• Run VACUUM after load errors, deletes, updates, etc.
• Do not use ODBC INSERT to load large volumes of data
Workload Management
• Limits the number of active queries per queue
• Prevents system overload (CPU, disk I/O, memory)
[Diagram: statements from roles wait in a resource queue; only a limited number become active statements at a time.]
Two Typical Resource Queue Management Modes
• Resource queue limits
– ACTIVE THRESHOLD
EXAMPLE: CREATE RESOURCE QUEUE adhoc ACTIVE THRESHOLD 10 IGNORE THRESHOLD 1000.0;
– COST THRESHOLD
EXAMPLES: CREATE RESOURCE QUEUE batch1 COST THRESHOLD 1000000.0 NOOVERCOMMIT;
CREATE RESOURCE QUEUE batch1 COST THRESHOLD 1e+6;
(A role-assignment sketch follows.)
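A queue takes effect only once roles are attached to it. A minimal sketch; the role name is illustrative:

CREATE ROLE etl_user LOGIN;
-- Statements issued by etl_user are now admitted through the batch1 queue.
ALTER ROLE etl_user RESOURCE QUEUE batch1;
-- Revert the role to the default queue.
ALTER ROLE etl_user RESOURCE QUEUE NONE;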
Greenplum Performance Monitor
• Highly interactive web-based performance monitoring
• Real-time and historic views of:
– resource utilization
– queries and query internals
[Screens: Dashboard, System Metrics, Query Monitor]
Backups and Restores
• Parallel backups (gp_dump)
• Parallel restores (gp_restore)
• Automated dumps/restores (gpcrondump, gpdbrestore)
• Non-parallel backups and restores (pg_dump/pg_restore/psql)

Parallel backup and restore (gp_dump/gp_restore)
• Migrates schemas, data, and functions between homogeneous environments
Serial backup and restore (pg_dump/pg_restore)
• Migrates schemas, data, and functions between heterogeneous environments
Running a Parallel Backup (gp_dump)
• Each active segment is dumped in parallel
• Dump files are created in the segment data directory by default
• Supports compression (gzip)
• Ensure sufficient disk space for dump files
• A backup set is identified by a timestamp key
Examples:
– Back up a database:
gp_dump gpdb
– Back up a database, and create dump files in a centralized location on all hosts:
gp_dump --gp-d=/home/gpadmin/backups gpdb
– Back up a particular schema only:
gp_dump -n myschema mydatabase
– Back up a single segment instance only (by noting the dbid of the segment instance):
gp_dump --gp-s=i[5] gpdb
Dump Files Created by gp_dump
On the master host:
gp_catalog_1_<dbid>_<timestamp>
gp_cdatabase_1_<dbid>_<timestamp>
gp_dump_1_<dbid>_<timestamp>
gp_dump_status_1_<dbid>_<timestamp>
On the segment hosts:
gp_dump_0_<dbid>_<timestamp>
gp_dump_status_0_<dbid>_<timestamp>
Running a Parallel Restore (gp_restore)
• Use the gp_restore command
• Needs the timestamp key from gp_dump
• Make sure dump files are placed on the correct segment hosts
• Make sure the database exists before restoring
• Database-level server configuration settings are not restored
Examples:
– Restore a Greenplum database using backup files created by gp_dump:
gp_restore --gp-k=2005103112453 -d gpdb
– Restore a single segment instance only (by noting the dbid of the segment instance):
gp_restore --gp-k=2005103112453 -d gpdb --gp-s=i[5]
Non-Parallel Backups and Restores
• Greenplum also supports pg_dump and pg_restore
• Useful for migrating data to/from other DBMSs
• pg_dump creates a single dump file
– Can be slow on very large databases
– Run at low-usage times
– Supports compression
– Can dump data as INSERT or COPY commands
– The --gp-syntax option includes DISTRIBUTED BY clauses in the DDL
Non-Parallel Backup and Restore Examples
• Dump a database called mydb into a SQL-script file:
pg_dump mydb > db.sql
• Reload such a script into a (freshly created) database named newdb:
psql -d newdb -f db.sql
• Dump a Greenplum database in tar file format and include distribution policy information:
pg_dump -Ft --gp-syntax mydb > db.tar
• Dump a database into a custom-format archive file:
pg_dump -Fc mydb > db.dump
• Reload an archive file into a (freshly created) database named newdb:
pg_restore -d newdb db.dump
• Dump a single table named mytab:
pg_dump -t mytab mydb > db.sql
• To specify an upper-case or mixed-case name in -t and related switches, double-quote the name; otherwise it is folded to lower case. Double quotes are special to the shell, so they in turn must be quoted. Thus, to dump a single table with a mixed-case name:
pg_dump -t '"MixedCaseName"' mydb > mytab.sql
Client Tools
• pgAdmin3
– Graphical management tool with SQL execution, analysis, and monitoring
• psql
– Command-line operation and management tool

pgAdmin3 for GPDB
• pgAdmin3 is the leading graphical open-source management, development and administration tool for PostgreSQL
• Greenplum has contributed extensive GPDB-specific enhancements
• With GPDB 3.3, Greenplum ships and supports this tool
• Monitors active sessions, equivalent to SQL: select * from pg_stat_activity;
• Monitors locks, reading from pg_locks
• Can cancel a running SQL statement
psql
• Connect through the master
• Connection information:
– database name (-d | PGDATABASE)
– master host name (-h | PGHOST)
– master port (-p | PGPORT)
– user name (-U | PGUSER)
• First-time connections:
– template1 database
– default superuser account (gpadmin)

Issuing SQL Statements
• Interactive mode:
psql mydatabase
mydatabase=# SELECT * FROM foo;
• Non-interactive mode (single command):
psql mydatabase -ac "SELECT * FROM foo;"
• Non-interactive mode (multiple commands):
psql mydatabase -af /home/lab1/sql/createdb.sql
(Use a semicolon (;) to denote the end of a statement.)
Common psql Meta-Commands
\? (help on psql meta-commands)
\h (help on SQL command syntax)
\dt (show tables)
\dtS (show system tables)
\dg or \du (show roles)
\l (show databases)
\c db_name (connect to this database)
\q (quit psql)
\! (enter shell mode)
\df (show functions)
\dn (show schemas)
SET search_path = … (set the schema search path)
\timing (toggle timing of commands)
Local, Global, and Master-Only Parameters (postgresql.conf)
[Diagram: one postgresql.conf on the master and a local copy on every segment instance.]
• See the Admin Guide for the parameter reference
• Important parameters: max_connections, shared_buffers, work_mem, …
• Changing a local parameter such as max_stack_depth requires changing the value on every segment
postgresql.conf file
• Located in the master or segment instance's data directory
• Used to set configuration parameters at the system level
• Parameters using the default setting are commented out (#)
• Requires a restart (or reload using gpstop -u) for changes to take effect
Viewing Parameter Settings
• View a specific parameter setting
Example: SHOW search_path;
• View all parameter settings
Example: SHOW ALL;
• Set a parameter
Example: SET search_path = public;
SET client_encoding = gb18030;
Configuring Host-Based Authentication
• Client authorization:
– whether a connection from a given client is allowed
– whether the user may connect to the requested database
• pg_hba.conf file:
– controls access based on host address, database, and/or DB user account
– located in the master and segment instances' data directories
– given a default configuration at system initialization
Default Master Host pg_hba.conf
• Local connections allowed for the Greenplum superuser
• Remote connections not allowed
• EXAMPLE
# TYPE  DATABASE  USER     CIDR-ADDRESS  METHOD
local   all       gpadmin                ident sameuser
local   all       all                    ident sameuser
host    all       gpadmin  127.0.0.0/0   trust
Configuring pg_hba.conf
• EXAMPLE
# TYPE  DATABASE  USER     CIDR-ADDRESS      METHOD
local   all       all                        trust
host    all       all      127.0.0.1/32      trust
host    all       all      ::1/128           trust
host    all       gpadmin  10.0.0.206/32     trust
host    carddw    etl      21.104.138.12/32  md5
host    all       all      21.104.138.0/24   md5
• gpstop -u applies the settings without restarting the database
SQL Syntax
• For details, see the Greenplum SQL language reference
• Caveats:
– When DELETE or UPDATE joins two tables, the two tables' distributions must match, e.g.:
DELETE FROM tablea USING tableb WHERE tablea.id = tableb.id;
UPDATE tablea AS a
SET "desc" = b."desc"
FROM tableb AS b
WHERE a.id = b.id;
– For the statements above, tablea and tableb must use the same distribution; if necessary, change it with ALTER TABLE … SET DISTRIBUTED BY (a sketch follows).
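A minimal sketch of aligning the distribution before such an UPDATE; the table and column names are illustrative:

-- Physically redistributes tableb's rows so it shares tablea's
-- distribution key, letting the UPDATE join run segment-locally.
ALTER TABLE tableb SET DISTRIBUTED BY (id);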
Data Types
• Common data types:
– CHAR, VARCHAR, TEXT
– smallint, integer, bigint
– numeric, real, double precision
– timestamp, date, time
– boolean
– array types, e.g. integer[]
• For other data types, see the reference documentation
System Catalog Tables and Views
• All system catalogs are in the pg_catalog schema
• Standard PostgreSQL system catalogs (pg_*)
• Greenplum-specific catalogs:
– gp_configuration
– gp_distribution_policy
– gp_id
– gp_version_at_initdb
– pg_resqueue
– pg_exttable
– pg_tables
– pg_class
– pg_stat_activity
• To list all system catalog tables in psql: \dtS
• To list all system views in psql: \dvS
• See the catalog reference for the rest
Functions
• Date functions:
extract(day | month | year … FROM date)
SELECT date + '1 day'::interval, date + '1 month'::interval;
SELECT date_part('day', TIMESTAMP '2001-02-16 20:38:40');    -- Result: 16
SELECT date_trunc('hour', TIMESTAMP '2001-02-16 20:38:40');  -- Result: 2001-02-16 20:00:00
pg_sleep(seconds)
• System date variables:
current_date
current_time
current_timestamp
now()
timeofday() changes within a transaction; the functions above stay fixed for the whole transaction
Functions
• String functions:
substr / length / lower / upper / trim / replace / position
rpad / lpad
to_char
|| (string concatenation)
substring(string FROM pattern)
~ / ~*, LIKE, SIMILAR TO (pattern matching)
• Miscellaneous:
CASE … WHEN / coalesce
nullif
generate_series
IN / NOT IN / EXISTS / ANY / ALL
• See the function reference for the full list
Stored Procedures
• Greenplum supports functions written in SQL, Python, Perl, and C; the following focuses on SQL stored procedures (PL/pgSQL)
• A stored procedure is one transaction; calls to sub-procedures run inside the same transaction
• Procedure structure:
CREATE FUNCTION somefunc() RETURNS integer AS $$
DECLARE
    quantity integer := 30;
BEGIN
    RETURN ....;
END;
$$ LANGUAGE plpgsql;
See the stored procedure reference for details.
• Assignment
Assign to a variable or row/record with: identifier := expression
Example: user_id := 20;
• Executing a query with no result: PERFORM query;
Example:
PERFORM create_mv('cs_session_page_requests_mv', my_query);
Stored Procedures
• Dynamic SQL:
EXECUTE command-string [INTO [STRICT] target];
• SELECT INTO
Example: SELECT id INTO var_id FROM tablea;
• Obtaining the result status:
GET DIAGNOSTICS variable = item [, ...];
Example:
GET DIAGNOSTICS integer_var = ROW_COUNT;
• SQL status variables:
SQLERRM, SQLSTATE
• Control structures:
IF ... THEN ... ELSEIF ... THEN ... ELSE
LOOP, EXIT, CONTINUE, WHILE, FOR
• Returning from a function:
Two commands return data from a function: RETURN and RETURN NEXT.
Syntax: RETURN expression;
• Setting callbacks (embedded SQL):
EXEC SQL WHENEVER condition action;
where condition is one of: SQLERROR, SQLWARNING, NOT FOUND
Stored Procedures
• Exception handling:
EXCEPTION WHEN unique_violation THEN
    -- do nothing
END;
Ignoring errors:
EXCEPTION WHEN OTHERS THEN
    RAISE NOTICE 'an EXCEPTION is about to be raised';
    RAISE EXCEPTION 'NUM:%, DETAILS:%', SQLSTATE, SQLERRM;
END;
• Errors and messages:
RAISE level 'format' [, expression [, ...]];
Levels:
INFO: informational output
NOTICE: a hint message
EXCEPTION: raises an error and aborts the stored procedure
Example: RAISE NOTICE 'Calling cs_create_job(%)', v_job_id;
(A complete sketch follows.)
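Pulling these pieces together, a minimal PL/pgSQL sketch; the counters table and all names are illustrative, not from the slides:

CREATE FUNCTION upsert_count(p_id integer) RETURNS integer AS $$
DECLARE
    v_rows integer;
BEGIN
    UPDATE counters SET hits = hits + 1 WHERE id = p_id;
    GET DIAGNOSTICS v_rows = ROW_COUNT;   -- result status of the UPDATE
    IF v_rows = 0 THEN
        INSERT INTO counters (id, hits) VALUES (p_id, 1);
    END IF;
    RETURN v_rows;
EXCEPTION WHEN unique_violation THEN
    -- A concurrent session inserted the same id first.
    RAISE NOTICE 'concurrent insert for id %', p_id;
    RETURN 0;
END;
$$ LANGUAGE plpgsql;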
Performance Tuning
• Introduction to performance tuning
• Common performance problems
• Tracking down a performance problem
• Query profiling (EXPLAIN, EXPLAIN ANALYZE)
• Query tuning
• Notes from experience
Approaching a Performance Tuning Initiative
• Set performance expectations
– acceptable response times, queries per minute, etc.
– benchmarks
• Know your baseline hardware performance
– throughput / capacity
• Know your workload
– heavy usage times
– resource contention
– data contention
• Focus your optimizations
Common Causes of Performance Issues
• Hardware issues / failed segments
• Resource allocation
• Contention between concurrent workloads
• Inaccurate database statistics
• Uneven data distribution
• SQL formulation
• Database design
Hardware Issues
• Disk failures
• Host failures
• Network failures
• OS not tuned for Greenplum
• Disk capacity
– 70% maximum recommended
– VACUUM after updates, deletes and loads
• VACUUM configuration parameters
– max_fsm_relations = tables + indexes + system tables
– max_fsm_pages = 16 * max_fsm_relations
Resource Allocation and Contention
• Greenplum resource queues
– limit active queries in the system
– limit the size of a query a particular user can run
• Perform admin tasks at low-usage times
– data loading, ETL
– VACUUM, ANALYZE
– backups
• Design applications to prevent lock conflicts
– concurrent sessions should not update the same data at the same time
• Resource-related configuration parameters:
work_mem = 32MB
maintenance_work_mem = 64MB
shared_buffers = 125MB
Database Statistics (ANALYZE)
• Database statistics are used by the query planner
• Run ANALYZE after:
– data loads
– restores from backups
– changes to schema (adding indexes, etc.)
– inserts, updates, or deletes
• Statistics collection is configurable:
– default_statistics_target = 25
– gp_analyze_relative_error = .25
– on specific table columns:
ALTER TABLE name ALTER column SET STATISTICS #
Greenplum Data Distribution
• Table distribution key considerations:
– even data distribution
– local vs. distributed operations
– even query processing
• Checking for data skew:
gpskew -t schema.table
Unix system utilities via gpssh: gpssh -f seg_hosts => sar 1 100
(A SQL-level skew check follows.)
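A common SQL-level check is to count rows per segment; gp_segment_id is the system column identifying each row's segment (the table name is illustrative):

-- Even distribution shows roughly equal counts across segments.
SELECT gp_segment_id, count(*) AS rows_on_segment
FROM myschema.mytable
GROUP BY gp_segment_id
ORDER BY gp_segment_id;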
Rebalancing a Table
CREATE TABLE sales_temp (LIKE sales) DISTRIBUTED BY (date, total, customer);
INSERT INTO sales_temp SELECT * FROM sales;
DROP TABLE sales;
ALTER TABLE sales_temp RENAME TO sales;
Database Design
• Data type selection
– smallest size possible to fit the data, e.g. INTEGER, not NUMERIC(11,2), for integer fields
– same data type across tables for join columns
– consider hashing wide join keys, using BYTEA instead of CHAR(100)
– varchar or text for character data
• Denormalization (star schema)
• Table partitioning
Database Design - Indexes
• Use sparingly in Greenplum Database
• Try the workload without indexes first
• Can improve performance of OLTP-type workloads
• Other index considerations:
– avoid indexes on frequently updated columns
– avoid overlapping indexes
– use bitmap indexes where applicable instead of B-tree
– drop indexes for loads
– consider a clustered index
• Configuring index usage: enable_indexscan = on | off
SQL Formulation
• General considerations:
– know your data
– minimize returned rows
– avoid unnecessary columns/tables in the result set
– avoid large sorts if possible
– match data types in joins
• Greenplum-specific considerations:
– join on common distribution key columns when possible
– consider the data distribution policy and query predicates
Tracking Down a Performance Problem
• System catalog tables and views:
– pg_stat_activity
– pg_locks / pg_class
• Database logs
– located in the master (and segment) data directories
• UNIX system utilities (gpssh)
Query Profiling
• Examine query plans to identify tuning opportunities
• What to look for:
– plan operations that are taking exceptionally long
– are the planner's estimates close to reality? (EXPLAIN ANALYZE)
– is the planner applying selective predicates early?
– is the planner choosing the best join order?
– is the planner selectively scanning partitioned tables?
– is the planner choosing hash aggregate and hash join operations where applicable?
– is there sufficient work memory?
The Query Process
[Diagram: on the master, PARSE, REWRITE, PLAN, DISPATCH; on the segments, EXECUTE.]
Viewing the Query Plan
• To see the plan for a query:
EXPLAIN <query>
EXPLAIN ANALYZE <query>
• Query plans are read from bottom to top:
– motions (Gather, Redistribute, Broadcast)
– joins, sorts, aggregations
– table scans
• The following metrics are given for each operation:
– cost (units of disk page fetches)
– rows (rows output by this node)
– width (bytes of the rows produced by this node)
Reading EXPLAIN Output
EXPLAIN SELECT * FROM names WHERE name = 'Joelle';
                          QUERY PLAN
------------------------------------------------------------------
 Gather Motion 2:1 (slice1)  (cost=0.00..20.88 rows=1 width=13)
   ->  Seq Scan on names  (cost=0.00..20.88 rows=1 width=13)
         Filter: name::text ~~ 'Joelle'::text
Reading EXPLAIN ANALYZE Output
EXPLAIN ANALYZE SELECT * FROM names WHERE name = 'Joelle';
                          QUERY PLAN
------------------------------------------------------------------
 Gather Motion 2:1 (slice1)  (cost=0.00..20.88 rows=1 width=13)
   recv: Total 1 rows with 0.305 ms to first row, 0.537 ms to end.
   ->  Seq Scan on names  (cost=0.00..20.88 rows=1 width=13)
         Total 1 rows (seg0) with 0.255 ms to first row, 0.486 ms to end.
         Filter: name::text ~~ 'Joelle'::text
 22.548 ms elapsed
Transaction Concurrency Control
• MVCC (multi-version concurrency control)
• Greenplum supports all transaction isolation levels defined in the SQL standard
• INSERT/COPY acquire locks at the row level
• UPDATE/DELETE acquire locks at the table level
• The LOCK command acquires specific locks explicitly (see the sketch below)
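A minimal sketch of an explicit lock; the table name is illustrative, and the lock is held until the transaction ends:

BEGIN;
-- Blocks concurrent UPDATE/DELETE on sales while we work.
LOCK TABLE sales IN EXCLUSIVE MODE;
UPDATE sales SET qty = 0 WHERE dt < '2005-01-01';
COMMIT;  -- lock released here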
Table-Level Lock Modes
• ACCESS SHARE (SELECT, ANALYZE)
• ROW SHARE (SELECT FOR UPDATE, SELECT FOR SHARE)
• ROW EXCLUSIVE (INSERT, COPY)
• SHARE UPDATE EXCLUSIVE (VACUUM)
• SHARE (CREATE INDEX)
• SHARE ROW EXCLUSIVE
• EXCLUSIVE (UPDATE/DELETE)
• ACCESS EXCLUSIVE (ALTER TABLE, DROP TABLE, REINDEX, CLUSTER, and VACUUM FULL)
Checking for Lock Conflicts
• Lock conflicts are caused by:
– concurrent transactions accessing the same object
– resource queue locks
– transaction deadlocks between segments (rare)
• Query the pg_locks system table to see current locks
EXAMPLE:
SELECT locktype, database, c.relname, l.relation, l.transactionid,
       l.transaction, l.pid, l.mode, l.granted, a.current_query
FROM pg_locks l, pg_class c, pg_stat_activity a
WHERE l.relation = c.oid AND l.pid = a.procpid
ORDER BY c.relname;
About Transactions in Greenplum Database
• Transactions bundle multiple statements into one all-or-nothing operation
• Transaction commands:
– BEGIN or START TRANSACTION
– END or COMMIT
– ROLLBACK
– SAVEPOINT and ROLLBACK TO SAVEPOINT
• Autocommit mode in psql: \set autocommit on|off
• Two-phase commit transactions are not supported:
– PREPARE TRANSACTION
– COMMIT PREPARED
– ROLLBACK PREPARED
(A short sketch follows.)
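A short sketch of the commands above against the sale table used elsewhere in these slides (the values are illustrative):

BEGIN;
INSERT INTO sale VALUES (4, 10, 100, '1401-07-01', 1, 9.99);
SAVEPOINT before_fix;
UPDATE sale SET prc = 0 WHERE cn = 4;
ROLLBACK TO SAVEPOINT before_fix;  -- undoes only the UPDATE
COMMIT;                            -- the INSERT becomes durable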
Database Internals
• Physical storage
• Server processes

Statistics Collector
• Collects information about database activity
• Server configuration parameters:
stats_start_collector = on
stats_block_level = off
stats_row_level = off
stats_queue_level = off
stats_command_string = on
• To see statistics views and tables in the catalog: \dtvS pg_stat*
Physical Storage - Data Directory File Structure
[Diagram: the data directory tree of an instance, named after the segment]
<data directory>/
  base/          (one subdirectory per database: template1, template0, postgres, user databases; table files Table 1 .. Table N inside)
  global/        (cluster-wide tables)
  pg_clog/
  pg_dtm/
  pg_multixact/
  pg_subtrans/
  pg_tblspc/
  pg_twophase/
  pg_xlog/
Server Processes
• Linux: ps ax | grep postgres
• Solaris: ps -ef | grep postgres
  pargs <process_id>
• Greenplum master instance:
– postgres database listener process
– postgres: <sub_process_name>
– postgres: seqserver process
– postgres: <user> <database> <con#> <host> <cmd#> <slice#>
• Greenplum segment instance:
– postgres database listener process
– postgres: <sub_process_name>
Appendix
OLAP Windowing Extensions
• About window functions
• Constructing a window specification
– OVER clause
– WINDOW clause
• Built-in window functions

About Window Functions
• A new class of function allowed only in the SELECT list
• Returns a value per row (unlike aggregate functions)
• Results are interpreted in terms of the current row and its corresponding window partition or frame
• Characterized by the use of the OVER clause, which:
– defines the window partitions (groups of rows) the function applies to
– defines the ordering of data within a window
– defines the positional or logical framing of a row with respect to its window
Defining Window Specifications (OVER Clause)
• All window functions have an OVER() clause that specifies the 'window' of data to which the function applies
• It defines:
– window partitions (PARTITION BY clause)
– ordering within a window partition (ORDER BY clause)
– framing within a window partition (ROWS/RANGE clauses)
OVER (PARTITION BY…) Example
SELECT *, row_number() OVER()
FROM sale
ORDER BY cn;

 row_number | cn | vn | pn  |     dt     | qty  | prc
------------+----+----+-----+------------+------+------
          1 |  1 | 10 | 200 | 1401-03-01 |    1 |    0
          2 |  1 | 30 | 300 | 1401-05-02 |    1 |    0
          3 |  1 | 50 | 400 | 1401-06-01 |    1 |    0
          4 |  1 | 30 | 500 | 1401-06-01 |   12 |    5
          5 |  1 | 20 | 100 | 1401-05-01 |    1 |    0
          6 |  2 | 50 | 400 | 1401-06-01 |    1 |    0
          7 |  2 | 40 | 100 | 1401-01-01 | 1100 | 2400
          8 |  3 | 40 | 200 | 1401-04-01 |    1 |    0
(8 rows)

SELECT *, row_number() OVER(PARTITION BY cn)
FROM sale
ORDER BY cn;

 row_number | cn | vn | pn  |     dt     | qty  | prc
------------+----+----+-----+------------+------+------
          1 |  1 | 10 | 200 | 1401-03-01 |    1 |    0
          2 |  1 | 30 | 300 | 1401-05-02 |    1 |    0
          3 |  1 | 50 | 400 | 1401-06-01 |    1 |    0
          4 |  1 | 30 | 500 | 1401-06-01 |   12 |    5
          5 |  1 | 20 | 100 | 1401-05-01 |    1 |    0
          1 |  2 | 50 | 400 | 1401-06-01 |    1 |    0
          2 |  2 | 40 | 100 | 1401-01-01 | 1100 | 2400
          1 |  3 | 40 | 200 | 1401-04-01 |    1 |    0
(8 rows)
OVER (ORDER BY…) Example
SELECT vn, sum(prc*qty)
FROM sale
GROUP BY vn
ORDER BY 2 DESC;

 vn |   sum
----+---------
 40 | 2640002
 30 |     180
 50 |       0
 20 |       0
 10 |       0
(5 rows)

SELECT vn, sum(prc*qty), rank()
OVER (ORDER BY sum(prc*qty) DESC)
FROM sale
GROUP BY vn
ORDER BY 2 DESC;

 vn |   sum   | rank
----+---------+------
 40 | 2640002 |    1
 30 |     180 |    2
 50 |       0 |    3
 20 |       0 |    3
 10 |       0 |    3
(5 rows)
OVER (…ROWS…) Example
Window framing: "box car" (moving) average
SELECT vn, dt,
       AVG(prc*qty) OVER (
           PARTITION BY vn
           ORDER BY dt
           ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING)
FROM sale;

 vn |    dt    | avg
----+----------+-----
 10 | 03012008 |  30
 20 | 05012008 |  20
 30 | 05022008 |   0
 30 | 06012008 |  60
 30 | 06012008 |  60
 30 | 06012008 |  60
 40 | 06012008 | 140
 40 | 06042008 |  90
 40 | 06052008 | 120
 40 | 06052008 | 100
 50 | 06012008 |  30
 50 | 06012008 |  10
(12 rows)
Global Window Specifications (WINDOW Clause)
• Useful for queries with multiple window functions
• Define and name a window specification once
• Reuse the window specification throughout the query
EXAMPLE:
SELECT
    RANK() OVER (ORDER BY pn),
    SUM(prc*qty) OVER (ORDER BY pn),
    AVG(prc*qty) OVER (ORDER BY pn)
FROM sale;

SELECT
    RANK() OVER (w1),
    SUM(prc*qty) OVER (w1),
    AVG(prc*qty) OVER (w1)
FROM sale
WINDOW w1 AS (ORDER BY pn);
Built-In Window Functions
• cume_dist()
• dense_rank()
• first_value(expr)
• lag(expr [, offset] [, default])
• last_value(expr)
• lead(expr [, offset] [, default])
• ntile(expr)
• percent_rank()
• rank()
• row_number()
* Any aggregate function (used with the OVER clause) can also be used as a window function (a lag() sketch follows)
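As a short sketch of one of these, lag() over the sale table from the earlier examples; treat the exact output as illustrative:

-- Previous quantity per vendor in date order; the first row of
-- each partition falls back to the default value 0.
SELECT vn, dt, qty,
       lag(qty, 1, 0) OVER (PARTITION BY vn ORDER BY dt) AS prev_qty
FROM sale;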
Q&A
Questions and Answers