Introduction to Flexviews 1.5

Download Report

Transcript Introduction to Flexviews 1.5

Introduction to Flexviews 1.5
Improving database performance
using
incrementally refreshable
materialized views
http://flexviews.sourceforge.net
Flexviews authored by:
Justin Swanhart
Copyright 2007-2009
Flexviews released under the
LGPL version 3
Introduction
Who am I?
What do I do?
Why did I write Flexviews?
Requirements
 MySQL 5.1.30 or greater
 binlog_format=ROW
 User with REPLICATION SLAVE and/or SUPER
 PHP 5.2 (optionally pcntl and mysqli)
 Works best on a MySQL slave
 Basic SQL knowledge
 log_slave_updates=true
What are views?
 A view is a SQL statement that is saved in the database and
executed on demand. Views have a “first class” name and can be
referenced as if they were tables themselves.
 Because they abstract SQL statements and can be used for security,
views are considered very convenient. When used properly views
can add tremendous value to the database, but when used
incorrectly they have many well known performance problems:
 Queries can take a lot of time.
 Because a view is not an actual table it can’t be indexed.
 Because the query is executed each time, the query plan may change
unexpectedly.
 A temporary table is almost always required.
What are materialized views?
 In an RDBMS, a query is “materialized” when its contents are stored
(cached) in a table.
 A traditional view represents a SQL statement which is executed
each time the view is accessed. In contrast, a materialized view is a
view which contains the result of a query from some point in time
in the past. It is essentially a table which contains the cached
results of a query.
 To be useful, a materialized view should be able to be updated
incrementally. This means that the view should be updated in place
to reflect changes to the base tables.
 Base tables are the underlying tables accessed by a view.
Oracle
Materialized Views
 Uses special CREATE MATERIALIZED VIEW .. SELECT syntax.
 Uses materialized view logs.
 Built on extensible Oracle features and stored procedures (part of Oracle)
 Supports SELECT/PROJECT/AGGREGATE/JOIN queries.
 Supports three different refresh methods
 IMMEDIATE refresh updates a view as soon as the underlying base table
changes. Incurs high cost to each transaction.
 FULL refresh. This is an asynchronous method where the table is rebuilt
completely from scratch. The view is usually unavailable during rebuild.
 FAST refresh. This is an asynchronous method where the table upated only
with the changes to base tables since the view was last refreshed.
 Was the first commercial database to implement materialized views.
DB2
Materialized Query Tables (MQT)
 Uses an extended CREATE TABLE AS SELECT syntax with extensions
for specifying MQT options such as refresh method.
 Uses MQT staging tables
 Built on extensible DB2 features (part of DB2)
 Supports SELECT/PROJECT/AGGREGATE/JOIN queries
 Supports three different refresh methods
 Immediate refresh updates a view as soon as the underlying base
table changes. Incurs high cost to each transaction.
 INCREMENTAL refresh. This is an asynchronous method where the
MQT is updated from a staging table.
 NON-INCREMENTAL refresh. This is an asynchronous method where
the contents are rebuilt from scratch.
MySQL
 MySQL features CREATE TABLE AS SELECT (CTAS) syntax
which will build a table from the results of a SELECT
statement.
 A CTAS created table carries no link to the source tables
and can not be “refreshed”.
 Lacking any refresh capability, tables created with CTAS
may be difficult to manage.
 As far as I know, there are no free or open source tools
available (except Flexviews) which support
asynchronously maintaining materialized views,
particularly those which contain joins.
Flexviews adds materialized views to
MySQL
 Flexviews uses stored procedures to create and manage
materialized views which can be incrementally refreshed.
 It includes a “change data capture” tool. This tool uses
MySQL commands and invokes the ‘mysqlbinlog’ tool to
collect and decode RBR binary logs and uses that
information to insert information into table change logs.
 The refresh process reads from change logs and base tables
to update the view to some point in time after it was
created or last refreshed.
Flexviews
 MySQL doesn’t expose the SQL parser. As such, Flexviews features
an API for building queries.
 The API is simple enough such that if you can write basic SQL
statements you can create a similar view.
 Supports SELECT/PROJECT/AGGREGATE/JOIN queries.
 Built on extensible MySQL features (the binary log) and stored
procedures.
 Supports two different refresh methods
 INCREMENTAL refreshes asynchronously from change logs and delta
tables.
 COMPLETE refresh. The contents are rebuilt from scratch and
atomically replaced with RENAME TABLE.
Demo of the binlog consumer in action
MySQL server
writes binary logs
in ROW format
Flexviews binlog
consumer
 The binlog consumer is a PHP
script which runs in the
background.
 All changes are logged as an
INSERT or a DELETE. UPDATE is
logged as DELETE followed by
INSERT.
Only tables selected for change
logging will be processed.
Change logs are written
Into the database.
“Unit of work” table
is maintained
The consumer must be running
in order to refresh materialized
views.
Why not use triggers instead of the
binlog consumer?
 Triggers can not capture discreet transaction order without external
workload serialization.
 They require accessing the information_schema for every row change
which incurs extreme performance penalties.
 The cost of every transaction is increased since table change logs are
written synchronously.
 Triggers impose serious performance overheard.
 Triggers require the serializable isolation level to ensure valid results.
 Table change logs can not be captured from a remote MySQL server using
triggers.
Flexviews Asynchronous Refresh
 The refresh algorithm executes in two stages
 Delta computation (COMPUTE phase)
 Delta application (APPLY phase)
 The COMPUTE phase is executed independently from the
APPLY phase.
 You can COMPUTE changes any number of times before
applying them.
 The work of maintaining the view is split up into many small
COMPUTE jobs, each which process only the rows which have
changed since the last computation.
 You have the option of refreshing a group of views to the
same exact transactional state.
Flexview Delta Application
 Periodically, rows in the delta tables can be applied to
the Flexview.
 A combination of INSERT .. ON DUPLICATE KEY UPDATE
and DELETE are used to bulk INSERT, UPDATE and
DELETE rows from the Flexview.
 You may optionally include a UOW_ID which
represents a point in time to which to refresh the view.
Refreshing multiple views to the same UOW_ID will
make them transactionally consistent with each other.
 Views may only be rolled forward to a higher UOW_ID.
You may not roll a view backward in time (yet).
Aggregation
 In order to achieve top query performance aggregate tables
must be employed.
 For example, in a data mart with sales fact data, there may
be aggregates calculated by customer by day, or by salesrep
by month.
 Flexviews supports SUM and COUNT and experimentally
supports AVG/MIN/MAX and COUNT(DISTINCT).
 Additional support for GROUP_CONCAT and the VAR_SAMP
and VAR_POP functions coming soon.
 Using aggregate functions outside of SUM/COUNT/AVG
incurs additional storage and processing costs, so use these
functions sparingly.
 Views with aggregation can also include joins.
Using an aggregate view
Finding the top 10 most popular items by sales.
Using the Flexview
Using the fact table
SELECT item_id,
net_amt,
CNT
FROM item_sales_mv
ORDER BY CNT DESC
LIMIT 10;
SELECT f.item_id,
sum(f.qty * f.amt) net_amt,
count(*) CNT
FROM fact f
WHERE f.status = ‘CLOSED’
GROUP BY f.item_id
ORDER BY CNT desc
LIMIT 10;
Simple aggregation example
CALL flexviews.create('large', 'item_sales_mv', 'INCREMENTAL');
SET @mvid := LAST_INSERT_ID();
-- Add the fact table
CALL flexviews.add_table(@mvid, 'large','fact', 'f', NULL);
-- Add a GROUP BY expression
CALL flexviews.add_expr(@mvid, 'GROUP', 'f.item_id', 'status');
-- Add a SUM aggregate expression
CALL flexviews.add_expr(@mvid, 'SUM', 'qty * amt', 'net_amt');
-- Add a COUNT aggregate expression
CALL flexviews.add_expr(@mvid, 'COUNT', '*', 'CNT');
-- Add a WHERE clause
CALL flexviews.add_expr(@mvid, 'WHERE', 'f.status="CLOSED"', 'where');
-- Create the view
CALL flexviews.enable(@mvid);
JOINS
 In a STAR schema, the most common aggregates
include joins to dimension tables. This provides the
means for slicing data by customer group, or by item
type or date, etc.
 Performance can be improved by pre-calculating the
JOIN between tables and/or aggregating data.
 Pre-compute master/detail relationships for improved
report performance.
 By including additional columns which are commonly
filtered on performance can be improved significantly.
 Maintaining such a table without Flexviews is difficult
Join with aggregation view
Item sales by day and by item category.
Using the Flexview
SELECT *
FROM item_category_month_mv
Using the base tables
SELECT i.item_category as category,
d.year as year,
d.month as month,
SUM(qty * amt) as net_amt,
COUNT(*) as CNT
FROM large.fact as f
JOIN large.dim_item as i
ON i.item_id = f.item_id
JOIN large.dim_date as d
ON d.date_id = f.date_id
WHERE f.status="CLOSED"
GROUP BY i.item_category,
d.year,
d.month
Join with aggregation example
-- Create the materialized view placeholder
CALL flexviews.create('large', 'item_category_month_mv', 'INCREMENTAL');
SET @mvid := LAST_INSERT_ID();
-- Add the fact table
CALL flexviews.add_table(@mvid, 'large','fact', 'f', NULL);
CALL flexviews.add_table(@mvid, 'large','dim_item', 'i', 'ON i.item_id = f.item_id');
CALL flexviews.add_table(@mvid, 'large','dim_date', 'd', 'ON d.date_id = f.date_id');
-- Add a GROUP BY expression
CALL flexviews.add_expr(@mvid, 'GROUP', 'i.item_category', 'category');
CALL flexviews.add_expr(@mvid, 'GROUP', 'd.year', 'year');
CALL flexviews.add_expr(@mvid, 'GROUP', 'd.month', 'month');
CALL flexviews.add_expr(@mvid, 'SUM', 'qty * amt', 'net_amt');
CALL flexviews.add_expr(@mvid, 'COUNT', '*', 'CNT');
-- Add a WHERE clause
CALL flexviews.add_expr(@mvid, 'WHERE', 'f.status="CLOSED"', 'where');
-- Create the view
CALL flexviews.enable(@mvid);
Create functional and partial Indexes
 DB2 and Oracle and other databases have added support in recent
years for “function based” or “partial” indexes. These databases
have extensible features that essentially allow a materialized view
to act as an index.
 MySQL lacks such extensibility features, but you can do it manually.
This works best for batch processes, or processes which will tolerate
some data staleness, since all Flexviews are refreshed
asynchronously.
 Eventually, a query rewrite proxy may allow such views to be
exploited automatically.
 When are they useful?
 A column with a skewed cardinality. For instance, an order_line table
where 99% of the ‘status’ values are something other than ‘CLOSED’.
In such a situation it is valuable to index only the values other than
‘CLOSED’, since the ‘CLOSED’ records will inflate an index on the
column as the ‘CLOSED’ keys would be of little use.
 You want to index the output of any DETERMINISTIC function on one
or more database columns.
Special Thanks To
Salem, K., Beyer, K., Lindsay, B., and Cochrane, R. 2000. How to roll a join: asynchronous
incremental view maintenance. SIGMOD Rec. 29, 2 (Jun. 2000), 129-140. DOI=
http://doi.acm.org/10.1145/335191.335393
Mumick, I. S., Quass, D., and Mumick, B. S. 1997. Maintenance of data cubes and summary
tables in a warehouse. SIGMOD Rec. 26, 2 (Jun. 1997), 100-111. DOI=
http://doi.acm.org/10.1145/253262.253277
My former employer AdBrite, Inc http://www.adbrite.com who generously supported the
development and release of Flexviews under the LGPL license.
Gazillion, Inc http://www.gazillion.com my current employer, who has allowed me to
devote a good portion of my time to improving Flexviews.
SourceForge, who hosts the SVN repository, web site and support forums for Flexviews.
Proven Scaling, LLC, and Jeremy Cole, in particular.
QUESTIONS