PostgreSQL * Ch 6 - 10 - Rose

Download Report

Transcript PostgreSQL * Ch 6 - 10 - Rose

and such…
PostgreSQL – Ch 6 - 10
CSSE533 – Weeks 6 and 7
Topics for Tues, Week 6 – Slide 2
Topics for Thurs, Week 6 – Slide 22
Topics for Week 7 – Slide 35
1
Topics for Tues, 4-14-15
From yesterday’s email:
• Discuss what topics out of the rest of the PostgreSQL book are especially
relevant to your project and/or you have a personal interest in.
• I’ll be ready to discuss a lot of them, maybe not all in infinite depth. But
I’ll be happy to dig in on those you think are most valuable.
• And - I’ll be eager to hear how your individual endeavors are going, of
course!
And, I thought of one more thing:
• Any topics you’d like to review from DB “basics”?
• Your book authors assume you’re already a practitioner, who just wants to
learn about PostgreSQL.
– E.g., “Here’s what’s different about doing Joins.”
2
Typical topics taught in a first DB class
Database Theory
• Modeling (ER, Relational,
Normalization, Dependencies)
• Relational Algebra
• SQL, Transact-SQL
• Constraints
• Store Procedures
• Authorization
• Transactions
• XML
• Object Orientation
Database Practice
• Managing DB Files, Security
• Admin Tasks
• Monitoring Performance
• Implementing DB Design
• Managing Databases, Creating Data
Types and Tables
• Data Integrity
• Indexes
• Views
• Stored Procedures
• Triggers
• Retrieving, Grouping
• Joining Tables
• Modifying Data
3
Example – Joins – a review
• Let’s look at http://www.w3schools.com/sql/sql_join.asp.
4
Example – normal forms – a review
• First normal form – “All occurrences of a
record type must contain the same number of
fields.
– Does this one blow away your intended design?
• Second and third normal forms – deal with
nonkey vs key fields:
– A nonkey field must provide a fact about the key,
the whole key, and nothing but the key.
5
Normal forms, cntd
• Second – In this record, part and warehouse are
combined to makeup the key:
• And the warehouse-address would be repeated for
every occurrence of warehouse. Instead, needs to
be 2 records:
6
Normal forms, cntd
• Third – employee is the key in:
• But if each department is located in only one place,
then location is a fact about department, in addition
to being a fact about employee. It will be repeated
unnecessarily. Instead should be two records:
• How does normalizing impact your project design?
7
What’s in Ch 6 – 10?
• Ch 6 – Tables, constraints, and indexes
• Ch 7 – SQL: The PostgreSQL way
– Views – see following slides 
– Handy constructions
– FILTER clause for aggregates
– Window functions
– Common table expressions
– Lateral joins
8
What’s in Ch 6 – 10? - cntd
• Ch 8 – Writing functions (i.e., stored
procedures)
– Anatomy of PostgreSQL functions
– Writing functions with SQL
– Writing PL/pgSQL functions
– Writing PL/Python functions
– Writing PL/V8, PL/CoffeeScript, and PL/LiveScript
functions
9
What’s in Ch 6 – 10? - cntd
• Ch 9 – Query performance tuning
– EXPLAIN
– Caching
– Writing better queries
• Ch 10 – Replication and external data
See following slides 
– Replication overview
– Setting up replication
– Foreign data wrappers
10
Ch 7 – SQL: The PostgreSQL way
Views – p 121:
• Short intro on views
• Single table views
• Short intro on triggers
• Using triggers to update views
• Materialized views
11
Ch 7 – Views, cntd
• Original idea of a “view” was more like:
– A way to provide proper access to the data in the
database, for some class of user. E.g., for security.
• It’s now expanded to be:
– The result of a query, stored in the DB.
• The query command is stored in the DB dictionary.
• The view is not part of the physical schema (in RDB’s).
• Changes in underlying data are reflected in the view.
– The view can be the result of a join or of aggregation (sum,
average, etc.).
– The DB doesn’t actually store a copy of the view’s data.
– Views can be updatable – causing all kinds of fun.
12
Ch 7 – Views in PostgreSQL
• Single table views:
CREATE VIEW comedies AS
SELECT *
FROM films
WHERE kind = 'Comedy';
• Recursive views (really iterative):
CREATE RECURSIVE VIEW nums_1_100 (n) AS
VALUES (1)
UNION ALL
SELECT n+1 FROM nums_1_100 WHERE n < 100;
– Often used with hierarchical data.
13
Ch 7 – Short intro on triggers
• Types of triggers:
• A trigger is associated with a specified table or
view, and is executed when certain events occur.
14
Ch 7 – triggers, cntd
• Can be used to update views:
– When a view draws from more than one table,
updating the underlying data with a simple
command isn’t possible.
•
•
•
•
Inherent ambiguity about changing underlying data!
Can still modify the data using triggers.
Can define one or more INSTEAD OF triggers.
Need to handle INSERT, UPDATE, DELETE.
15
Ch 7 – triggers, cntd
16
Ch 7 – triggers, cntd
17
Ch 7 – Materialized views
• As of PostgreSQL 9.3, these cache data
fetched.
– Use when the underlying query takes a long time,
and
– When having timely data is not critical.
– Like with OLAP applications
(Online Analytical Processing)
Watching stuff materialize –
always a riot!
18
Ch 10 – Replication and external data
Replication overview
• Main reasons:
– Availability
– Scalability
• Replication
jargon:
– Master
– Slave
• Subscriber or
agent
– Write-ahead log
(WAL)
• (transaction log)
– Synchronous
• At least one
slave is updated
before
transaction is
complete
Poor choice for
promotion – Aethelred
the Unready, King of the
English when he was
about ten years old.
– Asynchronous
– Streaming
• A PostgreSQL
protocol
– Cascading
replication
• Slaves passing
along logs
– Remastering
• Promoting a
slave to a master
19
Ch 10 – Replication and external data
Setting up replication
• Configure the master
– Takes five steps – pp 184-5.
• Configure the slaves
– Takes seven steps
• Initiate the replication process
20
Ch 10 – Replication and external data
Foreign data wrappers (FDW’s)
• You can query many kinds of data outside of
PostgreSQL:
– Flat files, including “jagged arrays”
– Other PostgreSQL servers
– Nonconventional data sources – like other kinds of
databases
• Uses a Foreign Table to guide access.
21
Topics for Thursday, by popular
demand
• Jon –
– Ch 8 – writing functions
• PL/pgSQL
• Mike added PL/Python, more advanced functions
– Ch 9 – query performance tuning
• And benchmarks
– Ch 7 – common table expressions
22
Ch 7 – common table expressions
• “Sub-queries”
– Define temporary tables just for this query.
– Uses WITH to introduce the CTE.
• Allows you to define a query that can be used by a larger
query – p 138.
• Basic CTE –
– Makes SQL more readable. Or,
– Encourages the planner to materialize intermediate results for
better performance.
• Writable CTE –
– Adds UPDATE, INSERT and DELETE commands.
• Recursive CTE –
– Rows returned vary during query execution.
23
Basic CTE example
WITH regional_sales AS (
SELECT region, SUM(amount) AS total_sales
FROM orders
GROUP BY region
), top_regions AS (
SELECT region
FROM regional_sales
WHERE total_sales > (SELECT SUM(total_sales)/10 FROM regional_sales)
)
SELECT region,
Displays per-product sales
product,
totals in only the top sales
SUM(quantity) AS product_units,
regions.
SUM(amount) AS product_sales
FROM orders
WHERE region IN (SELECT region FROM top_regions)
GROUP BY region, product;
24
Ch 8 – writing functions - PL/pgSQL
• First part of Ch 8 talks about writing functions in SQL.
• Can “outgrow” what SQL can do – p 154
– See Ex 8-4, top, p 151 for SQL version.
– Compare to Ex 8-10, p 154. Similar!
• Can’t write trigger functions in SQL.
– This decouples those from the triggers themselves.
• Also can group a block of computation or series of
queries on the DB server, making this code like a
procedural language for SQL.
– Less traffic to/from an application.
– Simplifies application coding.
25
PL/pgSQL – Example 1
CREATE FUNCTION cs_refresh_mviews() RETURNS integer AS $$
DECLARE
mviews RECORD;
BEGIN
PERFORM cs_log('Refreshing materialized views...');
FOR mviews IN SELECT * FROM cs_materialized_views ORDER BY sort_key LOOP
-- Now "mviews" has one record from cs_materialized_views
PERFORM cs_log('Refreshing materialized view ' || quote_ident(mviews.mv_name) || ' ...');
EXECUTE 'TRUNCATE TABLE ' || quote_ident(mviews.mv_name);
EXECUTE 'INSERT INTO ' || quote_ident(mviews.mv_name) || ' ' || mviews.mv_query;
END LOOP;
The target is successively
PERFORM cs_log('Done refreshing materialized views.');
assigned each row
RETURN 1;
resulting from the query
END;
and the loop body is
$$ LANGUAGE plpgsql;
executed for each row.
26
This example uses exception handling to perform either
UPDATE or INSERT, as appropriate.
PL/pgSQL – Example 2
CREATE TABLE db (a INT PRIMARY KEY, b TEXT);
CREATE FUNCTION merge_db(key INT, data TEXT) RETURNS VOID AS
$$
BEGIN
LOOP
UPDATE db SET b = data WHERE a = key;
IF found THEN
RETURN;
END IF;
BEGIN
INSERT INTO db(a,b) VALUES (key, data);
RETURN;
EXCEPTION WHEN unique_violation THEN
-- do nothing
END;
END LOOP;
END;
$$
LANGUAGE plpgsql;
SELECT merge_db(1, 'david');
SELECT merge_db(1, 'dennis');
27
PL/pgSQL – Example 3
• A frequent question in IRC (?) is how to delete rows that are
duplicates over a set of columns, keeping only the one with
the lowest ID.
• This query does that for all rows of tablename having the
same column1, column2, and column3.
DELETE FROM tablename
WHERE id IN (SELECT id
FROM (SELECT id,
row_number() over (partition BY column1,
column2, column3 ORDER BY id) AS rnum
FROM tablename) t
WHERE t.rnum > 1);
28
Ch 8 – writing functions - PL/Python
• Lets you use Python libraries.
• Can compose functions in Python.
• Need to install Python on the server.
– The PostgreSQL Python extension is:
• CREATE EXTENSION plpython3u;
29
PL/Python Example – INSERT trigger

30
PL/Python Example – INSERT trigger,
cntd

31
PL/Python Example – INSERT trigger,
cntd
32
Ch 9 – query performance
What the book goes into:
• Use EXPLAIN (ANALYZE),p 164
– Graphical tool in pgAdmin
• Analyze causes the statement to be executed.
– Testing without making changes!
• Wrap the statement in a transaction you can abort.
• Put BEGIN before the statement, ROLLBACK after.
33
Ch 9 – query performance - cntd
A basic tip sheet:
• Use a current version.
• Set shared_buffers and effective_cache_size
based on total memory. (Like ¼ and ½ of total
memory, respectively!)
• ANALYZE your database:
– VACUUM ANALYZE forces a statistics update and table
cleanup.
– Can also “automatically vacuum” tables.
• EXPLAIN ANALYE your slow statements
– Like the book recommends.
34
Week 7!
Ch 9 – query performance - cntd
Some special areas to think about:
• See the startup tips at
• http://www.chesnok.com/daily/2013/11/13/everyday-postgrestuning-a-brand-new-server-the-10-minute-edition/, and
• The readable general performance guides are at:
– http://www.revsys.com/writings/postgresql-performance.html, and
– https://wiki.postgresql.org/wiki/Performance_Optimization , and
– http://www.postgresql.org/docs/9.2/static/performance-tips.html.
• Issues with GUCs (Grand unified configuration settings. Rhymes
with Ducks.) – see https://wiki.postgresql.org/wiki/GUCS_Overhaul
• Details 
35
Benchmarking
See, for example:
• Basic benchmarking tips - http://edoceo.com/howto/postgresql-benchmark.
• How to benchmark a query using EXPLAIN – ANALYZE http://dba.stackexchange.com/questions/42012/how-can-i-benchmark-a-postgresql-query.
• “PostgreSQL out-performs MongoDB, at http://www.enterprisedb.com/postgres-plus-edbblog/marc-linster/postgres-outperforms-mongodb-and-ushers-new-developer-reality
• Ideas for how to run a benchmark, at http://edoceo.com/howto/postgresql-benchmark
• Not to be missed – Benchmarking PostgreSQL on Raspberry Pi, at
http://raspberrypg.org/2013/11/raspberry-pg-server-some-benchmark-results/.
• Benchmarking on AWS - http://www.pythian.com/blog/benchmarking-postgres-on-aws4000-piops-ebs-instances/.
• How to make it run like Hadoop! https://www.citusdata.com/blog/86-making-postgresqlscale-hadoop-style.
• Relative speed over multiple releases - http://blog.pgaddict.com/posts/performance-sincepostgresql-7-4-to-9-4-pgbench.
36