Presentation Slides - Geoff Montee`s Ramblings

Download Report

Transcript Presentation Slides - Geoff Montee`s Ramblings

LinuxFest
Northwest
2016
Introduction to tds_fdw
By Geoff Montee
Heterogeneous Data Sources
• It is common for applications to reply on
multiple types of database systems.
• For example:
– The application is being migrated from one
database system to another.
– The application uses different database systems
for different types of data.
Heterogeneous Data Sources
Can they exchange
data?
PostgreSQL
Microsoft SQL Server
Heterogeneus Data Sources Until Recently
Can they
communicate?
PostgreSQL
Microsoft SQL Server
Transformation
Heterogeneous Data Sources –
Foreign Data Wrappers
tds_fdw
PostgreSQL
Microsoft SQL Server
What is FDW?
• FDW stands for Foreign Data Wrapper.
• A Foreign Data Wrapper is a plugin that allows
PostgreSQL to pull table data from some
foreign source (such as another database
server, a flat file, a web service, etc.).
• PostgreSQL 9.1-9.2 support read-only foreign
data wrapper.
• PostgreSQL 9.3+ added write support.
What is TDS?
• TDS stands for Tabular Data Stream.
• Protocol used by MS SQL Server and Sybase
database servers.
• Different versions of the protocol are used by
different products and support different
features.
• FreeTDS is an open source library that
implements the TDS protocol.
What is tds_fdw?
• tds_fdw is a foreign data wrapper that allows
PostgreSQL to connect to foreign servers over the TDS
protocol.
• Foreign tables can be defined in PostgreSQL to pull
data from the foreign server.
• Supports:
–
–
–
–
–
Microsoft SQL Server (all versions)
Microsoft Azure SQL Database
Sybase Adaptive Server Enterprise (ASE)
Sybase IQ
Possibly others that I’m less familiar with. (e.g. Sybase SQL
Anywhere)
tds_fdw features
•
•
•
•
•
tds_fdw 1.0 supports PostgreSQL 9.1+.
2.0 supports PostgreSQL 9.2+.
2.0 supports WHERE and column push-downs.
No write support yet.
Setting tds_version allows Postgres to connect to
different products. (e.g. see this:
http://www.freetds.org/userguide/choosingtdsprotoco
l.htm)
• Foreign table data source can be:
– a table on the foreign server
– a query that’s executed on the foreign server
tds_fdw project
• Development is not backed by any company.
• Me. I currently work for MariaDB though, so
PostgreSQL work mainly happens on the
weekends.
• Julio González Gil maintains a Jenkins CI server
for the project.
• Community involvement:
– Received a few patches.
– Lots of GitHub issues. Bug reports are much
appreciated!
Installation
• Step 1: Install PostgreSQL. e.g.:
sudo yum install -y
http://yum.postgresql.org/9.5/redhat/rhe
l-7-x86_64/pgdg-redhat95-9.52.noarch.rpm
sudo yum install -y postgresql95
postgresql95-server postgresql95-libs
postgresql95-devel
sudo /usr/pgsql-9.5/bin/postgresql95setup initdb
sudo systemctl start postgresql9.5.service
Installation
• Step 2: Install FreeTDS. Requires EPEL on
RHEL/CentOS. e.g.:
sudo yum install -y
https://dl.fedoraproject.org/pub/e
pel/epel-release-latest7.noarch.rpm
sudo yum install -y freetds
freetds-devel
Installation
• Step 3: Build tds_fdw. e.g.:
wget https://github.com/tdsfdw/tds_fdw/archive/v2.0.0alpha.1.tar.gz --outputdocument=tds_fdw-2.0.0-alpha.1.tar.gz
tar -xvzf tds_fdw-2.0.0-alpha.1.tar.gz
cd tds_fdw-2.0.0-alpha.1
PATH=/usr/pgsql-9.5/bin:$PATH make
USE_PGXS=1
sudo PATH=/usr/pgsql-9.5/bin:$PATH make
USE_PGXS=1 install
Installation
• Step 4: Create the extension. e.g.:
sudo -u postgres /usr/pgsql-9.5/bin/psql
CREATE EXTENSION tds_fdw;
• Step 5: Confirm.
postgres=# \dew
List of foreign-data wrappers
Name
| Owner
|
Handler
|
Validator
---------+----------+-----------------+-----------------tds_fdw | postgres | tds_fdw_handler |
tds_fdw_validator
(1 row)
Using tds_fdw
• There are three different types of objects to
use in tds_fdw:
– Foreign server: A foreign server is your MS SQL
Server or Sybase database server.
– User mapping: A user mapping maps a local
PostgreSQL role to a role on the foreign server.
– Foreign table: A foreign table’s data gets pulled
from the foreign server either via a physical table
or the execution of a query.
Creating a foreign server
• Simple:
CREATE SERVER mssql_svr
FOREIGN DATA WRAPPER tds_fdw
OPTIONS (
servername 'sqlserver01.mydomain.com',
port '1433',
database 'tds_fdw_test',
tds_version '7.1');
• TDS protocol versions 7.0 and above are used
by MS SQL Server and Azure SQL Database.
Creating a user mapping
• Also simple:
CREATE USER MAPPING FOR postgres
SERVER mssql_svr
OPTIONS (
username 'sa',
password 'mysapassword');
• A PostgreSQL role requires a user mapping in
order to login to the foreign server.
Creating a foreign table
• Do you want to query a foreign table directly?
Then define the table attribute.
• Do you want your table to reflect the results
of a query executed on the foreign server?
Then define the query attribute.
• If you are using MS SQL Server, then you can
set row_estimate_method to showplan_all to
use statistics from MS SQL Server’s SET
SHOWPLAN ALL in query plan calculations.
Creating a foreign table –
From a remote table
• Let’s say that your table on a foreign server looks like this (on MS SQL
Server):
CREATE TABLE dbo.tab (
id int primary key,
value nvarchar(50),
submitted datetime2
);
• The foreign table would look like this:
CREATE FOREIGN TABLE mssql_tab (
id int,
value varchar(50),
submitted timestamptz
)
SERVER mssql_svr
OPTIONS (
table 'dbo.tab',
row_estimate_method 'showplan_all');
Querying a foreign table –
From a remote table
postgres=# \dE
List of relations
Schema |
Name
|
Type
| Owner
--------+-----------+---------------+---------public | mssql_tab | foreign table | postgres
(1 row)
postgres=# select * from mssql_tab;
id | value |
submitted
----+-------+---------------------------1 | str
| 2015-11-16 01:00:13.755218-08
(1 row)
Creating a foreign table –
From a remote query
• Let’s say that your table on a foreign server looks
like this (on MS SQL Server):
CREATE TABLE dbo.user_data (
id int primary key,
value nvarchar(50),
owner nvarchar(50)
);
• And you want your PostgreSQL table to be the
results of this query:
SELECT id, value, owner FROM
dbo.user_data WHERE owner = 'geoff'
Creating a foreign table –
From a remote query
• The foreign table would look like this:
CREATE FOREIGN TABLE my_mssql_data (
id int,
value varchar(50),
owner varchar(50)
)
SERVER mssql_svr
OPTIONS (
query 'SELECT id, value, owner FROM
dbo.user_data WHERE owner = ''geoff''',
row_estimate_method 'showplan_all');
• The “table” and “query” parameters cannot be used at the same
time.
Querying a foreign table –
From a remote query
postgres=# \dE
List of relations
Schema |
Name
|
Type
| Owner
--------+----------------+---------------+---------public | all_mssql_data | foreign table | postgres
public | mssql_tab
| foreign table | postgres
public | my_mssql_data | foreign table | postgres
(3 rows)
postgres=# SELECT * FROM my_mssql_data;
id |
value
| owner
----+----------------+------1 | some data
| geoff
3 | some more data | geoff
(2 rows)
Querying a foreign table –
From a remote query
postgres=# SELECT * FROM all_mssql_data;
id |
value
| owner
----+----------------+------1 | some data
| geoff
2 | alice's data
| alice
3 | some more data | geoff
(3 rows)
row_estimate_method
• row_estimate_method is a foreign table option that controls
how tds_fdw estimates the number of rows that will be in a given
table’s or query’s result set. This estimation is used by the planner.
• The option that has two possible values:
– execute: Execute the query on the remote server, so that
PostgreSQL knows the exact number of rows.
– showplan_all: Use Microsoft SQL Server’s “SET
SHOWPLAN_ALL ON” to estimate the number of rows.
• No optimization for Sybase yet.
• If use_remote_estimate is disabled, then only local estimates
are used.
• A local estimate can be defined with local_tuple_estimate.
SHOWPLAN_ALL
• Let’s say that I executed the following on
Microsoft SQL Server…
SET SHOWPLAN_ALL ON;
SELECT * FROM dbo.user_data;
SHOWPLAN_ALL
•
Then I would get results like the following:
StmtText,StmtId,NodeId,Parent,PhysicalOp,LogicalOp,Argument,De
finedValues,EstimateRows,EstimateIO,EstimateCPU,AvgRowSize,
TotalSubtreeCost,OutputList,Warnings,Type,Parallel,Estimate
Executions
SELECT * FROM
dbo.user_data;,1,1,0,NULL,NULL,1,NULL,3,NULL,NULL,NULL,0.00
32853,NULL,NULL,SELECT,0,NULL
|--Clustered Index
Scan(OBJECT:([tds_fdw_jenkins].[dbo].[user_data].[PK__user_
dat__3213E83F668CC6F2])),1,2,1,Clustered Index
Scan,Clustered Index
Scan,OBJECT:([tds_fdw_jenkins].[dbo].[user_data].[PK__user_
dat__3213E83F668CC6F2]),[tds_fdw_jenkins].[dbo].[user_data]
.[id], [tds_fdw_jenkins].[dbo].[user_data].[value],
[tds_fdw_jenkins].[dbo].[user_data].[owner],3,0.003125,0.00
01603,117,0.0032853,[tds_fdw_jenkins].[dbo].[user_data].[id
], [tds_fdw_jenkins].[dbo].[user_data].[value],
[tds_fdw_jenkins].[dbo].[user_data].[owner],NULL,PLAN_ROW,0
,1
•
Only “EstimateRows” is used by tds_fdw at the moment.
Recent noteworthy features/fixes
• Added WHERE and column push-downs in
version 2.0.
• Fixed some large memory leaks.
• Improved performance by 50% by starting to use
raw data for some data types, rather than relying
entirely on FreeTDS’s string conversions.
• Changed row estimates to use double instead of
int, allowing tds_fdw to handle more than
INT_MAX rows.
• Made datetime conversions locale-independent
in PostgreSQL 9.4+, thanks to make_timestamp().
WHERE and column push-downs
• Let’s say that you have a table with a lot of rows, and some large
columns, that looks like this:
CREATE FOREIGN TABLE mssql_tab (
id int,
small_col varchar(50),
large_col varchar(50000),
submitted timestamptz
)
SERVER mssql_svr
OPTIONS (
table 'dbo.tab',
row_estimate_method 'showplan_all');
• Let’s say that you only wanted 1 column from 1 row. E.g.:
SELECT small_col FROM mssql_tab WHERE id = 1;
WHERE and column push-downs
• tds_fdw 1.0 would fetch the whole table, and
then filter results locally. e.g. it would execute
the following:
SELECT * FROM dbo.tab;
• tds_fdw 2.0 only fetches the rows that are
needed by pushing down columns and WHERE
clauses. e.g. it would execute the following:
SELECT small_col FROM dbo.tab
WHERE id=1;
New variables
• Added some variables to help show how tds_fdw’s memory grows.
– tds_fdw.show_before_row_memory_stats
– tds_fdw.show_after_row_memory_stats
– tds_fdw.show_finished_memory_stats
• Calls MemoryContextStats() at different points in the code
to show memory usage at a given time.
• Useful for detecting memory leaks.
• Use SET to set them:
postgres=# SET tds_fdw.show_before_row_memory_stats=1;
SET
postgres=# SET tds_fdw.show_after_row_memory_stats=1;
SET
postgres=# SET tds_fdw.show_finished_memory_stats=1;
SET
New variables
• After executing the query, data like this gets printed to the error log:
Showing memory statistics before row 1.
ExecutorState: 8192 total in 1 blocks; 2528 free (2 chunks); 5664 used
printtup: 0 total in 0 blocks; 0 free (0 chunks); 0 used
tds_fdw data: 8192 total in 1 blocks; 7560 free (0 chunks); 632 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
Showing memory statistics after row 1.
ExecutorState: 8192 total in 1 blocks; 2528 free (2 chunks); 5664 used
printtup: 0 total in 0 blocks; 0 free (0 chunks); 0 used
tds_fdw data: 8192 total in 1 blocks; 7560 free (0 chunks); 632 used
ExprContext: 8192 total in 1 blocks; 8040 free (0 chunks); 152 used
Showing memory statistics before row 2.
ExecutorState: 8192 total in 1 blocks; 2256 free (2 chunks); 5936 used
printtup: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
tds_fdw data: 8192 total in 1 blocks; 7560 free (0 chunks); 632 used
ExprContext: 8192 total in 1 blocks; 8160 free (0 chunks); 32 used
Features to implement next
• Write support
• JOIN push-down
• row_estimate_method optimized for
Sybase
Questions?
Evaluate this session
Introduction to tds_fdw
http://lfnw.org/node/3906
Session videos will be posted to the LinuxFest Northwest YouTube channel.
Thank you!