Transcript File
StubHub & Oracle
Advanced Analytics
Brian Motzer, Senior Database Administrator
StubHub
Brian Motzer, [email protected]
Senior Database Administrator, StubHub
21 years of Oracle DBA experience specializing in Performance Tuning and
Architecture ( DigitalRiver.com, BestBuy HQ )
2
StubHub, an ebay company
The world's largest fan-to-fan ticket marketplace
StubHub in numbers
38 Million Registered Users
8.9 Million Monthly Unique
Visitors
216 Million Monthly Page Views
2.5 Million Tickets Sold in 2012
$2.57 Billion Ticketing Sales in
RIP 2012-2015
2012
$3k-335K
3
Before Oracle Analytics we had this…
4
We needed this…POC proved this.
R
5
Attempted fraud reduction of 95%!!
6
What do Kim K. and servers have in common?
7
15M txns sampled !
=
Data Warehouse Specs
Hardware and OS
Sun SPARC T5-2
1000G of RAM
256 processors
Oracle Software and Tools
8
Oracle Database Enterprise Edition11gR203
Partitioning
Oracle Auto DOP
Oracle Active Data Guard 11gR203
Flash Back Database
ADG Snapshot
Oracle R Enterprise
Informatica
DEFAULT = 2xcpu*threads
Data Warehouse stats/size
Informatica is ETL being used.
Total DW Database Size 20TB with 8 years of history
700 ETL jobs as part of the daily batch load, completes in 4-5 Hours ( was 15-18 per auto DOP )
160 MM Records / 40GB of raw data from 25+ data sources loaded daily
300 plus targets (Dimensions + Facts + Aggregates + Snapshots)
225 BO Users and 300+ AD-Hoc Users
9
Oracle “R” statisitics
Model is executed ~2x/second.
1.25 seconds per execution ( needs enhancing on the clob sys.rq$script read )
Model is about 500K in size with ~3000 lines of code.
Stored procedure has 1500 lines of code
75 external connections started/waiting
Initial extproc spawn is 15-20 seconds, prespawn to fix this ( Mark Hornick )?
10
Stubhub R Architecture
JAVA
ECOM
11
public
oracle
Model A
procedure
( A)
90%
(prod)
synonym
30 tables
GTT
population
oracle
Model B
procedure
( B)
10%
(beta)
Birth of an R model
ecommerce
RAC cluster
model
imported into
ecom
Informatica Daily ETL
DW
primary
Active Data Guard
DW
standby
model
exported
Oracle R
database
model trained
12
DB link
Homegrown scripts
drop_r_objects.sh
– unix command line interface for dropping models.
– Makes dropping of “live” models/functions impossible as it checks to see what’s “live” and will bail out.
export_import_data_model.sh
– command line export/import of models/functions to/from a database.
– This allows any dba to step in and export a model/function from the data warehouse, migrate it the ecommerce
database and import it. Checks for issues on import/export.
13
Homegrown scripts
14
Calling Model from SP
SELECT * FROM
TABLE (rqEval ( CURSOR (
SELECT * FROM GTT_SCORE_MODEL_ATTRIB ),
q '[SELECT 1 in_TRANSACTION_ID, 1 score,
cast(' ab ' as varchar2(50)) MODEL_VERSION
FROM sys.dual]','FARE_RFUN_ModelScoring_V3'))
15
Issues along the way….
16
Issues encountered
Unable to create synonyms on models, ended up going with stored procedure.
Use GTT to pass 55+ variables into the model ( working on fix to not use rqEval )
The function call seems to be a select of sys.rq$script ( clob field ) for each and every call which is
very expensive, need to make that a call to a binary object to speed it up.
LD_LIBRARY_PATH, the oracle listener and RAC. If this isn’t correct then R cant find the CRAN
packages that are installed and fails. The srvctl LD_LIBRARY_PATH setting is typically different than
unix level environment.
–
–
–
17
srvctl config fix
$ORACLE_HOME/hs/admin/extproc.ora hardcode of LD_LIBRARY_PATH
Bounce listener from the cmd line
Issues encountered
For hundreds of development databases/machines it can be overwhelming to ensure that all the
CRAN packages are installed at the OS level/Oracle home. Can they all be stored inside of the
database instead of at the OS level?
Sun Solaris Studio had to be installed on development database machines to allow compilation of
outside CRAN packages.
Packages ( cran etc. ) aren’t always available on your platform ( Solaris for example ). We’ve had
Oracle support compile and distribute these to back to us.
Need more documentation on debugging/tracing model calls. Are there any examples out there, was
talked about by support but never provided. We can trace Oracle sql calls but the model execution
processing is a block box.
18
Issues encountered ( cont’d )
So far no GUI tools for manipulating/modify models/functions. Difficult for right-mouse-clicky DBA’s to
drop/create models easily. Anything Coming for SQL developer/TOAD?
ensure cursor_sharing is set to EXACT for R to run, otherwise you'll see errors in ore.connect such as
"ORA-00904 invalid identifier“. What about db’s that need cursor_sharing = FORCE for lack of bind
variables?
19
Why Oracle Advanced Analytics?
Highly Scalable and Performant
– Fully utilize the existing data warehouse infrastructure, dedicated Primary database for ETL (real + batch ) load
and offload Operation/Ad-hoc reports , ORE workload to Standby database - ADG to fully meet our daily DW
operation SLAs, data scientists data analyze as well as Stubhub business growth needs.
High Availability, Enterprise & DR Ready
– With Oracle Active Data Guard , the current DW and ORE ready for the planned and unplanned downtime and
disaster recovery
Enabling Big Data Analysis
– Due to resource ( CPU + Memory + Network ) its now possible for Stubhub to use Oracle R to perform data
analysis against large data volumes.
Security Control , Access Control
– Take full advantage of Oracle database advance security future for big data analysts
20
Why Oracle Advanced Analytics?
Availability of tight integration of open source R with Oracle database
DW capacity allows us to build several such concurrent models simultaneously allowing
us to fine tune our models to precision
ORE allows open source R packages to be readily deployed and leveraged on
database data
ORE allows SQL access to the models
21
22