Slides from Dave Maier

Benchmark Brainstorming
Dave Maier, Mike Stonebraker, and All of You!
With thanks to Jim Gray for suggestions
SWiM 2003
1
Benchmark Properties
• Streamish
• Credible
• Scalable
• Realistic Input
• Approximable
• Expressively Challenging
• Portable
• Runnable
Streamish
• Source-driven data delivery
• Rapid arrival
• Infeasible to store all? (or low value to save?)
• “Live” output (output during input)
Credible
• Motivated by a likely application
• Measures useful work
• Simple to understand
One approach: find an existing application that is implemented with custom code, and abstract the benchmark from it
Scalable
• Stream rate & output volume
• # of streams
• Size of stream elements?
• Number of queries
• Memory requirements
• Stored data
Realistic Input
• Streams vary
– bursts
– stalls
– diurnal cycles
• Stream sources come and go
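The input variations listed above can be sketched as a small, seeded arrival generator. This is only an illustration: the function names, the sinusoidal daily cycle, the burst and stall probabilities, and the tenfold burst factor are all assumptions rather than anything specified in the slides, and sources that come and go are not modeled.

```python
import math
import random

def arrival_rate(t_seconds, base_rate=100.0):
    """Hypothetical diurnal rate in events/sec: one sinusoidal cycle per 24 hours."""
    day_fraction = (t_seconds % 86400) / 86400
    return base_rate * (1.0 + 0.5 * math.sin(2 * math.pi * day_fraction))

def generate_arrivals(duration_s, seed=42, burst_prob=0.01, stall_prob=0.01):
    """Yield (second, event_count) pairs with bursts, stalls, and a diurnal cycle."""
    rng = random.Random(seed)  # fixed seed: the same stream is produced on every run
    for t in range(duration_s):
        rate = arrival_rate(t)
        roll = rng.random()
        if roll < stall_prob:
            count = 0                 # stall: the source delivers nothing this second
        elif roll < stall_prob + burst_prob:
            count = int(rate * 10)    # burst: assumed ten times the normal rate
        else:
            count = int(rate)
        yield t, count
```

Because the generator is seeded, two runs with the same arguments produce identical streams, which is the repeatability a benchmark harness needs.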
Approximable
Best stream rate vs. best answer at a given rate vs. most queries at a given rate
Need metric for answer quality
– latency
– precision
– correctness
– completeness
Expressively Challenging?
Range of query types
– full stream
– windowed
– historic
Range of stream semantics
– signal
– snapshots
– cyclic
– deltas
Portable
• Representation neutral: can be done with tuples, XML, messages
• Can be implemented on a wide variety of platforms: RDBMS, stream database, web-service engine
Runnable
• Can be run in a reasonable time
– hard to test space management
– limit on variations and cases
• Can generate streams in a repeatable manner, with controlled variability
• Can build harness for testing quality metrics
– comparison to ideal
– capture timings
– hard to cheat
NEXMark Stream Benchmark
Niagara Extension of XMark
XMark: XML Query Benchmark
Models an on-line auction site
Person(id, name, email, ccard, city, state)
Auction(id, itemname, desc, initbid, reserve, expires, seller, category)
Bid(auction, bidder, price, dt-time)
Plus static category data
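For readers who want to experiment, the three stream schemas above could be modeled as simple record types. The field names come from the slide; every field type, the foreign-key interpretations, and the renaming of `dt-time` to `dt_time` are assumptions made for this sketch.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Person:
    id: int
    name: str
    email: str
    ccard: str
    city: str
    state: str

@dataclass(frozen=True)
class Auction:
    id: int
    itemname: str
    desc: str
    initbid: float
    reserve: float
    expires: float   # assumed: timestamp in seconds
    seller: int      # assumed: refers to Person.id
    category: int    # assumed: key into the static category data

@dataclass(frozen=True)
class Bid:
    auction: int     # assumed: refers to Auction.id
    bidder: int      # assumed: refers to Person.id
    price: float
    dt_time: float   # "dt-time" on the slide; assumed: timestamp in seconds
```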
Auction Monitoring System
[Diagram: Person, Auction, and multiple Bid streams, together with stored Category Data, feed the Auction Monitoring System, which emits Streamed Results]
Queries
Full-stream and windowed
– single-stream
– stream and stored
– multi-stream
Query 5 (Hot items): Item with the most bids in past hour, each minute.
  SELECT Rstream(auction)
  FROM (SELECT B1.auction, count(*) AS num
        FROM Bid [RANGE 60 MINUTE SLIDE 1 MINUTE] B1
        GROUP BY B1.auction)
  WHERE num >= ALL (SELECT count(*)
                    FROM Bid [RANGE 60 MINUTE SLIDE 1 MINUTE] B2
                    GROUP BY B2.auction)
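To make the window semantics of Query 5 concrete, here is a rough batch sketch in Python. It recomputes the whole window at every slide rather than maintaining counts incrementally, so it shows the intended answer, not how a stream engine would evaluate the query. The function name, the `(auction_id, timestamp_s)` input format, and the half-open window boundary `(t - 60 min, t]` are assumptions.

```python
from collections import Counter

WINDOW_S = 60 * 60   # RANGE 60 MINUTE
SLIDE_S = 60         # SLIDE 1 MINUTE

def hot_items(bids, end_time):
    """Return [(report_time, [hottest auction ids])], one entry per slide.

    bids: iterable of (auction_id, timestamp_s) pairs, in any order.
    Ties are all reported, matching the >= ALL comparison in the query.
    """
    bids = list(bids)
    results = []
    for report_time in range(SLIDE_S, end_time + 1, SLIDE_S):
        # Count bids per auction inside the assumed window (t - 60 min, t].
        counts = Counter(
            auction for auction, ts in bids
            if report_time - WINDOW_S < ts <= report_time
        )
        if counts:
            top = max(counts.values())
            hottest = sorted(a for a, n in counts.items() if n == top)
            results.append((report_time, hottest))
    return results
```

For example, with two bids on auction 1 and one on auction 2 in the first minute, followed by two more bids on auction 2 in the second minute, the hot item is auction 1 at the first report and auction 2 at the second.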
Metrics
• Quality-Latency Product
– Penalties for wrong, missing, and extra tuples, times average latency
– Can weight penalties by importance
• Output Matching
– Difference from ideal
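One literal reading of the quality-latency product can be sketched as follows. The slides do not say how wrong, missing, and extra tuples are identified or combined, so everything here is an assumption: outputs are modeled as key-to-value dictionaries, a "wrong" tuple is a shared key with a different value, and the penalty is a weighted count multiplied by mean latency (lower is better).

```python
def quality_latency_product(ideal, actual, latencies, weights=(1.0, 1.0, 1.0)):
    """Weighted error count times average latency (hypothetical formulation).

    ideal, actual: dicts mapping a tuple key to its value.
    latencies: per-tuple output latencies in seconds.
    weights: (wrong, missing, extra) penalty weights.
    """
    w_wrong, w_missing, w_extra = weights
    # Wrong: present in both outputs but with a different value.
    wrong = sum(1 for k in ideal.keys() & actual.keys() if ideal[k] != actual[k])
    # Missing: expected by the ideal output but never produced.
    missing = len(ideal.keys() - actual.keys())
    # Extra: produced but absent from the ideal output.
    extra = len(actual.keys() - ideal.keys())
    penalty = w_wrong * wrong + w_missing * missing + w_extra * extra
    avg_latency = sum(latencies) / len(latencies) if latencies else 0.0
    return penalty * avg_latency
```

Note that under this literal product a perfectly correct output scores zero regardless of latency; a real harness would likely need a different combination, for example adding a base term, which the slides leave open.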
Scaling
• Number of Bid streams
• Rate on Person, Auction streams
• Stored data size
• Test duration (?)
Application: TV Remote Controls
Massive clickstream (thanks to D. Schrader, NCR)
– 140 million households with TV
– 3½ hours of viewing per day
– 19 clicks per hour
You do the math …
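Taking the three figures above at face value, the math works out to about 9.31 billion clicks per day, or roughly 108,000 clicks per second on average. The sketch below assumes one viewing stream per household and spreads clicks evenly over 24 hours; real traffic would peak well above this average during prime time.

```python
households = 140_000_000   # households with TV (from the slide)
hours_per_day = 3.5        # viewing hours per household per day (from the slide)
clicks_per_hour = 19       # clicks per viewing hour (from the slide)

clicks_per_day = households * hours_per_day * clicks_per_hour
avg_clicks_per_second = clicks_per_day / 86_400   # assumes an even spread over the day

print(f"{clicks_per_day:,.0f} clicks/day")
print(f"{avg_clicks_per_second:,.0f} clicks/s on average")
```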
Obvious data-mining uses, but the stream also presents operational opportunities
– Guarantee a given number of “distinct viewings” of a commercial
– Need to correlate with schedule info (network, local station, cable co.)