Session's about to start. Get your rig on!
Chris J.T. Auld
Intergen
Agenda
1) DocumentDB Refresher
2) CUs, RUs and Indexing
3) Polyglot Persistence and Data Modelling
4) Data Tier Programmability
5) Trading Off Consistency
A fully-managed, highly-scalable, NoSQL document database service.
• Schema free storage, indexing and query of JSON documents
• Write optimized, SSD backed and tuneable via indexing and consistency
• Transaction aware service side programmability with JavaScript
• Built to be delivered as a service. Pay as you go. Achieve faster time to value.
DocumentDB in One Slide
• Simple HTTP RESTful model.
• Access can be via any client that supports HTTP. Libraries for Node, .NET, Python and JS
• All resources are uniquely addressable by
a URI.
• Partitioned for scale out and replicated
for HA. Tunable indexing & consistency
• Granular access control through item
level permissions
• Attachments stored in Azure Blobs and
support document lifecycle.
• T-SQL like programmability.
• Customers buy storage and throughput capacity at the account level
REST resource model (diagram: Tenant, Feed and Item resources)
• POST to a Feed URI: create a new resource, or execute a sprocs/trigger/query ...
• PUT to an Item URI: replace an existing resource
• DELETE an Item URI: delete an existing resource
• GET a Tenant, Feed or Item URI: read/query an existing resource
• Resource URIs: /dbs/{id}, /colls/{id}, /docs/{id}, /attachments/{id}, /users/{id}, /sprocs/{id}, /triggers/{id}, /functions/{id}
Example:
POST http://myaccount.documents.azure.net/dbs
{ "name":"My Company Db" }
[201 Created]
{
  "id": "My Company Db",
  "_rid": "UoEi5w==",
  "_self": "dbs/UoEi5w==/",
  "_colls": "colls/",
  "_users": "users/"
}
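To make the addressing scheme concrete, here is a minimal sketch that builds an item URI from (resource type, id) pairs and pairs each operation with its HTTP verb. The helper names (`itemUri`, `VERB_FOR`) and the ids are hypothetical, and authentication headers, which a real request needs, are omitted.

```javascript
// Hypothetical helper: build a resource URI from (type, id) pairs,
// following the /dbs/{id}/colls/{id}/docs/{id} addressing scheme.
function itemUri(...segments) {
  const parts = [];
  for (const [type, id] of segments) {
    // Ids may contain spaces, so encode them for use in a path.
    parts.push(type, encodeURIComponent(id));
  }
  return "/" + parts.join("/");
}

// The verb used for each operation in the REST model.
const VERB_FOR = {
  create: "POST",   // against a feed URI, e.g. /dbs
  replace: "PUT",   // against an item URI
  remove: "DELETE", // against an item URI
  read: "GET",      // against a tenant, feed or item URI
};

const docUri = itemUri(["dbs", "My Company Db"], ["colls", "orders"], ["docs", "42"]);
console.log(VERB_FOR.read, docUri); // GET /dbs/My%20Company%20Db/colls/orders/docs/42
```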
Capacity Units
• Customers provision one or more Database Accounts
• A database account can be configured with one to five Capacity Units (CUs). Call for more.
• A CU is a reserved unit of storage (in GB) and throughput (in Request Units, RUs)
• Reserved storage is allocated automatically but subject to a minimum allocation per collection of 3.3GB (1/3 of a CU) and a maximum amount stored per collection of 10GB (1 whole CU)
• Reserved throughput is automatically made available, in equal amounts, to all collections within the account subject to a min/max of 667 RUs (1/3 of a CU) and 2000 RUs (1 whole CU)
• Throughput consumption levels above provisioned units are throttled
(Diagram: provisioned capacity units, each providing throughput in RUs and storage in GB.)
* All limits noted above are preview limitations and subject to change.
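A quick way to sanity-check these preview limits is to compute each collection's equal share of the account's throughput. A minimal sketch, assuming the equal-share rule applies exactly as stated and that the 1/3 CU minimum caps how many collections an account can hold; storage is ignored and the helper name `perCollectionRUs` is hypothetical.

```javascript
// Preview-era figures from the slide: 1 CU = 2000 RUs of throughput,
// and every collection reserves at least 1/3 of a CU.
const RUS_PER_CU = 2000;
const MIN_CU_PER_COLLECTION = 1 / 3;

// Hypothetical helper: equal share of the account's throughput per collection,
// capped at one whole CU (2000 RUs).
function perCollectionRUs(capacityUnits, collections) {
  // Small epsilon so that exactly 3 collections per CU is still allowed.
  if (collections * MIN_CU_PER_COLLECTION > capacityUnits + 1e-9) {
    throw new Error(`${collections} collections need more than ${capacityUnits} CU(s)`);
  }
  const share = (capacityUnits * RUS_PER_CU) / collections;
  return Math.min(share, RUS_PER_CU);
}

console.log(perCollectionRUs(1, 3)); // about 667 RUs each (the 1/3 CU minimum)
console.log(perCollectionRUs(2, 4)); // 1000 RUs each
console.log(perCollectionRUs(5, 2)); // capped at 2000 RUs each
```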
Request Units
• A CU includes the ability to execute up to 2000 Request Units per second
• i.e. with 1 CU, peak throughput needs to stay below 2000 RUs/sec
• When reserved throughput is exceeded, any subsequent request will be pre-emptively ended
• The server will respond with HTTP status code 429
• The server response includes an x-ms-retry-after-ms header indicating how long the client must wait before retrying
• The .NET client SDK implicitly catches this response, respects the retry-after header and retries the request (3x)
• You can set up alert rules in the Azure portal to be notified when requests are throttled
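The .NET SDK's retry behaviour can be approximated for any HTTP client. A minimal sketch, assuming a hypothetical `send` function that performs one request and resolves to `{ status, headers }`, with headers modelled as a plain object (with `fetch` you would read `response.headers.get(...)` instead). It retries up to three times to mirror the SDK's 3x behaviour.

```javascript
// Default back-off: wait the given number of milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retry a request while the server answers 429 (Too Many Requests).
// `send` is a hypothetical function: () => Promise<{ status, headers }>.
async function sendWithRetry(send, maxRetries = 3, wait = sleep) {
  for (let attempt = 0; ; attempt++) {
    const response = await send();
    // Any non-429 status, or running out of retries, ends the loop.
    if (response.status !== 429 || attempt >= maxRetries) {
      return response;
    }
    // Respect the server's hint about how long to back off.
    const delayMs = Number(response.headers["x-ms-retry-after-ms"]) || 0;
    await wait(delayMs);
  }
}
```

The `wait` parameter is injectable so the back-off can be observed or replaced (for example with jitter) without touching the loop.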
Request Units
Database operation: number of RUs (number of ops/sec per CU)
• Reading a single document by _self: 1 RU (2000 ops/sec/CU)
• Inserting/Replacing/Deleting a single document: 4 RUs (500 ops/sec/CU)
• Querying a collection with a simple predicate and returning a single document: 2 RUs (1000 ops/sec/CU)
• Stored Procedure with 50 document inserts: 100 RUs (20 ops/sec/CU)
Rough estimates: document size is 1KB consisting of 10 unique property values, with the default consistency level set to "Session" and all of the documents automatically indexed by DocumentDB.
As long as the database stays the same, the RUs consumed should stay the same.
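The ops/sec column is simply 2000 RUs/sec per CU divided by each operation's RU cost, and the same arithmetic sizes a mixed workload. A minimal sketch using the rough estimates from the table above; the operation keys and function names are our own labels, not SDK names.

```javascript
// Rough RU costs from the table (1KB document, Session consistency, default indexing).
const RU_COST = {
  readBySelf: 1,
  writeSingleDoc: 4,
  simpleQuery: 2,
  sprocWith50Inserts: 100,
};
const RUS_PER_CU_PER_SEC = 2000;

// Sustainable operations per second for one operation type.
function opsPerSecond(operation, capacityUnits = 1) {
  return (capacityUnits * RUS_PER_CU_PER_SEC) / RU_COST[operation];
}

// RUs/sec needed for a mixed workload, e.g. { readBySelf: 800, writeSingleDoc: 200 }.
function requiredRUsPerSecond(workload) {
  let total = 0;
  for (const [operation, perSecond] of Object.entries(workload)) {
    total += perSecond * RU_COST[operation];
  }
  return total;
}

const needed = requiredRUsPerSecond({ readBySelf: 800, writeSingleDoc: 200 });
console.log(needed, "RUs/sec ->", Math.ceil(needed / RUS_PER_CU_PER_SEC), "CU(s)");
```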
Cool Tool:
Document DB Studio
A useful tool, with source code, for sending queries to DocumentDB.
http://tiny.cc/docdbstudio
LET’S CALL A SPADE A SPADE
Indexing in DocumentDB
• By default everything is indexed
• Indexes are schema free
• Indexing is not a B-Tree and works really well under
write pressure and at scale.
• Out of the Box. It Just Works.
• But…
… it cannot read your mind all of the time…
Tuning Indexes
• We can change the way DocumentDB indexes documents
• We’re trading off
• Write Performance
How long does it take? How many RUs does it use?
• Read Performance
How long does it take? How many RUs does it use?
Which queries will need a scan?
• Storage
How much space does the document + index require?
• Complexity and Flexibility
Moving away from the pure schema-free model
Index Policy and Mode
• Index Policy
• Defines index rules for that collection
• Index mode
• Consistent
• Lazy
• Automatic
• True: Documents automatically
added (based on policy)
• False: Documents must be manually
added via IndexingDirective on
document PUT.
• Anything not indexed can only be
retrieved via _self link (GET)
var collection = new DocumentCollection
{
    Id = "myCollection"
};
// Index asynchronously (Lazy), and only documents that opt in via IndexingDirective
collection.IndexingPolicy.IndexingMode = IndexingMode.Lazy;
collection.IndexingPolicy.Automatic = false;
collection = await client.CreateDocumentCollectionAsync(databaseLink, collection);
Index Paths & Index Types
• Include/Exclude Paths
• Include a specific path
• Exclude sub paths
• Exclude a specific path
• Specify Index Type
• Hash (default)
• Range (default for _ts; not on strings)
• Specify Precision
• Byte precision (1-7)
• Affects storage overhead
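// Hash index every path by default
collection.IndexingPolicy.IncludedPaths.Add(new IndexingPath
{
    IndexType = IndexType.Hash,
    Path = "/",
});
// Range index, with 7-byte numeric precision, on the modifiedTimeStamp property
collection.IndexingPolicy.IncludedPaths.Add(new IndexingPath
{
    IndexType = IndexType.Range,
    Path = @"/""modifiedTimeStamp""/?",
    NumericPrecision = 7
});
// Exclude the longHTML property and everything beneath it
collection.IndexingPolicy.ExcludedPaths.Add("/\"longHTML\"/*");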
IT’S LESS ABOUT BUILDING AND MORE ABOUT BOLTING
Worth Reading:
NoSQL Distilled
By Pramod J. Sadalage and Martin Fowler
(Fowler of ‘Refactoring’ and ‘Patterns of Enterprise Application Architecture’ fame and fortune)
Provides a good background on
characteristics of NoSQL style data
stores and strategies for combining
multiple stores.
http://tiny.cc/fowler-pp
DocumentDB
• transactional processing
• rich query
• managed as a service
• elastic scale
• schema-free data model
• internet accessible HTTP/REST
• arbitrary data formats
Attachments
• Store large blobs/media outside core storage
• Document DB managed
• Submit raw content in POST
• Document DB stores into Azure Blob storage (2GB today)
• Document DB manages lifecycle
• Self managed
• Store content in service of your choice
• Create Attachment providing URL to content
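For the self-managed option, the attachment resource essentially records a content type and a pointer to the externally stored media. A minimal sketch, assuming the REST attachment body uses `id`, `contentType` and `media` properties; the helper name and the example blob URL are hypothetical.

```javascript
// Hypothetical helper: build the JSON body for a self-managed attachment,
// i.e. metadata that points at content stored in a service of your choice.
function selfManagedAttachment(id, contentType, mediaUrl) {
  const url = new URL(mediaUrl); // throws on a malformed URL
  return {
    id,
    contentType,
    media: url.toString(),
  };
}

const body = selfManagedAttachment(
  "front-view",
  "image/jpeg",
  "https://example.blob.core.windows.net/images/BK-M18S-front.jpg"
);
// This body would be POSTed to the owning document's attachments feed (.../docs/{id}/attachments).
console.log(JSON.stringify(body));
```

With this option DocumentDB does not manage the content's lifecycle, so deleting the document leaves the external blob for you to clean up.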
Demo
• Show managed attachment
• Lifecycle Follows Document
Storage Strategies
• Things to think about
• How much storage do I use; where? $$$?
• How is my data being indexed?
• Entropy & Precision
• Will it ever be queried? Should I exclude it?
• How many network calls to save & retrieve?
• Complexity of implementation & management
• Consistency. The Polyglot isn’t consistent
Embed (De-Normalize) or Reference?
{
"Products":[
{
"id":"BK-M18S",
"ProductCode":"BK-M18S",
"ProductType":"Mountain-500",
"Manufacturer":{
"Name":"Adventure Works",
"Website":"www.adventureworks.com"
}
}
]
}
{
"Products":[
{
"id":"BK-M18S",
"ProductCode":"BK-M18S",
"ProductType":"Mountain-500",
"Manufacturer":"ADVWKS"
}
],
"Manufacturers":[
{
"id":"ADVWKS",
"Name":"Adventure Works",
"Website":"www.adventureworks.com"
}
]
}
Embed (De-Normalize) or Reference?
• Embed
• Well suited to containment
• Typically bounded 1:Few
• Slowly changing data
• M:N requires management of duplicates
• One call to read all data
• Write call must write whole
document
• Reference
• Think of this as 3NF
• Provides M:N without
duplicates
• Allows unbounded 1:N
• Multiple calls to read all
data (hold that thought…)
• Write call may write single
referenced document
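With references, reassembling the embedded view is a join the client performs after reading both sets of documents. A minimal sketch that turns the referenced (normalized) JSON from the earlier slide back into the embedded shape; the function name is hypothetical and it works on in-memory arrays only.

```javascript
// Hypothetical helper: join each product to its manufacturer on the client,
// producing the embedded (de-normalized) shape.
function embedManufacturers(catalog) {
  // Index manufacturers by id so each lookup is O(1).
  const byId = new Map(catalog.Manufacturers.map((m) => [m.id, m]));
  return {
    Products: catalog.Products.map((product) => {
      const manufacturer = byId.get(product.Manufacturer);
      if (!manufacturer) {
        throw new Error(`Unknown manufacturer ${product.Manufacturer}`);
      }
      const { id, ...details } = manufacturer; // the embedded copy carries no id
      return { ...product, Manufacturer: details };
    }),
  };
}

const referenced = {
  Products: [{ id: "BK-M18S", ProductCode: "BK-M18S", ProductType: "Mountain-500", Manufacturer: "ADVWKS" }],
  Manufacturers: [{ id: "ADVWKS", Name: "Adventure Works", Website: "www.adventureworks.com" }],
};
console.log(JSON.stringify(embedManufacturers(referenced), null, 2));
```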
How Do We Relate?
• ID or _self
• A matter of taste.
• _self will be more efficient (half as many RUs or better)
• Direction
• Manufacturer > Product. 1:N
• We have to update manufacturer every time we add a new product
• Products are unbounded
• Product > Manufacturer N:1
• We have to update product if manufacturer changes
• Manufacturers per product are bounded (1)
• Sometimes both make sense.
The Canonical Polyglot Online Store
(Diagram components: Azure Web Site, Azure SQL Database, Azure Search, DocumentDB, storage table, storage blob.)
A Product Catalog
• Product
  • Name (String 100)
    • Probably want to search
    • Hash index is fine
    • May duplicate into Azure Search
  • SKU (String 100, YYYYCCCNNNNN, e.g. ‘2013MTB13435’)
    • We could store it reversed?
    • We could store a duplicate reversed and strongly typed; include/exclude.
    • We might want to pull Year out into another field and range index.
  • Description (HTML up to 8KB)
    • Probably want to index in Azure Search
    • Do we ‘save space’ and push to an attachment?
    • Do we often retrieve Product without description?
    • We probably do want to exclude it from the index
  • Manufacturer (String 100)
    • A core lookup field. Needs a hash index.
  • Price (Amount + Currency)
    • Probably a sub document within DocumentDB
    • Will allow multiple base currencies.
    • How do we manage precision?
    • Probably doesn't change much, so de-normalize the currency identifier
    • We probably want price in Search… but…
    • If we are providing localized prices then we have consistency issues; huge churn when we change exchange rates
  • Images (0-N images up to 100KB)
    • Attachments
  • ProductSizes (0-N including a sort order)
  • Reviews (0-N reviews + Reviewer + up to 10KB text)
    • Do we embed these?
    • Do we reference? On product? On reviewer/user? Both?
    • Do we reference and embed? Say embed last 10?
    • Which direction does the reference go?
  • Attributes (0-N complex details)
    • How deep does the rabbit hole go?
    • Almost certainly push to search.
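The SKU questions can be made concrete by deriving extra fields when a product is saved. A minimal sketch, assuming CCC is three upper-case letters and NNNNN five digits; the field names `SkuReversed` and `SkuYear` are hypothetical. The year is stored as a number because the Range index type is noted earlier as not applying to strings.

```javascript
// Hypothetical helper: derive extra, separately indexable fields from a SKU
// of the form YYYYCCCNNNNN (e.g. '2013MTB13435'). Field names are illustrative.
function withDerivedSkuFields(product) {
  const sku = product.SKU;
  if (!/^\d{4}[A-Z]{3}\d{5}$/.test(sku)) {
    throw new Error(`Unexpected SKU format: ${sku}`);
  }
  return {
    ...product,
    SkuReversed: sku.split("").reverse().join(""), // duplicate stored reversed
    SkuYear: Number(sku.slice(0, 4)),               // numeric, so it can take a range index
  };
}

console.log(withDerivedSkuFields({ id: "p1", SKU: "2013MTB13435" }));
```

Each derived field is a duplicate that must be kept in sync with the SKU, which is the storage and consistency trade-off the slide is weighing.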
The Promise of Schema Free
• Fully indexed complex type structures
• Ability to define schema independent of data store
• Reflect for editing and complex search filters
• Create templates to produce HTML from JSON for
editing and rendering
http://www.mchem.co.nz/msds/Tutti%20Frutti%20Disinfectant.pdf
http://www.toxinz.com/Demo
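Reflecting over JSON for rendering can be sketched with a small generic renderer that walks any value and emits nested HTML, with no knowledge of the schema. This is a minimal illustration under our own assumptions (the function names and the `<dl>`/`<ol>` markup are hypothetical), not the demo's implementation; a template engine would replace the hard-coded markup.

```javascript
// Escape text so stored content is not interpreted as markup.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Walk any JSON value: arrays become ordered lists, objects become
// definition lists keyed by property name, scalars become escaped text.
function renderJson(value) {
  if (Array.isArray(value)) {
    return "<ol>" + value.map((item) => "<li>" + renderJson(item) + "</li>").join("") + "</ol>";
  }
  if (value !== null && typeof value === "object") {
    const rows = Object.entries(value).map(
      ([key, child]) => "<dt>" + escapeHtml(key) + "</dt><dd>" + renderJson(child) + "</dd>"
    );
    return "<dl>" + rows.join("") + "</dl>";
  }
  return escapeHtml(value);
}

console.log(renderJson({ Name: "Mountain-500", Sizes: ["S", "M"] }));
```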
Programmability in DocumentDB
• Familiar constructs
  • Stored procs, UDFs, triggers
• Transactional
  • Each call to the service is in an ACID txn
  • An uncaught exception rolls back the txn
• Sandboxed
  • No imports
  • No network calls
  • No eval()
  • Resource governed & time bound
var helloWorldStoredProc = {
    id: "helloRealWorld",
    body: function () {
        var context = getContext();
        var response = context.getResponse();
        // Each setBody call replaces the previous body, so the caller only sees the last message
        response.setBody("Hello, Welcome To The Real World");
        response.setBody("Here Be Dragons...");
        response.setBody("Oh... and network latency");
    }
};
Where To Use Programmability
• Reduce Network Calls
  • Send multiple documents & shred in a SPROC
• Multi-Document Transactions
  • Each call in ACID txn
  • No multi-statement txns: one REST call = one txn
• Transform & Join
  • Pull content from multiple docs. Perform calculations
  • JOIN operator intradoc only
• Drive lazy processes
  • Write journal entries and process later
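Sending several documents in one call and shredding them server-side looks roughly like the following stored procedure body. A minimal sketch modelled on the common server-side pattern (`getContext()`, `getCollection()`, `getSelfLink()`, `createDocument(link, doc, callback)` returning whether the request was accepted); the procedure name and argument shape are hypothetical, and registering and executing it from a client is not shown.

```javascript
// Stored procedure body (runs inside DocumentDB's JavaScript sandbox).
// Hypothetical name; the argument is an array of documents to insert in one call.
function bulkInsert(docs) {
  var collection = getContext().getCollection();
  var collectionLink = collection.getSelfLink();
  var response = getContext().getResponse();
  var count = 0;

  if (!docs) throw new Error("The docs array is undefined or null.");
  if (docs.length === 0) {
    response.setBody(0);
    return;
  }

  tryCreate(docs[count]);

  function tryCreate(doc) {
    // createDocument returns false when the request is not accepted, e.g. when the
    // procedure nears its time/resource bound; report progress so the client can
    // call again with the remaining documents.
    var accepted = collection.createDocument(collectionLink, doc, onCreated);
    if (!accepted) response.setBody(count);
  }

  function onCreated(err) {
    // An uncaught exception rolls back every insert made in this call.
    if (err) throw err;
    count++;
    if (count >= docs.length) {
      response.setBody(count);
    } else {
      tryCreate(docs[count]);
    }
  }
}
```

One REST call carries the whole batch, and because the call is one transaction, a failed insert undoes the earlier ones rather than leaving a partial batch.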
Worth Reading:
Replicated Data
Consistency
Explained Through
Baseball
By Doug Terry
MS Research
http://tiny.cc/cons-baseball
Tuning Consistency
• Database Accounts are configured with a default consistency level. The consistency level can be weakened per read/query request
• Writes and reads per level:
  • Strong: sync quorum writes; quorum reads
  • Bounded: async replication; quorum reads
  • Session (default): async replication; session bound replica
  • Eventual: async replication; any replica
• Four consistency levels
  • STRONG: all writes are visible to all readers. Writes are committed by a majority quorum of replicas and reads are acknowledged by the majority read quorum.
  • BOUNDED STALENESS: guaranteed ordering of writes; reads adhere to a minimum freshness. Writes are propagated asynchronously, and reads are acknowledged by a majority quorum lagging writes by at most N seconds or operations (configurable).
  • SESSION (Default): read your own writes. Writes are propagated asynchronously while reads for a session are issued against the single replica that can serve the requested version.
  • EVENTUAL: reads eventually converge with writes. Writes are propagated asynchronously while reads can be acknowledged by any replica. Readers may view older data than previously observed.
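Two of the rules above can be modelled in a few lines: a request may only weaken the account's default level, and a Session read must go to a replica that has caught up with the session's last write. A toy sketch under stated assumptions: the level names, the `lsn` (log sequence number) field and the replica objects are illustrative, the service's real session tokens and replica selection are internal, and rejecting a stronger per-request level (rather than ignoring it) is this sketch's own choice.

```javascript
// Consistency levels ordered from strongest to weakest.
const LEVELS = ["Strong", "BoundedStaleness", "Session", "Eventual"];

// A request may weaken the account default but, in this sketch, never strengthen it.
function effectiveLevel(accountDefault, requested = accountDefault) {
  const def = LEVELS.indexOf(accountDefault);
  const req = LEVELS.indexOf(requested);
  if (def < 0 || req < 0) throw new Error("Unknown consistency level");
  if (req < def) {
    throw new Error(`Cannot strengthen ${accountDefault} to ${requested} per request`);
  }
  return requested;
}

// Toy Session read: the session remembers the LSN of its last write, and only a
// replica whose LSN has reached it can serve a read-your-own-writes request.
function pickSessionReplica(replicas, sessionLsn) {
  return replicas.find((replica) => replica.lsn >= sessionLsn) || null;
}

const replicas = [{ name: "r1", lsn: 10 }, { name: "r2", lsn: 12 }, { name: "r3", lsn: 9 }];
console.log(effectiveLevel("Session", "Eventual"));
console.log(pickSessionReplica(replicas, 12));
```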
• DocumentDB is a preview service… expect and enjoy change over time
• Think outside the relational model…
… if what you really want is an RDBMS, then use one of those…
http://techedmelbourne.hubb.me/Sessions/Details/19489
Thanks!
Don’t forget to complete
your evaluations
aka.ms/mytechedmel