第1页
DynamoDB
Design Patterns and Best Practices
Rick Houlihan, Principal Solutions Architect
1/20/2016
© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
第2页
What to expect from the session
• Brief history of data processing • DynamoDB Internals
• Tables, API, data types, indexes • Scaling and data modeling
• Design patterns and best practices • Event driven applications and DDB Streams
第3页
Timeline of Database Technology
第4页
Data Volume Since 2010
• 90% of stored data generated in last 2 years
• 1 Terabyte of data in 2010 equals 6.5 Petabytes today
• Linear correlation between data pressure and technical innovation
• No reason these trends will not continue over time
第5页
Technology Adoption and the Hype Curve
第6页
Why NoSQL?
SQL
Optimized for storage Normalized/relational Ad hoc queries Scale vertically Good for OLAP
NoSQL
Optimized for compute Denormalized/hierarchical Instantiated views Scale horizontally Built for OLTP at scale
第7页
SQL vs. NoSQL Access Pattern
第8页
Table
Attributes
Table Items
Mandatory Key-value access pattern Determines data distribution
Partition Sort Key Key
Optional Model 1:N relationships Enables rich query capabilities
All items for key ==, <, >, >=, <= “begins with” “between” “contains” “in” sorted results counts top/bottom N values
第9页
Partition Keys
Partition Key uniquely identifies an item Partition Key is used for building an unordered hash index Allows table to be partitioned for scale
Id = 1 Name = Jim
Hash (1) = 7B
Id = 2 Name = Andy Dept = Eng
Hash (2) = 48
Id = 3 Name = Kim Dept = Ops
Hash (3) = CD
54 55 Key Space A9 AA
FF
第10页
Partition:Sort Key
Partition:Sort Key uses two attributes together to uniquely identify an Item
Within unordered hash index, data is arranged by the sort key No limit on the number of items (∞) per partition key
• Except if you have local secondary indexes
Partition 1
Partition 2
Partition 3
00:0 54:∞ 55
A9:∞ AA
FF:∞
Customer# = 2 Order# = 10 Item = Pen
Customer# = 2 Order# = 11 Item = Shoes
Customer# = 1 Order# = 10 Item = Toy
Customer# = 1 Order# = 11 Item = Boots
Customer# = 3 Order# = 10 Item = Book
Customer# = 3 Order# = 11 Item = Paper
Hash (2) = 48
Hash (1) = 7B
Hash (3) = CD
第11页
Partitions are three-way replicated
Id = 2 Name = Andy Dept = Engg
Id = 1 Name = Jim
Id = 3 Name = Kim Dept = Ops
Id = 2 Name = Andy Dept = Engg
Id = 1 Name = Jim
Id = 3 Name = Kim Dept = Ops
Id = 2 Name = Andy Dept = Engg
Partition 1
Id = 1 Name = Jim
Partition 2
Id = 3 Name = Kim Dept = Ops
Partition N
Replica 1 Replica 2 Replica 3
第12页
Indexes
第13页
Local secondary index (LSI)
Alternate sort key attribute Index is local to a partition key
Table LSIs
A1 A2 A3 A4 A5 (partition) (sort)
10 GB max per partition key, i.e. LSIs limit the # of range keys!
A1 A3 (partition) (sort)
A2 (item key)
KEYS_ONLY
A1 A4 (partition) (sort)
A2 (item key)
A3 (projected) INCLUDE A3
A1 A5
A2
(partition) (sort) (item key)
A3 (projected)
A4 (projected)
ALL
第14页
Global secondary index (GSI)
Alternate partition and/or sort key
Index is across all partition keys
Online indexing
Table GSIs
A1 A2 A3 A4 A5 (partition)
RCUs/WCUs provisioned separately for GSIs
A2 (partition)
A1 (itemkey)
KEYS_ONLY
A5 A4 (partition) (sort)
A1 (item key)
A3 (projected)
INCLUDE A3
A4 A5
A1
A2
A3
(partition) (sort) (item key) (projected) (projected) ALL
第15页
How do GSI updates work?
Client
Table
2. Asynchronous PtPraimtPrbaimtrPlabeaSimrtrlabeeayiGmrclabeylorolaeynbrdyaal ry update (in progress)
Index
If GSIs don’t have enough write capacity, table writes will be throttled!
第16页
LSI or GSI?
LSI can be modeled as a GSI If data size in an item collection > 10 GB, use GSI If eventual consistency is okay for your scenario, use GSI!
第17页
Scaling
第18页
Scaling
Throughput
• Provision any amount of throughput to a table
Size
• Add any number of items to a table
• Max item size is 400 KB • LSIs limit the number of range keys due to 10 GB limit
Scaling is achieved through partitioning
第19页
Throughput
Provisioned at the table level
• Write capacity units (WCUs) are measured in 1 KB per second • Read capacity units (RCUs) are measured in 4 KB per second
• RCUs measure strictly consistent reads • Eventually consistent reads cost 1/2 of consistent reads
Read and write throughput limits are independent
RCU
WCU
第20页
Partitioning math
By Capacity By Size Total Partitions
Number of Partitions (Total RCU / 3000) + (Total WCU / 1000) Total Size / 10 GB CEILING(MAX (Capacity, Size))
In the future, these details might change…
第21页
Partitioning example Table size = 8 GB, RCUs = 5000, WCUs = 500
By Capacity By Size Total Partitions
Number of Partitions (5000 / 3000) + (500 / 1000) = 2.17 8 / 10 = 0.8 CEILING(MAX (2.17, 0.8)) = 3
RCUs and WCUs are uniformly spread across partitions
RCUs per partition = 5000/3 = 1666.67 WCUs per partition = 500/3 = 166.67 Data/partition = 10/3 = 3.33 GB
第22页
What causes throttling?
If sustained throughput goes beyond provisioned throughput per partition
Non-uniform workloads
• Hot keys/hot partitions • Very large bursts
Mixing hot data with cold data
• Use a table per time period
From the example before:
• Table created with 5000 RCUs, 500 WCUs • RCUs per partition = 1666.67 • WCUs per partition = 166.67 • If sustained throughput > (1666 RCUs or 166 WCUs) per key or partition,
DynamoDB may throttle requests
• Solution: Increase provisioned throughput
第23页
Partition
What bad NoSQL looks like…
Time
Heat
第24页
Getting the most out of DynamoDB throughput
“To get the most out of DynamoDB throughput, create tables where the hash key element has a large number of distinct values, and values are requested fairly uniformly, as randomly as possible.”
Space: access is evenly spread over the key-space
Time: requests arrive evenly spaced in time
—DynamoDB Developer Guide
第25页
Much better picture…
第26页
Data modeling
第27页
1:1 relationships or key-values
Use a table or GSI with an alternate partition key Use GetItem or BatchGetItem API Example: Given an SSN or license number, get attributes
Users Table Partiton key SSN = 123-45-6789 SSN = 987-65-4321
Attributes Email = johndoe@nowhere.com, License = TDL25478134 Email = maryfowler@somewhere.com, License = TDL78309234
Users-Email-GSI
Partition key
Attributes
License = TDL78309234 Email = maryfowler@somewhere.com, SSN = 987-65-4321
License = TDL25478134 Email = johndoe@nowhere.com, SSN = 123-45-6789
第28页
1:N relationships or parent-children
Use a table or GSI with partition and sort key Use Query API
Example:
• Given a device, find all readings between epoch X, Y
Partition Key DeviceId = 1 DeviceId = 1
Device-measurements
Sort key
Attributes
epoch = 5513A97C Temperature = 30, pressure = 90
epoch = 5513A9DB Temperature = 30, pressure = 90
第29页
N:M relationships
Use a table and GSI with partition and sort key elements switched Use Query API Example: Given a user, find all games. Or given a game, find all users.
User-Games-Table Partition Key Sort key UserId = bob GameId = Game1
UserId = fred GameId = Game2
UserId = bob GameId = Game3
Game-Users-GSI
Partition Key
Sort key
GameId = Game1 UserId = bob
GameId = Game2 UserId = fred
GameId = Game3 UserId = bob
第30页
Hierarchical Data
Tiered relational data structures
第31页
Hierarchical Data Structures as Items…
Use composite sort key to define a Hierarchy Highly selective result sets with sort queries Index anything, scales to any size
Primary Key
ProductID
type
1 bookID
2 albumID
2 albumID:trackID
2 albumID:trackID 2 albumID:trackID 3 movieID 3 movieID:actorID 3 movieID:actorID 3 movieID:actorID
title Ringworld
title Dark Side of the Moon
title
Speak to Me
title
Breathe
title On the Run
title Idiocracy
name Luke Wilson
name Maya Rudolph
name Dax Shepard
Attributes
author Larry Niven
artist Pink Floyd
length
1:30
length
2:43
length 3:30 genre Scifi Comedy character Joe Bowers character Rita character Frito Pendejo
genre Science Fiction
genre Progressive Rock
music
Mason
music
Waters, Gilmour, Wright
music Gilmour, Waters
writer Mike Judge
image img2.jpg
image img3.jpg
image img1.jpg
publisher Ballantine
label Harvest vocals
Instrumental
vocals
Gilmour
vocals Instrumental
producer 20th Century Fox
datePublished Oct-70 studio
Abbey Road
ISBN 0-345-02046-4
relesed producer 3/1/73 Pink Floyd
Items
第32页
… or as Documents (JSON)
JSON data types (M, L, BOOL, NULL) Document SDKs Available Indexing only via Streams/Lambda 400KB max item size (limits hierarchical data structure)
Primary Key ProductID
id bookID
id
albumID
id
movieID
title Ringworld
title
Dark Side of the Moon
title
Idiocracy
author Larry Niven
artist
Pink Floyd
genre
Scifi Comedy
Attributes
genre Science Fiction
genre
Progressive Rock
writer
Mike Judge
publisher datePublished
ISBN
Ballantine
Oct-70 0-345-02046-4
Attributes
{ label:"Harvest", studio: "Abbey Road", published: "3/1/73", producer: "Pink
Floyd", tracks: [{title: "Speak to Me", length: "1:30", music: "Mason", vocals:
"Instrumental"},{title: ”Breathe", length: ”2:43", music: ”Waters, Gilmour,
Wright", vocals: ”Gilmour"},{title: ”On the Run", length: “3:30", music: ”Gilmour,
Waters", vocals: "Instrumental"}]}
Attributes
{ producer: "20th Century Fox", actors: [{ name: "Luke Wilson", dob: "9/21/71", character: "Joe Bowers", image: "img2.jpg"},{ name: "Maya Rudolph", dob: "7/27/72", character: "Rita", image: "img1.jpg"},{ name: "Dax Shepard", dob: "1/2/75", character: "Frito Pendejo", image: "img3.jpg"}]
Items
第33页
Scenarios and best practices
第34页
Event logging
Storing time series data
第35页
Time series tables
Hot data
Cold data
Current table
Events_table_2015_April Event_id Timestamp Attribute1 …. Attribute N
RCUs = 10000
(Partition) (Sort)
WCUs = 10000
Events_table_2015_March Event_id Timestamp Attribute1 …. Attribute N
(Partition) (Sort)
RCUs = 1000 WCUs = 1
Older tables
Events_table_2015_Feburary Event_id Timestamp Attribute1 …. Attribute N (Partition) (Sort)
RCUs = 100 WCUs = 1
Events_table_2015_January Event_id Timestamp Attribute1 …. Attribute N
(Partition) (Sort)
RCUs = 10 WCUs = 1
Don’t mix hot and cold data; archive cold data to Amazon S3
第36页
Use a table per time period
Pre-create daily, weekly, monthly tables Provision required throughput for current table Writes go to the current table Turn off (or reduce) throughput for older tables
Dealing with time series data
第37页
Product catalog
Popular items (read)
第38页
Scaling bottlenecks
SELECT Id, Description, ... FROM ProductCatalog WHERE Id="POPULAR_PRODUCT"
Shoppers
Partition 1 2000 RCUs
Product A
Partition K 2000 RCUs
Partition M 2000 RCUs
Product B ProductCatalog Table
Partition 50 2000 RCU
第40页
Cache popular items
SELECT Id, Description, ... FROM ProductCatalog WHERE Id="POPULAR_PRODUCT"
User
User
DynamoDB
Partition 1
Partition 2
ProductCatalog Table
第42页
Messaging app
Large items Filters vs. indexes M:N Modeling—inbox and outbox
第43页
Messages App
Inbox
SELECT * FROM Messages WHERE Recipient='David' LIMIT 50 ORDER BY Date DESC
Messages Table
David
Outbox
SELECT * FROM Messages WHERE Sender ='David' LIMIT 50 ORDER BY Date DESC
第44页
Large and small attributes mixed
Partition key
Sort key
Messages Table
Recipient Date
Sender Message
David
2014-10-02 Bob
…
Inbox
David
SELECT * FROM Messages WHERE Recipient='David' LIMIT 50 ORDER BY Date DESC
… 48 more messages for David …
50 items × 256 KB each
David
2014-10-03 Alice …
Alice
2014-09-28 Bob
…
Alice
2014-10-01 Carol
(Many more messages)
…
Large message bodies Attachments
第45页
Computing inbox query cost
50 * 256KB * (1 RCU / 4KB) * (1 / 2) = 1600 RCU
Average item size
Eventually consistent reads
Items evaluated by query
Conversion ratio
第46页
Separate the bulk data
Uniformly distributes large item reads
(50 sequential items at 128 bytes)
1. Query Inbox-GSI: 1 RCU 2. BatchGetItem Messages: 1600 RCU
(50 separate items at 256 KB)
David
Inbox-GSI
Recipient David David Alice Alice
Date 2014-10-02 2014-10-03 2014-09-28 2014-10-01
Sender Bob Alice Bob Carol
Subject Hi!… RE: The… FW: Ok…
Hi!...
MsgId afed 3kf8 9d2b ct7r
Messages Table
MsgId 9d2b 3kf8 ct7r afed
Body … … … …
第47页
Inbox GSI Define which attributes to copy into the index
第48页
Outbox GSI
Outbox Sender
SELECT * FROM Messages WHERE Sender ='David' LIMIT 50 ORDER BY Date DESC
第49页
Messaging app
Inbox
David
Inbox Global secondary
index
Messages Table
Outbox
Outbox Global secondary
index
第50页
Distribute large items
Reduce one-to-many item sizes Configure secondary index projections Use GSIs to model M:N relationship between sender and recipient
Outbox
Messages
Inbox
Querying many large items at once
第51页
Multiplayer online gaming
Query filters vs. composite key indexes
第52页
Hierarchical Data Structures
Games Table
Partition key
GameId d9bl3 72f49 o2pnb b932s ef9ca
Date 2014-10-02 2014-09-30 2014-10-08 2014-10-03 2014-10-03
Host David Alice Bob Carol David
Opponent Alice Bob Carol Bob Bob
Status DONE PENDING IN_PROGRESS PENDING IN_PROGRESS
第53页
Query for incoming game requests
DynamoDB indexes provide partiton and sort What about queries for two equalities and a sort?
(hash)
(?) (range)
SELECT * FROM Game WHERE Opponent='Bob‘ AND Status=‘PENDING' ORDER BY Date DESC
第54页
Approach 1: Query filter
Partition key
Sort key
Bob
Secondary Index
Opponent Alice Carol Bob Bob Bob
Date 2014-10-02 2014-10-08 2014-09-30 2014-10-03 2014-10-03
GameId d9bl3 o2pnb 72f49 b932s ef9ca
Status DONE IN_PROGRESS PENDING PENDING IN_PROGRESS
Host David Bob Alice Carol David
第55页
Approach 1: Query filter
SELECT * FROM Game WHERE Opponent='Bob' ORDER BY Date DESC FILTER ON Status='PENDING'
Secondary Index
Opponent Alice Carol Bob Bob Bob
Date 2014-10-02 2014-10-08 2014-09-30 2014-10-03 2014-10-03
GameId d9bl3 o2pnb 72f49 b932s ef9ca
Status DONE IN_PROGRESS PENDING PENDING IN_PROGRESS
Host David Bob Alice Carol David
Bob
(filtered out)
第56页
Needle in a haystack
Bob
第57页
Use query filter
Send back less data “on the wire” Simplify application code Simple SQL-like expressions
• AND, OR, NOT, ()
Your index isn’t entirely selective
第58页
Approach 2: Composite key
Status DONE IN_PROGRESS IN_PROGRESS PENDING PENDING
Date 2014-10-02
+ 2014-10-08 2014-10-03 2014-10-03 2014-09-30
StatusDate DONE_2014-10-02
= IN_PROGRESS_2014-10-08 IN_PROGRESS_2014-10-03 PENDING_2014-09-30 PENDING_2014-10-03
第59页
Approach 2: Composite key
Partition key
Sort key
Secondary Index
Opponent Alice Carol Bob Bob Bob
StatusDate DONE_2014-10-02 IN_PROGRESS_2014-10-08 IN_PROGRESS_2014-10-03 PENDING_2014-09-30 PENDING_2014-10-03
GameId d9bl3 o2pnb ef9ca 72f49 b932s
Host David Bob David Alice Carol
第60页
Approach 2: Composite key
SELECT * FROM Game WHERE Opponent='Bob'
AND StatusDate BEGINS_WITH 'PENDING'
Secondary Index
Opponent Alice Carol Bob Bob Bob
StatusDate DONE_2014-10-02 IN_PROGRESS_2014-10-08 IN_PROGRESS_2014-10-03 PENDING_2014-09-30 PENDING_2014-10-03
GameId d9bl3 o2pnb ef9ca 72f49 b932s
Host David Bob David Alice Carol
Bob
第61页
Needle in a sorted haystack
Bob
第62页
Sparse indexes
Game-scores-table
Id (Partition)
User
Game Score Date
1 Bob G1 1300 2012-12-23
2 Bob G1 1450 2012-12-23 3 Jay G1 1600 2012-12-24
4 Mary G1 2000 2012-10-24 5 Ryan G2 123 2012-03-10 6 Jones G2 345 2012-03-20
Scan sparse GSIs
Award-GSI
Award
Award (Partition)
Id
User Score
Champ 4 Mary 2000
Champ
第63页
Real-Time voting
Write-heavy items
第64页
Scaling bottlenecks
Partition 1 1000 WCUs
Candidate A
Partition K 1000 WCUs
Voters
Provision 200,000 WCUs
Partition M 1000 WCUs
Candidate B Votes Table
Partition N 1000 WCUs
第65页
Write sharding
Voter
Candidate A_7 Candidate A_1 Candidate A_4
Candidate A_3 Candidate A_2
Candidate A_5
Candidate A_6 Candidate A_8
Votes Table
Candidate B_4
Candidate B_8
Candidate B_1
Candidate B_5
Candidate B_3
Candidate B_7
Candidate B_2
Candidate B_6
第66页
Write sharding
Voter
UpdateItem: “CandidateA_” + rand(0, 10) ADD 1 to Votes
Candidate A_7 Candidate A_1 Candidate A_4
Candidate A_3 Candidate A_2
Candidate A_5
Candidate A_6 Candidate A_8
Votes Table
Candidate B_4
Candidate B_8
Candidate B_1
Candidate B_5
Candidate B_3
Candidate B_7
Candidate B_2
Candidate B_6
第67页
Shard aggregation
2. Store
Periodic
Process 1. Sum
Voter
Candidate A_7 Candidate A_1 Candidate A_4
Candidate A Total: 2.5M
Candidate A_3 Candidate A_2
Candidate A_5
Candidate A_6 Candidate A_8
Votes Table
Candidate B_4
Candidate B_8
Candidate B_1
Candidate B_5
Candidate B_3
Candidate B_7
Candidate B_2
Candidate B_6
第68页
Shard write-heavy partition keys
Trade off read cost for write scalability Consider throughput per partition key
Your write workload is not horizontally scalable
第69页
Replace filter with indexes
Concatenate attributes to form useful secondary index keys
Take advantage of sparse indexes
Status + Date
You want to optimize a query as much as possible
第70页
DynamoDB Streams
第71页
DynamoDB Streams
Stream of updates to a table Asynchronous Exactly once Strictly ordered
• Per item
Highly durable
• Scale with table
24-hour lifetime Sub-second latency
第72页
View types
UpdateItem (Name = John, Destination = Pluto)
View Type Old image—before update
Destination
Name = John, Destination = Mars
New image—after update
Name = John, Destination = Pluto
Old and new images Keys only
Name = John, Destination = Mars Name = John, Destination = Pluto
Name = John
第73页
DynamoDB Streams and Amazon Kinesis Client Library
Partition 1
DynamoDB Client Application
Updates
Partition 2
Partition 3
Partition 4
Table
Partition 5
Table
Shard 1 Shard 2 Shard 3 Shard 4
Stream
KCL Worker
KCL Worker
KCL Worker
KCL Worker
Amazon Kinesis Client Library Application
第74页
Cross-region replication
US East (N. Virginia)
DynamoDB Streams
Asia Pacific (Sydney)
Open Source CrossRegion Replication Library
EU (Ireland) Replica
第75页
DynamoDB Streams and AWS Lambda
第76页
Triggers
Derivative Tables ElastiCache
Lambda function
CloudSearch Notify change
第77页
Analytics with DynamoDB Streams
Collect and de-dupe data in DynamoDB Aggregate data in-memory and flush periodically
Performing real-time aggregation and analytics
第78页
Architecture
第79页
Reference Architecture
第80页
Elastic Event Driven Applications
第81页
Thank you!