DynamoDB Design Patterns and Best Practices(DynamoDB设计模式和最佳实践) by Rick Houlihan

AirJD

没有录音文件

00:00/00:00

加收藏

DynamoDB Design Patterns and Best Practices(DynamoDB设计模式和最佳实践) by Rick Houlihan

发布者 arch

发布于 1457312734769 浏览 9034 关键词 数据库, NoSQL, English

分享到

第1页

DynamoDB

Design Patterns and Best Practices

Rick Houlihan, Principal Solutions Architect

1/20/2016

© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

第2页

What to expect from the session

• Brief history of data processing • DynamoDB Internals

• Tables, API, data types, indexes • Scaling and data modeling

• Design patterns and best practices • Event driven applications and DDB Streams

第3页

Timeline of Database Technology

第4页

Data Volume Since 2010

• 90% of stored data generated in last 2 years

• 1 Terabyte of data in 2010 equals 6.5 Petabytes today

• Linear correlation between data pressure and technical innovation

• No reason these trends will not continue over time

第5页

Technology Adoption and the Hype Curve

第6页

Why NoSQL?

SQL

Optimized for storage Normalized/relational Ad hoc queries Scale vertically Good for OLAP

NoSQL

Optimized for compute Denormalized/hierarchical Instantiated views Scale horizontally Built for OLTP at scale

第7页

SQL vs. NoSQL Access Pattern

第8页

Table

Attributes

Table Items

Mandatory Key-value access pattern Determines data distribution

Partition Sort Key Key

Optional Model 1:N relationships Enables rich query capabilities

All items for key ==, <, >, >=, <= “begins with” “between” “contains” “in” sorted results counts top/bottom N values

第9页

Partition Keys

Partition Key uniquely identifies an item Partition Key is used for building an unordered hash index Allows table to be partitioned for scale

Id = 1 Name = Jim

Hash (1) = 7B

Id = 2 Name = Andy Dept = Eng

Hash (2) = 48

Id = 3 Name = Kim Dept = Ops

Hash (3) = CD

54 55 Key Space A9 AA

FF

第10页

Partition:Sort Key

Partition:Sort Key uses two attributes together to uniquely identify an Item

Within unordered hash index, data is arranged by the sort key No limit on the number of items (∞) per partition key

• Except if you have local secondary indexes

Partition 1

Partition 2

Partition 3

00:0 54:∞ 55

A9:∞ AA

FF:∞

Customer# = 2 Order# = 10 Item = Pen

Customer# = 2 Order# = 11 Item = Shoes

Customer# = 1 Order# = 10 Item = Toy

Customer# = 1 Order# = 11 Item = Boots

Customer# = 3 Order# = 10 Item = Book

Customer# = 3 Order# = 11 Item = Paper

Hash (2) = 48

Hash (1) = 7B

Hash (3) = CD

第11页

Partitions are three-way replicated

Id = 2 Name = Andy Dept = Engg

Id = 1 Name = Jim

Id = 3 Name = Kim Dept = Ops

Id = 2 Name = Andy Dept = Engg

Id = 1 Name = Jim

Id = 3 Name = Kim Dept = Ops

Id = 2 Name = Andy Dept = Engg

Partition 1

Id = 1 Name = Jim

Partition 2

Id = 3 Name = Kim Dept = Ops

Partition N

Replica 1 Replica 2 Replica 3

第12页

Indexes

第13页

Local secondary index (LSI)

Alternate sort key attribute Index is local to a partition key

Table LSIs

A1 A2 A3 A4 A5 (partition) (sort)

10 GB max per partition key, i.e. LSIs limit the # of range keys!

A1 A3 (partition) (sort)

A2 (item key)

KEYS_ONLY

A1 A4 (partition) (sort)

A2 (item key)

A3 (projected) INCLUDE A3

A1 A5

A2

(partition) (sort) (item key)

A3 (projected)

A4 (projected)

ALL

第14页

Global secondary index (GSI)

Alternate partition and/or sort key

Index is across all partition keys

Online indexing

Table GSIs

A1 A2 A3 A4 A5 (partition)

RCUs/WCUs provisioned separately for GSIs

A2 (partition)

A1 (itemkey)

KEYS_ONLY

A5 A4 (partition) (sort)

A1 (item key)

A3 (projected)

INCLUDE A3

A4 A5

A1

A2

A3

(partition) (sort) (item key) (projected) (projected) ALL

第15页

How do GSI updates work?

Client

Table

2. Asynchronous PtPraimtPrbaimtrPlabeaSimrtrlabeeayiGmrclabeylorolaeynbrdyaal ry update (in progress)

Index

If GSIs don’t have enough write capacity, table writes will be throttled!

第16页

LSI or GSI?

LSI can be modeled as a GSI If data size in an item collection > 10 GB, use GSI If eventual consistency is okay for your scenario, use GSI!

第17页

Scaling

第18页

Scaling

Throughput

• Provision any amount of throughput to a table

Size

• Add any number of items to a table

• Max item size is 400 KB • LSIs limit the number of range keys due to 10 GB limit

Scaling is achieved through partitioning

第19页

Throughput

Provisioned at the table level

• Write capacity units (WCUs) are measured in 1 KB per second • Read capacity units (RCUs) are measured in 4 KB per second

• RCUs measure strictly consistent reads • Eventually consistent reads cost 1/2 of consistent reads

Read and write throughput limits are independent

RCU

WCU

第20页

Partitioning math

By Capacity By Size Total Partitions

Number of Partitions (Total RCU / 3000) + (Total WCU / 1000) Total Size / 10 GB CEILING(MAX (Capacity, Size))

In the future, these details might change…

第21页

Partitioning example Table size = 8 GB, RCUs = 5000, WCUs = 500

By Capacity By Size Total Partitions

Number of Partitions (5000 / 3000) + (500 / 1000) = 2.17 8 / 10 = 0.8 CEILING(MAX (2.17, 0.8)) = 3

RCUs and WCUs are uniformly spread across partitions

RCUs per partition = 5000/3 = 1666.67 WCUs per partition = 500/3 = 166.67 Data/partition = 10/3 = 3.33 GB

第22页

What causes throttling?

If sustained throughput goes beyond provisioned throughput per partition

Non-uniform workloads

• Hot keys/hot partitions • Very large bursts

Mixing hot data with cold data

• Use a table per time period

From the example before:

• Table created with 5000 RCUs, 500 WCUs • RCUs per partition = 1666.67 • WCUs per partition = 166.67 • If sustained throughput > (1666 RCUs or 166 WCUs) per key or partition,

DynamoDB may throttle requests

• Solution: Increase provisioned throughput

第23页

Partition

What bad NoSQL looks like…

Time

Heat

第24页

Getting the most out of DynamoDB throughput

“To get the most out of DynamoDB throughput, create tables where the hash key element has a large number of distinct values, and values are requested fairly uniformly, as randomly as possible.”

Space: access is evenly spread over the key-space

Time: requests arrive evenly spaced in time

—DynamoDB Developer Guide

第25页

Much better picture…

第26页

Data modeling

第27页

1:1 relationships or key-values

Use a table or GSI with an alternate partition key Use GetItem or BatchGetItem API Example: Given an SSN or license number, get attributes

Users Table Partiton key SSN = 123-45-6789 SSN = 987-65-4321

Attributes Email = johndoe@nowhere.com, License = TDL25478134 Email = maryfowler@somewhere.com, License = TDL78309234

Users-Email-GSI

Partition key

Attributes

License = TDL78309234 Email = maryfowler@somewhere.com, SSN = 987-65-4321

License = TDL25478134 Email = johndoe@nowhere.com, SSN = 123-45-6789

第28页

1:N relationships or parent-children

Use a table or GSI with partition and sort key Use Query API

Example:

• Given a device, find all readings between epoch X, Y

Partition Key DeviceId = 1 DeviceId = 1

Device-measurements

Sort key

Attributes

epoch = 5513A97C Temperature = 30, pressure = 90

epoch = 5513A9DB Temperature = 30, pressure = 90

第29页

N:M relationships

Use a table and GSI with partition and sort key elements switched Use Query API Example: Given a user, find all games. Or given a game, find all users.

User-Games-Table Partition Key Sort key UserId = bob GameId = Game1

UserId = fred GameId = Game2

UserId = bob GameId = Game3

Game-Users-GSI

Partition Key

Sort key

GameId = Game1 UserId = bob

GameId = Game2 UserId = fred

GameId = Game3 UserId = bob

第30页

Hierarchical Data

Tiered relational data structures

第31页

Hierarchical Data Structures as Items…

Use composite sort key to define a Hierarchy Highly selective result sets with sort queries Index anything, scales to any size

Primary Key

ProductID

type

1 bookID

2 albumID

2 albumID:trackID

2 albumID:trackID 2 albumID:trackID 3 movieID 3 movieID:actorID 3 movieID:actorID 3 movieID:actorID

title Ringworld

title Dark Side of the Moon

title

Speak to Me

title

Breathe

title On the Run

title Idiocracy

name Luke Wilson

name Maya Rudolph

name Dax Shepard

Attributes

author Larry Niven

artist Pink Floyd

length

1:30

length

2:43

length 3:30 genre Scifi Comedy character Joe Bowers character Rita character Frito Pendejo

genre Science Fiction

genre Progressive Rock

music

Mason

music

Waters, Gilmour, Wright

music Gilmour, Waters

writer Mike Judge

image img2.jpg

image img3.jpg

image img1.jpg

publisher Ballantine

label Harvest vocals

Instrumental

vocals

Gilmour

vocals Instrumental

producer 20th Century Fox

datePublished Oct-70 studio

Abbey Road

ISBN 0-345-02046-4

relesed producer 3/1/73 Pink Floyd

Items

第32页

… or as Documents (JSON)

JSON data types (M, L, BOOL, NULL) Document SDKs Available Indexing only via Streams/Lambda 400KB max item size (limits hierarchical data structure)

Primary Key ProductID

id bookID

id

albumID

id

movieID

title Ringworld

title

Dark Side of the Moon

title

Idiocracy

author Larry Niven

artist

Pink Floyd

genre

Scifi Comedy

Attributes

genre Science Fiction

genre

Progressive Rock

writer

Mike Judge

publisher datePublished

ISBN

Ballantine

Oct-70 0-345-02046-4

Attributes

{ label:"Harvest", studio: "Abbey Road", published: "3/1/73", producer: "Pink

Floyd", tracks: [{title: "Speak to Me", length: "1:30", music: "Mason", vocals:

"Instrumental"},{title: ”Breathe", length: ”2:43", music: ”Waters, Gilmour,

Wright", vocals: ”Gilmour"},{title: ”On the Run", length: “3:30", music: ”Gilmour,

Waters", vocals: "Instrumental"}]}

Attributes

{ producer: "20th Century Fox", actors: [{ name: "Luke Wilson", dob: "9/21/71", character: "Joe Bowers", image: "img2.jpg"},{ name: "Maya Rudolph", dob: "7/27/72", character: "Rita", image: "img1.jpg"},{ name: "Dax Shepard", dob: "1/2/75", character: "Frito Pendejo", image: "img3.jpg"}]

Items

第33页

Scenarios and best practices

第34页

Event logging

Storing time series data

第35页

Time series tables

Hot data

Cold data

Current table

Events_table_2015_April Event_id Timestamp Attribute1 …. Attribute N

RCUs = 10000

(Partition) (Sort)

WCUs = 10000

Events_table_2015_March Event_id Timestamp Attribute1 …. Attribute N

(Partition) (Sort)

RCUs = 1000 WCUs = 1

Older tables

Events_table_2015_Feburary Event_id Timestamp Attribute1 …. Attribute N (Partition) (Sort)

RCUs = 100 WCUs = 1

Events_table_2015_January Event_id Timestamp Attribute1 …. Attribute N

(Partition) (Sort)

RCUs = 10 WCUs = 1

Don’t mix hot and cold data; archive cold data to Amazon S3

第36页

Use a table per time period

Pre-create daily, weekly, monthly tables Provision required throughput for current table Writes go to the current table Turn off (or reduce) throughput for older tables

Dealing with time series data

第37页

Product catalog

Popular items (read)

第38页

Scaling bottlenecks

SELECT Id, Description, ... FROM ProductCatalog WHERE Id="POPULAR_PRODUCT"

Shoppers

Partition 1 2000 RCUs

Product A

Partition K 2000 RCUs

Partition M 2000 RCUs

Product B ProductCatalog Table

Partition 50 2000 RCU

第40页

Cache popular items

SELECT Id, Description, ... FROM ProductCatalog WHERE Id="POPULAR_PRODUCT"

User

User

DynamoDB

Partition 1

Partition 2

ProductCatalog Table

第42页

Messaging app

Large items Filters vs. indexes M:N Modeling—inbox and outbox

第43页

Messages App

Inbox

SELECT * FROM Messages WHERE Recipient='David' LIMIT 50 ORDER BY Date DESC

Messages Table

David

Outbox

SELECT * FROM Messages WHERE Sender ='David' LIMIT 50 ORDER BY Date DESC

第44页

Large and small attributes mixed

Partition key

Sort key

Messages Table

Recipient Date

Sender Message

David

2014-10-02 Bob

…

Inbox

David

SELECT * FROM Messages WHERE Recipient='David' LIMIT 50 ORDER BY Date DESC

… 48 more messages for David …

50 items × 256 KB each

David

2014-10-03 Alice …

Alice

2014-09-28 Bob

…

Alice

2014-10-01 Carol

(Many more messages)

…

Large message bodies Attachments

第45页

Computing inbox query cost

50 * 256KB * (1 RCU / 4KB) * (1 / 2) = 1600 RCU

Average item size

Eventually consistent reads

Items evaluated by query

Conversion ratio

第46页

Separate the bulk data

Uniformly distributes large item reads

(50 sequential items at 128 bytes)

1. Query Inbox-GSI: 1 RCU 2. BatchGetItem Messages: 1600 RCU

(50 separate items at 256 KB)

David

Inbox-GSI

Recipient David David Alice Alice

Date 2014-10-02 2014-10-03 2014-09-28 2014-10-01

Sender Bob Alice Bob Carol

Subject Hi!… RE: The… FW: Ok…

Hi!...

MsgId afed 3kf8 9d2b ct7r

Messages Table

MsgId 9d2b 3kf8 ct7r afed

Body … … … …

第47页

Inbox GSI Define which attributes to copy into the index

第48页

Outbox GSI

Outbox Sender

SELECT * FROM Messages WHERE Sender ='David' LIMIT 50 ORDER BY Date DESC

第49页

Messaging app

Inbox

David

Inbox Global secondary

index

Messages Table

Outbox

Outbox Global secondary

index

第50页

Distribute large items

Reduce one-to-many item sizes Configure secondary index projections Use GSIs to model M:N relationship between sender and recipient

Outbox

Messages

Inbox

Querying many large items at once

第51页

Multiplayer online gaming

Query filters vs. composite key indexes

第52页

Hierarchical Data Structures

Games Table

Partition key

GameId d9bl3 72f49 o2pnb b932s ef9ca

Date 2014-10-02 2014-09-30 2014-10-08 2014-10-03 2014-10-03

Host David Alice Bob Carol David

Opponent Alice Bob Carol Bob Bob

Status DONE PENDING IN_PROGRESS PENDING IN_PROGRESS

第53页

Query for incoming game requests

DynamoDB indexes provide partiton and sort What about queries for two equalities and a sort?

(hash)

(?) (range)

SELECT * FROM Game WHERE Opponent='Bob‘ AND Status=‘PENDING' ORDER BY Date DESC

第54页

Approach 1: Query filter

Partition key

Sort key

Bob

Secondary Index

Opponent Alice Carol Bob Bob Bob

Date 2014-10-02 2014-10-08 2014-09-30 2014-10-03 2014-10-03

GameId d9bl3 o2pnb 72f49 b932s ef9ca

Status DONE IN_PROGRESS PENDING PENDING IN_PROGRESS

Host David Bob Alice Carol David

第55页

Approach 1: Query filter

SELECT * FROM Game WHERE Opponent='Bob' ORDER BY Date DESC FILTER ON Status='PENDING'

Secondary Index

Opponent Alice Carol Bob Bob Bob

Date 2014-10-02 2014-10-08 2014-09-30 2014-10-03 2014-10-03

GameId d9bl3 o2pnb 72f49 b932s ef9ca

Status DONE IN_PROGRESS PENDING PENDING IN_PROGRESS

Host David Bob Alice Carol David

Bob

(filtered out)

第56页

Needle in a haystack

Bob

第57页

Use query filter

Send back less data “on the wire” Simplify application code Simple SQL-like expressions

• AND, OR, NOT, ()

Your index isn’t entirely selective

第58页

Approach 2: Composite key

Status DONE IN_PROGRESS IN_PROGRESS PENDING PENDING

Date 2014-10-02

+ 2014-10-08 2014-10-03 2014-10-03 2014-09-30

StatusDate DONE_2014-10-02

= IN_PROGRESS_2014-10-08 IN_PROGRESS_2014-10-03 PENDING_2014-09-30 PENDING_2014-10-03

第59页

Approach 2: Composite key

Partition key

Sort key

Secondary Index

Opponent Alice Carol Bob Bob Bob

StatusDate DONE_2014-10-02 IN_PROGRESS_2014-10-08 IN_PROGRESS_2014-10-03 PENDING_2014-09-30 PENDING_2014-10-03

GameId d9bl3 o2pnb ef9ca 72f49 b932s

Host David Bob David Alice Carol

第60页

Approach 2: Composite key

SELECT * FROM Game WHERE Opponent='Bob'

AND StatusDate BEGINS_WITH 'PENDING'

Secondary Index

Opponent Alice Carol Bob Bob Bob

StatusDate DONE_2014-10-02 IN_PROGRESS_2014-10-08 IN_PROGRESS_2014-10-03 PENDING_2014-09-30 PENDING_2014-10-03

GameId d9bl3 o2pnb ef9ca 72f49 b932s

Host David Bob David Alice Carol

Bob

第61页

Needle in a sorted haystack

Bob

第62页

Sparse indexes

Game-scores-table

Id (Partition)

User

Game Score Date

1 Bob G1 1300 2012-12-23

2 Bob G1 1450 2012-12-23 3 Jay G1 1600 2012-12-24

4 Mary G1 2000 2012-10-24 5 Ryan G2 123 2012-03-10 6 Jones G2 345 2012-03-20

Scan sparse GSIs

Award-GSI

Award

Award (Partition)

Id

User Score

Champ 4 Mary 2000

Champ

第63页

Real-Time voting

Write-heavy items

第64页

Scaling bottlenecks

Partition 1 1000 WCUs

Candidate A

Partition K 1000 WCUs

Voters

Provision 200,000 WCUs

Partition M 1000 WCUs

Candidate B Votes Table

Partition N 1000 WCUs

第65页

Write sharding

Voter

Candidate A_7 Candidate A_1 Candidate A_4

Candidate A_3 Candidate A_2

Candidate A_5

Candidate A_6 Candidate A_8

Votes Table

Candidate B_4

Candidate B_8

Candidate B_1

Candidate B_5

Candidate B_3

Candidate B_7

Candidate B_2

Candidate B_6

第66页

Write sharding

Voter

UpdateItem: “CandidateA_” + rand(0, 10) ADD 1 to Votes

Candidate A_7 Candidate A_1 Candidate A_4

Candidate A_3 Candidate A_2

Candidate A_5

Candidate A_6 Candidate A_8

Votes Table

Candidate B_4

Candidate B_8

Candidate B_1

Candidate B_5

Candidate B_3

Candidate B_7

Candidate B_2

Candidate B_6

第67页

Shard aggregation

2. Store

Periodic

Process 1. Sum

Voter

Candidate A_7 Candidate A_1 Candidate A_4

Candidate A Total: 2.5M

Candidate A_3 Candidate A_2

Candidate A_5

Candidate A_6 Candidate A_8

Votes Table

Candidate B_4

Candidate B_8

Candidate B_1

Candidate B_5

Candidate B_3

Candidate B_7

Candidate B_2

Candidate B_6

第68页

Shard write-heavy partition keys

Trade off read cost for write scalability Consider throughput per partition key

Your write workload is not horizontally scalable

第69页

Replace filter with indexes

Concatenate attributes to form useful secondary index keys

Take advantage of sparse indexes

Status + Date

You want to optimize a query as much as possible

第70页

DynamoDB Streams

第71页

DynamoDB Streams

Stream of updates to a table Asynchronous Exactly once Strictly ordered

• Per item

Highly durable

• Scale with table

24-hour lifetime Sub-second latency

第72页

View types

UpdateItem (Name = John, Destination = Pluto)

View Type Old image—before update

Destination

Name = John, Destination = Mars

New image—after update

Name = John, Destination = Pluto

Old and new images Keys only

Name = John, Destination = Mars Name = John, Destination = Pluto

Name = John

第73页

DynamoDB Streams and Amazon Kinesis Client Library

Partition 1

DynamoDB Client Application

Updates

Partition 2

Partition 3

Partition 4

Table

Partition 5

Table

Shard 1 Shard 2 Shard 3 Shard 4

Stream

KCL Worker

KCL Worker

KCL Worker

KCL Worker

Amazon Kinesis Client Library Application

第74页

Cross-region replication

US East (N. Virginia)

DynamoDB Streams

Asia Pacific (Sydney)

Open Source CrossRegion Replication Library

EU (Ireland) Replica

第75页

DynamoDB Streams and AWS Lambda

第76页

Triggers

Derivative Tables ElastiCache

Lambda function

CloudSearch Notify change

第77页

Analytics with DynamoDB Streams

Collect and de-dupe data in DynamoDB Aggregate data in-memory and flush periodically

Performing real-time aggregation and analytics

第78页

Architecture

第79页

Reference Architecture

第80页

Elastic Event Driven Applications

第81页

Thank you!