Page 1
Nate Slater, AWS Solutions Architect
July 30, 2015
Introduction to DynamoDB
Page 2
Agenda
What is DynamoDB?
DynamoDB Fundamentals
Typical Workloads and Use Cases
Demo
Page 3
What is DynamoDB?
Page 4
What is DynamoDB?
DynamoDB is a fully managed, NoSQL document and key-value data store.
Page 5
What is NoSQL?
NoSQL is a term to describe data stores that trade full ACID compliance for high availability and scale.
Atomicity: Single row/single item only
Consistency: Eventual consistency
Isolation: Dirty reads
Durability: Data replication on commodity storage
Page 6
Why NoSQL?
Dirty Reads?
Eventual Consistency?
Single row transactions only?
Why would anybody trade ACID compliance for this?
Page 7
NoSQL – Availability and Scale
Traditional SQL: Scale Up (a single DB with a Primary and a Secondary)
NoSQL: Scale Out (many DB nodes)
Page 8
Scale Up vs Scale Out
Chart: Scale-Up and Scale-Out compared by Cost and Complexity
Page 9
The CAP Theorem
Network partitions will happen in distributed systems:
Consistency (C)
Availability (A)
Partition Tolerance (P)
A system can provide only two of the three: CA, AP, or CP
Page 10
Why NoSQL?
Horizontal scaling lets capacity grow by adding nodes, without the ceiling of a single machine
Cheaper to scale out than to scale up
A choice of full consistency or full availability when a network partition occurs
Full ACID compliance is often not needed
Page 11
What is DynamoDB?
DynamoDB is a fully managed, NoSQL document and key-value data store.
Page 12
What is a Managed Service?
A managed service is a web service in which consumers of the service never need to interact directly with the underlying compute, storage, and network resources.
Page 13
Why use a Managed Service?
Page 14
DynamoDB is a Managed Service
AWS runs all the database infrastructure for you!
All the benefits and none of the operational overhead of running a distributed system:
Infinitely scalable read and write I/O
High availability within a region
Data durably stored in 3 availability zones
Cross-region replication
Easily export data to S3
Triggers using Lambda functions
Tight integration with Kinesis, Lambda, EMR, and Redshift
Pay only for what you use, when you need it
Page 15
DynamoDB Fundamentals
Page 16
DynamoDB Table
A Table contains Items; an Item contains Attributes
Hash Key: Mandatory; Key-value access pattern; Determines data distribution
Range Key: Optional; Model 1:N relationships; Enables rich query capabilities
Page 17
Data types
String (S)
Number (N)
Binary (B)
String Set (SS)
Number Set (NS)
Binary Set (BS)
Boolean (BOOL)
Null (NULL)
List (L)
Map (M)
Used for storing nested JSON documents
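The type codes above (S, N, BOOL, L, M, and so on) are also the tags DynamoDB uses when an item is written in its typed JSON form. The following is a minimal sketch of that encoding, not the SDK's own serializer: the helper names `to_attribute_value`, `encode_set`, `encode_binary`, and `is_number` are hypothetical, and sorting set members is only done here to make the output deterministic.

```python
import base64
from decimal import Decimal

NUMBER_TYPES = (int, float, Decimal)


def is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, NUMBER_TYPES) and not isinstance(value, bool)


def encode_binary(value):
    # Binary values are represented as base64 text in the JSON form.
    return base64.b64encode(value).decode("ascii")


def encode_set(values):
    """Encode a Python set as SS, NS, or BS (sets must be non-empty and homogeneous)."""
    if not values:
        raise ValueError("sets must not be empty")
    if all(isinstance(v, str) for v in values):
        return {"SS": sorted(values)}
    if all(is_number(v) for v in values):
        return {"NS": [str(v) for v in sorted(values)]}
    if all(isinstance(v, bytes) for v in values):
        return {"BS": [encode_binary(v) for v in sorted(values)]}
    raise TypeError("set members must all be strings, numbers, or bytes")


def to_attribute_value(value):
    """Tag a Python value with its DynamoDB type code."""
    if value is None:
        return {"NULL": True}
    if isinstance(value, bool):
        return {"BOOL": value}
    if is_number(value):
        return {"N": str(value)}  # numbers are carried as strings
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, bytes):
        return {"B": encode_binary(value)}
    if isinstance(value, (set, frozenset)):
        return encode_set(value)
    if isinstance(value, (list, tuple)):
        return {"L": [to_attribute_value(v) for v in value]}
    if isinstance(value, dict):
        # Map: this is the type that holds nested JSON documents.
        return {"M": {str(k): to_attribute_value(v) for k, v in value.items()}}
    raise TypeError(f"unsupported type: {type(value).__name__}")


item = {"Id": 2, "Name": "Andy", "Dept": "Engg"}
print(to_attribute_value(item)["M"])
# {'Id': {'N': '2'}, 'Name': {'S': 'Andy'}, 'Dept': {'S': 'Engg'}}
```

Nested lists and maps recurse through the same function, which is why List and Map are enough to hold an arbitrary JSON document.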
Page 18
Hash table
Hash key uniquely identifies an item
Hash key is used for building an unordered hash index
Table can be partitioned for scale
Key Space
Id = 1, Name = Jim: Hash (1) = 7B
Id = 2, Name = Andy, Dept = Engg: Hash (2) = 48
Id = 3, Name = Kim, Dept = Ops: Hash (3) = CD
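The idea of hashing a key into a key space and then splitting that space across partitions can be sketched in a few lines. This is an illustration only: DynamoDB's internal hash function is not described in these slides, so MD5 and the one-byte (00 to FF) key space are assumptions, and `key_hash` and `partition_for` are hypothetical names.

```python
import hashlib


def key_hash(hash_key):
    """Map a hash key to a position in a 00..FF key space (MD5 is an assumption)."""
    digest = hashlib.md5(str(hash_key).encode("utf-8")).digest()
    return digest[0]  # first byte only, so the result is 0..255


def partition_for(hash_key, num_partitions):
    """Split the key space into equal contiguous ranges, one per partition."""
    return key_hash(hash_key) * num_partitions // 256


for item_id in (1, 2, 3):
    position = key_hash(item_id)
    print(f"Id = {item_id}: hash = {position:02X}, partition = {partition_for(item_id, 3)}")
```

Because the placement depends only on the hash of the key, neighbouring Id values land in unrelated positions, which is what makes the index unordered and the load spread evenly.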
Page 19
Partitions are three-way replicated
Each partition (Partition 1, Partition 2, ..., Partition N) is stored as Replica 1, Replica 2, and Replica 3
Page 20
Hash-range table
Hash key and range key together uniquely identify an Item.
Within the unordered hash index, items that share a hash key are sorted by the range key.
No limit on the number of items (∞) per hash key.
Unless you have local secondary indexes
Key space runs from 00:0 to FF:∞
Partition 1 (00:0 to 54:∞), Hash (2) = 48
Customer# = 2, Order# = 10, Item = Pen
Customer# = 2, Order# = 11, Item = Shoes
Partition 2 (55 to A9:∞), Hash (1) = 7B
Customer# = 1, Order# = 10, Item = Toy
Customer# = 1, Order# = 11, Item = Boots
Partition 3 (AA to FF:∞), Hash (3) = CD
Customer# = 3, Order# = 10, Item = Book
Customer# = 3, Order# = 11, Item = Paper
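The hash-range layout can be modelled in memory to show why range queries within one hash key are cheap. This is a toy sketch under stated assumptions, not how DynamoDB stores data: `HashRangeTable` and its methods are hypothetical, and the rows reuse the Customer#/Order# example from the slide.

```python
import bisect
from collections import defaultdict


class HashRangeTable:
    """Toy model: items grouped by hash key, kept sorted by range key."""

    def __init__(self):
        # hash key -> list of (range_key, item), always sorted by range_key
        self._collections = defaultdict(list)

    def put(self, hash_key, range_key, item):
        rows = self._collections[hash_key]
        keys = [rk for rk, _ in rows]
        i = bisect.bisect_left(keys, range_key)
        if i < len(rows) and rows[i][0] == range_key:
            rows[i] = (range_key, item)  # same hash + range key: overwrite
        else:
            rows.insert(i, (range_key, item))

    def query(self, hash_key, low=None, high=None):
        """Return items for one hash key with low <= range key <= high."""
        rows = self._collections.get(hash_key, [])
        keys = [rk for rk, _ in rows]
        start = 0 if low is None else bisect.bisect_left(keys, low)
        end = len(rows) if high is None else bisect.bisect_right(keys, high)
        return [item for _, item in rows[start:end]]


orders = HashRangeTable()
orders.put(2, 11, "Shoes")   # inserted out of order on purpose
orders.put(2, 10, "Pen")
orders.put(1, 10, "Toy")
orders.put(1, 11, "Boots")

print(orders.query(2))           # ['Pen', 'Shoes']
print(orders.query(1, low=11))   # ['Boots']
```

A query always names exactly one hash key, so it touches a single sorted run; only a scan has to visit every hash key.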
Page 21
Local Secondary Index (LSI)
Alternate range key with the same hash key
Index and table data are co-located (same partition)
10 GB max per hash key, i.e. LSIs limit the number of range keys per hash key!
Page 22
Global Secondary Index
any attribute indexed as new hash and/or range key
RCUs/WCUs provisioned separately for GSIs
Online indexing
Page 23
LSI or GSI?
LSI can be modeled as a GSI
If data size in an item collection > 10 GB, use GSI
If eventual consistency is okay for your scenario, use GSI!
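To make the LSI/GSI difference concrete, here is a sketch of the request parameters for a table that has one of each. The parameter names follow the CreateTable API; the table name `Orders`, the attribute names, the index names, and the throughput numbers are all made up for illustration. The code only builds and checks the dictionary and does not call AWS.

```python
def build_orders_table_params():
    """Build CreateTable parameters with one LSI and one GSI (illustrative values)."""
    return {
        "TableName": "Orders",
        # Every attribute used in any key schema must be declared here.
        "AttributeDefinitions": [
            {"AttributeName": "CustomerId", "AttributeType": "N"},
            {"AttributeName": "OrderId", "AttributeType": "N"},
            {"AttributeName": "OrderDate", "AttributeType": "S"},
            {"AttributeName": "Item", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "CustomerId", "KeyType": "HASH"},
            {"AttributeName": "OrderId", "KeyType": "RANGE"},
        ],
        # LSI: same hash key as the table, alternate range key.
        "LocalSecondaryIndexes": [{
            "IndexName": "ByOrderDate",
            "KeySchema": [
                {"AttributeName": "CustomerId", "KeyType": "HASH"},
                {"AttributeName": "OrderDate", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "ALL"},
        }],
        # GSI: any attribute as the new hash key, with its own RCUs/WCUs.
        "GlobalSecondaryIndexes": [{
            "IndexName": "ByItem",
            "KeySchema": [{"AttributeName": "Item", "KeyType": "HASH"}],
            "Projection": {"ProjectionType": "KEYS_ONLY"},
            "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
        }],
        "ProvisionedThroughput": {"ReadCapacityUnits": 10, "WriteCapacityUnits": 5},
    }


def hash_key_of(key_schema):
    return next(k["AttributeName"] for k in key_schema if k["KeyType"] == "HASH")


def lsis_share_table_hash_key(params):
    """An LSI must reuse the table's hash key; a GSI is free to differ."""
    table_hash = hash_key_of(params["KeySchema"])
    return all(hash_key_of(lsi["KeySchema"]) == table_hash
               for lsi in params.get("LocalSecondaryIndexes", []))


params = build_orders_table_params()
print(lsis_share_table_hash_key(params))  # True
```

With boto3 installed and credentials configured, a dictionary of this shape could be passed as `boto3.client("dynamodb").create_table(**params)`. Note that only the GSI carries its own `ProvisionedThroughput`, matching the point that RCUs/WCUs are provisioned separately for GSIs.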
Page 24
DynamoDB API
Table API: CreateTable, UpdateTable, DeleteTable, DescribeTable, ListTables
Item API: PutItem, UpdateItem, DeleteItem, BatchWriteItem, GetItem, Query, Scan, BatchGetItem
Stream API (New): ListStreams, DescribeStream, GetShardIterator, GetRecords
Page 25
DynamoDB Streams and AWS Lambda
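As a rough sketch of what a Lambda function attached to a table's stream might look like: each invocation receives a batch of records, and each record names the change (INSERT, MODIFY, or REMOVE) and carries the item's keys in typed form. The function names and the sample event below are hypothetical; whether `NewImage` and `OldImage` are present depends on how the stream is configured, so the code treats both as optional.

```python
def summarize_record(record):
    """Pull the useful fields out of one stream record."""
    change = record["dynamodb"]
    return {
        "event": record["eventName"],      # INSERT, MODIFY, or REMOVE
        "keys": change["Keys"],
        "new": change.get("NewImage"),     # absent for REMOVE or keys-only streams
        "old": change.get("OldImage"),     # absent for INSERT or keys-only streams
    }


def handler(event, context):
    """Lambda entry point: count the changes in this batch by event type."""
    counts = {"INSERT": 0, "MODIFY": 0, "REMOVE": 0}
    for record in event.get("Records", []):
        summary = summarize_record(record)
        counts[summary["event"]] = counts.get(summary["event"], 0) + 1
        print(summary["event"], summary["keys"])
    return counts


# A hand-written event for local testing; real events carry more fields.
sample_event = {
    "Records": [
        {"eventName": "INSERT",
         "dynamodb": {"Keys": {"Id": {"N": "1"}},
                      "NewImage": {"Id": {"N": "1"}, "Name": {"S": "Jim"}}}},
        {"eventName": "REMOVE",
         "dynamodb": {"Keys": {"Id": {"N": "2"}},
                      "OldImage": {"Id": {"N": "2"}, "Name": {"S": "Andy"}}}},
    ]
}

print(handler(sample_event, None))  # {'INSERT': 1, 'MODIFY': 0, 'REMOVE': 1}
```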
Page 26
Emerging Architecture Pattern
Page 27
Throughput
Provisioned at the table level
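One write capacity unit (WCU) is one write per second of an item up to 1 KB
One read capacity unit (RCU) is one strongly consistent read per second of an item up to 4 KB
Larger items consume additional units, rounded up to the next 1 KB (writes) or 4 KB (reads)
Eventually consistent reads cost half as much as strongly consistent reads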
Read and write throughput limits are independent
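The capacity-unit arithmetic is easy to get wrong by one rounding step, so a small calculator helps. This is a sketch under the assumptions that 1 KB means 1024 bytes and that item size is rounded up per item before multiplying by the request rate; the function names are hypothetical.

```python
import math

WRITE_UNIT_BYTES = 1024      # one WCU covers a write of up to 1 KB
READ_UNIT_BYTES = 4 * 1024   # one RCU covers a strongly consistent read of up to 4 KB


def write_capacity_units(item_size_bytes, writes_per_second):
    """WCUs needed to sustain the given write rate for items of this size."""
    units_per_write = max(1, math.ceil(item_size_bytes / WRITE_UNIT_BYTES))
    return units_per_write * writes_per_second


def read_capacity_units(item_size_bytes, reads_per_second, strongly_consistent=True):
    """RCUs needed; eventually consistent reads cost half as much."""
    units_per_read = max(1, math.ceil(item_size_bytes / READ_UNIT_BYTES))
    total = units_per_read * reads_per_second
    return total if strongly_consistent else math.ceil(total / 2)


# A 3 KB item written 100 times per second and read 100 times per second:
print(write_capacity_units(3 * 1024, 100))                            # 300
print(read_capacity_units(3 * 1024, 100))                             # 100
print(read_capacity_units(3 * 1024, 100, strongly_consistent=False))  # 50
```

The same item costs three times as many units to write as to read, which is why read and write throughput are provisioned independently.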
Page 28
Partitioning example
Table size = 8 GB, RCUs = 5000, WCUs = 500
# of partitions (IO capacity) = 5000/3000 RCU + 500/1000 WCU = 2.17
# of partitions (storage) = 8/10 GB = 0.8
# of partitions = ceiling(max(2.17, 0.8)) = 3
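The partitioning example above can be written as a small function so that other table sizes and throughput settings can be tried. The per-partition limits (3000 RCUs, 1000 WCUs, 10 GB) are taken from the slide's formula; the function name and default-argument layout are illustrative.

```python
import math


def estimate_partitions(table_size_gb, rcus, wcus,
                        rcu_per_partition=3000,
                        wcu_per_partition=1000,
                        gb_per_partition=10):
    """Estimate partition count as ceiling(max(throughput need, storage need))."""
    by_throughput = rcus / rcu_per_partition + wcus / wcu_per_partition
    by_storage = table_size_gb / gb_per_partition
    return math.ceil(max(by_throughput, by_storage))


# Table size = 8 GB, RCUs = 5000, WCUs = 500, as in the example above.
print(estimate_partitions(8, 5000, 500))  # 3
```

Here throughput (5000/3000 + 500/1000, about 2.17) dominates storage (0.8), so the table needs 3 partitions even though its data would fit in one.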
Page 29
Typical Workloads and Use Cases
Page 30
DynamoDB table examples
Page 31
Typical Workloads
Ad-tech
IoT
Gaming
Web Analytics
Mobile Applications
Large Scale Websites
…And much more!
Page 32
Demo
Page 33