第1页
第2页
Strongly Typed Languages and Flexible Schemas
第3页
Agenda
Strongly Typed Languages
Flexible Schema Databases
Change Management
Strategies
Tradeoffs
第4页
Strongly Typed Languages
第5页
"A programming language that requires a variable to be defined, as well as the type of variable it is"
Do not confuse strongly typed with statically typed languages: the two properties are related but not the same.
You can find different definitions on the internet of what this means and of how languages are categorized.
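As a small illustration of the distinction, here is a sketch in Java, which is both statically and strongly typed. The compiler rejects a mismatched assignment before the program runs, and at run time a value keeps its type, so an invalid cast fails instead of being silently converted. The class and method names are made up for this example.

```java
class TypingDemo {
    // Returns true when the JVM refuses to treat the value as an Integer.
    static boolean isRejectedAsInteger(Object value) {
        try {
            Integer number = (Integer) value; // the cast is checked at run time
            return false;
        } catch (ClassCastException e) {
            return true; // no silent String-to-number coercion
        }
    }

    public static void main(String[] args) {
        // int id = "174";  // does not compile: checked statically
        Object id = "174";  // compiles, because the static type is Object
        System.out.println(isRejectedAsInteger(id));
        System.out.println(Integer.parseInt("174") + 1); // conversions must be explicit
    }
}
```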
第6页
Flexible Schema Databases
第7页
Traditional RDBMS
create table users (id int, firstname text, lastname text);
Table definition
Column structure
Definition
第8页
Traditional RDBMS
Table with checks
create table cat_pictures(
id int not null,
size int not null,
picture blob not null,
user_id int,
primary key (id),
foreign key (user_id) references users(id));
Null checks
Foreign and Primary key checks
Definition
第9页
Traditional RDBMS
users
cat_pictures
1
N
Once you have this structure you can now start building up your application
第10页
Is this Flexible?
What happens when we need to change the schema?
Add new fields
Add new relations
Change data types
What happens when we need to scale out our data structure?
第11页
Flexible Schema Database
Document
Graph
Key Value
There are a few examples of flexible schema databases:
Document oriented databases
Graph databases
Key Value Stores
第12页
Flexible Schema
No mandatory schema definition
No structure restrictions
No schema validation process
No mandatory schema definition
If a collection does not exist one will be created
If a database does not exist one will be created
No Structure Restrictions
No forced fields or data types
第13页
We start from code
public class CatPicture {
int size;
byte[] blob;
}
public class User {
int id;
String firstname;
String lastname;
CatPicture[] cat_pictures;
}
Definition
第14页
Document Structure
{
_id: 1234,
firstname: 'Juan',
lastname: 'Olivo',
cat_pictures: [ {
size: 10,
picture: BinData("0x133334299399299432"),
}
]
}
Rich Data Types
Embedded Documents
Definition
第15页
Flexible Schema Databases
Challenges
Different Versions of Documents
Different Structures of Documents
Different Value Types for Fields in Documents
第16页
Different Versions of Documents
Over time, the same document changes in how it represents its data
{ "_id" : 174, "firstname": "Juan" }
{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo" }
First Version
Second Version
{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo" , "cat_pictures":
[{"size": 10, picture: BinData("0x133334299399299432")}]
}
Third Version
第17页
Different Versions of Documents
Over time, the same document changes in how it represents its data
{ "_id" : 174, "firstname": "Juan" }
{ "_id" : 174, "name": { "first": "Juan", "last": "Olivo"} }
Different Structure
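One way application code can cope with both shapes is a tolerant reader that checks for each known structure in turn. The sketch below works on plain maps rather than on a driver's document type, and the class and method names are assumptions made for this example.

```java
import java.util.Map;

class UserNameReader {
    // Reads the first name from either document shape:
    // a flat "firstname" field or a nested "name.first" field.
    static String firstName(Map<String, Object> doc) {
        Object flat = doc.get("firstname");
        if (flat instanceof String) {
            return (String) flat;
        }
        Object nested = doc.get("name");
        if (nested instanceof Map) {
            Object first = ((Map<?, ?>) nested).get("first");
            if (first instanceof String) {
                return (String) first;
            }
        }
        return null; // neither shape is present
    }

    public static void main(String[] args) {
        Map<String, Object> flat = Map.<String, Object>of("_id", 174, "firstname", "Juan");
        Map<String, Object> nested = Map.<String, Object>of(
                "_id", 174, "name", Map.of("first", "Juan", "last", "Olivo"));
        System.out.println(firstName(flat) + " / " + firstName(nested));
    }
}
```

The cost of this approach is that every reader has to know every historical shape, which is part of what the later strategies try to contain.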
第18页
Different Structures of Documents
Different documents coexisting in the same collection
{ "_id" : 175, "brand": "Ford", "model": "Mustang", "date": ISODate("XXX") }
{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo" }
Within same collection
第19页
Different Data Types for Fields
Different documents coexisting in the same collection
{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo", "bdate": 1224234312}
{ "_id" : 175, "firstname": "Paco", "lastname": "Hernan", "bdate": "2015-06-27"}
{ "_id" : 176, "firstname": "Tomas", "lastname": "Marce", "bdate": ISODate("2015-06-27")}
Same field, different data type
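In a strongly typed language, the reading code has to decide what to do with each of these representations. The following is a minimal sketch that normalizes all three to a single type. It assumes that numeric values are epoch seconds and that dates are interpreted in UTC; neither assumption is stated on the slide, and the class name is hypothetical.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.Date;

class BirthDateNormalizer {
    // Converts a number, an ISO string, or a Date to a LocalDate.
    // Assumption: numbers are epoch seconds and all values are read in UTC.
    static LocalDate normalize(Object bdate) {
        if (bdate instanceof Number) {
            long seconds = ((Number) bdate).longValue();
            return Instant.ofEpochSecond(seconds).atZone(ZoneOffset.UTC).toLocalDate();
        }
        if (bdate instanceof String) {
            return LocalDate.parse((String) bdate); // expects yyyy-MM-dd
        }
        if (bdate instanceof Date) {
            return ((Date) bdate).toInstant().atZone(ZoneOffset.UTC).toLocalDate();
        }
        throw new IllegalArgumentException("Unsupported bdate type: " + bdate);
    }

    public static void main(String[] args) {
        System.out.println(normalize(1224234312));
        System.out.println(normalize("2015-06-27"));
        System.out.println(normalize(new Date(0L)));
    }
}
```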
第20页
Change Management
第21页
Change Management
第22页
Strategies
第23页
Strategies
Decoupling Architectures
ODMs
Versioning
Data Migrations
第24页
Decoupled Architectures
第25页
Strongly Coupled
第26页
Becomes a mess in your hair…
第27页
Coupled Architectures
Database
Application A
Application C
Application B
Let me perform some schema changes!
第28页
Decoupled Architecture
Database
Application A
API
Application C
Application B
第29页
Decoupled Architectures
Allows the business logic to evolve independently of the data layer
Decouples the underlying storage / persistence option from the business service
Changes are "requested" and not imposed across all applications
Better versioning control of each request and its mapping
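The idea can be sketched as an interface that the applications depend on, with the storage hidden behind it. Everything below (the `UserService` name, its methods, the in-memory implementation) is hypothetical and only illustrates the shape of the decoupling.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// The contract the applications depend on; it says nothing about storage.
interface UserService {
    void rename(int id, String firstName, String lastName);
    Optional<String> fullName(int id);
}

// One possible backing store. A database-backed class could replace it
// without any change to the callers.
class InMemoryUserService implements UserService {
    private final Map<Integer, String[]> names = new HashMap<>();

    @Override
    public void rename(int id, String firstName, String lastName) {
        names.put(id, new String[] {firstName, lastName});
    }

    @Override
    public Optional<String> fullName(int id) {
        String[] parts = names.get(id);
        return parts == null ? Optional.empty() : Optional.of(parts[0] + " " + parts[1]);
    }

    public static void main(String[] args) {
        UserService service = new InMemoryUserService();
        service.rename(174, "Juan", "Olivo");
        System.out.println(service.fullName(174).orElse("unknown"));
    }
}
```

Because callers only see `UserService`, a change from `firstname` / `lastname` fields to a nested `name` document would stay inside the implementation.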
第30页
ODMs
第31页
ODM
Reduces the impedance mismatch between code and databases
Data management facilitator
Hides complexity of operators
Tries to decouple business complexity with "magic" recipes
第32页
Spring Data
POJO centric model
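Use MongoTemplate, or extend CrudRepository (for example through MongoRepository), to connect to the data store
Uses annotations to override default field names and even data types (data type mapping)
import org.springframework.data.annotation.Id;
import org.springframework.data.mongodb.core.mapping.Field;
import org.springframework.data.mongodb.repository.MongoRepository;
// Spring Data provides the implementation of this interface at runtime
public interface UserRepository extends MongoRepository<User, Integer> {
}
public class User {
    @Id
    int id;
    // Stored under the key "first_name" instead of "firstname"
    @Field("first_name")
    String firstname;
    String lastname;
}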
第33页
Spring Data Document Structure
{
"_id": 1,
"first_name": "first",
"lastname": "last",
"catpictures": [
{
"size": 10,
"blob": BinData(0, "Kr3AqmvV1R9TJQ==")
}
]
}
Definition
第34页
Spring Data Considerations
Data formats, versions and types still need to be managed
Does not solve issues like type validation out of the box
Can make things more complicated but more "controllable"
@Field("first_name")
String firstname;
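To make the field override less "magic", the toy mapper below shows the general mechanism: read an annotation through reflection and use its value as the document key. This is not how Spring Data is implemented; `StoredAs`, `MappedUser` and `ToyMapper` are hypothetical names invented for this sketch.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical annotation for this sketch; it is not Spring Data's @Field.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface StoredAs {
    String value();
}

class MappedUser {
    int id;
    @StoredAs("first_name")
    String firstname;
    String lastname;
}

class ToyMapper {
    // Builds a document-like map, using the annotation value as the key when present.
    static Map<String, Object> toDocument(Object pojo) {
        Map<String, Object> doc = new LinkedHashMap<>();
        for (Field field : pojo.getClass().getDeclaredFields()) {
            if (field.isSynthetic() || Modifier.isStatic(field.getModifiers())) {
                continue; // only map regular instance fields
            }
            field.setAccessible(true);
            StoredAs alias = field.getAnnotation(StoredAs.class);
            String key = (alias != null) ? alias.value() : field.getName();
            try {
                doc.put(key, field.get(pojo));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return doc;
    }

    public static void main(String[] args) {
        MappedUser user = new MappedUser();
        user.id = 1;
        user.firstname = "first";
        user.lastname = "last";
        System.out.println(toDocument(user));
    }
}
```

Note that the mapper only renames keys. As the slide says, versions and value types still have to be managed by the application.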
第35页
Morphia
Data source centric
Discovers all the POJOs in a given package
Also uses annotations to perform overrides and deal with object mapping
@Entity("users")
public class User {
    @Id
    int id;
    String firstname;
    String lastname;
}
// Map every annotated POJO found in the package (Morphia 1.x style API)
Morphia morphia = new Morphia();
morphia.mapPackage("examples.odms.morphia.pojos");
Datastore datastore = morphia.createDatastore(new MongoClient(), "morphia_example");
datastore.save(user);
第36页
Morphia Document Structure
{
"_id": 1,
"className": "examples.odms.morphia.pojos.User",
"firstname": "first",
"lastname": "last",
"catpictures": [
{
"size": 10,
"blob": BinData(0, "Kr3AqmvV1R9TJQ==")
}
]
}
Class Definition
Definition
第37页
Morphia Considerations
Gives better control at class-loading time
Like Spring Data, it also supports field overriding (annotations define the field keys)
Better support for object polymorphism
第38页
Versioning
第39页
Versioning
Versioning of data structures (especially documents) can be very helpful
You must correctly generate the new version number in a multithreaded system
You must return only the current version of each document when there is a query
You must "update" correctly by including all current attributes in addition to newly provided attributes
If the system fails at any point, you must either have a consistent state of the data, or it must be possible on restart to infer the state of the data and clean it up, or otherwise bring it to a consistent state.
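The first requirement, generating version numbers correctly under concurrency, comes down to making "read the current version, add one, store it" a single atomic step. Inside MongoDB an `$inc` on one document is applied atomically; the sketch below shows the same idea in application code with a `ConcurrentHashMap`. The `VersionCounter` class is a hypothetical helper, not part of any driver.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.IntStream;

class VersionCounter {
    private final ConcurrentMap<Integer, Integer> versions = new ConcurrentHashMap<>();

    // Atomically returns the next version for a document id, starting at 1.
    int next(int docId) {
        return versions.merge(docId, 1, Integer::sum);
    }

    public static void main(String[] args) {
        VersionCounter counter = new VersionCounter();
        // Many threads ask for a new version of the same document.
        IntStream.range(0, 1000).parallel().forEach(i -> counter.next(174));
        System.out.println(counter.next(174)); // no version was handed out twice
    }
}
```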
第40页
Versioning – Option 0
Change existing document each time there is a write with monotonically increasing version number inside
{ "_id" : 174, "v" : 1, "firstname": "Juan" }
{ "_id" : 174, "v" : 2, "firstname": "Juan", "lastname": "Olivo" }
{ "_id" : 174, "v" : 3, "firstname": "Juan", "lastname": "Olivo", "gender": "M" }
> db.users.update( {"_id": 174}, { "$set": { ... }, "$inc": { "v": 1 } } )
Increment field value
第41页
Versioning – Option 1
Store full document each time there is a write with monotonically increasing version number inside
{ "docId" : 174, "v" : 1, "firstname": "Juan" }
{ "docId" : 174, "v" : 2, "firstname": "Juan", "lastname": "Olivo" }
{ "docId" : 174, "v" : 3, "firstname": "Juan", "lastname": "Olivo", "gender": "M" }
> db.users.insert( {"docId":174 …})
> db.users.find({"docId":174}).sort({"v":-1}).limit(-1);
Find always latest version
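The "sort by version descending and take one" query can be mirrored on an in-memory list to make the selection rule explicit. This is only a sketch over plain maps; the `LatestVersion` class is hypothetical and no database is involved.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;

class LatestVersion {
    // Picks the stored copy with the highest "v" for the given docId.
    static Optional<Map<String, Object>> latest(List<Map<String, Object>> docs, int docId) {
        return docs.stream()
                .filter(d -> Integer.valueOf(docId).equals(d.get("docId")))
                .max(Comparator.comparingInt(d -> (Integer) d.get("v")));
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs = List.of(
                Map.<String, Object>of("docId", 174, "v", 1, "firstname", "Juan"),
                Map.<String, Object>of("docId", 174, "v", 2, "firstname", "Juan", "lastname", "Olivo"));
        System.out.println(latest(docs, 174).orElseThrow());
    }
}
```

With this option every read pays for the selection, so an index on the document id and version fields is usually worth considering.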
第42页
Versioning – Option 2
Store all document versions inside a single document.
> db.users.update( {"_id": 174}, { "$set": { "current.attr1": ... },
"$inc": { "current.v": 1 }, "$addToSet": { "prev": {... } } } )
Current value
{ "_id" : 174, "current" : { "v" :3, "attr1": 184, "attr2" : "A-1" },
"prev" : [
{ "v" : 1, "attr1": 165 },
{ "v" : 2, "attr1": 165, "attr2": "A-1" }
]
}
Previous values
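The bookkeeping behind this option can be sketched in memory: archive the current state, then apply the changes on top of all existing attributes and bump the version. `VersionedDocument` is a hypothetical class written for this illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class VersionedDocument {
    private Map<String, Object> current = new LinkedHashMap<>();
    private final List<Map<String, Object>> prev = new ArrayList<>();

    VersionedDocument(Map<String, Object> initial) {
        current.putAll(initial);
        current.put("v", 1);
    }

    // Archives the current state, then applies the changes on top of it.
    void update(Map<String, Object> changes) {
        prev.add(new LinkedHashMap<>(current));
        Map<String, Object> next = new LinkedHashMap<>(current); // keep all current attributes
        next.putAll(changes);                                    // add or overwrite the new ones
        next.put("v", (Integer) current.get("v") + 1);
        current = next;
    }

    Map<String, Object> current() {
        return current;
    }

    List<Map<String, Object>> previous() {
        return prev;
    }

    public static void main(String[] args) {
        VersionedDocument doc = new VersionedDocument(Map.<String, Object>of("attr1", 165));
        doc.update(Map.<String, Object>of("attr2", "A-1"));
        doc.update(Map.<String, Object>of("attr1", 184));
        System.out.println(doc.current() + " " + doc.previous());
    }
}
```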
第43页
Versioning – Option 3
Keep collection for "current" version and past versions
> db.users.find( {"_id": 174 })
> db.users_past.find( {"pid": 174 })
{ "pid" : 174, "v" : 1, "firstname": "Juan" }
{ "pid" : 174, "v" : 2, "firstname": "Juan", "lastname": "Olivo" }
{ "_id" : 174, "v" : 3, "firstname": "Juan", "lastname": "Olivo", "gender": "M" }
Previous versions collection
Current collection
第44页
Versioning
第45页
Migrations
第46页
Migrations
Several types of "Migrations":
Add / remove fields
Change field names
Change field data type
Extract document into collection
第47页
Add / Remove Fields
For flexible schema databases this is our bread and butter
{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo", "gender": "M" }
{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo", "newfield": "value" }
> db.users.update( {"_id": 174}, {"$set": { "newfield": "value" }, "$unset": {"gender":""} })
第48页
Change Field Names
Again, programmatically you can do it
{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo" }
{ "_id" : 174, "first": "Juan", "last": "Olivo" }
> db.users.update( {"_id": 174}, {"$rename": { "firstname": "first", "lastname":"last"} })
第49页
Change Field Data Type
Align to a new code change and move from Int to String
{..."bdate": 1435394461522}
{..."bdate": "2015-06-27"}
1) Batch Process
2) Aggregation Framework
3) Change based on usage
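The third approach, changing data based on usage, can be sketched as a check performed whenever a document is read: if the field still has the old type, convert it and write the document back. The code below works on a plain map and assumes the numeric value is epoch milliseconds interpreted in UTC; `LazyBdateMigration` is a hypothetical name.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.util.HashMap;
import java.util.Map;

class LazyBdateMigration {
    // Upgrades "bdate" from a number to a yyyy-MM-dd string on first read.
    // Assumption: numbers are epoch milliseconds, interpreted in UTC.
    // Returns true when the document changed and should be written back.
    static boolean migrateOnRead(Map<String, Object> doc) {
        Object bdate = doc.get("bdate");
        if (!(bdate instanceof Number)) {
            return false; // already migrated, or the field is absent
        }
        long millis = ((Number) bdate).longValue();
        String asString = Instant.ofEpochMilli(millis)
                .atZone(ZoneOffset.UTC).toLocalDate().toString();
        doc.put("bdate", asString);
        return true;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("_id", 174);
        doc.put("bdate", 1435394461522L);
        System.out.println(migrateOnRead(doc) + " " + doc.get("bdate"));
    }
}
```

Documents that are never read are never converted, so this approach is often paired with a background batch for the remainder.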
第50页
Change Field Data Type
1) Batch Process – bulk api
public void migrateBulk(){
    // "dd" is day of month; "DD" would be day of year
    DateFormat df = new SimpleDateFormat("yyyy-MM-dd");
    ...
    List<UpdateOneModel<Document>> toUpdate =
        new ArrayList<UpdateOneModel<Document>>();
    // Iterates over every document in the collection
    for (Document doc : coll.find()){
        // Treats the stored integer as epoch milliseconds
        String dateAsString = df.format( new Date( doc.getInteger("bdate", 0) ));
        Document filter = new Document("_id", doc.getInteger("_id"));
        Document value = new Document("bdate", dateAsString);
        Document update = new Document("$set", value);
        toUpdate.add(new UpdateOneModel<Document>(filter, update));
    }
    // Sends all queued updates in one bulk operation
    coll.bulkWrite(toUpdate);
}
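A detail worth checking in any such migration is the date pattern, because `SimpleDateFormat` pattern letters are case sensitive: `d` is the day in the month, while `D` is the day in the year. The short sketch below formats the timestamp from the earlier slide with both patterns in UTC; `DatePatternDemo` is a hypothetical helper.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class DatePatternDemo {
    // Formats an epoch-millisecond timestamp in UTC with the given pattern.
    static String format(String pattern, long epochMillis) {
        SimpleDateFormat df = new SimpleDateFormat(pattern, Locale.ROOT);
        df.setTimeZone(TimeZone.getTimeZone("UTC")); // fixed zone for reproducible output
        return df.format(new Date(epochMillis));
    }

    public static void main(String[] args) {
        long bdate = 1435394461522L;
        System.out.println(format("yyyy-MM-dd", bdate)); // 2015-06-27
        System.out.println(format("yyyy-MM-DD", bdate)); // 2015-06-178
    }
}
```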
第51页
Change Field Data Type
1) Batch Process – bulk api
public void migrateBulk(){
...
for (Document doc : coll.find()){
...
}
coll.bulkWrite(toUpdate);
Is there any problem with this?
第52页
Change Field Data Type
1) Batch Process – bulk api
public void migrateBulk(){
...
//BSON type 16 represents the int32 data type
Document query = new Document("bdate", new Document("$type", 16));
for (Document doc : coll.find(query)){
...
}
coll.bulkWrite(toUpdate);
More efficient filtering!
第53页
Extract Document into Collection
Normalize your schema
{"size": 10, picture: BinData("0x133334299399299432")}
{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo" }
> db.users.aggregate( [
{$unwind: "$cat_pictures"},
{$project: { "_id":0, "uid":"$_id", "size": "$cat_pictures.size", "picture": "$cat_pictures.picture"}},
{$out:"cats"}])
{ "_id" : 174, "firstname": "Juan", "lastname": "Olivo" , "cat_pictures":
[{"size": 10, picture: BinData(0, "m/lhLlLmoNiUKQ==")}]
}
{"size": 10, "picture": BinData(0, "m/lhLlLmoNiUKQ==")}
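For readers less familiar with the aggregation stages, the effect of `$unwind` followed by `$project` can be approximated in plain Java: emit one output document per embedded picture, carrying the owner's `_id` as `uid`. This is an in-memory sketch over maps and lists, not driver code, and `ExtractCatPictures` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ExtractCatPictures {
    // One output document per embedded picture, linked back through "uid".
    static List<Map<String, Object>> extract(List<Map<String, Object>> users) {
        List<Map<String, Object>> cats = new ArrayList<>();
        for (Map<String, Object> user : users) {
            Object pictures = user.get("cat_pictures");
            if (!(pictures instanceof List)) {
                continue; // users without pictures produce no output
            }
            for (Object item : (List<?>) pictures) {
                Map<?, ?> picture = (Map<?, ?>) item;
                Map<String, Object> cat = new LinkedHashMap<>();
                cat.put("uid", user.get("_id"));
                cat.put("size", picture.get("size"));
                cat.put("picture", picture.get("picture"));
                cats.add(cat);
            }
        }
        return cats;
    }

    public static void main(String[] args) {
        Map<String, Object> user = Map.<String, Object>of(
                "_id", 174, "firstname", "Juan",
                "cat_pictures", List.of(Map.<String, Object>of("size", 10, "picture", "m/lhLlLmoNiUKQ==")));
        System.out.println(extract(List.of(user)));
    }
}
```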
第54页
Tradeoffs
第55页
Tradeoffs
第56页
Recap
第57页
Recap
Flexible and Dynamic Schemas are a great tool
Use them wisely
Make sure you understand the tradeoffs
Make sure you understand the different strategies and options
Works well with Strongly Typed Languages
第58页
Free Education
https://university.mongodb.com/courses/M101J/about
Next Session starting on Aug 04
第59页
Thank you!
Norberto Leite
Technical Evangelist
http://www.mongodb.com/norberto
norberto@mongodb.com
@nleite
第60页