第1页
Blink:阿里新一代实时计算引擎
马国维 2017.4
第3页
Who am I?
2010 – 2017 Alibaba Search
• iStream • Blink
2007 – 2010 Baidu Web Search
第4页
Outline
1 The Streaming Architecture 2 What is Flink 3 What is Blink 4 Future Plans
第5页
The streaming architecture
Part I
第6页
What is streaming?
Your code
Your code
Your code
Your code
What is streaming?
• Unbounded data
What is streaming process engine?
• The data process engine that is designed with infinite data set in mind
第7页
What is stateful streaming
Your code
state
Computation and state
• E.g., counters, windows of past events, state machines, trained ML models
Result depends on history of stream
Stateful stream processor gives the tools to manage state
• Recover, roll back, version, upgrade, etc
第8页
What is event-time streaming
t4 t3 t1 t2
t1-t2
t3-t4
Your code
state
Data records associated with timestamps (time series data)
Processing depends on timestamps
Event-time stream processor gives you the tools to reason about time
• E.g., handle streams that are out of order • Core feature is watermarks – a clock to measure
event time
第9页
Streaming Subsumes Batch
Stream (low latency)
partition partition
…2016-3-1
2016-3-1
2016-3-1
2016-3-11
2016-3-11
2016-3-12
2016-3-12
2016-3-12
2016-3-12
12:00 am
1:00 am
2:00 am
10:00pm
11:00pm
12:00am
1:00am
2:00am
3:00am
Stream (high latency)
Batch (bounded stream)
第10页
What is Flink?
Part 2
第11页
Flink - Streaming Compute Engine Latency down to the milliseconds
Latency
Volume/ Throughput
State & Accuracy
10s of millions evts/sec
Exactly-once semantics
for stateful applications
Event time processing
http://flink.apache.org
第12页
Flink – Unified Compute Engine
Long batch pipelines
Machine Learning at scale
Streaming topologies
resource utilization
Low latency
Flink
iterative algorithms Graph Analysis
Mutable state
第13页
Flink Ecosystem
Gelly (Graph Processing)
Table (Relational)
FLinkML (Machine Learning)
CEP (Event Processing)
Table (Relational)
APIs & LIBRARIES
STORAGE DEPLOY Runtime
DataStream (Java/Scala) Stream Processing
DataSet (Java/Scala) Batch Processing
Runtime - Distributed Streaming Dataflow
Local Single JVM
Files Local, HDFS, …
Cluster Standalone, YARN, Mesos, …
Databases Cassandra, HBase, …
Cloud Google’s GCE, Amazon’s EC2, …
Streams Flume, Kafka, …
第14页
Checkpoint/Recovery
• Chandy-Lamport algorithm
• Periodic asynchronous consistent snapshots of application state
• Provide exactly-once state guarantees under failures
9/2/2016
datastream
stream_barriers.svg
newerrecords
olderrecords
checkpoint barriern
checkpoint barriern1
streamrecord (event)
partof checkpointn+1
partof checkpointn
partof checkpointn1
第15页
Stateful Steam Processing
Scalable embedded state
Access at memory speed & scales with parallel operators
第16页
What is Blink?
Part 3
第17页
Blink – Alibaba’s version of Flink
Looked into Flink two years ago
• best choice of unified computing engine
• a few of issues in flink that can be problems for large scale applications
Started Blink project
• aimed to make Flink work reliably and efficiently at the very large scale at Alibaba
Made various improvements in Flink runtime
• native run on yarn cluster • failover optimizations for fast recovery • incremental checkpoint for super large state • async operator for high throughputs
Working with Flink community to contribute back since last August
• several key designs • hundreds of patches
第18页
Blink in Alibaba Production
In production for almost one year More than 3000 nodes are running Blink The largest Blink cluster is more than 1000 nodes There are hundreds of production jobs supported by Blink Supported key online Service on last Nov 11th
• The largest Blink job has 5000 concurrent, 10TB state, billions of QPS • Based on the Blink machine learning platform to significantly increase the transaction
conversion
第19页
Blink Ecosystem in Alibaba
Alibaba Apps
Search
Recommendation
Ads
Ant
BI Security
Blink Hadoop
Machine Learning Platform (Porsche)
Table API
SQL
DataStream API
Runtime Engine
DataSet API
YARN (Resource Management)
HDFS (Persistent Storage)
第20页
Use Case — Search Index Build & Update
第21页
Use Case — Realtime A/B Test
第22页
Use Case — Online Machine Learning
第23页
Blink Architecture
Launch AM
YARN (Resource Management)
Request TM
Launch TM
Submit Job
Flink Client
YARN App Master
Task Manager
Blink Runtime
Debug
Web Monitor
Resource Manager
Tasks
Connectors
Read/Write
Alibaba Data Lake
Job Manager
Checkpoint Coordination
Task Scheduling
Rocksdb State Backend
Metric Reporter
Metrics
Alibaba Monitor System
Checkpoint Incrementally
HDFS (Persistent Storage)
Apache Flink
Alibaba Blink
第24页
Improvements to Flink Runtime
Native integration with Resource Managment for dynamic resource allacation and more larger scale
Performance Improvements
• Incremental Checkpoint • Asynchronous Operator
Failover Optimization
• Fine-grained Recovery for Task Failures • Allocation Reuse for Task Recovery • Non-disruptive JobManager Failures via Reconciliation
第25页
Future Plans
Section 4
第26页
Future Plans
Blink is already popular in the streaming scenarios
• more and more streaming applications will run on blink
Make batch applications run on production
• increase the resource utilization of the clusters
Blink as Service
• Alibaba Group Wide
Cluster is growing very fast
• cluster size will double • thousands of jobs run on production
第27页
THANKS
--------- Q&A Section --------