第1页
Thread Mist
luikore
第2页
About Me
第3页
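Master of all kinds of… hello world
Fond of all kinds of… rainbow ponies and whatnot
Digging into all kinds of… GUIs, virtual machines and compilers
Writing in my spare time… an editor that has been delayed for years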
@luikore
第4页
Odigo
https://www.odigo.travel
第5页
Threads
第6页
Theory
第7页
Practice
第8页
Threads are Hard
mutex, futex, condvar, semaphore, deadlock, livelock…
第9页
Questions
No "true" threads in Ruby?
Is the GIL bad?
第10页
Remove Threads?
第11页
Remove Threads?
Celluloid! Fiber! CSP! pi-calculus!
第12页
But…
第13页
Threads are what we get from the OS and C
第14页
Threads are working great in…
第15页
And…
Thread Multiplexing Libraries
I18n.locale
Mongoid query cache
…
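Both items listed here keep per-thread state, which is presumably why they appear on this slide. A minimal sketch of that idea, assuming plain `Thread.current[]` storage; the key `:demo_locale` and the method name are made up for illustration and are not taken from either library:

```ruby
# Each thread sees its own value under the same key.
def locale_per_thread
  Thread.current[:demo_locale] = :en          # set in the calling thread

  seen_in_worker = Thread.new do
    before = Thread.current[:demo_locale]     # a fresh thread starts with nil
    Thread.current[:demo_locale] = :zh        # only affects this thread
    [before, Thread.current[:demo_locale]]
  end.value

  { main: Thread.current[:demo_locale], worker: seen_in_worker }
end
```

The calling thread keeps its own value no matter what the worker stores under the same key.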
第16页
And…
The system provides threads
OSes have optimized threads for years
Preemptive scheduling is needed
第17页
So let's take a deep look at how threads are scheduled.
第18页
Glossary
CRuby = MRI
yield
GVL = GIL
lock, mutex, spin lock
condvar
time slice
第19页
Threads Scheduling
第20页
Green Threads
Ruby <= 1.8 comes with green threads
Blocking IO (read/write) blocks other threads
第21页
Native Threads
Introduced in Ruby 1.9
At first, the very same implementation as green threads
第22页
Experiment
Let’s check a simple program
ruby -e 'gets'
第23页
Experiment
第24页
Timer Thread
There is one thread that is not visible to the Ruby program; it schedules the other threads.
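One way to see what "not visible" means from the Ruby side: `Thread.list` only reports Ruby-level threads, so it grows by exactly one per `Thread.new` and never shows an extra scheduler entry. A small sketch (the method name is made up for illustration; whether a native timer thread exists underneath depends on the MRI version and platform):

```ruby
# Count the threads Ruby reports before and while one worker is alive.
def threads_seen_from_ruby
  before = Thread.list.size
  worker = Thread.new { sleep }   # a Ruby thread that just parks itself
  during = Thread.list.size
  worker.kill
  worker.join
  { before: before, during: during }
end
```

Native threads such as the timer can be inspected with OS tools instead, which is what the `ruby -e 'gets'` experiment is for.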
第25页
Why Timer?
Supervisor
Ensures threads run "fairly"
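A rough way to observe this fairness from Ruby, assuming MRI: start two CPU-bound threads that never sleep or do IO, and check that both made progress before a shared deadline. The method name and the 0.3 second window are arbitrary choices for this sketch.

```ruby
# Two busy loops that never block voluntarily; each counts its own iterations.
def run_two_busy_threads(seconds = 0.3)
  counts = [0, 0]
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + seconds

  threads = 2.times.map do |i|
    Thread.new do
      loop do
        counts[i] += 1
        break if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      end
    end
  end

  threads.each(&:join)
  counts
end
```

If the first thread could keep the GVL until it finished, the second counter would typically stay at 1; with time slicing both counters usually end up large.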
第26页
Timer Overview
第27页
We are Going Down…
第28页
When is the timer initialized?
第29页
What does timer thread actually do?
static void timer_thread_function(void *arg) {
    rb_vm_t *vm = GET_VM();
    native_mutex_lock(&vm->thread_destruct_lock);
    if (vm->running_thread)
        ATOMIC_OR(vm->running_thread->interrupt_flag, TIMER_INTERRUPT_MASK);
    native_mutex_unlock(&vm->thread_destruct_lock);
}
第30页
What do other threads do at safe-points?
if ((th)->interrupt_flag & ~(th)->interrupt_mask) {
    rb_threadptr_execute_interrupts();
}
第31页
What are safe-points?
Returning from a function …
第32页
What is "execute interrupts"?
void rb_threadptr_execute_interrupts() {
    timer_interrupt = interrupt & TIMER_INTERRUPT_MASK;
    ...
    if (timer_interrupt) {
        ...
        th->running_time_us += running_time_us;
        ...
        rb_thread_schedule_limits(limits_us);
    }
}
第33页
What does "schedule limits" do?
static void rb_thread_schedule_limits(unsigned long limits_us) {
    ...
    if (th->running_time_us >= limits_us) {
        RB_GC_SAVE_MACHINE_CONTEXT(th);
        gvl_yield(th->vm, th); // release and compete GVL
    }
}
第34页
What does the GVL look like?
typedef struct rb_global_vm_lock_struct {
    /* fast path */
    unsigned long acquired;
    rb_nativethread_lock_t lock;

    /* slow path */
    volatile unsigned long waiting;
    rb_nativethread_cond_t cond;

    /* yield */
    rb_nativethread_cond_t switch_cond;
    rb_nativethread_cond_t switch_wait_cond;
    int need_yield;
    int wait_yield;
} rb_global_vm_lock_t;
第35页
What is gvl_yield?
native_mutex_lock(&vm->gvl.lock);
gvl_release_common(vm);
native_mutex_unlock(&vm->gvl.lock);

...
sched_yield();
...

native_mutex_lock(&vm->gvl.lock);
native_cond_broadcast(&vm->gvl.switch_wait_cond);
gvl_acquire_common(vm);
native_mutex_unlock(&vm->gvl.lock);
第36页
What is gvl_release_common?
vm->gvl.acquired = 0;
if (vm->gvl.waiting > 0)
    native_cond_signal(&vm->gvl.cond);
第37页
What is gvl_acquire_common?
while (vm->gvl.acquired) {
    native_cond_wait(&vm->gvl.cond, &vm->gvl.lock);
}
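The acquire/release pair above can be modeled in plain Ruby to make the protocol easier to follow. This is only a toy sketch: `ToyGVL`, `hold`, and `toy_gvl_demo` are made-up names, and the real GVL has extra yield-related fields that are left out here.

```ruby
# A toy lock that mirrors the acquired/waiting/cond fields of the GVL struct.
class ToyGVL
  def initialize
    @lock = Mutex.new                 # plays the role of gvl.lock
    @cond = ConditionVariable.new     # plays the role of gvl.cond
    @acquired = false
    @waiting = 0
  end

  def acquire
    @lock.synchronize do
      @waiting += 1
      @cond.wait(@lock) while @acquired   # sleep until the holder releases
      @waiting -= 1
      @acquired = true
    end
  end

  def release
    @lock.synchronize do
      @acquired = false
      @cond.signal if @waiting > 0        # wake one waiter, if any
    end
  end

  # Run the block while holding the toy lock.
  def hold
    acquire
    begin
      yield
    ensure
      release
    end
  end
end

# Several threads increment one counter, always under the toy lock.
def toy_gvl_demo(threads: 4, rounds: 1_000)
  gvl = ToyGVL.new
  counter = 0
  threads.times.map do
    Thread.new { rounds.times { gvl.hold { counter += 1 } } }
  end.each(&:join)
  counter
end
```

A woken waiter re-checks `@acquired` in a loop because another thread may have grabbed the lock between the signal and the wakeup.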
第38页
Where does sched_yield come from?
An OS call that makes the current thread give up the CPU and go to the back of the run queue
Think about Fiber.yield
第39页
Condvars
rb_nativethread_cond_t represents a condition variable.
Waiting on a condition variable releases a mutex and sleeps until some condition is signaled; the mutex is reacquired on wakeup.
第40页
Condvars
Resource Consumer
# pseudo code
lock(mutex)
while cond                # cond: the resource is not ready yet
  wait(condvar, mutex)    # releases mutex while asleep, reacquires it on wakeup
end
unlock(mutex)
第41页
Condvars
Resource Producer
pthread_cond_signal(condvar)      # wake one waiter
pthread_cond_broadcast(condvar)   # wake all waiters
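The consumer and producer halves fit together as in the following sketch, written with Ruby's own `Mutex` and `ConditionVariable` rather than the pthread calls. The method name and the `:resource` payload are placeholders.

```ruby
# One consumer waits for a resource; the producer supplies it and signals.
def condvar_demo
  mutex = Mutex.new
  condvar = ConditionVariable.new
  queue = []

  consumer = Thread.new do
    mutex.synchronize do
      # Loop, not `if`: re-check the condition after every wakeup.
      condvar.wait(mutex) while queue.empty?
      queue.shift
    end
  end

  mutex.synchronize do
    queue << :resource
    condvar.signal          # wake one waiting consumer, if there is one
  end

  consumer.value            # joins the thread and returns what it consumed
end
```

The sketch works in either order: if the producer runs first, the consumer finds the queue non-empty and never waits.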
第42页
Conclusion
Threads are preemptive… right?
Ruby threads are cooperative too…
第43页
What GIL Guarantees
Standalone C functions are "atomic" in MRI
This makes writing C extensions easy
Release the GIL when you know you are going into a "blocking region"
第44页
Blocking Region
VALUE rb_thread_io_blocking_region(rb_blocking_function_t *func, void *data1, int fd)
{
    ...
    th->waiting_fd = fd;
    ...
    BLOCKING_REGION({
        val = func(data1);
        saved_errno = errno;
    }, ubf_select, th, FALSE);
}
第45页
Blocking Region (2)
What the BLOCKING_REGION macro does
RB_GC_SAVE_MACHINE_CONTEXT(th);
gvl_release(th->vm);
... // do your blocking work!
gvl_acquire(th->vm, th);
第46页
A "blocking work" is a call that may put the current thread to sleep and wake it up later.
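The effect of releasing the GVL around blocking work can be seen from Ruby with `sleep`, which is one such call in MRI. A small sketch (the method name and timings are arbitrary): several threads that each sleep for the same duration should finish in roughly one sleep, not the sum of all of them.

```ruby
# Run several sleeping threads at once and measure the wall-clock time.
def overlapping_sleep(threads: 4, seconds: 0.1)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  threads.times.map { Thread.new { sleep seconds } }.each(&:join)
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end
```

If sleeping threads kept holding the GVL, the total would be close to `threads * seconds` instead.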
第47页
Blocking Calls
accept: accept an incoming connection
read: read some data from a socket or file
select: choose available file descriptors
poll: simpler select
epoll: inversion of control, woken up by the OS
kevent: the kqueue version of epoll
…many more
第48页
Blocking Calls
Cost a lot of CPU, or take an unknown amount of time
第49页
Optimization
Regarding the threads with GIL
第50页
Parallelism?
Don’t Use Threads For Parallelism
It is slower …
a = 1..20
b = 1..20
asum, bsum = 0, 0
t = Thread.new{ asum = a.inject :+ }
bsum = b.inject :+
t.join
puts asum + bsum
第51页
Parallelism?
Just:
a.inject(:+) + b.inject(:+)
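To check the "it is slower" claim on your own machine, the two versions can be timed side by side. This is a sketch only: the method names and the run count are made up, and the actual timings depend on the Ruby version and hardware, so none are quoted here.

```ruby
require 'benchmark'

RANGE_A = 1..20
RANGE_B = 1..20

# Version from the slide: one half of the work goes to an extra thread.
def threaded_sum
  asum = 0
  t = Thread.new { asum = RANGE_A.inject(:+) }
  bsum = RANGE_B.inject(:+)
  t.join
  asum + bsum
end

# Plain version: no thread at all.
def plain_sum
  RANGE_A.inject(:+) + RANGE_B.inject(:+)
end

# Time both versions over many runs; returns seconds for each.
def compare(runs = 1_000)
  {
    threaded: Benchmark.realtime { runs.times { threaded_sum } },
    plain: Benchmark.realtime { runs.times { plain_sum } }
  }
end
```

Calling `compare` prints nothing by itself; inspect the returned hash, for example with `p compare`.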
第52页
Choose # of Threads
Too many: time wasted in context switching
Too few: users have to wait in a queue while your CPU io-waits
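One common way to make the thread count an explicit knob is a fixed-size worker pool fed by a queue. The following is a minimal sketch, not a recommendation of any particular size; `map_with_pool` and the `:stop` sentinel are made up for illustration.

```ruby
# Apply the block to every item using exactly pool_size worker threads.
def map_with_pool(items, pool_size)
  jobs = Queue.new
  items.each_with_index { |item, i| jobs << [item, i] }
  pool_size.times { jobs << :stop }       # one stop marker per worker

  results = Array.new(items.size)         # pre-sized; each job fills its own slot
  workers = pool_size.times.map do
    Thread.new do
      while (job = jobs.pop) != :stop
        item, i = job
        results[i] = yield(item)
      end
    end
  end

  workers.each(&:join)
  results
end
```

Usage: `map_with_pool(urls, 8) { |url| fetch(url) }`, where `fetch` stands for whatever IO-bound work you have. Tuning then means changing one number and measuring.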
第53页
Reduce Thread Stack Size
less Middleware MVC
第54页
Manually Release GVL
Already used in zlib, so multi-threaded Ruby spiders can utilize up to 2 cores
gvl_release(th)
... // you know that code doesn't affect Ruby
gvl_acquire(th)
第55页
When It’s not Ruby’s Fault…
第56页
IO: Your Bottleneck
100.times do
  Comment.create post: post, body: "rabbit #{rand}"  # `body` is an assumed attribute name
end
第57页
Batch: Reduce IO Latency
A simple way to batch SQL: a transaction
transaction do
  100.times do
    Comment.create post: post, body: "rabbit #{rand}"  # `body` is an assumed attribute name
  end
end
第58页
IO is Everywhere…
NUMA abstracts memory access
DMA reduces memory copying
Sharing between APU and FPU
Sharing between LLC and main memory
Sharing between CPU and GPU
…
第59页
Connection Pool Limit
Some long jobs take too long to finish, exhausting the connection pool and blocking other threads…
第60页
Manually Release Connection
Release it manually before starting something that takes a long time:
# in rails action or sidekiq job
ActiveRecord::Base.clear_active_connections!
Net::HTTP.get URI('http://example.com')
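Why early release helps can be shown without Rails using a toy pool. This is an illustrative sketch only: `ToyPool`, `with_connection`, and `run_jobs` are made-up names and do not reflect how ActiveRecord's pool is implemented.

```ruby
# A tiny connection pool: a queue holding a fixed number of connections.
class ToyPool
  def initialize(size)
    @slots = Queue.new
    size.times { |i| @slots << "conn-#{i}" }
  end

  # Check a connection out for the duration of the block only.
  def with_connection
    conn = @slots.pop          # blocks while the pool is exhausted
    begin
      yield conn
    ensure
      @slots << conn           # always hand the connection back
    end
  end

  def available
    @slots.size
  end
end

# Each job does a short "database" step, then slow work WITHOUT a connection.
def run_jobs(pool, jobs: 4, slow_work: 0.05)
  finished = Queue.new
  threads = jobs.times.map do |i|
    Thread.new do
      row = pool.with_connection { |conn| "row #{i} via #{conn}" }
      sleep slow_work          # stands in for a long HTTP call
      finished << row
    end
  end
  threads.each(&:join)
  finished.size
end
```

With a pool of size 1, four jobs still overlap their slow parts, because the single connection is only held during the short step.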
第61页
Removing GIL
The JVM and Rubinius possess no GIL… How?
第62页
Fine Grained Locks
Problem: Very complex VM—50+ Locks!
第63页
Use HTM instead
Hardware transactional memory
"Spinlock"
Problem: different usage for different hardware platforms and compilers
第64页
Thread-safe Data Structures
Problem: slower single threaded code
第65页
Concurrent GC
GC must make sure mark/sweep
Problem: slower single threaded code
第66页
Thread-safe Extensions
Simple solution: still acquire the GVL when calling a non-thread-safe cfunc
第67页
Long Long Way to Go
Removing threads is actually easier:
第68页
Ref
http://www.jstorimer.com/blogs/workingwithcode/8100871understands-the-gil-part-2-implementation
http://www.cs.fsu.edu/~baker/realtime/restricted/notes/prodc