Page 1
Improving Linux Development with better tools
Andi Kleen
Oct 2013 Intel Corporation ak@linux.intel.com
Page 2
Linux complexity growing
[Figure: Source lines in Linux kernel, all source code. Y axis: M-LOC (13.5 to 16.5). X axis: kernel version (V3.6 to V3.11).]
[Figure: Linux kernel source lines, IO (net/, fs/, block/). Y axis: M-LOC (0 to 2.5). X axis: kernel version (V2.6.16 to V3.11).]
[Figure: Source lines, Linux kernel core (kernel/, lib/). Y axis: M-LOC (0.05 to 0.35). X axis: kernel version (V2.6.16 to V3.11).]
Page 3
Do we have a problem?
● If we assume the number of bugs per line stays constant, more code means more and more bugs
● If we assume programmers don't get cleverer, some code may become too complex to change or debug
● Of course, modularity saves us to some degree
Page 4
Or we can use better tools to find bugs
● Static code checker tools
● Dynamic runtime checkers
● Fuzzers/test suites
● Debuggers/Tracers to understand code
● Tools to read/understand source
Page 5
Static checkers
● sparse, smatch, coccinelle, clang checker, checkpatch, gcc -W/LTO, stanse
● Can check a lot of things, simple mistakes, complex problems
● Generic C and kernel specific rules
Page 6
Static checker challenges
● Some are very slow
● False positives
– Often can only check for new warnings
– Otherwise too many false positives
● May need concentrated effort to get false positives down
– Only done for gcc/sparse so far
– Needs changes both to Linux and to the checkers
Page 7
Study bug fixes
● “At least 14.8%∼24.4% of the sampled bug fixes are incorrect. Moreover, 43% of the incorrect fixes resulted in severe bugs that caused crash, hang, data corruption or security problems.”
● “How do fixes become bugs” Yin/Yuan et al.
● http://opera.ucsd.edu/~zyin2/fse11.pdf
● Great paper, every kernel programmer should read it
● Can new rules for static checkers help?
Page 8
Coccinelle example
/// Find &&/|| operations that include the same argument more than once
//# A common source of false positives is when the argument performs a side
//# effect.

@r expression@
expression E;
position p;
@@

(
* E@p
  || ... || E
|
* E@p
  && ... && E
)

@script:python depends on org@
p << r.p;
@@

cocci.print_main("duplicated argument to && or ||",p)
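For illustration, the kind of C code that the semantic patch above reports might look like the following sketch. The type and function names (`struct range`, `range_invalid_buggy`, `range_invalid_fixed`) are hypothetical and not taken from the kernel.

```c
#include <stdbool.h>

/* Hypothetical example type; not from the kernel. */
struct range {
    int start;
    int len;
};

/*
 * Buggy: "r->start < 0" appears twice in the same || chain, which is the
 * pattern the rule looks for. The last test was probably meant to check
 * r->len, so a negative length goes unnoticed.
 */
bool range_invalid_buggy(const struct range *r)
{
    return r->start < 0 || r->len == 0 || r->start < 0;
}

/* Fixed: every operand of the || chain tests something different. */
bool range_invalid_fixed(const struct range *r)
{
    return r->start < 0 || r->len == 0 || r->len < 0;
}
```

With `start = 1` and `len = -5` the buggy version accepts the range while the fixed version rejects it. The compiler does not have to warn about the duplicate, which is why a dedicated rule is useful.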
Page 9
Challenge: global checks
● No static checker I found can follow indirect calls (“OO in C”, common in kernel)
struct foo_ops {
    int (*do_foo)(struct foo *obj);
};
foo->ops->do_foo(foo);
● Could be done by using type information
● Without it, checkers miss a lot of potential bugs
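A compilable sketch of the pattern, to show why the call target is invisible to a checker that only follows direct calls. The `ops` pointer, the `value` field and the two implementations `foo_a_do_foo` and `foo_b_do_foo` are assumptions added for illustration.

```c
struct foo;

/* Operations table: the "virtual methods" of struct foo. */
struct foo_ops {
    int (*do_foo)(struct foo *obj);
};

struct foo {
    const struct foo_ops *ops;  /* assumed link from object to its ops */
    int value;
};

/* Two hypothetical implementations with the same function type. */
static int foo_a_do_foo(struct foo *obj) { return obj->value + 1; }
static int foo_b_do_foo(struct foo *obj) { return obj->value * 2; }

const struct foo_ops foo_a_ops = { .do_foo = foo_a_do_foo };
const struct foo_ops foo_b_ops = { .do_foo = foo_b_do_foo };

/*
 * The call target is only known at run time. A checker that uses type
 * information could assume that any function stored in a .do_foo member
 * (here foo_a_do_foo or foo_b_do_foo) may be called.
 */
int call_do_foo(struct foo *foo)
{
    return foo->ops->do_foo(foo);
}
```

The initializers are the only place where the function names and the `do_foo` member meet, so that is where a type-based analysis would have to collect its candidate targets.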
Page 10
Lock ordering: lockdep
● Deadlock from lock ordering (“ABBA” bugs) used to be common
● Lockdep basically eliminated this problem
● Checks lock ordering, interrupt
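The idea of checking lock ordering can be sketched with a toy user-space model. This is not lockdep's implementation: it handles a single context, a fixed number of lock classes, and only direct two-lock inversions, and all names (`toy_lock_acquire`, `toy_lock_release`, `MAX_LOCKS`) are hypothetical.

```c
#include <stdbool.h>

#define MAX_LOCKS 8   /* number of lock classes in this toy model */

/* order[a][b] is true once lock b was taken while lock a was held. */
static bool order[MAX_LOCKS][MAX_LOCKS];

/* Locks currently held by the single simulated context. */
static int held[MAX_LOCKS];
static int nheld;

/*
 * Record that lock_id is being acquired. Returns false if this inverts
 * an ordering seen earlier (an "ABBA" pattern) or the arguments are out
 * of range, true otherwise.
 */
bool toy_lock_acquire(int lock_id)
{
    bool ok = true;

    if (lock_id < 0 || lock_id >= MAX_LOCKS || nheld >= MAX_LOCKS)
        return false;

    for (int i = 0; i < nheld; i++) {
        if (order[lock_id][held[i]])
            ok = false;                 /* reverse order seen before */
        order[held[i]][lock_id] = true; /* remember held -> new */
    }
    held[nheld++] = lock_id;
    return ok;
}

/* Release the most recently acquired lock. */
void toy_lock_release(void)
{
    if (nheld > 0)
        nheld--;
}
```

Taking lock 0 then lock 1, releasing both, and later taking lock 1 then lock 0 makes the last acquire return false. No deadlock has to occur: the inversion is reported as soon as both orders have been observed once.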
Page 11
Kmemcheck / AddressSanitizer
● Check uninitialized/freed/out of bounds data
● Kmemcheck based on page faults
– Quite slow
● AddressSanitizer seems to be a better alternative
– Compiler instrumentation, much faster
– Still need port to kernel (some reports already)
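Compiler instrumentation of this kind is usually built on shadow memory: a side table that records which bytes may be accessed, consulted before every load and store. The sketch below is a deliberately simplified model with one shadow byte per data byte (real tools use a more compact encoding); the allocator and all names (`toy_alloc`, `toy_free`, `toy_access_ok`) are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define ARENA_SIZE 64   /* size of the imaginary memory arena */
#define REDZONE    4    /* poisoned bytes after each allocation */

/* shadow[i] != 0 means byte i of the arena may be accessed. */
static unsigned char shadow[ARENA_SIZE];
static size_t next_free;

/* Bump allocator: returns an offset into the arena, or -1 if full. */
long toy_alloc(size_t size)
{
    size_t off = next_free;

    if (size > ARENA_SIZE || size + REDZONE > ARENA_SIZE - next_free)
        return -1;
    memset(shadow + off, 1, size);   /* unpoison the object */
    next_free += size + REDZONE;     /* the red zone stays poisoned */
    return (long)off;
}

/* Poison the object again so that later accesses are caught. */
void toy_free(long off, size_t size)
{
    memset(shadow + off, 0, size);
}

/* The check that instrumentation would insert before each access. */
bool toy_access_ok(long off, size_t len)
{
    if (off < 0 || len > ARENA_SIZE || (size_t)off > ARENA_SIZE - len)
        return false;
    for (size_t i = 0; i < len; i++)
        if (!shadow[(size_t)off + i])
            return false;
    return true;
}
```

An access one byte past an 8-byte object hits the red zone and fails the check, and any access after `toy_free` fails because the object has been poisoned again. Uninitialized reads would need an additional shadow state and are not modeled here.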
Page 12
Thread checkers
● Find data races:
– Shared data accesses not protected by locks
● User space: helgrind, ThreadSanitizer, ...
● Problem: kernel does not mark lockless accesses. Solvable?
– User: __atomic_write(&foo, 1, __ATOMIC...);
– Kernel: foo = 1; mb();
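A minimal sketch of what a marked lockless access looks like in user space, using the GCC/Clang `__atomic` builtins. The slide elides the memory order; release/acquire is an assumption chosen for this example, and the names `publish`, `try_consume`, `data` and `ready` are hypothetical.

```c
#include <stdbool.h>

static int data;
static int ready;   /* flag accessed without holding a lock */

/* Writer: fill in the data, then set the flag with a marked release store. */
void publish(int value)
{
    data = value;
    __atomic_store_n(&ready, 1, __ATOMIC_RELEASE);
}

/* Reader: a marked acquire load pairs with the release store above. */
bool try_consume(int *out)
{
    if (!__atomic_load_n(&ready, __ATOMIC_ACQUIRE))
        return false;
    *out = data;
    return true;
}
```

Because the accesses to `ready` are explicitly marked, a race detector can tell that they are intentional synchronization. A plain store followed by a barrier carries no such hint, so the tool cannot easily distinguish it from an unprotected access.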
Page 13
Undefined behavior checker
● UBSan. New gcc/LLVM feature
● Checks undefined C behavior at runtime
– e.g. x << 100, signed integer overflows, …
● Needs special runtime library
● Would need to be ported to kernel
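To make the two examples concrete, the sketch below writes out by hand the sort of checks such a tool performs automatically. It is not the sanitizer's actual instrumentation; `checked_shl` and `checked_add` are hypothetical helpers, and `__builtin_add_overflow` is a GCC/Clang builtin.

```c
#include <stdbool.h>
#include <limits.h>

/* Shift left only if the count is valid for the type; x << 100 is not. */
bool checked_shl(unsigned int x, unsigned int count, unsigned int *res)
{
    if (count >= sizeof(x) * CHAR_BIT)
        return false;       /* undefined in C: count >= width of type */
    *res = x << count;
    return true;
}

/* Signed addition that reports overflow instead of invoking it. */
bool checked_add(int a, int b, int *res)
{
    /* The builtin returns true if the mathematical result did not fit. */
    return !__builtin_add_overflow(a, b, res);
}
```

In user space both compilers enable the runtime checks with `-fsanitize=undefined`, which reports the failing operation instead of requiring explicit helpers like these.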
Page 14
Fuzzers
● Trinity is a great tool
– Finds many bugs
● Needs manual model for each syscall
– How do we cover all the ioctls, /sys and /proc files?
● Modern fuzzers use automatic feedback
– But not for kernel yet
– http://taviso.decsystem.org/making_software_dumber.pdf
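Why feedback helps can be shown with a toy example. The target below is hypothetical and returns how deep the input got, standing in for coverage data from instrumented code. The search is a deterministic one-byte-at-a-time mutation loop, chosen for clarity; real fuzzers mutate randomly and track far richer coverage.

```c
#include <stddef.h>

#define INPUT_LEN 3

/*
 * Hypothetical target: returns how many nested checks the input passed.
 * This stands in for coverage feedback from instrumented code.
 */
int target_coverage(const unsigned char *in)
{
    if (in[0] != 'F')
        return 0;
    if (in[1] != 'U')
        return 1;
    if (in[2] != 'Z')
        return 2;
    return 3;               /* deepest path, e.g. where a bug hides */
}

/*
 * Mutate one byte at a time and keep a mutation only when coverage grows.
 * Returns the number of target executions; best[] holds the final input.
 */
unsigned long fuzz_with_feedback(unsigned char *best)
{
    unsigned long runs = 0;
    int best_cov;

    for (size_t i = 0; i < INPUT_LEN; i++)
        best[i] = 0;
    best_cov = target_coverage(best);
    runs++;

    for (size_t pos = 0; pos < INPUT_LEN; pos++) {
        for (int val = 0; val < 256; val++) {
            unsigned char old = best[pos];
            int cov;

            best[pos] = (unsigned char)val;
            cov = target_coverage(best);
            runs++;
            if (cov > best_cov) {
                best_cov = cov;     /* keep the new input */
                break;
            }
            best[pos] = old;        /* discard the mutation */
        }
    }
    return runs;
}
```

With feedback the deepest path is reached in at most 1 + 3 * 256 executions, because each byte is solved separately. Without it, all three bytes have to be guessed at once, which is up to 256^3 attempts.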
Page 15
The biggest challenge
● How to run all these tools on every new patch:
– Cannot ask every developer to use all of them
● Static checkers are relatively easy
– But can we get beyond just deltas for new code?
● But how to run the dynamic tools?
Page 16
Test suites
● Ideally all kernel code would come with a test suite
– Then someone could run all the dynamic checkers
● Difficult for hardware drivers
● LKP, kernel unit tests, tools/* limited
● Need a real unit testing framework
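As a rough idea of the smallest useful shape of such a framework, here is a user-space sketch: a table of named test functions and a runner that counts failures. It does not represent any existing kernel framework; `clamp_int` is a hypothetical function under test and all other names are made up for the example.

```c
#include <stddef.h>
#include <stdio.h>

typedef int (*test_fn)(void);   /* returns 0 on success */

struct test_case {
    const char *name;
    test_fn fn;
};

/* Hypothetical code under test. */
int clamp_int(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

static int test_clamp_inside(void) { return clamp_int(5, 0, 10) == 5 ? 0 : 1; }
static int test_clamp_low(void)    { return clamp_int(-3, 0, 10) == 0 ? 0 : 1; }
static int test_clamp_high(void)   { return clamp_int(42, 0, 10) == 10 ? 0 : 1; }

const struct test_case clamp_tests[] = {
    { "clamp_inside", test_clamp_inside },
    { "clamp_low",    test_clamp_low },
    { "clamp_high",   test_clamp_high },
};
const size_t clamp_test_count = sizeof(clamp_tests) / sizeof(clamp_tests[0]);

/* Run all cases, print the failing ones, and return the failure count. */
size_t run_tests(const struct test_case *tests, size_t n)
{
    size_t failed = 0;

    for (size_t i = 0; i < n; i++) {
        if (tests[i].fn() != 0) {
            printf("FAIL: %s\n", tests[i].name);
            failed++;
        }
    }
    return failed;
}
```

Once tests live in a table like this, the same suite can be run repeatedly under each dynamic checker without every developer having to set the tools up by hand.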
Page 17
Coverage
● Kernel gcov can be used to test coverage of test suites
● Should be used much more widely
Page 18
Tracers
● We are long past “real men don't use debuggers”
– Linux has good debuggers these days (kgdb etc.)
● But how to debug hard to reproduce bugs
– Ideally enough information to debug on first trigger
● Tracing:
– Low overhead instrumentation
– When problem triggers dump data
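The two tracing points above amount to a flight recorder: write events into a fixed-size ring buffer all the time, and only read it out when something goes wrong. A minimal single-threaded sketch follows. The event layout and the names `trace_record` and `trace_dump` are hypothetical, and a real tracer would also need per-CPU buffers and timestamps.

```c
#include <stddef.h>

#define TRACE_SLOTS 4   /* small on purpose; real buffers are much larger */

struct trace_event {
    unsigned long seq;  /* running event number */
    int id;             /* hypothetical event identifier */
};

static struct trace_event ring[TRACE_SLOTS];
static unsigned long next_seq;

/* Cheap to call on every event: one store into a fixed-size buffer. */
void trace_record(int id)
{
    struct trace_event *e = &ring[next_seq % TRACE_SLOTS];

    e->seq = next_seq++;
    e->id = id;
}

/*
 * Called when the problem triggers: copy the surviving events, oldest
 * first, into out[]. Returns the number of events copied.
 */
size_t trace_dump(struct trace_event *out, size_t max)
{
    unsigned long first =
        next_seq > TRACE_SLOTS ? next_seq - TRACE_SLOTS : 0;
    size_t n = 0;

    for (unsigned long s = first; s < next_seq && n < max; s++)
        out[n++] = ring[s % TRACE_SLOTS];
    return n;
}
```

Old events are silently overwritten, so the cost per event stays constant and the dump always contains the most recent history leading up to the trigger.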
Page 19
ftrace: function tracer
● Trace all functions in the kernel for a PID
# trace-cmd record -p function -e sched_switch -P $(pidof firefox-bin)
  plugin function
disable all
enable sched_switch
path = /sys/kernel/debug/tracing/events/sched_switch/enable
path = /sys/kernel/debug/tracing/events/*/sched_switch/enable
Hit Ctrl^C to stop recording
…
All kernel functions executed:
# trace-cmd report
…
firefox-bin-13822 [002] 36628.537061: function: sys_poll
firefox-bin-13822 [002] 36628.537062: function: poll_select_set_timeout
firefox-bin-13822 [002] 36628.537062: function: ktime_get_ts
firefox-bin-13822 [002] 36628.537062: function: timekeeping_get_ns
firefox-bin-13822 [002] 36628.537063: function: set_normalized_timespec
firefox-bin-13822 [002] 36628.537063: function: timespec_add_safe
firefox-bin-13822 [002] 36628.537063: function: set_normalized_timespec
firefox-bin-13822 [002] 36628.537064: function: do_sys_poll
firefox-bin-13822 [002] 36628.537064: function: copy_from_user
firefox-bin-13822 [002] 36628.537065: function: might_fault
firefox-bin-13822 [002] 36628.537065: function: _cond_resched
firefox-bin-13822 [002] 36628.537065: function: should_resched
firefox-bin-13822 [002] 36628.537065: function: need_resched
firefox-bin-13822 [002] 36628.537066: function: test_ti_thread_flag
…
Page 20
ftrace
● Can dump on events / oops / custom triggers
● But still too much overhead in many cases to always run during debugging
Page 21
Intel PT
● Upcoming Intel CPU feature
● Traces all branches with low overhead
● Will be supported in perf and gdb and with “FlightRecorder”
Page 22
Biggest challenge is better tools to understand traces (too much data)
Page 23
Understanding source code
● Often the biggest problem is finding code
● grep/cscope work great for many cases
● But they do not understand indirect pointers (OO in C model used in kernel): Give me all “do_foo” instances
struct foo_ops {
    int (*do_foo)(struct foo *obj);
} my_ops = { .do_foo = my_foo };
foo->ops->do_foo(foo);
● Would be great to have a cscope-like tool that understands this based on types/initializers
Page 24
Conclusion
● Linux has a lot of great tools for making kernel development easier
● We need them to control complexity
● But still many improvements possible
● Questions?