Improving Linux Development with better tools, by Andi Kleen (Intel)


Slide 1

Improving Linux Development with better tools

Andi Kleen

Oct 2013 Intel Corporation ak@linux.intel.com

Slide 2

Linux complexity growing

[Figure: Source lines in Linux kernel, all source code. Y axis: M-LOC (13.5 to 16.5). X axis: kernel version (V3.6 to V3.11).]

[Figure: Linux kernel source lines, IO (net/, fs/, block/). Y axis: M-LOC (0 to 2.5). X axis: kernel version (V2.6.16 to V3.11).]

[Figure: Source lines, Linux kernel core (kernel/, lib/). Y axis: M-LOC (0.05 to 0.35). X axis: kernel version (V2.6.16 to V3.11).]

Slide 3

Do we have a problem?

● If we assume the number of bugs per line stays constant, more lines mean more and more bugs

● If we assume programmers don't get cleverer, some code may become too complex to change or debug

● Of course modularity saves us to some degree

Slide 4

Or we can use better tools to find bugs

● Static code checker tools

● Dynamic runtime checkers

● Fuzzers/test suites

● Debuggers/Tracers to understand code

● Tools to read/understand source

Slide 5

Static checkers

● sparse, smatch, coccinelle, clang checker, checkpatch, gcc -W/LTO, stanse

● Can check a lot of things, simple mistakes, complex problems

● Generic C and kernel specific rules

Slide 6

Static checker challenges

● Some are very slow

● False positives

– Often can only report new warnings

– Otherwise too many false positives

● May need concentrated effort to get false positives down

– Only done for gcc/sparse so far

– Needs both changes to Linux and to checkers

Slide 7

Study bug fixes

● “At least 14.8%∼24.4% of the sampled bug fixes are incorrect. Moreover, 43% of the incorrect fixes resulted in severe bugs that caused crash, hang, data corruption or security problems.”

● “How do fixes become bugs” Yin/Yuan et al.

● http://opera.ucsd.edu/~zyin2/fse11.pdf

● Great paper, every kernel programmer should read it

● Can new rules for static checkers help?

Slide 8

Coccinelle example

/// Find &&/|| operations that include the same argument more than once
//# A common source of false positives is when the argument performs a side
//# effect.

@r expression@
expression E;
position p;
@@

(
* E@p
  || ... || E
|
* E@p
  && ... && E
)

@script:python depends on org@
p << r.p;
@@

cocci.print_main("duplicated argument to && or ||",p)
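For illustration, here is the kind of C code the rule above is meant to flag. This is a hypothetical sketch: the function and parameter names are invented and do not come from the kernel. In the first version `len` is tested twice, most likely a copy-paste slip for `cap`; the second version shows the presumed intent.

```c
#include <stdbool.h>
#include <stddef.h>

/* Buggy: "len" appears twice in the && chain, so "cap" is never checked.
 * A duplicated-operand rule would report the first "len". */
static bool range_ok_buggy(const char *buf, size_t len, size_t cap)
{
    (void)cap; /* effectively unused because of the bug */
    return buf && len && len;
}

/* Presumed intent: every operand in the chain is distinct. */
static bool range_ok_fixed(const char *buf, size_t len, size_t cap)
{
    return buf && len && cap;
}
```

With a non-NULL buffer, a nonzero `len` and `cap == 0`, the buggy version accepts the range and the fixed version rejects it. This is the sort of silent logic error that is easy to miss in review but cheap to find mechanically.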

Slide 9

Challenge: global checks

● No static checker I found can follow indirect calls (“OO in C”, common in kernel)

struct foo_ops {
        int (*do_foo)(struct foo *obj);
};

foo->do_foo(foo);

● Can be done by using type information

● Misses a lot of potential bugs
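A slightly fuller sketch of the "OO in C" pattern may help show why the call target is invisible to a purely local analysis. Everything here is illustrative: the `ops` member, the `value` field, and the two implementations are assumptions, and the slide's shorthand `foo->do_foo(foo)` is written out as `obj->ops->do_foo(obj)`.

```c
struct foo;

struct foo_ops {
    int (*do_foo)(struct foo *obj);
};

struct foo {
    const struct foo_ops *ops; /* hypothetical: how the ops table is attached */
    int value;
};

/* Two unrelated implementations that share only the function-pointer type. */
static int double_foo(struct foo *obj) { return obj->value * 2; }
static int negate_foo(struct foo *obj) { return -obj->value; }

static const struct foo_ops double_ops = { .do_foo = double_foo };
static const struct foo_ops negate_ops = { .do_foo = negate_foo };

/* The callee is decided at run time by whichever table the object carries. */
static int call_do_foo(struct foo *obj)
{
    return obj->ops->do_foo(obj);
}
```

A checker that uses type information could treat every function stored in a `do_foo` member of some `struct foo_ops` initializer (here `double_foo` and `negate_foo`) as a possible target of the call in `call_do_foo`.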

Slide 10

Lock ordering: lockdep

● Deadlock from lock ordering (“ABBA” bugs) used to be common

● Lockdep basically eliminated this problem

● Checks lock ordering, interrupt
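The idea of checking lock ordering can be sketched in a few lines of user-space C. This is a toy model under stated assumptions, not lockdep's implementation: locks are reduced to small integer "classes", all names are invented, and only pairwise orderings are remembered, whereas a real validator would also have to handle longer dependency chains.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_CLASSES 8

struct order_tracker {
    /* before[a][b] is true once class a was held while class b was taken */
    bool before[MAX_CLASSES][MAX_CLASSES];
    int held[MAX_CLASSES];
    int nheld;
};

static void tracker_init(struct order_tracker *t)
{
    memset(t, 0, sizeof(*t));
}

/* Record an acquisition. Returns false if it inverts an order seen earlier
 * (the "ABBA" pattern), even though no deadlock actually happened. */
static bool tracker_acquire(struct order_tracker *t, int lock_class)
{
    bool ok = true;

    if (lock_class < 0 || lock_class >= MAX_CLASSES || t->nheld >= MAX_CLASSES)
        return false;
    for (int i = 0; i < t->nheld; i++) {
        int h = t->held[i];
        if (t->before[lock_class][h])
            ok = false; /* earlier: lock_class then h; now: h then lock_class */
        t->before[h][lock_class] = true;
    }
    t->held[t->nheld++] = lock_class;
    return ok;
}

static void tracker_release(struct order_tracker *t, int lock_class)
{
    for (int i = 0; i < t->nheld; i++) {
        if (t->held[i] == lock_class) {
            t->held[i] = t->held[--t->nheld]; /* order of held[] is irrelevant */
            return;
        }
    }
}
```

The useful property is that the inversion is reported the first time both orders have been observed, on a single thread, without the deadlock ever having to trigger.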

Slide 11

Kmemcheck / AddressSanitizer

● Check uninitialized/freed/out of bounds data

● Kmemcheck based on page faults

– Quite slow

● AddressSanitizer seems to be a better alternative

– Compiler instrumentation, much faster

– Still need port to kernel (some reports already)
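One way to picture what such checkers do is as extra "shadow" state kept for every byte of memory and consulted on each access. The sketch below is a toy model of that idea only; the arena, the state names, and the functions are all invented, and it does not reflect how Kmemcheck or AddressSanitizer are actually implemented (real tools use far more compact encodings, and not every tool tracks every state shown here).

```c
#include <stddef.h>
#include <string.h>

enum shadow_state { SH_UNALLOCATED, SH_UNINIT, SH_INIT, SH_FREED };
enum access_result { ACC_OK, ACC_OUT_OF_BOUNDS, ACC_UNINITIALIZED, ACC_USE_AFTER_FREE };

#define ARENA_SIZE 64

struct arena {
    unsigned char data[ARENA_SIZE];
    unsigned char shadow[ARENA_SIZE]; /* one shadow_state per data byte */
};

static void arena_init(struct arena *a)
{
    memset(a, 0, sizeof(*a)); /* SH_UNALLOCATED is 0 */
}

/* Mark [off, off + len) with a state, e.g. SH_UNINIT on "alloc", SH_FREED on "free". */
static int arena_mark(struct arena *a, size_t off, size_t len, enum shadow_state st)
{
    if (off > ARENA_SIZE || len > ARENA_SIZE - off)
        return -1;
    memset(a->shadow + off, st, len);
    return 0;
}

static enum access_result arena_write(struct arena *a, size_t off, unsigned char v)
{
    if (off >= ARENA_SIZE || a->shadow[off] == SH_UNALLOCATED)
        return ACC_OUT_OF_BOUNDS;
    if (a->shadow[off] == SH_FREED)
        return ACC_USE_AFTER_FREE;
    a->data[off] = v;
    a->shadow[off] = SH_INIT; /* a write makes the byte defined */
    return ACC_OK;
}

static enum access_result arena_read(const struct arena *a, size_t off, unsigned char *out)
{
    if (off >= ARENA_SIZE || a->shadow[off] == SH_UNALLOCATED)
        return ACC_OUT_OF_BOUNDS;
    if (a->shadow[off] == SH_FREED)
        return ACC_USE_AFTER_FREE;
    if (a->shadow[off] == SH_UNINIT)
        return ACC_UNINITIALIZED;
    *out = a->data[off];
    return ACC_OK;
}
```

The cost question on the slide is about how the check gets triggered: taking a page fault on every tracked access is expensive, while a compiler that inlines a short shadow lookup before each load and store is much cheaper.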

Slide 12

Thread checkers

● Find data races:

– Shared data accesses not protected by locks

● User space: helgrind, ThreadSanitizer, ..

● Problem: kernel does not mark lockless accesses.

● Solvable?

– User: __atomic_write(&foo, 1, __ATOMIC...);

– Kernel: foo = 1; mb();
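To make the contrast concrete, here is a minimal user-space sketch of a marked lockless access. The slide's `__atomic_write(&foo, 1, __ATOMIC...)` reads like shorthand; this sketch assumes the GCC/Clang builtins `__atomic_store_n` and `__atomic_load_n`, and the variable and function names are invented for illustration.

```c
static int payload; /* plain data, published via the flag below */
static int ready;   /* the intentionally lockless flag */

/* Writer: the store to "ready" is explicitly marked as atomic, so a race
 * detector can tell it apart from an accidental unprotected write. */
static void publish(int value)
{
    payload = value;
    __atomic_store_n(&ready, 1, __ATOMIC_RELEASE);
}

/* Reader: the marked load pairs with the release store above. */
static int try_consume(int *out)
{
    if (!__atomic_load_n(&ready, __ATOMIC_ACQUIRE))
        return 0;
    *out = payload;
    return 1;
}
```

In the kernel style shown on the slide, the same intent is expressed as a plain assignment followed by a barrier, which a tool cannot distinguish from a forgotten lock without extra annotation.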

Slide 13

Undefined behavior checker

● UBSan. New gcc/LLVM feature

● Checks undefined C behavior at runtime

– e.g. x << 100, signed integer overflows, …

● Needs special runtime library

● Would need to be ported to kernel
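The two examples on the slide can be written out as small helpers that refuse to perform the undefined operation. This is a sketch of what is being checked, not of how UBSan instruments code; the helper names are invented, and `__builtin_add_overflow` is assumed to be available (GCC 5 or later, or Clang).

```c
#include <limits.h>
#include <stdbool.h>

/* Signed overflow is undefined behavior in C; the builtin reports it
 * instead of performing it. Returns true when *res holds a valid sum. */
static bool checked_add(int a, int b, int *res)
{
    return !__builtin_add_overflow(a, b, res);
}

/* Shifting by at least the width of the type is undefined,
 * e.g. x << 100 for a 32-bit x. */
static bool checked_shl(unsigned int x, unsigned int shift, unsigned int *res)
{
    if (shift >= sizeof(x) * CHAR_BIT)
        return false;
    *res = x << shift;
    return true;
}
```

A runtime checker effectively inserts tests like these around every such operation and reports a diagnostic through its runtime library when one fails, which is why that library has to exist in the environment being tested.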

Slide 14

Fuzzers

● Trinity is a great tool

– Finds many bugs

● Needs manual model for each syscall

● How do we cover all the ioctls/sys/proc files?

● Modern fuzzers using automatic feedback are around

– But not for kernel yet

– http://taviso.decsystem.org/making_software_dumber.pdf
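As a rough illustration of what "automatic feedback" can mean, the sketch below keeps a mutated input only when it reaches code that earlier inputs did not. It assumes the feedback signal is branch coverage; the target function, the coverage map, and all names are invented, and this is a user-space toy, not how Trinity or any kernel fuzzer works.

```c
#include <stdint.h>
#include <string.h>

#define NBRANCH 4
#define INPUT_LEN 4
#define MAX_CORPUS 16

/* Hypothetical target: each nested comparison is one branch to cover. */
static void target(const uint8_t *in, uint8_t *cov)
{
    cov[0] = 1;
    if (in[0] == 'F') {
        cov[1] = 1;
        if (in[1] == 'U') {
            cov[2] = 1;
            if (in[2] == 'Z')
                cov[3] = 1; /* deepest branch, stands in for "the bug" */
        }
    }
}

/* Merge one run's coverage into the global map; return how many branches are new. */
static int merge_coverage(uint8_t *global, const uint8_t *cov)
{
    int fresh = 0;
    for (int i = 0; i < NBRANCH; i++) {
        if (cov[i] && !global[i]) {
            global[i] = 1;
            fresh++;
        }
    }
    return fresh;
}

/* Small deterministic PRNG (xorshift32) so runs are repeatable. */
static uint32_t next_rand(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Feedback loop: mutate a corpus entry, keep the mutant only if it reached
 * new code. Returns the number of branches covered after all iterations. */
static int fuzz(unsigned int iterations, uint32_t seed)
{
    uint8_t corpus[MAX_CORPUS][INPUT_LEN] = {{0}};
    uint8_t global[NBRANCH] = {0};
    int ncorpus = 1, covered = 0;
    uint32_t rng = seed ? seed : 1; /* xorshift state must be nonzero */

    for (unsigned int it = 0; it < iterations; it++) {
        uint8_t in[INPUT_LEN], cov[NBRANCH] = {0};
        uint32_t pick = next_rand(&rng) % (uint32_t)ncorpus;
        uint32_t pos = next_rand(&rng) % INPUT_LEN;

        memcpy(in, corpus[pick], INPUT_LEN);
        in[pos] = (uint8_t)next_rand(&rng); /* one-byte mutation */
        target(in, cov);

        int fresh = merge_coverage(global, cov);
        covered += fresh;
        if (fresh > 0 && ncorpus < MAX_CORPUS)
            memcpy(corpus[ncorpus++], in, INPUT_LEN);
    }
    return covered;
}
```

The point of the feedback is that an input starting with `F` is kept and mutated further, so the search for `FU` and then `FUZ` builds on earlier progress instead of guessing all three bytes at once. No hand-written model of the input format is needed, which is what makes the approach attractive for the long tail of ioctls and sys/proc files.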

Slide 15

The biggest challenge

● How to run all these tools on every new patch:

– Cannot ask every developer to use all of them

● Static checkers are relatively easy

– But can we get beyond just deltas for new code?

● But how to run the dynamic tools?

Slide 16

Test suites

● Ideally all kernel code would come with a test suite

– Then someone could run all the dynamic checkers

● Difficult for hardware drivers

● LKP, kernel unit tests, tools/* limited

● Need a real unit testing framework

Slide 17

Coverage

● Kernel gcov can be used to test coverage of test suites

● Should be used much more widely

Slide 18

Tracers

● We are long past “real men don't use debuggers”

– Linux has good debuggers these days (kgdb etc.)

● But how to debug hard-to-reproduce bugs?

– Ideally, enough information to debug on the first trigger

● Tracing:

– Low overhead instrumentation

– When problem triggers dump data
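The "record cheaply, dump on trigger" idea can be sketched as a fixed-size ring buffer. This is a minimal user-space illustration with invented names and a deliberately tiny buffer; it is not how ftrace's ring buffer is implemented.

```c
#include <stddef.h>

#define TRACE_SLOTS 4 /* deliberately tiny for the example */

struct trace_event {
    unsigned long timestamp;
    int id;
};

struct trace_ring {
    struct trace_event slots[TRACE_SLOTS];
    size_t next; /* total number of events ever recorded */
};

/* Hot path: overwrite the oldest slot; no allocation and no I/O. */
static void trace_record(struct trace_ring *r, unsigned long ts, int id)
{
    struct trace_event *e = &r->slots[r->next % TRACE_SLOTS];

    e->timestamp = ts;
    e->id = id;
    r->next++;
}

/* On a trigger, copy out the surviving events oldest-first.
 * "out" must have room for TRACE_SLOTS entries. Returns the count. */
static size_t trace_dump(const struct trace_ring *r, struct trace_event *out)
{
    size_t count = r->next < TRACE_SLOTS ? r->next : TRACE_SLOTS;
    size_t start = r->next - count;

    for (size_t i = 0; i < count; i++)
        out[i] = r->slots[(start + i) % TRACE_SLOTS];
    return count;
}
```

After six recorded events, a dump returns the last four in order. Older history is lost, which is the trade-off that keeps the recording path cheap enough to leave enabled until the problem triggers.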

Slide 19

ftrace: function tracer

• Trace all functions in the kernel for PID

# trace-cmd record -p function -e sched_switch -P $(pidof firefox-bin)
plugin function
disable all
enable sched_switch
path = /sys/kernel/debug/tracing/events/sched_switch/enable
path = /sys/kernel/debug/tracing/events/*/sched_switch/enable
Hit Ctrl^C to stop recording
….

All kernel functions executed:

# trace-cmd report
…
firefox-bin-13822 [002] 36628.537061: function:             sys_poll
firefox-bin-13822 [002] 36628.537062: function:             poll_select_set_timeout
firefox-bin-13822 [002] 36628.537062: function:             ktime_get_ts
firefox-bin-13822 [002] 36628.537062: function:             timekeeping_get_ns
firefox-bin-13822 [002] 36628.537063: function:             set_normalized_timespec
firefox-bin-13822 [002] 36628.537063: function:             timespec_add_safe
firefox-bin-13822 [002] 36628.537063: function:             set_normalized_timespec
firefox-bin-13822 [002] 36628.537064: function:             do_sys_poll
firefox-bin-13822 [002] 36628.537064: function:             copy_from_user
firefox-bin-13822 [002] 36628.537065: function:             might_fault
firefox-bin-13822 [002] 36628.537065: function:             _cond_resched
firefox-bin-13822 [002] 36628.537065: function:             should_resched
firefox-bin-13822 [002] 36628.537065: function:             need_resched
firefox-bin-13822 [002] 36628.537066: function:             test_ti_thread_flag
…

Slide 20

ftrace

● Can dump on events / oops / custom triggers

● But still too much overhead in many cases to always run during debugging

Slide 21

Intel PT

● Upcoming Intel CPU feature

● Traces all branches with low overhead

● Will be supported in perf and gdb and with “FlightRecorder”

Slide 22

Biggest challenge is better tools to understand traces (too much data)

Slide 23

Understanding source code

● Often the biggest problem is finding code

● grep/cscope work great for many cases

● But do not understand indirect pointers (OO in C model used in kernel): Give me all “do_foo” instances

struct foo_ops {
        int (*do_foo)(struct foo *obj);
} my_foo_ops = { .do_foo = my_foo };    /* variable name added for illustration */

foo->do_foo(foo);

● Would be great to have a cscope like tool that understands this based on types/initializers

Slide 24

Conclusion

● Linux has a lot of great tools for making kernel development easier

● We need them to control complexity

● But still many improvements possible

● Questions?