VFS hot tracking Development Overview
Zhi Yong Wu – wuzhy@linux.vnet.ibm.com
Kernel Team, IBM Linux Technology Center
October 19, 2013
Linux is a registered trademark of Linus Torvalds.
Agenda
● Background
● Methodology
● Internals
● Performance
● Status & Next
● References
Background
● Problem:
– SSDs offer high IOPS (input/output operations per second) but low capacity, while traditional disks have the opposite characteristics.
– Some data is accessed frequently, while other data is rarely accessed.
● Vision:
– Place hot data on fast disks wherever possible.
– Defragment hot files first.
● Proposal:
– Trace and detect hot data on the filesystem
– Relocate hot data to fast disks
– Defragment files based on hot information
How to track?
● Track real disk I/O access, not I/O hit in page cache
● Track accessed inodes and their ranges, at a granularity of 1 megabyte per range
● The lookup keys are
– ino → inode
– offset → range
● Track each read/write on I/O path, including buffered and direct mode
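A minimal sketch of how a byte offset could map to its 1 MB range key. The names RANGE_BITS, range_index() and ranges_touched() are hypothetical and chosen for illustration; only the 1 MB granularity comes from the slide.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: a range covers 1 MB = 2^20 bytes, as stated on the slide. */
#define RANGE_BITS 20u
#define RANGE_SIZE ((uint64_t)1 << RANGE_BITS)

/* Hypothetical helper: index of the 1 MB range that contains a byte offset. */
static uint64_t range_index(uint64_t offset)
{
    return offset >> RANGE_BITS;
}

/* Hypothetical helper: how many 1 MB ranges the I/O [offset, offset + len) touches. */
static uint64_t ranges_touched(uint64_t offset, uint64_t len)
{
    assert(len > 0);
    return range_index(offset + len - 1) - range_index(offset) + 1;
}
```

With this mapping, a 2-byte write that straddles the first 1 MB boundary touches two ranges, so both would have their access statistics updated.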
How to find hot spots?
● Each hot item is stored in two places
– Inserted into an rb_tree
– Linked into a hot map list
● Each hot item is indexed in two ways
– One by ino or offset in rb_tree, used to quickly update data access frequency
– One by temperature in the hot map array, used to quickly look up hot spots
● A delayed worker is queued periodically to update the temperature of each hot item and move it to the corresponding hot map list based on its temperature.
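The temperature index can be pictured as an array of lists, one list per temperature value. The sketch below is a userspace illustration under assumptions: MAP_SIZE, struct item, struct hot_map and the map_*() helpers are hypothetical names, and the rb_tree index is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

#define MAP_SIZE 256  /* assumption: number of distinct temperature values */

struct item {
    unsigned long key;          /* ino or range offset */
    unsigned int  temp;         /* current temperature, 0..MAP_SIZE-1 */
    struct item  *prev, *next;  /* links inside one hot map list */
};

struct hot_map {
    struct item *head[MAP_SIZE]; /* one list per temperature */
};

/* Link an item at the head of the list for the given temperature. */
static void map_link(struct hot_map *m, struct item *it, unsigned int temp)
{
    assert(temp < MAP_SIZE);
    it->temp = temp;
    it->prev = NULL;
    it->next = m->head[temp];
    if (it->next)
        it->next->prev = it;
    m->head[temp] = it;
}

/* Unlink an item that is currently on the list for it->temp. */
static void map_unlink(struct hot_map *m, struct item *it)
{
    if (it->prev)
        it->prev->next = it->next;
    else
        m->head[it->temp] = it->next;
    if (it->next)
        it->next->prev = it->prev;
    it->prev = it->next = NULL;
}

/* Move an already linked item to another list when its temperature changes. */
static void map_move(struct hot_map *m, struct item *it, unsigned int new_temp)
{
    if (new_temp == it->temp)
        return;
    map_unlink(m, it);
    map_link(m, it, new_temp);
}

/* Find a hottest item by scanning the array from high to low temperature. */
static struct item *map_hottest(struct hot_map *m)
{
    for (int t = MAP_SIZE - 1; t >= 0; t--)
        if (m->head[t])
            return m->head[t];
    return NULL;
}
```

Because the array is indexed by temperature, finding hot spots never needs to sort the items; the cost of keeping the index current is paid when the periodic worker moves an item between lists.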
Data Structures
Structure Relationship
Record I/O access info
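● Start
● Check if VFS hot tracking is enabled; if not, End
● Look up the hot_inode_item by its ino
– Check if the hot_inode_item already exists
– If not, create the hot_inode_item
● Update the hot_inode_item
● For each range covered by the I/O
– Look up the hot_range_item by its offset
– Check if the hot_range_item already exists
– If not, create the hot_range_item
– Update the hot_range_item
– Check if the length has been reached; if not, continue with the next range
● End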
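The recording flow above can be sketched in userspace C as a lookup-or-create on the inode, followed by a loop over the 1 MB ranges that the I/O covers. This is an illustration under assumptions: singly linked lists stand in for the rb_trees, and the struct layouts, field names and record_access() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define RANGE_BITS 20u  /* 1 MB ranges, as stated in the slides */

struct range_item { uint64_t start; unsigned long nr_access; struct range_item *next; };
struct inode_item { uint64_t ino; unsigned long nr_access;
                    struct range_item *ranges; struct inode_item *next; };
struct tracker    { int enabled; struct inode_item *inodes; };

/* Linear list search stands in for the rb_tree lookup keyed by ino. */
static struct inode_item *inode_lookup_or_create(struct tracker *t, uint64_t ino)
{
    struct inode_item *it;
    for (it = t->inodes; it; it = it->next)
        if (it->ino == ino)
            return it;
    it = calloc(1, sizeof(*it));
    if (!it)
        return NULL;
    it->ino = ino;
    it->next = t->inodes;
    t->inodes = it;
    return it;
}

/* Same pattern for ranges, keyed by the range start offset. */
static struct range_item *range_lookup_or_create(struct inode_item *in, uint64_t start)
{
    struct range_item *r;
    for (r = in->ranges; r; r = r->next)
        if (r->start == start)
            return r;
    r = calloc(1, sizeof(*r));
    if (!r)
        return NULL;
    r->start = start;
    r->next = in->ranges;
    in->ranges = r;
    return r;
}

/* Record one read or write of len bytes at offset within inode ino. */
static void record_access(struct tracker *t, uint64_t ino, uint64_t offset, uint64_t len)
{
    assert(t != NULL);
    if (!t->enabled || len == 0)
        return;
    struct inode_item *in = inode_lookup_or_create(t, ino);
    if (!in)
        return;
    in->nr_access++;
    uint64_t first = offset >> RANGE_BITS;
    uint64_t last = (offset + len - 1) >> RANGE_BITS;
    for (uint64_t idx = first; idx <= last; idx++) {
        struct range_item *r = range_lookup_or_create(in, idx << RANGE_BITS);
        if (!r)
            return;
        r->nr_access++;
    }
}
```

The sketch only counts accesses; whatever per-item statistics the real update step maintains are not described in the slides and are not modeled here.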
Update hot map periodically
● Start
● Walk through the rb_tree of hot_inode_items
● Check if the end of the rb_tree of hot_inode_items has been reached
– If so, requeue the worker and End
● Calculate the temperature of the hot_inode_item
● Check if the temperature has changed
– If so, move the hot_inode_item to the corresponding hot map
● Walk through the rb_tree of hot_range_items
– Check if the end of the rb_tree of hot_range_items has been reached; if so, continue with the next hot_inode_item
– Calculate the temperature of the hot_range_item
– Check if the temperature has changed
– If so, move the hot_range_item to the corresponding hot map
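The core of the periodic update is a "recompute, compare, move only if changed" step applied to every item. The sketch below shows one flat pass of that step under stated assumptions: the slides do not give the temperature formula, so calc_temp() is a made-up stand-in (access count clamped to the map size), and per-temperature counters stand in for the hot map lists.

```c
#include <assert.h>
#include <stddef.h>

#define MAP_SIZE 256  /* assumption: number of distinct temperature values */

struct item {
    unsigned long nr_access;  /* hypothetical statistic */
    unsigned int  temp;       /* temperature the item is currently filed under */
};

/* Hypothetical formula, NOT the real calculation: clamp the access count. */
static unsigned int calc_temp(const struct item *it)
{
    return it->nr_access >= MAP_SIZE ? MAP_SIZE - 1 : (unsigned int)it->nr_access;
}

/*
 * One update pass. count[t] is the number of items filed under temperature t
 * and stands in for the hot map lists. Returns how many items were moved.
 */
static size_t update_pass(struct item *items, size_t n, size_t count[MAP_SIZE])
{
    size_t moved = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned int t = calc_temp(&items[i]);
        if (t == items[i].temp)
            continue;               /* temperature unchanged: leave it in place */
        assert(count[items[i].temp] > 0);
        count[items[i].temp]--;     /* remove from the old map */
        count[t]++;                 /* add to the new map */
        items[i].temp = t;
        moved++;
    }
    return moved;
}
```

In the flow described above, a pass like this would run for each hot_inode_item and then for its hot_range_items, after which the worker is requeued.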
Curtail cache by shrinker
● Start
● Get the current occupied resource amount
● Walk through the hot_inode_item map array from low temperature to high temperature
● Check if the end of the hot_inode_item map array has been reached; if so, End
● Check if the hot map list is NULL; if so, continue with the next temperature
● Walk through the hot_inode_item map list
– Check if the end of the hot_inode_item map list has been reached
– Check if the hot_inode_item is only referenced by one user
– If so, kill the hot_*_item
– Get the current occupied resource amount and calculate the delta
– Check if the resource delta is less than zero; if so, End
● End
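A userspace sketch of cold-first eviction, assuming a simplified cache in which each item carries a reference count and sits on a per-temperature list. The names struct cache, cache_add() and evict_cold(), and the use of an item count as the "resource amount", are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAP_SIZE 8  /* small map, for illustration only */

struct item  { int refs; struct item *next; };
struct cache { struct item *head[MAP_SIZE]; long nr_items; };

/* Hypothetical helper: add an item with the given temperature and refcount. */
static struct item *cache_add(struct cache *c, int temp, int refs)
{
    struct item *it = calloc(1, sizeof(*it));
    if (!it)
        return NULL;
    it->refs = refs;
    it->next = c->head[temp];
    c->head[temp] = it;
    c->nr_items++;
    return it;
}

/*
 * Evict up to nr_to_free items, coldest first. Items referenced by more
 * than one user are skipped. Returns the number of items actually freed.
 */
static long evict_cold(struct cache *c, long nr_to_free)
{
    assert(nr_to_free >= 0);
    long start = c->nr_items;                  /* occupied amount at entry */
    for (int t = 0; t < MAP_SIZE; t++) {       /* low to high temperature */
        struct item **pp = &c->head[t];
        while (*pp) {
            if (start - c->nr_items >= nr_to_free)
                return start - c->nr_items;    /* enough has been freed */
            struct item *it = *pp;
            if (it->refs != 1) {               /* still in use elsewhere */
                pp = &it->next;
                continue;
            }
            *pp = it->next;                    /* unlink and kill the item */
            free(it);
            c->nr_items--;
        }
    }
    return start - c->nr_items;
}
```

Walking from the coldest list upward means hot items survive memory pressure the longest, which matches the purpose of the tracking data.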
Control cache by proc interface
● Works approximately the same way as the shrinker
● Invokes the public interface
– static unsigned long hot_item_evict(struct hot_info *root, unsigned long work, unsigned long (*work_get)(struct hot_info *root))
● Only the work_get() callback that is passed in differs
FFSB - large_file_create
FFSB - large_file_seq_read
FFSB - random_write
FFSB - random_read
FFSB - mail_server
Status & Next
● The patchset is currently under review by the VFS maintainer
– Focus on performance
● Latest patchset
– git://github.com/wuzhy/kernel.git hot_tracking
● The debugfs support
– Dump hot information live
● Btrfs hot relocation support
– git://github.com/wuzhy/kernel.git hot_reloc
● XFS/Btrfs defragmentation improvement
– Based on hot information
References
● The LWN editor's article
– http://lwn.net/Articles/525651/
● An intro in-kernel document
– Documentation/filesystems/hot_tracking.txt
● Mingming Cao's slides
– http://www.linuxplumbersconf.org/2010/ocw/system/presentations/219/original/hot_cold_data_LPC.pdf
Thank you!
Questions?