作者介紹:
陳松,畢業(yè)于天津大學(xué)精密儀器專(zhuān)業(yè)。先后在索尼愛(ài)立信,意法愛(ài)立信和華為技術(shù)有限公司任軟件工程師,軟件架構(gòu)師等職務(wù),主要負(fù)責(zé)手機(jī)內(nèi)核和應(yīng)用的開(kāi)發(fā)工作。2019年加入麒麟,現(xiàn)任裝備事業(yè)部軟件工程師,主要負(fù)責(zé)實(shí)時(shí)內(nèi)核研發(fā)和新技術(shù)預(yù)研等工作。
1、objtrace和Ftrace的關(guān)系
我們?cè)趦?nèi)核代碼調(diào)試過(guò)程中,經(jīng)常會(huì)遇到需要跟蹤函數(shù)參數(shù)值變化的情況,objtrace是一個(gè)非常有創(chuàng)造性的工具,它可以將我們從笨重的調(diào)試工作中解脫出來(lái),讓我們能夠很簡(jiǎn)潔,直觀而且優(yōu)雅的找到我們期望在代碼中發(fā)現(xiàn)的內(nèi)容。
Kprobe和function tracer共建于ftrace的基礎(chǔ)設(shè)施,objtrace將二者巧妙的結(jié)合起來(lái),在開(kāi)發(fā)者設(shè)置了kprobe的時(shí)候,objtrace打開(kāi)了function tracer,并注冊(cè)了回調(diào)函數(shù)。
當(dāng)代碼運(yùn)行到kprobe關(guān)聯(lián)的函數(shù)的時(shí)候,objtrace將開(kāi)發(fā)者所設(shè)定的參數(shù)的地址,偏移量,大小等信息保存起來(lái)。
之后,當(dāng)內(nèi)核代碼運(yùn)行到任何一個(gè)function tracer可以跟蹤的函數(shù)的時(shí)候,objtrace都會(huì)將該函數(shù)的參數(shù)依次取出來(lái),與已經(jīng)存儲(chǔ)的,需要跟蹤的參數(shù)的地址進(jìn)行比對(duì),如果發(fā)現(xiàn)符合條件的話,就將參數(shù)的值打印到ftrace中。
2、如何獲取objtrace源代碼
Objtrace的源碼最早由閱碼場(chǎng)資深用戶JeffXie提交給內(nèi)核社區(qū),大家可以在github上找到,大家可以下載,編譯,試用,很便捷。目前objtrace應(yīng)該已經(jīng)到了v12,大家找到v12的分支即可。作者之前一直在v10上調(diào)試,下載代碼的命令:git clone -b objtrace-v10 https://github.com/x-lugoo/linux.git
3、功能描述
我們?cè)陂_(kāi)發(fā)和調(diào)試代碼的過(guò)程中,經(jīng)常會(huì)遇到一個(gè)場(chǎng)景,變量在函數(shù)之間傳遞的過(guò)程中,我們需要追蹤這個(gè)變量的值變化的過(guò)程,比如下面這個(gè)例子中的bio:
bio_add_page(struct bio *bio ...) -->
__bio_try_merge_page(bio ...) -->
bio_flagged(bio...)
__bio_add_page(bio...)
...
通常,我們會(huì)增加很多printk,然后編譯安裝內(nèi)核,重啟。如果有的函數(shù)分支忽略掉的話,還需要再增加printk,再次編譯安裝內(nèi)核和重啟,是不是很笨重?
內(nèi)核中的kprobe提供了可以不用修改代碼而直接查看函數(shù)參數(shù)值的方法:
cd /sys/kernel/debug/tracing
echo 'p bio_add_page arg1=$arg1 arg2=$arg2' > ./kprobe_events
echo 1 > events/kprobes/p_bio_add_page_0/enable
cat your_file > /dev/null
cat trace
jbd2/sda5-8-220 [000] ...1. 55531.677782: p_bio_add_page_0: (bio_add_page+0x0/0x90) arg1=0xffff8ecac98aba00 arg2=0xfffff58a0122a400
jbd2/sda5-8-220 [000] ...1. 55531.677786: p_bio_add_page_0: (bio_add_page+0x0/0x90) arg1=0xffff8ecac98abc00 arg2=0xfffff58a04141140
jbd2/sda5-8-220 [000] ...1. 55531.677790: p_bio_add_page_0: (bio_add_page+0x0/0x90) arg1=0xffff8ecac98ab100 arg2=0xfffff58a02e9f880
jbd2/sda5-8-220 [000] ...1. 55531.677794: p_bio_add_page_0: (bio_add_page+0x0/0x90) arg1=0xffff8ecac98ab300 arg2=0xfffff58a041d1b00
jbd2/sda5-8-220 [000] ...1. 55531.677798: p_bio_add_page_0: (bio_add_page+0x0/0x90) arg1=0xffff8ecac98ab600 arg2=0xfffff58a04141280
...
Kprobe利用了ftrace中function tracer的架構(gòu),將代碼段中__fentry__(或__mcount)替換為kprobe_ftrace_handler,kprobe_ftrace_handler將函數(shù)名稱(chēng)和參數(shù)值打印到了ftrace的ringbuffer之中。
但還有一個(gè)問(wèn)題,kprobe只能追蹤變量在bio_add_page內(nèi)的值,不能跟蹤變量在其之后的變化。
國(guó)內(nèi)社區(qū)的jeff老師近期開(kāi)發(fā)了objtrace可以實(shí)現(xiàn)變量持續(xù)跟蹤的功能,objtrace結(jié)合使用了kprobe和function tracer的功能,持續(xù)跟蹤一個(gè)或多個(gè)變量,同時(shí)還可以設(shè)置偏移量,獲取變量結(jié)構(gòu)內(nèi)部元素的值。
目前objtrace正在評(píng)審中,很快就會(huì)進(jìn)入到內(nèi)核代碼樹(shù)的主線。
4、使用方法
1)進(jìn)入debugfs的目錄:
cd /sys/kernel/debug/tracing
2)設(shè)置kprobe
echo 'p bio_add_page arg1=$arg1 arg2=$arg2' > ./kprobe_events
查看kprobe:
ls events/kprobes/ -al
里面增加了p_bio_add_page_0
cat events/kprobes/p_bio_add_page_0/trigger
# Available triggers:
# traceon traceoff snapshot stacktrace enable_event disable_event enable_hist disable_hist hist objtrace
可以看到除了原有的traceon traceoff snapshot stacktrace之外,增加了一個(gè)新的觸發(fā)器objtrace
3)設(shè)置objtrace
echo 'objtracearg1,0x285 if comm == "cat"' > ./events/kprobes/p_bio_add_page_0/trigger
表示在kprobe:bio_add_page需要跟蹤第一個(gè)參數(shù),也就是結(jié)構(gòu)bio偏移量為0x28的變量的值。
cat events/kprobes/p_bio_add_page_0/trigger
objtracearg1,0x28count=0 if comm == "cat"
4)觸發(fā)
cat your_file > /dev/null //讀取一個(gè)大文件,觸發(fā)bio_add_page,也就是內(nèi)核需要調(diào)用到bio_add_page函數(shù)
5)查看日志
cat trace
# tracer: nop
#
# entries-in-buffer/entries-written: 309/401 #P:4
#
# _-----=> irqs-off/BH-disabled
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / _-=> migrate-disable
# |||| / delay
# TASK-PID CPU# ||||| TIMESTAMP FUNCTION
# | | | ||||| | |
kworker/3:1H-79 [003] ...1. 248.076143: wbt_issue <-__rq_qos_issue object:0xffffa0c886cc8700 value:0x1000
kworker/3:1H-79 [003] ...1. 248.076143: blk_add_timer <-blk_mq_start_request object:0xffffa0c886cc8700 value:0x1000
##### CPU 1 buffer started ####
-0 [001] .Ns2. 248.080898: bio_endio <-blk_update_request object:0xffffa0c886cc8700 value:0x0
-0 [001] .Ns2. 248.080900: __rq_qos_done_bio <-bio_endio object:0xffffa0c886cc8700 value:0x0
-0 [001] .Ns2. 248.080900: bio_uninit <-bio_endio object:0xffffa0c886cc8700 value:0x0
-0 [001] .Ns2. 248.080901: bio_uninit <-bio_endio object:0xffffa0c886cc8700 value:0x0
-0 [001] .Ns2. 248.080901: __rcu_read_lock <-bio_uninit object:0xffffa0c886cc8700 value:0x0
-0[001].Ns2.248.080901:__rcu_read_lock<-bio_uninit?object:0xffffa0c886cc8700?value:0x0
5、代碼分析
5.1初始化
初始化就是讓內(nèi)核的ftrace架構(gòu)知道有objtrace這樣的一個(gè)trigger(觸發(fā)器),當(dāng)開(kāi)發(fā)者創(chuàng)建一個(gè)kprobe的時(shí)候,在這個(gè)kprobe的trigger目錄下,會(huì)出現(xiàn)objtrace的選項(xiàng)。代碼流程如下:
start_kernel -->
trace_init -->
trace_event_init -->
event_trace_enable -->
register_trigger_cmds -->
register_trigger_object_cmd -- >
register_event_command(&trigger_object_cmd); 注冊(cè)trigger_object_cmd
trigger_object_cmd包含了event_command所需要的函數(shù),如event_object_trigger_parse就是工程是輸入objtrace:add...之后,將這個(gè)字符串解析的過(guò)程。
static struct event_command trigger_object_cmd = {
.name = "objtrace",
.trigger_type = ETT_TRACE_OBJECT,
.flags = EVENT_CMD_FL_NEEDS_REC,
.parse = event_object_trigger_parse,
.reg = register_trigger,
.unreg = unregister_trigger,
.get_trigger_ops = objecttrace_get_trigger_ops,
.set_filter = set_trigger_filter,
};
此外,start_kernel還有一個(gè)路徑初始化objtrace的另一部分
start_kernel -->
early_trace_init -->
tracer_alloc_buffers -->
allocate_objtrace_data
allocate_objtrace_data中申請(qǐng)并初始化了一個(gè)結(jié)構(gòu)體objtrace_data *obj_data,其中最重要的是obj_data->fops->func = trace_object_events_call,函數(shù)trace_object_events_call負(fù)責(zé)跟蹤參數(shù)值并打印到ftrace的ring buffer中,可以說(shuō)trace_object_events_call是objtrace中最重要的一個(gè)函數(shù)。
注意,此時(shí)trace_object_events_call只是放到了obj_data->fops->func中,并沒(méi)有開(kāi)始使用。
5.2設(shè)置 “echo 'objtracearg1,0x285 if comm == "cat"' > ./events/kprobes/p_bio_add_page_0/trigger”
當(dāng)開(kāi)發(fā)者輸入這個(gè)命令的時(shí)候,系統(tǒng)需要將 'objtracearg1,0x285 if comm == "cat"' 解析出來(lái),并調(diào)用event_trigger_register注冊(cè)到ftrace中,這樣當(dāng)內(nèi)核走到bio_add_page這個(gè)函數(shù)的時(shí)候,就會(huì)觸發(fā)kprobe。函數(shù)調(diào)用過(guò)程如下:
我在event_trigger_set_filter中增加了一個(gè)WARN_ONCE,可以看到調(diào)用棧,同時(shí)在event_object_trigger_parse中增加了一個(gè)printk,看到一些變量。通常我都是使用function_grath來(lái)看代碼調(diào)用的流程,但ftrace本身的函數(shù)不能使用ftrace來(lái)觀測(cè),這樣會(huì)造成遞歸調(diào)用,很不幸不是,只能使用這種笨辦法了:
[ 180.024670] chensong: event_object_trigger_parse add:arg1,0x285, objtrace_cmd:add
[ 180.024685] ------------[ cut here ]------------
[ 180.024688] chensong: set_trigger_filter, filter_str:if comm == "cat"
[ 180.024698] WARNING: CPU: 3 PID: 4332 at kernel/trace/trace_events_trigger.c:1056 set_trigger_filter+0x13d/0x170
...
[ 180.024844] CPU: 3 PID: 4332 Comm: bash Not tainted 5.18.0-rc2-objtrace-v10+ #2
[ 180.024848] Hardware name: LENOVO YangTianT4900v-00/, BIOS FCKT65AUS 01/12/2015
[ 180.024851] RIP: 0010:set_trigger_filter+0x13d/0x170
[ 180.024856] Code: 83 c4 18 44 89 e0 5b 41 5c 41 5d 5d c3 48 89 fa 48 c7 c6 e0 15 23 84 48 c7 c7 15 e4 5a 84 c6 05 27 84 94 01 01 e8 d2 f7 9c 00 <0f> 0b 48 8b 45 d0 e9 f6 fe ff ff e8 73 57 a3 00 48 8b 7b 20 41 bc
[ 180.024859] RSP: 0018:ffffa28003a77d60 EFLAGS: 00010286
...
[ 180.024879] CR2: 00007fbeb41fa020 CR3: 000000000a0f6003 CR4: 00000000001706e0
[ 180.024882] Call Trace:
[ 180.024884]
[ 180.024888] event_trigger_set_filter+0x22/0x30
[ 180.024894] event_object_trigger_parse.cold+0x313/0x345
[ 180.024904] trigger_process_regex+0xc1/0x110
[ 180.024908] event_trigger_write+0x70/0xd0
...
[ 180.024913] vfs_write+0xc0/0x2b0
[ 180.024920] ksys_write+0x67/0xe0
[ 180.024926] __x64_sys_write+0x1a/0x20
[ 180.024931] do_syscall_64+0x3b/0x90
[ 180.024937] entry_SYSCALL_64_after_hwframe+0x44/0xae
event_object_trigger_parse首先解析’objtracearg1,0x285’,并將結(jié)果保存到變量trigger_data中,最后調(diào)用event_trigger_register,過(guò)程如下:
static int
event_object_trigger_parse(struct event_command *cmd_ops,...
{
struct event_trigger_data *trigger_data;
struct objtrace_trigger_data *obj_data;
...
obj_data->field = field; --- (1)
obj_data->obj_offset = offset;
obj_data->obj_value_type_size = obj_value_type_size;
obj_data->tr = file->tr;
snprintf(obj_data->objtrace_cmd, OBJTRACE_CMD_LEN, objtrace_cmd);
trigger_data = event_trigger_alloc(cmd_ops, cmd, param, obj_data); ---(2)
...
ret = event_trigger_set_filter(cmd_ops, file, filter, trigger_data); ---(3)
if (ret < 0)
goto out_free;
ret = event_trigger_register(cmd_ops, file, glob, trigger_data); ---(4)
...
}
int event_trigger_register(struct event_command *cmd_ops,
{
return cmd_ops->reg(glob, trigger_data, file); ---(5)
}
int event_trigger_set_filter(struct event_command *cmd_ops,
{
if (param && cmd_ops->set_filter)
return cmd_ops->set_filter(param, trigger_data, file); ---(6)
return 0;
}
static struct event_command trigger_object_cmd = { --- (7)
.name = "objtrace",
.trigger_type = ETT_TRACE_OBJECT,
.flags = EVENT_CMD_FL_NEEDS_REC,
.parse = event_object_trigger_parse,
.reg = register_trigger,
.unreg = unregister_trigger,
.get_trigger_ops = objecttrace_get_trigger_ops,
.set_filter = set_trigger_filter,
};
-
(1)解析的結(jié)果放到objtrace_trigger_data中。
-
(2)同時(shí),將objtrace_trigger_data放置到event_trigger_data中。
-
(3)注冊(cè)filter,也就是’if comm == "cat"’。
-
(4)注冊(cè)trigger。
-
(5)event_trigger_register就是調(diào)用trigger_object_cmd中的.reg,也就是register_trigger。
-
(6)event_trigger_set_filter就是調(diào)用trigger_object_cmd中的.set_filter,也就是set_trigger_filter。
-
(7)trigger_object_cmd的定義。
objtrace還有一個(gè)重要的工作,就是要跟蹤變量的流向,這個(gè)就超出了kprobe的功能范圍,需要借助function tracer了,所以kprobe部分還有一個(gè)重要的工作,就是使能function tracer。我花了很多時(shí)間來(lái)找如何使能function tracer的,一直沒(méi)有找到。但如果你了解function tracer的話,會(huì)知道在使能function tracer,必然會(huì)調(diào)用ftrace_replace_code來(lái)替換代碼段,將__fentry__(或__mcount)替換為ftrace_call,那么我就在ftrace_replace_code中增加一個(gè)WARN,看一下調(diào)用棧:
增加調(diào)試代碼:
arch/x86/kernel/ftrace.c +194
ftrace_replace_code
+static int counter = 0;
void ftrace_replace_code(int enable)
{
rec = ftrace_rec_iter_record(iter);
+ WARN_ONCE(1, "chensong: ip:%pS ", rec->ip);
+ counter++;
+ if (counter%100 == 0)
+ printk("chensong:%s, ip:%pS ", __FUNCTION__, rec->ip);
輸入echo 'objtracearg1,0x285 if comm == "cat"' > ./events/kprobes/p_bio_add_page_0/trigger后,得到調(diào)用棧
[ 168.674941] chensong: ip:__traceiter_initcall_level+0x0/0x40
[ 168.674954] WARNING: CPU: 3 PID: 3032 at arch/x86/kernel/ftrace.c:205 ftrace_replace_code+0x10b/0x1d0
...
[ 168.675108] CPU: 3 PID: 3032 Comm: bash Not tainted 5.18.0-rc2-objtrace-v10+ #6
[ 168.675112] Hardware name: LENOVO YangTianT4900v-00/, BIOS FCKT65AUS 01/12/2015
[ 168.675115] RIP: 0010:ftrace_replace_code+0x10b/0x1d0
[ 168.675121] Code: 89 fe 89 c7 e8 f7 97 b4 00 5b 41 5c 41 5d 41 5e 41 5f 5d c3 48 8b 30 48 c7 c7 78 c5 b8 a5 c6 05 f0 6e ab 01 01 e8 4f f3 b3 00 <0f> 0b eb 80 e8 bc d1 13 00 31 db 45 85 f6 49 c7 c6 30 32 49 a6 0f
[ 168.675124] RSP: 0018:ffffb8a90299fc78 EFLAGS: 00010286
[ 168.675128] RAX: 0000000000000000 RBX: ffffffffa6493230 RCX: 0000000000000000
...
[ 168.675149] Call Trace:
[ 168.675151]
[ 168.675156] ftrace_modify_all_code+0xcc/0x160
[ 168.675165] arch_ftrace_update_code+0x9/0x10
[ 168.675170] ftrace_run_update_code+0x1a/0x70
[ 168.675176] ftrace_startup_enable+0x2e/0x50
[ 168.675181] ftrace_startup+0xa8/0x120
[ 168.675185] register_ftrace_function+0x25/0x70
[ 168.675189] event_object_trigger_init+0x32/0x50
[ 168.675194] register_trigger+0x69/0xd0
[ 168.675200] event_trigger_register+0x1c/0x20
[ 168.675204] event_object_trigger_parse+0x323/0x390
[ 168.675211] trigger_process_regex+0xc1/0x110
[ 168.675216] event_trigger_write+0x70/0xd0
[ 168.675221] vfs_write+0xc0/0x2b0
[ 168.675228] ksys_write+0x67/0xe0
[ 168.675234] __x64_sys_write+0x1a/0x20
[ 168.675240] do_syscall_64+0x3b/0x90
[168.675246]entry_SYSCALL_64_after_hwframe+0x44/0xae
這部分代碼也是從register_trigger調(diào)用過(guò)來(lái)的,最終調(diào)用了
register_trigger -->
event_object_trigger_init -->
register_ftrace_function -->
ftrace_modify_all_code
其中,register_ftrace_function還將&tr->obj_data->fops注冊(cè)到function tracer的框架中,那么函數(shù)調(diào)用function tracer時(shí),function tracer會(huì)調(diào)用tr->obj_data->fops中的回調(diào)函數(shù)func,也就是在allocate_objtrace_data中賦值的trace_object_events_call函數(shù),此時(shí)trace_object_events_call開(kāi)始工作。
至此,function tracer被打開(kāi),以后內(nèi)核中所有可以trace的函數(shù),都將要走過(guò)function tracer這條路徑,也就是每一個(gè)函數(shù)調(diào)用之后,都會(huì)運(yùn)行到trace_object_events_call中。
5.3觸發(fā) “cat /home/kylin/workspace/opensource/linux-objtrace-jeff/kernel/sched/core.c > /dev/null”
本節(jié)是objtrace中最重要的一個(gè)環(huán)節(jié),當(dāng)kprobe和objtrace都設(shè)置好了,一旦內(nèi)核調(diào)用到bio_add_page,就要開(kāi)始跟蹤bio這個(gè)變量了,objtrace是如何在原有的kprobe的基礎(chǔ)上實(shí)現(xiàn)這個(gè)功能的呢?
我在閱讀代碼的時(shí)候,將這個(gè)過(guò)程分為兩部分,一個(gè)是kprobe,另一個(gè)是function tracer,我們先來(lái)看看當(dāng)kprobe被觸發(fā)的時(shí)候,objtrace做了什么工作:
同樣的,我還是使用了printk和WARN_ONCE來(lái)進(jìn)行調(diào)試,我首先在函數(shù)set_trace_object中增加了一個(gè)WARN_ONCE,得到的調(diào)用棧如下:
[ 248.069971] chensong:set_trace_object, call stack of set trace object
[ 248.069980] WARNING: CPU: 3 PID: 4343 at kernel/trace/trace_object.c:162 trace_object_trigger+0x184/0x1a0
....
[ 248.070450] Hardware name: LENOVO YangTianT4900v-00/, BIOS FCKT65AUS 01/12/2015
[ 248.070455] RIP: 0010:trace_object_trigger+0x184/0x1a0
[ 248.070462] Code: 7a ae 48 c7 c6 2d 69 3d ad e8 28 82 fe ff eb bc 48 c7 c6 60 05 43 ae 48 c7 c7 80 ce 7a ae c6 05 2c 11 95 01 01 e8 db 84 9d 00 <0f> 0b eb 9e 48 85 db 0f 84 16 ff ff ff 48 c7 c0 80 96 bc ae e9 d1
[ 248.070467] RSP: 0018:ffffb122007d36a8 EFLAGS: 00010086
....
[ 248.070515] Call Trace:
[ 248.070520]
[ 248.070533] trace_object_count_trigger+0x1e/0x30
[ 248.070541] event_triggers_call+0x5d/0xe0
[ 248.070556] trace_event_buffer_commit+0x1a2/0x260
[ 248.070573] kprobe_trace_func+0x1ad/0x2a0
[ 248.070602] kprobe_dispatcher+0x42/0x70
[ 248.070610] ? bio_add_page+0x1/0x90
[ 248.070622] kprobe_ftrace_handler+0x168/0x1f0
[ 248.070629] ? bio_add_page+0x5/0x90
[ 248.070636] ? __bio_try_merge_page+0x140/0x140
[ 248.070648] arch_ftrace_ops_list_func+0xe5/0x160
[ 248.070655] ? submit_bh_wbc+0xca/0x140
[ 248.070677] ftrace_regs_call+0x5/0x52
[ 248.070689] RIP: 0010:bio_add_page+0x1/0x90
....
[ 248.070781] ? bio_add_page+0x5/0x90
[ 248.070789] ? submit_bh_wbc+0xca/0x140
我們可以看到,當(dāng)bio_add_page被觸發(fā)之后,進(jìn)入和kprobe的調(diào)用路徑:
bio_add_page -->
ftrace_regs_call -->
arch_ftrace_ops_list_func-->
kprobe_ftrace_handler -->
kprobe_dispatcher -->
kprobe_trace_func -->
event_triggers_call -->
trace_object_count_trigger -->
set_trace_object
其中trace_object_count_trigger也是trigger_object_cmd中定義的回調(diào)函數(shù),在trigger條件符合被觸發(fā)的時(shí)候被調(diào)用,我在set_trace_object中還增加了一個(gè)printk,打印obj的地址,也就是需要跟蹤的變量的地址。
static void set_trace_object(void *obj, int obj_offset,
int obj_value_type_size, struct trace_array *tr)
{
unsigned long flags;
struct object_instance *obj_ins;
struct objtrace_data *obj_data;
...
obj_ins = &obj_data->traced_obj[atomic_read(&obj_data->num_traced_obj)];
obj_ins->obj = obj; --- (1)
printk("chensong:%s, obj:0x%lx", __FUNCTION__, obj);
obj_ins->obj_value_type_size = obj_value_type_size;
obj_ins->obj_offset = obj_offset;
obj_ins->tr = tr;
/* make sure the num_traced_obj update always appears after traced_obj update */
smp_wmb();
atomic_inc(&obj_data->num_traced_obj);
//WARN_ONCE(1, "chensong:%s, call stack of set trace object ", __FUNCTION__);
out:
raw_spin_unlock_irqrestore(&obj_data->obj_data_lock, flags);
}
(1)這個(gè)時(shí)候內(nèi)核已經(jīng)走到了bio_add_page,所以已經(jīng)知道了bio的地址了,所以需要將bio的地址保存到object_instance的obj中,object_instance中其余的元素保存偏移量,變量大小等信息。
接下來(lái)是function tracer的部分,在上一部分講過(guò),開(kāi)發(fā)者運(yùn)行命令“echo 'objtracearg1,0x285 if comm == "cat"' > ./events/kprobes/p_bio_add_page_0/trigger”之后,function tracer被打開(kāi),以后內(nèi)核中所有可以trace的函數(shù),都將要走過(guò)function tracer這條路徑,每一個(gè)函數(shù)調(diào)用之后,都會(huì)運(yùn)行到trace_object_events_call中,trace_object_events_call的代碼如下:
static void
trace_object_events_call(unsigned long ip, unsigned long parent_ip,
struct ftrace_ops *op, struct ftrace_regs *fregs)
{
struct pt_regs *pt_regs = ftrace_get_regs(fregs);
int cpu, n;
for (n = 0; n < max_args_num; n++) {
obj = regs_get_kernel_argument(pt_regs, n); --- (1)
if (object_exist((void *)obj, tr)) { --- (2)
...
if (get_object_value(&val, (void *)(obj + obj_offset), val_type_size)) --- (3)
continue;
submit_trace_object(ip, parent_ip, obj, val, tr); ---(4)
}
...
}
static inline unsigned long regs_get_register(struct pt_regs *regs,
unsigned int offset)
{
return *(unsigned long *)((unsigned long)regs + offset); --- (5)
}
static bool object_exist(void *obj, struct trace_array *tr)
{
...
for (i = 0; i < max; i++) {
if (obj_data->traced_obj[i].obj == obj) --- (6)
return true;
}
return false;
}
-
(1)Function tracer使能后,每個(gè)可以trace的函數(shù)都會(huì)調(diào)用到trace_object_events_call中,調(diào)用棧如下:
Any kernel tracable function -->
ftrace_regs_call -->
arch_ftrace_ops_list_func -->
__ftrace_ops_list_func -->
op->func(ip, parent_ip, op, fregs); -->
trace_object_events_call -->
get_object_value
trace_object_events_call首先調(diào)用regs_get_kernel_argument,依次獲得函數(shù)的參數(shù),本例中為bio_add_page的參數(shù)。
-
(2)調(diào)用object_exist來(lái)檢查是不是需要跟蹤的變量,這個(gè)變量的地址在開(kāi)發(fā)者輸入”cat file > /dev/null”之后, 內(nèi)核代碼走到了bio_add_page,bio就會(huì)在set_trace_object中保存到了object_instance的obj中,所以在這個(gè)時(shí)候,后續(xù)函數(shù)如果使用了bio,object_exist返回為TRUE。如果內(nèi)核代碼沒(méi)有走到過(guò)bio_add_page或者該函數(shù)的參數(shù)中沒(méi)有bio,object_exist會(huì)返回FALSE。
-
(3)獲取變量值,保存到val中。
-
(4)調(diào)用submit_trace_object將value打印到ftrace的ringbuffer中。
-
(5)regs_get_kernel_argument最終會(huì)調(diào)用到regs_get_register中,返回指定參數(shù)的地址。
-
(6)object_exist通過(guò)比較地址來(lái)判斷,當(dāng)前函數(shù)的參數(shù)中是不是包含需要跟蹤的變量。
好了,到此為止,objtrace的大部分工作都結(jié)束了,找到了變量,獲得了它的值,并打印到ringbuffer中,很巧妙,避免了大量的printk和WARN這樣笨重的方法。
但objtrace打開(kāi)了所有函數(shù)的function tracer,每個(gè)函數(shù)都要去調(diào)用trace_object_events_call,還有調(diào)用object_exist進(jìn)行變量的比對(duì),會(huì)增加一定的系統(tǒng)開(kāi)銷(xiāo),所以objtrace最好不用和性能調(diào)試的trace一起工作。
5.4打印 “cat trace”
從debugfs的trace文件中打印到中斷,顯示給開(kāi)發(fā)者,調(diào)用棧如下:
[ 1617.394055] chensong:trace_object_print
[ 1617.394073] WARNING: CPU: 2 PID: 4761 at kernel/trace/trace_output.c:1567 trace_object_print+0x73/0x90
[ 1617.394733] RIP: 0010:trace_object_print+0x73/0x90
[ 1617.394749] Code: c6 83 b5 fa b8 e8 9d 0d 00 00 4c 89 e7 be 0a 00 00 00 e8 d0 0a 00 00 48 c7 c6 30 f5 c2 b8 48 c7 c7 90 b5 fa b8 e8 ec a6 9e 00 <0f> 0b 4c 89 e7 e8 23 37 ff ff 5b 41 5c 5d c3 0f 0b eb 97 66 2e 0f
[ 1617.394841] Call Trace:
[ 1617.394850]
[ 1617.394864] print_trace_line+0x25e/0x570
[ 1617.394878] ? ring_buffer_iter_advance+0x32/0x40
[ 1617.394902] s_show+0x4c/0x160
[ 1617.394919] seq_read_iter+0x2a3/0x450
[ 1617.394947] seq_read+0xee/0x120
[ 1617.394983] vfs_read+0xa7/0x1a0
[ 1617.395006] ksys_read+0x67/0xe0
[ 1617.395029] __x64_sys_read+0x1a/0x20
[ 1617.395042] do_syscall_64+0x3b/0x90
[ 1617.395056] entry_SYSCALL_64_after_hwframe+0x44/0xae
原文標(biāo)題:Ftrace源碼分析系列之objtrace
文章出處:【微信公眾號(hào):Linux閱碼場(chǎng)】歡迎添加關(guān)注!文章轉(zhuǎn)載請(qǐng)注明出處。
-
內(nèi)核
+關(guān)注
關(guān)注
3文章
1361瀏覽量
40185 -
調(diào)試
+關(guān)注
關(guān)注
7文章
571瀏覽量
33875 -
源代碼
+關(guān)注
關(guān)注
96文章
2943瀏覽量
66619
原文標(biāo)題:Ftrace源碼分析系列之objtrace
文章出處:【微信號(hào):LinuxDev,微信公眾號(hào):Linux閱碼場(chǎng)】歡迎添加關(guān)注!文章轉(zhuǎn)載請(qǐng)注明出處。
發(fā)布評(píng)論請(qǐng)先 登錄
相關(guān)推薦
評(píng)論