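SystemTap is a Linux tracing and probing tool.
Official site: https://sourceware.org/systemtap/
A SystemTap script consists mainly of probe points and probe handlers. Let's look at which probe points are available.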
The essential idea behind a systemtap script is to name events, and to give them handlers.
Systemtap works by translating the script to C and running the system C compiler to create a kernel module from it.
When the module is loaded, it activates all the probed events by hooking into the kernel.
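As a minimal sketch of the "name an event, give it a handler" idea, the following script attaches a handler to the begin event. The file name hello.stp is only an assumption for illustration; it would typically be run as root with `stap hello.stp`.

```systemtap
# hello.stp (hypothetical file name): one event, one handler.
probe begin {
  println("hello world")   # runs once, when the session starts
  exit()                   # ask SystemTap to end the session
}
```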
(1) where to probe
Built-in events (probe point syntax and semantics)
begin:The startup of the systemtap session.
end:The end of the systemtap session.
kernel.function("sys_open"):The entry to the function named sys_open in the kernel.
syscall.close.return:The return from the close system call.
module("ext3").statement(0xdeadbeef):The addressed instruction in the ext3 filesystem driver.
timer.ms(200):A timer that fires every 200 milliseconds.
timer.jiffies(200):A timer that fires every 200 jiffies.
timer.profile:A timer that fires periodically on every CPU.
perf.hw.cache_misses:A particular number of CPU cache misses have occurred.
procfs("status").read:A process trying to read a synthetic file.
process("a.out").statement("*@main.c:200"):Line 200 of the a.out program.
For more information, see the stapprobes manual page:
https://sourceware.org/systemtap/man/stapprobes.3stap.html
http://linux.die.net/man/5/stapprobes
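The sketch below combines three of the built-in events listed above (begin, timer.ms and end). The global name ticks and the cutoff of five firings are arbitrary choices for illustration, not part of any tapset.

```systemtap
global ticks   # hypothetical counter, shared by the handlers below

probe begin {
  println("session started")
}

probe timer.ms(200) {
  ticks++                  # count each timer firing
  if (ticks >= 5) exit()   # request shutdown after roughly one second
}

probe end {
  printf("timer fired %d times\n", ticks)
}
```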
(2) what to print
Systemtap provides a variety of such contextual data, ready for formatting.
They usually appear as function calls within the handler.
tid():The id of the current thread.
pid():The process (task group) id of the current thread.
uid():The id of the current user.
execname():The name of the current process.
cpu():The current cpu number.
gettimeofday_s():Number of seconds since epoch.
get_cycles():Snapshot of hardware cycle counter.
pp():A string describing the probe point being currently handled.
probefunc():If known, the name of the function in which this probe was placed.
$$vars:If available, a pretty-printed listing of all local variables in scope.
print_backtrace():If possible, print a kernel backtrace.
print_ubacktrace():If possible, print a user-space backtrace.
$$parms:The function parameters.
$$return:The function return value (in return probes).
thread_indent():A very useful function from the tapset library. Its output consists of:
A timestamp (number of microseconds since the initial indentation for the thread)
A process name and the thread id itself.
For more information, see the stapfuncs manual page:
https://sourceware.org/systemtap/man/stapfuncs.3stap.html
http://linux.die.net/man/5/stapfuncs
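A short sketch of how these context functions are typically used together, modelled on the socket-trace example from the SystemTap tutorial. It assumes kernel debuginfo is installed and that the target kernel has a net/socket.c source file with probeable functions.

```systemtap
# Trace entry to and return from every function in net/socket.c.
# thread_indent(1) increases the per-thread indentation, thread_indent(-1) decreases it.
probe kernel.function("*@net/socket.c").call {
  printf("%s -> %s\n", thread_indent(1), probefunc())
}

probe kernel.function("*@net/socket.c").return {
  printf("%s <- %s\n", thread_indent(-1), probefunc())
}

# Stop after five seconds so the trace stays short.
probe timer.s(5) {
  exit()
}
```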
(3) Built-in probe point types (DWARF probes)
Built-in probe points, available once debuginfo is installed.
This family of probe points uses symbolic debugging information for the target kernel or module,
as may be found in executables that have not been stripped, or in the separate debuginfo packages.
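Currently supported built-in probe point types:
kernel.function(PATTERN) // places a probe at the function entry; the function parameters $PARM are available
kernel.function(PATTERN).return // places a probe at the function return; the return value $return is available, as are the (possibly modified) parameters $PARM
kernel.function(PATTERN).call // the complement of .inline: selects only matching non-inlined functions
kernel.function(PATTERN).inline // selects only matching inlined functions; inlined functions cannot be used with .return
kernel.function(PATTERN).exported // selects only exported functions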
module(MPATTERN).function(PATTERN)
module(MPATTERN).function(PATTERN).return
module(MPATTERN).function(PATTERN).call
module(MPATTERN).function(PATTERN).inline
kernel.statement(PATTERN)
kernel.statement(ADDRESS).absolute
module(MPATTERN).statement(PATTERN)
Examples:
# Refers to all kernel functions with "init" or "exit" in the name
kernel.function("*init*"), kernel.function("*exit*")
# Refers to any functions within the "kernel/time.c" file that span line 240
kernel.function("*@kernel/time.c:240")
# Refers to all functions in the ext3 module
module("ext3").function("*")
# Refers to the statement at line 296 within the kernel/time.c file
kernel.statement("*@kernel/time.c:296")
# Refers to the statement at line bio_init+3 within the fs/bio.c file
kernel.statement("bio_init@fs/bio.c+3")
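Some source-level variables visible in the compilation unit, such as function parameters, local variables, and global variables, are also visible in the probe handler.
In a script, refer to them by prefixing the variable name with $.
Variables can be referenced in two styles:
$varname // refers to the variable varname
$var->field // refers to a structure member
$var[N] // refers to an array element
&$var // the address of the variable
@var("varname") // refers to the variable varname
@var("var@src/file.c") // refers to the global variable var as seen when src/file.c was compiled
@var("varname@file.c")->field // refers to a structure member
@var("var@file.c")[N] // refers to an array element
&@var("var@file.c") // the address of the variable
$var$ // provide a string that includes the values of basic type values
$var$$ // provide a string that includes all values of nested data types
$$vars // a string containing all function parameters and local variables
$$locals // a string containing all local variables
$$parms // a string containing all function parameters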
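The following sketch shows DWARF-based probes reading target variables. It assumes kernel debuginfo is installed and that the running kernel has a probeable vfs_read function with a parameter named count; both are assumptions about the target kernel, not guarantees.

```systemtap
# Entry probe: print all parameters of vfs_read as one pretty-printed string.
probe kernel.function("vfs_read") {
  printf("%s(%d) vfs_read %s\n", execname(), pid(), $$parms)
}

# Return probe: $return holds the return value; $count is the value
# of the parameter as it was captured at function entry.
probe kernel.function("vfs_read").return {
  printf("%s(%d) vfs_read asked for %d, got %d\n",
         execname(), pid(), $count, $return)
}

# Keep the run short.
probe timer.s(2) {
  exit()
}
```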
(4) DWARF-less probing
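Without debuginfo, the DWARF-based built-in probe points cannot be used.
In the absence of debugging information, you can still use the kprobe family of probes to examine the
entry and exit points of kernel and module functions. You cannot look up the arguments or local variables
of a function by name ($varname) using these probes.
SystemTap still provides a way to access the parameters:
while the probed function is stopped at its entry point, its arguments can be referred to by number.
For example, suppose the probed function is declared as follows:
ssize_t sys_read(unsigned int fd, char __user *buf, size_t count)
You can then use uint_arg(1), pointer_arg(2), and ulong_arg(3) to obtain the values of fd, buf, and count respectively.
Although these probe points do not support $return, calling returnval() yields the value of the register in which
a function's return value is conventionally stored, and returnstr() yields the return value in string form.
Inside handler code, register("regname") returns the value of the named CPU register at the time the probe was hit.
Syntax (wildcards are not allowed):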
kprobe.function(FUNCTION)
kprobe.function(FUNCTION).return
kprobe.module(NAME).function(FUNCTION)
kprobe.module(NAME).function(FUNCTION).return
kprobe.statement(ADDRESS).absolute
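A sketch of DWARF-less probing with numbered arguments. It uses vfs_read rather than sys_read, on the assumption that vfs_read is a plain function symbol on the running kernel (on many newer kernels the system call entry points are wrapped and take their arguments differently). The argument positions follow the assumed prototype vfs_read(struct file *file, char __user *buf, size_t count, loff_t *pos).

```systemtap
# Entry: arguments are fetched by position, not by name.
probe kprobe.function("vfs_read") {
  printf("%s file=%p buf=%p count=%d\n",
         execname(), pointer_arg(1), pointer_arg(2), ulong_arg(3))
}

# Return: $return is unavailable here, so read the return-value register.
probe kprobe.function("vfs_read").return {
  printf("%s vfs_read returned %d\n", execname(), returnval())
}

# Keep the run short.
probe timer.s(2) {
  exit()
}
```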
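(1) Basic format
probe probe-point probe-handler, that is: probe PROBE-POINT { statement }
The probe keyword names a probe point (probe-point) and the handler (probe-handler) to run when that point is hit.
Statements need no terminator; a lone semicolon ";" is the null statement. Function bodies are enclosed in {}.
Several comment styles are allowed:
Shell-style:#
C-style:/* */
C++-style://
The next statement exits the probe handler early.
The string concatenation operator is "."; strings are compared with "==".
For example: "hello" . "world" yields "helloworld"
Variables need not be declared in advance or given an explicit type; the type is inferred from use.
Converting between numbers and strings:
s = sprint(123) # s becomes the string "123"
Variables defined in a probe handler are local and cannot be used in other probe handlers.
The global keyword declares global variables.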
Because of possible concurrency (multiple probe handlers running on different CPUs), each global variable
used by a probe is automatically read-locked or write-locked while the handler is running.
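The basic syntax elements above (comments, next, string comparison and concatenation, sprint, local and global variables) can be sketched in one small script. The global name closes is a hypothetical choice; stapio is the name of SystemTap's own helper process, which is filtered out here.

```systemtap
global closes   /* global: shared by all handlers */

probe syscall.close {
  if (execname() == "stapio") next   // skip this event, leave the handler early
  closes++
}

probe timer.s(5) {
  # msg is local to this handler; sprint() turns the number into a string
  msg = "close() calls seen: " . sprint(closes)
  println(msg)
  exit()
}
```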
(2) Functions
function name(param1, param2)
{
statements
return ret
}
Recursion is possible, up to a nesting depth limit.
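A small sketch of a user-defined recursive function; the name fact is hypothetical and the argument and return types are inferred as integers.

```systemtap
# Factorial by recursion; the nesting depth here is far below the limit.
function fact(n) {
  if (n <= 1) return 1
  return n * fact(n - 1)
}

probe begin {
  printf("fact(5) = %d\n", fact(5))
  exit()
}
```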
(3) Conditional statements
if (EXPR) STATEMENT [else STATEMENT]
(4) Loops
while (EXPR) STATEMENT
for (A; B; C) STATEMENT
break exits a loop early; continue skips the rest of the current iteration.
(5) Context variables
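A sketch that exercises if, for, break and continue together inside a single handler; it prints the odd numbers below 8.

```systemtap
probe begin {
  for (i = 0; i < 10; i++) {
    if (i % 2 == 0) continue   # skip even numbers
    if (i > 7) break           # leave the loop before reaching 9
    printf("%d\n", i)
  }
  exit()
}
```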
Allow access to the probe point context. To know which variables are likely to be available, you will need to
be familiar with the kernel source you are probing.
You can use stap -L PROBEPOINT to enumerate the context variables available when execution reaches that probe point.
Two functions, user_string and kernel_string, can copy char *target variables into systemtap strings.
Example:
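A sketch of reading a context variable and copying a user-space string. It assumes debuginfo is installed and that the running kernel has a probeable do_sys_open function with a parameter named filename; running `stap -L 'kernel.function("do_sys_open")'` first would show whether that holds on a given system.

```systemtap
# $filename is a char __user * in the probed function's context;
# user_string() copies it into a SystemTap string.
probe kernel.function("do_sys_open") {
  printf("%s opened %s\n", execname(), user_string($filename))
}

# Keep the run short.
probe timer.s(5) {
  exit()
}
```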
(6) Arrays
These arrays are implemented as hash tables with a maximum size that is fixed at startup.
Because they are too large to be created dynamically for individual probe handler runs, they must be
declared as global.
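For example: global array[400]
6.1 Locating elements
An element can be located with multiple indexes.
For example: foo[4, "hello"]++
6.2 Testing whether an element exists
For example: if ([4, "hello"] in foo) { }
6.3 Deleting elements
Use the delete statement, for example: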
delete times[tid()] # deletion of a single element
delete times # deletion of all elements
6.4 Iteration
Use the foreach keyword; break and continue are allowed, and the array must not be modified during iteration.
foreach (x = [a, b] in foo) { fuss_with(x) } # simple loop in arbitrary sequence
foreach ([a, b] in foo+ limit 5) {} # loop in increasing sequence of value, stop after 5
foreach ([a-, b] in foo) {} # loop in decreasing sequence of first key
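The array operations above can be sketched as a small "top readers" script. The array name reads and the five-second window are arbitrary choices for illustration.

```systemtap
global reads   # associative array keyed by (process name, pid)

probe syscall.read {
  reads[execname(), pid()]++
}

probe timer.s(5) {
  # Iterate in decreasing order of value and stop after ten entries.
  foreach ([name, p] in reads- limit 10)
    printf("%-16s %6d %8d\n", name, p, reads[name, p])
  delete reads   # clear everything, after the loop has finished
  exit()
}
```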
(7) 统计
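Statistics aggregates are a data type specific to SystemTap, used to accumulate statistics in global variables.
The operator is "<<<".
For example: g_value <<< b # adds the sample b to g_value, loosely analogous to g_value += b in C
Such variables can only be read through dedicated functions, mainly:
@count(g_value):The number of samples accumulated.
@sum(g_value):The sum of all accumulated samples.
@min(g_value):The smallest accumulated sample.
@max(g_value):The largest accumulated sample.
@avg(g_value):The average of all accumulated samples.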
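A sketch of accumulating and then extracting statistics. It assumes debuginfo is installed and that vfs_read is probeable on the running kernel; the aggregate name sizes is hypothetical. The extractors are guarded by @count because reading @min, @max or @avg from an empty aggregate is a run-time error.

```systemtap
global sizes   # statistics aggregate of successful read sizes

probe kernel.function("vfs_read").return {
  if ($return > 0)
    sizes <<< $return   # add one sample
}

probe timer.s(5) {
  if (@count(sizes) > 0)
    printf("count=%d sum=%d min=%d max=%d avg=%d\n",
           @count(sizes), @sum(sizes), @min(sizes), @max(sizes), @avg(sizes))
  exit()
}
```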
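(8) Language safety
8.1 Time limits
Probe handlers have a limited execution budget and must not run for long. The translator inserts checks into the generated C code, and a handler that exceeds the limit is aborted with a run-time error.
By default each probe handler may execute only 1000 statements; this number is configurable.
8.2 Dynamic memory allocation
No dynamic memory allocation whatsoever takes place during the execution of probe handlers.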
Arrays, function contexts, and buffers are allocated during initialization.
8.3 Locking
If multiple probes seek conflicting locks on the same global variables, one or more of them will time out and be
aborted. Such events are tallied as skipped probes, and a count is displayed at session end.
8.4 Bugs
A few unusually sensitive parts of the kernel should not be probed.
Putting probes indiscriminately into such places (low level context switching, interrupt
dispatching) has reportedly caused crashes in the past. We are fixing these bugs as they are found, and
constructing a probe "blacklist", but it is not complete.
Original article: http://blog.csdn.net/zhangskd/article/details/27320965