事件的另一个来源是处理器本身及其性能监控装置(PMU)。它提供了一系列事件来监测 micro-architectural 事件,如cycle的数量、指令退出、L1缓存不命中等等。
这些事件被称为PMU硬件事件或硬件事件。每种处理器型号都有各种不同的硬件事件类型。perf record & perf report:先记录再通过详细报告的形式打印事件信息
[root@localhost test]# perf list
List of pre-defined events (to be used in -e): cpu-cycles OR cycles [Hardware event] instructions [Hardware event] cache-references [Hardware event] cache-misses [Hardware event] branch-instructions OR branches [Hardware event] branch-misses [Hardware event] bus-cycles [Hardware event] stalled-cycles-frontend OR idle-cycles-frontend [Hardware event] stalled-cycles-backend OR idle-cycles-backend [Hardware event] ref-cycles [Hardware event] cpu-clock [Software event] task-clock [Software event] page-faults OR faults [Software event] context-switches OR cs [Software event] cpu-migrations OR migrations [Software event] minor-faults [Software event] major-faults [Software event] alignment-faults [Software event] emulation-faults [Software event] L1-dcache-loads [Hardware cache event] L1-dcache-load-misses [Hardware cache event] L1-dcache-stores [Hardware cache event] L1-dcache-store-misses [Hardware cache event] L1-dcache-prefetches [Hardware cache event] L1-dcache-prefetch-misses [Hardware cache event] L1-icache-loads [Hardware cache event] L1-icache-load-misses [Hardware cache event] L1-icache-prefetches [Hardware cache event] L1-icache-prefetch-misses [Hardware cache event] LLC-loads [Hardware cache event] LLC-load-misses [Hardware cache event] LLC-stores [Hardware cache event] LLC-store-misses [Hardware cache event] LLC-prefetches [Hardware cache event] LLC-prefetch-misses [Hardware cache event] dTLB-loads [Hardware cache event] dTLB-load-misses [Hardware cache event] dTLB-stores [Hardware cache event] dTLB-store-misses [Hardware cache event] dTLB-prefetches [Hardware cache event] dTLB-prefetch-misses [Hardware cache event] iTLB-loads [Hardware cache event] iTLB-load-misses [Hardware cache event] branch-loads [Hardware cache event] branch-load-misses [Hardware cache event] rNNN [Raw hardware event descriptor] cpu/t1=v1[,t2=v2,t3 ...]/modifier [Raw hardware event descriptor] sunrpc:rpc_call_status [Tracepoint event] sunrpc:rpc_bind_status [Tracepoint event] sunrpc:rpc_connect_status [Tracepoint event] sunrpc:rpc_task_begin [Tracepoint event] sunrpc:rpc_task_run_action [Tracepoint event] sunrpc:rpc_task_complete [Tracepoint event] sunrpc:rpc_task_sleep [Tracepoint event] sunrpc:rpc_task_wakeup [Tracepoint event] ext4:ext4_free_inode [Tracepoint event] ext4:ext4_request_inode [Tracepoint event] ext4:ext4_allocate_inode [Tracepoint event] ext4:ext4_write_begin [Tracepoint event] ext4:ext4_da_write_begin [Tracepoint event] 省略............
其中通过perf list命令运行在不同的系统会列出不同的结果,在 2.6.35 版本的内核中,该列表已经相当的长,但无论有多少,我们可以将它们划分为三类:
Hardware Event 是由 PMU 硬件产生的事件,比如 cache 命中,当您需要了解程序对硬件特性的使用情况时,便需要对这些事件进行采样;
Samples: 475 of event 'cycles', Event count (approx.): 165419249 10.30% [kernel] [k] kallsyms_expand_symbol 7.75% perf [.] 0x000000000005f190 5.68% libc-2.12.so [.] memcpy 5.45% [kernel] [k] format_decode 5.45% libc-2.12.so [.] __strcmp_sse42 5.45% perf [.] symbols__insert 5.24% [kernel] [k] memcpy 4.86% [kernel] [k] vsnprintf 4.62% [kernel] [k] string 4.45% [kernel] [k] number 3.35% libc-2.12.so [.] __strstr_sse42 3.15% perf [.] hex2u64 2.72% perf [.] rb_next 2.10% [kernel] [k] pointer 1.89% [kernel] [k] security_real_capable_noaudit 1.89% [kernel] [k] strnlen 1.88% perf [.] rb_insert_color 1.68% libc-2.12.so [.] _int_malloc 1.68% libc-2.12.so [.] _IO_getdelim 1.28% [kernel] [k] update_iter 1.05% [kernel] [k] s_show 1.05% libc-2.12.so [.] memchr 1.05% [kernel] [k] get_task_cred 0.88% [kernel] [k] seq_read 0.85% [kernel] [k] clear_page_c 0.84% perf [.] symbol__new 0.63% libc-2.12.so [.] __libc_calloc 0.63% [kernel] [k] copy_user_generic_string 0.63% [kernel] [k] seq_vprintf 0.63% perf [.] kallsyms__parse 0.63% libc-2.12.so [.] _IO_feof 0.63% [kernel] [k] strcmp 0.63% [kernel] [k] page_fault 0.63% perf [.] dso__load_sym 0.42% perf [.] strstr@plt 0.42% libc-2.12.so [.] __strchr_sse42 0.42% [kernel] [k] seq_printf 0.42% libc-2.12.so [.] __memset_sse2 0.42% libelf-0.152.so [.] gelf_getsym 0.25% [kernel] [k] __mutex_init 0.21% [kernel] [k] _spin_lock 0.21% [kernel] [k] s_next 0.21% [kernel] [k] fsnotify 0.21% [kernel] [k] sys_read 0.21% [kernel] [k] __link_path_walk 0.21% [kernel] [k] intel_pmu_disable_all 0.21% [kernel] [k] mem_cgroup_charge_common 0.21% [kernel] [k] update_shares 0.21% [kernel] [k] native_read_tsc 0.21% [kernel] [k] apic_timer_interrupt 0.21% [kernel] [k] auditsys 0.21% [kernel] [k] do_softirq 0.21% [kernel] [k] update_process_times 0.21% [kernel] [k] native_sched_clock 0.21% [kernel] [k] mutex_lock 0.21% [kernel] [k] module_get_kallsymperf top 类似top的作用,可以整体查看系统的全部事件并且按数量排序。
perf stat可以直接接上命令或者通过pid形式对进程进行性能统计,完成后会打印出这个过程的性能数据统计
[root@localhost test]# perf stat -B dd if=/dev/zero of=/dev/null count=1000000 记录了1000000+0 的读入 记录了1000000+0 的写出 512000000字节(512 MB)已复制,0.790059 秒,648 MB/秒 Performance counter stats for 'dd if=/dev/zero of=/dev/null count=1000000': 791.176286 task-clock # 1.000 CPUs utilized 1 context-switches # 0.001 K/sec 0 cpu-migrations # 0.000 K/sec 247 page-faults # 0.312 K/sec 1,248,519,452 cycles # 1.578 GHz [83.29%] 366,166,452 stalled-cycles-frontend # 29.33% frontend cycles idle [83.33%] 155,120,002 stalled-cycles-backend # 12.42% backend cycles idle [66.62%] 1,947,919,494 instructions # 1.56 insns per cycle # 0.19 stalled cycles per insn [83.31%] 355,465,524 branches # 449.287 M/sec [83.41%] 2,021,648 branch-misses # 0.57% of all branches [83.42%] 0.791116595 seconds time elapsed
[root@localhost test]# perf stat -p 18669 ^C Performance counter stats for process id '18669': 1.520699 task-clock # 0.001 CPUs utilized 56 context-switches # 0.037 M/sec 0 cpu-migrations # 0.000 K/sec 0 page-faults # 0.000 K/sec 2,178,120 cycles # 1.432 GHz [63.18%] 1,410,393 stalled-cycles-frontend # 64.75% frontend cycles idle [90.94%] 942,665 stalled-cycles-backend # 43.28% backend cycles idle 1,067,824 instructions # 0.49 insns per cycle # 1.32 stalled cycles per insn 193,104 branches # 126.984 M/sec 14,544 branch-misses # 7.53% of all branches [61.93%] 2.061889979 seconds time elapsed
[root@localhost test]# perf stat usage: perf stat [<options>] [<command>] -e, --event <event> event selector. use 'perf list' to list available events --filter <filter> event filter -i, --no-inherit child tasks do not inherit counters -p, --pid <pid> stat events on existing process id -t, --tid <tid> stat events on existing thread id -a, --all-cpus system-wide collection from all CPUs -g, --group put the counters into a counter group -c, --scale scale/normalize counters -v, --verbose be more verbose (show counter open errors, etc) -r, --repeat <n> repeat command and print average + stddev (max: 100) -n, --null null run - dont start any counters -d, --detailed detailed run - start a lot of events -S, --sync call sync() before starting a run -B, --big-num print large numbers with thousands' separators -C, --cpu <cpu> list of cpus to monitor in system-wide -A, --no-aggr disable CPU count aggregation -x, --field-separator <separator> print counts with custom separator -G, --cgroup <name> monitor event in cgroup name only -o, --output <file> output file name --append append to the output file --log-fd <n> log output to fd, instead of stderr --pre <command> command to run prior to the measured command --post <command> command to run after to the measured command
perf stat -e cycles,instructions,cache-misses [...]也可以指定监控的级别
perf stat -e cycles:u dd if=/dev/zero of=/dev/null count=100000具体含义是:
Modifiers | Description | Example |
---|---|---|
u | monitor at priv level 3, 2, 1 (user) | event:u |
k | monitor at priv level 0 (kernel) | event:k |
h | monitor hypervisor events on a virtualization environment | event:h |
H | monitor host machine on a virtualization environment | event:H |
G | monitor guest machine on a virtualization environment | event:G |
操作系统资源一般分为四大部分:CPU、内存、IO、网络
容易出现瓶颈的一般是CPU和IO,所以我这里写了两个小程序分别模拟这两种情况,然后通过使用perf一系列的工具找到症结所在。
程序1是计算素数,专门用来耗系统CPU的
#include <stdio.h> int IsPrime(int num) { int i=2; for(;i<=num/2;i++) if(0==num%i) return 0; return 1; } void main() { int num; for(num=2;num<=10000000;num++) IsPrime(num); }
#include <unistd.h> #include <stdio.h> #include <string> #include <sys/types.h> #include <sys/wait.h> #include <sys/stat.h> #include <fcntl.h> #include <stdlib.h> #include <vector> using namespace std; bool gquit = false; void* threadfun(void* param) { string filename = *(string*)param; unlink(filename.c_str()); int fd = open(filename.c_str(), O_CREAT | O_RDWR, 666); if (fd < 0) return 0; string a(1024*16*1024, 'a'); bool flag = true; while (!gquit) { if (flag) { write(fd, a.c_str(), a.length()); } else { read(fd, &a[0], a.length()); lseek(fd, 0, SEEK_SET); } flag = !flag; usleep(1); } close(fd); return 0; } int main(int argc, char* argv[]) { if (argc != 2) { printf("need thread num\n"); return 0; } int threadnum = atoi(argv[1]); vector<string> v(threadnum, "./testio.filex_"); vector<pthread_t> vpid(threadnum, 0); for(int i = 0 ; i < threadnum ; ++ i) { pthread_attr_t attr; pthread_attr_init(&attr); static char buf[32]; snprintf(buf, sizeof(buf), "%d", i); v[i] += buf; pthread_create(&vpid[i], & attr, threadfun, (void*)&v[i]); } printf("press q to quit\n"); while(getchar() != 'q'); gquit = true; for(int i = 0 ; i < threadnum ; ++ i) { void* ret = NULL; pthread_join(vpid[i], &ret); unlink(v[i].c_str()); } return 0; }
Samples: 1K of event 'cycles', Event count (approx.): 690036067 74.14% cpu [.] IsPrime 2.45% [kernel] [k] kallsyms_expand_symbol 2.21% perf [.] 0x000000000005f225 1.65% perf [.] symbols__insert 1.50% perf [.] hex2u64 1.41% libc-2.12.so [.] __strcmp_sse42 1.26% [kernel] [k] string 1.25% [kernel] [k] format_decode 1.10% [kernel] [k] vsnprintf 0.95% [kernel] [k] strnlen 0.85% [kernel] [k] memcpy 0.80% libc-2.12.so [.] memcpy 0.70% [kernel] [k] number 0.60% perf [.] rb_next 0.60% libc-2.12.so [.] _int_malloc 0.55% libc-2.12.so [.] __strstr_sse42 0.55% [kernel] [k] pointer 0.55% perf [.] rb_insert_color
然后通过perf stat 指定pid进行监控
[root@localhost test]# perf stat -p 10704 ^C Performance counter stats for process id '10704': 4067.488066 task-clock # 1.001 CPUs utilized [100.00%] 5 context-switches # 0.001 K/sec [100.00%] 0 cpu-migrations # 0.000 K/sec [100.00%] 0 page-faults # 0.000 K/sec 10,743,902,971 cycles # 2.641 GHz [83.32%] 5,863,649,174 stalled-cycles-frontend # 54.58% frontend cycles idle [83.32%] 1,347,218,352 stalled-cycles-backend # 12.54% backend cycles idle [66.69%] 14,625,550,503 instructions # 1.36 insns per cycle # 0.40 stalled cycles per insn [83.34%] 1,950,981,334 branches # 479.653 M/sec [83.35%] 46,273 branch-misses # 0.00% of all branches [83.32%] 4.064800952 seconds time elapsed
然后使用perf record和report进行更加具体的分析
[root@localhost test]# perf record -g -- ./cpu ^C[ perf record: Woken up 4 times to write data ] [ perf record: Captured and wrote 0.875 MB perf.data (~38244 samples) ] ./cpu: Interrupt [root@localhost test]# perf report Samples: 11K of event 'cycles', Event count (approx.):6815943846 + 99.70% cpu cpu [.] IsPrime + 0.05% cpu [kernel.kallsyms] [k] hrtimer_interrupt + 0.03% cpu cpu [.] main + 0.02% cpu [kernel.kallsyms] [k] native_write_msr_safe + 0.02% cpu [kernel.kallsyms] [k] _spin_unlock_irqrestore + 0.02% cpu [kernel.kallsyms] [k] perf_adjust_freq_unthr_context + 0.01% cpu [kernel.kallsyms] [k] rb_insert_color + 0.01% cpu [kernel.kallsyms] [k] raise_softirq + 0.01% cpu [kernel.kallsyms] [k] rcu_process_gp_end + 0.01% cpu [kernel.kallsyms] [k] scheduler_tick + 0.01% cpu [kernel.kallsyms] [k] do_timer + 0.01% cpu [kernel.kallsyms] [k] _spin_lock_irqsave + 0.01% cpu [kernel.kallsyms] [k] apic_timer_interrupt + 0.01% cpu [kernel.kallsyms] [k] tick_sched_timer + 0.01% cpu [kernel.kallsyms] [k] _spin_lock + 0.01% cpu [kernel.kallsyms] [k] __remove_hrtimer + 0.01% cpu [kernel.kallsyms] [k] idle_cpu非常清晰可以看到各种函数的调用情况,由于这个程序是个很简单的代码,所以容易看出问题所在,但是如果是大项目的某个功能出现问题,通过perf工具就可以很方便地定位问题,后面再看看一个占用IO的例子。
同样也是先启动程序2,先通过perf top、perf stat -p pid这两个命令先定位是哪个进程导致性能问题
Samples: 394K of event 'cycles', Event count (approx.): 188517759438 75.39% [kernel] [k] _spin_lock 11.39% [kernel] [k] copy_user_generic_string 1.89% [jbd2] [k] jbd2_journal_stop 1.66% [kernel] [k] find_get_page 1.51% [jbd2] [k] start_this_handle 0.95% [ext4] [k] ext4_da_write_end 0.76% [kernel] [k] iov_iter_fault_in_readable 0.74% [ext4] [k] ext4_journal_start_sb 0.56% [jbd2] [k] __jbd2_log_space_left 0.44% [ext4] [k] __ext4_journal_stop 0.44% [kernel] [k] __block_prepare_write 0.41% [kernel] [k] __wake_up_bit 0.35% [kernel] [k] _cond_resched 0.32% [kernel] [k] kmem_cache_free
直接通过perf record和report进行分析,得出结果
[root@localhost test]# perf record -g -- ./io 10 press q to quit q [ perf record: Woken up 83 times to write data ] [ perf record: Captured and wrote 23.361 MB perf.data (~1020643 samples) ] [root@localhost test]# perf report Samples: 138K of event 'cycles', Event count (approx.): 64858537028 + 65.87% io [kernel.kallsyms] [k] _spin_lock + 14.19% io [kernel.kallsyms] [k] copy_user_generic_string + 2.74% io [jbd2] [k] jbd2_journal_stop + 2.46% io [kernel.kallsyms] [k] find_get_page + 2.11% io [jbd2] [k] start_this_handle + 1.06% io [kernel.kallsyms] [k] iov_iter_fault_in_readable + 1.05% io [ext4] [k] ext4_journal_start_sb + 0.85% io [ext4] [k] ext4_da_write_end + 0.73% io [jbd2] [k] __jbd2_log_space_left + 0.66% io [kernel.kallsyms] [k] generic_write_end + 0.61% io [ext4] [k] __ext4_journal_stop + 0.57% io [kernel.kallsyms] [k] __block_prepare_write + 0.54% io [kernel.kallsyms] [k] __wake_up_bit
好的,以上就是perf常用的工具,当然perf不止这几个工具,但这几个是最基础最常用的工具
通过查看官方wiki https://perf.wiki.kernel.org/index.php/Tutorial 可以了解更多信息
版权声明:本文为博主原创文章,未经博主允许不得转载。
原文地址:http://blog.csdn.net/jeffreynicole/article/details/46877033