In a customer field environment, the same panic occurred two or three times in a row. The dmesg log at the time of the failure was as follows:
[86406.499787] BUG: unable to handle kernel NULL pointer dereference at 0000000000000106
[86406.499808] IP: skb_release_data+0xa3/0x170
[86406.499811] PGD 0 P4D 0
[86406.499818] Oops: 0000 [#1] SMP PTI
[86406.499822] Modules linked in: sctp_diag sctp dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag bnx2fc(OE) cnic(OE) uio fcoe libfcoe libfc drbd lru_cache ocfs2(OE) quota_tree 8021q mrp garp stp llc ocfs2_dlmfs(OE) ocfs2_stack_o2cb(OE) ocfs2_dlm(OE) ocfs2_nodemanager(OE) event_keeper(OE) ocfs2_stackglue(OE) dm_round_robin chbk(OE) ib_iser rdma_cm ib_cm iw_cm iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi cls_u32 sch_sfq sch_htb openvswitch(OE) udp_tunnel nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 nf_nat nf_conntrack libcrc32c dm_multipath vhost_net vhost tap kvm_intel kvm irqbypass ipmi_ssif dm_mod nfsd nfs_acl auth_rpcgss nfs ttm fscache drm_kms_helper lockd drm sunrpc fb_sys_fops sysimgblt sysfillrect syscopyarea shpchp mei_me mei grace mxm_wmi
[86406.499903] lpc_ich ipmi_si ipmi_devintf ipmi_msghandler wmi nls_iso8859_1 acpi_power_meter mac_hid lp parport ext4 jbd2 mbcache ses sd_mod enclosure qla2xxx(OE) ixgbe igb scsi_transport_fc nvme_fc dca i2c_algo_bit ahci nvme_fabrics ptp aacraid(OE) libahci pps_core nvme_core scsi_transport_sas mdio nbd mlx4_ib ib_core mlx4_core devlink
[86406.499942] CPU: 28 PID: 180 Comm: ksoftirqd/28 Tainted: G OE 4.14.0-generic #1
[86406.499947] Hardware name: New H3C Technologies Co., Ltd. UIS R390X G2/RS32M2C9S, BIOS 1.01.23 09/07/2018
[86406.499952] task: ffff9612bb564440 task.stack: ffffabb086974000
[86406.499957] RIP: 0010:skb_release_data+0xa3/0x170
[86406.499962] RSP: 0018:ffffabb086977cb8 EFLAGS: 00010206
[86406.499966] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000001
[86406.499969] RDX: 0000000000000002 RSI: 0000000000000040 RDI: 0000000000000106
[86406.499973] RBP: ffffabb086977cd8 R08: 0000000000000000 R09: ffffffffb4f711c1
[86406.499977] R10: ffffdd858089de00 R11: ffffffffb4c3a380 R12: ffff9612a277cef0
[86406.499980] R13: ffff9611b2a3f100 R14: ffff9612a277cec0 R15: ffff9612b08c3540
[86406.499985] FS:  0000000000000000(0000) GS:ffff9612bfc80000(0000) knlGS:0000000000000000
[86406.499989] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[86406.499993] CR2: 0000000000000106 CR3: 00000016c960a003 CR4: 00000000003626e0
[86406.499997] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[86406.500000] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[86406.500003] Call Trace:
[86406.500011]  skb_release_all+0x24/0x30
[86406.500016]  napi_consume_skb+0x54/0x110
[86406.500032]  ixgbe_poll+0x7d7/0x14a0 [ixgbe]
[86406.500041]  ? insert_work+0x69/0x70
[86406.500045]  ? __queue_work+0x119/0x410
[86406.500051]  ? __switch_to_asm+0x40/0x70
[86406.500056]  ? call_timer_fn+0x35/0x140
[86406.500062]  net_rx_action+0x140/0x3a0
[86406.500067]  __do_softirq+0xe4/0x2c5
[86406.500073]  run_ksoftirqd+0x22/0x60
[86406.500079]  smpboot_thread_fn+0xfc/0x170
[86406.500084]  kthread+0x128/0x140
[86406.500087]  ? sort_range+0x30/0x30
[86406.500091]  ? kthread_create_on_node+0x70/0x70
[86406.500096]  ret_from_fork+0x35/0x40
[86406.500099] Code: 45 fa 0f 1f 44 00 00 f0 ff 4f 1c 75 05 e8 06 5a a6 ff 41 0f b6 46 02 83 c3 01 49 83 c4 10 39 d8 7f ce 49 8b 7e 08 48 85 ff 74 10 <48> 8b 1f e8 f5 f4 ff ff 48 85 db 48 89 df 75 f0 4d 85 ed 74 54
[86406.500149] RIP: skb_release_data+0xa3/0x170 RSP: ffffabb086977cb8
[86406.500152] CR2: 0000000000000106
After a few occurrences, a pattern emerged: the faulting address and the RIP were identical every time:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000106
RIP: skb_release_data+0xa3/0x170 RSP: ffffabb086977cb8
Looking at the panic backtrace and disassembling around the RIP, the corresponding source code can be located:
crash> bt
PID: 180  TASK: ffff9612bb564440  CPU: 28  COMMAND: "ksoftirqd/28"
 #0 [ffffabb086977930] machine_kexec at ffffffffb48621f9
 #1 [ffffabb086977990] __crash_kexec at ffffffffb4926079
 #2 [ffffabb086977a58] crash_kexec at ffffffffb4926dd1
 #3 [ffffabb086977a78] oops_end at ffffffffb482fcd8
 #4 [ffffabb086977aa0] no_context at ffffffffb487275c
 #5 [ffffabb086977b08] __bad_area_nosemaphore at ffffffffb4872b13
 #6 [ffffabb086977b48] bad_area_nosemaphore at ffffffffb4872be4
 #7 [ffffabb086977b58] __do_page_fault at ffffffffb4872fe4
 #8 [ffffabb086977bd0] do_page_fault at ffffffffb487340e
 #9 [ffffabb086977c00] page_fault at ffffffffb5201605
    [exception RIP: skb_release_data+163]
    RIP: ffffffffb4f75143  RSP: ffffabb086977cb8  RFLAGS: 00010206
    RAX: 0000000000000000  RBX: 0000000000000000  RCX: 0000000000000001
    RDX: 0000000000000002  RSI: 0000000000000040  RDI: 0000000000000106
    RBP: ffffabb086977cd8  R8:  0000000000000000  R9:  ffffffffb4f711c1
    R10: ffffdd858089de00  R11: ffffffffb4c3a380  R12: ffff9612a277cef0
    R13: ffff9611b2a3f100  R14: ffff9612a277cec0  R15: ffff9612b08c3540
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#10 [ffffabb086977cb0] skb_release_data at ffffffffb4f751b4
#11 [ffffabb086977ce0] skb_release_all at ffffffffb4f74614
#12 [ffffabb086977cf8] napi_consume_skb at ffffffffb4f782e4
#13 [ffffabb086977d18] ixgbe_poll at ffffffffc0244dd7 [ixgbe]
#14 [ffffabb086977de0] net_rx_action at ffffffffb4f8bf80
#15 [ffffabb086977e60] __softirqentry_text_start at ffffffffb54000e4
#16 [ffffabb086977ec8] run_ksoftirqd at ffffffffb48918b2
#17 [ffffabb086977ed8] smpboot_thread_fn at ffffffffb48b068c
#18 [ffffabb086977f08] kthread at ffffffffb48ac9c8
#19 [ffffabb086977f50] ret_from_fork at ffffffffb5200205
crash> dis -l skb_release_data+163
/linux-iso/code/linux-dev/4.14.88/linux-4.14/net/core/skbuff.c: 666
0xffffffffb4f75143 <skb_release_data+163>:      mov    (%rdi),%rbx
663 void kfree_skb_list(struct sk_buff *segs)
664 {
665         while (segs) {
666                 struct sk_buff *next = segs->next;   <<<<---- panic address
667
668                 kfree_skb(segs);
669                 segs = next;
670         }
671 }
Combining this with the source: the cause of the failure must be that the argument segs itself became 0x0000000000000106, and dereferencing segs->next then produced the NULL-pointer panic. kfree_skb_list was evidently inlined by the compiler, so the value of segs cannot be read from a frame of its own; it has to be recovered through the caller, skb_release_data. skb_release_data takes skb as its argument, and once skb is known, segs is easy to derive.
static void skb_release_data(struct sk_buff *skb)
{
	struct skb_shared_info *shinfo = skb_shinfo(skb);
	int i;

	if (skb->cloned &&
	    atomic_sub_return(skb->nohdr ? (1 << SKB_DATAREF_SHIFT) + 1 : 1,
			      &shinfo->dataref))
		return;

	for (i = 0; i < shinfo->nr_frags; i++)
		__skb_frag_unref(&shinfo->frags[i]);

	/*
	 * If skb buf is from userspace, we need to notify the caller
	 * the lower device DMA has done;
	 */
	if (shinfo->tx_flags & SKBTX_DEV_ZEROCOPY) {
		struct ubuf_info *uarg;

		uarg = shinfo->destructor_arg;
		if (uarg->callback)
			uarg->callback(uarg, true);
	}

	if (shinfo->frag_list)
		kfree_skb_list(shinfo->frag_list);   <<<<----- the disassembly shows this call was inlined

	skb_free_head(skb);
}
Disassembling skb_release_data: on entry, the argument skb is passed in the rdi register.
crash> dis -l skb_release_data
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/net/core/skbuff.c: 554
0xffffffff913750a0 <skb_release_data>:          nopl   0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffff913750a5 <skb_release_data+5>:        push   %rbp
0xffffffff913750a6 <skb_release_data+6>:        mov    %rsp,%rbp
0xffffffff913750a9 <skb_release_data+9>:        push   %r14
0xffffffff913750ab <skb_release_data+11>:       push   %r13
0xffffffff913750ad <skb_release_data+13>:       push   %r12
0xffffffff913750af <skb_release_data+15>:       push   %rbx
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/net/core/skbuff.c: 558
0xffffffff913750b0 <skb_release_data+16>:       movzbl 0x8e(%rdi),%eax
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/./include/linux/skbuff.h: 1248
0xffffffff913750b7 <skb_release_data+23>:       mov    0xcc(%rdi),%r14d
0xffffffff913750be <skb_release_data+30>:       add    0xd0(%rdi),%r14
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/net/core/skbuff.c: 558
0xffffffff913750c5 <skb_release_data+37>:       test   $0x1,%al
0xffffffff913750c7 <skb_release_data+39>:       je     0xffffffff913750f8 <skb_release_data+88>
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/net/core/skbuff.c: 559
0xffffffff913750c9 <skb_release_data+41>:       and    $0x2,%eax
0xffffffff913750cc <skb_release_data+44>:       cmp    $0x1,%al
0xffffffff913750ce <skb_release_data+46>:       mov    %eax,%edx
0xffffffff913750d0 <skb_release_data+48>:       sbb    %eax,%eax
0xffffffff913750d2 <skb_release_data+50>:       or     $0xfffeffff,%eax
0xffffffff913750d7 <skb_release_data+55>:       cmp    $0x1,%dl
0xffffffff913750da <skb_release_data+58>:       sbb    %edx,%edx
0xffffffff913750dc <skb_release_data+60>:       xor    %dx,%dx
0xffffffff913750df <skb_release_data+63>:       add    $0x10001,%edx
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/./arch/x86/include/asm/atomic.h: 157
0xffffffff913750e5 <skb_release_data+69>:       lock xadd %eax,0x24(%r14)
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/net/core/skbuff.c: 558
0xffffffff913750eb <skb_release_data+75>:       cmp    %eax,%edx
0xffffffff913750ed <skb_release_data+77>:       je     0xffffffff913750f8 <skb_release_data+88>
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/net/core/skbuff.c: 571
0xffffffff913750ef <skb_release_data+79>:       pop    %rbx
0xffffffff913750f0 <skb_release_data+80>:       pop    %r12
0xffffffff913750f2 <skb_release_data+82>:       pop    %r13
0xffffffff913750f4 <skb_release_data+84>:       pop    %r14
0xffffffff913750f6 <skb_release_data+86>:       pop    %rbp
0xffffffff913750f7 <skb_release_data+87>:       retq
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/net/core/skbuff.c: 563
0xffffffff913750f8 <skb_release_data+88>:       xor    %ebx,%ebx
0xffffffff913750fa <skb_release_data+90>:       cmpb   $0x0,0x2(%r14)
0xffffffff913750ff <skb_release_data+95>:       mov    %rdi,%r13   <<<----- rdi is saved into r13 here
0xffffffff91375102 <skb_release_data+98>:       lea    0x30(%r14),%r12
0xffffffff91375106 <skb_release_data+102>:      je     0xffffffff9137513a <skb_release_data+154>
0xffffffff91375108 <skb_release_data+104>:      mov    (%r12),%rdi
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/./include/linux/compiler.h: 183
0xffffffff9137510c <skb_release_data+108>:      mov    0x20(%rdi),%rax
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/./include/linux/page-flags.h: 150
0xffffffff91375110 <skb_release_data+112>:      lea    -0x1(%rax),%rdx
0xffffffff91375114 <skb_release_data+116>:      test   $0x1,%al
0xffffffff91375116 <skb_release_data+118>:      cmovne %rdx,%rdi
0xffffffff9137511a <skb_release_data+122>:      nopl   0x0(%rax,%rax,1)
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/./arch/x86/include/asm/atomic.h: 118
0xffffffff9137511f <skb_release_data+127>:      lock decl 0x1c(%rdi)
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/./include/linux/mm.h: 854
0xffffffff91375123 <skb_release_data+131>:      jne    0xffffffff9137512a <skb_release_data+138>
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/./include/linux/mm.h: 855
0xffffffff91375125 <skb_release_data+133>:      callq  0xffffffff90ddab30 <__put_page>
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/net/core/skbuff.c: 563
0xffffffff9137512a <skb_release_data+138>:      movzbl 0x2(%r14),%eax
0xffffffff9137512f <skb_release_data+143>:      add    $0x1,%ebx
0xffffffff91375132 <skb_release_data+146>:      add    $0x10,%r12
0xffffffff91375136 <skb_release_data+150>:      cmp    %ebx,%eax
0xffffffff91375138 <skb_release_data+152>:      jg     0xffffffff91375108 <skb_release_data+104>
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/net/core/skbuff.c: 566
0xffffffff9137513a <skb_release_data+154>:      mov    0x8(%r14),%rdi
0xffffffff9137513e <skb_release_data+158>:      test   %rdi,%rdi
0xffffffff91375141 <skb_release_data+161>:      je     0xffffffff91375153 <skb_release_data+179>
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/net/core/skbuff.c: 666
0xffffffff91375143 <skb_release_data+163>:      mov    (%rdi),%rbx   <<<---- RIP value; execution faulted here
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/net/core/skbuff.c: 668
0xffffffff91375146 <skb_release_data+166>:      callq  0xffffffff91374640 <kfree_skb>
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/net/core/skbuff.c: 665
0xffffffff9137514b <skb_release_data+171>:      test   %rbx,%rbx
0xffffffff9137514e <skb_release_data+174>:      mov    %rbx,%rdi
0xffffffff91375151 <skb_release_data+177>:      jne    0xffffffff91375143 <skb_release_data+163>
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/./include/linux/skbuff.h: 1277
0xffffffff91375153 <skb_release_data+179>:      test   %r13,%r13
0xffffffff91375156 <skb_release_data+182>:      je     0xffffffff913751ac <skb_release_data+268>
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/./include/linux/skbuff.h: 1248
0xffffffff91375158 <skb_release_data+184>:      mov    0xcc(%r13),%eax
0xffffffff9137515f <skb_release_data+191>:      add    0xd0(%r13),%rax
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/./include/linux/skbuff.h: 1277
0xffffffff91375166 <skb_release_data+198>:      movzbl 0x3(%rax),%edx
0xffffffff9137516a <skb_release_data+202>:      test   $0x8,%dl
0xffffffff9137516d <skb_release_data+205>:      je     0xffffffff913751ac <skb_release_data+268>
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/./include/linux/skbuff.h: 1279
0xffffffff9137516f <skb_release_data+207>:      mov    0x28(%rax),%rdi
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/./include/linux/skbuff.h: 1312
0xffffffff91375173 <skb_release_data+211>:      test   %rdi,%rdi
0xffffffff91375176 <skb_release_data+214>:      je     0xffffffff913751ac <skb_release_data+268>
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/./include/linux/skbuff.h: 1313
0xffffffff91375178 <skb_release_data+216>:      mov    (%rdi),%rcx
0xffffffff9137517b <skb_release_data+219>:      cmp    $0xffffffff91374e00,%rcx
0xffffffff91375182 <skb_release_data+226>:      je     0xffffffff913751ed <skb_release_data+333>
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/./include/linux/skbuff.h: 1316
0xffffffff91375184 <skb_release_data+228>:      test   $0x1,%dil
0xffffffff91375188 <skb_release_data+232>:      jne    0xffffffff913751a6 <skb_release_data+262>
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/./include/linux/skbuff.h: 1317
0xffffffff9137518a <skb_release_data+234>:      mov    $0x1,%esi
0xffffffff9137518f <skb_release_data+239>:      callq  0xffffffff91803040 <__x86_indirect_thunk_rcx>
0xffffffff91375194 <skb_release_data+244>:      mov    0xcc(%r13),%eax
0xffffffff9137519b <skb_release_data+251>:      add    0xd0(%r13),%rax
0xffffffff913751a2 <skb_release_data+258>:      movzbl 0x3(%rax),%edx
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/./include/linux/skbuff.h: 1320
0xffffffff913751a6 <skb_release_data+262>:      and    $0xffffffd7,%edx
0xffffffff913751a9 <skb_release_data+265>:      mov    %dl,0x3(%rax)
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/net/core/skbuff.c: 570
0xffffffff913751ac <skb_release_data+268>:      mov    %r13,%rdi
0xffffffff913751af <skb_release_data+271>:      callq  0xffffffff913711a0 <skb_free_head>
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/net/core/skbuff.c: 571
0xffffffff913751b4 <skb_release_data+276>:      pop    %rbx
0xffffffff913751b5 <skb_release_data+277>:      pop    %r12
0xffffffff913751b7 <skb_release_data+279>:      pop    %r13
0xffffffff913751b9 <skb_release_data+281>:      pop    %r14
0xffffffff913751bb <skb_release_data+283>:      pop    %rbp
0xffffffff913751bc <skb_release_data+284>:      retq
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/./include/linux/mm.h: 790
0xffffffff913751bd <skb_release_data+285>:      mov    (%rdi),%rax
0xffffffff913751c0 <skb_release_data+288>:      shr    $0x33,%rax
0xffffffff913751c4 <skb_release_data+292>:      and    $0x7,%eax
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/./include/linux/memremap.h: 165
0xffffffff913751c7 <skb_release_data+295>:      cmp    $0x4,%eax
0xffffffff913751ca <skb_release_data+298>:      jne    0xffffffff9137511f <skb_release_data+127>
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/./include/linux/memremap.h: 166
0xffffffff913751d0 <skb_release_data+304>:      mov    0x20(%rdi),%rax
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/./include/linux/memremap.h: 165
0xffffffff913751d4 <skb_release_data+308>:      mov    0x38(%rax),%eax
0xffffffff913751d7 <skb_release_data+311>:      sub    $0x1,%eax
0xffffffff913751da <skb_release_data+314>:      cmp    $0x1,%eax
0xffffffff913751dd <skb_release_data+317>:      ja     0xffffffff9137511f <skb_release_data+127>
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/./include/linux/mm.h: 850
0xffffffff913751e3 <skb_release_data+323>:      callq  0xffffffff90dc3b40 <put_zone_device_private_or_public_page>
0xffffffff913751e8 <skb_release_data+328>:      jmpq   0xffffffff9137512a <skb_release_data+138>
/linux-iso/code/linux-dev/4.14.88-disklock/linux-4.14/./include/linux/skbuff.h: 1315
0xffffffff913751ed <skb_release_data+333>:      callq  0xffffffff91374f90 <sock_zerocopy_put>
0xffffffff913751f2 <skb_release_data+338>:      mov    0xcc(%r13),%eax
0xffffffff913751f9 <skb_release_data+345>:      add    0xd0(%r13),%rax
0xffffffff91375200 <skb_release_data+352>:      movzbl 0x3(%rax),%edx
0xffffffff91375204 <skb_release_data+356>:      jmp    0xffffffff913751a6 <skb_release_data+262>
0xffffffff91375206 <skb_release_data+358>:      nopw   %cs:0x0(%rax,%rax,1)
The disassembly shows that rdi is saved into the r13 register, and r13 is never modified again on this path, so the value in r13 at the time of the fault must be skb.
RIP: ffffffffb4f75143  RSP: ffffabb086977cb8  RFLAGS: 00010206
RAX: 0000000000000000  RBX: 0000000000000000  RCX: 0000000000000001
RDX: 0000000000000002  RSI: 0000000000000040  RDI: 0000000000000106
RBP: ffffabb086977cd8  R8:  0000000000000000  R9:  ffffffffb4f711c1
R10: ffffdd858089de00  R11: ffffffffb4c3a380  R12: ffff9612a277cef0
R13: ffff9611b2a3f100  R14: ffff9612a277cec0  R15: ffff9612b08c3540
From the skb_release_data code, shinfo = skb->head + skb->end; parsing shinfo->frag_list then yields segs.
crash> sk_buff.head ffff9611b2a3f100
  head = 0xffff9612a277cc00 "\342J\343.\254b\275!"
crash> sk_buff.end ffff9611b2a3f100
  end = 704
crash> eval 0xffff9612a277cc00 + 704
hexadecimal: ffff9612a277cec0   <<<----- shinfo
    decimal: 18446627605512179392  (-116468197372224)
      octal: 1777774541124235747300
     binary: 1111111111111111100101100001001010100010011101111100111011000000
crash> skb_shared_info.frag_list ffff9612a277cec0
  frag_list = 0x106   <<<---- the bad pointer from the panic; the in-memory data now matches the crash
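The pointer arithmetic above can be sanity-checked mechanically. A small Python sketch of it (every constant is taken from the crash session above; nothing else is assumed):

```python
# Values read from the vmcore with the crash utility
skb_head = 0xffff9612a277cc00  # sk_buff.head
skb_end  = 704                 # sk_buff.end (an offset, not a pointer, on this 4.14 kernel)

# skb_shinfo(skb) is skb->head + skb->end
shinfo = skb_head + skb_end
assert hex(shinfo) == "0xffff9612a277cec0"

# The exception frame showed R14 = ffff9612a277cec0: the compiler kept
# shinfo cached in r14, so the register dump confirms the same address.
assert shinfo == 0xffff9612a277cec0
print(hex(shinfo))
```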
From the kernel's __alloc_skb code, the skb_shared_info that holds frag_list is allocated with kmalloc, in the same block as the packet data:
struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
			    int flags, int node)
{
	...
	/* Get the HEAD */
	skb = kmem_cache_alloc_node(cache, gfp_mask & ~__GFP_DMA, node);
	if (!skb)
		goto out;
	prefetchw(skb);

	/* We do our best to align skb_shared_info on a separate cache
	 * line. It usually works because kmalloc(X > SMP_CACHE_BYTES) gives
	 * aligned memory blocks, unless SLUB/SLAB debug is enabled.
	 * Both skb->head and skb_shared_info are cache line aligned.
	 */
	size = SKB_DATA_ALIGN(size);
	size += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
	data = kmalloc_reserve(size, gfp_mask, node, &pfmemalloc);   <<<---- skb_shared_info is allocated here; head and end are computed from this buffer
	...
}
From this code, the size passed to kmalloc_reserve is sk_buff.end + sizeof(struct skb_shared_info) = 1024, i.e. the buffer at 0xffff9612a277cc00 was allocated from the kmalloc-1024 slab cache.
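The size computation can be sketched in a few lines of Python. Assumptions not stated in the text: SMP_CACHE_BYTES = 64 and sizeof(struct skb_shared_info) = 320 on this build (the crash session only tells us end = 704 and that the total is 1024, and 320 is consistent with that):

```python
# Sketch of the __alloc_skb size computation, assuming
# SMP_CACHE_BYTES = 64 and sizeof(struct skb_shared_info) = 320.
SMP_CACHE_BYTES = 64

def skb_data_align(x):
    # SKB_DATA_ALIGN rounds up to the cache-line size
    return (x + (SMP_CACHE_BYTES - 1)) & ~(SMP_CACHE_BYTES - 1)

end = 704                # sk_buff.end from the crash session
shared_info_sz = 320     # assumed sizeof(struct skb_shared_info)

total = end + skb_data_align(shared_info_sz)
assert total == 1024     # lands in the kmalloc-1024 slab cache
print(total)
```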
So a field inside the skb_shared_info was changed to 0x106. There are two possible scenarios for corruption like this.
Memory-stomping problems like this are hard to attribute to one scenario or the other; the only reliable way is to reproduce with slub_debug=FZPU enabled and capture the most recent allocation and free stacks of this kmalloc-1024 object, which would pin down which scenario it is.
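For reference, a hedged sketch of how the option would be set (the path is an assumption and distro-dependent; the flag letters come from the kernel's SLUB documentation: F = sanity checks, Z = red zoning, P = poisoning, U = alloc/free user tracking):

```shell
# /etc/default/grub (assumed path; adjust for the distro), then regenerate
# the grub config and reboot:
GRUB_CMDLINE_LINUX="... slub_debug=FZPU"

# The kernel also accepts restricting debugging to one cache, which keeps
# the overhead down, e.g.:
#   slub_debug=FZPU,kmalloc-1024
```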
To find the other module using this kmalloc-1024 object, dump the raw memory at 0xffff9612a277cc00 and look for patterns.
crash> rd 0xffff9612a277cc00 128
ffff9612a277cc00:  21bd62ac2ee34ae2 ffff9612a277cc00   .J...b.!..w.....
ffff9612a277cc10:  0000000101487d43 0000000000000000   C}H.............
ffff9612a277cc20:  0000000000000000 0000000000000000   ................
ffff9612a277cc30:  ffff9612b54c9920 0000000000000000    .L.............
ffff9612a277cc40:  0000c8010000001c ffffffff00027003   .........p......
ffff9612a277cc50:  0000000000000001 ffffffff00000000   ................
ffff9612a277cc60:  00000000262a1800 0000000000000000   ..*&............
ffff9612a277cc70:  ffff96111aa82d00 ffff9612a277cc78   .-......x.w.....
ffff9612a277cc80:  ffff9612a277cc78 ffff9612a277cc88   x.w.......w.....
ffff9612a277cc90:  0000000000000000 0000000000000000   ................
ffff9612a277cca0:  0000000000000000 0000000000000000   ................
ffff9612a277ccb0:  0000000000000000 0000000000000000   ................
ffff9612a277ccc0:  ffff9622bb065000 ffff9622b5f3f800   .P.."......."...
ffff9612a277ccd0:  0000000101487861 04004e9614f2f7d4   axH..........N..
ffff9612a277cce0:  ffff9612b54c9960 58ba00001d64d97e   `.L.....~.d....X
ffff9612a277ccf0:  6f030b60735b8152 3400004500080be0   R.[s`..o....E..4
ffff9612a277cd00:  b5b5064000405d22 010b19267f0b1926   "]@.@...&...&...
ffff9612a277cd10:  19962441bc0c1ac5 96ce10805e23199f   ....A$....#^....
ffff9612a277cd20:  0a0801010000d862 c5b194890246180f   b.........F.....
ffff9612a277cd30:  0000000000000000 0000000000000000   ................
ffff9612a277cd40:  0000000000000000 020000182a26002a   ........*.&*....
ffff9612a277cd50:  0000000000000000 ffff9612a277cd48   ........H.w.....
ffff9612a277cd60:  0000000000000010 0000000000000000   ................
ffff9612a277cd70:  0000000000000000 ffff9612ba0f4500   .........E......
ffff9612a277cd80:  ffff9612bb8a5800 0000000000000000   .X..............
ffff9612a277cd90:  0000000000000000 ffff9612a277cd98   ..........w.....
ffff9612a277cda0:  ffff9612a277cd98 0000000fffffffe0   ..w.............
ffff9612a277cdb0:  ffff9612a277cdb0 ffff9612a277cdb0   ..w.......w.....
ffff9612a277cdc0:  ffffffffb4e2ff70 0000000000000000   p...............
ffff9612a277cdd0:  0000000000000000 0000000000000000   ................
ffff9612a277cde0:  ffffffffb48a46c0 ffff9612a277cda8   .F........w.....
ffff9612a277cdf0:  000000000020001c 0000000000000000   .. .............
ffff9612a277ce00:  0000000000000000 0000000000000000   ................
ffff9612a277ce10:  0000000000000000 0000000000000000   ................
ffff9612a277ce20:  0000000000000000 0000000101487861   ........axH.....
ffff9612a277ce30:  0000000500000000 0000000a00000000   ................
ffff9612a277ce40:  0000000000000001 ffff9612a277cd48   ........H.w.....
ffff9612a277ce50:  0000000000000000 0000000000000000   ................
ffff9612a277ce60:  0000000000000000 0000000000000000   ................
ffff9612a277ce70:  0000020000040000 ffff9612a277cc00   ..........w.....
ffff9612a277ce80:  ffff9612ba0f4500 ffffffffb4e33610   .E.......6......
ffff9612a277ce90:  0000000000000000 0000000000000000   ................
ffff9612a277cea0:  0000000000000000 0000000000000000   ................
ffff9612a277ceb0:  0000000000000000 0000000000000000   ................
ffff9612a277cec0:  0001000000000000 0000000000000106   ................
ffff9612a277ced0:  0000000000000000 0000000000000001   ................
ffff9612a277cee0:  0000000100000000 0000000000001000   ................
ffff9612a277cef0:  ffffdd8580e53800 0000003000000000   .8..........0...
ffff9612a277cf00:  0000000e89033000 0000000000001000   .0..............
ffff9612a277cf10:  0000000e89034000 0000000000001000   .@..............
ffff9612a277cf20:  0000000e89035000 0000000000001000   .P..............
ffff9612a277cf30:  0000000e89036000 0000000000001000   .`..............
ffff9612a277cf40:  0000000e89037000 0000000000001000   .p..............
ffff9612a277cf50:  0000000e891f0000 0000000000001000   ................
ffff9612a277cf60:  0000000e891f1000 0000000000001000   ................
ffff9612a277cf70:  0000000e891f2000 0000000000001000   . ..............
ffff9612a277cf80:  0000000e891f3000 0000000000001000   .0..............
ffff9612a277cf90:  0000000e891f4000 0000000000001000   .@..............
ffff9612a277cfa0:  0000000e891f5000 0000000000001000   .P..............
ffff9612a277cfb0:  0000000e891f6000 0000000000001000   .`..............
ffff9612a277cfc0:  0000000e891f7000 0000000000001000   .p..............
ffff9612a277cfd0:  0000000eb2538000 0000000000003000   ..S......0......
ffff9612a277cfe0:  ffffdd85b92e29c0 0000000000001000   .)..............
ffff9612a277cff0:  ffffdd85b9abb940 0000000000001000   @...............
From experience, several values in the dump (ffffffffb4e2ff70, ffffffffb48a46c0, ffffffffb4e33610) look like kernel function addresses. Resolving them:
crash> sym ffffffffb4e2ff70
ffffffffb4e2ff70 (T) scmd_eh_abort_handler /linux-iso/code/linux-dev/4.14.88/linux-4.14/drivers/scsi/scsi_error.c: 122
crash> sym ffffffffb48a46c0
ffffffffb48a46c0 (T) delayed_work_timer_fn /linux-iso/code/linux-dev/4.14.88/linux-4.14/kernel/workqueue.c: 1498
crash> sym ffffffffb4e33610
ffffffffb4e33610 (t) scsi_done /linux-iso/code/linux-dev/4.14.88/linux-4.14/drivers/scsi/scsi_lib.c: 1840
All of these functions belong to the I/O path; searching the kernel code that stores these function pointers suggests the memory once held a SCSI command structure (struct scsi_cmnd).
But parsing 0xffff9612a277cc00 directly as a scsi_cmnd produces obviously wrong data. Does that mean the analysis so far was mistaken?
crash> scsi_cmnd 0xffff9612a277cc00
struct scsi_cmnd {
  req = {
    __cmd = "\342J\343.\254b\275!\000\314w\242\022\226\377\377",
    cmd = 0x101487d43 <Address 0x101487d43 out of bounds>,   <<<----- cmd is clearly bogus
    cmd_len = 0,
    result = 0,
    sense_len = 0,
    resid_len = 0,
    retries = 0,
    sense = 0xffff9612b54c9920
  },
  device = 0x0,
  list = {
    next = 0xc8010000001c,    <<<<---- obviously not a valid list pointer
    prev = 0xffffffff00027003
  },
  eh_entry = {
    next = 0x1,               <<<<----- obviously not a valid list pointer
    prev = 0xffffffff00000000
  },
  abort_work = {
    work = {
      data = {
        counter = 640292864
      },
      entry = {
        next = 0x0,
        prev = 0xffff96111aa82d00
      },
      func = 0xffff9612a277cc78
    },
  ...
}
Parsing the scsi_done field gives a wrong value too, so the scsi_cmnd cannot start at 0xffff9612a277cc00.
crash> scsi_cmnd.scsi_done 0xffff9612a277cc00
  scsi_done = 0x0
Looking at the kernel block and SCSI layer code, the SCSI command does not get an allocation of its own: it shares the kmalloc-1024 object the block layer allocates for the request (struct request occupies the first 0x148 bytes, and the scsi_cmnd occupies the bytes after it).
So parse 0xffff9612a277cc00 as a struct request instead; now the data is essentially all correct:
crash> request 0xffff9612a277cc00
struct request {
  ...
  q = 0xffff9612b54c9920,
  mq_ctx = 0x0,
  cpu = 28,
  cmd_flags = 51201,
  rq_flags = 159747,
  internal_tag = -1,
  atomic_flags = 1,
  __data_len = 0,
  tag = -1,
  __sector = 640292864,
  bio = 0x0,
  biotail = 0xffff96111aa82d00,
  {
    hash = {
      next = 0xffff9612a277cc78,
      pprev = 0xffff9612a277cc78
    },
    ipi_list = {
      next = 0xffff9612a277cc78,
      prev = 0xffff9612a277cc78
    }
  },
  {
    rb_node = {
      __rb_parent_color = 18446627605512178824,
      rb_right = 0x0,
      rb_left = 0x0
    },
    special_vec = {
      bv_page = 0xffff9612a277cc88,
      bv_len = 0,
      bv_offset = 0
    },
    completion_data = 0xffff9612a277cc88,
    error_count = -1569207160
  },
  {
    elv = {
      icq = 0x0,
      priv = {0x0, 0x0}
    },
    flush = {
      seq = 0,
      list = {
        next = 0x0,
        prev = 0x0
      },
      saved_end_io = 0x0
    }
  },
  rq_disk = 0xffff9622bb065000,
  part = 0xffff9622b5f3f800,
  start_time = 4316493921,
  issue_stat = {
    stat = 288316782655240148
  },
  rl = 0xffff9612b54c9960,
  start_time_ns = 6393422621498988926,
  io_start_time_ns = 7999249872020275538,
  nr_phys_segments = 3040,
  nr_integrity_segments = 8,
  ioprio = 69,
  timeout = 4218146,
  special = 0x10b19267f0b1926,
  extra_len = 3154909893,
  write_hint = 9281,
  deadline = 10866641094412802463,
  timeout_list = {
    next = 0xa0801010000d862,
    prev = 0xc5b194890246180f
  },
  end_io = 0x0,
  end_io_data = 0x0,
  next_rq = 0x0
}
Parsing the request_queue it points to also checks out:
crash> request_queue 0xffff9612b54c9920
struct request_queue {
  queue_head = {
    next = 0xffff9612b54c9920,
    prev = 0xffff9612b54c9920
  },
  last_merge = 0x0,
  elevator = 0xffff9612b608d800,
  ...
  request_fn = 0xffffffffb4e33d00,
  make_request_fn = 0xffffffffb4bcebc0,
  prep_rq_fn = 0xffffffffb4e34d70,
  unprep_rq_fn = 0xffffffffb4e30d20,
  softirq_done_fn = 0xffffffffb4e33b90,
  rq_timed_out_fn = 0xffffffffb4e2fd00,
  dma_drain_needed = 0x0,
  lld_busy_fn = 0xffffffffb4e33530,
  init_rq_fn = 0xffffffffb4e31de0,
  exit_rq_fn = 0xffffffffb4e31b60,
  initialize_rq_fn = 0xffffffffb4e31bb0,
  mq_ops = 0x0,
  mq_map = 0x0,
  ...
}
Parsing the scsi_cmnd placed behind the request is consistent as well: 0xffff9612a277cc00 + sizeof(struct request) = 0xffff9612a277cc00 + 0x148 = ffff9612a277cd48
crash> scsi_cmnd ffff9612a277cd48
struct scsi_cmnd {
  req = {
    __cmd = "*\000&*\030\000\000\002\000\000\000\000\000\000\000",
    cmd = 0xffff9612a277cd48 "*",
    cmd_len = 16,
    result = 0,
    sense_len = 0,
    resid_len = 0,
    retries = 0,
    sense = 0xffff9612ba0f4500
  },
  device = 0xffff9612bb8a5800,
  ...
    timer = {
      entry = {
        next = 0x0,
        pprev = 0x0
      },
      expires = 0,
      function = 0xffffffffb48a46c0,
      data = 18446627605512179112,
      flags = 2097180
    },
    wq = 0x0,
    cpu = 0
  },
  ...
  request = 0xffff9612a277cc00,
  sense_buffer = 0xffff9612ba0f4500 "&\201\321lt\376", <incomplete sequence \363>,
  scsi_done = 0xffffffffb4e33610,
  ...
}
This confirms that the other user of the faulty slab object is the block layer's request (with the scsi_cmnd embedded behind it).
Computing the position of the corruption, the modified bytes sit at offset ffff9612a277cec8 - 0xffff9612a277cc00 = 0x2c8 from the start of the slab object:
crash> skb_shared_info.frag_list -xo ffff9612a277cec0
struct skb_shared_info {
  [ffff9612a277cec8] struct sk_buff *frag_list;
}
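The offset bookkeeping so far can be double-checked with a few lines of Python (all constants come from the crash session above):

```python
slab_obj  = 0xffff9612a277cc00  # start of the kmalloc-1024 object (skb->head)
shinfo    = slab_obj + 704      # skb_shared_info = head + end
frag_list = shinfo + 0x8        # address of the frag_list field inside skb_shared_info

assert frag_list == 0xffff9612a277cec8
# offset of the corrupted field from the start of the slab object
assert frag_list - slab_obj == 0x2c8
print(hex(frag_list - slab_obj))
```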
Mapping offset 0x2c8 into the request layout:
struct request itself is only 0x148 bytes; the space after it holds the scsi_cmnd (the kernel presumably lays them out together to improve I/O-path performance):
crash> struct request -xo
struct request {
    [0x0] struct list_head queuelist;
  ...
  [0x138] void *end_io_data;
  [0x140] struct request *next_rq;
}
SIZE: 0x148
So within the scsi_cmnd, the data at offset 0x2c8 - 0x148 = 0x180 is scsi_pointer->phase:
crash> struct -xo scsi_cmnd
struct scsi_cmnd {
    [0x0] struct scsi_request req;
  ...
  [0x100] unsigned char *cmnd;
  [0x108] struct scsi_data_buffer sdb;
  [0x120] struct scsi_data_buffer *prot_sdb;
  [0x128] unsigned int underflow;
  [0x12c] unsigned int transfersize;
  [0x130] struct request *request;
  [0x138] unsigned char *sense_buffer;
  [0x140] void (*scsi_done)(struct scsi_cmnd *);
  [0x148] struct scsi_pointer SCp;   <<<----- offset 0x180 falls in here; 0x180 - 0x148 = 0x38 into SCp
  [0x188] unsigned char *host_scribble;
  [0x190] int result;
  [0x194] int flags;
  [0x198] unsigned char tag;
}
SIZE: 0x1a0
crash> struct scsi_pointer -xo
struct scsi_pointer {
  ...
  [0x30] volatile int have_data_in;
  [0x34] volatile int sent_command;
  [0x38] volatile int phase;   <<<<---- lines up with frag_list in the skb
}
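Putting the three layouts together, the full offset chain from slab object to the stomped field can be written out (the struct sizes and offsets are the ones printed by crash above):

```python
SIZEOF_REQUEST = 0x148   # struct request size, from `struct request -xo`
OFF_SCP        = 0x148   # offset of SCp within struct scsi_cmnd
OFF_PHASE      = 0x38    # offset of phase within struct scsi_pointer

corrupt_off = 0x2c8      # offset of frag_list within the kmalloc-1024 object

# The same offset, seen from the scsi_cmnd that shares the object...
off_in_cmd = corrupt_off - SIZEOF_REQUEST
assert off_in_cmd == 0x180
# ...and within SCp it is exactly the phase field
assert off_in_cmd - OFF_SCP == OFF_PHASE
print(hex(off_in_cmd))
```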
phase is filled in by the low-level RAID driver while an I/O is in flight. On the normal submission path the scsi_cmnd is handed to the RAID driver, which also saves that pointer so it can fill in data when the I/O completes. If the driver still references the pointer after the I/O has already completed and the command has been freed, it can cause exactly the problem we hit.
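A minimal sketch of this failure mode, modeling the slab object as a plain byte buffer (the steps and constants mirror the analysis above; the variable names are invented for illustration):

```python
# Toy model of the use-after-free: the 1024-byte slab object is freed by
# the block layer, reused as an skb data buffer, and then a stale driver
# write to SCp.phase lands on skb_shared_info.frag_list.
AAC_OWNER_FIRMWARE = 0x106            # value the driver writes into SCp.phase
PHASE_OFF = 0x148 + 0x148 + 0x38      # request + SCp offset + phase = 0x2c8

slab_obj = bytearray(1024)            # stands in for the kmalloc-1024 object

# 1. Object is now owned by the skb: frag_list (head + 704 + 8) is NULL.
frag_list_off = 704 + 8
assert frag_list_off == PHASE_OFF     # the two fields overlap exactly

# 2. Stale driver write through the dangling scsi_cmnd pointer.
slab_obj[PHASE_OFF:PHASE_OFF + 4] = AAC_OWNER_FIRMWARE.to_bytes(4, "little")

# 3. skb teardown now reads frag_list == 0x106 and dereferences it -> oops.
frag_list = int.from_bytes(slab_obj[frag_list_off:frag_list_off + 8], "little")
assert frag_list == 0x106
print(hex(frag_list))
```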
The low-level RAID driver on this machine is aacraid, and its code sets phase to 0x106 in many places:
aacraid.h:2633:#define AAC_OWNER_FIRMWARE 0x106
aachba.c:597:   scsicmd->SCp.phase = AAC_OWNER_FIRMWARE;
aachba.c:727:   scsicmd->SCp.phase = AAC_OWNER_FIRMWARE;
aachba.c:767:   scsicmd->SCp.phase = AAC_OWNER_FIRMWARE;
aachba.c:1134:  scsicmd->SCp.phase = AAC_OWNER_FIRMWARE;
aachba.c:2342:  scsicmd->SCp.phase = AAC_OWNER_FIRMWARE;
aachba.c:2434:  scsicmd->SCp.phase = AAC_OWNER_FIRMWARE;
aachba.c:2516:  if (cmd->SCp.phase == AAC_OWNER_FIRMWARE) {
aachba.c:2591:  scsicmd->SCp.phase = AAC_OWNER_FIRMWARE;
aachba.c:2668:  scsicmd->SCp.phase = AAC_OWNER_FIRMWARE;
aachba.c:3693:  scsicmd->SCp.phase = AAC_OWNER_FIRMWARE;
aachba.c:3738:  scsicmd->SCp.phase = AAC_OWNER_FIRMWARE;
At this point our in-house environment, booted with slub_debug=FZPU, also reproduced the problem. Below is the captured log:
Jan 12 16:11:01 cvk126 kernel: [57117.948006] =============================================================================
Jan 12 16:11:01 cvk126 kernel: [57117.948014] BUG kmalloc-1024 (Tainted: G B OE ): Poison overwritten
Jan 12 16:11:01 cvk126 kernel: [57117.948016] -----------------------------------------------------------------------------
Jan 12 16:11:01 cvk126 kernel: [57117.948016]
Jan 12 16:11:01 cvk126 kernel: [57117.948019] INFO: 0xffff9561784602d0-0xffff9561784602d3. First byte 0x6 instead of 0x6b
Jan 12 16:11:01 cvk126 kernel: [57117.948028] INFO: Allocated in alloc_request_size+0x2b/0x70 age=36 cpu=1 pid=3011   // allocation stack
Jan 12 16:11:01 cvk126 kernel: [57117.948034]  __slab_alloc+0x1c/0x30
Jan 12 16:11:01 cvk126 kernel: [57117.948036]  __kmalloc_node+0xb7/0x2a0
Jan 12 16:11:01 cvk126 kernel: [57117.948038]  alloc_request_size+0x2b/0x70
Jan 12 16:11:01 cvk126 kernel: [57117.948045]  mempool_alloc+0x67/0x1a0
Jan 12 16:11:01 cvk126 kernel: [57117.948049]  get_request+0x3b6/0x770
Jan 12 16:11:01 cvk126 kernel: [57117.948052]  blk_queue_bio+0x114/0x400
Jan 12 16:11:01 cvk126 kernel: [57117.948054]  generic_make_request+0x123/0x300
Jan 12 16:11:01 cvk126 kernel: [57117.948056]  submit_bio+0x6c/0x140
Jan 12 16:11:01 cvk126 kernel: [57117.948075]  ext4_writepages+0x463/0xf40
Jan 12 16:11:01 cvk126 kernel: [57117.948078]  do_writepages+0x4b/0xe0
Jan 12 16:11:01 cvk126 kernel: [57117.948080]  __filemap_fdatawrite_range+0xc1/0x100
Jan 12 16:11:01 cvk126 kernel: [57117.948083] INFO: Freed in __blk_put_request+0xb7/0x190 age=36 cpu=3 pid=0   // free stack
Jan 12 16:11:01 cvk126 kernel: [57117.948085]  __blk_put_request+0xb7/0x190
Jan 12 16:11:01 cvk126 kernel: [57117.948091]  scsi_end_request+0xf7/0x1e0
Jan 12 16:11:01 cvk126 kernel: [57117.948093]  scsi_io_completion+0x237/0x650
Jan 12 16:11:01 cvk126 kernel: [57117.948096]  blk_done_softirq+0x8e/0xc0
Jan 12 16:11:01 cvk126 kernel: [57117.948101]  __do_softirq+0xde/0x2b3
Jan 12 16:11:01 cvk126 kernel: [57117.948105]  irq_exit+0xae/0xb0
Jan 12 16:11:01 cvk126 kernel: [57117.948129]  start_secondary+0x1ac/0x200
Jan 12 16:11:01 cvk126 kernel: [57117.948132]  secondary_startup_64+0xa5/0xb0
Jan 12 16:11:01 cvk126 kernel: [57117.948135] INFO: Slab 0xffffdf1680e11800 objects=24 used=24 fp=0x (null) flags=0x17ffffc0008100
Jan 12 16:11:01 cvk126 kernel: [57117.948137] INFO: Object 0xffff956178460008 @offset=8 fp=0xffff956178461fe8
Jan 12 16:11:01 cvk126 kernel: [57117.948137]
Jan 12 16:11:01 cvk126 kernel: [57117.948140] Redzone ffff956178460000: bb bb bb bb bb bb bb bb                          ........
Jan 12 16:11:01 cvk126 kernel: [57117.948142] Object ffff956178460008: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
Jan 12 16:11:01 cvk126 kernel: [57117.948145] Object ffff956178460018: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
....
Jan 12 16:11:01 cvk126 kernel: [57117.948234] Object ffff9561784602a8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
Jan 12 16:11:01 cvk126 kernel: [57117.948236] Object ffff9561784602b8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
Jan 12 16:11:01 cvk126 kernel: [57117.948238] Object ffff9561784602c8: 6b 6b 6b 6b 6b 6b 6b 6b 06 01 00 00 6b 6b 6b 6b  kkkkkkkk....kkkk   // the overwritten bytes that were detected
Jan 12 16:11:01 cvk126 kernel: [57117.948240] Object ffff9561784602d8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
....
Jan 12 16:11:01 cvk126 kernel: [57117.948279] Object ffff9561784603f8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5  kkkkkkkkkkkkkkk.
Jan 12 16:11:01 cvk126 kernel: [57117.948282] Redzone ffff956178460408: bb bb bb bb bb bb bb bb                          ........
Jan 12 16:11:01 cvk126 kernel: [57117.948284] Padding ffff956178460548: 5a 5a 5a 5a 5a 5a 5a 5a                          ZZZZZZZZ
Jan 12 16:11:01 cvk126 kernel: [57117.948287] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G B OE 4.14.0-generic #1
Jan 12 16:11:01 cvk126 kernel: [57117.948288] Hardware name: New H3C Technologies Co., Ltd. UIS R390X G2/RS32M2C9S, BIOS 1.00.23P01 07/07/2017
Jan 12 16:11:01 cvk126 kernel: [57117.948288] Call Trace:
Jan 12 16:11:01 cvk126 kernel: [57117.948289]  <IRQ>
Jan 12 16:11:01 cvk126 kernel: [57117.948294]  dump_stack+0x5c/0x85   // after the free, the object was handed to an skb buffer; the corruption was caught on this next allocation
Jan 12 16:11:01 cvk126 kernel: [57117.948295]  check_bytes_and_report+0xbc/0xe0
Jan 12 16:11:01 cvk126 kernel: [57117.948297]  check_object+0x230/0x2a0
Jan 12 16:11:01 cvk126 kernel: [57117.948301]  ? pskb_expand_head+0x71/0x2a0
Jan 12 16:11:01 cvk126 kernel: [57117.948302]  alloc_debug_processing+0x106/0x1b0
Jan 12 16:11:01 cvk126 kernel: [57117.948303]  ___slab_alloc+0x441/0x4b0
Jan 12 16:11:01 cvk126 kernel: [57117.948304]  ? pskb_expand_head+0x71/0x2a0
Jan 12 16:11:01 cvk126 kernel: [57117.948305]  ? ___slab_alloc+0x479/0x4b0
Jan 12 16:11:01 cvk126 kernel: [57117.948306]  ? ___slab_alloc+0x479/0x4b0
Jan 12 16:11:01 cvk126 kernel: [57117.948307]  ? pskb_expand_head+0x71/0x2a0
Jan 12 16:11:01 cvk126 kernel: [57117.948308]  ? __slab_alloc+0x1c/0x30
Jan 12 16:11:01 cvk126 kernel: [57117.948314]  ? __nla_reserve+0x38/0x50
Jan 12 16:11:01 cvk126 kernel: [57117.948315]  __slab_alloc+0x1c/0x30
Jan 12 16:11:01 cvk126 kernel: [57117.948317]  __kmalloc_node_track_caller+0xb9/0x2c0
Jan 12 16:11:01 cvk126 kernel: [57117.948339]  __kmalloc_reserve.isra.43+0x2e/0x80
Jan 12 16:11:01 cvk126 kernel: [57117.948340]  pskb_expand_head+0x71/0x2a0
Jan 12 16:11:01 cvk126 kernel: [57117.948344]  netlink_trim+0xa8/0xc0
Jan 12 16:11:01 cvk126 kernel: [57117.948346]  netlink_unicast+0x4c/0x240
Jan 12 16:11:01 cvk126 kernel: [57117.948352]  queue_userspace_packet+0x2d4/0x5c0 [openvswitch]
Jan 12 16:11:01 cvk126 kernel: [57117.948355]  ovs_dp_upcall+0x50/0x60 [openvswitch]
Jan 12 16:11:01 cvk126 kernel: [57117.948357]  ovs_dp_process_packet+0x12b/0x150 [openvswitch]
Jan 12 16:11:01 cvk126 kernel: [57117.948361]  ovs_vport_receive+0x6c/0xc0 [openvswitch]
Jan 12 16:11:01 cvk126 kernel: [57117.948362]  ? secondary_startup_64+0xa5/0xb0
Jan 12 16:11:01 cvk126 kernel: [57117.948366]  ? arp_process+0x13c/0x820
Jan 12 16:11:01 cvk126 kernel: [57117.948367]  ? set_track+0xb5/0x130
Jan 12 16:11:01 cvk126 kernel: [57117.948372]  ? init_object+0x76/0x90
Jan 12 16:11:01 cvk126 kernel: [57117.948378]  ? get_stack_info+0x37/0x160
Jan 12 16:11:01 cvk126 kernel: [57117.948379]  ? common_interrupt+0x7d/0x7d
Jan 12 16:11:01 cvk126 kernel: [57117.948384]  ? __module_text_address+0xe/0x60
Jan 12 16:11:01 cvk126 kernel: [57117.948388]  ? start_kernel+0x4db/0x4fb
Jan 12 16:11:01 cvk126 kernel: [57117.948393]  ? is_bpf_text_address+0xa/0x20
Jan 12 16:11:01 cvk126 kernel: [57117.948399]  ? unwind_next_frame+0xf4/0x660
Jan 12 16:11:01 cvk126 kernel: [57117.948401]  ? set_track+0xb5/0x130
Jan 12 16:11:01 cvk126 kernel: [57117.948402]  ? init_object+0x76/0x90
Jan 12 16:11:01 cvk126 kernel: [57117.948405]  netdev_frame_hook+0x9e/0x140 [openvswitch]
Jan 12 16:11:01 cvk126 kernel: [57117.948407]  ? ___slab_alloc+0x479/0x4b0
Jan 12 16:11:01 cvk126 kernel: [57117.948410]  __netif_receive_skb_core+0x25c/0xb50
Jan 12 16:11:01 cvk126 kernel: [57117.948412]  ? rcu_process_callbacks+0x1b2/0x4c0
Jan 12 16:11:01 cvk126 kernel: [57117.948414]  ? __build_skb+0x25/0xe0
Jan 12 16:11:01 cvk126 kernel: [57117.948416]  ? netif_receive_skb_internal+0x34/0xd0
Jan 12 16:11:01 cvk126 kernel: [57117.948417]  netif_receive_skb_internal+0x34/0xd0
Jan 12 16:11:01 cvk126 kernel: [57117.948419]  napi_gro_receive+0xb8/0xe0
Jan 12 16:11:01 cvk126 kernel: [57117.948424]  igb_poll+0x466/0xe70 [igb]
Jan 12 16:11:01 cvk126 kernel: [57117.948426]  ?
common_interrupt+0x7d/0x7d Jan 12 16:11:01 cvk126 kernel: [57117.948428] net_rx_action+0x149/0x3c0 Jan 12 16:11:01 cvk126 kernel: [57117.948429] __do_softirq+0xde/0x2b3 Jan 12 16:11:01 cvk126 kernel: [57117.948431] irq_exit+0xae/0xb0 Jan 12 16:11:01 cvk126 kernel: [57117.948432] do_IRQ+0x81/0xd0 Jan 12 16:11:01 cvk126 kernel: [57117.948434] common_interrupt+0x7d/0x7d Jan 12 16:11:01 cvk126 kernel: [57117.948434] </IRQ> Jan 12 16:11:01 cvk126 kernel: [57117.948436] RIP: 0010:cpuidle_enter_state+0xa2/0x2e0 Jan 12 16:11:01 cvk126 kernel: [57117.948437] RSP: 0018:ffffffff89e03e90 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffb4 Jan 12 16:11:01 cvk126 kernel: [57117.948439] RAX: ffff95617f822cc0 RBX: 000033f2ceffe9f3 RCX: 000000000000001f Jan 12 16:11:01 cvk126 kernel: [57117.948440] RDX: 000033f2ceffe9f3 RSI: ffff64404f9e7f32 RDI: 0000000000000000 Jan 12 16:11:01 cvk126 kernel: [57117.948440] RBP: ffffd42b400010b0 R08: 0000000000000e06 R09: 0000000000000c1b Jan 12 16:11:01 cvk126 kernel: [57117.948441] R10: ffffffff89e03e70 R11: 0000000000000b28 R12: 0000000000000004 Jan 12 16:11:01 cvk126 kernel: [57117.948442] R13: ffffffff89f7af58 R14: 0000000000000000 R15: 000033f2ced8b4bf Jan 12 16:11:01 cvk126 kernel: [57117.948444] do_idle+0x181/0x1e0 Jan 12 16:11:01 cvk126 kernel: [57117.948446] cpu_startup_entry+0x6f/0x80 Jan 12 16:11:01 cvk126 kernel: [57117.948447] start_kernel+0x4db/0x4fb Jan 12 16:11:01 cvk126 kernel: [57117.948449] secondary_startup_64+0xa5/0xb0 Jan 12 16:11:01 cvk126 kernel: [57117.948451] FIX kmalloc-1024: Restoring 0xffff9561784602d0-0xffff9561784602d3=0x6b //恢复修改的内存 Jan 12 16:11:01 cvk126 kernel: [57117.948451] Jan 12 16:11:01 cvk126 kernel: [57117.948454] FIX kmalloc-1024: Marking all objects used
Based on these stack traces, combined with the earlier analysis, the root cause should be that after the request had been freed, the underlying aacraid driver still held a reference into it and modified the phase field inside scsi_cmd.
At this point the fault very likely originated in the aacraid HBA driver, so the aacraid driver was upgraded to the latest version supplied by the vendor; after extended testing the fault no longer reproduced.
原文地址:https://www.cnblogs.com/wendaorenwei/p/10524893.html