【线上维护的资料】中供JVM
Crash命令列表top -H -b -n 1 -p
$pidecho "ibase=10;obase=16;$pid"
| bcgrep xx
jstack.logps auxf | grep
$gidnetstat -nal | grep
1521lsof -i:1521jpsjstat -gcutil $pid
1000jstack $pid >
xx.logjmap -heap
$pidjmap
-dump:format=b,file=heap.bin $pid1.
故障反馈机制(单独去讲,最近的故障分析,需要有人定期Review团队故障例子)2.
Dump脚本(康康已经发布)3. 分析环境(胖胖搞好了)4. 典型Case4.1 JVM Crash排查步骤Crash日志位置:zeus:/home/admin/output/java_error8921.log;Martini:/home/admin/output/logs/jetty/jetty_stdout.logjstack案例说明Martini P2故障老中供JVM Bug,热点编译+GC Bug。java
version 1.6.0_11, 1.6.0_184.2 OOM 异常dump.sh(jstack,
bin)Martini图片、消息盒子Load大数据4.3 CPU Load飙升sudo -u admin top -H -b -n
1 -p $pid4.4 连接池爆满netstat -nal | grep
1521lsof -i:xxxx公共部分:echo
"ibase=10;obase=16;$pid" | bcjstack pid >
jstack.loggrep 16进制的pid -i
jstack.log4.5 文件修订jar tvf web.war | grep
esb-client-provider.xmljar xvf web.war
WEB-INF/esb/esb-client-provider.xmlvim
WEB-INF/esb/esb-client-provider.xmljar uvf web.war
WEB-INF/esb/esb-client-provider.xml不给出固定脚本的意图在于,大家确认自己知道在做哪些事情。编写一个观察线上应用启动过程的脚本[huiling.fenghl@cm3a14-crm-martini-admin-xen admin]$ cd
tmp[huiling.fenghl@cm3a14-crm-martini-admin-xen tmp]$ mkdir -p
/home/huiling.fenghl/start_log_02[huiling.fenghl@cm3a14-crm-martini-admin-xen tmp]$ sh startUp.sh
/home/huiling.fenghl/start_log_011.
观察线程存活时间,并排序cat top.* | awk ‘{print
$11}‘ | sort -n -r | uniq | head2:33.082. 观察线程出现次数cat top.* | awk ‘{print
$1}‘ | sort | uniq -c | sort -n -r| head -n503. 观察线程对应名称cat top.* | awk ‘{print
$1}‘ | sort | uniq -c | sort -n -r| head -n200 取线程出现次数,按最大排序|awk ‘{print $2}‘ | xargs
--replace echo "ibase=10;obase=16;{}" | bc 按线程id换算16进制线程标识| xargs --replace grep
"0x{}" -i all.jstack -m1 取得线程标识对应的线程名称4.
根据上述结果,汇集一个视图1). pid count
出现次数2). pid Time
存活时间3). pid 0x16进制标识-- >
thread name4). merge 结果5.
startUp.sh内容[huiling.fenghl@cm3a14-crm-martini-admin-xen tmp]$ cat
startUp.sh#!/bin/bashpid=`sudo -u admin jps |
awk ‘{if($2=="start.jar" || ($2=="Main")){print $1}}‘`log_path=$1min=0while(($min<60))do echo
"$pid: $min" name=`date
+%Y-%m-%d-%H-%M-%S` sudo
-u admin jstack -l $pid > $log_path/jstack.$name.log sudo
-u admin top -H -b -n 1 -p $pid > $log_path/top.$name.log sleep
5 min=$(($min+1))doneecho
"$pid"原文地址:http://www.cnblogs.com/xuelu/p/3731949.html