Trace file /u01/oracle/diag/rdbms/cn100/cn100/trace/cn100_ora_10848.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining,
Oracle Database Vault and Real Application Testing options
ORACLE_HOME = /u01/oracle/product/11.2.0
System name: Linux
Node name: 192oracle.cn100.com
Release: 2.6.32-358.el6.x86_64
Version: #1 SMP Fri Feb 22 00:31:26 UTC 2013
Machine: x86_64
Instance name: cn100
Redo thread mounted by this instance: 1
Oracle process number: 26
Unix process pid: 10848, image: oracle@192oracle.cn100.com (TNS V1-V3)
*** 2014-07-01 11:16:36.260
*** SESSION ID:(61.13360) 2014-07-01 11:16:36.260
*** CLIENT ID:() 2014-07-01 11:16:36.260
*** SERVICE NAME:(SYS$USERS) 2014-07-01 11:16:36.260
*** MODULE NAME:(SQL*Plus) 2014-07-01 11:16:36.260
*** ACTION NAME:() 2014-07-01 11:16:36.260
dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x0, level=3,
mask=0x0)
----- Error Stack Dump -----
ORA-01438: value larger than specified precision allowed for this column
----- Current SQL Statement for this session (sql_id=b8n03s73k7d39) -----
--可以看到,当前SQL就在这一行下面
INSERT INTO DH_T VALUES (:B2 ,:B1 )
----- PL/SQL Stack -----
----- PL/SQL Call Stack -----
object line object
handle number name
0x1075fcd10 6 procedure DBMON.P_DH1
0xfcfaebe8 7 procedure DBMON.P_DH2
0x10e7d6420 1 anonymous block
----- Call Stack Trace -----
calling call entry argument values in hex
location type point (? means dubious value)
-------------------- -------- -------------------- ----------------------------
skdstdst()+36 call kgdsdst() 000000000 ? 000000000 ?
7FFF332C8AD8 ? 000000001 ?
7FFF332CCFD8 ? 000000000 ?
......为了排版,后续省略......
Trace file /u01/oracle/diag/rdbms/cn100/cn100/trace/cn100_ora_10848.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining,
Oracle Database Vault and Real Application Testing options
ORACLE_HOME = /u01/oracle/product/11.2.0
System name: Linux
Node name: 192oracle.cn100.com
Release: 2.6.32-358.el6.x86_64
Version: #1 SMP Fri Feb 22 00:31:26 UTC 2013
Machine: x86_64
Instance name: cn100
Redo thread mounted by this instance: 1
Oracle process number: 26
Unix process pid: 10848, image: oracle@192oracle.cn100.com (TNS V1-V3)
*** 2014-07-01 11:16:36.260
*** SESSION ID:(61.13360) 2014-07-01 11:16:36.260
*** CLIENT ID:() 2014-07-01 11:16:36.260
*** SERVICE NAME:(SYS$USERS) 2014-07-01 11:16:36.260
*** MODULE NAME:(SQL*Plus) 2014-07-01 11:16:36.260
*** ACTION NAME:() 2014-07-01 11:16:36.260
dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x0, level=3,
mask=0x0)
----- Error Stack Dump -----
ORA-01438: value larger than specified precision allowed for this column
----- Current SQL Statement for this session (sql_id=b8n03s73k7d39) -----
--可以看到,当前SQL就在这一行下面
INSERT INTO DH_T VALUES (:B2 ,:B1 )
----- PL/SQL Stack -----
----- PL/SQL Call Stack -----
object line object
handle number name
0x1075fcd10 6 procedure DBMON.P_DH1
0xfcfaebe8 7 procedure DBMON.P_DH2
0x10e7d6420 1 anonymous block
----- Call Stack Trace -----
calling call entry argument values in hex
location type point (? means dubious value)
-------------------- -------- -------------------- ----------------------------
skdstdst()+36 call kgdsdst() 000000000 ? 000000000 ?
7FFF332C8AD8 ? 000000001 ?
7FFF332CCFD8 ? 000000000 ?
......为了排版,后续省略......
注意上面跟踪文件中PL/SQL标红那一部分,就是我们关注的部分。
如果进行errorstack跟踪式,跟踪进程执行的是一个PL/SQL调用,那么PL/SQL调用堆也将被跟踪下来(在PL/SQL Call Stack部分)。这部分告诉你错误发生时Oracle在执行具体哪个PL/SQL过程(包or函数)以及errorstack跟踪过程中的具体哪个调用发生错误。这对我们诊断问题非常有帮助
PL/SQL Call Stack包含三列,如下
- object handle
- line number
- object name
下面我们一一介绍这三列的含义:
1、object handle
object handle是这个对象(PL/SQL过程、包、函数、匿名块)被load进library cache中的内存地址,可以通过这个映射地址和X$KGLOB.KGLHDADR表列关联起来以发现那个对象是正在被处理。如下
SQL> select kglnaown,kglnaobj,kglhdadr from X$KGLOB a where KGLHDADR=‘00000001075FCD10‘;
KGLNAOWN KGLNAOBJ KGLHDADR
---------- ---------- ----------------
DBMON P_DH1 00000001075FCD10
2、line number
这个是非常重要的信息,它将告诉你当errorstack调用发生时正在执行的PL/SQL代码(可以定位到代码中的具体行)。例如,在如上的输出中,在这个匿名块的第1行调用了DBMON.P_DH2存储过程,而DBMON.P_DH2存储过程在第7行调用了另外一个存储过程DBMON.P_DH1,当errorstack跟踪发生时正在执行DBMON.P_DH2存储过程中的第6行代码。
3、object name
PL/SQL存储的对象名(或者匿名块,当对象并没有存储在一个过程中),如果是匿名块(匿名块的文本可以通过V$SQL发现),你可以关联这个地址和V$SQL.ADDRESS来发现匿名块的文本信息。
以上的PL/SQL call stack仅仅包含三行。
0x1075fcd10 6 procedure DBMON.P_DH1
0xfcfaebe8 7 procedure DBMON.P_DH2
0x10e7d6420 1 anonymous
block
应该从下而上来阅读一个PL/SQL call stack,例如
1.底部的行可以告诉我们正在执行一个匿名块以及在这个匿名块的第一行,它在调用DBMON.P_DH2存储过程
2.第二行可以告诉我们DBMON.P_DH2存储过程在第7行调用了另外一个存储过程DBMON.P_DH1
3.DBMON.P_DH2存储过程中的第6行代码出现错误,errorstack信息被转储。
通过查询DBA_SOURCE,我们可以与errorstack跟踪文件中的PL/SQL call stack部分信息进行验证,如下。
SQL> select line, text from dba_source where owner = ‘DBMON‘ and name = ‘P_DH2‘ order by line asc;
LINE TEXT
---------- ------------------------------------------------------------
1 procedure p_dh2 as
2 v_cnt number;
3 begin
4 ----just for errorstack test
5 select count(*) into v_cnt from dh_t;
6 dbms_output.put_line(‘the dh_t count is ‘||v_cnt);
7 p_dh1;
8 end;
9
9 rows selected.
SQL> select line, text from dba_source where owner = ‘DBMON‘ and name = ‘P_DH1‘ order by line asc;
LINE TEXT
---------- ------------------------------------------------------------
1 procedure p_dh1 as
2 v_id number :=1234335;
3 v_name varchar2(200) :=‘oradh‘;
4 begin
5 --just for errorstack test
6 insert into dh_t values (v_id,v_name);
7 commit;
8 end;
9
9 rows selected.
你可以发现会话正在执行的PL/SQL第6行(一个insert语句导致错误)。
通常,当error dump,crash,hang发生时(顶部的行是”parent" function递归调用的“child”function正在执行的代码),PL/SQL errorstack告诉我们精确的PL/SQL code。
3、从Errorstack跟踪文件中发现当前bind variable value
为什么找到具体的语句后,我们还需要寻找具体的绑定变量值??可以归纳为如下四种原因
- 一个会话可能以某种方式变的非常消耗CPU,并且会话的wait等待时间没有任何意义.
- 你需要调查什么SQL正在被执行,并且你需要查看SQL带有的绑定变量
- SQL的执行计划是正常的,但是性能却非常低下
- 可以假设当某些表或者行源变的大的时候,存在数据倾斜,CBO没有计算出正确的执行计划。
因此,你需要知道当问题发生时SQL使用的绑定变量是什么,不幸的是Oracle中并没有一个V$视图让我们去查看某个session的当前绑定变量值。V$SQL_BIND_CAPTURE视图仅仅随机的采样绑定变量值,并不存储所有的被使用的绑定变量值,而dbms_xplan.display_cursor中显示的也只是第一次窥探的绑定变量值.
Oracle 11gR2中实时的SQL Monitoring特性能够达到此目的。在V$SQL_MONITOR中有一列BIND_XML,此列包含正在运行的足够长时间(默认占用CPU超过5s的SQL,都会出现在次视图中)的bind variable values.但是这个只有在11gR2并且具有Diag+Tuning pack licenses时才有效。
注意:由于SQL语句的绑定变量值存在于进程的PGA中的私有内存中,因此不能轻易的跟踪另一个进程私有内存。errorstack跟踪文件中中包含CURSORDUMP,也就包含我们想要得到的bind variable value。
我们继续看开始的跟踪文件,如下
Trace file /u01/oracle/diag/rdbms/cn100/cn100/trace/cn100_ora_10848.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP, Data Mining,
Oracle Database Vault and Real Application Testing options
ORACLE_HOME = /u01/oracle/product/11.2.0
System name: Linux
Node name: 192oracle.cn100.com
Release: 2.6.32-358.el6.x86_64
Version: #1 SMP Fri Feb 22 00:31:26 UTC 2013
Machine: x86_64
Instance name: cn100
Redo thread mounted by this instance: 1
Oracle process number: 26
Unix process pid: 10848, image: oracle@192oracle.cn100.com (TNS V1-V3)
*** 2014-07-01 11:16:36.260
*** SESSION ID:(61.13360) 2014-07-01 11:16:36.260
*** CLIENT ID:() 2014-07-01 11:16:36.260
*** SERVICE NAME:(SYS$USERS) 2014-07-01 11:16:36.260
*** MODULE NAME:(SQL*Plus) 2014-07-01 11:16:36.260
*** ACTION NAME:() 2014-07-01 11:16:36.260
dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x0, level=3,
mask=0x0)
----- Error Stack Dump -----
ORA-01438: value larger than specified precision allowed for this column
----- Current SQL Statement for this session (sql_id=b8n03s73k7d39) -----
--可以看到,当前SQL就在这一行下面
INSERT INTO DH_T VALUES (:B2 ,:B1 )
----- PL/SQL Stack -----
----- PL/SQL Call Stack -----
object line object
handle number name
0x1075fcd10 6 procedure DBMON.P_DH1
0xfcfaebe8 7 procedure DBMON.P_DH2
0x10e7d6420 1 anonymous block
----- Call Stack Trace -----
calling call entry argument values in hex
location type point (? means dubious value)
-------------------- -------- -------------------- ----------------------------
skdstdst()+36 call kgdsdst() 000000000 ? 000000000 ?
7FFF332C8AD8 ? 000000001 ?
7FFF332CCFD8 ? 000000000 ?
......为了排版,后续省略......
打开跟踪文件,通常第一步做的是搜索第一个"Session Cursor Dump",当搜索它的时候,将看到如下的输出
----- Session Cursor Dump -----
Current cursor: 2, pgadep=1
Open cursors(pls, sys, hwm, max): 3(1, 1, 64, 1000)
NULL=0 SYNTAX=0 PARSE=0 BOUND=3 FETCH=0 ROW=0
Cached frame pages(total, free):
4k(11, 8), 8k(0, 0), 16k(0, 0), 32k(0, 0)
----- Current Cursor -----
xsc=0x7f5227898580 ctx=0xf92d7aa8 pgactx=0xf92d7aa8 ctxcbk=0xf92d74f8 ctxqbc=(nil)
ctxrws=0x10293a4c0
----- Explain Plan Dump -----
----- Compact Format (Stream) -----
Dumping stream from 0 to 22
----------------------------------
0000: 143 137 23 1 1 0 105 0 0 0 1 0 0 0 0 21 9 4 6 7
......i.............
0020: 0 142 ..
----- Plan Table -----
============
Plan Table
============
--------------------------------------------+-----------------------------------+
| Id | Operation | Name | Rows | Bytes | Cost | Time
|
--------------------------------------------+-----------------------------------+
| 0 | INSERT STATEMENT | | | | 1 | |
| 1 | LOAD TABLE CONVENTIONAL | | | | | |
--------------------------------------------+-----------------------------------+
Content of other_xml column
===========================
db_version : 11.2.0.1
parse_schema : DBMON
plan_hash : 0
plan_hash_2 : 0
Compilation Environment Dump
optimizer_mode_hinted = false
optimizer_features_hinted = 0.0.0
parallel_execution_enabled = true
parallel_query_forced_dop = 0
......为了排版,后续省略......
我们可以发现如下有用的内容
- Current cursor: 2,这个告诉我们在这个会话的UGA打开的cursor中,当前正在执行Cursor# 6 游标
- the pgadep:1变量告诉我们PGA depth,也就是这个查询执行的递归的深度
如果pgadep为0,它意味着这个查询是一个top-level查询,正在被用户或者应用通过OCI接口在执行。
pgadep为1,意味着它是一个递归的查询,通过递归程序接口(RPI)来执行,可能是数据字典查询或者仅仅通过PL/SQL调用执行的SQL。
所以,能够发现当前正在执行的当前查询的绑定变量值,我们需要做的是在trace file中朝前搜索Cursor2#,如下
注意:这个搜索词是大小写敏感的。
Cursor#2(0x7f5227951aa0) state=BOUND curiob=0x7f5227898580
curflg=cd fl2=0 par=(nil) ses=0x129a8edc0
----- Dump Cursor sql_id=b8n03s73k7d39
xsc=0x7f5227898580 cur=0x7f5227951aa0 -----
LibraryHandle: Address=f68ce8d8 Hash=c723b469 LockMode=N PinMode=0 LoadLockMode=0 Status=VALD
ObjectName: Name=INSERT INTO DH_T VALUES (:B2 ,:B1 )
FullHashValue=38f6bac85b76f427b45003c1c723b469 Namespace=SQL AREA(00) Type=CURSOR(00)
Identifier=3341005929 OwnerIdn=148
Statistics: InvalidationCount=0 ExecutionCount=1 LoadCount=2 ActiveLocks=1 TotalLockCount=1 TotalPinCount=1
Counters: BrokenCount=1 RevocablePointer=1 KeepDependency=1 KeepHandle=1 BucketInUse=0 HandleInUse=0
Concurrency: DependencyMutex=f68ce988(0, 1, 0, 0) Mutex=f68cea00(61, 19, 0, 6)
Flags=RON/PIN/TIM/PN0/DBN/[10012841]
WaitersLists:
Lock=f68ce968[f68ce968,f68ce968]
Pin=f68ce978[f68ce948,f68ce948]
Timestamp: Current=07-01-2014 11:16:35
LibraryObject: Address=f6b87d18 HeapMask=0000-0001-0001 Flags=EXS[0000]
Flags2=[0000] PublicFlags=[0000]
ChildTable: size=‘16‘
Child: id=‘0‘ Table=f6b88bc8 Reference=f6b88668 Handle=fab7b4c0
Children:
Child: childNum=‘0‘
LibraryHandle: Address=fab7b4c0 Hash=0 LockMode=N PinMode=S LoadLockMode=0 Status=VALD
Name: Namespace=SQL AREA(00) Type=CURSOR(00)
Statistics: InvalidationCount=0 ExecutionCount=1 LoadCount=1 ActiveLocks=1 TotalLockCount=1 TotalPinCount=2
Counters: BrokenCount=1 RevocablePointer=1 KeepDependency=0 KeepHandle=0 BucketInUse=0 HandleInUse=0
Concurrency: DependencyMutex=fab7b570(0, 0, 0, 0) Mutex=f68cea00(61, 19, 0, 6)
Flags=RON/PIN/PN0/EXP/[10012111]
WaitersLists:
Lock=fab7b550[fab7b550,fab7b550]
Pin=fab7b560[fab7b530,fab7b530]
LibraryObject: Address=f6569b20 HeapMask=0000-0001-0001 Flags=EXS[0000]
Flags2=[0000] PublicFlags=[0000]
DataBlocks:
Block: #=‘0‘ name=CCUR^c723b469 pins=0 Change=NONE
Heap=f6a0b38 Pointer=f6569c08 Extent=f6569aa0 Flags=I/-/P/A/-/-
FreedLocation=0 Alloc=3.093750 Size=3.937500 LoadTime=11473268840
Block: #=‘6‘ name=SQLA^c723b469 pins=0 Change=NONE
Heap=f6b88438 Pointer=f92d7aa8 Extent=f92d6e48 Flags=I/-/P/A/-/E
FreedLocation=0 Alloc=8.890625 Size=11.859375 LoadTime=0
NamespaceDump:
Child Cursor: Heap0=0xf6569c08 Heap6=0xf92d7aa8 Heap0 Load Time=07-01-2014 11:16:35 Heap6 Load Time=07-01-2014 11:16:35
NamespaceDump:
Parent Cursor: sql_id=b8n03s73k7d39 parent=0xf6b87e00 maxchild=1 plk=y
ppn=n kkscs=0xf6b88308 nxt=(nil) flg=18 cld=0 hd=0xfab7b4c0
par=0xf6b87e00
Mutex 0xf6b88308(0, 0) idn 3000000000
ct=0 hsh=0 unp=(nil) unn=0 hvl=f6b88c60
nhv=0 ses=(nil)
hep=0xf6b88398 flg=80 ld=1 ob=0xf6569b20 ptr=0xf92d7aa8
fex=0xf92d6e48
cursor instantiation=0x7f5227898580 used=1404184595 exec_id=16777216 exec=1
child#0(0xfab7b4c0) pcs=0xf6b88308
clk=0x10d6111e0 ci=0xf6569c08 pn=0xfdf4c890 ctx=0xf92d7aa8
kgsccflg=1 llk[0x7f5227898588,0x7f5227898588] idx=6a
xscflg=c0110676 fl2=d120000 fl3=422a2188 fl4=100
----- Bind Byte Code (IN) -----
Opcode = 6 Bind Rpi Scalar Sql In(may be out)
Nocopy NoSkip
Offsi = 48, Offsi = 0
Opcode = 6 Bind Rpi Scalar Sql In(may be out)
Nocopy NoSkip
Offsi = 48, Offsi = 32
----- Bind Info (kkscoacd) -----
Bind#0
oacdty=02 mxl=22(21) mxlc=00 mal=00
scl=00 pre=00
oacflg=13 fl2=206001 frm=00 csi=00 siz=24 off=0
kxsbbbfp=7f5227943d08 bln=22 avl=05 flg=09
value=1234335
Bind#1
oacdty=01 mxl=2000(200) mxlc=00 mal=00
scl=00 pre=00
oacflg=13 fl2=206001 frm=01 csi=852 siz=2000 off=0
kxsbbbfp=7f5227943d48 bln=2000 avl=05 flg=09
value="oradh"
Frames pfr 0x7f5227897c18 siz=3424 efr 0x7f5227897b38 siz=3376
Cursor frame dump
enxt: 3.0x00000550 enxt: 2.0x00000040 enxt: 1.0x000007a0
pnxt: 1.0x00000030
kxscphp=0x7f5227961030 siz=984 inu=584 nps=360
kxscbhp=0x7f5227961920 siz=984 inu=152 nps=0
kxscwhp=0x7f5227960f40 siz=4056 inu=56 nps=0
注意:重点关注上面标红字体部份
ErrorStack dump是我们可以很可靠的找到SQL语句当前使用的绑定变量值一种方法。另外,还有如下说明:
- 在errorstack跟踪文件中发现Cursor#2,它的state=BOUND,意味着绑定变量值已经被赋予的
- SQL文本是我们在前面已经查找出来的,我们可以看到两个绑定变量值
- Bind variale numbering是从0开始,所以如果我们想搜索第一个绑定值,需要搜索Bind#0
4、从Errorstack跟踪文件中发现一个cursor正在使用多少private memory(UGA)
继续我们前面的示例,跟踪文件中能够度量一个session的某些游标使用了多少UGA私有内存。如下
Cursor#2游标部分:
Cursor frame dump
enxt: 3.0x00000550 enxt: 2.0x00000040 enxt: 1.0x000007a0
pnxt: 1.0x00000030
kxscphp=0x7f5227961030 siz=984 inu=584 nps=360
kxscbhp=0x7f5227961920 siz=984 inu=152 nps=0
kxscwhp=0x7f5227960f40 siz=4056 inu=56 nps=0
Cursor#1游标部分:
Cursor frame dump
enxt: 12.0x000005c0 enxt: 11.0x00000fd8 enxt: 10.0x00000fa0 enxt: 9.0x00000658
enxt: 8.0x00000228 enxt: 7.0x00000fd8 enxt: 6.0x00000fa0 enxt: 5.0x00000248
enxt: 4.0x00000fa0 enxt: 3.0x00000410 enxt: 2.0x00000480 enxt: 1.0x00000f70
pnxt: 1.0x00000030
kxscphp=0x7f5227960c70 siz=2792 inu=1056 nps=424
kxscbhp=0x7f5227960e50 siz=10376 inu=10008 nps=8768
在cursor frame dump部分我们可以看见kxsc开始的一些信息,这些意味着Kernel eXECUTION Shared Cursor,其中的每一行的siz参数告诉我们当前分配的私有内存大小。当然,我们需要汇总所有的siz,这个值就是此游标的memory usage。
下面是我的一些猜测汇总:
01.Heap description Meaning
02.kxscphp Cursor permanent heap. Allocated when cursor is opened
03.kxscdfhp Cursor default heap - default duration allocations
04.kxscehp Cursor ephemeral heap - short lived duration allocations
05.kxscwhp Cursor Work heap - used when actually executing the cursor (workareas etc)
06.kxscbhp Cursor Bind heap - this is where bind variable values and their metadata are kept.
四、实验代码