cursor_sharing+outline的问题
本站文章除注明转载外,均为本站原创: 转载自love wife love life —Roger的Oracle/MySQL/PostgreSQL数据恢复博客
本文链接地址: cursor_sharing+outline的问题
1 2 3 4 5 6 |
昨天同事遇到一个有些怪异的问题,使用了outlines,但是SQLID执行仍然出现执行计划错误的情况, 导致大量的read by other session等待,严重影响业务,最后通过revoke和grant来解决了。 下面的问题就是如何去找到当时的问题所在?换句话说,当时的那个sql语句为什么不走index scan而 选择低效的full table scan呢? 错误的执行计划如下: |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
Plan hash value: 1414349908 ----------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | Pstart| Pstop | ----------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | | | | 100K(100)| | | | | 1 | SORT ORDER BY | | 157K| 16M| 35M| 100K (2)| 00:20:03 | | | | 2 | HASH UNIQUE | | 157K| 16M| 35M| 96389 (2)| 00:19:17 | | | | 3 | HASH JOIN | | 157K| 16M| | 92578 (2)| 00:18:31 | | | | 4 | TABLE ACCESS FULL | ACCT_ITEM_TYPE | 2817 | 73242 | | 15 (0)| 00:00:01 | | | | 5 | HASH JOIN | | 157K| 12M| | 92561 (2)| 00:18:31 | | | | 6 | TABLE ACCESS FULL | BILLING_CYCLE | 156 | 3900 | | 15 (0)| 00:00:01 | | | | 7 | HASH JOIN | | 158K| 8647K| | 92544 (2)| 00:18:31 | | | | 8 | TABLE ACCESS BY INDEX ROWID| A_PAY_OWE_STATE | 3 | 21 | | 2 (0)| 00:00:01 | | | | 9 | INDEX RANGE SCAN | INDX_PAYMENT_METHOD_01 | 3 | | | 1 (0)| 00:00:01 | | | | 10 | PARTITION HASH SINGLE | | 368K| 17M| | 92538 (2)| 00:18:31 | KEY | KEY | | 11 | TABLE ACCESS FULL | ACCT_ITEM_OWE | 368K| 17M| | 92538 (2)| 00:18:31 | KEY | KEY | ----------------------------------------------------------------------------------------------------------------------------------- |
1 |
下面是正确的执行计划: |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 |
Plan hash value: 2474252999 ----------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | ----------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | | | 198 (100)| | | | | 1 | SORT ORDER BY | | 52 | 5304 | 198 (3)| 00:00:03 | | | | 2 | HASH UNIQUE | | 52 | 5304 | 197 (2)| 00:00:03 | | | | 3 | HASH JOIN | | 52 | 5304 | 196 (2)| 00:00:03 | | | | 4 | HASH JOIN | | 52 | 4004 | 181 (2)| 00:00:03 | | | | 5 | TABLE ACCESS BY INDEX ROWID | A_PAY_OWE_STATE | 3 | 21 | 2 (0)| 00:00:01 | | | | 6 | INDEX RANGE SCAN | INDX_PAYMENT_METHOD_01 | 3 | | 1 (0)| 00:00:01 | | | | 7 | MERGE JOIN | | 121 | 8470 | 178 (1)| 00:00:03 | | | | 8 | TABLE ACCESS BY INDEX ROWID | BILLING_CYCLE | 70 | 1470 | 4 (0)| 00:00:01 | | | | 9 | INDEX FULL SCAN | PK_BILLING_CYCLE | 157 | | 1 (0)| 00:00:01 | | | | 10 | SORT JOIN | | 132 | 6468 | 174 (1)| 00:00:03 | | | | 11 | TABLE ACCESS BY GLOBAL INDEX ROWID| ACCT_ITEM_OWE | 132 | 6468 | 173 (0)| 00:00:03 | ROW L | ROW L | | 12 | INDEX RANGE SCAN | IDX_ACCT_ITEM_ACCT_ID | 132 | | 9 (0)| 00:00:01 | | | | 13 | TABLE ACCESS FULL | ACCT_ITEM_TYPE | 2845 | 71125 | 15 (0)| 00:00:01 | | | ----------------------------------------------------------------------------------------------------------------------------------- |
1 2 3 4 |
我们可以发现,区别就在是否使用了index <span style="font-size: 12px; color: #0000ff; font-family: monospace;"><strong>IDX_ACCT_ITEM_ACCT_ID</strong></span>,进而导致整个执行计划差别非常之大, 还可以看到正常的情况下使用了merge join。 处理问题的思路是这样的: |
1 2 3 4 5 6 7 8 9 10 |
1. 首先查看outline是否正常使用 SQL> select owner,name,used,version,ENABLED from DBA_OUTLINES 2 where owner='ACCT'; OWNER NAME USED VERSION ENABLED --------- ------------------------------ ------------------- -------- ACCT SYS_OUTLINE_11091615175537003 USED 10.2.0.4.0 ENABLED -- outline正常。 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 |
2. 检查outline定义是否包含index hint IDX_ACCT_ITEM_ACCT_ID. SQL> select ol_name, hint#, hint_text 2 from OUTLN.ol$hints 3 order by ol_name, hint#; OL_NAME HINT# HINT_TEXT ------------------------------ ----- ------------------------------------------------------------------------------------------ SYS_OUTLINE_11091615175537003 1 USE_HASH(@"SEL$5DA710D3" "D"@"SEL$1") SYS_OUTLINE_11091615175537003 2 USE_HASH(@"SEL$5DA710D3" "A_PAY_OWE_STATE"@"SEL$2") SYS_OUTLINE_11091615175537003 3 USE_MERGE(@"SEL$5DA710D3" "A"@"SEL$1") SYS_OUTLINE_11091615175537003 4 LEADING(@"SEL$5DA710D3" "H"@"SEL$1" "A"@"SEL$1" "A_PAY_OWE_STATE"@"SEL$2" "D"@"SEL$1") SYS_OUTLINE_11091615175537003 5 FULL(@"SEL$5DA710D3" "D"@"SEL$1") SYS_OUTLINE_11091615175537003 6 INDEX_RS_ASC(@"SEL$5DA710D3" "A_PAY_OWE_STATE"@"SEL$2" ("A_PAY_OWE_STATE"."PAYMENT_METHOD")) SYS_OUTLINE_11091615175537003 7 INDEX_RS_ASC(@"SEL$5DA710D3" "A"@"SEL$1" ("ACCT_ITEM_OWE"."ACCT_ID")) SYS_OUTLINE_11091615175537003 8 INDEX(@"SEL$5DA710D3" "H"@"SEL$1" ("BILLING_CYCLE"."BILLING_CYCLE_ID")) SYS_OUTLINE_11091615175537003 9 OUTLINE(@"SEL$2") SYS_OUTLINE_11091615175537003 10 OUTLINE(@"SEL$1") SYS_OUTLINE_11091615175537003 11 UNNEST(@"SEL$2") SYS_OUTLINE_11091615175537003 12 OUTLINE_LEAF(@"SEL$5DA710D3") SYS_OUTLINE_11091615175537003 13 ALL_ROWS SYS_OUTLINE_11091615175537003 14 OPTIMIZER_FEATURES_ENABLE('10.2.0.4') SYS_OUTLINE_11091615175537003 15 IGNORE_OPTIM_EMBEDDED_HINTS 15 rows selected. 从INDEX_RS_ASC(@"SEL$5DA710D3" "A"@"SEL$1" ("ACCT_ITEM_OWE"."ACCT_ID"))看,outline是有的。 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 |
3. 检查相关的对象在故障期间之前是否做过相应的ddl操作 SQL> select OWNER,OBJECT_NAME,SUBOBJECT_NAME,LAST_DDL_TIME,STATUS 2 from dba_objects 3 where OBJECT_NAME='ACCT_ITEM_OWE'; OWNER OBJECT_NAME SUBOBJECT_NAME LAST_DDL_TIM STATUS -------- ---------------- ---------------------- ------------ ------- COMM ACCT_ITEM_OWE 02-MAR-09 VALID ACCT ACCT_ITEM_OWE P_ACCT_ITEM_OWE_0 28-FEB-09 VALID ACCT ACCT_ITEM_OWE P_ACCT_ITEM_OWE_1 28-FEB-09 VALID ACCT ACCT_ITEM_OWE P_ACCT_ITEM_OWE_2 28-FEB-09 VALID ACCT ACCT_ITEM_OWE P_ACCT_ITEM_OWE_3 28-FEB-09 VALID ACCT ACCT_ITEM_OWE P_ACCT_ITEM_OWE_4 28-FEB-09 VALID ACCT ACCT_ITEM_OWE P_ACCT_ITEM_OWE_5 28-FEB-09 VALID ACCT ACCT_ITEM_OWE P_ACCT_ITEM_OWE_6 28-FEB-09 VALID ACCT ACCT_ITEM_OWE P_ACCT_ITEM_OWE_7 28-FEB-09 VALID ACCT ACCT_ITEM_OWE P_ACCT_ITEM_OWE_8 28-FEB-09 VALID ACCT ACCT_ITEM_OWE P_ACCT_ITEM_OWE_9 28-FEB-09 VALID ACCT ACCT_ITEM_OWE 19-SEP-11 VALID 12 rows selected. SQL> select OWNER,OBJECT_NAME,SUBOBJECT_NAME,LAST_DDL_TIME,STATUS from 2 dba_objects where OBJECT_NAME='IDX_ACCT_ITEM_ACCT_ID'; OWNER OBJECT_NAME SUBOBJECT_NAME LAST_DDL_TIM STATUS ----------- ----------------------- ----------------- ------------ ------- ACCT IDX_ACCT_ITEM_ACCT_ID 28-FEB-09 VALID 检查发现, 9月18号并无DDL操作。 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 |
4. 查询该SQLID是否存在多个child number或存在多个执行计划 昨天下午查询v$sql已经找不到那个SQLID了,通过如下查询发现: SQL> select sql_id,CHILD_NUMBER,SQL_TYPE_MISMATCH,OPTIMIZER_MISMATCH,OUTLINE_MISMATCH,BIND_MISMATCH,STATS_ROW_MISMATCH 2 from V$SQL_SHARED_CURSOR 3 where SQL_ID='&id'; Enter value for id: 4u9y643v2k8fw old 2: from V$SQL_SHARED_CURSOR where SQL_ID='&id' new 2: from V$SQL_SHARED_CURSOR where SQL_ID='4u9y643v2k8fw' SQL_ID CHILD_NUMBER S O O B S ------------- ------------ - - - - - 4u9y643v2k8fw 0 N N N N N 4u9y643v2k8fw 1 N N Y N N 4u9y643v2k8fw 2 N N N N N 4u9y643v2k8fw 3 N N Y N N 4u9y643v2k8fw 4 N N Y N N 4u9y643v2k8fw 5 N N Y N N 4u9y643v2k8fw 6 N N N N N 4u9y643v2k8fw 7 N N N N N 4u9y643v2k8fw 8 N N N N N 4u9y643v2k8fw 9 N N N N N 4u9y643v2k8fw 10 N N N N N 4u9y643v2k8fw 11 N N N N N 4u9y643v2k8fw 12 N N N N N 4u9y643v2k8fw 13 N N N N N 4u9y643v2k8fw 14 N N N N N 4u9y643v2k8fw 15 N N N N N 4u9y643v2k8fw 16 N N N N N 4u9y643v2k8fw 17 N N N N N 4u9y643v2k8fw 18 N N N N N 4u9y643v2k8fw 19 N N N N N 4u9y643v2k8fw 20 N N N N N 21 rows selected. 备注:这里的第2个O是指OUTLINE_MISMATCH。 可以发现该sqlid出现了多个child cursor,另外检查发现参数cursor_sharing设置为simlair了。 该sqlid sql version肯定也较高。 这里有一点让我不解的是,通过DBA_HIST_*相关视图查询,发现当时出问题的时候的sqlid的child number 跟V$SQL_SHARED_CURSOR并不相同,当然这也不能完全说明问题,很可能sql已经被flush out多次了。 通过查hist视图,可以发现当时故障期间出现大量的read by other session等待: EVENT COUNT(*) SESSION_ID SESSION SQL_ID SQL_CHILD_NUMBER SQL_PLAN_HASH_VALUE FORCE_MATCHING_SIGNATURE TIME_WAITED ----------------------- --------- ---------- ------- ------------- ---------------- ------------------- ------------------------ ----------- db file scattered read 1 1382 WAITING 4u9y643v2k8fw 18 1414349908 0 4502 read by other session 1 1398 WAITING 4u9y643v2k8fw 18 1414349908 0 8099 read by other session 1 1404 WAITING 4u9y643v2k8fw 18 1414349908 0 89 db file scattered read 1 1424 WAITING 4u9y643v2k8fw 18 1414349908 0 8483 read by other session 1 1633 WAITING 4u9y643v2k8fw 18 1414349908 0 10470 read by other session 1 2081 WAITING 4u9y643v2k8fw 18 1414349908 4.3893E+18 2047 read by other session 1 2239 WAITING 4u9y643v2k8fw 18 1414349908 0 5403 read by other session 1 685 WAITING 4u9y643v2k8fw 18 1414349908 0 11577 read by other session 1 1094 WAITING 4u9y643v2k8fw 18 1414349908 0 6340 read by other session 1 1168 WAITING 4u9y643v2k8fw 18 1414349908 4.3893E+18 1427 read by other session 1 1304 WAITING 4u9y643v2k8fw 18 1414349908 0 12656 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 |
5. 查看sqlid是否使用了outline SQL> select SQL_ID,CHILD_NUMBER 2 from v$sql_plan 3 where sql_id='4u9y643v2k8fw'; SQL_ID CHILD_NUMBER ------------- ------------ 4u9y643v2k8fw 12 4u9y643v2k8fw 12 ..... 4u9y643v2k8fw 12 4u9y643v2k8fw 2 4u9y643v2k8fw 2 4u9y643v2k8fw 2 ...... 4u9y643v2k8fw 2 4u9y643v2k8fw 2 28 rows selected. 下午查询时只有number为2 12 SQL> SELECT info outline_name 2 FROM (SELECT ROWNUM r, EXTRACTVALUE (VALUE (d), '/info') info 3 FROM v$sql_plan p, 4 TABLE (XMLSEQUENCE (EXTRACT (XMLTYPE (p.other_xml), 5 '/other_xml/info' 6 ) 7 ) 8 ) d 9 WHERE other_xml IS NOT NULL 10 AND ID = 1 11 AND sql_id = '4u9y643v2k8fw' 12 AND child_number = 3) 13 WHERE r = 4; no rows selected SQL> SELECT info outline_name 2 FROM (SELECT ROWNUM r, EXTRACTVALUE (VALUE (d), '/info') info 3 FROM v$sql_plan p, 4 TABLE (XMLSEQUENCE (EXTRACT (XMLTYPE (p.other_xml), 5 '/other_xml/info' 6 ) 7 ) 8 ) d 9 WHERE other_xml IS NOT NULL 10 AND ID = 1 11 AND sql_id = '4u9y643v2k8fw' 12 AND child_number = 4) 13 WHERE r = 4; no rows selected SQL> SELECT info outline_name 2 FROM (SELECT ROWNUM r, EXTRACTVALUE (VALUE (d), '/info') info 3 FROM v$sql_plan p, 4 TABLE (XMLSEQUENCE (EXTRACT (XMLTYPE (p.other_xml), 5 '/other_xml/info' 6 ) 7 ) 8 ) d 9 WHERE other_xml IS NOT NULL 10 AND ID = 1 11 AND sql_id = '4u9y643v2k8fw' 12 AND child_number = 5) 13 WHERE r = 4; no rows selected SQL> SELECT info outline_name 2 FROM (SELECT ROWNUM r, EXTRACTVALUE (VALUE (d), '/info') info 3 FROM v$sql_plan p, 4 TABLE (XMLSEQUENCE (EXTRACT (XMLTYPE (p.other_xml), 5 '/other_xml/info' 6 ) 7 ) 8 ) d 9 WHERE other_xml IS NOT NULL 10 AND ID = 1 11 AND sql_id = '4u9y643v2k8fw' 12 AND child_number = 2) 13 WHERE r = 4; OUTLINE_NAME -------------------------------------------------------------------------------- "SYS_OUTLINE_11091615175537003" SQL> SELECT info outline_name 2 FROM (SELECT ROWNUM r, EXTRACTVALUE (VALUE (d), '/info') info 3 FROM v$sql_plan p, 4 TABLE (XMLSEQUENCE (EXTRACT (XMLTYPE (p.other_xml), 5 '/other_xml/info' 6 ) 7 ) 8 ) d 9 WHERE other_xml IS NOT NULL 10 AND ID = 1 11 AND sql_id = '4u9y643v2k8fw' 12 AND child_number = 12) 13 WHERE r = 4; OUTLINE_NAME -------------------------------------------------------------------------------- "SYS_OUTLINE_11091615175537003" 可以看到,该sql的当前的2个child cursor都是使用了outline的。 但是这说明不了问题,既然该sql存在多个child cursor,那么必然存在多个plan hash value值。 这里我们需要去查询那个出问题的sqlid对应的plan_hash_value所对应的执行计划。 虽然说是有awr快照,但是通过这个方式已经查不到信息了。 select * from table(dbms_xplan.display_awr('&sql_id','&child_number',null,'advanced')); 我这里使用的如下的查询方式: SQL> set long 1000000 SQL> set linesize 200 SQL> SELECT plan_hash_value, XMLTYPE (other_xml).EXTRACT ('/*') 2 FROM dba_hist_sql_plan 3 WHERE sql_id = '4u9y643v2k8fw' 4 AND plan_hash_value IN (1414349908) 5 AND DBMS_LOB.SUBSTR (other_xml, 1000) LIKE '%<other%' 6 / PLAN_HASH_VALUE --------------- XMLTYPE(OTHER_XML).EXTRACT('/*') -------------------------------------------------------------------------------------------------------------------------------- 1414349908 <other_xml> <info type="db_version">10.2.0.4</info> <info type="parse_schema"><![CDATA["ACCT"]]></info> <info type="plan_hash">1414349908</info> <peeked_binds> <bind nam=":IPAYMETH" pos="17" dty="2" pre="0" scl="0" mxl="22">c10c</bind> <bind nam=":IACTION" pos="18" dty="2" pre="0" scl="0" mxl="22">c103</bind> <bind nam=":SYS_B_16" pos="19" dty="1" csi="852" frm="1" mxl="32">313042</bind> PLAN_HASH_VALUE --------------- XMLTYPE(OTHER_XML).EXTRACT('/*') -------------------------------------------------------------------------------------------------------------------------------- <bind nam=":SYS_B_17" pos="20" dty="1" csi="852" frm="1" mxl="32">313044</bind> <bind nam=":LACCTID" pos="21" dty="2" pre="0" scl="0" mxl="22">c6481504081c43</bind> </peeked_binds> <outline_data> <hint><![CDATA[IGNORE_OPTIM_EMBEDDED_HINTS]]></hint> <hint><![CDATA[OPTIMIZER_FEATURES_ENABLE('10.2.0.4')]]></hint> <hint><![CDATA[ALL_ROWS]]></hint> <hint><![CDATA[OUTLINE_LEAF(@"SEL$5DA710D3")]]></hint> <hint><![CDATA[UNNEST(@"SEL$2")]]></hint> PLAN_HASH_VALUE --------------- XMLTYPE(OTHER_XML).EXTRACT('/*') -------------------------------------------------------------------------------------------------------------------------------- <hint><![CDATA[OUTLINE(@"SEL$1")]]></hint> <hint><![CDATA[OUTLINE(@"SEL$2")]]></hint> <hint><![CDATA[INDEX_RS_ASC(@"SEL$5DA710D3" "A_PAY_OWE_STATE"@"SEL$2" ("A_PAY_OWE_STATE"."PAYMENT_METHOD"))]]></hint> <hint><![CDATA[FULL(@"SEL$5DA710D3" "A"@"SEL$1")]]></hint> <hint><![CDATA[FULL(@"SEL$5DA710D3" "H"@"SEL$1")]]></hint> <hint><![CDATA[FULL(@"SEL$5DA710D3" "D"@"SEL$1")]]></hint> <hint><![CDATA[LEADING(@"SEL$5DA710D3" "A_PAY_OWE_STATE"@"SEL$2" "A"@"SEL$1" "H"@"SEL$1" "D"@"SEL$1")]]></hint> <hint><![CDATA[USE_HASH(@"SEL$5DA710D3" "A"@"SEL$1")]]></hint> <hint><![CDATA[USE_HASH(@"SEL$5DA710D3" "H"@"SEL$1")]]></hint> PLAN_HASH_VALUE --------------- XMLTYPE(OTHER_XML).EXTRACT('/*') -------------------------------------------------------------------------------------------------------------------------------- <hint><![CDATA[USE_HASH(@"SEL$5DA710D3" "D"@"SEL$1")]]></hint> <hint><![CDATA[SWAP_JOIN_INPUTS(@"SEL$5DA710D3" "H"@"SEL$1")]]></hint> <hint><![CDATA[SWAP_JOIN_INPUTS(@"SEL$5DA710D3" "D"@"SEL$1")]]></hint> </outline_data> </other_xml> |
1 |
从上面的查询来看,当时是使用了outline,但是问题出在这里: |
1 2 3 |
<hint><![CDATA[FULL(@"SEL$5DA710D3" "A"@"SEL$1")]]></hint> <hint><![CDATA[FULL(@"SEL$5DA710D3" "H"@"SEL$1")]]></hint> <hint><![CDATA[FULL(@"SEL$5DA710D3" "D"@"SEL$1")]]></hint> |
1 2 3 4 5 6 7 8 9 10 11 12 13 |
也就是说正常情况下,这3条当中至少有1条是ACCT_ID的index rang scan才对。 今天上午,和同事沟通,说又出问题了,同事将参数cursor_sharing调为exact了,即默认值。 虽然这样会导致很多sql语句的硬解析,但是听他说上午调整以后到11点时,还没出现问题。 跟itpub版主棉花糖ONE交流了一下,他认为很可能是outline和cursor_sharing的bug。 但是我仍然觉得cursor_sharing可能不是真正的原因,因为ACCT_ID列是应用绑定变量列, 并不是通过参数cursor_sharing的设置来实现绑定变量的。 最后总结一下: 9i以后都不推荐再用outline了,感觉不太稳定,推荐使用sql profile来固定执行计划。 至于最后的结论到底是outline的bug还是cursor_sharing=similar的bug,现在还确定不了,只是可能性非常大。 |
2 Responses to “cursor_sharing+outline的问题”
精彩!Roger的BLOG越来越好了。
第二个执行计划有很大问题
Leave a Reply
You must be logged in to post a comment.