intra blcok chain
本站文章除注明转载外,均为本站原创: 转载自love wife love life —Roger的Oracle/MySQL/PostgreSQL数据恢复博客
本文链接地址: intra blcok chain
最近在一个优化项目中,通过awr报告发现table fetch continued row 指标很高,怀疑是行迁移/链接比较严重。
后来经过沟通发现,原来客户的数据库中存在几个table,其column 数目超过255. 针对超过255列的行数据.如下:
1 2 3 4 5 6 |
<pre class="brush:php">Statistic Total per Second per Trans dirty buffers inspected 7,532 4.17 0.25 free buffer inspected 17,409,018 9,633.69 575.56 free buffer requested 17,115,682 9,471.36 565.86 table fetch by rowid 98,848,588 54,700.18 3,268.05 table fetch continued row 97,797,107 54,118.32 3,233.28 |
1 2 |
oracle会将其每行数据都分成2个row piece. 实际上如果超过510个列,那么会被分成3个row piece存放在同一个 block中。 10gR2的官方文档是这样描述的: |
1 2 3 4 5 |
<pre class="brush:php">"When a table has more than 255 columns, rows that have data after the 255th column are likely to be chained within the same block. This is called intra-block chaining. A chained row's pieces are chained together using the rowids of the pieces. With intra-block chaining, users receive all the data in the same block. If the row fits in the block, users do not see an effect in I/O performance, because no extra I/O operation is required to retrieve the rest of the row." |
1 |
创建超过255列的测试表: |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 |
<pre class="brush:sql">----tab1 256 columns SQL> declare 2 v_sql varchar2(32767) ; 3 begin 4 v_sql := 'create table t_chain1 ( ' ; 5 for i in 1..256 loop 6 v_sql := v_sql || 'id'||i||' number,' ; 7 end loop ; 8 v_sql := rtrim(v_sql, ',') || ')'; 9 execute immediate v_sql; 10 end ; 11 / PL/SQL procedure successfully completed. SQL> select count(1) from user_tab_columns where table_name='T_CHAIN1'; COUNT(1) ---------- 256 SQL> create sequence t_chain_seq 2 minvalue 1 3 nomaxvalue 4 start with 1 5 increment by 1 6 nocycle 7 cache 10000; Sequence created. SQL> SQL> insert into t_chain1(id256) values(t_chain_seq.NEXTVAL); 1 row created. SQL> / 1 row created. SQL> / 1 row created. SQL> commit; Commit complete. SQL> select count(1) from t_chain1; COUNT(1) ---------- 3 SQL> |
1 |
1 |
再创建一个255列的测试表进行对比: |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 |
<pre class="brush:sql">----tab2 255 columns SQL> declare 2 v_sql varchar2(32767) ; 3 begin 4 v_sql := 'create table t_chain2 ( ' ; 5 for i in 1..255 loop 6 v_sql := v_sql || 'id'||i||' number,' ; 7 end loop ; 8 v_sql := rtrim(v_sql, ',') || ')'; 9 execute immediate v_sql; 10 end ; 11 / PL/SQL procedure successfully completed. SQL> select count(1) from user_tab_columns where table_name='T_CHAIN2'; COUNT(1) ---------- 255 SQL> create sequence t_chain_seq2 2 minvalue 1 3 nomaxvalue 4 start with 1 5 increment by 1 6 nocycle 7 cache 10000; Sequence created. SQL> insert into t_chain2(id255) values(t_chain_seq2.NEXTVAL); 1 row created. SQL> / 1 row created. SQL> / 1 row created. SQL> commit; Commit complete. SQL> select count(1) from t_chain2; COUNT(1) ---------- 3 |
1 |
1 |
通过前面的测试表,我们来对比观察下这2个表的block信息的差别: |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
<pre class="brush:sql">----使用dbms_rowid定位到block 号 SQL> select dbms_rowid.rowid_relative_fno(t.rowid) as "file#" , 2 dbms_rowid.rowid_block_number(t.rowid) as "block#" from t_chain1 t ; file# block# ---------- ---------- 2 73688 2 73688 2 73688 SQL> select dbms_rowid.rowid_relative_fno(t.rowid) as "file#" , 2 dbms_rowid.rowid_block_number(t.rowid) as "block#" from t_chain2 t ; file# block# ---------- ---------- 2 73696 2 73696 2 73696 |
1 |
1 |
<pre>我们先来看下这2种情况下,block内信息的差异. |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 |
<pre class="brush:php">---tab1 256 column dump data_block_dump,data header at 0xd0da87c =============== tsiz: 0x1f80 hsiz: 0x1e pbl: 0x0d0da87c bdba: 0x00811fd8 76543210 flag=-------- ntab=1 nrow=6 ---注意这里是6(实际上测试表的数据只有3条) frre=-1 fsbo=0x1e fseo=0x1c56 avsp=0x1c38 tosp=0x1c38 0xe:pti[0] nrow=6 offs=0 0x12:pri[0] offs=0x1e7c 0x14:pri[1] offs=0x1e72 0x16:pri[2] offs=0x1d6e 0x18:pri[3] offs=0x1d64 0x1a:pri[4] offs=0x1c60 0x1c:pri[5] offs=0x1c56 block_row_dump: tab 0, row 0, @0x1e7c 我们可以看到,oracle这样讲第1行数据分成了2个row piece.这里是第一个row piece的offset tl: 260 fb: -----L-- lb: 0x1 cc: 255 col 0: *NULL* col 1: *NULL* col 2: *NULL* col 3: *NULL* ....... col 253: *NULL* col 254: [ 2] c1 02 从这里我们可以看到,oracle将有数据的列存放到第一个row piece了. tab 0, row 1, @0x1e72 这里是第一行数据的第2个row piece. tl: 10 fb: --H-F--- lb: 0x1 cc: 1 nrid: 0x00811fd8.0 这里的nrid相当于rdba,转换后即为我们的file 2 block 73688地址. col 0: *NULL* tab 0, row 2, @0x1d6e tl: 260 fb: -----L-- lb: 0x1 cc: 255 col 0: *NULL* col 1: *NULL* col 2: *NULL* col 3: *NULL* ...... col 252: *NULL* col 253: *NULL* col 254: [ 2] c1 03 tab 0, row 3, @0x1d64 tl: 10 fb: --H-F--- lb: 0x1 cc: 1 nrid: 0x00811fd8.2 col 0: *NULL* tab 0, row 4, @0x1c60 tl: 260 fb: -----L-- lb: 0x1 cc: 255 col 0: *NULL* col 1: *NULL* col 2: *NULL* col 3: *NULL* ...... col 252: *NULL* col 253: *NULL* col 254: [ 2] c1 04 tab 0, row 5, @0x1c56 tl: 10 fb: --H-F--- lb: 0x1 cc: 1 nrid: 0x00811fd8.4 col 0: *NULL* end_of_block_dump |
1 2 |
对于超过255列的表,oracle会将其在block内的一行数据分成2个row piece来存放。但是会将有数据的列存放在 第一个row piece中,无数的列存放的在第2个row piece中. |
1 |
<pre> |
1 |
如下是tab2 255 columns的dump: |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 |
<pre class="brush:php">data_block_dump,data header at 0xd0da864 =============== tsiz: 0x1f98 hsiz: 0x18 pbl: 0x0d0da864 bdba: 0x00811fe0 76543210 flag=-------- ntab=1 nrow=3 frre=-1 fsbo=0x18 fseo=0x1c8c avsp=0x1c74 tosp=0x1c74 0xe:pti[0] nrow=3 offs=0 0x12:pri[0] offs=0x1e94 0x14:pri[1] offs=0x1d90 0x16:pri[2] offs=0x1c8c block_row_dump: tab 0, row 0, @0x1e94 tl: 260 fb: --H-FL-- lb: 0x1 cc: 255 col 0: *NULL* col 1: *NULL* col 2: *NULL* ...... col 253: *NULL* col 254: [ 2] c1 02 tab 0, row 1, @0x1d90 tl: 260 fb: --H-FL-- lb: 0x1 cc: 255 col 0: *NULL* col 1: *NULL* col 2: *NULL* ....... col 253: *NULL* col 254: [ 2] c1 03 tab 0, row 2, @0x1c8c tl: 260 fb: --H-FL-- lb: 0x1 cc: 255 col 0: *NULL* col 1: *NULL* col 2: *NULL* col 3: *NULL* ........ col 253: *NULL* col 254: [ 2] c1 04 end_of_block_dump |
1 |
1 |
对于未超过255列的表.没有什么特别的,正是大家平时所看到的这样. |
1 |
这里给大家补充一点,为什么oracle这里最大允许255个列呢 ? 如下: |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
<pre class="brush:php">BBED> x /rnnnnnnnnnnnnnnnnnnnn rowdata[6686] @7924 ------------- flag@7924: 0x04 (KDRHFL) lock@7925: 0x00 cols@7926: 255 ....... BBED> d /v offset 7924 count 10 File: /home/ora10g/oradata/roger01.dbf (2) Block: 73688 Offsets: 7924 to 7933 Dba:0x00811fd8 ------------------------------------------------------- d80000ff 0401ffff ffff l ........ <16 bytes per line> |
1 |
<pre> |
1 |
可以看到oracle用了一个byte来存放column 的count值。简称cc. |
1 2 |
我们知道,1个byte等于8个bit. 一个bit最大表示的数目是power(2,1).以此类推,那么一个byte 所能表示的最大数目都是power(2,8),即使256. 所以oracle这里一个row piece最大允许255个column. |
1 2 |
超过255列即为分到另外一个row piece中. 虽然数据被划分到了其他的row piece,然而,一行数据仍然是存在同一个block中. 同时,每行都存在一个rowid. |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
<pre class="brush:sql">SQL> select rowid from t_chain1; ROWID ------------------ AAAPJfAACAAAR/YAAB AAAPJfAACAAAR/YAAD AAAPJfAACAAAR/YAAF SQL> select dbms_rowid.rowid_object(rowid) obj#, 2 dbms_rowid.rowid_relative_fno(rowid) rfile#, 3 dbms_rowid.rowid_block_number(rowid) block#, 4 dbms_rowid.rowid_row_number(rowid) row# 5 from t_chain1 ; OBJ# RFILE# BLOCK# ROW# ---------- ---------- ---------- ---------- 62047 2 73688 1 62047 2 73688 3 62047 2 73688 5 |
1 2 3 |
我们可以看到,虽然该block存放了6行数据(3条数据,6个row piece),然而其rowid只有3个. 从这点也可以看出,其实虽然一条数据 被分成2个row piece,然而其rowid却只有一个.所有如果是所有通过rowid访问,那么是可以指定返回整条数据的,不需要产生额外的IO。 那么,针对这样的行内迁移(intra block chain),到底是否会产生多余的IO消耗呢 ?我们通过创建通过如下的SQL来简单测试下: |
1 |
1 |
+++++++首先测试不存在行迁移的表,观察逻辑读消耗 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 |
<pre class="brush:sql">SQL> conn roger/roger Connected. SQL> select a.NAME, b.VALUE 2 from v$statname a, v$mystat b 3 where a.STATISTIC# = b.STATISTIC# 4 and lower(a.NAME) = 'table fetch continued row'; NAME VALUE ---------------------------------------------------------------- ---------- table fetch continued row 0 SQL> alter system flush shared_pool; System altered. SQL> alter system flush BUFFER_CACHE; System altered. SQL> set autot on SQL> set lines 200 SQL> select count(1) from t_chain2 where id255=2; COUNT(1) ---------- 1 Execution Plan ---------------------------------------------------------- Plan hash value: 1667017148 ------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 3 | 3 (0)| 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 3 | | | |* 2 | TABLE ACCESS FULL| T_CHAIN2 | 1 | 3 | 3 (0)| 00:00:01 | ------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter("ID255"=2) Statistics ---------------------------------------------------------- 476 recursive calls 0 db block gets 41 consistent gets 17 physical reads 0 redo size 411 bytes sent via SQL*Net to client 400 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 4 sorts (memory) 0 sorts (disk) 1 rows processed |
1 |
++++++++ 测试intra block chaining的表 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 |
<pre class="brush:sql">SQL> select count(1) from t_chain1 where id256=2; COUNT(1) ---------- 1 SQL> select a.NAME, b.VALUE 2 from v$statname a, v$mystat b 3 where a.STATISTIC# = b.STATISTIC# 4 and lower(a.NAME) = 'table fetch continued row'; NAME VALUE ---------------------------------------------------------------- ---------- table fetch continued row 48 SQL> alter system flush buffer_cache; System altered. SQL> set autot on SQL> set autot off SQL> select a.NAME, b.VALUE 2 from v$statname a, v$mystat b 3 where a.STATISTIC# = b.STATISTIC# 4 and lower(a.NAME) = 'table fetch continued row'; NAME VALUE ---------------------------------------------------------------- ---------- table fetch continued row 48 SQL> set autot on SQL> select count(1) from t_chain1 where id256=2; COUNT(1) ---------- 1 Execution Plan ---------------------------------------------------------- Plan hash value: 432753352 ------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 3 | 3 (0)| 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 3 | | | |* 2 | TABLE ACCESS FULL| T_CHAIN1 | 1 | 3 | 3 (0)| 00:00:01 | ------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter("ID256"=2) Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 10 consistent gets 6 physical reads 0 redo size 411 bytes sent via SQL*Net to client 400 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed SQL> alter system flush buffer_cache; System altered. SQL> SQL> set autot off SQL> select a.NAME, b.VALUE 2 from v$statname a, v$mystat b 3 where a.STATISTIC# = b.STATISTIC# 4 and lower(a.NAME) = 'table fetch continued row'; NAME VALUE ---------------------------------------------------------------- ---------- table fetch continued row 63 SQL> alter system flush BUFFER_CACHE; System altered. SQL> set autot on SQL> select count(1) from t_chain1 where id256=2; COUNT(1) ---------- 1 Execution Plan ---------------------------------------------------------- Plan hash value: 432753352 ------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 3 | 3 (0)| 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 3 | | | |* 2 | TABLE ACCESS FULL| T_CHAIN1 | 1 | 3 | 3 (0)| 00:00:01 | ------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter("ID256"=2) Statistics ---------------------------------------------------------- 477 recursive calls 0 db block gets 38 consistent gets 18 physical reads 0 redo size 411 bytes sent via SQL*Net to client 400 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 4 sorts (memory) 0 sorts (disk) 1 rows processed SQL> select a.NAME, b.VALUE 2 from v$statname a, v$mystat b 3 where a.STATISTIC# = b.STATISTIC# 4 and lower(a.NAME) = 'table fetch continued row'; NAME VALUE ---------------------------------------------------------------- ---------- table fetch continued row 87 |
1 |
1 2 |
<strong>我们可以看到虽然intra block chain,会由于对表的访问导致table fetch continued row 统计信息的增加,</strong> <strong>然而通过测试我们可以发现,其本身并不会消耗额外的IO。</strong> |
1 |
<strong>从我上面的2个测试来看:</strong> |
1 2 |
<strong>t_chain1 256 列</strong> <strong>t_chain2 255 列</strong> |
1 |
<strong>这2个测试表均包含3条数据。 全表扫描的逻辑读几乎一致。</strong> |
6 Responses to “intra blcok chain”
你DUMP BLOCK的命令是什么啊?
roger,感觉你理解错误了
H: Head of row piece
K:Cluster key
C:Cluster table member
D:Deleted row
F:First data piece,
L:Last data piece
P:First column continues from previous row
N:Last column continues
#define KDRHFK 0×80 Cluster Key
#define KDRHFC 0×40 Clustered table member
#define KDRHFH 0×20 Head piece of row
#define KDRHFD 0×10 Deleted row
#define KDRHFF 0×08 First data piece|
#define KDRHFL 0×04 Last data piece
#define KDRHFP 0×02 First column continues from Previous piece
#define KDRHFN 0×01 Last column continues in Next piece
我的理解没有问题。 nrid表是next row piece。前面的row 0从dump来看是row piece 1. 下面的row 1对应row piece 2. 但是实际上oracle读取的数据是先读下面row 1. 这个从前面的bbed dump看出来。 其实row 1的数据是在前面的。 从结构上来讲,row 0是第2个row piece。
博主没有理解错,是这样的,因为oracle在写数据的时候是从底部往上写的,所以读数据的时候会先读取下面的。 但是说消耗的IO一样我不太赞同。
roger,我也觉得你理解有问题。Oracle每次读一个block(非Exadata的系统),因此,是否会消耗额外的IO,需要看是否读取了多余的块,你这样构造的数据,读取的时候,IO没有太大改变,但是生产的数据,可能是读取了多余的block…………
针对block内的啊
Leave a Reply
You must be logged in to post a comment.