达梦数据库学习笔记 — 字符集与数据存储
本站文章除注明转载外,均为本站原创: 转载自love wife love life —Roger的Oracle/MySQL/PostgreSQL数据恢复博客
本文链接地址: 达梦数据库学习笔记 — 字符集与数据存储
有很久没写国产数据库达梦方面的技术文章了,近2年达梦数据库发展十分迅猛,不得不再研究研究。这里围绕字符集相关进行简单测试。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
SQL> select * from v$version; BANNER ------------------------- DM Database Server 64 V8 DB Version: 0x7000b used time: 0.578(ms). Execute id is 308. SQL> select unicode from dual; UNICODE ----------- 0 used time: 0.485(ms). Execute id is 309. SQL> |
这里的查询结果 0 代表 gb18030;1 代表 UTF-8。
1 2 3 4 5 6 7 |
SQL> SELECT * FROM v$parameter WHERE name like '%LENGTH_IN_CHAR%'; ID NAME TYPE VALUE SYS_VALUE FILE_VALUE DESCRIPTION ----------- -------------- --------- ----- --------- ---------- --------------------------------------------------------- 599 LENGTH_IN_CHAR READ ONLY 0 0 0 Column max length in CHAR or BYTE, 0: in BYTE, 1: in CHAR used time: 4.781(ms). Execute id is 406. |
LENGTH_IN_CHAR参数决定了达梦数据库中varchar数据类型存储长度是byte还是char,默认为0表示byte,为1则表现为字符。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 |
SQL> create table test_dm(a int,b varchar(8)); executed successfully used time: 39.949(ms). Execute id is 409. SQL> insert into test_dm values(1,'中国'); commit;affect rows 1 used time: 0.513(ms). Execute id is 410. SQL> insert into test_dm values(2,'云和恩墨'); DMSQL executed successfully used time: 1.064(ms). Execute id is 411. SQL> commit; executed successfully used time: 1.460(ms). Execute id is 412. SQL> insert into test_dm values(3,'数据库'); affect rows 1 used time: 0.390(ms). Execute id is 413. SQL> commit; executed successfully used time: 0.786(ms). Execute id is 414. SQL> insert into test_dm values(4,'数据库领军企业'); insert into test_dm values(4,'数据库领军企业'); [-6169]:Column [B] out of length. used time: 0.366(ms). Execute id is 0. SQL> insert into test_dm values(4,'数据库领军'); insert into test_dm values(4,'数据库领军'); [-6169]:Column [B] out of length. used time: 0.441(ms). Execute id is 0. SQL> select a,length(b) from test_dm; A LENGTH(B) ----------- ----------- 1 2 2 4 3 3 |
从上面的测试来看,默认varchar存储长度为byte的情况下,一个汉字占2个字节。在UTF8字符集情况下,是占3个字节。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 |
SQL> DBMS_STATS.GATHER_TABLE_STATS('SYSDBA','TEST_DM',null,100,TRUE,'FOR ALL COLUMNS SIZE AUTO'); DMSQL executed successfully used time: 97.433(ms). Execute id is 429. SQL> select owner,table_name,INITIAL_EXTENT,NEXT_EXTENT,NUM_ROWS,BLOCKS,EMPTY_BLOCKS from dba_tables 2 where table_name='TEST_DM'; OWNER TABLE_NAME INITIAL_EXTENT NEXT_EXTENT NUM_ROWS BLOCKS EMPTY_BLOCKS ------ ---------- -------------- ----------- -------- ------ ------------ SYSDBA TEST_DM NULL NULL 3 NULL NULL used time: 22.077(ms). Execute id is 430. SQL> select owner,segment_name,TABLESPACE_NAME,HEADER_FILE,HEADER_BLOCK,BYTES,BLOCKS,EXTENTS from dba_segments 2 where segment_name='TEST_DM'; OWNER SEGMENT_NAME TABLESPACE_NAME HEADER_FILE HEADER_BLOCK BYTES BLOCKS EXTENTS ------ ------------ --------------- ----------- ------------ -------------------- -------------------- -------------------- SYSDBA TEST_DM MAIN 0 32720 262144 32 2 used time: 360.959(ms). Execute id is 432. SQL> SQL> select 262144/32 from dual; 262144/32 ----------- 8192 used time: 0.540(ms). Execute id is 433. SQL> select file_id,file_name,bytes/1024/1024,blocks,RELATIVE_FNO,AUTOEXTENSIBLE from dba_data_files 2 where tablespace_name='MAIN'; FILE_ID FILE_NAME BYTES/1024/1024 BLOCKS RELATIVE_FNO AUTOEXTENSIBLE ----------- ------------------------------------- -------------------- -------------------- ------------ -------------- 0 /opt/dm/dmdbms/data/enmotech/MAIN.DBF 265 33920 1 YES used time: 4.969(ms). Execute id is 435. SQL> select dump(1,16) from dual; DUMP(1,16) -------------------- Typ=7 Len=4: 1,0,0,0 used time: 0.901(ms). Execute id is 436. SQL> select dump('中国',16) from dual; DUMP('16) ------------------------ Typ=2 Len=4: d6,d0,b9,fa used time: 0.646(ms). Execute id is 437. SQL> select dump('云和恩墨',16) from dual; DUMP('ī',16) ------------------------------------ Typ=2 Len=8: d4,c6,ba,cd,b6,f7,c4,ab used time: 0.311(ms). Execute id is 442. SQL> select dump('数据库',16) from dual; DUMP('',16) ------------------------------ Typ=2 Len=6: ca,fd,be,dd,bf,e2 used time: 0.196(ms). Execute id is 443. SQL> set TIMING off SQL> select a,b,rowid from test_dm; A B ROWID ----------- -------- -------------------- 1 中国 1 2 云和恩墨 2 3 数据库 3 SQL> select owner,segment_name,TABLESPACE_NAME,HEADER_FILE,HEADER_BLOCK,BYTES,BLOCKS,EXTENTS from dba_segments 2 where segment_name='TEST_DM'; OWNER SEGMENT_NAME TABLESPACE_NAME HEADER_FILE HEADER_BLOCK BYTES BLOCKS EXTENTS ------ ------------ --------------- ----------- ------------ -------------------- -------------------- -------------------- SYSDBA TEST_DM MAIN 0 32720 262144 32 2 SQL> select dump(32720,16) from dual; DUMP(32720,16) ---------------------- Typ=7 Len=4: d0,7f,0,0 SQL> select dump(262144,16) from dual; DUMP(262144,16) -------------------- Typ=7 Len=4: 0,0,4,0 SQL> select owner,object_name,SUBOBJECT_NAME,OBJECT_ID,DATA_OBJECT_ID,OBJECT_TYPE from dba_objects 2 where object_name='TEST_DM'; OWNER OBJECT_NAME SUBOBJECT_NAME OBJECT_ID DATA_OBJECT_ID OBJECT_TYPE ------ ----------- -------------- --------- -------------- ----------- SYSDBA TEST_DM NULL 1284 NULL TABLE SQL> select dump(1284,16) from dual; DUMP(1284,16) -------------------- Typ=7 Len=4: 4,5,0,0 |
我们看到达梦数据库也有object_id的概念,但无data_object_id。那么既然无data_object_id,那么对于列存表来说,
如果发生truncate操作,其object_id会变化吗?
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 |
SQL> create table test_dm2 as select * from test_dm; executed successfully SQL> truncate table test_dm; executed successfully SQL> insert into test_dm select * from test_dm2; affect rows 3 SQL> commit; executed successfully SQL> select owner,object_name,SUBOBJECT_NAME,OBJECT_ID,DATA_OBJECT_ID,OBJECT_TYPE from dba_objects 2 where object_name='TEST_DM'; OWNER OBJECT_NAME SUBOBJECT_NAME OBJECT_ID DATA_OBJECT_ID OBJECT_TYPE ------ ----------- -------------- --------- -------------- ----------- SYSDBA TEST_DM NULL 1284 NULL TABLE SQL> SQL> drop table test_dm; executed successfully SQL> create table test_dm as select * from test_dm2; executed successfully SQL> select owner,object_name,SUBOBJECT_NAME,OBJECT_ID,DATA_OBJECT_ID,OBJECT_TYPE from dba_objects 2 where object_name='TEST_DM'; OWNER OBJECT_NAME SUBOBJECT_NAME OBJECT_ID DATA_OBJECT_ID OBJECT_TYPE ------ ----------- -------------- --------- -------------- ----------- SYSDBA TEST_DM NULL 1286 NULL TABLE |
从测试来看跟预期看法是一致的,truncate操作不影响object_id,drop操作会增加object_id。达梦无法重用object_id。
最后我们来看下表的实际存储情况:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 |
[dmdba@test25 ~]$ dd if=/opt/dm/dmdbms/data/enmotech/MAIN.DBF bs=8192 skip=32720 count=2 | od -x 2+0 records in 2+0 records out 16384 bytes (16 kB) copied, 6.6176e-05 s, 248 MB/s 0000000 0004 0000 7fd0 0000 ffff ffff ffff ffff --00 00 d0 7f 表示段头block地址 ; 0000020 ffff ffff 0014 0000 0000 0000 0876 0002 --02 00 00 表示blocks数量 0000040 0000 0000 0005 00e7 0000 0000 0003 00c5 0000060 0052 005a ffff 00d7 0000 0401 0200 0004 0000100 0000 0008 0000 010c 0004 0000 0008 0000 0000120 00c4 0000 0000 0000 0000 ffff ffff ffff 0000140 ffff 1f00 0100 0000 8400 d0d6 fab9 0001 --d6 d0 b9 fa (第1行数据) 0000160 0000 0000 ffff ffff ff7f a5ff 0042 0000 0000200 0000 0023 0002 0000 d488 bac6 b6cd c4f7 --d4 c6 ba cd b6 f7 c4 ab (第2行数据) 0000220 02ab 0000 0000 ff00 ffff 7fff ffff 42a6 0000240 0000 0000 2100 0300 0000 8600 fdca ddbe --ca fd be dd bf e2 (第3行数据) 0000260 e2bf 0003 0000 0000 ffff ffff ff7f a7ff 0000300 0042 0000 0000 0022 ffff 0000 ca87 befd 0000320 bfdd 39e2 0004 0000 0000 ffff ffff ff7f 0000340 a9ff 0042 0000 0000 0000 0000 0000 0000 0000360 0000 0000 0000 0000 0000 0000 0000 0000 * 0017740 0000 0000 0000 0000 0000 0000 005a 005a 0017760 00a4 0081 0062 0052 0000 0000 0000 0000 0020000 0004 0000 7fd1 0000 ffff ffff ffff ffff 0020020 ffff ffff 1a1a 1a1a 0000 0000 0875 0002 0020040 0000 0000 0002 0062 0000 0000 0000 ffff 0020060 0052 005a ffff 006e 0000 0000 0000 0004 0020100 0000 0000 0000 0002 0000 0003 0000 0000 0020120 0000 ffff ffff ffff 7fff ffff ffff ffff 0020140 ffff 0000 0000 0000 0000 0000 0000 0000 0020160 0000 0000 0000 0000 0000 0000 0000 0000 * 0037760 0000 0000 005a 0052 0000 0000 0000 0000 0040000 [dmdba@test25 ~]$ |
由于达梦数据库没有提供类似Oracle BBED之类可以直接map查看block结构的工具,因此只能通过上面dd + od 的方式来进行猜测判断。
从信息判断来看,达梦的数据块存储结构跟Oracle类似,block头部也会存放类似oracle block rdba一样的地址。
从前面几点小的测试可以得到如下几个结论:
1、达梦数据库相关视图跟Oracle几乎一致,兼容性确实很好,不过似乎只是为了兼容而兼容;
比如我们看到存在object_id,没有data_object_id的概念。truncate操作不影响object_id,且无法重用。
2、对于列存表来讲,达梦数据库page 存储结构跟Oracle block类似,也存在类似rdba地址的内容在page(block)头部。
3、普通列存表存在rowid的,而且是实际存储,但对于业务来讲实际上是个伪列,业务不可见,但可查询。
其中达梦数据库中的ROWID跟Oracle 的ROWID完全不同,Oracle中的ROWID可以分解为rdba+object_id+row_number;
对于普通表来讲,达梦的ROWID就是一个单纯递增的序列。
4、对于对象空间分配,其最小IO单元是page(block),默认是8k,但最小分配单元为簇,类似Oracle extent,一个extent 包含多个page(block);
对于簇大小,有16,32,64 几种选择,假设默认为8k block size,那么最大extent为8个block。
对于这一点,后续进行一次更深入的测试分析,目前来看跟Oracle差距非常大,Oracle支持auto和uniform 2种方式,extent尺寸可变且比较大,一定程度上
对于大数据量的插入场景来讲,有很大的优势。
5、达梦数据库的字符集仅支持3种,gb18030,utf8以及韩文模式。
Leave a Reply
You must be logged in to post a comment.