达梦数据库学习笔记之 — 线程架构
本站文章除注明转载外,均为本站原创: 转载自love wife love life —Roger的Oracle/MySQL/PostgreSQL数据恢复博客
本文链接地址: 达梦数据库学习笔记之 — 线程架构
这里接着前面的系列文章继续进行学习探索(本文就是纯技术研究,仅代表个人观点,与公司无关)。达梦数据库本质上是一个单进程多线程架构,跟MySQL类似。如下所示;
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 |
[root@mogdb ~]# ps -ef|grep dmserver|grep -v grep dmdba 9539 1 0 00:47 ? 00:00:02 /opt/dmdbms/bin/dmserver /opt/dm/dmdbms/data/enmotech/dm.ini -noconsole [root@mogdb ~]# SQL> select pname,trace_name,type$,pid from v$process; LINEID PNAME TRACE_NAME TYPE$ PID ---------- -------- ---------- ----------- ----------- 1 dmserver 1 9539 [root@mogdb enmotech]# pstree -p |grep dmserver |-dmserver(9539)-+-{dmserver}(9902) | |-{dmserver}(9956) | |-{dmserver}(9957) | |-{dmserver}(9958) | |-{dmserver}(9959) | |-{dmserver}(10051) | |-{dmserver}(10052) | |-{dmserver}(10151) | |-{dmserver}(10152) | |-{dmserver}(10153) | |-{dmserver}(10154) | |-{dmserver}(10187) | |-{dmserver}(10190) | |-{dmserver}(10192) | |-{dmserver}(10206) | |-{dmserver}(10207) | |-{dmserver}(10208) | |-{dmserver}(10209) | |-{dmserver}(10210) | |-{dmserver}(10211) | |-{dmserver}(10212) | |-{dmserver}(10213) | |-{dmserver}(10219) | |-{dmserver}(10220) | `-{dmserver}(10787) |
可以看到这里的PID即是dmserver的操作系统进程号。跟Oracle 中的pid不同。 接下来我们看看dm8中有哪些线程:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 |
SQL> select id,name,THREAD_DESC from v$threads; LINEID ID NAME THREAD_DESC ---------- -------------------- -------------- ------------------------------------------------------------------------------------- 1 9902 dm_quit_thd Thread for executing shutdown-normal operation 2 9956 dm_io_thd IO thread 3 9957 dm_io_thd IO thread 4 9958 dm_io_thd IO thread 5 9959 dm_io_thd IO thread 6 10051 dm_chkpnt_thd Flush checkpoint thread 7 10052 dm_redolog_thd Redo log thread, used to flush log 8 10153 dm_hio_thd IO thread for HFS to read data pages 9 10152 dm_hio_thd IO thread for HFS to read data pages 10 10151 dm_hio_thd IO thread for HFS to read data pages 11 10154 dm_hio_thd IO thread for HFS to read data pages 12 10187 dm_sqllog_thd Thread for writing dmsql dmserver 13 10190 dm_purge_thd Purge thread 14 10211 dm_wrkgrp_thd User working thread 15 10206 dm_tskwrk_thd Task Worker Thread for SQL parsing and execution for sevrer itself 16 10210 dm_trctsk_thd Thread for writing trace information 17 10209 dm_tskwrk_thd Task Worker Thread for SQL parsing and execution for sevrer itself 18 10208 dm_tskwrk_thd Task Worker Thread for SQL parsing and execution for sevrer itself 19 10207 dm_tskwrk_thd Task Worker Thread for SQL parsing and execution for sevrer itself 20 10212 dm_wrkgrp_thd User working thread 21 10213 dm_audit_thd Thread for flush audit logs 22 10219 dm_sched_thd Server scheduling thread,used to trigger background checkpoint, time-related triggers 23 10220 dm_lsnr_thd Service listener thread 24 10787 dm_sql_thd User session thread SQL> select name,THREAD_DESC,count(1) from v$threads group by 2 name,THREAD_DESC order by 3; LINEID NAME THREAD_DESC COUNT(1) ---------- -------------- ------------------------------------------------------------------------------------- -------------------- 1 dm_sched_thd Server scheduling thread,used to trigger background checkpoint, time-related triggers 1 2 dm_chkpnt_thd Flush checkpoint thread 1 3 dm_lsnr_thd Service listener thread 1 4 dm_sql_thd User session thread 1 5 dm_sqllog_thd Thread for writing dmsql dmserver 1 6 dm_purge_thd Purge thread 1 7 dm_quit_thd Thread for executing shutdown-normal operation 1 8 dm_redolog_thd Redo log thread, used to flush log 1 9 dm_trctsk_thd Thread for writing trace information 1 10 dm_audit_thd Thread for flush audit logs 1 11 dm_wrkgrp_thd User working thread 2 12 dm_hio_thd IO thread for HFS to read data pages 4 13 dm_tskwrk_thd Task Worker Thread for SQL parsing and execution for sevrer itself 4 14 dm_io_thd IO thread 4 14 rows got used time: 31.360(ms). Execute id is 15. |
下面我们针对上述线程简单进行描述和总结。
1、dm_lsnr_thd 监听线程
类似Oracle的listener 程序,任务是在服务器端口上进行循环监听,一旦有来自客户的连接请求,监 听线程被唤醒并生成一个会话申请任务,加入工作线程的任务队列,等待工作线程进行处理。
达梦数据库中监听线程的监听端口范围为1024 – 65534;默认监听的端口为5236。通过分析参数文件配置,可以看到与端口相关的一些参数:
1 2 3 4 5 6 7 8 9 10 11 12 |
[root@mogdb enmotech]# cat dm.ini |grep PORT PORT_NUM = 5236 #Port number on which the database server will listen ELOG_REPORT_LINK_SQL = 0 #Whether to write the SQLs that sent to remote database by DBLINKs into error log file DFS_HUGE_SUPPORT = 1 #Whether support huge table operation in DDFS EXTERNAL_JFUN_PORT = 6363 #DmAgent port for external java fun. EXTERNAL_AP_PORT = 4236 #DmAp port for external fun. DCRS_PORT_NUM = 6236 #Port number on which the database dcrs will listen AP_PORT_NUM = 0 #Port number on which the database ap will listen DW_PORT = 0 #Instance tcp port for watch2 DCP_PORT_NUM = 5237 #Port number on which DCP will listen HA_INST_CHECK_PORT = 65534 #HA instance check port [root@mogdb enmotech]# |
根据达梦官方手册的描述,为了保证在处理大量客户连接 时系统具有较短的响应时间,监听线程比普通线程优先级更高。从目前来看,DM8中只能有一个监听线程;在高并发场景,可能面临请求处理不及时的情况(大家知道在Oracle中,如果单个监听进程无法处理时,我们可以还有创建多个监听程序)。
2、dm_tskwrk_thd 工作线程
这是达梦数据库的核心线程;它从任务队列中取出任务,并根据任务的类型进行相应的处理,负责所有实际的数据相关操作。
DM8 的初始工作线程个数由配置文件指定,随着会话连接的增加,工作线程也会同步增加,以保持每个会话都有专门的工作线程处理请求。
看上去其默认值是cpu core的2倍。高并发场景下,该参数应该是需要调大的。
1 2 |
[root@mogdb enmotech]# cat dm.ini |grep TASK TASK_THREADS = 4 #Number Of Task Threads |
3、dm_io_thd 即IO thread
该线程也是核心线程,负责脏数据写出,类似Oracle中的DB writer进程。根据官方的手册描述,有如下几种情况下,需要IO线程来完成工作。
DM Server需要进行IO操作的时机主要有以下三种:
1)需要处理的数据页不在缓冲区中,此时需要将相关数据页读入缓冲区;
2)缓冲区满或系统关闭时,此时需要将部分脏数据页写入磁盘;
3)检查点到来时,需要将所有脏数据页写入磁盘。
通过如下2个参数来进行IO线程数量的控制;
1 2 3 |
[root@mogdb enmotech]# cat dm.ini |grep IO_THR IO_THR_GROUPS = 2 #The Number Of Io Thread Groups(Non-Windows Only) HIO_THR_GROUPS = 2 #The Number Of Huge Io Thread Groups(Non-Windows Only) |
这里简单创建测试表,并触发检查点,来观察一下IO thread的行为:
1 2 3 4 5 6 7 8 9 10 11 |
SQL> create table test0825 as select * from dba_objects; executed successfully used time: 68.693(ms). Execute id is 29. SQL> select checkpoint(20); LINEID CHECKPOINT(20) ---------- -------------- 1 0 used time: 9.623(ms). Execute id is 30. SQL> |
我们可以通过perf top -p 9956 来观察IO线程的函数调用情况:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 |
70.49% [kernel] [k] finish_task_switch 4.13% [kernel] [k] __do_softirq 2.58% [kernel] [k] _raw_spin_unlock_irqrestore 2.26% libpthread-2.17.so [.] pthread_mutex_lock 1.30% dmserver [.] ini_get_value 1.17% dmserver [.] trx4_clear_ptab_version 1.17% libpthread-2.17.so [.] __pthread_enable_asynccancel 1.16% [vdso] [.] __vdso_clock_gettime 1.16% libc-2.17.so [.] __GI___libc_poll 1.00% [kernel] [k] system_call_after_swapgs 0.78% [kernel] [k] __audit_syscall_exit 0.77% [kernel] [k] timespec_add_safe 0.77% dmserver [.] rlog4_log_get 0.76% dmserver [.] buf4_pool_is_empty_or_recycle_or_fast 0.69% [kernel] [k] do_select 0.67% [vdso] [.] __vdso_time 0.67% dmserver [.] uevent_reset 0.67% libpthread-2.17.so [.] _L_unlock_738 0.65% [kernel] [k] dput 0.59% libc-2.17.so [.] __localtime_r 0.57% libpthread-2.17.so [.] pthread_mutex_unlock 0.56% [kernel] [k] static_key_enabled 0.51% [kernel] [k] unroll_tree_refs 0.45% [kernel] [k] mntput 0.45% libc-2.17.so [.] __clock_gettime 0.39% [kernel] [k] __virt_addr_valid 0.39% [kernel] [k] sysret_signal 0.39% dmserver [.] capt_enter 0.39% libpthread-2.17.so [.] 0x000000000000ee41 0.35% libpthread-2.17.so [.] pthread_cond_timedwait@@GLIBC_2.3.2 0.34% [kernel] [k] plist_del 0.34% dmserver [.] ini_get_double_value 0.26% [kernel] [k] path_put 0.23% dmserver [.] buf4_pool_force_flush_pages 0.20% [kernel] [k] __audit_syscall_entry 0.20% dmserver [.] sys_get_sys_mode_low 0.14% [kernel] [k] sys_poll 0.14% [vdso] [.] __vdso_gettimeofday 0.12% dmserver [.] os_thread_sleep_low 0.12% dmserver [.] purg2_thread |
这里我创建一个测试表,插入数十万数据,触发全量检查点,对进程进行跟踪一下看看具体的数据写出行为:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 |
SQL> select checkpoint(100); LINEID CHECKPOINT(100) ---------- --------------- 1 0 [root@mogdb yum.repos.d]# ps -ef|grep dms dmdba 9539 1 0 00:47 ? 00:00:06 /opt/dmdbms/bin/dmserver /opt/dm/dmdbms/data/enmotech/dm.ini -noconsole root 11437 11138 0 01:37 pts/2 00:00:00 grep --color=auto dms [root@mogdb yum.repos.d]# strace -fr -o /tmp/io_thread.log -p 9539 strace: Process 9539 attached with 26 threads ^Cstrace: Process 9539 detached strace: Process 9902 detached strace: Process 9956 detached strace: Process 9957 detached strace: Process 9958 detached strace: Process 9959 detached strace: Process 10051 detached strace: Process 10052 detached strace: Process 10151 detached 。。。。。 strace: Process 10213 detached strace: Process 10219 detached strace: Process 10220 detached strace: Process 10787 detached [root@mogdb ~]# ls -ltr /proc/9539/fd total 0 lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 9 -> /opt/dm/dmdbms/data/enmotech/enmotech03.log lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 8 -> /opt/dm/dmdbms/data/enmotech/enmotech02.log lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 7 -> /opt/dm/dmdbms/data/enmotech/enmotech01.log lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 6 -> /opt/dm/dmdbms/data/enmotech/TEMP.DBF lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 5 -> /opt/dm/dmdbms/data/enmotech/SYSTEM.DBF lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 4 -> socket:[62747] lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 3 -> socket:[62746] l-wx------. 1 dmdba dinstall 64 Aug 25 01:41 2 -> /opt/dmdbms/log/DmServicedmdb.log lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 17 -> socket:[73233] l-wx------. 1 dmdba dinstall 64 Aug 25 01:41 16 -> pipe:[64988] lr-x------. 1 dmdba dinstall 64 Aug 25 01:41 15 -> pipe:[64988] l-wx------. 1 dmdba dinstall 64 Aug 25 01:41 14 -> pipe:[64987] lr-x------. 1 dmdba dinstall 64 Aug 25 01:41 13 -> pipe:[64987] lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 12 -> /opt/dm/dmdbms/data/enmotech/MAIN.DBF lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 11 -> /opt/dm/dmdbms/data/enmotech/ROLL.DBF lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 10 -> /opt/dm/dmdbms/data/enmotech/enmotech01.dbf l-wx------. 1 dmdba dinstall 64 Aug 25 01:41 1 -> /opt/dmdbms/log/DmServicedmdb.log lr-x------. 1 dmdba dinstall 64 Aug 25 01:41 0 -> /dev/null [root@mogdb ~]# cat /tmp/io_thread.log |grep "9956" |grep pwrite64|more 9956 0.000016 pwrite64(11, "\1\0\0\0\320\234\0\0\377\377\377\377\377\377\377\377\377\377\377\377$\0\0\0\0\0\0\0\31Q\260\2"..., 8192, 328859648 <unfinished ...> 9956 0.000005 <... pwrite64 resumed> ) = 8192 9956 0.000030 pwrite64(12, "\4\0\0\0000\0\0\0\377\377\377\377\377\377\377\377\377\377\377\377\25\0\0\0\0\0\0\0@Q\260\2"..., 8192, 393216 <unfinished ...> 9956 0.000012 <... pwrite64 resumed> ) = 8192 9956 0.000020 pwrite64(12, "\4\0\0\0001\0\0\0\377\377\377\377\377\377\377\377\377\377\377\377\32\32\32\32\0\0\0\0\206Q\260\2"..., 8192, 401408 <unfinished ...> 9956 0.000008 <... pwrite64 resumed> ) = 8192 9956 0.000017 pwrite64(12, "\4\0\0\0\0\0\0\0\377\377\377\377\377\377\377\377\377\377\377\377\23\0\0\0\0\0\0\0,R\260\2"..., 8192, 0 <unfinished ...> 9956 0.000007 <... pwrite64 resumed> ) = 8192 9956 0.000015 pwrite64(12, "\4\0\0\0q\0\0\0\0\0p\0\0\0\0\0r\0\0\0\24\0\0\0\0\0\0\0gQ\260\2"..., 8192, 925696 <unfinished ...> 9956 0.000008 <... pwrite64 resumed> ) = 8192 9956 0.000020 pwrite64(12, "\4\0\0\0\10\0\0\0\377\377\377\377\377\377\377\377\377\377\377\377\21\0\0\0\0\0\0\0,R\260\2"..., 8192, 65536 <unfinished ...> 9956 0.000008 <... pwrite64 resumed> ) = 8192 9956 0.000019 pwrite64(12, "\4\0\0\0r\0\0\0\0\0q\0\0\0\0\0s\0\0\0\24\0\0\0\0\0\0\0iQ\260\2"..., 8192, 933888 <unfinished ...> 9956 0.000008 <... pwrite64 resumed> ) = 8192 9956 0.000029 pwrite64(12, "\4\0\0\0\260\0\0\0\0\0\257\0\0\0\0\0\261\0\0\0\24\0\0\0\0\0\0\0\370Q\260\2"..., 8192, 1441792 <unfinished ...> 9956 0.000009 <... pwrite64 resumed> ) = 8192 9956 0.000029 pwrite64(12, "\4\0\0\0e\0\0\0\0\0d\0\0\0\0\0f\0\0\0\24\0\0\0\0\0\0\0KQ\260\2"..., 8192, 827392 <unfinished ...> 9956 0.000008 <... pwrite64 resumed> ) = 8192 9956 0.000105 pwrite64(12, "\4\0\0\0\242\0\0\0\0\0\241\0\0\0\0\0\243\0\0\0\24\0\0\0\0\0\0\0\330Q\260\2"..., 8192, 1327104) = 8192 9956 0.000019 pwrite64(12, "\4\0\0\0f\0\0\0\0\0e\0\0\0\0\0g\0\0\0\24\0\0\0\0\0\0\0MQ\260\2"..., 8192, 835584 <unfinished ...> 9956 0.000007 <... pwrite64 resumed> ) = 8192 9956 0.000103 pwrite64(12, "\4\0\0\0\243\0\0\0\0\0\242\0\0\0\0\0\244\0\0\0\24\0\0\0\0\0\0\0\332Q\260\2"..., 8192, 1335296) = 8192 9956 0.000019 pwrite64(12, "\4\0\0\0g\0\0\0\0\0f\0\0\0\0\0h\0\0\0\24\0\0\0\0\0\0\0OQ\260\2"..., 8192, 843776 <unfinished ...> 9956 0.000008 <... pwrite64 resumed> ) = 8192 9956 0.000210 pwrite64(12, "\4\0\0\0\244\0\0\0\0\0\243\0\0\0\0\0\245\0\0\0\24\0\0\0\0\0\0\0\334Q\260\2"..., 8192, 1343488) = 8192 9956 0.000022 pwrite64(12, "\4\0\0\0h\0\0\0\0\0g\0\0\0\0\0i\0\0\0\24\0\0\0\0\0\0\0QQ\260\2"..., 8192, 851968 <unfinished ...> 9956 0.000009 <... pwrite64 resumed> ) = 8192 9956 0.000119 pwrite64(12, "\4\0\0\0\245\0\0\0\0\0\244\0\0\0\0\0\246\0\0\0\24\0\0\0\0\0\0\0\336Q\260\2"..., 8192, 1351680) = 8192 9956 0.000013 pwrite64(12, "\4\0\0\0i\0\0\0\0\0h\0\0\0\0\0j\0\0\0\24\0\0\0\0\0\0\0SQ\260\2"..., 8192, 860160 <unfinished ...> 9956 0.000005 <... pwrite64 resumed> ) = 8192 9956 0.000070 pwrite64(12, "\4\0\0\0\246\0\0\0\0\0\245\0\0\0\0\0\247\0\0\0\24\0\0\0\0\0\0\0\340Q\260\2"..., 8192, 1359872) = 8192 9956 0.000019 pwrite64(12, "\4\0\0\0j\0\0\0\0\0i\0\0\0\0\0k\0\0\0\24\0\0\0\0\0\0\0UQ\260\2"..., 8192, 868352 <unfinished ...> 9956 0.000027 <... pwrite64 resumed> ) = 8192 9956 0.000110 pwrite64(12, "\4\0\0\0\247\0\0\0\0\0\246\0\0\0\0\0\250\0\0\0\24\0\0\0\0\0\0\0\345Q\260\2"..., 8192, 1368064) = 8192 9956 0.000019 pwrite64(12, "\4\0\0\0001\0\0\0\377\377\377\377\377\377\377\377\377\377\377\377\32\32\32\32\0\0\0\0lS\260\2"..., 8192, 401408 <unfinished ...> 9956 0.000009 <... pwrite64 resumed> ) = 8192 9956 0.000016 pwrite64(12, "\4\0\0\0y\0\0\0\0\0x\0\0\0\0\0z\0\0\0\24\0\0\0\0\0\0\0zQ\260\2"..., 8192, 991232 <unfinished ...> 9956 0.000007 <... pwrite64 resumed> ) = 8192 9956 0.000013 pwrite64(12, "\4\0\0\0\267\0\0\0\0\0\266\0\0\0\0\0\270\0\0\0\24\0\0\0\0\0\0\0\tR\260\2"..., 8192, 1499136 <unfinished ...> 9956 0.000005 <... pwrite64 resumed> ) = 8192 9956 0.000013 pwrite64(12, "\4\0\0\0\361\0\0\0\0\0\360\0\0\0\0\0\362\0\0\0\24\0\0\0\0\0\0\0\223R\260\2"..., 8192, 1974272 <unfinished ...> 。。。。。。 SQL> select name,value,DESCRIPTION from v$parameter where NAME like '%PAGE_SIZE%'; LINEID NAME VALUE DESCRIPTION ---------- --------------------- ---------- --------------------- 1 GLOBAL_PAGE_SIZE_MASK 4294959104 global_page_size_mask 2 GLOBAL_PAGE_SIZE 8192 global_page_size 3 GLOBAL_LOG_PAGE_SIZE 512 global_log_page_size |
看上去这里DM的IO线程是单page写入(DM默认page size是8k);居然不会进行IO合并。这看起来不太对,否则性能太低。使用blktrace来深入分析看看IO情况。
1 2 3 4 5 |
[root@mogdb tmp]# blktrace -d /dev/sda3 ^C=== sda3 === CPU 0: 3116 events, 147 KiB data CPU 1: 7807 events, 366 KiB data Total: 10923 events (dropped 0), 513 KiB data |
通过btt格式化后可以进一步进行观察btt -i sda3.blktrace.0 -B sda3_io_thread.log
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 |
[root@mogdb tmp]# blkparse -i sda3.blktrace.0 |grep dm_io_thd|more 8,3 0 156 8.082262362 9956 Q WS 55248800 + 1024 [dm_io_thd] 8,3 0 157 8.082264411 9956 G WS 55248800 + 1024 [dm_io_thd] 8,3 0 158 8.082265881 9956 I WS 55248800 + 1024 [dm_io_thd] 8,3 0 159 8.082266808 9956 D WS 55248800 + 1024 [dm_io_thd] 8,3 0 161 8.082361026 9956 Q WS 55249824 + 456 [dm_io_thd] 8,3 0 162 8.082362306 9956 G WS 55249824 + 456 [dm_io_thd] 8,3 0 163 8.082363357 9956 I WS 55249824 + 456 [dm_io_thd] 8,3 0 164 8.082364168 9956 D WS 55249824 + 456 [dm_io_thd] 8,3 0 168 8.082943188 9956 Q WSM 52753552 + 9 [dm_io_thd] 8,3 0 169 8.082944128 9956 G WSM 52753552 + 9 [dm_io_thd] 8,3 0 170 8.082944516 9956 P N [dm_io_thd] 8,3 0 171 8.082945063 9956 I WSM 52753552 + 9 [dm_io_thd] 8,3 0 172 8.082945461 9956 U N [dm_io_thd] 1 8,3 0 173 8.082945848 9956 D WSM 52753552 + 9 [dm_io_thd] 8,3 1 358 8.122993041 9957 Q WS 55265344 + 136 [dm_io_thd] ...... 8,3 1 387 8.123178082 9957 G WS 55265616 + 16 [dm_io_thd] 8,3 1 388 8.123178399 9957 I WS 55265616 + 16 [dm_io_thd] 8,3 1 389 8.123178805 9957 D WS 55265616 + 16 [dm_io_thd] 8,3 1 393 8.123224770 9957 Q WS 55265648 + 16 [dm_io_thd] 8,3 1 394 8.123225332 9957 G WS 55265648 + 16 [dm_io_thd] 8,3 1 395 8.123225650 9957 I WS 55265648 + 16 [dm_io_thd] 8,3 1 396 8.123226082 9957 D WS 55265648 + 16 [dm_io_thd] 8,3 1 398 8.123257086 9957 Q WS 55265680 + 16 [dm_io_thd] [root@mogdb tmp]# cat sda3_io_thread.log_8,3_w.dat|more 7420.463667378 81083264 81083280 7420.463748341 90913576 90913592 7420.464266588 52873936 52873952 7420.464338445 52874072 52874088 7420.464624548 54613992 54614072 7420.464725512 54587992 54588152 7420.465154622 54588152 54588312 7420.465240773 54588312 54588472 7420.465317470 54588472 54588632 7420.465685249 54641704 54641896 7420.465762457 54641896 54642072 7420.465836406 54642072 54642232 7420.465926814 54642232 54642424 7420.466335284 54642424 54642584 --以这里为例,前后相减,大概是160个IO操作;每次IO size 512 byte,那么应该是81920 bytes。 7420.466460526 55242640 55243664 7420.466763160 55243664 55244688 ...... [root@mogdb tmp]# cat /sys/block/sda/queue/physical_block_size 512 [root@mogdb tmp]# cat /sys/block/sda/queue/logical_block_size 512 [root@mogdb tmp]# |
由此可见,还是会存在IO合并的操作,不是单page写入,page应该是最小IO单元。
4、dm_sched_thd 调度进程
类似Oracle中的job 进程,负责定时任务的调度,默认情况下每秒轮询一次。不过达梦数据库中的调度线程,
不单单是这些功能,从官方手册来看,包括了如下一系列功能:
1) 检查系统级的时间触发器,如果满足触发条件则生成任务加到工作线程的任务队列 由工作线程执行;
2) 清理 SQL 缓存、计划缓存中失效的项,或者超出缓存限制后淘汰不常用的缓存项;
3) 检查数据重演捕获持续时间是否到期,到期则自动停止捕获;
4) 执行动态缓冲区检查。根据需要动态扩展或动态收缩系统缓冲池;
5) 自动执行检查点。为了保证日志的及时刷盘,减少系统故障时恢复时间,根据 INI参数设置的自动检查点执行间隔定期执行检查点操作;
6) 会话超时检测。当客户连接设置了连接超时时,定期检测是否超时,如果超时则自动断开连接;
7) 必要时执行数据更新页刷盘;
8) 唤醒等待的工作线程
可以通过perf top -p 10219 来简单窥视一下调度进程的一些函数调用:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 |
68.73% [kernel] [k] finish_task_switch 2.66% [kernel] [k] __do_softirq 2.52% [kernel] [k] _raw_spin_unlock_irqrestore 1.92% libpthread-2.17.so [.] pthread_mutex_lock 1.77% [kernel] [k] __audit_syscall_exit 1.73% dmserver [.] buf4_pool_is_empty_or_recycle_or_fast 1.42% libpthread-2.17.so [.] pthread_mutex_unlock 1.22% libpthread-2.17.so [.] pthread_cond_timedwait@@GLIBC_2.3.2 1.10% [kernel] [k] static_key_enabled 1.00% libc-2.17.so [.] __GI___libc_poll 0.85% dmserver [.] ini_get_value 0.85% dmserver [.] ntsk_leave 0.83% [kernel] [k] fget_light 0.75% [kernel] [k] do_sys_poll 0.75% dmserver [.] mal_lsnr_port_check 0.70% libpthread-2.17.so [.] __pthread_mutex_cond_lock 0.69% [kernel] [k] __check_object_size 0.69% [kernel] [k] unroll_tree_refs 0.64% [kernel] [k] futex_wake 0.57% dmserver [.] ini_get_double_value 0.57% libc-2.17.so [.] __clock_gettime 0.55% [kernel] [k] auditsys 0.55% [kernel] [k] __audit_syscall_entry 0.51% [kernel] [k] wake_q_add 0.50% dmserver [.] buf4_force_flush_pages 0.50% libc-2.17.so [.] __memcpy_ssse3_back 0.50% libpthread-2.17.so [.] __pthread_disable_asynccancel 0.49% dmserver [.] purg2_crash_cmt_trx 0.44% [kernel] [k] __virt_addr_valid 0.44% [kernel] [k] hash_futex 0.38% [kernel] [k] __smp_mb__after_atomic 0.38% [vdso] [.] __vdso_clock_gettime 0.38% dmserver [.] trx4_clear_ptab_version 0.29% dmserver [.] clock_gettime@plt 0.22% [kernel] [k] sys_select 0.20% dmserver [.] os_event2_reset 0.17% [kernel] [k] ret_from_sys_call 0.17% dmserver [.] rlog4_flush_for_db_low 0.17% dmserver [.] trx4_active_view_broadcast_thread 0.15% [kernel] [k] _copy_from_user 0.15% [kernel] [k] _raw_qspin_lock |
5、dm_redolog_thd 即日志flush 线程
该线程作用类似Oracle LGWR进程,负责将日志缓冲区中的buffer写出到redo日志文件中,用于确保事务中断或异常时可以进行恢复,保证事务一致性。
在DM8之前的版本中,是没有redolog flush线程的,统一由IO线程负责。DM8中对这个性能瓶颈点进行了优化拆分,将redolog flush线程独立了出来;从这点来看DM8
在日志写方面比之前老版本应该有较大的性能提升。不过仍然是一个单线程。当redo产生量极大的情况下,单线程应该还是会存在瓶颈。
大家知道Oracle 12c版本之前,redo buffer的写出进程lgwr也是单进程,从12c版本开始,Oracle引入了多进程模式,解决了最后一个瓶颈。
那么多redolog flush线程会合并写入吗? 这里看看过滤看看前面的strace跟踪即可:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 |
[root@mogdb ~]# cat /tmp/io_thread.log |grep "10052" |grep pwrite64|more 10052 0.000447 pwrite64(7, "\0,\2\0.*\2\0\337\233W\0236\240\332\10\1\0\0\0\0\0m\344\n\0\0\0\0\0m\344"..., 142336, 2559817216) = 142336 10052 0.000914 pwrite64(7, "\0R\4\0\334P\4\0\337\233W\0236\240\332\10\1\0\0\0\0\0n\344\n\0\0\0\0\0n\344"..., 283136, 2559959552) = 283136 10052 0.000270 pwrite64(7, "\0\354\7\0\23\352\7\0\337\233W\0236\240\332\10\1\0\0\0\0\0o\344\n\0\0\0\0\0o\344"..., 519168, 2560242688) = 519168 10052 0.000010 pwrite64(7, "\0\276\0\0F\275\0\0\337\233W\0236\240\332\10\1\0\0\0\0\0p\344\n\0\0\0\0\0p\344"..., 48640, 2560761856 <unfinished ...> 10052 0.003413 <... pwrite64 resumed> ) = 48640 10052 0.000116 pwrite64(7, "\0\4\0\0y\2\0\0\337\233W\0236\240\332\10\1\0\0\0\0\0q\344\n\0\0\0\0\0q\344"..., 1024, 2560810496) = 1024 10052 0.000527 pwrite64(7, "\0\2\0\0\257\0\0\0\337\233W\0236\240\332\10\1\0\0\0\0\0r\344\n\0\0\0\0\0r\344"..., 512, 2560811520) = 512 10052 0.000409 pwrite64(7, "\0\0\10\0\305\376\7\0\337\233W\0236\240\332\10\1\0\0\0\0\0s\344\n\0\0\0\0\0s\344"..., 524288, 2560812032 <unfinished ...> 10052 0.000564 <... pwrite64 resumed> ) = 524288 10052 0.000351 pwrite64(7, "\0\374\7\0\323\372\7\0\337\233W\0236\240\332\10\1\0\0\0\0\0t\344\n\0\0\0\0\0t\344"..., 523264, 2561336320) = 523264 10052 0.000373 pwrite64(7, "\0R\1\0&Q\1\0\337\233W\0236\240\332\10\1\0\0\0\0\0u\344\n\0\0\0\0\0u\344"..., 86528, 2561859584) = 86528 10052 0.002699 pwrite64(7, "\0\340\17\0\221\336\17\0\337\233W\0236\240\332\10\1\0\1\0\0\0v\344\n\0\0\0\0\0v\344"..., 1040384, 2561946112) = 1040384 10052 0.000019 pwrite64(7, "\0\336\17\0\334\334\17\0\337\233W\0236\240\332\10\1\0\1\0\0\0w\344\n\0\0\0\0\0w\344"..., 1039872, 2562986496 <unfinished ...> 10052 0.000426 <... pwrite64 resumed> ) = 1039872 10052 0.000392 pwrite64(7, "\0\4\3\0\234\3\3\0\337\233W\0236\240\332\10\1\0\0\0\0\0x\344\n\0\0\0\0\0x\344"..., 197632, 2564026368) = 197632 10052 0.000237 pwrite64(7, "\0\4\0\0\t\3\0\0\337\233W\0236\240\332\10\1\0\0\0\0\0y\344\n\0\0\0\0\0y\344"..., 1024, 2564224000) = 1024 10052 0.000125 pwrite64(7, "\0\2\0\0\257\0\0\0\337\233W\0236\240\332\10\1\0\0\0\0\0z\344\n\0\0\0\0\0z\344"..., 512, 2564225024) = 512 10052 0.002194 pwrite64(7, "\0\374\7\0005\373\7\0\337\233W\0236\240\332\10\1\0\0\0\0\0{\344\n\0\0\0\0\0{\344"..., 523264, 2564225536) = 523264 10052 0.001846 pwrite64(7, "\0\364\17\0!\363\17\0\337\233W\0236\240\332\10\1\0\1\0\0\0|\344\n\0\0\0\0\0|\344"..., 1045504, 2564748800) = 1045504 10052 0.001553 pwrite64(7, "\0\354\7\0\1\353\7\0\337\233W\0236\240\332\10\1\0\0\0\0\0}\344\n\0\0\0\0\0}\344"..., 519168, 2565794304) = 519168 10052 0.001214 pwrite64(7, "\0\362\7\0\240\360\7\0\337\233W\0236\240\332\10\1\0\0\0\0\0~\344\n\0\0\0\0\0~\344"..., 520704, 2566313472) = 520704 10052 0.000467 pwrite64(7, "\0\366\7\0\210\365\7\0\337\233W\0236\240\332\10\1\0\0\0\0\0\177\344\n\0\0\0\0\0\177\344"..., 521728, 2566834176) = 521728 10052 0.000331 pwrite64(7, "\0\354\7\0`\352\7\0\337\233W\0236\240\332\10\1\0\0\0\0\0\200\344\n\0\0\0\0\0\200\344"..., 519168, 2567355904) = 519168 10052 0.000304 pwrite64(7, "\0\360\7\0'\357\7\0\337\233W\0236\240\332\10\1\0\0\0\0\0\201\344\n\0\0\0\0\0\201\344"..., 520192, 2567875072) = 520192 10052 0.000895 pwrite64(7, "\0\\\5\0c[\5\0\337\233W\0236\240\332\10\1\0\0\0\0\0\202\344\n\0\0\0\0\0\202\344"..., 351232, 2568395264) = 351232 10052 0.000167 pwrite64(7, "\0\4\0\0\1\2\0\0\337\233W\0236\240\332\10\1\0\0\0\0\0\203\344\n\0\0\0\0\0\203\344"..., 1024, 2568746496) = 1024 10052 0.000169 pwrite64(7, "\0\2\0\0\257\0\0\0\337\233W\0236\240\332\10\1\0\0\0\0\0\204\344\n\0\0\0\0\0\204\344"..., 512, 2568747520) = 512 10052 0.000027 pwrite64(7, "\0\24\0\0\270\23\0\0\337\233W\0236\240\332\10\1\0\0\0\0\0\205\344\n\0\0\0\0\0\205\344"..., 5120, 2568748032) = 5120 10052 0.000026 pwrite64(7, "\0\16\0\0\200\f\0\0\337\233W\0236\240\332\10\1\0\0\0\0\0\206\344\n\0\0\0\0\0\206\344"..., 3584, 2568753152) = 3584 10052 0.000027 pwrite64(7, "\0\16\0\0\200\f\0\0\337\233W\0236\240\332\10\1\0\0\0\0\0\207\344\n\0\0\0\0\0\207\344"..., 3584, 2568756736) = 3584 10052 0.000027 pwrite64(7, "\0\16\0\0\200\f\0\0\337\233W\0236\240\332\10\1\0\0\0\0\0\210\344\n\0\0\0\0\0\210\344"..., 3584, 2568760320) = 3584 10052 0.000051 pwrite64(7, "\0\16\0\0\230\f\0\0\337\233W\0236\240\332\10\1\0\0\0\0\0\211\344\n\0\0\0\0\0\211\344"..., 3584, 2568763904) = 3584 10052 0.000027 pwrite64(7, "\0\16\0\0\200\f\0\0\337\233W\0236\240\332\10\1\0\0\0\0\0\212\344\n\0\0\0\0\0\212\344"..., 3584, 2568767488) = 3584 10052 0.000026 pwrite64(7, "\0\16\0\0008\f\0\0\337\233W\0236\240\332\10\1\0\0\0\0\0\213\344\n\0\0\0\0\0\213\344"..., 3584, 2568771072) = 3584 [root@mogdb ~]# ls -ltr /proc/9539/fd total 0 lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 9 -> /opt/dm/dmdbms/data/enmotech/enmotech03.log lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 8 -> /opt/dm/dmdbms/data/enmotech/enmotech02.log lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 7 -> /opt/dm/dmdbms/data/enmotech/enmotech01.log lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 6 -> /opt/dm/dmdbms/data/enmotech/TEMP.DBF lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 5 -> /opt/dm/dmdbms/data/enmotech/SYSTEM.DBF lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 4 -> socket:[62747] lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 3 -> socket:[62746] l-wx------. 1 dmdba dinstall 64 Aug 25 01:41 2 -> /opt/dmdbms/log/DmServicedmdb.log lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 17 -> socket:[73233] l-wx------. 1 dmdba dinstall 64 Aug 25 01:41 16 -> pipe:[64988] lr-x------. 1 dmdba dinstall 64 Aug 25 01:41 15 -> pipe:[64988] l-wx------. 1 dmdba dinstall 64 Aug 25 01:41 14 -> pipe:[64987] lr-x------. 1 dmdba dinstall 64 Aug 25 01:41 13 -> pipe:[64987] lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 12 -> /opt/dm/dmdbms/data/enmotech/MAIN.DBF lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 11 -> /opt/dm/dmdbms/data/enmotech/ROLL.DBF lrwx------. 1 dmdba dinstall 64 Aug 25 01:41 10 -> /opt/dm/dmdbms/data/enmotech/enmotech01.dbf l-wx------. 1 dmdba dinstall 64 Aug 25 01:41 1 -> /opt/dmdbms/log/DmServicedmdb.log lr-x------. 1 dmdba dinstall 64 Aug 25 01:41 0 -> /dev/null [root@mogdb ~]# |
可见达梦8中,redolog flush线程会进行日志缓冲的合并写出。这也是正常的处理逻辑,否则性能上不去的。最后还有一些其他线程,如dm_chkpnt_thd 即检查点线程;purge线程,则主要是负责回滚段的清理;归档线程、审计线程等等。
总结:
看达梦数据库的14类线程,不少核心线程原理似乎跟Oracle类似(实际上关系型数据库应该都差不多);不过仍然有一些差别。
这里不发表太多评论,防止被喷。从上面测试来看,监听线程,redolog flush线程 是可能存在瓶颈的点。
Leave a Reply
You must be logged in to post a comment.