经验首页 前端设计 程序设计 Java相关 移动开发 数据库/运维 软件/图像 大数据/云计算 其他经验
当前位置:技术经验 » 数据库/运维 » Oracle » 查看文章
log file sync等待超高案例浅析
来源:cnblogs  作者:潇湘隐者  时间:2019/9/10 10:50:07  对本文有异议

监控工具DPA发现海外一台Oracle数据库服务器DB Commit Time指标告警,超过红色告警线(40毫秒左右,黄色告警是10毫秒,红色告警线是20毫秒),如下截图所示,生成了对应的时段的AWR报告,发现Top 5 Timed Events里面,log file sync等待事件的平均等待时间为37毫秒,log file parallel write等待事件的平均等待时间为40毫秒

 

clip_image001

 

clip_image002

 

 

如果对Tanel Poder::Understanding LGWR, Log File Sync Waits and Commit Performance这篇文章所讲述的内容很熟悉的话(经典图如下),那么通过等待事件log file sync与log file parallel write的Avg Wait(ms)指标: 37ms & 40ms,基本上可以判断就是redo log所在的磁盘I/O出现了性能问题

 

 

clip_image003

 

 

然后在用lfsdiag.sql脚本分析一下详细的统计数据,如下所示:

 

clip_image004

 

 

 

  1. HISTOGRAM DATA FOR LFS AND OTHER RELATED WAITS:
  1.  
  1. APPROACH: Look at the wait distribution for log file sync waits
  1. by looking at "wait_time_milli". Look at the high wait times then
  1. see if you can correlate those with other related wait events.
  1.  
  1.    INST_ID EVENT                                    WAIT_TIME_MILLI WAIT_COUNT
  1. ---------- ---------------------------------------- --------------- ----------
  1.          1 log file sync                                          1       4363
  1.          1 log file sync                                          2        835
  1.          1 log file sync                                          4       1650
  1.          1 log file sync                                          8       4937
  1.          1 log file sync                                         16     146252
  1.          1 log file sync                                         32     606674
  1.          1 log file sync                                         64     263377
  1.          1 log file sync                                        128     253254
  1.          1 log file sync                                        256          2
  1.          1 log file switch completion                             1        124
  1.          1 log file switch completion                             2          9
  1.          1 log file switch completion                             4         19
  1.          1 log file switch completion                             8         21
  1.          1 log file switch completion                            16         35
  1.          1 log file switch completion                            32         97
  1.          1 log file switch completion                            64        133
  1.          1 log file switch completion                           128        326
  1.          1 log file switch completion                           256       1736
  1.          1 log file switch completion                           512       3042
  1.          1 log file switch completion                          1024       2020
  1.          1 log file parallel write                                1          0
  1.          1 log file parallel write                                2          0
  1.          1 log file parallel write                                4         80
  1.          1 log file parallel write                                8       2142
  1.          1 log file parallel write                               16     170987
  1.          1 log file parallel write                               32     779205
  1.          1 log file parallel write                               64     311463
  1.          1 log file parallel write                              128      79688
  1.          1 log file parallel write                              256      42763
  1.          1 log file parallel write                              512      13052
  1.          1 log file parallel write                             1024      20468
  1.          1 log file parallel write                             2048      14020
  1.          1 log file parallel write                             4096        921
  1.          1 log file parallel write                             8192         96
  1.          1 log file parallel write                            16384         18
  1.          1 log file parallel write                            32768         18
  1.          1 log file parallel write                            65536          8
  1.          1 log file parallel write                           131072          2
  1.          1 LGWR wait for redo copy                                1       8516
  1.          1 LGWR wait for redo copy                                2         20
  1.          1 LGWR wait for redo copy                                4         19
  1.          1 LGWR wait for redo copy                                8         20
  1.          1 LGWR wait for redo copy                               16         11
  1.  
  1. ORDERED BY WAIT_TIME_MILLI
  1.  
  1.    INST_ID EVENT                                    WAIT_TIME_MILLI WAIT_COUNT
  1. ---------- ---------------------------------------- --------------- ----------
  1.          1 log file sync                                          1       4363
  1.          1 log file switch completion                             1        124
  1.          1 log file parallel write                                1          0
  1.          1 LGWR wait for redo copy                                1       8516
  1.          1 log file sync                                          2        835
  1.          1 log file switch completion                             2          9
  1.          1 log file parallel write                                2          0
  1.          1 LGWR wait for redo copy                                2         20
  1.          1 log file sync                                          4       1650
  1.          1 log file switch completion                             4         19
  1.          1 log file parallel write                                4         80
  1.          1 LGWR wait for redo copy                                4         19
  1.          1 log file sync                                          8       4937
  1.          1 log file switch completion                             8         21
  1.          1 log file parallel write                                8       2142
  1.          1 LGWR wait for redo copy                                8         20
  1.          1 log file sync                                         16     146252
  1.          1 log file switch completion                            16         35
  1.          1 log file parallel write                               16     170987
  1.          1 LGWR wait for redo copy                               16         11
  1.          1 log file sync                                         32     606674
  1.          1 log file switch completion                            32         97
  1.          1 log file parallel write                               32     779205
  1.          1 log file sync                                         64     263377
  1.          1 log file switch completion                            64        133
  1.          1 log file parallel write                               64     311463
  1.          1 log file sync                                        128     253254
  1.          1 log file switch completion                           128        326
  1.          1 log file parallel write                              128      79688
  1.          1 log file sync                                        256          2
  1.          1 log file switch completion                           256       1736
  1.          1 log file parallel write                              256      42763
  1.          1 log file switch completion                           512       3042
  1.          1 log file parallel write                              512      13052
  1.          1 log file switch completion                          1024       2020
  1.          1 log file parallel write                             1024      20468
  1.          1 log file parallel write                             2048      14020
  1.          1 log file parallel write                             4096        921
  1.          1 log file parallel write                             8192         96
  1.          1 log file parallel write                            16384         18
  1.          1 log file parallel write                            32768         18
  1.          1 log file parallel write                            65536          8
  1.          1 log file parallel write                           131072          2
  1.  
  1. REDO WRITE STATS
  1.  
  1. "redo write time" in centiseconds (100 per second)
  1. 11.1: "redo write broadcast ack time" in centiseconds (100 per second)
  1. 11.2: "redo write broadcast ack time" in microseconds (1000 per millisecond)
  1.  
  1. VERSION              INST_ID NAME                                                     VALUE        MILLISECONDS
  1. ----------------- ---------- ---------------------------------------- --------------------- -------------------
  1. 10.2.0.5.0                 1 redo write time                                        9551524        95515240.000
  1. 10.2.0.5.0                 1 redo writer latching time                                   51
  1. 10.2.0.5.0                 1 redo writes                                            1434931
  1.  
  1. AWR WORST AVG LOG FILE SYNC SNAPS:

 

上面数据可以看到,log file sync等待事件数量最多的是32ms这个区间的,log file parallel write等待事件发生最多的也是32ms这个区间的,其实这个值已经远远超过7ms,极其不正常。log file parallel write 事件是LGWR进程专属的等待事件,发生在LGWR将log_buffer中的重做日志信息写入联机重做日志文件组的成员文件,LGWR在该事件上等待该写入过程的完成。该事件等待时间过长,说明日志文件所在磁盘缓慢或存在争用。log file synclog file parallel write是相互关联的。换句话讲,假设log file parallel write的时间非常长,那么必定导致log file sync等待时间拉长。如果log file parallel write 等待非常高,那么可能一般是物理磁盘I/O的问题

 

另外,我们也检查了一下redo log的切换频率,如下所示,redolog_sitch_time.sql查看发现redo log切换的次数并不频繁,生成的归档日志的量也并不大。大部分时候一小时切换零次或一次。

 

image

 

 

然后我们找了一台机器(上述指标正常的服务器)简单对测试了一下IO的速度,这个方法极其简单,就是看看生成一个大文件需要多长时间,简单测试一下I/O性能(没有考虑cache等,测试采样也不详尽),但是对比数据也基本能验证、反馈磁盘IO存在问题)。

 

 

 

问题服务器:

 

 

# time dd if=/dev/zero of=./test bs=512k count=2048 oflag=direct

2048+0 records in

2048+0 records out

1073741824 bytes (1.1 GB) copied, 88.271 seconds, 12.2 MB/s

 

real    1m28.273s

user    0m0.010s

sys     0m0.655s

 

 

 

对比服务器(正常的服务器):

 

 

 

 

# time dd if=/dev/zero of=./test bs=512k count=2048 oflag=direct

2048+0 records in

2048+0 records out

1073741824 bytes (1.1 GB) copied, 2.48344 seconds, 432 MB/s

 

real    0m2.485s

user    0m0.004s

sys     0m0.386s

 

 

  如上对比所示,两台服务器生成同样一个大小文件,耗费的时间,I/O性能差别非常大,完全验证了告警的服务器所在的存储I/O存在性能问题,但是公司分工非常明确,DBA也不清楚底层存储出了什么问题,只能将这个问题反馈出来,等待海外负责维护系统和存储的同事的回复。

 

 

原文链接:http://www.cnblogs.com/kerrycode/p/11484066.html

 友情链接:直通硅谷  点职佳  北美留学生论坛

本站QQ群:前端 618073944 | Java 606181507 | Python 626812652 | C/C++ 612253063 | 微信 634508462 | 苹果 692586424 | C#/.net 182808419 | PHP 305140648 | 运维 608723728

W3xue 的所有内容仅供测试,对任何法律问题及风险不承担任何责任。通过使用本站内容随之而来的风险与本站无关。
关于我们  |  意见建议  |  捐助我们  |  报错有奖  |  广告合作、友情链接(目前9元/月)请联系QQ:27243702 沸活量
皖ICP备17017327号-2 皖公网安备34020702000426号