Linux server hangs with the error: INFO: task blocked for more than 120 seconds

1. Symptoms:

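The system appears frozen: it still answers ping, but ssh does not respond. Logging in through VNC shows the following message on the console: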

task blocked for more than 120 seconds

 

2. Cause:

The key string is "hung_task_timeout_secs". Material found online describes this as a Linux kernel bug. An excerpt:

By default Linux uses up to 40% of the available memory for file system caching.

After this mark has been reached the file system flushes all outstanding data to disk causing all following IOs going synchronous.

For flushing out this data to disk there is a time limit of 120 seconds by default.

In the case here the IO subsystem is not fast enough to flush the data within 120 seconds.

This especially happens on systems with a lot of memory.

The problem is solved in later kernels

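In other words: by default, Linux uses up to 40% of available memory as file system cache. When the cache is nearly full, the file system flushes all of the cached data to disk. The system allows at most 120 seconds for this flush, and if the file system cannot finish within that limit, the error above is reported. This usually happens on systems with a lot of memory: the more memory, the larger the cache, the longer the flush takes, and the more likely it is to time out.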
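To get a feel for the numbers, the excerpt's simplified model can be turned into a quick estimate. This is only a rough sketch: the 64 GiB memory size is a hypothetical example, the helper names dirty_limit_mb and min_flush_mbps are made up for illustration, and the kernel actually applies the ratio to reclaimable ("dirtyable") memory rather than to total RAM, so real figures will be somewhat lower.

```shell
#!/bin/sh
# Rough estimate based on the simplified model in the excerpt above.
# Assumption: the ratio is applied to total RAM (the kernel really uses
# "dirtyable" memory, which is smaller).

# dirty_limit_mb MEM_MB RATIO -> MB of dirty data allowed at that ratio
dirty_limit_mb() {
    echo $(( $1 * $2 / 100 ))
}

# min_flush_mbps DIRTY_MB TIMEOUT_S -> MB/s needed to flush everything
# within the timeout, rounded up
min_flush_mbps() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

mem_mb=65536   # hypothetical server with 64 GiB of RAM
for ratio in 40 10; do
    limit=$(dirty_limit_mb "$mem_mb" "$ratio")
    rate=$(min_flush_mbps "$limit" 120)
    echo "ratio=${ratio}% dirty_limit=${limit}MB min_throughput=${rate}MB/s"
done
```

Under these assumptions, a 40% limit means roughly 26 GB of dirty data and a sustained write rate above 200 MB/s to finish in 120 seconds, while a 10% limit cuts both figures to about a quarter.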

 

3. Solution

Tune the two parameters vm.dirty_ratio and vm.dirty_background_ratio to suit the application workload. The settings below are recommended; they take effect after a reboot.

# vi /etc/sysctl.conf

vm.dirty_background_ratio = 5 
vm.dirty_ratio = 10
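
To confirm what the running kernel is actually using, the current values can be read back from /proc/sys/vm. The sketch below is read-only. The helpers read_vm and ratios_ok are hypothetical names introduced here for illustration, and the check that the background ratio stays below the hard limit is an assumed sanity rule, not something stated in the original advice.

```shell
#!/bin/sh
# Read-only check of the writeback settings; changes nothing on the system.

# read_vm NAME -> current value of vm.NAME, or "unavailable" if not readable
read_vm() {
    f="/proc/sys/vm/$1"
    if [ -r "$f" ]; then
        cat "$f"
    else
        echo "unavailable"
    fi
}

# ratios_ok BACKGROUND DIRTY -> succeeds when 0 < BACKGROUND < DIRTY <= 100,
# so that background writeback starts before the hard limit is reached
ratios_ok() {
    [ "$1" -gt 0 ] && [ "$1" -lt "$2" ] && [ "$2" -le 100 ]
}

echo "vm.dirty_background_ratio = $(read_vm dirty_background_ratio)"
echo "vm.dirty_ratio = $(read_vm dirty_ratio)"

if ratios_ok 5 10; then
    echo "proposed values 5/10 are consistent"
fi

# To load /etc/sysctl.conf without rebooting, run as root:
#   sysctl -p
```

Running sysctl -p as root reloads /etc/sysctl.conf immediately, which avoids the reboot if the server cannot be taken down.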