Troubleshooting Common Ceph Failures
This document describes how to handle common failures in a Ceph cluster.
1. Analyzing and fixing "full ratio(s) out of order"
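Ceph raises this warning when the utilization thresholds are not ascending, that is, when nearfull < backfillfull < full < failsafe_full does not hold. The snippet below is a minimal sketch of that ordering check. `check_full_ratios` is a hypothetical helper, not a Ceph command, and the values passed to it are the ratios from the sample output in this section.

```shell
#!/bin/sh
# Hypothetical helper (not part of Ceph): succeeds only when
# nearfull < backfillfull < full < failsafe_full.
check_full_ratios() {
    awk -v n="$1" -v b="$2" -v f="$3" -v s="$4" 'BEGIN {
        # Adding 0 forces numeric rather than string comparison.
        ok = (n + 0 < b + 0) && (b + 0 < f + 0) && (f + 0 < s + 0)
        exit !ok
    }'
}

# Arguments: nearfull, backfillfull, full, failsafe_full.
if check_full_ratios 0.95 0.96 0.97 0.98; then
    echo "full ratios are in ascending order"
else
    echo "full ratios are out of order"
fi
```

If the check fails for the values your cluster reports, the ratio that breaks the ordering is the one to adjust.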
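Check the relevant configuration (run this on the host where osd.0 lives, since it uses the daemon's admin socket):

    ceph daemon osd.0 config show | grep full

Example output:

    "mon_cache_target_full_warn_ratio": "0.660000",
    "mon_osd_backfillfull_ratio": "0.960000",
    "mon_osd_full_ratio": "0.970000",
    "mon_osd_nearfull_ratio": "0.950000",
    "osd_debug_skip_full_check_in_backfill_reservation": "false",
    "osd_debug_skip_full_check_in_recovery": "false",
    "osd_failsafe_full_ratio": "0.980000",
    "osd_pool_default_cache_target_full_ratio": "0.800000",
    "paxos_stash_full_interval": "25",

Set the parameter values (method 1):

    ceph osd set-full-ratio .97
    ceph osd set-backfillfull-ratio .96
    ceph osd set-nearfull-ratio .95

Set the parameter values (method 2, takes effect immediately):

    ceph tell 'osd.*' injectargs "--mon_osd_full_ratio .97"
    ceph tell 'osd.*' injectargs "--mon_osd_nearfull_ratio .95"
    ceph tell 'osd.*' injectargs "--mon_osd_backfillfull_ratio .96"

Note that values injected this way only change the running daemons and are lost when an OSD restarts. On Luminous and later releases the thresholds that are actually enforced are stored in the OSDMap, so method 1 is usually the one that clears the warning.

2. 'n' daemons have recently crashed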
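List the crashes that have not been acknowledged yet:

    ceph crash ls-new

View the details of a crash:

    ceph crash info <crash-id>

Clear the warning for a single crash by archiving it:

    ceph crash archive <crash-id>

Archive all crashes:

    ceph crash archive-all

3. Recovery triggers blocking and stalls client I/O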
The following parameters can be used to avoid this:

    # the number of entries to keep in the pg log when trimming it. Defaults to 3000.
    osd_min_pg_log_entries = 1
    # the max entries, say when degraded, before we trim. Defaults to 10000.
    osd_max_pg_log_entries = 2
    # Whether backfill is needed is decided from the pg log, so shrinking
    # the two values above forces backfill.

This post is licensed by the author under CC BY 4.0.