Unexpected node reboot alert is triggered for SGF6112 storage nodes
Applies to
- NetApp StorageGRID
- StorageGRID SGF6112 appliance
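Issue
- The StorageGRID UI reports an "Unexpected node reboot" alert for one or more SGF6112 storage nodes.
- The reboot occurs around the first Sunday of each month.
- The storage node's base operating system syslog shows that the monthly internal RAID scrub (mdadm checkarray) was running before the node rebooted: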
Aug 4 00:57:01 localhost CRON[2431869]: (root) CMD (if [ -x /usr/share/mdadm/checkarray ] && [ $(date +%d) -le 7 ]; then /usr/share/mdadm/checkarray --cron --all --idle --quiet; fi)
Aug 4 08:00:53 localhost kernel: [992624.118933] md: delaying data-check of md124 until md123 has finished (they share one or more physical units)
Aug 4 08:00:53 localhost kernel: [992624.118935] md: delaying data-check of md120 until md123 has finished (they share one or more physical units)
Aug 4 08:00:53 localhost kernel: [992624.118943] md: delaying data-check of md121 until md120 has finished (they share one or more physical units)
Aug 4 08:00:53 localhost kernel: [992624.118946] md: delaying data-check of md125 until md120 has finished (they share one or more physical units)
Aug 4 08:00:53 localhost kernel: [992624.118949] md: data-check of RAID array md123
.
Aug 4 08:53:06 localhost kernel: [995757.911186] md: delaying data-check of md127 until md115 has finished (they share one or more physical units)
Aug 4 08:53:06 localhost kernel: [995757.911195] md: delaying data-check of md118 until md115 has finished (they share one or more physical units)
Aug 4 08:53:06 localhost kernel: [995757.911204] md: delaying data-check of md122 until md115 has finished (they share one or more physical units)
Aug 4 08:53:06 localhost kernel: [995757.911232] md: data-check of RAID array md116
### REBOOT
Aug 4 10:16:19 localhost kernel: [ 0.000000] Linux version 5.10.0-26-amd64 (debian-kernel@lists.debian.org) (gcc-10 (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP Debian 5.10.197-1+ntap1 (2023-10-30)
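To check other nodes for the same pattern, the following Python sketch scans syslog lines for md data-check start messages that precede a kernel boot banner. It is an illustration only and not a NetApp-provided tool: the function name `scrubs_before_reboot` and both regular expressions are assumptions derived from the log excerpt above, and other kernel or syslog formats may need different patterns.

```python
import re

# Kernel message logged when an md data-check (scrub) starts on an array.
# "delaying data-check of ..." lines are deliberately not matched.
CHECK_RE = re.compile(r"md: data-check of RAID array (md\d+)")
# A kernel timestamp of 0.000000 on the "Linux version" banner marks a fresh boot.
BOOT_RE = re.compile(r"kernel: \[\s*0\.000000\] Linux version")
# Leading syslog timestamp: month, day, time (for example "Aug 4 10:16:19").
STAMP_RE = re.compile(r"^(\S+\s+\S+\s+\S+)")


def scrubs_before_reboot(lines):
    """Return one (boot_timestamp, arrays) pair per boot banner found.

    arrays lists the md devices whose data-check started after the
    previous boot banner (or the start of the input) and before this one.
    """
    results = []
    started = []
    for line in lines:
        check = CHECK_RE.search(line)
        if check:
            started.append(check.group(1))
        elif BOOT_RE.search(line):
            stamp = STAMP_RE.match(line)
            results.append((stamp.group(1) if stamp else "", started))
            started = []
    return results
```

For example, pass the lines of a saved syslog file to `scrubs_before_reboot` (the file location depends on how the log was collected). A boot entry with a non-empty array list indicates that a scrub had started before that reboot. It does not prove on its own that the scrub caused the reboot.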