StorageGRID appliance reboots unexpectedly due to network isolation
Applies to
- NetApp StorageGRID 11.5 and later
- NetApp StorageGRID appliances
Issue
StorageGRID appliance nodes reboot on their own without any apparent cause.
In the node log (/var/log/storagegrid/nodes/<nodename>.log on the base operating system), entries like the following can be observed:
[2021-06-14T12:36:14.818704] INFO -- Possible network isolation: Node has no contact with other nodes. If this warning persists, use the /usr/sbin/add_node_ip.py command to tell this node the address of another node in the grid. See the Recovery and Maintenance Guide for details.
[2021-06-14T12:36:14.818919] INFO -- 2021-06-14 12:36:14 +0000 | dynip | Possible network isolation: Node has no contact with other nodes.
[2021-06-14T12:36:30.821317] INFO -- Node service caught SIGTERM
[2021-06-14T12:36:30.841484] INFO -- Node service caught SIGTERM
[2021-06-14T12:36:30.841436] WARN -- Got socket error 4 with message Interrupted system call
The first logged isolation event and the reboot (SIGTERM) should be at least 10 minutes apart.
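The timing check can be done by hand, but a small script makes it easier on long logs. The following is a minimal Python sketch, not a NetApp tool: it assumes every relevant line starts with a bracketed timestamp in the format shown in the excerpt above, and that the strings "Possible network isolation" and "caught SIGTERM" identify the two events. The function name isolation_to_sigterm_gaps and the constants are hypothetical names chosen for illustration.

```python
import re
from datetime import datetime, timedelta

# Assumption: each relevant line starts with a timestamp such as
# [2021-06-14T12:36:14.818704], as in the log excerpt.
TS_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+)\]")

# Assumption: these substrings mark the two events of interest.
ISOLATION_MARKER = "Possible network isolation"
SIGTERM_MARKER = "caught SIGTERM"

# Interval stated in the article between first isolation event and reboot.
EXPECTED_MIN_GAP = timedelta(minutes=10)


def isolation_to_sigterm_gaps(lines):
    """Pair the first isolation message of each episode with the next SIGTERM.

    Returns a list of (first_isolation, sigterm, gap) tuples, where the first
    two items are datetime objects and gap is a timedelta.
    """
    results = []
    first_isolation = None
    for line in lines:
        match = TS_RE.match(line)
        if not match:
            continue  # skip lines without a leading timestamp
        ts = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%f")
        if ISOLATION_MARKER in line:
            # Remember only the first isolation message of an episode.
            if first_isolation is None:
                first_isolation = ts
        elif SIGTERM_MARKER in line and first_isolation is not None:
            results.append((first_isolation, ts, ts - first_isolation))
            # Reset so repeated SIGTERM lines are not counted twice.
            first_isolation = None
    return results
```

To use it, open a copy of the node log, pass the file object to isolation_to_sigterm_gaps, and compare each returned gap with EXPECTED_MIN_GAP. A gap of roughly 10 minutes or more is consistent with the pattern described here. A much shorter gap may mean that earlier isolation messages are in a rotated log file or that the restart had a different cause.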