SP reboots due to loss of ACP comms
Applies to
- ONTAP 9
- Service Processor (SP)
Issue
- The SP reboots after the ACP alerts clear. Example EMS log:
[node_name-01: dsa_worker1: ses.status.ACPError:alert]: DS2246 (S/N SHFHU0123456789) shelf 0 on channel 0a ACP Processor error for SAS shelf ACP processor 2: critical status ; Alternate Control Path hardware failed This module is on the rear of the shelf at the top right, on shelf module B.
[node_name-01: statd: monitor.shelf.fault:alert]: Critical fault reported on disk storage shelf attached to channel 0a. Check fans, power supplies, disks, and temperature sensors.
[node_name-01: monitor: monitor.globalStatus.critical:EMERGENCY]: Disk shelf fault.
[node_name-01: dsa_worker2: ses.status.ACPError:alert]: DS2246 (S/N SHFHU0123456789) shelf 0 on channel 0a ACP Processor error for SAS shelf ACP processor 1: critical status ; Alternate Control Path hardware failed This module is on the rear of the shelf at the top left, on shelf module A.
[node_name-01: dsa_worker2: ses.status.ACPInfo:info]: DS2246 (S/N SHFHU0123456789) shelf 0 on channel 0a ACP Processor information for SAS shelf ACP processor 2: normal status.
[node_name-01: splog_main: splog.running.normally:info]: Process splogd is operating normally.
[node_name-01: dsa_worker1: ses.status.ACPInfo:info]: DS2246 (S/N SHFHU0123456789) shelf 0 on channel 0a ACP Processor information for SAS shelf ACP processor 1: normal status.
[node_name-01: statd: monitor.shelf.fault.ok:notice]: Fault previously reported on disk storage shelf attached to channel 0a has been corrected.
[node_name-01: monitor: monitor.globalStatus.ok:notice]: The system's global status is normal.
[node_name-02: statd: monitor.shelf.fault:alert]: Critical fault reported on disk storage shelf attached to channel 0a. Check fans, power supplies, disks, and temperature sensors.
[node_name-02: monitor: monitor.globalStatus.critical:EMERGENCY]: Disk shelf fault.
[node_name-02: dsa_worker1: ses.status.ACPError:alert]: DS2246 (S/N SHFHU0123456789) shelf 0 on channel 0a ACP Processor error for SAS shelf ACP processor 1: critical status ; Alternate Control Path hardware failed This module is on the rear of the shelf at the top left, on shelf module A.
[node_name-02: splog_main: splog.running.normally:info]: Process splogd is operating normally.
[node_name-02: dsa_worker3: ses.status.ACPInfo:info]: DS2246 (S/N SHFHU0123456789) shelf 0 on channel 0a ACP Processor information for SAS shelf ACP processor 2: normal status.
[node_name-02: dsa_worker2: ses.status.ACPInfo:info]: DS2246 (S/N SHFHU0123456789) shelf 0 on channel 0a ACP Processor information for SAS shelf ACP processor 1: normal status.
[node_name-02: statd: monitor.shelf.fault.ok:notice]: Fault previously reported on disk storage shelf attached to channel 0a has been corrected.
[node_name-02: monitor: monitor.globalStatus.ok:notice]: The system's global status is normal.
- The SP reboots automatically and logs an event message. Example:
Record 833: Tue Oct 13 18:20:19 2020 [SP.critical]: Rebooting SP due to loss of ACP comms
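- The ACP status is normal and ACP is functioning correctly.
- The e0M management port shows a high transmit frame rate and a high transmit byte rate: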
-- interface e0M (30 days, 20 hours, 46 minutes, 42 seconds) --
RECEIVE
…
TRANSMIT
>>>Total frames: 2992m | Frames/second: 1122 | Total bytes: 4523g
Bytes/second: 1696k | Total errors: 0 | Errors/minute: 0
Total discards: 0 | Queue overflow: 0 | Multi/broadcast: 90594
…
-- interface e0M (30 days, 20 hours, 44 minutes, 31 seconds) --
RECEIVE
…
TRANSMIT
>>>Total frames: 216m | Frames/second: 81 | Total bytes: 322g
Bytes/second: 120k | Total errors: 0 | Errors/minute: 0
Total discards: 0 | Queue overflow: 0 | Multi/broadcast: 90526
…
- The node management LIF and the intercluster LIF share the same broadcast domain.
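
To compare the e0M transmit load of the two nodes, the per-second rates in the counters above can be recomputed from the totals and the sampling interval. The following is a minimal sketch, not a NetApp tool: the helper names (parse_count, parse_interval, average_rate) are hypothetical, and it assumes the k/m/g suffixes are decimal multipliers (10^3, 10^6, 10^9), which is consistent with the rates shown in the excerpts.

```python
import re

# Assumption: counter suffixes are decimal (k = 10^3, m = 10^6, g = 10^9).
SUFFIX = {"": 1, "k": 10**3, "m": 10**6, "g": 10**9}
UNIT_SECONDS = {"day": 86400, "hour": 3600, "minute": 60, "second": 1}


def parse_count(text):
    """Convert a counter such as '2992m' or '90594' into an integer."""
    match = re.fullmatch(r"(\d+)([kmg]?)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized counter: {text!r}")
    return int(match.group(1)) * SUFFIX[match.group(2)]


def parse_interval(text):
    """Convert '30 days, 20 hours, 46 minutes, 42 seconds' into seconds."""
    pairs = re.findall(r"(\d+) (day|hour|minute|second)s?", text)
    return sum(int(value) * UNIT_SECONDS[unit] for value, unit in pairs)


def average_rate(total, interval):
    """Average per-second rate for a counter over the sampling interval."""
    return parse_count(total) / parse_interval(interval)


# Values copied from the two e0M TRANSMIT excerpts above.
first_interval = "30 days, 20 hours, 46 minutes, 42 seconds"
second_interval = "30 days, 20 hours, 44 minutes, 31 seconds"

first_fps = average_rate("2992m", first_interval)
second_fps = average_rate("216m", second_interval)
first_bps = average_rate("4523g", first_interval)
second_bps = average_rate("322g", second_interval)

print(f"first excerpt:  {first_fps:.0f} frames/s, {first_bps / 1000:.0f}k bytes/s")
print(f"second excerpt: {second_fps:.0f} frames/s, {second_bps / 1000:.0f}k bytes/s")
print(f"frame-rate ratio: {first_fps / second_fps:.1f}x")
```

The recomputed averages line up with the Frames/second and Bytes/second fields in the excerpts, and the ratio gives a rough measure of how much more traffic one e0M port transmits than the other.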