AFF-A900 node reboots abnormally in MetroCluster IP
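Applies to
- ONTAP 9
- AFF-A900
- MetroCluster IP
- Node reboot
Issue
- The node reboots unexpectedly, and neither the event log nor the BMC logs show a clear cause or panic
- The system log reports abnormal reboot events: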
Record 1045: Wed Dec 13 11:54:07.700528 2023 [BMC.critical]: Filer Reboots
Record 1046: Wed Dec 13 15 11:54:07.711401 2023 [Trap Event.critical]: SNMP abnormal_reboot (28)
- The HA partner node reports that it initiated a takeover because the heartbeat was lost:
Wed Dec 13 12:54:21 +0100 [Node_A: cf_main: cf.fsm.takeover.noHeartbeat:alert]: Failover monitor: Takeover initiated after no heartbeat was detected from the partner node.
- The event log shows ICL errors for one of the onboard cards in the system:
[?] Wed Feb 15 12:53:25 +0100 [Node_A: ICL error: pcie.stealth.errors:debug]: params: {'pcie_errors': 'IIO0: RPT(166,2,0): T62100-CR Dual 40/100G NIC in slot 5 on Controller, Dv[600d](169,0,0) in slot 5: DevStatus(Corr), CorrErr(Rcvr); Dv[600d](169,0,1) in slot 5: DevStatus(Corr), CorrErr(Rcvr); Dv[600d](169,0,2) in slot 5: DevStatus(Corr), CorrErr(Rcvr); Dv[600d](169,0,3) in slot 5: DevStatus(Corr), CorrErr(Rcvr); Dv[640d](169,0,4) in slot 5: DevStatus(Corr), CorrErr(Rcvr); Dv[650d](169,0,5) in slot 5: DevStatus(Corr), CorrErr(Rcvr); Dv[660d](169,0,6) in slot 5: DevStatus(Corr), CorrErr(Rcvr); '}
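The `pcie_errors` string in the event above is long and repetitive, so a short script can help when triaging many such events. The following is a minimal sketch, not a NetApp tool: the function name `summarize_pcie_errors` and the regular expression are hypothetical, and the pattern is inferred only from the single log line shown above. The reading of the three numbers in parentheses as bus, device, and function is an assumption.

```python
import re
from collections import Counter

# Sample copied from the pcie.stealth.errors event above, shortened to two entries.
SAMPLE = (
    "IIO0: RPT(166,2,0): T62100-CR Dual 40/100G NIC in slot 5 on Controller, "
    "Dv[600d](169,0,0) in slot 5: DevStatus(Corr), CorrErr(Rcvr); "
    "Dv[640d](169,0,4) in slot 5: DevStatus(Corr), CorrErr(Rcvr); "
)

# Hypothetical pattern for one "Dv[...]" entry; the (bus, device, function)
# interpretation of the numeric triple is an assumption, not documented here.
ENTRY = re.compile(
    r"Dv\[(?P<dev>[0-9a-f]+)\]\((?P<bus>\d+),(?P<device>\d+),(?P<func>\d+)\)"
    r" in slot (?P<slot>\d+): DevStatus\((?P<status>\w+)\), CorrErr\((?P<err>\w+)\)"
)

def summarize_pcie_errors(text):
    """Count entries per (slot, device status, correctable-error type)."""
    counts = Counter()
    for m in ENTRY.finditer(text):
        counts[(int(m["slot"]), m["status"], m["err"])] += 1
    return counts

if __name__ == "__main__":
    for (slot, status, err), n in summarize_pcie_errors(SAMPLE).items():
        print(f"slot {slot}: {n} entries with DevStatus({status}), CorrErr({err})")
```

Run against the full event text, a summary like this makes it easier to see that every reported entry points at the same slot and the same error type.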