Node reboots due to unexpected takeover after no heartbeats and no panic message reported
Applies to
- AFF A400
- FAS 8700
- FAS 8300
- Unexpected takeover (no heartbeat alert)
Issue
- A node is unexpectedly taken over, and the following events are logged:
[Node-01: kltp: clam.heartbeat.state.change:info]: Heartbeats to node (name=Node-02, ID=1001) are Failing.
[Node-01: cf_main: cf.fsm.takeover.noHeartbeat:alert]: Failover monitor: Takeover initiated after no heartbeat was detected from the partner node.
- The following heartbeat messages are reported in the event log:
[node_name_1: cf_main: fm_lastHeartbeatInfo_1:debug]: params: {'time_since_firmware_rcvd': '5000', 'time_since_htbt_read_attempt_ic': '1000', 'time_since_mb_upd_minor_seq': '13', 'time_since_mb_upd_major_seq': '520405022', 'time_since_htbt_read_success_ic': '6000', 'current_time': '520650179', 'time_since_htbt_read_success_mb': '4029', 'time_since_ic_upd_major_seq': '520407022', 'time_since_htbt_write_ic': '0', 'time_since_first_htbt_write_mb_drop': '259840013', 'time_since_htbt_write_mb': '13', 'time_since_ic_upd_minor_seq': '0', 'mb_htbt_drop_count': '10', 'time_since_recent_htbt_write_mb_drop': '259837513', 'time_since_firmware_written': '0', 'time_since_htbt_read_upd_seq_ic': '6000', 'partner_minor_seq_num_mb': '728568', 'time_since_firmware_read': '6979', 'partner_minor_seq_num_ic': '728567', 'partner_major_seq_num_ic': '1669711236', 'time_since_htbt_read_upd_seq_mb': '4029', 'partner_major_seq_num_mb': '1669711236', 'time_since_htbt_read_attempt_mb': '4029'}
[node_name_1: cf_main: fm_lastHeartbeatInfo_1:debug]: params: {'time_since_firmware_rcvd': '20000', 'time_since_htbt_read_attempt_ic': '1000', 'time_since_mb_upd_minor_seq': '13', 'time_since_mb_upd_major_seq': '520420022', 'time_since_htbt_read_success_ic': '21000', 'current_time': '520665179', 'time_since_htbt_read_success_mb': '867', 'time_since_ic_upd_major_seq': '520422022', 'time_since_htbt_write_ic': '0', 'time_since_first_htbt_write_mb_drop': '259855013', 'time_since_htbt_write_mb': '13', 'time_since_ic_upd_minor_seq': '0', 'mb_htbt_drop_count': '10', 'time_since_recent_htbt_write_mb_drop': '259852513', 'time_since_firmware_written': '0', 'time_since_htbt_read_upd_seq_ic': '21000', 'partner_minor_seq_num_mb': '728568', 'time_since_firmware_read': '21979', 'partner_minor_seq_num_ic': '728567', 'partner_major_seq_num_ic': '1669711236', 'time_since_htbt_read_upd_seq_mb': '19029', 'partner_major_seq_num_mb': '1669711236', 'time_since_htbt_read_attempt_mb': '867'}
- No panic message is found.
- In some cases, SRAM logs might be populated:
SRAM record type(CPU) from Data ONTAP: socket(0) core(8) bank(6)
SRAM record type(LOG) from Data ONTAP: IIO MCE Root Bus(23), Device(0), Function(0), Segment(0).
- The node reboots, giveback completes, and the node continues to operate normally.
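
The two fm_lastHeartbeatInfo_1 events above are easier to compare once their params are parsed. The following is a minimal Python sketch, not a NetApp tool: the helper names `parse_heartbeat_params` and `changed_fields` are hypothetical, and it assumes each event line carries a single Python-style dict after `params:`, as in the samples above. It makes no assumption about the units of the counters. The sample lines in it are shortened to three fields taken from the events above.

```python
import ast
import re

# Matches the dict literal that follows "params:" in an EMS event line.
PARAMS_RE = re.compile(r"params:\s*(\{.*\})")


def parse_heartbeat_params(line):
    """Return the params of one fm_lastHeartbeatInfo event as a dict of ints."""
    match = PARAMS_RE.search(line)
    if match is None:
        raise ValueError("no params dict found in line")
    raw = ast.literal_eval(match.group(1))  # safe parsing of the literal dict
    return {key: int(value) for key, value in raw.items()}


def changed_fields(earlier, later):
    """Map each field whose value differs to its (later - earlier) delta."""
    return {
        key: later[key] - earlier[key]
        for key in earlier
        if key in later and later[key] != earlier[key]
    }


if __name__ == "__main__":
    # Shortened copies of the two events shown above (three fields each).
    first = (
        "[node_name_1: cf_main: fm_lastHeartbeatInfo_1:debug]: params: "
        "{'current_time': '520650179', "
        "'time_since_htbt_read_success_ic': '6000', "
        "'mb_htbt_drop_count': '10'}"
    )
    second = (
        "[node_name_1: cf_main: fm_lastHeartbeatInfo_1:debug]: params: "
        "{'current_time': '520665179', "
        "'time_since_htbt_read_success_ic': '21000', "
        "'mb_htbt_drop_count': '10'}"
    )
    deltas = changed_fields(
        parse_heartbeat_params(first), parse_heartbeat_params(second)
    )
    for name in sorted(deltas):
        print(f"{name}: {deltas[name]:+d}")
```

On the full events, a `time_since_htbt_read_success_ic` delta that equals the `current_time` delta would indicate that no interconnect heartbeat read succeeded between the two samples, which is consistent with the cf.fsm.takeover.noHeartbeat alert.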