Environmental Reason Shutdown and SP is unresponsive

Category:
fas-systems
Specialty:
hw

Applies to

  • AFF A300
  • Service Processor (SP) firmware 5.11P2

Issue

  • PSU1 in the chassis of Node1 reported a critical error but recovered after some time.

EMS log:

[?]  Fri May 16 12:42:00 +0000 [Node1: monitor: monitor.globalStatus.critical:EMERGENCY]: Power Supply Status Critical: PSU1.
[?]  Fri May 16 12:42:50 +0000 [Node1: spsm_listener: sp.heartbeat.stopped:error]: Have not received a IPMI heartbeat from the Service Processor (SP) in last 20 seconds.
[?]  Fri May 16 12:43:14 +0000 [Node1: pmcsas_asyncd_0: sas.adapter.debug:info]: params: {'adapterName': '1', 'debug_string': 'Adapter debug dump is being collected'}
[?]  Fri May 16 12:43:14 +0000 [Node1: pmcsas_asyncd_1: sas.adapter.debug:info]: params: {'adapterName': '0a', 'debug_string': 'Adapter debug dump is being collected'}
[?]  Fri May 16 12:45:02 +0000 [Node1: spsm_listener: sp.heartbeat.resumed:info]: Received IPMI heartbeat from the Service Processor (SP).
[?]  Fri May 16 12:46:11 +0000 [Node1: power_low_monitor: monitor.chassisPowerSupplies.ok:info]: Chassis power supplies OK.
[?]  Fri May 16 12:47:00 +0000 [Node1: monitor: monitor.globalStatus.ok:notice]: The system's global status is normal.
[?]  Fri May 16 12:55:11 +0000 [Node1: env_mgr: monitor.chassisPowerSupply.degraded:notice]: Chassis power supply 1 is degraded: PSU1 Fan2 Fault is Unreadable
[?]  Fri May 16 12:55:21 +0000 [Node1: power_low_monitor: monitor.chassisPower.degraded:alert]: Chassis power is degraded: Power Supply Status Critical: PSU1.
[?]  Fri May 16 12:55:21 +0000 [Node1: power_low_monitor: callhome.chassis.power:error]: Call home for CHASSIS POWER DEGRADED: Power Supply Status Critical: PSU1.
[?]  Fri May 16 12:56:33 +0000 [Node1: env_mgr: monitor.chassisPowerSupply.ok:info]: Chassis power supply 1 is OK.
[?]  Fri May 16 12:56:41 +0000 [Node1: power_low_monitor: monitor.chassisPowerSupplies.ok:info]: Chassis power supplies OK.
[?]  Fri May 16 12:57:00 +0000 [Node1: monitor: monitor.globalStatus.ok:notice]: The system's global status is normal.
[?]  Fri May 16 12:57:33 +0000 [Node1: env_mgr: callhome.chassis.ps.ok:notice]: Call home for CHASSIS POWER SUPPLY OK: PS 1
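
When triaging an EMS excerpt like the one above, it can help to extract the event names and timestamps programmatically, for example to measure how long the SP heartbeat was reported as stopped. The following is a minimal sketch, assuming EMS lines follow the layout shown in the excerpt. The regular expression and the names parse_ems_line and heartbeat_gap_seconds are illustrative and are not part of any NetApp tool. EMS lines carry no year, so the year is passed in as an assumption (2025, taken from the SP event records in this article).

```python
import re
from datetime import datetime

# Illustrative pattern for the EMS line layout in the excerpt above
# (an assumption, not an official format specification).
EMS_RE = re.compile(
    r"(?P<mon>[A-Z][a-z]{2}) +(?P<day>\d{1,2}) (?P<time>\d{2}:\d{2}:\d{2}) [+-]\d{4} "
    r"\[(?P<node>[^:\]]+): (?P<source>[^:\]]+): (?P<event>[^:\]]+):(?P<severity>\w+)\]: "
    r"(?P<message>.*)"
)


def parse_ems_line(line, year=2025):
    """Return a dict for one EMS line, or None if the line does not match.

    The year is supplied by the caller because EMS lines do not include it.
    Month names are parsed with %b, which assumes an English locale.
    """
    m = EMS_RE.search(line)
    if m is None:
        return None
    when = datetime.strptime(
        f"{year} {m['mon']} {m['day']} {m['time']}", "%Y %b %d %H:%M:%S"
    )
    return {
        "time": when,
        "node": m["node"],
        "source": m["source"],
        "event": m["event"],
        "severity": m["severity"],
        "message": m["message"],
    }


def heartbeat_gap_seconds(lines, year=2025):
    """Seconds between the first sp.heartbeat.stopped event and the next
    sp.heartbeat.resumed event, or None if no such pair is found."""
    stopped = None
    for line in lines:
        rec = parse_ems_line(line, year)
        if rec is None:
            continue
        if rec["event"] == "sp.heartbeat.stopped" and stopped is None:
            stopped = rec["time"]
        elif rec["event"] == "sp.heartbeat.resumed" and stopped is not None:
            return (rec["time"] - stopped).total_seconds()
    return None


# Two lines copied from the EMS excerpt above.
SAMPLE = [
    "[?]  Fri May 16 12:42:50 +0000 [Node1: spsm_listener: sp.heartbeat.stopped:error]: "
    "Have not received a IPMI heartbeat from the Service Processor (SP) in last 20 seconds.",
    "[?]  Fri May 16 12:45:02 +0000 [Node1: spsm_listener: sp.heartbeat.resumed:info]: "
    "Received IPMI heartbeat from the Service Processor (SP).",
]

gap = heartbeat_gap_seconds(SAMPLE)
print(gap)
```

The value is the difference between the two log timestamps only. The stopped event is itself logged after 20 seconds without a heartbeat, so the actual interruption would have been somewhat longer.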

  • Some time later, Node1 performed an emergency shutdown for an environmental reason.

SP system log:

May 16 13:23:00 [Node1:sp.ipmi.lost.shutdown:EMERGENCY]: SP heartbeat stopped and cannot be recovered. To prevent hardware damage and data loss, the system will shut down in 2 minutes.
May 16 13:25:00 [Node1:monitor.shutdown.emergency:EMERGENCY]: Emergency shutdown: Environmental Reason Shutdown (System reboot to recover the SP)

  • The SP is unresponsive and the system sensors output cannot be read:

SP Node1> system sensors
Sensor Name    | Current   | Unit     | Status    | LCR     | LNC     | UNC     | UCR
-----------------+------------+------------+------------+-----------+-----------+-----------+-----------
Error: Unable to establish LAN session
Get Device ID command failed
Unable to open SDR for reading

  • Multiple instances of SP load is high are observed in the events all output:

Record 339: Thu Jan  1 00:01:01 1970 [SP.notice]: Running primary version 5.11P2
Record 340: Thu Jan  1 00:01:17 1970 [SP.normal]: Heartbeat started
Record 341: Thu Jan  1 00:01:17 1970 [Heartbeat.notice]: Heartbeat start: Set SP time. Old time: Thu Jan  1 00:01:17 1970. New time: Fri May 16 13:22:23 2025.
Record 342: Fri May 16 13:22:23 2025 [Heartbeat.notice]: Heartbeat time adjusted: Set SP time. Old time: Thu Jan  1 00:01:17 1970. New time: Fri May 16 13:22:23 2025.
Record 343: Fri May 16 13:23:19 2025 [SP.notice]: IPMI not ready & run /usr/local/bin/notify 4
Record 344: Fri May 16 13:25:40 2025 [ONTAP.notice]: Appliance user command reboot.
Record 345: Fri May 16 13:25:50 2025 [SP.critical]: Filer Reboots
Record 346: Fri May 16 13:25:55 2025 [SysFW.notice]: Waiting for SP ...
Record 347: Fri May 16 13:28:17 2025 [SP.notice]: Switch is running on latest version 16
Record 348: Fri May 16 13:31:16 2025 [IPMI.warning]: FRUID 1 Access error
Record 349: Fri May 16 13:31:42 2025 [SP.notice]: Failure on battery wake up attempt
Record 350: Fri May 16 13:36:09 2025 [SP.notice]: SP load is high: 3.12 3.06 2.02
Record 351: Fri May 16 13:36:29 2025 [SP.critical]: Heartbeat stopped
Record 352: Fri May 16 13:41:57 2025 [IPMI.warning]: FRUID 2 Access error
Record 353: Fri May 16 13:54:10 2025 [SP.notice]: SP load is high: 3.03 3.11 2.79
Record 354: Fri May 16 13:55:18 2025 [IPMI.warning]: FRUID 3 Access error
Record 355: Fri May 16 14:04:30 2025 [IPMI.warning]: FRUID 4 Access error
Record 356: Fri May 16 14:11:11 2025 [IPMI.warning]: FRUID 5 Access error
Record 357: Fri May 16 14:13:11 2025 [IPMI.warning]: PSU FRUID 6 Access error, retry 5 times
Record 358: Fri May 16 14:15:12 2025 [IPMI.warning]: PSU FRUID 7 Access error, retry 5 times
Record 359: Fri May 16 14:15:19 2025 [IPMI.notice]: IPMI session creation failed - err(0x0021)

8400 | 02 | EVT: 0300ffff | Sensor 61 | Assertion Event, "State Deasserted"
Record 360: Fri May 16 14:15:19 2025 [IPMI.notice]: IPMI session creation failed - err(0x0021)

8500 | 02 | EVT: 6fc203ff | Sensor 109 | Assertion Event, "Memory Init Done"
Record 361: Fri May 16 14:15:19 2025 [IPMI.notice]: IPMI session creation failed - err(0x0021)

8600 | 02 | EVT: 0901ffff | Sensor 183 | Assertion Event, "Device Enabled"
Record 362: Fri May 16 14:25:10 2025 [SP.notice]: SP load is high: 3.14 2.96 2.72
Record 363: Mon May 19 09:00:45 2025 [SP CLI.notice]: cs_admi "log in from 192.168.180.10"
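
To confirm the symptom on another system, the SP load is high records can be filtered out of a saved events all capture. The following is a minimal sketch, assuming the record layout shown above. The function name find_high_load and the regular expression are illustrative. Reading the three numbers as 1, 5, and 15 minute load averages is an assumption based on the usual Linux convention and is not stated in the source.

```python
import re

# Illustrative pattern for the "SP load is high" records shown above.
LOAD_RE = re.compile(
    r"Record (?P<record>\d+): (?P<when>.+?) \[SP\.notice\]: "
    r"SP load is high: (?P<a>[\d.]+) (?P<b>[\d.]+) (?P<c>[\d.]+)"
)


def find_high_load(text):
    """Return (record number, timestamp text, three load values) for every
    'SP load is high' record found in the captured output."""
    hits = []
    for m in LOAD_RE.finditer(text):
        loads = (float(m["a"]), float(m["b"]), float(m["c"]))
        hits.append((int(m["record"]), m["when"], loads))
    return hits


# Records copied from the events all output above.
SAMPLE = """\
Record 350: Fri May 16 13:36:09 2025 [SP.notice]: SP load is high: 3.12 3.06 2.02
Record 351: Fri May 16 13:36:29 2025 [SP.critical]: Heartbeat stopped
Record 353: Fri May 16 13:54:10 2025 [SP.notice]: SP load is high: 3.03 3.11 2.79
Record 362: Fri May 16 14:25:10 2025 [SP.notice]: SP load is high: 3.14 2.96 2.72
"""

hits = find_high_load(SAMPLE)
for record, when, loads in hits:
    print(record, when, loads)
```

Three matches within about an hour, as in this capture, are consistent with the "multiple instances" symptom described in this article.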

 

 
