"sensorReadingFailed" and "ensembleDegraded" alerts caused by file system issues
Applies to
- NetApp HCI storage nodes
- NetApp SolidFire storage nodes
- A boot media failure is detected when the node is rebooted as part of this KB's solution
Issue
- The following alerts appear in NetApp SolidFire Active IQ and in the Element cluster web GUI:
  - sensorReadingFailed - IPMI diagnostics are currently unresponsive. Please contact support if this problem persists.
  - ensembleDegraded - Ensemble degraded: 1/5 database servers not connectable: {3:x.x.x.x}
 
- The storage node's remote console shows EXT4-fs errors:
  - [3367598.061077] EXT4-fs error (device sda2): ext4_journal_check_start:61:Detected aborted journal
  - [3367598.061078] EXT4-fs error (sda2): Remounting filesystem read-only xxxxxxxx
  - [3367598.125694] EXT4-fs error (sda3): in ext4_writepages:2878: IO failure
 
- The event log shows:
networkEvent Failed to install SSL certificate  3  { "message": "Failed to remove path=[/sf/etc/ssl/active.crt] errorCode=system:30 errorCode.message()=Read-only file system", "name": "xCheckFailure" }
 platformHardwareEvent Updating BMC cold reset date 6 3  { "bmcResetDurationMinutes": 0, "bmcResetDate": "2021-05-11T23:16:41" }
 unexpectedException Unexpected Exception - xCreateRepositorySourceFileFailed Failed to open and truncate /sf/apt/sources.list.new.tmp callback=[ {4:RepositorySources::packageManagerCallbackTag}] wtype=[SessionConnected] - Contact SolidFire Support. 6 3  ""
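- The node's BMC web page is accessible, and the node is reachable over the 1G and 10G networks.
- When the node is rebooted as directed by the KB solution, one of the following errors is detected during boot: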
Version 2.17.1249. Copyright (C) 2017 American Megatrends, Inc.
NetApp H500S BIOS Date:07/10/2017 Rev:NA2.1
 CPU : Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
  Speed : 2.10 GHz
 The IMC is operating with DDR4 2133 MHz
Port 0 : Micron_5100_XXXXXXXXXX
 S.M.A.R.T Status Bad, Backup and Replace.
 Press F1 to Resume...
or
EXT4-fs (sda3): ext4_check_descriptors: Block bitmap for group 0 not in group (block 271582785)!
 EXT4-fs (sda3): group descriptors corrupted!
 Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100
Rebooting in 5 seconds..
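
The symptoms listed above can be checked mechanically against a saved console or event log capture. The following is a minimal sketch, not a NetApp tool: the patterns are copied from the messages quoted in this article, while the label names, the `SIGNATURES` table, and the `find_signatures` function are hypothetical names chosen for illustration. It assumes the log has already been saved as plain text.

```python
import re
from typing import List

# Patterns taken from the console, event log, and boot messages quoted in this
# article. The label names (dictionary keys) are hypothetical, for this sketch only.
SIGNATURES = {
    "aborted_journal": re.compile(r"EXT4-fs error .*Detected aborted journal"),
    "remounted_read_only": re.compile(r"EXT4-fs error .*Remounting filesystem read-only"),
    "io_failure": re.compile(r"EXT4-fs error .*IO failure"),
    "read_only_event": re.compile(r"Read-only file system"),
    "smart_status_bad": re.compile(r"S\.M\.A\.R\.T Status Bad"),
    "group_descriptors_corrupted": re.compile(r"EXT4-fs \(\w+\): group descriptors corrupted"),
    "kernel_panic": re.compile(r"Kernel panic - not syncing"),
}


def find_signatures(log_text: str) -> List[str]:
    """Return the label of every signature that matches at least one line."""
    lines = log_text.splitlines()
    found = []
    for label, pattern in SIGNATURES.items():
        if any(pattern.search(line) for line in lines):
            found.append(label)
    return found


if __name__ == "__main__":
    # Two lines copied from the excerpts in this article.
    sample = (
        "[3367598.061078] EXT4-fs error (sda2): Remounting filesystem read-only xxxxxxxx\n"
        " S.M.A.R.T Status Bad, Backup and Replace.\n"
    )
    print(find_signatures(sample))
```

A match only indicates that the captured text contains one of the messages shown in this article; it does not replace the diagnosis described in the KB solution.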