Node goes offline due to a failed NVRAM card, with "hung_task" and "hung_task_timeout_secs" messages
Applies to
SolidFire AFA: SF19210
Issue
- Before the node goes offline, sf-master.info shows the following:
2023-04-29T18:44:47.632229Z SFALPSF08 master-1[26751]: [APP-5] [Leader] 28567 CMIscsiConnectMo serviceshared/LeaderCoordinator.cpp:618:OnClusterMasterConnectCallback|Full vote, based on connection states shouldVote=1 stateVote=1 sequenceNumber=143 nodesWithWorkingEAContainers={57,72,86,126,154,155,185,199}
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
- dmesg -T shows a hung task on nvme0n1, along with the "hung_task_timeout_secs" message:
crash> dmesg -T
[Sat Apr 29 18:49:04 UTC 2023] INFO: task jbd2/nvme0n1-8:26613 blocked for more than 120 seconds.
[Sat Apr 29 18:49:04 UTC 2023] Tainted: G O 4.19.37-solidfire8 #1
[Sat Apr 29 18:49:04 UTC 2023] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
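When a saved dmesg capture is long, the hung-task warnings can be pulled out with a short script. The following is a minimal Python sketch, not a NetApp tool: the function name find_hung_tasks and the sample text are illustrative, and the device name is derived on the assumption that ext4 journal threads are named "jbd2/<device>-<journal inode>".

```python
import re

# Matches the kernel's hung-task warning, for example:
# "INFO: task jbd2/nvme0n1-8:26613 blocked for more than 120 seconds."
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>.+?):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)

# Assumption: ext4 journal threads are named "jbd2/<device>-<journal inode>".
JBD2_RE = re.compile(r"^jbd2/(?P<dev>.+)-\d+$")


def find_hung_tasks(dmesg_text):
    """Return one dict per hung-task warning found in dmesg_text."""
    hits = []
    for line in dmesg_text.splitlines():
        m = HUNG_TASK_RE.search(line)
        if not m:
            continue
        jbd2 = JBD2_RE.match(m.group("comm"))
        hits.append({
            "task": m.group("comm"),
            "pid": int(m.group("pid")),
            "blocked_secs": int(m.group("secs")),
            # None when the blocked task is not a jbd2 journal thread.
            "device": jbd2.group("dev") if jbd2 else None,
        })
    return hits


# Sample lines copied from the dmesg output in this article.
SAMPLE = """\
[Sat Apr 29 18:49:04 UTC 2023] INFO: task jbd2/nvme0n1-8:26613 blocked for more than 120 seconds.
[Sat Apr 29 18:49:04 UTC 2023] Tainted: G O 4.19.37-solidfire8 #1
"""

if __name__ == "__main__":
    for hit in find_hung_tasks(SAMPLE):
        print(hit)
```

A blocked jbd2 thread means the filesystem journal on that device could not complete its writes within the timeout, which is why the device name is worth extracting alongside the task name.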
- Multiple core dumps are generated after the crash:
-rw-rw-rw- 1 dexterap engr 76763717096 Apr 29 12:20 dump.202304291845
-rw-rw-rw- 1 dexterap engr 776107259 Apr 29 12:32 dump.202304291928
- The core files show multiple kernel panics involving the NVRAM card "nvme0n1":
KERNEL: /sf_debug/12.3.2.3/lib64/modules/4.19.37-solidfire8/vmlinux-ember-x86_64-4.19.37-solidfire8
DUMPFILE: dump.202304291845 [PARTIAL DUMP]
CPUS: 56
DATE: Sat Apr 29 18:45:09 UTC 2023
UPTIME: 380 days, 21:16:56
LOAD AVERAGE: 3.68, 3.95, 4.22
TASKS: 3273
NODENAME: QALPOGSF08
RELEASE: 4.19.37-solidfire8
VERSION: #1 SMP Mon Aug 17 14:34:57 UTC 2020
MACHINE: x86_64 (2600 Mhz)
MEMORY: 383.9 GB
PANIC: "Kernel panic - not syncing: hung_task: blocked tasks"
PID: 299
COMMAND: "khungtaskd"
TASK: ffff8f9c77b71d80 [THREAD_INFO: ffff8f9c77b71d80]
CPU: 22
STATE: TASK_RUNNING (PANIC)
[32908851.679379] INFO: task jbd2/nvme0n1-8:26613 blocked for more than 120 seconds.
[32908852.259911] Kernel panic - not syncing: hung_task: blocked tasks
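To compare several dumps quickly, the summary header that the crash utility prints can be read into a dictionary and checked for the hung-task panic. This is a rough Python sketch under stated assumptions: it expects the header saved as text with one "FIELD: value" pair per line, and the helper names parse_crash_header and is_hung_task_panic are hypothetical.

```python
import re

# A header field is an upper-case name (spaces allowed) followed by a colon.
FIELD_RE = re.compile(r"^\s*(?P<key>[A-Z][A-Z ]*?):\s*(?P<value>.*)$")


def parse_crash_header(text):
    """Map each 'FIELD: value' line to a dict entry; other lines are skipped."""
    fields = {}
    for line in text.splitlines():
        m = FIELD_RE.match(line)
        if m:
            fields[m.group("key")] = m.group("value").strip()
    return fields


def is_hung_task_panic(fields):
    """True when the PANIC field reports a hung_task panic."""
    return "hung_task" in fields.get("PANIC", "")


# Excerpt of the header shown in this article, one field per line.
SAMPLE_HEADER = """\
DUMPFILE: dump.202304291845 [PARTIAL DUMP]
UPTIME: 380 days, 21:16:56
LOAD AVERAGE: 3.68, 3.95, 4.22
NODENAME: QALPOGSF08
PANIC: "Kernel panic - not syncing: hung_task: blocked tasks"
COMMAND: "khungtaskd"
TASK: ffff8f9c77b71d80 [THREAD_INFO: ffff8f9c77b71d80]
[32908852.259911] Kernel panic - not syncing: hung_task: blocked tasks
"""

if __name__ == "__main__":
    header = parse_crash_header(SAMPLE_HEADER)
    print(header["DUMPFILE"], header["COMMAND"], is_hung_task_panic(header))
```

Here the panicking task is khungtaskd, the kernel thread that watches for blocked tasks, so the COMMAND field identifies the detector rather than the task that was actually stuck (jbd2/nvme0n1-8).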