
ONTAP Select node unexpectedly reboots after long CP and vNVRAM flush delays

Visibility: Public
Category: ontap-select
Specialty: core

Applies to

  • ONTAP 9
  • ONTAP Select

Issue

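  • The ONTAP Select node unexpectedly reboots with the panic string: received completion for unknown cmd in process irqXXX: nvme0
  • The referenced device is typically nvmeX, which is nvme0 in configurations that do not use an NVMe back end
  • Sequence of ONTAP-side log messages leading up to the panic: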

Sat Jul 02 03:10:28 +0200 [node-01: ctlg_flxlg_mirror: vnvram.dma.long.wait:alert]: vNVRAM flush taking over 10 seconds.
Sat Jul 02 03:10:29 +0200 [node-01: wafl_exempt03: wafl.cp.toolong:error]: Aggregate aggr0 experienced a long CP.
Sat Jul 02 03:10:30 +0200 [node-01: irq282: nvme0: cf.fm.localFwTransition:debug]: params: {'progresscounter': '1031', 'newstate': 'SF_DUMPCORE', 'prevstate': 'SF_UP'}
Sat Jul 02 03:10:30 +0200 [node-01: irq282: nvme0: ha.panicInfoSent:notice]: Node successfully sent a panic information message to its HA partner. Partner name: . Partner system ID: 1234567890.
Sat Jul 02 03:10:30 +0200 [node-01: irq282: nvme0: sk.panic:alert]: Panic String: received completion for unknown cmd in process irq282: nvme0 on release 9.9.1P8 (C)
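When reviewing an EMS log excerpt, the sequence above can be checked mechanically. The following Python sketch is a hypothetical helper, not a NetApp tool: it assumes EMS lines in exactly the format shown above, takes the event names and panic string from this example, and reports the irq and device if vnvram.dma.long.wait, wafl.cp.toolong and sk.panic appear in that order. All function and constant names are illustrative.

```python
import re

# Event names copied from the sample EMS sequence in this article (assumed ordering).
SIGNATURE = ["vnvram.dma.long.wait", "wafl.cp.toolong", "sk.panic"]
PANIC_RE = re.compile(
    r"Panic String: received completion for unknown cmd in process (irq\d+): (nvme\d+)"
)
BRACKET_RE = re.compile(r"\[([^\]]*)\]")


def event_name(line):
    """Return the EMS event name (e.g. 'wafl.cp.toolong') or None if absent."""
    m = BRACKET_RE.search(line)
    if not m:
        return None
    # Assumed bracket layout: "node: thread[: ...]: event.name:severity"
    last = m.group(1).split(": ")[-1]
    return last.split(":")[0]


def matches_signature(lines):
    """Return (irq, device) if the three events occur in order and the
    sk.panic line carries the expected panic string; otherwise None."""
    step = 0
    for line in lines:
        if event_name(line) != SIGNATURE[step]:
            continue  # unrelated message, keep scanning
        if step == len(SIGNATURE) - 1:
            m = PANIC_RE.search(line)
            return (m.group(1), m.group(2)) if m else None
        step += 1
    return None
```

Feed it EMS lines in chronological order. A None result only means this particular sequence was not found in the lines supplied.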

  • ESXi-side vmware.log sequence:

2022-07-02T01:10:30.121Z| vcpu-0| | I005: NVME-VMM: Controller level reset via CC.EN bit transition on nvme0
2022-07-02T01:10:30.121Z| vcpu-0| | I005: NVME-CORE: Doing a partial reset of controller regs and queues.
2022-07-02T01:10:33.353Z| vcpu-0| | I005: HBACommon: First write on scsi0:0.fileName='/vmfs/volumes/.../ontapselect-n02/ontapselect-n02.vmdk'
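The two logs use different time zones: the ONTAP EMS timestamps carry a +0200 offset, while vmware.log is in UTC (the trailing Z). 03:10:30 +0200 is 01:10:30 UTC, so the controller-level reset of nvme0 logged by ESXi falls within the same second as the ONTAP panic. A minimal Python sketch of that conversion, assuming the line formats shown above. EMS lines of this form carry no year, so the sketch borrows it from the ESXi timestamp (an assumption). All function names are hypothetical.

```python
from datetime import datetime, timezone


def ontap_ts_utc(line, year):
    """Parse the leading 'Sat Jul 02 03:10:30 +0200' stamp of an EMS line into UTC.
    The year is not in the stamp, so the caller must supply it (assumption)."""
    stamp = " ".join(line.split()[:5])  # weekday, month, day, time, UTC offset
    dt = datetime.strptime(f"{stamp} {year}", "%a %b %d %H:%M:%S %z %Y")
    return dt.astimezone(timezone.utc)


def esxi_ts_utc(line):
    """Parse the leading '2022-07-02T01:10:30.121Z' stamp of a vmware.log line."""
    stamp = line.split("|", 1)[0]
    dt = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ")
    return dt.replace(tzinfo=timezone.utc)  # 'Z' means UTC


def seconds_apart(ontap_line, esxi_line):
    """Absolute gap in seconds between an EMS line and a vmware.log line."""
    esxi = esxi_ts_utc(esxi_line)
    ontap = ontap_ts_utc(ontap_line, esxi.year)
    return abs((esxi - ontap).total_seconds())
```

For the sk.panic line and the "Controller level reset via CC.EN bit transition on nvme0" line above, the gap is about 0.121 seconds.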

 
