Azure CVO reboots due to a long-running mlx5dump process
Applies to
- Cloud Volumes ONTAP (CVO)
- Microsoft Azure
Issue
- EMS logs show the mlx5dump process running for an extended duration:
Thu Jun 15 06:38:25 -0400 [cluster1-01: sched_monitor: mgr.stack.longrun.proc:notice]: Long running process: mlx5dump
Thu Jun 15 06:43:28 -0400 [cluster1-01: sched_monitor: sk.hog.runtime:notice]: Process mlx5dump ran for 15569 milliseconds
- The long-running process then triggers a node reboot:
Thu Jun 15 06:43:36 -0400 [cluster1-01: pha_main000: kern.shutdown.initiator:debug]: SK reboot was initiated by "maytag.ko::fm_handleReserved+763".
Thu Jun 15 06:59:16 -0400 [cluster1-01: sfo_status: callhome.reboot.giveback:notice]: Call home for REBOOT (after giveback)
- The node reboots repeatedly and displays the following errors on the console:
mlx5_core2: ERR: mlx5e_ioctl:4600:(pid 0): tso6 disabled due to -txcsum6.
mlx5_core2: ERR: mlx5e_ioctl:4622:(pid 0): enable txcsum6 first.
e0c: Forced delayed initialize mlx5_core2 before network ifconfig call
e0c: mlx5_core2 SIOCGRSSKEY failed: 22
[node-01:netif.init.failed:ALERT]: Initialization of network interface mlx5_core2 failed due to unexpected software error mlx5_core err=0xffffffc4:100.
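
To check an exported EMS log for the first symptom, a small script can pull out the sk.hog.runtime events that name mlx5dump. The Python sketch below is only an illustration and is not a NetApp tool: the function name find_long_runs and the regular expression are assumptions derived solely from the message format in the excerpt above, and other ONTAP releases may format these events differently.

```python
import re

# Assumed pattern, based only on the sk.hog.runtime message shown in this article.
HOG_RE = re.compile(
    r"\[(?P<node>[^:\]]+): [^:]+: sk\.hog\.runtime:\w+\]: "
    r"Process (?P<proc>\S+) ran for (?P<ms>\d+) milliseconds"
)


def find_long_runs(lines, process="mlx5dump"):
    """Return (node, milliseconds) for each sk.hog.runtime event naming `process`."""
    hits = []
    for line in lines:
        # finditer also copes with several events fused onto one physical line.
        for match in HOG_RE.finditer(line):
            if match.group("proc") == process:
                hits.append((match.group("node"), int(match.group("ms"))))
    return hits


sample = [
    "Thu Jun 15 06:38:25 -0400 [cluster1-01: sched_monitor: "
    "mgr.stack.longrun.proc:notice]: Long running process: mlx5dump",
    "Thu Jun 15 06:43:28 -0400 [cluster1-01: sched_monitor: "
    "sk.hog.runtime:notice]: Process mlx5dump ran for 15569 milliseconds",
]

print(find_long_runs(sample))  # [('cluster1-01', 15569)]
```

In practice the list of lines would come from a saved log file, for example by iterating over an open file handle. A match only confirms that the first symptom is present; it does not by itself establish the cause of the reboot.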