Connections disconnect after a disk fault
Applies to
- Hardware disk fault
- Long Consistency Point (CP) errors reported
- Data outage
Issue
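- The customer sees a data outage lasting several seconds. Examples:
  - NFS exports are disconnected
  - CIFS shares are not accessible
  - VMs are missing
- A hardware disk fault is reported. Examples: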
[node_name: config_thread: raid.config.filesystem.disk.not.responding:notice]: File system Disk /aggr_name/plex0/rg0/0a.0.1 Shelf 0 Bay 1 [...] is not responding.
[node_name: monitor: monitor.globalStatus.nonCritical:error]: Disk on adapter FPF1939S03T:9, shelf 1, bay 5, not responding.
- ONTAP event errors for long CPs are reported on data and/or root aggregates. Examples:
[node_name: wafl_exempt13: wafl.cp.toolong:error]: Aggregate aggr0 experienced a long CP.
[node_name: wafl_exempt16: wafl.cp.toolong:error]: Aggregate aggr_name experienced a long CP.
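When a large event log has to be checked for these symptoms, the sample lines above can be split into fields with a short script. The following is a minimal sketch only: the line layout (`[node: process: event.name:severity]: message`) is inferred from the examples in this article rather than from a format specification, and the function names `parse_ems_line` and `long_cp_aggregates` are hypothetical helpers, not part of ONTAP.

```python
import re

# Assumed layout, inferred only from the sample lines in this article:
#   [node: process: event.name:severity]: message
EMS_LINE = re.compile(
    r"^\[(?P<node>[^:\]]+):\s*(?P<process>[^:\]]+):\s*"
    r"(?P<event>[A-Za-z0-9_.]+):(?P<severity>\w+)\]:\s*(?P<message>.*)$"
)

# Message wording taken from the wafl.cp.toolong samples above.
LONG_CP_MSG = re.compile(r"Aggregate (?P<aggregate>\S+) experienced a long CP\.")


def parse_ems_line(line):
    """Split one event line into its fields; return None if it does not match."""
    match = EMS_LINE.match(line.strip())
    return match.groupdict() if match else None


def long_cp_aggregates(lines):
    """Return the aggregate names named in wafl.cp.toolong events, in log order."""
    names = []
    for line in lines:
        event = parse_ems_line(line)
        if event is None or event["event"] != "wafl.cp.toolong":
            continue
        match = LONG_CP_MSG.search(event["message"])
        if match:
            names.append(match.group("aggregate"))
    return names


if __name__ == "__main__":
    sample = [
        "[node_name: wafl_exempt13: wafl.cp.toolong:error]: "
        "Aggregate aggr0 experienced a long CP.",
        "[node_name: wafl_exempt16: wafl.cp.toolong:error]: "
        "Aggregate aggr_name experienced a long CP.",
    ]
    print(long_cp_aggregates(sample))
```

Lines that do not follow the assumed layout are skipped rather than raising an error, so the script can be run over a mixed log without preprocessing.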
- Consistency Point (CP) phase 2 is reported as excessively long in the sktraces AutoSupport section when flushing data to disk. Examples:
2024-1-1T00:01:01Z 12345678912345678 [5:0] CRUISE_6: CP toolong: aggr0[5678901] CP_P2_FLUSH 498765ms
2024-1-1T01:01:05Z 23456789123456789 [2:0] CRUISE_6: CP toolong: aggr_name[5789012] CP_P2_FLUSH 512345ms
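To compare flush durations across many sktrace entries, the `CP toolong` lines can be reduced to an aggregate name, a phase, and a duration in milliseconds. This is a rough sketch under stated assumptions: the pattern is derived only from the two sample lines above, the meaning of the number in square brackets after the aggregate name is not stated in this article (it is kept as an opaque `bracket_value`), and `parse_cp_toolong` and `slow_phases` are hypothetical helper names.

```python
import re

# Pattern inferred from the sample sktrace lines in this article, e.g.
#   <timestamp> <counter> [5:0] CRUISE_6: CP toolong: aggr0[5678901] CP_P2_FLUSH 498765ms
CP_TOOLONG = re.compile(
    r"^(?P<timestamp>\S+)\s.*CP toolong:\s*"
    r"(?P<aggregate>[^\[\s]+)\[(?P<bracket_value>\d+)\]\s+"
    r"(?P<phase>\w+)\s+(?P<duration_ms>\d+)ms\s*$"
)


def parse_cp_toolong(line):
    """Return the fields of one 'CP toolong' line, or None if it does not match."""
    match = CP_TOOLONG.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["duration_ms"] = int(fields["duration_ms"])
    return fields


def slow_phases(lines, threshold_ms):
    """List (aggregate, phase, duration_ms) for entries at or above threshold_ms.

    threshold_ms is a caller-chosen value for illustration, not an ONTAP limit.
    """
    hits = []
    for line in lines:
        entry = parse_cp_toolong(line)
        if entry and entry["duration_ms"] >= threshold_ms:
            hits.append((entry["aggregate"], entry["phase"], entry["duration_ms"]))
    return hits


if __name__ == "__main__":
    sample = [
        "2024-1-1T00:01:01Z 12345678912345678 [5:0] CRUISE_6: "
        "CP toolong: aggr0[5678901] CP_P2_FLUSH 498765ms",
        "2024-1-1T01:01:05Z 23456789123456789 [2:0] CRUISE_6: "
        "CP toolong: aggr_name[5789012] CP_P2_FLUSH 512345ms",
    ]
    for aggregate, phase, duration_ms in slow_phases(sample, threshold_ms=500000):
        print(aggregate, phase, duration_ms)
```

In the two samples above, the `CP_P2_FLUSH` values of 498765 ms and 512345 ms correspond to roughly 499 and 512 seconds spent in that phase.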