
NFS operations hang or report NFS not responding errors when the entire FlexGroup reaches 100% usage

Category: fas-systems
Specialty: CORE

Applies to

  • ONTAP 9
  • FlexGroup
  • NFS

Issue

  • The kernel log on the NFS client contains:
    • mount: server <name> not responding, timed out
  • The find command hangs on the client
  • Clients experience NFS latency
  • The storage aggregate show command returns errors:
cluster::*> storage aggregate show
 
Info: Failed to get the information for aggregate aggr0_node09. Reason: ZSM - failed, status code = 571, extra = Timeout: Operation "ksmfRawZapi_iterator::get_imp()" took longer than 110
seconds to complete [from mgwd on node "node01" (VSID: -1) to kernel at 169.254.33.96], took 109.996s, max 110s [169.254.33.96:951].
Failed to get the information for aggregate node09. Reason: ZSM - failed, status code = 571, extra = Timeout: Operation "ksmfRawZapi_iterator::get_imp()" took longer than 110
seconds to complete [from mgwd on node "node01" (VSID: -1) to kernel at 169.254.33.96], took 109.997s, max 110s [169.254.33.96:951].
 
Aggregate        Size Available Used% State   #Vols  Nodes            RAID Status
------------ -------- --------- ----- ------- ------ ---------------- ------------
aggr0_node09        -         -     - unknown      - node09           -
aggr0_node10   1020GB   49.46GB   95% online       1 node10           raid_dp,normal
node09              -         -     - unknown      - node09           -
node10        527.0TB   148.3TB   72% online      93 node10           raid_dp,normal
  • The cf status command returns errors:
cluster::*> cf status
                              Takeover
Node           Partner        Possible State Description
-------------- -------------- -------- -------------------------------------
node09         node10         -        Up. Node accessible via HA-IC, but cluster access failed
node10         node09         true     Connected to node09
  • The EMS log contains events similar to the following:
Sun Jan 08 01:20:04 [node09: wafl_exempt14: wafl.vol.fsp.full:error]: volume flexvol__0005@vserver:xxxxxxxx-0a45-11e8-86ae-xxxxxxxxxxxx: insufficient space in FSP wafl_remote_reserve to satisfy a request of 0 holes and 12 overwrites.
 
Sun Jan 08 01:20:30 [node01: kernel: Nblade.nfsLongRunningOp:debug]: Detected a long running network process operation.
The client IP address:port is xxx.xxx.109.64:922.
The local IP address:port is xxx.xxx.207.30:2049.
The protocol requesting the operation is NFS3.
The RPC program number for the operation is 100003.
The protocol procedure for the operation is ReadDirPlus (17).
The disk process UUID is xxxxxxxx926a11e9999b00a0xxxxxxxx.
The Vserver associated with the operation is vserver1.
The UID of the user is 0.
The MSID for the volume is xxxxxxxxxx.
The inode number of the file is 45644.
 
Sun Jan 08 01:21:56 [node01: kernel: Nblade.dBladeNoResponse.NFS:error]: File operation timed out because there was no response from the data-serving node.
Node UUID: xxxxxxxx-ff66-11e9-9b05-xxxxxxxxxxxx,
file operation protocol: NFS,
client IP address: xxx.xxx.109.58,
RPC procedure: 3.
 
Sun Jan 08 01:27:11 [node01: kernel: Nblade.dBladeNoResponse.NFS:error]: File operation timed out because there was no response from the data-serving node.
Node UUID: xxxxxxxx-ff66-11e9-9b05-xxxxxxxxxxxx,
file operation protocol: NFS,
client IP address: xxx.xxx.109.60,
RPC procedure: 17.
 
Sun Jan 08 01:27:36 [node09: wafl_exempt04: wafl.vol.full:alert]: Insufficient space on volume flexvol__0005@vserver:xxxxxxxx-0a45-11e8-86ae-xxxxxxxxxxxx to perform operation. 76.0KB was requested but only 12.0KB was available.
Sun Jan 08 01:28:19 [node09: wafl_exempt06: wafl.vol.fsp.full:error]: volume flexvol__0005@vserver:xxxxxxxx-0a45-11e8-86ae-xxxxxxxxxxxx: insufficient space in FSP wafl_remote_reserve to satisfy a request of 1 holes and 26 overwrites.
 
Sun Jan 08 11:44:26 [node09: kernel: Nblade.nfsLongRunningOp:debug]: Detected a long running network process operation.
The client IP address:port is 10.96.103.108:775.
The local IP address:port is xxx.xxx.207.207:2049.
The protocol requesting the operation is NFS3.
The RPC program number for the operation is 100003.
The protocol procedure for the operation is LookUp (3).
The disk process UUID is xxxxxxxx926a11e9999b00a0xxxxxxxx.
The Vserver associated with the operation is vserver1.
The UID of the user is 0.
The MSID for the volume is xxxxxxxxxx.
The inode number of the file is xxxxxxxx.
 
Sun Jan 08 11:49:31 [node09: kernel: Nblade.nfsLongRunningOp:debug]: Detected a long running network process operation.
The client IP address:port is 10.96.103.108:823.
The local IP address:port is xxx.xxx.207.206:2049.
The protocol requesting the operation is NFS3.
The RPC program number for the operation is 100003.
The protocol procedure for the operation is ReadDirPlus (17).
The disk process UUID is xxxxxxxx926a11e9999b00a0xxxxxxxx.
The Vserver associated with the operation is vserver1.
The UID of the user is 0.
The MSID for the volume is xxxxxxxxxx.
The inode number of the file is xxxxxxxx.
 

Fri Jun 21 01:19:58 -0400 [node09: kernel: Nblade.nfsLongRunningOp:debug]: Detected a long running network process operation. The client IP address:port is xx.xx.xx.xx:808. The local IP address:port is xx.xx.xx.xx:2049. The protocol requesting the operation is NFS3. The RPC program number for the operation is 100003. The protocol procedure for the operation is Write (7). The disk process UUID is xxxxxxxxxxxxxxxx. The Vserver associated with the operation is vserver1. The UID of the user is xxxxxx. The MSID for the volume is xxxxxx. The inode number of the file is xxxxxx.
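
When the EMS log has been exported to a text file, the events listed above can be tallied with a small script to confirm that the volume-full events and the NFS timeouts occur together. The following is a minimal sketch, not a NetApp tool: the function name summarize_ems and the regular expressions are hypothetical and assume the bracketed "[node: process: event:severity]" header layout seen in the excerpts above. The procedure-number table follows the NFSv3 definition in RFC 1813 and is used to decode the numeric "RPC procedure" field of Nblade.dBladeNoResponse.NFS events.

```python
import re
from collections import Counter

# EMS event names taken from the log excerpts in this article.
EVENTS_OF_INTEREST = (
    "wafl.vol.full",
    "wafl.vol.fsp.full",
    "Nblade.nfsLongRunningOp",
    "Nblade.dBladeNoResponse.NFS",
)

# NFSv3 procedure numbers as defined in RFC 1813.
NFS3_PROCEDURES = {
    0: "NULL", 1: "GETATTR", 2: "SETATTR", 3: "LOOKUP", 4: "ACCESS",
    5: "READLINK", 6: "READ", 7: "WRITE", 8: "CREATE", 9: "MKDIR",
    10: "SYMLINK", 11: "MKNOD", 12: "REMOVE", 13: "RMDIR", 14: "RENAME",
    15: "LINK", 16: "READDIR", 17: "READDIRPLUS", 18: "FSSTAT",
    19: "FSINFO", 20: "PATHCONF", 21: "COMMIT",
}

# Assumed header layout: "[node: process: event.name:severity]".
HEADER_RE = re.compile(
    r"\[(?P<node>[^:\]]+): (?P<proc>[^:\]]+): "
    r"(?P<event>[\w.]+):(?P<severity>\w+)\]"
)
# Numeric field reported by Nblade.dBladeNoResponse.NFS, e.g. "RPC procedure: 17."
PROC_RE = re.compile(r"RPC procedure: (\d+)")


def summarize_ems(text):
    """Count events of interest per (node, event) and decode RPC procedures."""
    events = Counter()
    procedures = Counter()
    for match in HEADER_RE.finditer(text):
        if match.group("event") in EVENTS_OF_INTEREST:
            events[(match.group("node"), match.group("event"))] += 1
    for match in PROC_RE.finditer(text):
        number = int(match.group(1))
        procedures[NFS3_PROCEDURES.get(number, f"unknown({number})")] += 1
    return events, procedures


if __name__ == "__main__":
    # "ems.log" is a placeholder path for an exported EMS log.
    with open("ems.log", encoding="utf-8") as handle:
        event_counts, procedure_counts = summarize_ems(handle.read())
    for (node, event), count in sorted(event_counts.items()):
        print(f"{node:10} {event:32} {count}")
    for name, count in sorted(procedure_counts.items()):
        print(f"timed-out NFSv3 procedure {name}: {count}")
```

In the excerpts above, such a tally would show the wafl.vol.full and wafl.vol.fsp.full events on node09 alongside timed-out LOOKUP (3) and READDIRPLUS (17) operations reported by node01.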
 
