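Client automount fails intermittently due to DNS query failures
Applies to
Issue
- In a mount storm environment, an external Linux DNS server running a BIND version equal to or later than bind-9.9.9.4-69.el7, which begins to support 0 TTL, may forward DNS query requests to a non-listener on-box DNS server LIF
- If the external DNS server forwards a DNS query request to a non-listener on-box DNS server LIF, ONTAP rejects the initial connection request
- As a result, the client DNS query eventually fails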
192.168.0.61 (client)
192.168.0.188 (external DNS server)
192.168.0.134
192.168.0.135 (only this IP is allowed to handle DNS query requests; listen-for-dns-query: true)
192.168.0.136
192.168.0.137
192.168.0.139
cluster1::*> net int show -vserver svm2_cluster1 -fields listen-for-dns-query,dns-zone,address
  (network interface show)
vserver       lif                   address       dns-zone                  listen-for-dns-query
------------- --------------------- ------------- ------------------------- --------------------
svm2_cluster1 lif_svm2_cluster1_433 192.168.0.135 storage_hostname.test.com true
svm2_cluster1 lif_svm2_cluster1_453 192.168.0.137 storage_hostname.test.com false
svm2_cluster1 lif_svm2_cluster1_583 192.168.0.134 storage_hostname.test.com false
svm2_cluster1 lif_svm2_cluster1_858 192.168.0.136 storage_hostname.test.com false
svm2_cluster1 lif_svm2_cluster1_996 192.168.0.139 storage_hostname.test.com false
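To make the listener/non-listener split explicit, the output above can be parsed with a few lines of Python. This is only a sketch: the function name `split_lifs` and the assumption that every data row has exactly five whitespace-separated fields ending in `true` or `false` are illustrative, not part of ONTAP.

```python
# Data rows copied from the `network interface show` output above.
SAMPLE = """\
svm2_cluster1 lif_svm2_cluster1_433 192.168.0.135 storage_hostname.test.com true
svm2_cluster1 lif_svm2_cluster1_453 192.168.0.137 storage_hostname.test.com false
svm2_cluster1 lif_svm2_cluster1_583 192.168.0.134 storage_hostname.test.com false
svm2_cluster1 lif_svm2_cluster1_858 192.168.0.136 storage_hostname.test.com false
svm2_cluster1 lif_svm2_cluster1_996 192.168.0.139 storage_hostname.test.com false
"""


def split_lifs(text):
    """Return (listeners, non_listeners) as lists of LIF IP addresses.

    Hypothetical helper: assumes the column order
    vserver, lif, address, dns-zone, listen-for-dns-query.
    """
    listeners, non_listeners = [], []
    for line in text.splitlines():
        fields = line.split()
        # Skip the prompt, header, and separator lines.
        if len(fields) != 5 or fields[4] not in ("true", "false"):
            continue
        target = listeners if fields[4] == "true" else non_listeners
        target.append(fields[2])
    return listeners, non_listeners


listeners, non_listeners = split_lifs(SAMPLE)
print("listener LIFs:    ", listeners)
print("non-listener LIFs:", non_listeners)
```

Any DNS query that reaches an address in the second list is expected to be rejected, which is what the logs and traces in this article show for 192.168.0.134.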
/var/named/test.zone
------------------------------------------
;TR guide setting
$ORIGIN storage_hostname.com.
@ IN NS storage_hostname.com.
IN NS ansible.seccae.com.
storage_hostname.test.com. IN A 192.168.0.135
Log analysis
The NFS client returns SERVFAIL errors and automount failure messages
Sun Jun 26 15:14:06 UTC 2022
storage_hostname.test.com has address 192.168.0.134
Host storage_hostname.test.com not found: 2(SERVFAIL)
Host storage_hostname.test.com not found: 2(SERVFAIL)
Sun Jun 26 15:14:07 UTC 2022
Host storage_hostname.test.com not found: 2(SERVFAIL)
May 9 07:55:12 client_hostname automount[1364] : add_host_addrs: hostname lookup for storage_hostname failed: Name or service not known
The named.run log on the external DNS server reports info-level connection refused messages
26-Jun-2022 15:10:02.301 resolver: debug 1: fetch: storage_hostname.test.com/AAAA
26-Jun-2022 15:10:02.335 resolver: debug 1: fetch: storage_hostname.test.com/MX
26-Jun-2022 15:14:06.990 resolver: debug 1: fetch: storage_hostname.test.com/A
26-Jun-2022 15:14:07.004 resolver: debug 1: fetch: storage_hostname.test.com/AAAA
26-Jun-2022 15:14:07.005 lame-servers: info: connection refused resolving 'storage_hostname.test.com/AAAA/IN': 192.168.0.134#53
26-Jun-2022 15:14:07.005 resolver: debug 1: fetch: storage_hostname.test.com/AAAA
26-Jun-2022 15:14:07.005 query-errors: debug 1: client @0x7f5362edac20 192.168.0.61#60563 (storage_hostname.test.com): query failed (SERVFAIL) for storage_hostname.test.com/IN/AAAA at ../../../bin/named/query.c:8580
26-Jun-2022 15:14:07.005 lame-servers: debug 1: lame server resolving 'storage_hostname.test.com' (in 'storage_hostname.test.com'?): 192.168.0.188#53
26-Jun-2022 15:14:07.005 resolver: debug 1: fetch: storage_hostname.test.com/MX
26-Jun-2022 15:14:07.006 lame-servers: info: connection refused resolving 'storage_hostname.test.com/AAAA/IN': 192.168.0.134#53
26-Jun-2022 15:14:07.006 lame-servers: info: connection refused resolving 'storage_hostname.test.com/MX/IN': 192.168.0.134#53
26-Jun-2022 15:14:07.006 query-errors: debug 1: client @0x7f53600aa070 192.168.0.61#57471 (storage_hostname.test.com): query failed (SERVFAIL) for storage_hostname.test.com/IN/MX at ../../../bin/named/query.c:8580
26-Jun-2022 15:14:07.017 resolver: debug 1: fetch: storage_hostname.test.com/A
26-Jun-2022 15:14:07.017 lame-servers: info: connection refused resolving 'storage_hostname.test.com/A/IN': 192.168.0.134#53
26-Jun-2022 15:14:07.017 query-errors: debug 1: client @0x7f5362edac20 192.168.0.61#48698 (storage_hostname.test.com): query failed (SERVFAIL) for storage_hostname.test.com/IN/A at ../../../bin/named/query.c:8580
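When named.run is long, it can help to tally which addresses refused connections. The following is a minimal sketch that assumes the `lame-servers: info: connection refused resolving` line format shown above; the regular expression and the function name `refused_targets` are illustrative and may need adjusting for other BIND versions or logging configurations.

```python
import re
from collections import Counter

# Matches e.g.
# lame-servers: info: connection refused resolving 'name/AAAA/IN': 192.168.0.134#53
REFUSED = re.compile(
    r"lame-servers: info: connection refused resolving "
    r"'(?P<name>[^/']+)/(?P<rtype>[A-Z]+)/IN': (?P<ip>[\d.]+)#\d+"
)


def refused_targets(lines):
    """Count refused connections per (target IP, record type)."""
    counts = Counter()
    for line in lines:
        for match in REFUSED.finditer(line):
            counts[(match["ip"], match["rtype"])] += 1
    return counts


# Three entries copied from the named.run excerpt above.
SAMPLE_LOG = [
    "26-Jun-2022 15:14:07.006 lame-servers: info: connection refused resolving "
    "'storage_hostname.test.com/AAAA/IN': 192.168.0.134#53",
    "26-Jun-2022 15:14:07.006 lame-servers: info: connection refused resolving "
    "'storage_hostname.test.com/MX/IN': 192.168.0.134#53",
    "26-Jun-2022 15:14:07.017 lame-servers: info: connection refused resolving "
    "'storage_hostname.test.com/A/IN': 192.168.0.134#53",
]

counts = refused_targets(SAMPLE_LOG)
for (ip, rtype), n in sorted(counts.items()):
    print(f"{ip} refused {n} {rtype} query(ies)")
```

In this excerpt every refused connection targets 192.168.0.134, a LIF with listen-for-dns-query set to false.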
The storage-side packet trace clearly records this behavior. Only 192.168.0.135 is the on-box DNS listener IP
72 2022-06-26 23:14:19.572835 6.154698 192.168.0.188 38078 192.168.0.135 53 DNS Standard query 0x1d9c A storage_hostname.test.com OPT
73 2022-06-26 23:14:19.585266 0.012431 192.168.0.135 53 192.168.0.188 38078 DNS Standard query response 0x1d9c A storage_hostname.test.com A 192.168.0.134 NS storage_hostname.test.com OPT
74 2022-06-26 23:14:19.586980 0.001714 192.168.0.188 33860 192.168.0.134 53 DNS Standard query 0x88a9 AAAA storage_hostname.test.com OPT
75 2022-06-26 23:14:19.587027 0.000047 192.168.0.134 33860 192.168.0.188 53 ICMP Destination unreachable (Port unreachable)
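The complete behavior can be examined in the packet trace captured on the external DNS server
- A record query succeeds
  - The client sends an A query request to the external DNS server
  - The external DNS server forwards this query request to the on-box DNS listener IP 192.168.0.135
  - The on-box DNS replies to the external DNS server with a query response (at this point, the external DNS server appears to remember IP 192.168.0.134)
  - The external DNS server answers the client with the query response (the client mounts the storage system using 192.168.0.134)
- AAAA query fails
  - The client sends an AAAA query request to the external DNS server
  - The external DNS server forwards this AAAA query request to the non-listener LIF IP 192.168.0.134
  - The non-listener LIF replies Destination unreachable to the external DNS server
  - The external DNS server responds to the client with query failed (SERVFAIL)
- MX query fails
  - The client sends an MX query request to the external DNS server
  - The external DNS server forwards this MX query request to the non-listener LIF IP 192.168.0.134
  - The non-listener LIF replies Destination unreachable to the external DNS server
  - The external DNS server responds to the client with query failed (SERVFAIL)
- A record query fails
  - The client sends an A query request to the external DNS server
  - The external DNS server forwards this query request to the non-listener LIF IP 192.168.0.134
  - The non-listener LIF replies Destination unreachable to the external DNS server
  - The external DNS server responds to the client with query failed (SERVFAIL)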
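The four cases above reduce to one rule: the outcome depends only on whether the LIF contacted by the external DNS server listens for DNS queries. The toy model below restates that rule in Python. It is a simplification under that single assumption, not a reproduction of BIND or ONTAP logic, and all names in it are hypothetical.

```python
# LIFs with listen-for-dns-query set to true (from the interface listing).
LISTENERS = {"192.168.0.135"}


def forward_query(target_ip):
    """Outcome the client sees when the external DNS server forwards to target_ip."""
    # A listener answers the query; any other LIF rejects it,
    # and the external DNS server reports SERVFAIL to the client.
    return "answered" if target_ip in LISTENERS else "SERVFAIL"


# (query type, LIF contacted by the external DNS server), in trace order.
SEQUENCE = [
    ("A", "192.168.0.135"),
    ("AAAA", "192.168.0.134"),
    ("MX", "192.168.0.134"),
    ("A", "192.168.0.134"),
]

results = [(qtype, ip, forward_query(ip)) for qtype, ip in SEQUENCE]
for qtype, ip, outcome in results:
    print(f"{qtype:<4} -> {ip}: {outcome}")
```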
967 2022-06-26 23:14:06.989631 0.000501 192.168.0.61 36365 192.168.0.188 53 DNS Standard query 0x6fea A storage_hostname.test.com
969 2022-06-26 23:14:06.990708 0.000557 192.168.0.188 38078 192.168.0.135 53 DNS Standard query 0x1d9c A storage_hostname.test.com OPT
970 2022-06-26 23:14:07.003287 0.012579 192.168.0.135 53 192.168.0.188 38078 DNS Standard query response 0x1d9c A storage_hostname.test.com A 192.168.0.134 NS storage_hostname.test.com OPT
971 2022-06-26 23:14:07.003852 0.000565 192.168.0.188 53 192.168.0.61 36365 DNS Standard query response 0x6fea A storage_hostname.test.com A 192.168.0.134 NS storage_hostname.test.com
972 2022-06-26 23:14:07.004491 0.000639 192.168.0.61 60563 192.168.0.188 53 DNS Standard query 0x47bc AAAA storage_hostname.test.com
973 2022-06-26 23:14:07.004892 0.000401 192.168.0.188 33860 192.168.0.134 53 DNS Standard query 0x88a9 AAAA storage_hostname.test.com OPT
974 2022-06-26 23:14:07.005017 0.000125 192.168.0.134 33860 192.168.0.188 53 ICMP Destination unreachable (Port unreachable)
975 2022-06-26 23:14:07.005451 0.000434 192.168.0.188 53 192.168.0.61 60563 DNS Standard query response 0x47bc Server failure AAAA storage_hostname.test.com
976 2022-06-26 23:14:07.005710 0.000259 192.168.0.61 57471 192.168.0.188 53 DNS Standard query 0xe348 MX storage_hostname.test.com
977 2022-06-26 23:14:07.005742 0.000032 192.168.0.188 59080 192.168.0.134 53 DNS Standard query 0xde2d AAAA storage_hostname.test.com OPT
978 2022-06-26 23:14:07.005809 0.000067 192.168.0.134 59080 192.168.0.188 53 ICMP Destination unreachable (Port unreachable)
979 2022-06-26 23:14:07.005901 0.000092 192.168.0.188 51515 192.168.0.134 53 DNS Standard query 0x0a65 MX storage_hostname.test.com OPT
980 2022-06-26 23:14:07.005999 0.000098 192.168.0.134 51515 192.168.0.188 53 ICMP Destination unreachable (Port unreachable)
981 2022-06-26 23:14:07.006199 0.000200 192.168.0.188 53 192.168.0.61 57471 DNS Standard query response 0xe348 Server failure MX storage_hostname.test.com
982 2022-06-26 23:14:07.016776 0.010577 192.168.0.61 48698 192.168.0.188 53 DNS Standard query 0x573a A storage_hostname.test.com
983 2022-06-26 23:14:07.017217 0.000441 192.168.0.188 46874 192.168.0.134 53 DNS Standard query 0xf57a A storage_hostname.test.com OPT
984 2022-06-26 23:14:07.017343 0.000126 192.168.0.134 46874 192.168.0.188 53 ICMP Destination unreachable (Port unreachable)
985 2022-06-26 23:14:07.017536 0.000193 192.168.0.188 53 192.168.0.61 48698 DNS Standard query response 0x573a Server failure A storage_hostname.test.com
987 2022-06-26 23:14:07.027914 0.010016 192.168.0.61 53770 192.168.0.188 53 DNS Standard query 0x4a57 A storage_hostname.test.com
989 2022-06-26 23:14:07.028215 0.000217 192.168.0.188 53 192.168.0.61 53770 DNS Standard query response 0x4a57 Server failure A storage_hostname.test.com
995 2022-06-26 23:14:07.038529 0.009511 192.168.0.61 41057 192.168.0.188 53 DNS Standard query 0x498a A storage_hostname.test.com
996 2022-06-26 23:14:07.038756 0.000227 192.168.0.188 53 192.168.0.61 41057 DNS Standard query response 0x498a Server failure A storage_hostname.test.com
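The rejected queries can also be picked out of such a trace export programmatically. The sketch below pairs each DNS query sent to port 53 with the ICMP port-unreachable message that answers it. It assumes the whitespace-separated column layout shown above (number, date, time, delta, source, source port, destination, destination port, protocol, info) and assumes that the port columns of an ICMP row echo the ports of the rejected UDP packet, as they appear to in this export. The function names are hypothetical.

```python
def parse_row(line):
    """Split one exported trace row into a dict (assumed column layout)."""
    f = line.split()
    return {
        "no": int(f[0]),
        "src": f[4],
        "sport": int(f[5]),
        "dst": f[6],
        "dport": int(f[7]),
        "proto": f[8],
        "info": " ".join(f[9:]),
    }


def rejected_queries(rows):
    """Return (query_no, icmp_no, rejecting_ip) for each rejected DNS query."""
    pending = {}  # (querier IP, querier port, target IP) -> query row number
    rejected = []
    for r in rows:
        if r["proto"] == "DNS" and r["dport"] == 53 and "response" not in r["info"]:
            pending[(r["src"], r["sport"], r["dst"])] = r["no"]
        elif r["proto"] == "ICMP" and "Port unreachable" in r["info"]:
            # The ICMP sender is the target that rejected the query.
            key = (r["dst"], r["sport"], r["src"])
            if key in pending:
                rejected.append((pending.pop(key), r["no"], r["src"]))
    return rejected


# Rows copied from the trace above.
SAMPLE_TRACE = [
    "969 2022-06-26 23:14:06.990708 0.000557 192.168.0.188 38078 192.168.0.135 53 DNS Standard query 0x1d9c A storage_hostname.test.com OPT",
    "970 2022-06-26 23:14:07.003287 0.012579 192.168.0.135 53 192.168.0.188 38078 DNS Standard query response 0x1d9c A storage_hostname.test.com A 192.168.0.134 NS storage_hostname.test.com OPT",
    "973 2022-06-26 23:14:07.004892 0.000401 192.168.0.188 33860 192.168.0.134 53 DNS Standard query 0x88a9 AAAA storage_hostname.test.com OPT",
    "974 2022-06-26 23:14:07.005017 0.000125 192.168.0.134 33860 192.168.0.188 53 ICMP Destination unreachable (Port unreachable)",
    "983 2022-06-26 23:14:07.017217 0.000441 192.168.0.188 46874 192.168.0.134 53 DNS Standard query 0xf57a A storage_hostname.test.com OPT",
    "984 2022-06-26 23:14:07.017343 0.000126 192.168.0.134 46874 192.168.0.188 53 ICMP Destination unreachable (Port unreachable)",
]

rejected = rejected_queries(parse_row(line) for line in SAMPLE_TRACE)
for query_no, icmp_no, ip in rejected:
    print(f"query in frame {query_no} rejected by {ip} (frame {icmp_no})")
```

For these sample rows, the only address that rejects queries is 192.168.0.134, while the query to the listener 192.168.0.135 receives a normal response.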