
StorageX migration flooding secd

Category:
ontap-9
Specialty:
nas

Applies to

  • ONTAP 9
  • StorageX
  • Migration

Answer

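Using a third-party multi-threaded migration tool for a storage migration can cause a secd bottleneck. This typically presents as a potential client authentication performance issue while the migration is running. Any other authentication-related work on the node can also be affected.
 
 Signs of this bottleneck can be seen in the following errors:

secd begins to queue, and requests sit in the queue for a long time.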


[kern_secd:info:10816] debug: Worker Thread 34641252096 processing RPC 153:secd_rpc_auth_get_creds with request ID:6605 which sat in the queue for 23 seconds. { in run() at src/server/secd_rpc_server.cpp:2067 }
 

This can cause RPC requests to fail because of the 23-second timeout.
[kern_secd:info:5895] .------------------------------------------------------------------------------.
[kern_secd:info:5895] |                              RPC TOOK TOO LONG:                             |
[kern_secd:info:5895] |                       RPC used 24 seconds (max is 23)                      |
[kern_secd:info:5895] |                   and likely caused the client to timeout                   |
[kern_secd:info:5895] .------------------------------------------------------------------------------.
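
To gauge how often requests queue this long, the queue-time messages can be extracted from a saved copy of the secd log. The following is a minimal Python sketch, assuming the message format matches the excerpt above. The function names are illustrative and are not part of any NetApp tool; the 23-second threshold is taken from the "max is 23" banner.

```python
import re

# Matches the queue-time message shown in the secd log excerpt above.
QUEUE_RE = re.compile(
    r"processing RPC \d+:(?P<rpc>\w+) with request ID:(?P<req_id>\d+) "
    r"which sat in the queue for (?P<seconds>\d+) seconds"
)

RPC_TIMEOUT_SECONDS = 23  # from the "max is 23" banner in the log


def queued_requests(log_lines):
    """Yield (rpc_name, request_id, seconds_queued) for each matching line."""
    for line in log_lines:
        match = QUEUE_RE.search(line)
        if match:
            yield match["rpc"], int(match["req_id"]), int(match["seconds"])


def at_risk(log_lines, threshold=RPC_TIMEOUT_SECONDS):
    """Return the requests whose queue time alone reached the threshold."""
    return [r for r in queued_requests(log_lines) if r[2] >= threshold]


sample = [
    "[kern_secd:info:10816] debug: Worker Thread 34641252096 processing RPC "
    "153:secd_rpc_auth_get_creds with request ID:6605 which sat in the queue "
    "for 23 seconds. { in run() at src/server/secd_rpc_server.cpp:2067 }",
]
print(at_risk(sample))
```

Passing an open log file instead of `sample` works the same way, because the functions only iterate over lines.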
 

 Eventually, if secd RPC memory allocation reaches 80%, these messages begin to be logged:

SECD
[kern_secd:info:10816] [SECD MASTER THREAD] SecD RPC Server: Too many outstanding Generic RPC requests: sending System Error to RPC 153:secd_rpc_auth_get_creds Request ID:65535.

EMS
[secd: secd.rpc.server.request.dropped:debug]: The RPC secd_rpc_auth_get_creds sent from NBLADE_CIFS was dropped by SecD due to memory pressure.

Collecting secd CM statistics can also confirm how many times this condition has been hit.
 
nas::> set diag
nas::*> statistics start -object secd -instance secd -node NETAPP01-06 -sample-id sample_695
nas::*> statistics stop -sample-id sample_695
nas::*> statistics show -sample-id sample_695

 
Object: secd
Instance: secd
Start-time: 5/24/2018 15:46:34
End-time: 5/24/2018 15:50:09
Elapsed-time: 214s
Scope: NETAPP01-06

 
instance_name                                                secd
    node_name                                              NETAPP01-06
   num_rpcs_dropped_due_to_low_memory
                                mgwd                                0
                             nblade                           98765
                              dblade                                0
   num_rpcs_failed                                                 -
                                mgwd                               0
                             nblade                           98753
                              dblade                                0
                                libc                                0
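
The counters are totals for the sample window, so dividing by the elapsed time gives a rough sense of how quickly RPCs are being dropped. A small Python sketch using the values from the sample above (the helper name is illustrative):

```python
def rate_per_second(count, elapsed_seconds):
    """Average events per second over a statistics sample window."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return count / elapsed_seconds


# Values copied from the sample output above.
ELAPSED_SECONDS = 214
nblade_dropped = 98765
nblade_failed = 98753

print(f"dropped: {rate_per_second(nblade_dropped, ELAPSED_SECONDS):.1f} RPCs/s")
print(f"failed:  {rate_per_second(nblade_failed, ELAPSED_SECONDS):.1f} RPCs/s")
```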
 

rpc_task_queue_latency also records a histogram of every queued request and how long it sat in the queue.
 
    process_name                                                 secd
    rpc_task_queue_latency                                          -
                              <20us                            16667
                              <40us                                0
                              <60us                                0
                              <80us                                0
                             <100us                                0
                             <200us                                0
                             <400us                                0
                             <600us                               0
                             <800us                                0
                               <1ms                                0
                               <2ms                                0
                               <4ms                                0
                               <6ms                                0
                               <8ms                                0
                              <10ms                                0
                              <12ms                                0
                              <14ms                                0
                              <16ms                                0
                              <18ms                               0
                              <20ms                                0
                              <40ms                                0
                              <60ms                                0
                              <80ms                                0
                             <100ms                                0
                             <200ms                                0
                             <400ms                                0
                             <600ms                                0
                             <800ms                                0
                                <1s                                0
                                <2s                           17620
                                <4s                            16077
                                <6s                            43298
                                <8s                            31813
                               <10s                              378
                               <20s                               23
                               <30s                                0
                               <60s                               0
                               <90s                                0
                              <120s                                0
                              >120s                                0

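To summarize a histogram like the one above, it can help to compute what share of requests waited a second or more. The sketch below assumes each bucket counts requests between the previous bound and its own bound, so that everything from the "<2s" bucket upward waited at least 1 second; that reading of the buckets is an assumption, not something the output states.

```python
# Non-zero buckets copied from the rpc_task_queue_latency sample above;
# every other bucket in that sample is 0.
histogram = {
    "<20us": 16667,
    "<2s": 17620,
    "<4s": 16077,
    "<6s": 43298,
    "<8s": 31813,
    "<10s": 378,
    "<20s": 23,
}

# Buckets assumed to hold requests that waited 1 second or longer.
SLOW_BUCKETS = {
    "<2s", "<4s", "<6s", "<8s", "<10s", "<20s",
    "<30s", "<60s", "<90s", "<120s", ">120s",
}


def slow_fraction(hist):
    """Fraction of queued requests that fall into the slow buckets."""
    total = sum(hist.values())
    slow = sum(n for bucket, n in hist.items() if bucket in SLOW_BUCKETS)
    return slow / total if total else 0.0


print(f"{slow_fraction(histogram):.1%} of queued requests waited 1 second or longer")
```
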

In addition, because the credential lookups go through secd_rpc_auth_get_creds, an elevated count is expected there:

Object: secd_rpc
Instance: secd_rpc_auth_get_creds
Start-time: 5/24/2018 15:46:34
End-time: 5/24/2018 15:50:09
Elapsed-time: 214s
Scope: vservername

    Counter                                                     Value
    -------------------------------- --------------------------------
    instance_name                             secd_rpc_auth_get_creds
    last_update_time                         Thu May 24 15:50:09 2018
    longest_runtime                                               0ms
    node_name                                               NETAPP-06
    num_calls                                                   97699
    num_failures                                                   86
    num_successes                                               97613
    process_name                                                 secd
    shortest_runtime                                              0ms
    vserver_name                                          vservername
    vserver_uuid                             c4f936f2-66a6-11e7-9713-
                                                         90e2bacde704
 
 
 
 
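StorageX is mentioned specifically because this migration product has been seen to expose these types of issues.
 
By default, StorageX uses 16 threads per CPU core (configurable), so on a large multi-core server it can scale out in parallel very quickly. Each thread is responsible for copying a file; at the end of that task it applies the security descriptor, including the DACL\SACL\owner information. The thread then moves on to the next file.
 
For example, an 8 CPU core server equates to 128 threads. When very small files are migrated and each file has a unique owner, ONTAP has to perform a large amount of credential lookup work in a short period of time. In addition, a StorageX deployment may have multiple servers running its replication agent.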
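
The arithmetic above can be written out as a short Python sketch. The 16 threads per core figure is the default stated above; the number of agents is a hypothetical input.

```python
STORAGEX_THREADS_PER_CORE = 16  # default stated above; configurable


def concurrent_copy_threads(cpu_cores, agents=1,
                            threads_per_core=STORAGEX_THREADS_PER_CORE):
    """Upper bound on threads that may each set a file owner concurrently."""
    return cpu_cores * threads_per_core * agents


print(concurrent_copy_threads(8))            # one 8-core replication agent
print(concurrent_copy_threads(8, agents=4))  # four such agents (hypothetical)
```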
 
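Why does setting the file owner create more work for ONTAP?
 
When the file owner is set, ONTAP must build that user's credential. If the credential is not already cached, ONTAP queries the domain controller for it.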
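
The effect of unique owners can be illustrated with a toy model. The Python sketch below counts how many owner lookups would miss a simple, unbounded cache. It is an illustration only and does not reflect how ONTAP sizes, ages, or evicts its real credential cache; the RID values are made up.

```python
def domain_controller_lookups(owner_sids):
    """Count the lookups that miss a simple, unbounded credential cache."""
    cached = set()
    misses = 0
    for sid in owner_sids:
        if sid not in cached:
            misses += 1  # would require a round trip to the domain controller
            cached.add(sid)
    return misses


# 1,000 files owned by one user versus 1,000 files with unique owners.
domain = "S-1-5-21-1417671877-1164952658-2896985891"
same_owner = [f"{domain}-1156"] * 1000
unique_owners = [f"{domain}-{rid}" for rid in range(2000, 3000)]

print(domain_controller_lookups(same_owner))
print(domain_controller_lookups(unique_owners))
```

With a single owner only the first file triggers a lookup, whereas with unique owners every file does.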
 
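The following RFE would help avoid this situation in the future:
RFE: Option to disable SID owner lookup when setting the ACL of a file
https://mysupport-Beta.netapp.com/si... p/Burt/1153207

A release that contains the fix for the following issue also avoids this situation:
Avoid getting Windows group membership when setting file ownership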
 

An example of what can be seen in a packet trace:
 
>>file owner is set by StorageX at the end of the file sync
Frame1 Source: StorageX  Dest: ONTAPSMB2    SetInfo Request SEC_INFO/SMB2_SEC_INFO_00 File: 1.txt
Owner: S-1-5-21-1417671877-1164952658-2896985891-1156  (Domain SID-Domain RID)
 
>>ONTAP (if SID not cached) will need to go to Domain Controller to lookup SID
Frame2 Source: ONTAP Dest: DC LSARPC lsa_LookupSids2 request
Sid: S-1-5-21-1417671877-1164952658-2896985891-1156  (Domain SID-Domain RID)
RID: 1156  (Domain RID)
 
>> Domain Controller will respond with the name translation of SID
Frame3 Source: DC  Dest: ONTAP LSARPC lsa_LookupSids2 response
Pointer to String (uint16): thor
 
>>ONTAP will build the credential  via s4u2self (LDAP is fallback) to Domain Controller
Frame4 Source: ONTAP Dest: DC KRB5      TGS-REQ
padata-type: kRB5-PADATA-S4U2SELF (129)
KerberosString: thor
 
>>Domain Controller will respond with user’s credentials – ONTAP will usermap internally
Frame5 Source: DC  Dest: ONTAP KRB5      TGS-REP
 
>>ONTAP responds to the original setinfo in Frame1
Frame6 Source: ONTAP Dest: StorageX  SMB2    SetInfo Response
 

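What are the recommendations when this situation occurs?
  • Check for external server bottlenecks \ latency
  • Reduce StorageX threads
  • Spread the load across secd on other nodes
  • Work with the customer's account team to assist with the migration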

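Check for external server bottlenecks \ latency

Because part of the file sync involves setting owner information on the files being migrated, the migration can put pressure on secd processing. A normal client workload most likely does not generate this many account credential lookups. A large migration with many threads syncing many small files can generate a very large number of them. Check for any latency \ bottlenecks in external server communication (DNS, AD, LDAP, NIS, name mapping, name services, and so on). Troubleshooting these latencies \ bottlenecks helps reduce the impact of the surge in credential lookups.

See: How can I tell if an external service such as netlogon, ldap-ad, lsa, ldap-nis-namemap, or nis is responding slowly?

Also check for external server latency related to off-box vscan, fpolicy, and auditing: anything that could add latency to the elevated number of operations associated with the migration.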
 

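Reduce StorageX threads

This recommendation will most likely require engaging StorageX to verify the best way to limit its concurrency. At the time of publication, the following is the known method for doing so.
 
The registry key is a DWORD - HKEY_LOCAL_MACHINE\SOFTWARE\Data Dynamics\StorageX\ReplicationAgent\MaxDirectoryEnumerationThreads

MaxDirectoryEnumerationThreads (REG_DWORD): defaults to 0 (or undefined), which means the maximum number of threads is calculated from the number of CPUs in the current system.
 
Restart the RA.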
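
One way to apply the value consistently across several replication agent servers is to generate a .reg file and import it on each one. The Python sketch below builds the file content in the standard Windows .reg format, using the key and value name given above. The helper name and the example value of 4 are illustrative; confirm the appropriate value with StorageX support before applying it.

```python
KEY_PATH = r"HKEY_LOCAL_MACHINE\SOFTWARE\Data Dynamics\StorageX\ReplicationAgent"
VALUE_NAME = "MaxDirectoryEnumerationThreads"


def reg_file_text(threads):
    """Return .reg file content that sets the thread limit to `threads`."""
    if not 0 <= threads <= 0xFFFFFFFF:
        raise ValueError("threads must fit in a REG_DWORD")
    return (
        "Windows Registry Editor Version 5.00\r\n"
        "\r\n"
        f"[{KEY_PATH}]\r\n"
        f'"{VALUE_NAME}"=dword:{threads:08x}\r\n'
    )


print(reg_file_text(4))
```

A value of 0 restores the default behavior described above.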
 
Linux (UNIX) replication agent:
The same setting can be made on the UNIX RA in the following file: /usr/local/URA/log/Registry.xml. Add the following lines under the <Replication Agent> tag:
<VALUE name = "DisableReplicationPipelining" type "REG_DWORD"><0X00000001>
<VALUE name = "MaxDirectoryEnumerationThreads" type "REG_DWORD"><hex value of number of enumeration threads>
 
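The thread count has to be written as a hex value. A small Python sketch for producing it in the same style as the 0X00000001 example above; whether the UNIX RA requires exactly this width and letter case is an assumption.

```python
def registry_xml_hex(threads):
    """Format a thread count like the 0X00000001 example above."""
    if not 0 <= threads <= 0xFFFFFFFF:
        raise ValueError("threads must fit in a REG_DWORD")
    return f"0X{threads:08X}"


print(registry_xml_hex(1))
print(registry_xml_hex(16))
```
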
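Finding the best setting may take some trial and error to strike the balance between not overwhelming secd and keeping the migration moving. The thread count can be set as low as "1".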
 
