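All StorageGRID nodes display blue/unknown after a decommission task is started
Applies to
- StorageGRID 11.6.0.10 or earlier
- StorageGRID 11.7.0.3 or earlier
- A grid task (such as a decommission) has been started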
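The version criteria above can be sketched as a small check. This is an illustration only: the helper names are hypothetical, and it assumes that "or earlier" means any release up to and including 11.6.0.10, plus 11.7 releases up to and including 11.7.0.3. It does not query a grid; the version string must be supplied by the caller.

```python
# Hypothetical helper: decides whether a StorageGRID version string falls
# inside the ranges listed in "Applies to". The range interpretation is an
# assumption, not something the article spells out.

def parse_version(text):
    """Turn a dotted version such as '11.7.0.3' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def is_affected(version):
    """Return True if the version is within the assumed affected ranges."""
    v = parse_version(version)
    if v[:2] == (11, 7):
        # 11.7 line: affected up to and including 11.7.0.3
        return v <= (11, 7, 0, 3)
    # Everything else: affected up to and including 11.6.0.10
    return v <= (11, 6, 0, 10)
```

For example, `is_affected("11.7.0.4")` evaluates to `False` under these assumptions, while `is_affected("11.6.0.10")` evaluates to `True`.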
Issue
- Support > Grid Topology shows all nodes as blue/unknown, or shows no nodes at all
- /var/local/log/nms.log shows an invalid Atom label error, reported from /usr/local/lib/site_ruby/bycast/storage-grid/atom-container.rb while a bundle is being processed:
MI: |2023-07-18T13:46:59.082| NOTICE [DataConnectionManager] BundleProtocol.java:288: Processed bundle GTSB version 1 namespace BNDL instance 0
NMS: |2023-07-18T13:46:59.096| ERROR invalid Atom label "S>oK" (ArgumentError)
NMS: |2023-07-18T13:46:59.096| ERROR /usr/local/lib/site_ruby/bycast/storage-grid/atom-container.rb:44:in `label='
- /var/local/log/nms.log shows that the Java MI thread lost its connection:
MI: |2023-07-21T13:10:25.725| ERROR [DATA_STREAM_25] AddNodeProtocol.java:226: Connection lost.
MI: |2023-07-21T13:10:25.761| NOTICE [CONTROL_STREAM] ControlConnection.java:191: Restarting control connection...
- The mgmt-api service fails to restart
- /var/local/log/bycast-err.log shows mgmt-api errors:
NMS: |2023-07-25T05:39:37.383| ERROR Exception in thread created by /usr/local/lib/site_ruby/mgmt-api/alertmanager/rules/prometheus-alert-rules-updater.rb:25:in `new'
NMS: |2023-07-25T05:39:37.383| ERROR Directory not empty @ dir_s_rmdir - /var/local/mgmt-api/prometheus-rules (Errno::ENOTEMPTY)
NMS: |2023-07-25T05:39:37.383| ERROR /usr/lib/ruby/2.5.0/fileutils.rb:1337:in `rmdir'
NMS: |2023-07-25T05:39:37.383| ERROR /usr/lib/ruby/2.5.0/fileutils.rb:1337:in `block in remove_dir1'
NMS: |2023-07-25T05:39:37.383| ERROR /usr/lib/ruby/2.5.0/fileutils.rb:1348:in `platform_support'
NMS: |2023-07-25T05:39:37.383| ERROR /usr/lib/ruby/2.5.0/fileutils.rb:1336:in `remove_dir1'
NMS: |2023-07-25T05:39:37.383| ERROR /usr/lib/ruby/2.5.0/fileutils.rb:1329:in `remove'
NMS: |2023-07-25T05:39:37.384| ERROR /usr/lib/ruby/2.5.0/fileutils.rb:691:in `block in remove_entry'
NMS: |2023-07-25T05:39:37.384| ERROR /usr/lib/ruby/2.5.0/fileutils.rb:1386:in `ensure in postorder_traverse'
NMS: |2023-07-25T05:39:37.384| ERROR /usr/lib/ruby/2.5.0/fileutils.rb:1386:in `postorder_traverse'
NMS: |2023-07-25T05:39:37.384| ERROR /usr/lib/ruby/2.5.0/fileutils.rb:689:in `remove_entry'
NMS: |2023-07-25T05:39:37.384| ERROR /usr/lib/ruby/2.5.0/fileutils.rb:717:in `remove_dir'
NMS: |2023-07-25T05:39:37.384| ERROR /usr/local/lib/site_ruby/mgmt-api/alertmanager/rules/prometheus-alert-rules-updater.rb:123:in `stage_rules!'
NMS: |2023-07-25T05:39:37.384| ERROR /usr/local/lib/site_ruby/mgmt-api/alertmanager/rules/prometheus-alert-rules-updater.rb:28:in `block (2 levels) in update_alert_rules!'
NMS: |2023-07-25T05:39:37.384| ERROR /usr/local/lib/site_ruby/mgmt-api/alertmanager/rules/prometheus-alert-rules-updater.rb:26:in `synchronize'
NMS: |2023-07-25T05:39:37.384| ERROR /usr/local/lib/site_ruby/mgmt-api/alertmanager/rules/prometheus-alert-rules-updater.rb:26:in `block in update_alert_rules!'
NMS: |2023-07-25T05:39:37.384| ERROR /usr/local/lib/site_ruby/mgmt-api/tools/api-thread.rb:21:in `block in initialize'
- /var/local/log/nms.log shows java.net.ConnectException: Connection refused (Connection refused):
MI: |2023-07-25T05:51:36.885| NOTICE [DATA_STREAM_36] NMSClustersUtils.java:255: Failed to call /localhost/alert-notification-sender-update
java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:607)
at java.net.Socket.connect(Socket.java:556)
at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
at sun.net.www.http.HttpClient.New(HttpClient.java:339)
at sun.net.www.http.HttpClient.New(HttpClient.java:357)
at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1223)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1162)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1056)
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:990)
at sun.net.www.protocol.http.HttpURLConnection.getOutputStream0(HttpURLConnection.java:1337)
at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1312)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1521)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1495)
at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
at com.bycast.config.NMSClustersUtils.notifyMgmtApiOfAlertSenderChange(NMSClustersUtils.java:235)
at com.bycast.config.NMSClustersUtils.setSendingClusterId(NMSClustersUtils.java:211)
at com.bycast.clusters.ClustersUtils.getSendingClusterId(ClustersUtils.java:224)
at com.bycast.clusters.ClustersUtils.getEmailNotificationSendingClusterId(ClustersUtils.java:165)
at com.bycast.transactions.protocols.AttributeNotifyProtocol.saveAttributeData(AttributeNotifyProtocol.java:184)
at com.bycast.transactions.protocols.AttributeNotifyProtocol.processAttrNotify(AttributeNotifyProtocol.java:150)
at com.bycast.transactions.protocols.AddNodeProtocol.startProcessing(AddNodeProtocol.java:192)
at com.bycast.transactions.connectionagent.DataConnection.dataProcessing(DataConnection.java:140)
at com.bycast.transactions.connectionagent.DataConnection.run(DataConnection.java:55)
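
The log signatures listed above can be searched for in a copy of nms.log or bycast-err.log. The following is a minimal sketch, not a supported tool: the function name, the signature labels, and the regular expressions are assumptions derived only from the sample lines in this article, and real log lines may vary.

```python
import re

# Patterns derived from the sample log lines in this article (assumed, not
# an official list). Keys are hypothetical labels used only for reporting.
SIGNATURES = {
    "invalid_atom_label": re.compile(
        r'ERROR invalid Atom label "[^"]*" \(ArgumentError\)'
    ),
    "mi_connection_lost": re.compile(
        r"AddNodeProtocol\.java:\d+: Connection lost\."
    ),
    "prometheus_rules_not_empty": re.compile(
        r"Directory not empty @ dir_s_rmdir - /var/local/mgmt-api/prometheus-rules"
    ),
    "connection_refused": re.compile(
        r"java\.net\.ConnectException: Connection refused"
    ),
}


def find_signatures(lines):
    """Count how many lines match each known signature."""
    hits = {name: 0 for name in SIGNATURES}
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits[name] += 1
    return hits
```

To use it, read a log file into a list of lines (for example with `open(path, errors="replace")`) and pass that list to `find_signatures`. A non-zero count for several labels suggests the log resembles the symptoms described here, but it does not by itself confirm the cause.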