2025-11-05 02:03:58.342 WARN [com.alibaba.nacos.client.naming.grpc.redo.0:c.a.n.c.naming] Grpc Connection is disconnect, skip current redo task
2025-11-05 02:04:00.638 INFO [com.alibaba.nacos.client.remote.worker:c.a.n.c.naming] Grpc connection connect
Exception ...
2025-11-05 02:17:24.549 WARN [com.alibaba.nacos.client.remote.worker:c.a.n.c.naming] Grpc connection disconnect, mark to redo
2025-11-05 02:17:24.550 WARN [com.alibaba.nacos.client.remote.worker:c.a.n.c.naming] mark to redo completed
2025-11-05 02:17:24.550 INFO [com.alibaba.nacos.client.remote.worker:c.a.n.c.naming] Grpc connection connect
2025-11-05 02:17:26.354 INFO [com.alibaba.nacos.client.naming.grpc.redo.0:c.a.n.c.naming] Redo instance operation REGISTER for xxxx
2025-11-05 02:17:26.362 INFO [com.alibaba.nacos.client.naming.grpc.redo.0:c.a.n.c.naming] Redo subscriber operation REGISTER for xxxx
Exception ...
2025-11-05 02:50:04.844 WARN [com.alibaba.nacos.client.remote.worker:c.a.n.c.naming] Grpc connection disconnect, mark to redo
2025-11-05 02:50:04.845 WARN [com.alibaba.nacos.client.remote.worker:c.a.n.c.naming] mark to redo completed
2025-11-05 02:50:04.845 INFO [com.alibaba.nacos.client.remote.worker:c.a.n.c.naming] Grpc connection connect
// Note: compared with the logs above, the "Redo instance" and "Redo subscriber" entries are missing here.
2025-11-05 02:50:08.656 INFO [com.alibaba.nacos.client.naming.updater.0:c.a.n.c.naming] removed ips(1) service: xxxxx
2025-11-05 02:50:08.657 INFO [com.alibaba.nacos.client.naming.updater.0:c.a.n.c.naming] current ips:(0) service: xxxxx -> []
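This reminded me that Nacos (2.0.4) has a mechanism for re-registering services after a dropped connection is re-established, so I suspected that this mechanism had failed. I located the relevant source code: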
NamingGrpcRedoService
public NamingGrpcRedoService(NamingGrpcClientProxy clientProxy) {
    this.redoExecutor = new ScheduledThreadPoolExecutor(REDO_THREAD, new NameThreadFactory(REDO_THREAD_NAME));
    this.redoExecutor.scheduleWithFixedDelay(new RedoScheduledTask(clientProxy, this), DEFAULT_REDO_DELAY,
            DEFAULT_REDO_DELAY, TimeUnit.MILLISECONDS);
}
RedoScheduledTask
@Override
public void run() {
    if (!redoService.isConnected()) {
        LogUtils.NAMING_LOGGER.warn("Grpc Connection is disconnect, skip current redo task");
        return;
    }
    try {
        redoForInstances();
        redoForSubscribes();
    } catch (Exception e) {
        LogUtils.NAMING_LOGGER.warn("Redo task run with unexpected exception: ", e);
    }
}
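For context on how this kind of scheduling behaves: a task submitted with scheduleWithFixedDelay is re-queued only after a run returns normally. If a run throws, the JDK suppresses all later executions without logging anything, and the worker thread simply parks on the now-empty queue. Note that run() above catches Exception, so an Error (a Throwable that is not an Exception) would still escape it. The following is a minimal sketch of that behavior, not Nacos code: FakeRedoTask and the simulated NoClassDefFoundError are hypothetical stand-ins, and nothing in the logs shown here confirms that this is what happened in the incident.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SuppressedRedoTaskDemo {

    /** Hypothetical stand-in for RedoScheduledTask: catches Exception, but not Error. */
    static final class FakeRedoTask implements Runnable {
        final AtomicInteger runs = new AtomicInteger();

        @Override
        public void run() {
            try {
                if (runs.incrementAndGet() == 2) {
                    // Simulated failure (an assumption for this demo only).
                    throw new NoClassDefFoundError("simulated");
                }
            } catch (Exception e) {
                // Same shape as the catch block above: an Error is not an
                // Exception, so it is not handled here and escapes run().
                System.out.println("caught: " + e);
            }
        }
    }

    /** Runs the task until it fails; returns {run count, tasks left in the queue}. */
    static int[] runUntilSuppressed() throws Exception {
        ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1);
        FakeRedoTask task = new FakeRedoTask();
        try {
            ScheduledFuture<?> future =
                    executor.scheduleWithFixedDelay(task, 10, 10, TimeUnit.MILLISECONDS);
            try {
                // A healthy periodic task never completes, so get() only
                // returns control here because the task has failed.
                future.get(5, TimeUnit.SECONDS);
            } catch (ExecutionException e) {
                System.out.println("periodic task terminated by: " + e.getCause());
            }
            Thread.sleep(200); // leave time for further runs, if there were any
            return new int[] {task.runs.get(), executor.getQueue().size()};
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        int[] result = runUntilSuppressed();
        System.out.println("runs=" + result[0] + ", queued=" + result[1]);
    }
}
```

Running main should report two runs and an empty queue: the second run throws, the failure is stored in the returned future rather than logged, and the task is never scheduled again.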
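This is a scheduled task, so it should run periodically. Why would it stop executing? My first thought was to look at a thread dump, which showed the following: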
"com.alibaba.nacos.client.naming.grpc.redo.0" #42 daemon prio=5 os_prio=0 cpu=50.83ms elapsed=55226.59s tid=0x0000ffffa9df7000 nid=0x2e waiting on condition [0x0000ffff1f1fd000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00000000f7c9f8d0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2057)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1081)
    at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
    at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)