Frequent "Disconnected computer for node" messages in jenkins logs

Frequent "Disconnected computer for node" messages in jenkins logs

Vincent Massol
Hi guys.

Is anyone also having problems with "Disconnected computer for node" happening all the time, resulting in jenkins master killing agent nodes?

We're getting that all the time it seems for https://ci.xwiki.org. 

See https://up1.xwikisas.com/#vI0VAypIpe_tD9LrQRTdMA 

2020-02-14 09:10:59.040+0000 [id=268440] INFO c.n.j.p.d.DockerContainerWatchdog#loadNodeMap: We currently have 7 nodes assigned to this Jenkins instance, which we will check
2020-02-14 09:10:59.040+0000 [id=268440] INFO c.n.j.p.d.DockerContainerWatchdog#execute: Checking Docker Cloud docker-a3 at tcp://xxx
2020-02-14 09:10:59.047+0000 [id=268440] INFO c.n.j.p.d.DockerContainerWatchdog#execute: Checking Docker Cloud docker-a4 at tcp://xxx
2020-02-14 09:10:59.057+0000 [id=268440] INFO c.n.j.p.d.DockerContainerWatchdog#execute: Checking Docker Cloud docker-a5 at tcp://xxx
2020-02-14 09:10:59.083+0000 [id=268440] INFO c.n.j.p.d.DockerContainerWatchdog#execute: Checking Docker Cloud docker-a6 at tcp://xxx
2020-02-14 09:10:59.109+0000 [id=268440] INFO c.n.j.p.d.DockerContainerWatchdog#execute: Checking Docker Cloud docker-a7 at tcp://xxx
2020-02-14 09:10:59.132+0000 [id=268440] INFO c.n.j.p.d.DockerContainerWatchdog#execute: Docker Container Watchdog check has been completed
2020-02-14 09:10:59.133+0000 [id=268440] INFO hudson.model.AsyncPeriodicWork#lambda$doRun$0: Finished DockerContainerWatchdog Asynchronous Periodic Work. 93 ms
2020-02-14 09:13:37.434+0000 [id=268432] INFO i.j.docker.DockerTransientNode$1#println: Disconnected computer for node 'Jenkins SSH Slave a3-0094ebcvu7jkf'.
2020-02-14 09:13:37.434+0000 [id=268243] INFO h.r.SynchronousCommandTransport$ReaderThread#run: I/O error in channel Jenkins SSH Slave a3-0094ebcvu7jkf
java.net.SocketException: Socket closed
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
    at java.net.SocketInputStream.read(SocketInputStream.java:171)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    at io.jenkins.docker.client.DockerMultiplexedInputStream.readInternal(DockerMultiplexedInputStream.java:48)
    at io.jenkins.docker.client.DockerMultiplexedInputStream.read(DockerMultiplexedInputStream.java:30)
    at hudson.remoting.FlightRecorderInputStream.read(FlightRecorderInputStream.java:91)
    at hudson.remoting.ChunkedInputStream.readHeader(ChunkedInputStream.java:72)
    at hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.java:103)
    at hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTransport.java:39)
    at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
    at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)
2020-02-14 09:13:37.440+0000 [id=268432] INFO i.j.docker.DockerTransientNode$1#println: Removed Node for node 'Jenkins SSH Slave a3-0094ebcvu7jkf'.


Since it happens so frequently (roughly every 5 minutes), I'm wondering whether it's a problem or just the normal way Jenkins works: the agent finishes its job, the watchdog comes along, tries to connect to the agent, fails, and removes it.

Any idea?

Thx
-Vincent

--
You received this message because you are subscribed to the Google Groups "Jenkins Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/jenkinsci-users/a2a0d523-f242-4ca7-ad4f-943dc0be2e54%40googlegroups.com.

Re: Frequent "Disconnected computer for node" messages in jenkins logs

Victor Martinez
I've seen those stack traces with some other Cloud Node providers in Jenkins. 

I'm not sure whether that comes from Jenkins core, from the docker-plugin itself, or from some specific design choice.

Re: Frequent "Disconnected computer for node" messages in jenkins logs

Ivan Fernandez Calvo
The ping thread and some monitoring tasks run every 4 minutes. I think the disconnections happen before that, but because there is no activity on these agents they are not detected until the ping thread runs. So I guess you have half-closed connections, I mean, the agent closes the connection but the master does not receive the reset packet. If you are using SSH agents, you can enable verbose mode on the sshd server to monitor what the heck happens; see https://github.com/jenkinsci/ssh-slaves-plugin/blob/master/doc/TROUBLESHOOTING.md#common-info-needed-to-troubleshooting-a-bug

Re: Frequent "Disconnected computer for node" messages in jenkins logs

Vincent Massol
Thanks Ivan. We're not using SSH agents but Docker Cloud (the agents are provisioned on the fly as Docker containers).

I was indeed looking for how to turn on some debugging on the agent side but I couldn't find anything. Also the agent docker container is removed once the job is finished so it seems even harder to get some info about what's going on.

What I wanted to know is whether what we're experiencing is normal Jenkins behavior or not. I'm asking because a lot of our jobs run fine every day but we still have several that are killed in mid-air every day. For example, if I take agent 6 (a6) from https://up1.xwikisas.com/#vI0VAypIpe_tD9LrQRTdMA, I can see it was terminated on 2020-02-10 at:
* 4:44
* 5:06
* 5:24
* 7:45
* 10:06
* 10:24
* etc

Now I don't think we have that many job failures every day. It's more like 1 or 2 per day. So I'm not sure what to think of it. 
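
To put numbers on how often a6 is terminated, the gaps between the times listed above can be computed with a few lines of Python. This is only a sketch: the function name `gaps_in_minutes` is made up for illustration, and the list is the partial one from the message (it ends with "etc"), so the result says nothing about the rest of the day.

```python
from datetime import datetime

def gaps_in_minutes(times, fmt="%H:%M"):
    """Return the number of minutes between consecutive clock times."""
    parsed = [datetime.strptime(t, fmt) for t in times]
    return [int((later - earlier).total_seconds() // 60)
            for earlier, later in zip(parsed, parsed[1:])]

# Termination times of agent a6 on 2020-02-10, as listed in the message.
a6_terminations = ["4:44", "5:06", "5:24", "7:45", "10:06", "10:24"]

print(gaps_in_minutes(a6_terminations))  # [22, 18, 141, 141, 18]
```

On this sample the terminations on a single Docker host are 18 to 141 minutes apart, so the "every 5 minutes" impression presumably comes from adding up all hosts.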

I was trying to investigate why we see the following regularly (every day) in our CI job logs:

Cannot contact Jenkins SSH Slave a6-009448n7sqon4: java.lang.InterruptedException
Agent Jenkins SSH Slave a6-009448n7sqon4 was deleted; cancelling node body
Could not connect to Jenkins SSH Slave a6-009448n7sqon4 to send interrupt signal to process

And then I discovered what I've pasted at https://up1.xwikisas.com/#vI0VAypIpe_tD9LrQRTdMA by looking at the jenkins master log file and I went "wow, how come there are so many disconnections".
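
For anyone who wants to repeat this kind of check on their own master log, here is a minimal Python sketch that counts the "Disconnected computer for node" lines per Docker host. It assumes the log line format shown in the first message, and it assumes that the part of the node name before the dash (a3, a6, ...) identifies the Docker host; the function names are hypothetical.

```python
import re
from collections import Counter

# Matches the INFO line written when an agent is disconnected.
# Using search() tolerates pasted logs where a line number is glued
# in front of the timestamp.
DISCONNECT = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.\d+\+\d{4} .*"
    r"Disconnected computer for node '(?P<node>[^']+)'"
)

def disconnections(lines):
    """Yield (timestamp, node_name) for every disconnection line."""
    for line in lines:
        match = DISCONNECT.search(line)
        if match:
            yield match.group("ts"), match.group("node")

def count_per_host(lines):
    """Count disconnections per Docker host prefix (assumed: text before the dash)."""
    counts = Counter()
    for _ts, node in disconnections(lines):
        # 'Jenkins SSH Slave a3-0094ebcvu7jkf' -> 'a3-0094ebcvu7jkf' -> 'a3'
        host = node.rsplit(" ", 1)[-1].split("-", 1)[0]
        counts[host] += 1
    return counts
```

Usage would be something like `count_per_host(open("jenkins.log", encoding="utf-8"))`, where the file name is whatever your master log is called.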

Any idea is most welcome!

Thanks a lot
-Vincent


Re: Frequent "Disconnected computer for node" messages in jenkins logs

Ivan Fernandez Calvo
After seeing the log, I understand you are asking about the INFO messages reporting that a Docker agent is disconnected. IIRC those messages are normal; they only report the Docker agent status. You can change the log level of the Java package in the log configuration to omit those messages if they bother you.
About the other message, the InterruptedException: this looks like an issue, but there is not much info to troubleshoot it. You have to monitor those errors and try to find something in common: always the same job, the same Docker image, the same resources, ... The most common cause is a resource problem; in those cases the container is killed because of an OOM error. You can check whether this is the case if you can run a docker inspect on the container.
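
As an illustration of the docker inspect check, the JSON printed by `docker inspect <container>` contains a `State` object with `OOMKilled` and `ExitCode` fields. The Python sketch below reads that JSON and reports both per container. The function name `oom_report` is hypothetical, and the sketch assumes the container still exists when the command is run, which may not hold when agents are removed right after the job.

```python
import json

def oom_report(inspect_json):
    """Map container name -> (oom_killed, exit_code) from `docker inspect` output.

    `inspect_json` is the JSON text printed by `docker inspect`, which is a
    list with one object per inspected container.
    """
    report = {}
    for container in json.loads(inspect_json):
        state = container.get("State", {})
        name = container.get("Name") or container.get("Id", "unknown")
        report[name] = (bool(state.get("OOMKilled", False)), state.get("ExitCode"))
    return report
```

A container for which this reports `True` was killed by the kernel OOM killer; one way to feed the function is to redirect the output of `docker inspect` to a file and pass the file contents in.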

Re: Frequent "Disconnected computer for node" messages in jenkins logs

Vincent Massol
> After seeing the log, I understand you are asking about the INFO messages reporting that a Docker agent is disconnected. IIRC those messages are normal; they only report the Docker agent status.

Thanks for your reply. Let me make sure I understand. So the Docker Cloud plugin spawns new Jenkins Docker agents. It stops the agents through the DockerContainerWatchdog thread, which regularly tries to connect to each agent and removes the agent when the connection fails. This is what happened in the following example:

2020-02-14 09:13:37.434+0000 [id=268432] INFO i.j.docker.DockerTransientNode$1#println: Disconnected computer for node 'Jenkins SSH Slave a3-0094ebcvu7jkf'.
2020-02-14 09:13:37.434+0000 [id=268243] INFO h.r.SynchronousCommandTransport$ReaderThread#run: I/O error in channel Jenkins SSH Slave a3-0094ebcvu7jkf
java.net.SocketException: Socket closed
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
    at java.net.SocketInputStream.read(SocketInputStream.java:171)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    at io.jenkins.docker.client.DockerMultiplexedInputStream.readInternal(DockerMultiplexedInputStream.java:48)
    at io.jenkins.docker.client.DockerMultiplexedInputStream.read(DockerMultiplexedInputStream.java:30)
    at hudson.remoting.FlightRecorderInputStream.read(FlightRecorderInputStream.java:91)
    at hudson.remoting.ChunkedInputStream.readHeader(ChunkedInputStream.java:72)
    at hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.java:103)
    at hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTransport.java:39)
    at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
    at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)
2020-02-14 09:13:37.440+0000 [id=268432] INFO i.j.docker.DockerTransientNode$1#println: Removed Node for node 'Jenkins SSH Slave a3-0094ebcvu7jkf'.

So it means that whether the agent finishes its work or there's a connection issue between the Jenkins master and the agent, it'll be reported the same way in the jenkins.log file. Basically it's the same mechanism for stopping an agent that has finished its work and for handling a connection error. In both cases it'll be reported as INFO in the logs too. Right?

> you can change the verbose level of the Java package on logs configuration to omit those type of messages if they bother you

Indeed that could be interesting. Now it means we would also not be able to see the real communication errors between master and agents I guess.

Thanks a lot for your help. If you could confirm this it would be great; I'd be able to move forward and move to the next problems (we have plenty of intermittent errors to figure out ;)).

Have a great weekend
-Vincent

Re: Frequent "Disconnected computer for node" messages in jenkins logs

Ivan Fernandez Calvo


> On 16 Feb 2020, at 14:15, Vincent Massol <[hidden email]> wrote:
>
> In both cases it'll be reported as INFO in the logs too. Right?

It seems so. I didn't notice that the exception is also an INFO message, so the only difference is the stack trace of the exception; the log level is the same.

Re: Frequent "Disconnected computer for node" messages in jenkins logs

Vincent Massol
Hi Ivan,
 
> so the only difference is the stack trace of the exception, the log level is the same.

Is it possible that you misunderstood the data at  https://up1.xwikisas.com/#vI0VAypIpe_tD9LrQRTdMA ? :)

As mentioned there, ALL of the lines are the same as the ones at the top (i.e. they all have a stack trace). I just didn't include the full lines for the sake of space ;)

This is why I've been asking from the beginning if it's normal :) Usually when there's a stack trace it's not really normal. But it happens so frequently that the only explanation I can think of is that it's the normal behavior of Jenkins.

WDYT?

Thanks again!
-Vincent

Re: Frequent "Disconnected computer for node" messages in jenkins logs

Ivan Fernandez Calvo


On Sun, 16 Feb 2020 at 14:48, Vincent Massol <[hidden email]> wrote:
> This is why I've been asking from the beginning if it's normal :) Usually when there's a stack trace it's not really normal. But it happens so frequently that the only explanation I can think of is that it's the normal behavior of Jenkins.

So you see exceptions on every disconnection. I've never seen this behavior with the Docker plugin; I've seen the disconnection messages, but without an exception.