Jenkins Agents getting disconnected


Jenkins Agents getting disconnected

Sverre Moe-2
Lately we have experienced disconnected Agents.
Running Jenkins LTS 2.150.1
Java 8u181. Same for both Jenkins server and all build agents.

Looking at the log it shows this:

ERROR: [07/04/19 14:47:18] [SSH] Error deleting file. 
java.util.concurrent.TimeoutException 
at java.util.concurrent.FutureTask.get(FutureTask.java:205) 
at hudson.plugins.sshslaves.SSHLauncher.tearDownConnectionImpl(SSHLauncher.java:989) 
at hudson.plugins.sshslaves.SSHLauncher.tearDownConnection(SSHLauncher.java:930) 
at hudson.plugins.sshslaves.SSHLauncher.afterDisconnect(SSHLauncher.java:925) 
at hudson.slaves.SlaveComputer$3.run(SlaveComputer.java:738) 
at jenkins.util.ContextResettingExecutorService$1.run(ContextResettingExecutorService.java:28) 
at jenkins.security.ImpersonatingExecutorService$1.run(ImpersonatingExecutorService.java:59) 
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Relaunching the agent does not work. It just hangs.

I have no problem connecting to the agent server over SSH from the Jenkins server.

The only thing that works is restarting Jenkins. We have to do this several times per day now.

--
You received this message because you are subscribed to the Google Groups "Jenkins Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/jenkinsci-users/c9d2b342-be51-4f7f-82ad-c4098749fe4f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Re: Jenkins Agents getting disconnected

Karan Kaushik
Hi

We faced the same issue with a Jenkins agent. One thing I remember doing was freeing up disk space on the agent; the disconnect can happen when there is no space left on the agent machine.


Re: Jenkins Agents getting disconnected

Sverre Moe-2
The build agents that get disconnected have plenty of available disk space.

While they are trying to connect, there is no remoting.jar Java process running on the agent.

In reply to Karan Kaushik's message of Saturday, 6 July 2019, 22:59:31 UTC+2.

Re: Jenkins Agents getting disconnected

Sverre Moe-2
Strange.
If I open the agent's configuration, save it, and then try to reconnect, it is able to create a connection and comes back online.

In reply to Sverre Moe's message of Tuesday, 9 July 2019, 13:20:55 UTC+2.

Re: Jenkins Agents getting disconnected

Sverre Moe-2
I don't actually have to change anything: I just open Configure, click Save, then Relaunch Agent.

In reply to Sverre Moe's message of Friday, 12 July 2019, 13:30:05 UTC+2.

Re: Jenkins Agents getting disconnected

Sverre Moe-2
Also, when this happens, no build can run on the agent, even after I have managed to relaunch it.
A new build stops at "Waiting for next available executor on ‘node-name’", even though the agent is online.
The previous build, which I stopped, is still on the executor. The only solution is to restart Jenkins.

In reply to Sverre Moe's message of Friday, 12 July 2019, 14:23:24 UTC+2.

Re: Jenkins Agents getting disconnected

Ivan Fernandez Calvo
Hi,

You do not need to save the configuration to force the disconnection; you can use the disconnection REST call URL instead. See https://github.com/jenkinsci/ssh-slaves-plugin/blob/master/doc/TROUBLESHOOTING.md#force-disconnection

About the disconnection error: this trace is the last error logged after the disconnection, but it is not the cause. Before it there should be another error, and that one is what causes the disconnection. The error you show occurs because there is no connection to the agent, so it is not possible to remove the remoting.jar file. Try to gather the information I need to troubleshoot this kind of issue; see https://github.com/jenkinsci/ssh-slaves-plugin/blob/master/doc/TROUBLESHOOTING.md#common-info-needed-to-troubleshooting-a-bug


Re: Jenkins Agents getting disconnected

Ivan Fernandez Calvo
I saw that you have another question about OOM errors in Jenkins. If it is the same instance, that is your real issue with the agents: until you have a stable Jenkins instance, the agent disconnections will be a side effect.


Re: Jenkins Agents getting disconnected

Sverre Moe-2
I suspected it might be related, but was not sure. 

The odd thing is that this only started being a problem a week ago. As far as I can see, nothing has changed on the Jenkins server.

In reply to Ivan Fernandez Calvo's message of Saturday, 13 July 2019, 13:04:44 UTC+2.

Re: Jenkins Agents getting disconnected

Sverre Moe-2
We have had two blissful days of a stable Jenkins. Today two nodes are disconnected and they will not come back online.

What is strange is that it is the same two or three nodes every time.
Running a disconnect on them through the URL http://jenkins.example.com/jenkins/computer/NODE_NAME/disconnect does not work.
I have to open the configuration, click Save, then relaunch to get them up and running.

I tried setting the ulimit values as suggested in
https://support.cloudbees.com/hc/en-us/articles/222446987-Prepare-Jenkins-for-Support#bulimitsettingsjustforlinuxos

I have also added additional JVM options as suggested in
https://support.cloudbees.com/hc/en-us/articles/222446987-Prepare-Jenkins-for-Support#ajavaparameters
https://go.cloudbees.com/docs/solutions/jvm-troubleshooting/

The Jenkins server currently has 265 threads. Yesterday, when all was fine, the count went up to 300.


Maybe related, maybe not:
When this happens, some builds on other nodes stop working. They are aborted, but still show as running. The only thing that works is deleting the agent and creating it again, or restarting Jenkins.


In reply to Sverre Moe's message of Sunday, 14 July 2019, 13:31:51 UTC+2.

Re: Jenkins Agents getting disconnected

Sverre Moe-2
It seems to be the monitoring that gets the agents disconnected.

I got this in my log file the last time they were disconnected.

Jul 17, 2019 11:58:22 AM hudson.init.impl.InstallUncaughtExceptionHandler$DefaultUncaughtExceptionHandler uncaughtException
SEVERE: A thread (Timer-3450/103166) died unexpectedly due to an uncaught exception, this may leave your Jenkins in a bad way and is usually indicative of a bug in the code.
java.lang.OutOfMemoryError: unable to create new native thread
       at java.lang.Thread.start0(Native Method)
       at java.lang.Thread.start(Thread.java:717)
       at java.util.Timer.<init>(Timer.java:160)
       at java.util.Timer.<init>(Timer.java:132)
       at org.jenkinsci.plugins.ssegateway.sse.EventDispatcher.scheduleRetryQueueProcessing(EventDispatcher.java:296)
       at org.jenkinsci.plugins.ssegateway.sse.EventDispatcher.processRetries(EventDispatcher.java:437)
       at org.jenkinsci.plugins.ssegateway.sse.EventDispatcher$1.run(EventDispatcher.java:299)
       at java.util.TimerThread.mainLoop(Timer.java:555)
       at java.util.TimerThread.run(Timer.java:505)

Jul 17, 2019 11:58:31 AM hudson.init.impl.InstallUncaughtExceptionHandler$DefaultUncaughtExceptionHandler uncaughtException
SEVERE: A thread (Thread-30062/98187) died unexpectedly due to an uncaught exception, this may leave your Jenkins in a bad way and is usually indicative of a bug in the code.
java.lang.OutOfMemoryError: unable to create new native thread
       at java.lang.Thread.start0(Native Method)
       at java.lang.Thread.start(Thread.java:717)
       at com.trilead.ssh2.transport.TransportManager.sendAsynchronousMessage(TransportManager.java:649)
       at com.trilead.ssh2.channel.ChannelManager.msgChannelRequest(ChannelManager.java:1213)
       at com.trilead.ssh2.channel.ChannelManager.handleMessage(ChannelManager.java:1466)
       at com.trilead.ssh2.transport.TransportManager.receiveLoop(TransportManager.java:809)
       at com.trilead.ssh2.transport.TransportManager$1.run(TransportManager.java:502)
       at java.lang.Thread.run(Thread.java:748)
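
Both traces above fail at the same point, Thread.start0, but are raised from different callers. To see which callers run into the limit most often, a log can be scanned for this error and grouped by the first stack frame outside the JDK. The Python sketch below assumes the log format shown above; the function name oom_call_sites and the list of JDK package prefixes are illustrative choices, not part of Jenkins. Keep in mind that the caller that fails to start a thread is not necessarily the component that used up the threads.

```python
import re
from collections import Counter

# Exact message text as it appears in the log excerpts above.
OOM_MARKER = "java.lang.OutOfMemoryError: unable to create new native thread"
# Matches "at some.package.Class.method(" and captures the qualified method name.
FRAME = re.compile(r"^\s*at\s+([\w.$<>]+)\(")
# Assumption: frames under these prefixes are treated as JDK internals.
JDK_PREFIXES = ("java.", "javax.", "sun.", "jdk.")


def oom_call_sites(log_text):
    """Count thread-creation failures per first non-JDK stack frame."""
    counts = Counter()
    lines = log_text.splitlines()
    for i, line in enumerate(lines):
        if OOM_MARKER not in line:
            continue
        site = "(JDK frames only)"
        for frame_line in lines[i + 1:]:
            match = FRAME.match(frame_line)
            if not match:
                break  # first non-frame line ends this stack trace
            if not match.group(1).startswith(JDK_PREFIXES):
                site = match.group(1)
                break
        counts[site] += 1
    return counts
```

Calling oom_call_sites(open("jenkins.log").read()).most_common(5) would list the five most frequent call sites; the file name jenkins.log is only a placeholder for wherever the instance writes its log.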


Now I have a catastrophic failure: I cannot relaunch any agents anymore.

[07/17/19 12:04:10] [SSH] Opening SSH connection to jbssles120x64r12.spacetec.no:22.
ERROR: Unexpected error in launching a agent. This is probably a bug in Jenkins.
java.lang.OutOfMemoryError: unable to create new native thread
	at java.lang.Thread.start0(Native Method)
	at java.lang.Thread.start(Thread.java:717)
	at com.trilead.ssh2.transport.TransportManager.initialize(TransportManager.java:545)
	at com.trilead.ssh2.Connection.connect(Connection.java:774)
	at hudson.plugins.sshslaves.SSHLauncher.openConnection(SSHLauncher.java:817)
	at hudson.plugins.sshslaves.SSHLauncher$1.call(SSHLauncher.java:419)
	at hudson.plugins.sshslaves.SSHLauncher$1.call(SSHLauncher.java:406)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
[07/17/19 12:04:10] Launch failed - cleaning up connection
[07/17/19 12:04:10] [SSH] Connection closed.

My Jenkins server has over 500 threads open:
Threads: 506 total,   0 running, 506 sleeping,   0 stopped,   0 zombie
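
"unable to create new native thread" generally means the operating system refused to create another thread, rather than the Java heap being full. On Linux, one limit worth comparing against is the per-user process limit (ulimit -u), which also counts threads. The Python sketch below reads the thread count and that limit for one process from /proc; it is a rough check only, since the limit applies to all processes of the user together and other limits (for example kernel-wide or cgroup limits) may be the ones actually hit. The function names are illustrative.

```python
def thread_count(status_text):
    """Return the 'Threads:' value from the contents of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split()[1])
    raise ValueError("no 'Threads:' line found")


def soft_max_processes(limits_text):
    """Return the soft 'Max processes' limit from /proc/<pid>/limits, or None if unlimited."""
    for line in limits_text.splitlines():
        if line.startswith("Max processes"):
            soft = line.split()[2]  # columns: Max, processes, soft, hard, units
            return None if soft == "unlimited" else int(soft)
    raise ValueError("no 'Max processes' line found")


def report(pid):
    """Read both files for one process and summarise them (Linux only)."""
    with open(f"/proc/{pid}/status") as fh:
        threads = thread_count(fh.read())
    with open(f"/proc/{pid}/limits") as fh:
        limit = soft_max_processes(fh.read())
    shown = limit if limit is not None else "unlimited"
    return f"pid {pid}: {threads} threads, soft max processes = {shown}"
```

Running print(report(<pid of the Jenkins JVM>)) as the same user at intervals would show whether the thread count keeps climbing toward the limit between restarts.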


In reply to Sverre Moe's message of Wednesday, 17 July 2019, 10:24:12 UTC+2.

Re: Jenkins Agents getting disconnected

Sverre Moe-2
I ran jstack on Jenkins, and many of the threads had state BLOCKED.
However, even right after a restart most of the threads are BLOCKED, so I am not sure whether that is the issue here.

After a restart, Jenkins starts with approximately 200 threads open.
When I had the problem with disconnected agents, the thread count had reached 500.

In reply to Sverre Moe's message of Wednesday, 17 July 2019, 12:40:14 UTC+2.

Re: Jenkins Agents getting disconnected

Ivan Fernandez Calvo
Those BLOCKED threads should be related to some plugin or class. Look at the stack traces in the thread dump to try to figure out which one it is; that is likely the root cause of your problem.


Re: Jenkins Agents getting disconnected

Sverre Moe-2
I cannot see any specific plugins in the stack traces.
There are several duplicate threads. Here are some of them.
The most common denominator seems to be SSH.

Thread 29360: (state = BLOCKED)
- java.lang.Object.wait(long) @bci=0 (Compiled frame; information may be imprecise)
- java.lang.Object.wait() @bci=2, line=502 (Compiled frame)
- java.util.TimerThread.mainLoop() @bci=28, line=526 (Compiled frame)
- java.util.TimerThread.run() @bci=1, line=505 (Compiled frame)

Thread 29339: (state = BLOCKED)
- hudson.plugins.sshslaves.SSHLauncher.launch(hudson.slaves.SlaveComputer, hudson.model.TaskListener) @bci=25, line=401 (Compiled frame)
- hudson.slaves.SlaveComputer$1.call() @bci=88, line=294 (Compiled frame)
- jenkins.util.ContextResettingExecutorService$2.call() @bci=18, line=46 (Compiled frame)
- jenkins.security.ImpersonatingExecutorService$2.call() @bci=17, line=71 (Compiled frame)
- java.util.concurrent.FutureTask.run() @bci=42, line=266 (Compiled frame)
- java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=95, line=1149 (Compiled frame)
- java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=624 (Compiled frame)
- java.lang.Thread.run() @bci=11, line=748 (Compiled frame)

Thread 29122: (state = BLOCKED)
- java.lang.Object.wait(long) @bci=0 (Compiled frame; information may be imprecise)
- java.lang.Object.wait() @bci=2, line=502 (Compiled frame)
- com.trilead.ssh2.channel.ChannelManager.waitUntilChannelOpen(com.trilead.ssh2.channel.Channel) @bci=13, line=110 (Compiled frame)
- com.trilead.ssh2.channel.ChannelManager.openSessionChannel() @bci=109, line=574 (Compiled frame)
- com.trilead.ssh2.Session.<init>(com.trilead.ssh2.channel.ChannelManager, java.security.SecureRandom) @bci=36, line=42 (Compiled frame)
- com.trilead.ssh2.Connection.openSession() @bci=46, line=1145 (Compiled frame)
- com.trilead.ssh2.Connection.exec(java.lang.String, java.io.OutputStream) @bci=1, line=1566 (Compiled frame)
- hudson.plugins.sshslaves.SSHLauncher$3.run() @bci=79, line=969 (Compiled frame)
- jenkins.util.ContextResettingExecutorService$1.run() @bci=18, line=28 (Compiled frame)
- jenkins.security.ImpersonatingExecutorService$1.run() @bci=17, line=59 (Compiled frame)
- java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511 (Compiled frame)
- java.util.concurrent.FutureTask.run() @bci=42, line=266 (Compiled frame)
- java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=95, line=1149 (Compiled frame)
- java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=624 (Compiled frame)
- java.lang.Thread.run() @bci=11, line=748 (Compiled frame)

Thread 28586: (state = BLOCKED)
- sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may be imprecise)
- java.util.concurrent.locks.LockSupport.parkNanos(java.lang.Object, long) @bci=20, line=215 (Compiled frame)
- java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(long, java.util.concurrent.TimeUnit) @bci=97, line=2163 (Compiled frame)
- org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.reservedWait() @bci=97, line=292 (Compiled frame)
- org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run() @bci=188, line=357 (Compiled frame)
- org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(java.lang.Runnable) @bci=1, line=765 (Compiled frame)
- org.eclipse.jetty.util.thread.QueuedThreadPool$2.run() @bci=104, line=683 (Compiled frame)
- java.lang.Thread.run() @bci=11, line=748 (Compiled frame)

Thread 28324: (state = BLOCKED)
- java.lang.Thread.sleep(long) @bci=0 (Compiled frame; information may be imprecise)
- hudson.remoting.PingThread.run() @bci=38, line=95 (Compiled frame)

Thread 27552: (state = BLOCKED)
- com.trilead.ssh2.Connection.close() @bci=0, line=573 (Compiled frame)
- hudson.plugins.sshslaves.SSHLauncher.cleanupConnection(hudson.model.TaskListener) @bci=11, line=511 (Compiled frame)
- hudson.plugins.sshslaves.SSHLauncher.tearDownConnectionImpl(hudson.slaves.SlaveComputer, hudson.model.TaskListener) @bci=345, line=1006 (Compiled frame)
- hudson.plugins.sshslaves.SSHLauncher.tearDownConnection(hudson.slaves.SlaveComputer, hudson.model.TaskListener) @bci=10, line=930 (Compiled frame)
- hudson.plugins.sshslaves.SSHLauncher.afterDisconnect(hudson.slaves.SlaveComputer, hudson.model.TaskListener) @bci=50, line=925 (Compiled frame)
- hudson.slaves.SlaveComputer$3.run() @bci=46, line=738 (Compiled frame)
- jenkins.util.ContextResettingExecutorService$1.run() @bci=18, line=28 (Compiled frame)
- jenkins.security.ImpersonatingExecutorService$1.run() @bci=17, line=59 (Compiled frame)
- java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511 (Compiled frame)
- java.util.concurrent.FutureTask.run() @bci=42, line=266 (Compiled frame)
- java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=95, line=1149 (Compiled frame)
- java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=624 (Compiled frame)
- java.lang.Thread.run() @bci=11, line=748 (Compiled frame)

Thread 16047: (state = BLOCKED)
- java.lang.Object.wait(long) @bci=0 (Compiled frame; information may be imprecise)
- hudson.remoting.Request$1.get(long, java.util.concurrent.TimeUnit) @bci=109, line=312 (Compiled frame)
- hudson.remoting.Request$1.get(long, java.util.concurrent.TimeUnit) @bci=3, line=240 (Compiled frame)
- hudson.remoting.FutureAdapter.get(long, java.util.concurrent.TimeUnit) @bci=7, line=59 (Compiled frame)
- net.bull.javamelody.RemoteCallHelper.collectDataByNodeName(hudson.remoting.Callable) @bci=242, line=205 (Compiled frame)
- net.bull.javamelody.RemoteCallHelper.collectJavaInformationsListByName() @bci=4, line=217 (Compiled frame)
- net.bull.javamelody.NodesCollector.collectWithoutErrorsNow() @bci=9, line=159 (Compiled frame)
- net.bull.javamelody.NodesCollector.collectWithoutErrors() @bci=9, line=147 (Compiled frame)
- net.bull.javamelody.NodesCollector$1.run() @bci=4, line=91 (Compiled frame)
- java.util.TimerThread.mainLoop() @bci=221, line=555 (Compiled frame)
- java.util.TimerThread.run() @bci=1, line=505 (Interpreted frame)
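
For a dump in this layout, a quick way to see what the threads have in common is to group them by reported state and by the first frame outside the JDK. The Python sketch below assumes the "Thread N: (state = X)" header and "- frame(...)" lines shown above; the helper names are made up for this example, and the embedded sample is a shortened copy of three of the threads above. Note that threads sitting in Thread.sleep or Object.wait also show up as BLOCKED in this output, so the state alone says little.

```python
import re
from collections import Counter

# Layout assumed from the dump above: "Thread <id>: (state = <STATE>)"
HEADER = re.compile(r"^Thread (\d+): \(state = (\w+)\)")
# Frame lines look like "- pkg.Class.method(args) @bci=..., line=..."
FRAME = re.compile(r"^\s*-\s+([\w.$<>]+)\(")

JDK_PREFIXES = ("java.", "javax.", "sun.", "jdk.")

SAMPLE = """\
Thread 29339: (state = BLOCKED)
- hudson.plugins.sshslaves.SSHLauncher.launch(hudson.slaves.SlaveComputer, hudson.model.TaskListener) @bci=25, line=401 (Compiled frame)
- hudson.slaves.SlaveComputer$1.call() @bci=88, line=294 (Compiled frame)

Thread 28324: (state = BLOCKED)
- java.lang.Thread.sleep(long) @bci=0 (Compiled frame; information may be imprecise)
- hudson.remoting.PingThread.run() @bci=38, line=95 (Compiled frame)

Thread 29360: (state = BLOCKED)
- java.lang.Object.wait(long) @bci=0 (Compiled frame; information may be imprecise)
- java.util.TimerThread.mainLoop() @bci=28, line=526 (Compiled frame)
"""


def parse_dump(text):
    """Return a list of (thread_id, state, [frame names]) tuples."""
    threads = []
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            threads.append((header.group(1), header.group(2), []))
            continue
        frame = FRAME.match(line)
        if frame and threads:
            threads[-1][2].append(frame.group(1))
    return threads


def first_app_frame(frames):
    """First frame outside the JDK, or the top frame if every frame is JDK code."""
    for name in frames:
        if not name.startswith(JDK_PREFIXES):
            return name
    return frames[0] if frames else "<no frames>"


def summarise(text):
    """Count threads per (state, first non-JDK frame)."""
    return Counter(
        (state, first_app_frame(frames)) for _, state, frames in parse_dump(text)
    )


if __name__ == "__main__":
    for (state, frame), count in summarise(SAMPLE).most_common():
        print(f"{count:4d}  {state:10s}  {frame}")
```

To run it on a full dump, read the file into a string and pass it to summarise() instead of SAMPLE. Large counts on the same SSH-related frame would support the impression that SSH is the common denominator.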


Re: Jenkins Agents getting disconnected

Ivan Fernandez Calvo
In that dump I cannot see which thread is blocking the others. The jstack output normally includes, on each thread, a reference to the lock involved (for example "- locked <0x00000000> (a java.lang.Object)"), which shows which thread is the blocker. You can try analyzing the thread dump with https://fastthread.io/index.jsp or other online tools to see if anything relevant shows up; it looks like there is a deadlock.
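
To illustrate the kind of reference meant here: in a regular jstack dump, a thread that holds a monitor has a "- locked <address>" line, and a thread stuck trying to enter it has a "- waiting to lock <address>" line with the same address. The rough Python sketch below pairs them up. It assumes that text layout; the function name, thread names and addresses in the sample are made up for the example and do not come from the dump in this thread.

```python
import re
from collections import defaultdict

NAME = re.compile(r'^"([^"]+)"')
LOCKED = re.compile(r"^\s*- locked <(0x[0-9a-f]+)>")
WAITING_TO_LOCK = re.compile(r"^\s*- waiting to lock <(0x[0-9a-f]+)>")
WAITING_ON = re.compile(r"^\s*- waiting on <(0x[0-9a-f]+)>")

# Hypothetical sample in the regular jstack layout (not taken from this thread).
SAMPLE = """\
"worker-1" #11 prio=5 os_prio=0 tid=0x00007f0000000001 nid=0x1 runnable [0x00007f0000001000]
   java.lang.Thread.State: RUNNABLE
    at java.net.SocketInputStream.socketRead0(Native Method)
    - locked <0x00000000c0000010> (a java.lang.Object)

"worker-2" #12 prio=5 os_prio=0 tid=0x00007f0000000002 nid=0x2 waiting for monitor entry [0x00007f0000002000]
   java.lang.Thread.State: BLOCKED (on object monitor)
    at com.example.Launcher.close(Launcher.java:42)
    - waiting to lock <0x00000000c0000010> (a java.lang.Object)

"worker-3" #13 prio=5 os_prio=0 tid=0x00007f0000000003 nid=0x3 in Object.wait() [0x00007f0000003000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on <0x00000000c0000020> (a java.lang.Object)
    - locked <0x00000000c0000020> (a java.lang.Object)
"""


def find_blockers(dump):
    """Map each blocked thread name to the thread holding the monitor it wants."""
    owner = {}                    # monitor address -> thread holding it
    waiters = defaultdict(list)   # monitor address -> threads waiting to enter
    current = None
    waited_on = set()             # monitors the current thread released via wait()
    for line in dump.splitlines():
        name = NAME.match(line)
        if name:
            current = name.group(1)
            waited_on = set()
            continue
        if current is None:
            continue
        wait = WAITING_ON.match(line)
        if wait:
            waited_on.add(wait.group(1))
            continue
        held = LOCKED.match(line)
        if held:
            # A thread inside wait() still lists the monitor as locked
            # although it has released it, so skip those entries.
            if held.group(1) not in waited_on:
                owner[held.group(1)] = current
            continue
        wanted = WAITING_TO_LOCK.match(line)
        if wanted:
            waiters[wanted.group(1)].append(current)
    return {
        thread: owner.get(address, "<unknown owner>")
        for address, threads in waiters.items()
        for thread in threads
    }


if __name__ == "__main__":
    for blocked, blocker in find_blockers(SAMPLE).items():
        print(f"{blocked} is waiting for a monitor held by {blocker}")
```

If a dump contains no "- locked" or "- waiting to lock" lines at all, the result is simply empty and this approach cannot tell who blocks whom.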

Re: Jenkins Agents getting disconnected

Sverre Moe-2
There is no such reference in my jstack output.
The output says no deadlock detected.
I will try that site to analyze the jstack output.

Even a normally running Jenkins has many BLOCKED threads. I don't know whether that is normal.

We have a test Jenkins instance running on Java 11. That one does not have any BLOCKED threads.
Our production Jenkins is running Java 8u181.

On Thursday, 18 July 2019 at 11:04:16 UTC+2, Ivan Fernandez Calvo wrote:
In that dump I cannot see which thread is blocking the others. The jstack output normally includes, on each thread, a reference to the lock involved (for example "- locked <0x00000000> (a java.lang.Object)"), which shows which thread is the blocker. You can try analyzing the thread dump with https://fastthread.io/index.jsp or other online tools to see if anything relevant shows up; it looks like there is a deadlock.

https://dzone.com/articles/how-to-read-a-thread-dump

Re: Jenkins Agents getting disconnected

Sverre Moe-2
I was unable to determine anything from the stack output.
Here is the result: https://fastthread.io/my-thread-report.jsp?p=c2hhcmVkLzIwMTkvMDcvMjkvLS1qc3RhY2sudHh0LS05LTE2LTI3

Re: Jenkins Agents getting disconnected

Ivan Fernandez Calvo
You have 83 threads in state IN_NATIVE, probably stuck in I/O operations, and those 83 threads are blocking the other 382 threads. If you use NFS or a similar device for your JENKINS_HOME, that is probably your bottleneck; if not, check the I/O stats on the OS to see where the bottleneck is.
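
One rough way to check whether the filesystem behind JENKINS_HOME is slow is to time small synchronous writes in it and compare the numbers with a local disk. The Python sketch below does that. The function names are made up here, the directory is whatever you pass on the command line, and it is only a quick probe, not a substitute for the OS-level I/O statistics.

```python
import os
import statistics
import sys
import tempfile
import time


def probe_write_latency(directory, samples=20, size=4096):
    """Time small write+fsync operations in `directory`; returns seconds per op."""
    payload = b"x" * size
    timings = []
    for _ in range(samples):
        # Create a scratch file inside the directory under test.
        fd, path = tempfile.mkstemp(dir=directory, prefix="ioprobe-")
        try:
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # force the data out to the storage backend
            timings.append(time.perf_counter() - start)
        finally:
            os.close(fd)
            os.remove(path)
    return timings


def report(timings):
    """Summarise the timings as median and worst case, in milliseconds."""
    ordered = sorted(timings)
    return {
        "median_ms": statistics.median(ordered) * 1000,
        "max_ms": ordered[-1] * 1000,
    }


if __name__ == "__main__":
    # Defaults to the system temp directory when no path is given.
    target = sys.argv[1] if len(sys.argv) > 1 else tempfile.gettempdir()
    print(target, report(probe_write_latency(target)))
```

Run it once against a directory on the NFS mount and once against a local directory. A median that is far higher on the NFS side would be consistent with the storage being the bottleneck.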

Re: Jenkins Agents getting disconnected

Sverre Moe-2
Yes, we are using NFS for JENKINS_HOME.

Re: Jenkins Agents getting disconnected

slide
CloudBees (not my employer) has some resources on using NFS; the general recommendation is NOT to use NFS for JENKINS_HOME.
