Hi,

in my Jenkins I am regularly facing master/slave connection drops with a message like:

hudson.remoting.ChannelClosedException: Channel "unknown": Remote call on JNLP4-connect connection from IP/IP:58344 failed. The channel is closing down or has closed down.

I have seen a lot of bug reports on this. For most of them, the advised workaround is to disable the ping thread by setting:

master: -Dhudson.slaves.ChannelPinger.pingInterval=-1
slaves: -Dhudson.remoting.Launcher.pingIntervalSec=-1

I also found a link indicating that I can increase the timeout value (default: 240) on the master:

hudson.slaves.ChannelPinger.pingTimeoutSeconds

I am wondering if this would be a better approach? And is there also a slave setting for the timeout value? (The naming of all these settings does not look very consistent...)

Thx, M

You received this message because you are subscribed to the Google Groups "Jenkins Users" group. To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email]. To view this discussion on the web visit https://groups.google.com/d/msgid/jenkinsci-users/1032035216.2364985.1586324783151%40mail.yahoo.com.
On 4/7/20 11:46 PM, 'monger_39' via Jenkins Users wrote:
Usually these are caused by something external to the Remoting communication protocol: most often something in the system or networking environment, sometimes a bad interaction between plugins that ends up impacting the channel. Your best approach is to figure out where these disconnects originate and resolve the issue.

You should be cautious about changing the ping settings or disabling the ping entirely. It can cause some weird and unexpected behaviors. If you do change the settings, I recommend you change one thing at a time and evaluate the results. If a change doesn't make any difference, restore it to its default setting.

As for a timeout setting on the agent side: it depends on how you launch the agent. Remoting system properties are described at https://github.com/jenkinsci/remoting/blob/master/docs/configuration.md
And on the inconsistent naming of these settings: unfortunately, that's the case.

Jeff Thompson
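To make the "change one thing at a time" advice concrete, here is a minimal sketch of assembling the controller's java command line with exactly one non-default ping setting. The property name is the one quoted in this thread; the value 480, the jar name, and both helper functions are placeholders chosen for illustration, not recommendations.

```python
# Sketch only: the property name comes from this thread, the value is a placeholder.

def jvm_flags(props):
    """Turn a {property: value} mapping into -Dname=value flags."""
    return [f"-D{name}={value}" for name, value in props.items()]


def java_command(jar, props):
    """Build a java command line; -D flags go before -jar so the JVM sees them."""
    return ["java", *jvm_flags(props), "-jar", jar]


# Hypothetical experiment: change ONE setting and leave everything else at its default.
controller_props = {"hudson.slaves.ChannelPinger.pingTimeoutSeconds": 480}

print(" ".join(java_command("jenkins.war", controller_props)))
```

The same helper would apply to an agent started from a command line, with the agent-side property in the mapping instead. If the agent runs as a Windows service or is launched some other way, the JVM options live wherever that launcher keeps them, which is why the answer depends on how the agent is started.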
Hi Jeff,

thx. Last week I disabled the ping thread on master and slaves by setting the interval to '-1'. Unfortunately, over the weekend one of the slaves again went into 'offline' mode (even though the jobs kept on running). It seems indeed that this does not solve the issue. Or, in other words, I think it means that the disconnect was not caused by the ping thread(s) timing out.

That leaves me with the challenge of figuring out what this 'external something' you mention could be that breaks the remoting. And I honestly have no idea how to tackle that yet.

The master as well as the slave are Windows Server VMs running 6 executor slots each. The tests we are running heavily use TCP communication.

Any idea how to tackle this?

thx, M.
Unfortunately, it's really hard to say. Possibilities include resource contention (such as CPU or networking), anything in the middle (such as load balancers or firewalls), and network or system configuration. I heard of one case a while back that ended up being connected to an IP table definition. I can't remember if that was related to Docker containers or full VMs.

I've heard that there have been some common problems in some VM environments, but I don't know which environments or issues specifically. Maybe VMotion. Maybe the network gets overloaded, especially between VMs. Or there are interactions between loads on different VMs. I'm not as familiar with the current state, but in the past, in other environments, I have seen more interference between VMs than expected.

It comes down to standard troubleshooting. Try to catch the problem. Gather information about different occurrences. Try to isolate any commonalities. Isolate a system for reproduction.

You could try a different type of agent, such as an SSH agent. The behavior might be different. I've heard recently that Microsoft's SSHD implementation works well.

Good luck on troubleshooting,
Jeff
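As one possible way to "gather information about different occurrences", the sketch below counts ChannelClosedException lines per remote address, so you can see whether the drops cluster on one agent. It assumes the log lines look like the message quoted at the top of this thread ("connection from NAME/ADDRESS:PORT"); the function name, the regular expression, and the sample lines (using documentation-only addresses) are made up for illustration and may need adjusting to your actual logs.

```python
import re
from collections import Counter

# Assumed shape, based on the message quoted in this thread:
#   ...ChannelClosedException: ... connection from NAME/ADDRESS:PORT failed.
# This simple pattern does not handle IPv6 addresses.
DROP = re.compile(
    r"ChannelClosedException.*?connection from "
    r"(?P<name>[^/\s]+)/(?P<addr>[^:\s]+):(?P<port>\d+)"
)


def count_drops(lines):
    """Count channel-closed events per remote address."""
    counts = Counter()
    for line in lines:
        match = DROP.search(line)
        if match:
            counts[match.group("addr")] += 1
    return counts


# Made-up sample lines, only to show the expected input shape.
sample = [
    'hudson.remoting.ChannelClosedException: Channel "unknown": Remote call on '
    "JNLP4-connect connection from agent-a/192.0.2.10:58344 failed.",
    "INFO: some unrelated log line",
    'hudson.remoting.ChannelClosedException: Channel "unknown": Remote call on '
    "JNLP4-connect connection from agent-a/192.0.2.10:58901 failed.",
    'hudson.remoting.ChannelClosedException: Channel "unknown": Remote call on '
    "JNLP4-connect connection from agent-b/192.0.2.11:49152 failed.",
]

for addr, n in count_drops(sample).most_common():
    print(addr, n)
```

To run it against a real log, pass the open file object to count_drops instead of the sample list. Grouping by time of day would be the natural next step, but the timestamp format depends on your logging setup, so it is left out here.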
Hey Jeff, looks indeed like the 'standard' type of problems. Unfortunately in our network, I do not have the privileges to do anything much. Not that that would help much, since I'm only a simple SW engineer, not a network specialist. The tip to try another agent connection is a good one though. Will try that. thx again, David
On Wednesday, April 15, 2020 at 2:06:19 AM UTC-4 monger_39 wrote:
I have been running on JNLP for a while. Is it going to be deprecated? Should I prepare to move to SSH?