Slave Agent Terminates with Stacktrace

Slave Agent Terminates with Stacktrace

Keith Kowalczykowski
Hi Everyone,

    I'm running into a strange issue with slaves terminating with the
following stack trace immediately after launching.

channel stopped
[05/26/09 17:38:59] slave agent was terminated
hudson.remoting.Channel$OrderlyShutdown
 at hudson.remoting.Channel$CloseCommand.execute(Channel.java:623)
 at hudson.remoting.Channel$ReaderThread.run(Channel.java:737)
Caused by: Command close created at
 at hudson.remoting.Command.<init>(Command.java:47)
 at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:619)
 at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:619)
 at hudson.remoting.Channel.close(Channel.java:663)
 at hudson.remoting.Channel$CloseCommand.execute(Channel.java:622)
 ... 1 more

The node is set up with the following options:

* Launch via execution of command on Master (shell script)
* Take this slave online when in demand and off-line when idle

What happens is that the slave starts up and soon thereafter terminates with
the above stack trace. This happens whether the slave is started on demand or
manually by me. If I switch the node to "Keep this slave online as much as
possible", however, the problem does not occur.

Usually by looking at the code I can track down the source of an issue
fairly well myself. However, in this case it seems to be some odd
interaction between the server and the slave. If I understand correctly, the
ReaderThread.run is deserializing and executing the "close" command from the
slave. Beyond this, I am lost, as I don't understand the full interaction
between slave and server. Can anyone help shed some light on this?

    -Keith



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: Slave Agent Terminates with Stacktrace

Kohsuke Kawaguchi
Administrator
The master initiated a shutdown of the channel, which gets acked by the
slave, and you are seeing that ACK as the OrderlyShutdown object.

Can you follow http://wiki.hudson-ci.org/display/HUDSON/Logging and
increase log level for hudson.slaves.RetentionStrategy ?

I'm pretty sure the on-demand retention strategy is shutting down the
node, but I'm curious why it's doing so right after a connection gets
established.





--
Kohsuke Kawaguchi



Re: Slave Agent Terminates with Stacktrace

Keith Kowalczykowski
Thanks for pointing me in the direction of the logging, Kohsuke. I have
found the problem. Here is the output from the logging:

May 27, 2009 4:17:21 PM hudson.slaves.RetentionStrategy$Demand check
INFO: Disconnecting computer IE 7 as it has been idle for 2 hr 17 min
May 27, 2009 4:17:18 PM hudson.slaves.NodeProvisioner
FINE: Excess workload 0.99999976 detected. (planned
capacity=0.0,Qlen=0.99999976,idle=1.7090673E-20&0,total=0)
May 27, 2009 4:17:15 PM hudson.slaves.CommandLauncher launch
INFO: slave agent launched for IE 7

As you can see, the issue is that the demand retention strategy thinks the
slave has been idle for 2 hr 17 min when it has really only been up for
6 seconds. After looking through the code a little more, here is what
happens:

The demand retention strategy computes idle time by doing:

final long idleMilliseconds = System.currentTimeMillis() - c.getIdleStartMilliseconds();

Looking at the implementation of Computer.getIdleStartMilliseconds(), it
computes the start of idle time by iterating over all of its Executors and
calling their getIdleStartMilliseconds() method, which is implemented as
follows:


    public long getIdleStartMilliseconds() {
        if (isIdle())
            return finishTime;
        else {
            return Math.max(startTime + Math.max(0, executable.getParent().getEstimatedDuration()),
                    System.currentTimeMillis() + 15000);
        }
    }

As you can see, if the executor is idle, its idle time is computed by the
last finish time of an execution. Now this is obviously wrong for the demand
retention strategy (and probably in general), as idle time should not be
computed across stop/start boundaries. Therefore I propose the following
changes:

1. Computer.java should have a "connect time" property that is set when the
Slave connects.

2. Computer.getIdleStartMilliseconds() should return the max of "connect
time" and the existing idle executor computation, in order to ignore the
start/stop boundaries.
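The two proposed changes can be sketched as follows. This is a minimal model of the idea only, with hypothetical names (connectTime, lastFinishTime); it is not Hudson's actual Computer class:

```java
public class IdleTimeSketch {
    // Proposal #1: recorded when the slave's channel comes online (hypothetical field).
    private final long connectTime;
    // Last time an executor finished a build; may predate a disconnect/reconnect.
    private final long lastFinishTime;

    public IdleTimeSketch(long connectTime, long lastFinishTime) {
        this.connectTime = connectTime;
        this.lastFinishTime = lastFinishTime;
    }

    // Proposal #2: clamp the idle-start to the connect time, so an idle
    // period can never span a stop/start boundary.
    public long getIdleStartMilliseconds() {
        return Math.max(connectTime, lastFinishTime);
    }

    public long idleMilliseconds(long now) {
        return now - getIdleStartMilliseconds();
    }

    public static void main(String[] args) {
        long now = 10_000_000L;
        // Last build finished 2 hr 17 min (8,220,000 ms) ago, but the
        // slave reconnected only 6 seconds ago.
        IdleTimeSketch c = new IdleTimeSketch(now - 6_000L, now - 8_220_000L);
        // Without the clamp the strategy would see ~2 hr 17 min of idle time.
        System.out.println(c.idleMilliseconds(now)); // prints 6000
    }
}
```

With this clamp, the demand retention strategy's `idleMilliseconds` check sees 6 seconds instead of 2 hr 17 min, and the freshly launched slave is not disconnected.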


    -Keith






Re: Slave Agent Terminates with Stacktrace

Kohsuke Kawaguchi
Administrator
Thanks for the detective work.

Just one minor nitpick, though: shouldn't it be
Executor.getIdleStartMilliseconds() that returns the max of "connect
time" and finishTime?



--
Kohsuke Kawaguchi



Re: Slave Agent Terminates with Stacktrace

Keith Kowalczykowski
On 5/27/09 11:27 PM, "Kohsuke Kawaguchi" <[hidden email]> wrote:

> Thanks for the detective work.

No problem. Would you like me to submit a patch, or will you just add it
yourself directly?

> Just one minor nit-picking though. Shouldn't it be
> Executor.getIdleStartMilliseconds() that should return max of "connect
> time" and the finishTime?

Yes, sorry, you're right. The Executor should return a correct notion of the
idle time to begin with, rather than doing the check in Computer.

Also, another issue to be addressed: what idle time should be returned
when a Computer is not connected? Maybe zero, or maybe -1? It's not really
clear to me what uses the idle time, so I'm not sure what the call sites
expect. Do you have further insight?

As a side note, this also fixes a subtler bug: Executor.getIdleStartMilliseconds()
is not correct on server startup, since Executor.finishTime is
uninitialized. It is not until a build runs that finishTime is properly
set.
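Moving the clamp into the Executor, as suggested, might look like the sketch below. The Owner interface and its getConnectTime() accessor are illustrative stand-ins, not Hudson's real API; initializing finishTime at creation time also covers the startup bug mentioned above:

```java
public class ExecutorSketch {
    // Hypothetical stand-in for the owning Computer's connect timestamp.
    interface Owner {
        long getConnectTime();
    }

    private final Owner owner;
    // Initialized to the creation time so that, before any build has run,
    // finishTime is not a meaningless zero (the startup bug).
    private long finishTime;

    public ExecutorSketch(Owner owner, long creationTime) {
        this.owner = owner;
        this.finishTime = creationTime;
    }

    public void recordFinish(long when) {
        finishTime = when;
    }

    // Idle start is the later of the last build's finish and the owner's
    // connect time, so idle never spans a stop/start boundary.
    public long getIdleStartMilliseconds() {
        return Math.max(finishTime, owner.getConnectTime());
    }
}
```

Computer.getIdleStartMilliseconds() could then keep its existing "max over all executors" loop unchanged, since each executor already reports a sane value.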






Re: Slave Agent Terminates with Stacktrace

Keith Kowalczykowski
Hi Kohsuke,

    Did you have a chance to look at this yet? Please let me know whether
you would like me to proceed with creating a patch.

Thanks,
   Keith







Re: Slave Agent Terminates with Stacktrace

Keith Kowalczykowski
Kohsuke,

    Attached is a basic patch implementing the fix that we talked about.
Your review and feedback would be much appreciated.

    The one thing I don't know how to handle is API compatibility. In order
to make the fix, I needed a concrete implementation of connect() in
Computer.java. Therefore, I created doConnect() so that implementing classes
can still provide their own implementation, and had connect() record the
connection time and delegate to doConnect(). Now, if anyone else has created
an implementation of Computer, this will obviously break them. How does
Hudson handle API changes?
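The arrangement described above is a template-method pattern, roughly like this sketch (method names follow the email; the real patch against Hudson may differ):

```java
public abstract class ComputerSketch {
    private volatile long connectTime;

    public final long getConnectTime() {
        return connectTime;
    }

    // Non-overridable entry point: bookkeeping happens here...
    public final void connect() {
        connectTime = System.currentTimeMillis();
        doConnect();
    }

    // ...and subclasses supply the actual launch logic here. Any existing
    // Computer subclass that overrides connect() directly would break,
    // which is the API-compatibility concern raised above.
    protected abstract void doConnect();
}
```

The compatibility question is exactly that: making connect() final turns a previously overridable method into a fixed entry point, so any third-party Computer subclass overriding connect() would no longer compile.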


    -Keith


On 5/28/09 12:03 AM, "Keith Kowalczykowski" <[hidden email]> wrote:

>
>
>
> On 5/27/09 11:27 PM, "Kohsuke Kawaguchi" <[hidden email]> wrote:
>
>> Thanks for the detective work.
>
> No problem. Would you like me to submit a patch, or will you just add it
> yourself directly?
>
>> Just one minor nit-picking though. Shouldn't it be
>> Executor.getIdleStartMilliseconds() that should return max of "connect
>> time" and the finishTime?
>
> Yes, sorry, you're right. The Executor should return a correct notion of the
> idle time to begin with, rather than doing the check in Computer.
>
> Also, another issue to be addressed is what idle time should be returned
> when a Computer is not connected? Maybe zero? Or maybe -1? Its not really
> clear to me what uses the idle time, so I'm not sure what the call-sites
> expect? Do you have further insight?
>
> As a side note, this also fixes a more subtle bug that the
> Executor.getIdleStartMilliseconds is not correct on server startup, since
> Executor.finishTime is un-initialized. It is not until a build runs that
> finishTime is properly set.
>
>> 2009/5/27 Keith Kowalczykowski <[hidden email]>:
>>> Thanks for pointing me in the direction of the logging, Kohsuke. I have
>>> found the problem. Here is the output from the logging:
>>>
>>> May 27, 2009 4:17:21 PM hudson.slaves.RetentionStrategy$Demand check
>>> INFO: Disconnecting computer IE 7 as it has been idle for 2 hr 17 min
>>> May 27, 2009 4:17:18 PM hudson.slaves.NodeProvisioner
>>> FINE: Excess workload 0.99999976 detected. (planned
>>> capacity=0.0,Qlen=0.99999976,idle=1.7090673E-20&0,total=0)
>>> May 27, 2009 4:17:15 PM hudson.slaves.CommandLauncher launch
>>> INFO: slave agent launched for IE 7
>>>
>>> As you can see, the issue is that the demand retention strategy seems to
>>> think that the slave has been idle for 2hrs 17mins when it has only really
>>> been started for 6 seconds. After looking through the code a little more,
>>> here is what happens:
>>>
>>> The demand retention strategy computes idle time by doing:
>>>
>>> final long idleMilliseconds = System.currentTimeMillis() -
>>> c.getIdleStartMilliseconds();
>>>
>>> Looking at the implementation of Computer.getIdleStartMilliseconds(), it
>>> computes the start of idle time by iterating over all of its Executors and
>>> calling their getIdleStartMilliseconds() method. Which is implemented as
>>> follows:
>>>
>>>
>>>    public long getIdleStartMilliseconds() {
>>>        if (isIdle())
>>>            return finishTime;
>>>        else {
>>>            return Math.max(startTime + Math.max(0,
>>> executable.getParent().getEstimatedDuration()),
>>>                    System.currentTimeMillis() + 15000);
>>>        }
>>>    }
>>>
>>> As you can see, if the executor is idle, its idle time is computed by the
>>> last finish time of an execution. Now this is obviously wrong for the demand
>>> retention strategy (and probably in general), as idle time should not be
>>> computed across stop/start boundaries. Therefore I propose the following
>>> changes:
>>>
>>> 1. Computer.java should have a "connect time" property that is set when the
>>> Slave connects.
>>>
>>> 2. Computer.getIdleStartMilliseconds() should return max of "connect time"
>>> and the existing idle executor computation in order to ignore the start/stop
>>> boundries.
>>>
>>>
>>>    -Keith
>>>
>>>
>>> On 5/26/09 10:18 PM, "Kohsuke Kawaguchi" <[hidden email]> wrote:
>>>
>>>> The master initiated a shutdown of a channel, which gets acked by the
>>>> slave, and you are seeing that ACK as the OrderlyShutdown object.
>>>>
>>>> Can you follow http://wiki.hudson-ci.org/display/HUDSON/Logging and
>>>> increase log level for hudson.slaves.RetentionStrategy ?
>>>>
>>>> I'm pretty sure the on-demand retention strategy is shutting down the
>>>> node, but I'm curious why it's doing so right after a connection gets
>>>> established.
>>>>
>>>>
>>>> 2009/5/26 Keith Kowalczykowski <[hidden email]>:
>>>>> Hi Everyone,
>>>>>
>>>>>    I'm running into a strange issue with slaves terminating with the
>>>>> following stack trace immediately after launching.
>>>>>
>>>>> channel stopped
>>>>> [05/26/09 17:38:59] slave agent was terminated
>>>>> hudson.remoting.Channel$OrderlyShutdown
>>>>>  at hudson.remoting.Channel$CloseCommand.execute(Channel.java:623)
>>>>>  at hudson.remoting.Channel$ReaderThread.run(Channel.java:737)
>>>>> Caused by: Command close created at
>>>>>  at hudson.remoting.Command.<init>(Command.java:47)
>>>>>  at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:619)
>>>>>  at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:619)
>>>>>  at hudson.remoting.Channel.close(Channel.java:663)
>>>>>  at hudson.remoting.Channel$CloseCommand.execute(Channel.java:622)
>>>>>  ... 1 more
>>>>>
>>>>> The node is set up with the following options:
>>>>>
>>>>> * Launch via execution of command on Master (shell script)
>>>>> * Take this slave online when in demand and off-line when idle
>>>>>
>>>>> What happens is the slave starts up and soon thereafter terminates with
>>>>> the above stack trace. This seems to happen whether it is started on
>>>>> demand or manually by me. If I switch the node to "Keep this slave
>>>>> online as much as possible", however, the problem does not occur.
>>>>>
>>>>> Usually by looking at the code I can track down the source of an issue
>>>>> fairly well myself. However, in this case it seems to be some odd
>>>>> interaction between the server and the slave. If I understand correctly,
>>>>> the
>>>>> ReaderThread.run is deserializing and executing the "close" command from
>>>>> the
>>>>> slave. Beyond this, I am lost, as I don't understand the full interaction
>>>>> between slave and server. Can anyone help shed some light on this?
>>>>>
>>>>>    -Keith
>>>>>
>>>>>
>>>>>
patch.txt (7K) Download Attachment

Re: Slave Agent Terminates with Stacktrace

Keith Kowalczykowski-3
Bump...


On 6/21/09 9:04 PM, "Keith Kowalczykowski" <[hidden email]> wrote:

> Kohsuke,
>
>     Attached is a basic patch implementing the fix that we talked about.
> Your review and feedback would be much appreciated.
>
>     The one thing I don't know how to handle is API compatibility. In order
> to make the fix, I needed to have a concrete implementation of connect() in
> Computer.java. Therefore, I created doConnect() so implementing classes
> could still provide their own implementation, and have connect() log the
> connection time and proxy to doConnect(). Now, if anyone else has created an
> implementation of Computer, this will obviously break them. How does Hudson
> handle API changes?
>
>
>     -Keith
>
>
> On 5/28/09 12:03 AM, "Keith Kowalczykowski" <[hidden email]> wrote:
>
>>
>>
>>
>> On 5/27/09 11:27 PM, "Kohsuke Kawaguchi" <[hidden email]> wrote:
>>
>>> Thanks for the detective work.
>>
>> No problem. Would you like me to submit a patch, or will you just add it
>> yourself directly?
>>
>>> Just one minor nit-picking though. Shouldn't it be
>>> Executor.getIdleStartMilliseconds() that should return max of "connect
>>> time" and the finishTime?
>>
>> Yes, sorry, you're right. The Executor should return a correct notion of the
>> idle time to begin with, rather than doing the check in Computer.
>>
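Sketched in code, the Executor-side version might look roughly like this (illustrative only; how the connect timestamp actually reaches the Executor, here a plain connectTime field, is an assumption):

```java
// Illustrative Executor-side variant of the fix discussed above; not actual Hudson code.
public class ExecutorSketch {
    private final long connectTime; // hypothetical: when the owning computer connected
    private final long finishTime;  // end of the last build; 0 until a build has run

    public ExecutorSketch(long connectTime, long finishTime) {
        this.connectTime = connectTime;
        this.finishTime = finishTime;
    }

    /**
     * Idle start clamped to the connect time, so idle time is never measured
     * across a disconnect/reconnect boundary; it also covers the fresh-start
     * case where finishTime is still uninitialized (0).
     */
    public long getIdleStartMilliseconds() {
        return Math.max(connectTime, finishTime);
    }
}
```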
>> Also, another issue to be addressed is what idle time should be returned
>> when a Computer is not connected. Maybe zero? Or maybe -1? It's not really
>> clear to me what uses the idle time, so I'm not sure what the call sites
>> expect. Do you have further insight?
>>
>> As a side note, this also fixes a more subtle bug:
>> Executor.getIdleStartMilliseconds() is not correct on server startup, since
>> Executor.finishTime is uninitialized. It is not until a build runs that
>> finishTime is properly set.
>>
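Both failure modes fall out of the same arithmetic. A tiny stand-alone illustration (simplified stand-ins for the real classes, not the actual Hudson code) of what the demand strategy computes when finishTime predates the current connection:

```java
// Simplified model of the idle computation discussed in this thread.
public class IdleBugDemo {
    /** Mimics the idle Executor: idle start is just the last finish time. */
    static long idleStart(long finishTime) {
        return finishTime;
    }

    /** Mimics RetentionStrategy.Demand: how long the node "has been idle". */
    static long idleMillis(long now, long finishTime) {
        return now - idleStart(finishTime);
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        // Reconnect case: the slave came online 6 s ago, but its last build
        // finished 2 hr 17 min ago, in the previous session.
        long lastFinish = now - (2L * 60 + 17) * 60 * 1000;
        System.out.println("idle minutes: " + idleMillis(now, lastFinish) / 60000);
        // prints "idle minutes: 137" -- enough to trigger an immediate disconnect

        // Fresh-start case: finishTime is still 0, so the node appears to have
        // been idle since the epoch.
        System.out.println("idle hours since epoch: " + idleMillis(now, 0L) / 3600000);
    }
}
```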
>>> 2009/5/27 Keith Kowalczykowski <[hidden email]>:
>>>> Thanks for pointing me in the direction of the logging, Kohsuke. I have
>>>> found the problem. Here is the output from the logging:
>>>>
>>>> May 27, 2009 4:17:21 PM hudson.slaves.RetentionStrategy$Demand check
>>>> INFO: Disconnecting computer IE 7 as it has been idle for 2 hr 17 min
>>>> May 27, 2009 4:17:18 PM hudson.slaves.NodeProvisioner
>>>> FINE: Excess workload 0.99999976 detected. (planned
>>>> capacity=0.0,Qlen=0.99999976,idle=1.7090673E-20&0,total=0)
>>>> May 27, 2009 4:17:15 PM hudson.slaves.CommandLauncher launch
>>>> INFO: slave agent launched for IE 7
>>>>
>>>> As you can see, the issue is that the demand retention strategy seems to
>>>> think the slave has been idle for 2 hr 17 min when it has really only
>>>> been running for 6 seconds. After looking through the code a little more,
>>>> here is what happens:
>>>>
>>>> The demand retention strategy computes idle time by doing:
>>>>
>>>> final long idleMilliseconds = System.currentTimeMillis() -
>>>> c.getIdleStartMilliseconds();
>>>>
>>>> Looking at the implementation of Computer.getIdleStartMilliseconds(), it
>>>> computes the start of idle time by iterating over all of its Executors and
>>>> calling their getIdleStartMilliseconds() method, which is implemented as
>>>> follows:
>>>>
>>>>
>>>>    public long getIdleStartMilliseconds() {
>>>>        if (isIdle())
>>>>            return finishTime;
>>>>        else {
>>>>            return Math.max(startTime + Math.max(0,
>>>> executable.getParent().getEstimatedDuration()),
>>>>                    System.currentTimeMillis() + 15000);
>>>>        }
>>>>    }
>>>>
>>>> As you can see, if the executor is idle, its idle start is taken from the
>>>> finish time of the last execution. This is obviously wrong for the demand
>>>> retention strategy (and probably in general), as idle time should not be
>>>> computed across stop/start boundaries. Therefore I propose the following
>>>> changes:
>>>>
>>>> 1. Computer.java should have a "connect time" property that is set when the
>>>> Slave connects.
>>>>
>>>> 2. Computer.getIdleStartMilliseconds() should return the max of "connect time"
>>>> and the existing idle executor computation in order to ignore the start/stop
>>>> boundaries.
>>>>
>>>>
>>>>    -Keith
>>>>
>>>>
>>>> On 5/26/09 10:18 PM, "Kohsuke Kawaguchi" <[hidden email]> wrote:
>>>>
>>>>> The master initiated a shutdown of a channel, which gets acked by the
>>>>> slave, and you are seeing that ACK as the OrderlyShutdown object.
>>>>>
>>>>> Can you follow http://wiki.hudson-ci.org/display/HUDSON/Logging and
>>>>> increase log level for hudson.slaves.RetentionStrategy ?
>>>>>
>>>>> I'm pretty sure the on-demand retention strategy is shutting down the
>>>>> node, but I'm curious why it's doing so right after a connection gets
>>>>> established.
>>>>>
>>>>>
>>>>> 2009/5/26 Keith Kowalczykowski <[hidden email]>:
>>>>>> Hi Everyone,
>>>>>>
>>>>>>    I'm running into a strange issue with slaves terminating with the
>>>>>> following stack trace immediately after launching.
>>>>>>
>>>>>> channel stopped
>>>>>> [05/26/09 17:38:59] slave agent was terminated
>>>>>> hudson.remoting.Channel$OrderlyShutdown
>>>>>>  at hudson.remoting.Channel$CloseCommand.execute(Channel.java:623)
>>>>>>  at hudson.remoting.Channel$ReaderThread.run(Channel.java:737)
>>>>>> Caused by: Command close created at
>>>>>>  at hudson.remoting.Command.<init>(Command.java:47)
>>>>>>  at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:619)
>>>>>>  at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:619)
>>>>>>  at hudson.remoting.Channel.close(Channel.java:663)
>>>>>>  at hudson.remoting.Channel$CloseCommand.execute(Channel.java:622)
>>>>>>  ... 1 more
>>>>>>
>>>>>> The node is set up with the following options:
>>>>>>
>>>>>> * Launch via execution of command on Master (shell script)
>>>>>> * Take this slave online when in demand and off-line when idle
>>>>>>
>>>>>> What happens is the slave starts up and soon thereafter terminates with
>>>>>> the above stack trace. This seems to happen whether it is started on
>>>>>> demand or manually by me. If I switch the node to "Keep this slave
>>>>>> online as much as possible", however, the problem does not occur.
>>>>>>
>>>>>> Usually by looking at the code I can track down the source of an issue
>>>>>> fairly well myself. However, in this case it seems to be some odd
>>>>>> interaction between the server and the slave. If I understand correctly,
>>>>>> the
>>>>>> ReaderThread.run is deserializing and executing the "close" command from
>>>>>> the
>>>>>> slave. Beyond this, I am lost, as I don't understand the full interaction
>>>>>> between slave and server. Can anyone help shed some light on this?
>>>>>>
>>>>>>    -Keith
>>>>>>
>>>>>>
>>>>>>


Re: Slave Agent Terminates with Stacktrace

Keith Kowalczykowski-3
Kohsuke,

    Sorry to be so persistent about this bug, but it makes the on-demand
retention strategy completely unusable right now. Can you please review the
patch in my previous email and/or provide some feedback on how best to
proceed? If there are other things I should be doing (besides posting to
the mailing list), such as filing a ticket, please let me know.

    -Keith


On 6/24/09 10:17 AM, "Keith Kowalczykowski" <[hidden email]> wrote:

> Bump...
>
>
> On 6/21/09 9:04 PM, "Keith Kowalczykowski" <[hidden email]> wrote:
>
>> Kohsuke,
>>
>>     Attached is a basic patch implementing the fix that we talked about.
>> Your review and feedback would be much appreciated.
>>
>>     The one thing I don't know how to handle is API compatibility. In order
>> to make the fix, I needed to have a concrete implementation of connect() in
>> Computer.java. Therefore, I created doConnect() so implementing classes
>> could still provide their own implementation, and have connect() log the
>> connection time and proxy to doConnect(). Now, if anyone else has created an
>> implementation of Computer, this will obviously break them. How does Hudson
>> handle API changes?
>>
>>
>>     -Keith
>>
>>
>> On 5/28/09 12:03 AM, "Keith Kowalczykowski" <[hidden email]> wrote:
>>
>>>
>>>
>>>
>>> On 5/27/09 11:27 PM, "Kohsuke Kawaguchi" <[hidden email]> wrote:
>>>
>>>> Thanks for the detective work.
>>>
>>> No problem. Would you like me to submit a patch, or will you just add it
>>> yourself directly?
>>>
>>>> Just one minor nit-picking though. Shouldn't it be
>>>> Executor.getIdleStartMilliseconds() that should return max of "connect
>>>> time" and the finishTime?
>>>
>>> Yes, sorry, you're right. The Executor should return a correct notion of the
>>> idle time to begin with, rather than doing the check in Computer.
>>>
>>> Also, another issue to be addressed is what idle time should be returned
>>> when a Computer is not connected. Maybe zero? Or maybe -1? It's not really
>>> clear to me what uses the idle time, so I'm not sure what the call sites
>>> expect. Do you have further insight?
>>>
>>> As a side note, this also fixes a more subtle bug:
>>> Executor.getIdleStartMilliseconds() is not correct on server startup, since
>>> Executor.finishTime is uninitialized. It is not until a build runs that
>>> finishTime is properly set.
>>>
>>>> 2009/5/27 Keith Kowalczykowski <[hidden email]>:
>>>>> Thanks for pointing me in the direction of the logging, Kohsuke. I have
>>>>> found the problem. Here is the output from the logging:
>>>>>
>>>>> May 27, 2009 4:17:21 PM hudson.slaves.RetentionStrategy$Demand check
>>>>> INFO: Disconnecting computer IE 7 as it has been idle for 2 hr 17 min
>>>>> May 27, 2009 4:17:18 PM hudson.slaves.NodeProvisioner
>>>>> FINE: Excess workload 0.99999976 detected. (planned
>>>>> capacity=0.0,Qlen=0.99999976,idle=1.7090673E-20&0,total=0)
>>>>> May 27, 2009 4:17:15 PM hudson.slaves.CommandLauncher launch
>>>>> INFO: slave agent launched for IE 7
>>>>>
>>>>> As you can see, the issue is that the demand retention strategy seems to
>>>>> think the slave has been idle for 2 hr 17 min when it has really only
>>>>> been running for 6 seconds. After looking through the code a little more,
>>>>> here is what happens:
>>>>>
>>>>> The demand retention strategy computes idle time by doing:
>>>>>
>>>>> final long idleMilliseconds = System.currentTimeMillis() -
>>>>> c.getIdleStartMilliseconds();
>>>>>
>>>>> Looking at the implementation of Computer.getIdleStartMilliseconds(), it
>>>>> computes the start of idle time by iterating over all of its Executors and
>>>>> calling their getIdleStartMilliseconds() method, which is implemented as
>>>>> follows:
>>>>>
>>>>>
>>>>>    public long getIdleStartMilliseconds() {
>>>>>        if (isIdle())
>>>>>            return finishTime;
>>>>>        else {
>>>>>            return Math.max(startTime + Math.max(0,
>>>>> executable.getParent().getEstimatedDuration()),
>>>>>                    System.currentTimeMillis() + 15000);
>>>>>        }
>>>>>    }
>>>>>
>>>>> As you can see, if the executor is idle, its idle start is taken from the
>>>>> finish time of the last execution. This is obviously wrong for the demand
>>>>> retention strategy (and probably in general), as idle time should not be
>>>>> computed across stop/start boundaries. Therefore I propose the following
>>>>> changes:
>>>>>
>>>>> 1. Computer.java should have a "connect time" property that is set when
>>>>> the
>>>>> Slave connects.
>>>>>
>>>>> 2. Computer.getIdleStartMilliseconds() should return the max of "connect time"
>>>>> and the existing idle executor computation in order to ignore the start/stop
>>>>> boundaries.
>>>>>
>>>>>
>>>>>    -Keith
>>>>>
>>>>>
>>>>> On 5/26/09 10:18 PM, "Kohsuke Kawaguchi" <[hidden email]> wrote:
>>>>>
>>>>>> The master initiated a shutdown of a channel, which gets acked by the
>>>>>> slave, and you are seeing that ACK as the OrderlyShutdown object.
>>>>>>
>>>>>> Can you follow http://wiki.hudson-ci.org/display/HUDSON/Logging and
>>>>>> increase log level for hudson.slaves.RetentionStrategy ?
>>>>>>
>>>>>> I'm pretty sure the on-demand retention strategy is shutting down the
>>>>>> node, but I'm curious why it's doing so right after a connection gets
>>>>>> established.
>>>>>>
>>>>>>
>>>>>> 2009/5/26 Keith Kowalczykowski <[hidden email]>:
>>>>>>> Hi Everyone,
>>>>>>>
>>>>>>>    I'm running into a strange issue with slaves terminating with the
>>>>>>> following stack trace immediately after launching.
>>>>>>>
>>>>>>> channel stopped
>>>>>>> [05/26/09 17:38:59] slave agent was terminated
>>>>>>> hudson.remoting.Channel$OrderlyShutdown
>>>>>>>  at hudson.remoting.Channel$CloseCommand.execute(Channel.java:623)
>>>>>>>  at hudson.remoting.Channel$ReaderThread.run(Channel.java:737)
>>>>>>> Caused by: Command close created at
>>>>>>>  at hudson.remoting.Command.<init>(Command.java:47)
>>>>>>>  at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:619)
>>>>>>>  at hudson.remoting.Channel$CloseCommand.<init>(Channel.java:619)
>>>>>>>  at hudson.remoting.Channel.close(Channel.java:663)
>>>>>>>  at hudson.remoting.Channel$CloseCommand.execute(Channel.java:622)
>>>>>>>  ... 1 more
>>>>>>>
>>>>>>> The node is set up with the following options:
>>>>>>>
>>>>>>> * Launch via execution of command on Master (shell script)
>>>>>>> * Take this slave online when in demand and off-line when idle
>>>>>>>
>>>>>>> What happens is the slave starts up and soon thereafter terminates with
>>>>>>> the above stack trace. This seems to happen whether it is started on
>>>>>>> demand or manually by me. If I switch the node to "Keep this slave
>>>>>>> online as much as possible", however, the problem does not occur.
>>>>>>>
>>>>>>> Usually by looking at the code I can track down the source of an issue
>>>>>>> fairly well myself. However, in this case it seems to be some odd
>>>>>>> interaction between the server and the slave. If I understand correctly,
>>>>>>> the
>>>>>>> ReaderThread.run is deserializing and executing the "close" command from
>>>>>>> the
>>>>>>> slave. Beyond this, I am lost, as I don't understand the full
>>>>>>> interaction
>>>>>>> between slave and server. Can anyone help shed some light on this?
>>>>>>>
>>>>>>>    -Keith