Slave build termination - how to debug?

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Slave build termination - how to debug?

Kirk True
Hi all,

I'm running a long-ish build (9 hours) that dies about 8 and a half hours into the build. Here is
the tail of the log in Hudson for the build:

FATAL: command execution failed. Maybe you need to configure the job to choose one of your Ant
installations?
hudson.util.IOException2: Failed to join the process
        at hudson.Proc$RemoteProc.join(Proc.java:220)
        at hudson.tasks.Ant.perform(Ant.java:171)
        at hudson.model.Build$RunnerImpl.build(Build.java:156)
        at hudson.model.Build$RunnerImpl.doRun(Build.java:124)
        at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:192)
        at hudson.model.Run.run(Run.java:649)
        at hudson.model.Build.run(Build.java:102)
        at hudson.model.ResourceController.execute(ResourceController.java:70)
        at hudson.model.Executor.run(Executor.java:64)
Caused by: java.util.concurrent.ExecutionException: hudson.remoting.RequestAbortedException:
java.io.EOFException
        at hudson.remoting.Request$1.get(Request.java:152)
        at hudson.remoting.Request$1.get(Request.java:121)
        at hudson.remoting.FutureAdapter.get(FutureAdapter.java:32)
        at hudson.Proc$RemoteProc.join(Proc.java:212)
        ... 8 more
Caused by: hudson.remoting.RequestAbortedException: java.io.EOFException
        at hudson.remoting.Request.abort(Request.java:187)
        at hudson.remoting.Channel.terminate(Channel.java:319)
        at hudson.remoting.Channel$ReaderThread.run(Channel.java:465)
Caused by: java.io.EOFException
        at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2498)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1273)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
        at hudson.remoting.Channel$ReaderThread.run(Channel.java:446)
FATAL: null
java.lang.NullPointerException
        at hudson.tasks.ArtifactArchiver.perform(ArtifactArchiver.java:70)
        at hudson.model.Build$RunnerImpl.performAllBuildStep(Build.java:150)
        at hudson.model.Build$RunnerImpl.post2(Build.java:137)
        at hudson.model.AbstractBuild$AbstractRunner.post(AbstractBuild.java:254)
        at hudson.model.Run.run(Run.java:667)
        at hudson.model.Build.run(Build.java:102)
        at hudson.model.ResourceController.execute(ResourceController.java:70)
        at hudson.model.Executor.run(Executor.java:64)

Here is the slave computer's log (via Hudson):

debug2: channel 0: window 0 sent adjust 4096
debug2: channel 0: window 4096 sent adjust 4096
debug2: channel 0: window 4792 sent adjust 4096
debug2: channel 0: window 3758 sent adjust 4096
debug2: channel 0: window 4096 sent adjust 4096
debug2: channel 0: window 4096 sent adjust 4096
debug2: channel 0: window 4096 sent adjust 4096
debug2: channel 0: window 4679 sent adjust 4096
debug2: channel 0: window 4096 sent adjust 4096
debug2: channel 0: window 4096 sent adjust 4096
debug2: channel 0: window 4096 sent adjust 4096
debug2: channel 0: window 4096 sent adjust 4096
debug2: channel 0: window 4743 sent adjust 4096
debug2: channel 0: window 8839 sent adjust 4096
debug2: channel 0: window 12935 sent adjust 4096
debug2: channel 0: window 17031 sent adjust 4096
debug2: channel 0: window 21127 sent adjust 4096
debug2: channel 0: window 25223 sent adjust 4096
debug2: channel 0: window 29319 sent adjust 4096
debug2: channel 0: window 33415 sent adjust 4096
debug2: channel 0: window 37511 sent adjust 4096
debug2: channel 0: window 41607 sent adjust 4096
debug2: channel 0: window 45703 sent adjust 4096
debug2: channel 0: window 49799 sent adjust 4096
debug2: channel 0: window 53895 sent adjust 4096
debug2: channel 0: window 57991 sent adjust 4096
debug2: channel 0: window 62087 sent adjust 4096
debug2: channel 0: window 64391 sent adjust 66681
debug2: channel 0: window 65155 sent adjust 54122
debug2: channel 0: window 65196 sent adjust 56226
debug2: channel 0: window 64947 sent adjust 36864
debug2: channel 0: window 65385 sent adjust 65687
debug2: channel 0: window 64784 sent adjust 60023
debug2: channel 0: window 65049 sent adjust 48227
debug2: channel 0: window 65314 sent adjust 65758
debug1: client_input_channel_req: channel 0 rtype exit-status reply 0
debug2: channel 0: rcvd ext data 123
debug2: channel 0: rcvd eof
debug2: channel 0: output open -> drain
debug2: channel 0: rcvd close
debug2: channel 0: close_read
debug2: channel 0: input open -> closed
debug3: channel 0: will not send data after close
debug2: channel 0: obuf_empty delayed efd 6/(123)
/export/home/tester/hudson/bin/runHudsonSlave.sh: line 6:  5739 Killed                
$JAVA_HOME/bin/java -jar slave.jar
debug2: channel 0: written 123 to efd 6
debug3: channel 0: will not send data after close
debug2: channel 0: obuf empty
debug2: channel 0: close_write
debug2: channel 0: output drain -> closed
debug2: channel 0: almost dead
debug2: channel 0: gc: notify user
debug2: channel 0: gc: user detached
debug2: channel 0: send close
debug2: channel 0: is dead
debug2: channel 0: garbage collecting
debug1: channel 0: free: client-session, nchannels 1
debug3: channel 0: status: The following connections are open:
  #0 client-session (t4 r0 i3/0 o3/0 fd -1/-1 cfd -1)

debug3: channel 0: close_fds r -1 w -1 e 6 c -1
debug1: fd 0 clearing O_NONBLOCK
debug1: fd 1 clearing O_NONBLOCK
debug1: fd 2 clearing O_NONBLOCK
debug1: Transferred: stdin 0, stdout 0, stderr 0 bytes in 35382.3 seconds
debug1: Bytes per second: stdin 0.0, stdout 0.0, stderr 0.0
debug1: Exit status 137
[12/03/07 23:00:01] slave agent was terminated

java.io.EOFException
        at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2498)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1273)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
        at hudson.remoting.Channel$ReaderThread.run(Channel.java:446)

(My ssh connection uses the -vvv option).

I consistently have this issue when using a specific machine as a Hudson master. If I change the
Hudson master to a different machine, but leave the Hudson slave as is, it works fine.

What additional debugging would be useful to provide to diagnose the issue?

My client and I are both really excited about Hudson; we just need to get over this one little
hump.

Oh, and this occurs on versions 1.151, 1.156, and 1.160.

Thanks!
Kirk

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Slave build termination - how to debug?

Kohsuke Kawaguchi
Administrator

The corresponding computer log should have information about what's
causing the slave shutdown. I see the line:

> /export/home/tester/hudson/bin/runHudsonSlave.sh: line 6:  5739 Killed $JAVA_HOME/bin/java -jar slave.jar

Where is this message coming from? It looks like shell prints out this
message, but what's "5739 Killed ..." ? Does this mean someone sends a
signal to java to kill it? (But then how would the parent shell notice
that?) Figuring out what this error message means could lead us to
somewhere.

I actually think you might be better off not using "-vvv". It doesn't
look like ssh is doing anything wrong here.




Kirk True wrote:

> Hi all,
>
> I'm running a long-ish build (9 hours) that dies about 8 and a half hours into the build. Here is
> the tail of the log in Hudson for the build:
>
> FATAL: command execution failed. Maybe you need to configure the job to choose one of your Ant
> installations?
> hudson.util.IOException2: Failed to join the process
> at hudson.Proc$RemoteProc.join(Proc.java:220)
> at hudson.tasks.Ant.perform(Ant.java:171)
> at hudson.model.Build$RunnerImpl.build(Build.java:156)
> at hudson.model.Build$RunnerImpl.doRun(Build.java:124)
> at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:192)
> at hudson.model.Run.run(Run.java:649)
> at hudson.model.Build.run(Build.java:102)
> at hudson.model.ResourceController.execute(ResourceController.java:70)
> at hudson.model.Executor.run(Executor.java:64)
> Caused by: java.util.concurrent.ExecutionException: hudson.remoting.RequestAbortedException:
> java.io.EOFException
> at hudson.remoting.Request$1.get(Request.java:152)
> at hudson.remoting.Request$1.get(Request.java:121)
> at hudson.remoting.FutureAdapter.get(FutureAdapter.java:32)
> at hudson.Proc$RemoteProc.join(Proc.java:212)
> ... 8 more
> Caused by: hudson.remoting.RequestAbortedException: java.io.EOFException
> at hudson.remoting.Request.abort(Request.java:187)
> at hudson.remoting.Channel.terminate(Channel.java:319)
> at hudson.remoting.Channel$ReaderThread.run(Channel.java:465)
> Caused by: java.io.EOFException
> at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2498)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1273)
> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
> at hudson.remoting.Channel$ReaderThread.run(Channel.java:446)
> FATAL: null
> java.lang.NullPointerException
> at hudson.tasks.ArtifactArchiver.perform(ArtifactArchiver.java:70)
> at hudson.model.Build$RunnerImpl.performAllBuildStep(Build.java:150)
> at hudson.model.Build$RunnerImpl.post2(Build.java:137)
> at hudson.model.AbstractBuild$AbstractRunner.post(AbstractBuild.java:254)
> at hudson.model.Run.run(Run.java:667)
> at hudson.model.Build.run(Build.java:102)
> at hudson.model.ResourceController.execute(ResourceController.java:70)
> at hudson.model.Executor.run(Executor.java:64)
>
> Here is the slave computer's log (via Hudson):
>
> debug2: channel 0: window 0 sent adjust 4096
> debug2: channel 0: window 4096 sent adjust 4096
> debug2: channel 0: window 4792 sent adjust 4096
> debug2: channel 0: window 3758 sent adjust 4096
> debug2: channel 0: window 4096 sent adjust 4096
> debug2: channel 0: window 4096 sent adjust 4096
> debug2: channel 0: window 4096 sent adjust 4096
> debug2: channel 0: window 4679 sent adjust 4096
> debug2: channel 0: window 4096 sent adjust 4096
> debug2: channel 0: window 4096 sent adjust 4096
> debug2: channel 0: window 4096 sent adjust 4096
> debug2: channel 0: window 4096 sent adjust 4096
> debug2: channel 0: window 4743 sent adjust 4096
> debug2: channel 0: window 8839 sent adjust 4096
> debug2: channel 0: window 12935 sent adjust 4096
> debug2: channel 0: window 17031 sent adjust 4096
> debug2: channel 0: window 21127 sent adjust 4096
> debug2: channel 0: window 25223 sent adjust 4096
> debug2: channel 0: window 29319 sent adjust 4096
> debug2: channel 0: window 33415 sent adjust 4096
> debug2: channel 0: window 37511 sent adjust 4096
> debug2: channel 0: window 41607 sent adjust 4096
> debug2: channel 0: window 45703 sent adjust 4096
> debug2: channel 0: window 49799 sent adjust 4096
> debug2: channel 0: window 53895 sent adjust 4096
> debug2: channel 0: window 57991 sent adjust 4096
> debug2: channel 0: window 62087 sent adjust 4096
> debug2: channel 0: window 64391 sent adjust 66681
> debug2: channel 0: window 65155 sent adjust 54122
> debug2: channel 0: window 65196 sent adjust 56226
> debug2: channel 0: window 64947 sent adjust 36864
> debug2: channel 0: window 65385 sent adjust 65687
> debug2: channel 0: window 64784 sent adjust 60023
> debug2: channel 0: window 65049 sent adjust 48227
> debug2: channel 0: window 65314 sent adjust 65758
> debug1: client_input_channel_req: channel 0 rtype exit-status reply 0
> debug2: channel 0: rcvd ext data 123
> debug2: channel 0: rcvd eof
> debug2: channel 0: output open -> drain
> debug2: channel 0: rcvd close
> debug2: channel 0: close_read
> debug2: channel 0: input open -> closed
> debug3: channel 0: will not send data after close
> debug2: channel 0: obuf_empty delayed efd 6/(123)
> /export/home/tester/hudson/bin/runHudsonSlave.sh: line 6:  5739 Killed                
> $JAVA_HOME/bin/java -jar slave.jar
> debug2: channel 0: written 123 to efd 6
> debug3: channel 0: will not send data after close
> debug2: channel 0: obuf empty
> debug2: channel 0: close_write
> debug2: channel 0: output drain -> closed
> debug2: channel 0: almost dead
> debug2: channel 0: gc: notify user
> debug2: channel 0: gc: user detached
> debug2: channel 0: send close
> debug2: channel 0: is dead
> debug2: channel 0: garbage collecting
> debug1: channel 0: free: client-session, nchannels 1
> debug3: channel 0: status: The following connections are open:
>   #0 client-session (t4 r0 i3/0 o3/0 fd -1/-1 cfd -1)
>
> debug3: channel 0: close_fds r -1 w -1 e 6 c -1
> debug1: fd 0 clearing O_NONBLOCK
> debug1: fd 1 clearing O_NONBLOCK
> debug1: fd 2 clearing O_NONBLOCK
> debug1: Transferred: stdin 0, stdout 0, stderr 0 bytes in 35382.3 seconds
> debug1: Bytes per second: stdin 0.0, stdout 0.0, stderr 0.0
> debug1: Exit status 137
> [12/03/07 23:00:01] slave agent was terminated
>
> java.io.EOFException
> at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2498)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1273)
> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
> at hudson.remoting.Channel$ReaderThread.run(Channel.java:446)
>
> (My ssh connection uses the -vvv option).
>
> I consistently have this issue when using a specific machine as a Hudson master. If I change the
> Hudson master to a different machine, but leave the Hudson slave as is, it works fine.
>
> What additional debugging would be useful to provide to diagnose the issue?
>
> My client and I are both really excited about Hudson; we just need to get over this one little
> hump.
>
> Oh, and this occurs on versions 1.151, 1.156, and 1.160.
>
> Thanks!
> Kirk
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>

--
Kohsuke Kawaguchi
Sun Microsystems                   [hidden email]

smime.p7s (4K) Download Attachment
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Slave build termination - how to debug?

Kirk True-3
Hi Kohsuke,

Uh, this was SUE on my part. Someone else added a crontab entry that started
in the middle of the run that contained a nice little pkill java line...

Sorry for the noise.

Kirk

On 12/17/07 6:46 PM, "Kohsuke Kawaguchi" <[hidden email]> wrote:

>
> The corresponding computer log should have information about what's
> causing the slave shutdown. I see the line:
>
>> /export/home/tester/hudson/bin/runHudsonSlave.sh: line 6:  5739 Killed
>> $JAVA_HOME/bin/java -jar slave.jar
>
> Where is this message coming from? It looks like shell prints out this
> message, but what's "5739 Killed ..." ? Does this mean someone sends a
> signal to java to kill it? (But then how would the parent shell notice
> that?) Figuring out what this error message means could lead us to
> somewhere.
>
> I actually think you might be better off not using "-vvv". It doesn't
> look like ssh is doing anything wrong here.
>
>
>
>
> Kirk True wrote:
>> Hi all,
>>
>> I'm running a long-ish build (9 hours) that dies about 8 and a half hours
>> into the build. Here is
>> the tail of the log in Hudson for the build:
>>
>> FATAL: command execution failed. Maybe you need to configure the job to
>> choose one of your Ant
>> installations?
>> hudson.util.IOException2: Failed to join the process
>> at hudson.Proc$RemoteProc.join(Proc.java:220)
>> at hudson.tasks.Ant.perform(Ant.java:171)
>> at hudson.model.Build$RunnerImpl.build(Build.java:156)
>> at hudson.model.Build$RunnerImpl.doRun(Build.java:124)
>> at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:192)
>> at hudson.model.Run.run(Run.java:649)
>> at hudson.model.Build.run(Build.java:102)
>> at hudson.model.ResourceController.execute(ResourceController.java:70)
>> at hudson.model.Executor.run(Executor.java:64)
>> Caused by: java.util.concurrent.ExecutionException:
>> hudson.remoting.RequestAbortedException:
>> java.io.EOFException
>> at hudson.remoting.Request$1.get(Request.java:152)
>> at hudson.remoting.Request$1.get(Request.java:121)
>> at hudson.remoting.FutureAdapter.get(FutureAdapter.java:32)
>> at hudson.Proc$RemoteProc.join(Proc.java:212)
>> ... 8 more
>> Caused by: hudson.remoting.RequestAbortedException: java.io.EOFException
>> at hudson.remoting.Request.abort(Request.java:187)
>> at hudson.remoting.Channel.terminate(Channel.java:319)
>> at hudson.remoting.Channel$ReaderThread.run(Channel.java:465)
>> Caused by: java.io.EOFException
>> at
>> java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.jav
>> a:2498)
>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1273)
>> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
>> at hudson.remoting.Channel$ReaderThread.run(Channel.java:446)
>> FATAL: null
>> java.lang.NullPointerException
>> at hudson.tasks.ArtifactArchiver.perform(ArtifactArchiver.java:70)
>> at hudson.model.Build$RunnerImpl.performAllBuildStep(Build.java:150)
>> at hudson.model.Build$RunnerImpl.post2(Build.java:137)
>> at hudson.model.AbstractBuild$AbstractRunner.post(AbstractBuild.java:254)
>> at hudson.model.Run.run(Run.java:667)
>> at hudson.model.Build.run(Build.java:102)
>> at hudson.model.ResourceController.execute(ResourceController.java:70)
>> at hudson.model.Executor.run(Executor.java:64)
>>
>> Here is the slave computer's log (via Hudson):
>>
>> debug2: channel 0: window 0 sent adjust 4096
>> debug2: channel 0: window 4096 sent adjust 4096
>> debug2: channel 0: window 4792 sent adjust 4096
>> debug2: channel 0: window 3758 sent adjust 4096
>> debug2: channel 0: window 4096 sent adjust 4096
>> debug2: channel 0: window 4096 sent adjust 4096
>> debug2: channel 0: window 4096 sent adjust 4096
>> debug2: channel 0: window 4679 sent adjust 4096
>> debug2: channel 0: window 4096 sent adjust 4096
>> debug2: channel 0: window 4096 sent adjust 4096
>> debug2: channel 0: window 4096 sent adjust 4096
>> debug2: channel 0: window 4096 sent adjust 4096
>> debug2: channel 0: window 4743 sent adjust 4096
>> debug2: channel 0: window 8839 sent adjust 4096
>> debug2: channel 0: window 12935 sent adjust 4096
>> debug2: channel 0: window 17031 sent adjust 4096
>> debug2: channel 0: window 21127 sent adjust 4096
>> debug2: channel 0: window 25223 sent adjust 4096
>> debug2: channel 0: window 29319 sent adjust 4096
>> debug2: channel 0: window 33415 sent adjust 4096
>> debug2: channel 0: window 37511 sent adjust 4096
>> debug2: channel 0: window 41607 sent adjust 4096
>> debug2: channel 0: window 45703 sent adjust 4096
>> debug2: channel 0: window 49799 sent adjust 4096
>> debug2: channel 0: window 53895 sent adjust 4096
>> debug2: channel 0: window 57991 sent adjust 4096
>> debug2: channel 0: window 62087 sent adjust 4096
>> debug2: channel 0: window 64391 sent adjust 66681
>> debug2: channel 0: window 65155 sent adjust 54122
>> debug2: channel 0: window 65196 sent adjust 56226
>> debug2: channel 0: window 64947 sent adjust 36864
>> debug2: channel 0: window 65385 sent adjust 65687
>> debug2: channel 0: window 64784 sent adjust 60023
>> debug2: channel 0: window 65049 sent adjust 48227
>> debug2: channel 0: window 65314 sent adjust 65758
>> debug1: client_input_channel_req: channel 0 rtype exit-status reply 0
>> debug2: channel 0: rcvd ext data 123
>> debug2: channel 0: rcvd eof
>> debug2: channel 0: output open -> drain
>> debug2: channel 0: rcvd close
>> debug2: channel 0: close_read
>> debug2: channel 0: input open -> closed
>> debug3: channel 0: will not send data after close
>> debug2: channel 0: obuf_empty delayed efd 6/(123)
>> /export/home/tester/hudson/bin/runHudsonSlave.sh: line 6:  5739 Killed
>> $JAVA_HOME/bin/java -jar slave.jar
>> debug2: channel 0: written 123 to efd 6
>> debug3: channel 0: will not send data after close
>> debug2: channel 0: obuf empty
>> debug2: channel 0: close_write
>> debug2: channel 0: output drain -> closed
>> debug2: channel 0: almost dead
>> debug2: channel 0: gc: notify user
>> debug2: channel 0: gc: user detached
>> debug2: channel 0: send close
>> debug2: channel 0: is dead
>> debug2: channel 0: garbage collecting
>> debug1: channel 0: free: client-session, nchannels 1
>> debug3: channel 0: status: The following connections are open:
>>   #0 client-session (t4 r0 i3/0 o3/0 fd -1/-1 cfd -1)
>>
>> debug3: channel 0: close_fds r -1 w -1 e 6 c -1
>> debug1: fd 0 clearing O_NONBLOCK
>> debug1: fd 1 clearing O_NONBLOCK
>> debug1: fd 2 clearing O_NONBLOCK
>> debug1: Transferred: stdin 0, stdout 0, stderr 0 bytes in 35382.3 seconds
>> debug1: Bytes per second: stdin 0.0, stdout 0.0, stderr 0.0
>> debug1: Exit status 137
>> [12/03/07 23:00:01] slave agent was terminated
>>
>> java.io.EOFException
>> at
>> java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.jav
>> a:2498)
>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1273)
>> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
>> at hudson.remoting.Channel$ReaderThread.run(Channel.java:446)
>>
>> (My ssh connection uses the -vvv option).
>>
>> I consistently have this issue when using a specific machine as a Hudson
>> master. If I change the
>> Hudson master to a different machine, but leave the Hudson slave as is, it
>> works fine.
>>
>> What additional debugging would be useful to provide to diagnose the issue?
>>
>> My client and I are both really excited about Hudson; we just need to get
>> over this one little
>> hump.
>>
>> Oh, and this occurs on versions 1.151, 1.156, and 1.160.
>>
>> Thanks!
>> Kirk
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [hidden email]
>> For additional commands, e-mail: [hidden email]
>>
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Loading...