Re: [JIRA] Commented: (HUDSON-3412) For long running jobs (>2 hours) job failing with hudson.util.IOException2: Failed to join the process

Joachim Bauernberger
The problem was solved in our setup after making sure Hudson uses
autossh instead of plain ssh.
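
For anyone reading the quoted trace below: the innermost
java.io.EOFException is what ObjectInputStream.readObject raises when its
underlying stream ends, which would be consistent with the connection
between master and slave being dropped while idle. Here is a minimal sketch
of that behaviour. It is not Hudson's actual remoting code; the class
EofDemo and its helper names are made up for illustration, and an in-memory
byte array stands in for the ssh connection.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

public class EofDemo {

    // Serializes the given strings into a byte array, standing in for the
    // bytes that arrived before the connection went away (hypothetical helper).
    public static byte[] writeObjects(String... values) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            for (String value : values) {
                out.writeObject(value);
            }
        }
        return buffer.toByteArray();
    }

    // Keeps calling readObject() the way a reader loop would, and returns
    // how many objects were read before the stream ended with EOFException.
    public static int readUntilEof(byte[] wire)
            throws IOException, ClassNotFoundException {
        int count = 0;
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(wire))) {
            while (true) {
                in.readObject();
                count++;
            }
        } catch (EOFException endOfStream) {
            // The peer is gone: there is nothing left to read.
            return count;
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] wire = writeObjects("ping", "pong");
        System.out.println("objects read before EOF: " + readUntilEof(wire));
    }
}
```

The reader only learns that the other side is gone when its next read hits
the end of the stream, so a connection that is silently dropped during a
long quiet period shows up as an EOFException rather than a more specific
error.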

On 7/27/10, [hidden email] (JIRA) <[hidden email]> wrote:

>
>     [
> http://issues.hudson-ci.org/browse/HUDSON-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=140481#action_140481
> ]
>
> alex_barna commented on HUDSON-3412:
> ------------------------------------
>
> I am also experiencing exactly the same problem (same call stack): Hudson
> 1.363 on RedHat/Tomcat, with Windows XP slaves. I have a matrix job, and each
> configuration job takes about 1.5 hours. Not all jobs fail; in my last
> build, 1 out of 25 configurations failed because of this problem.
>
>> For long running jobs (>2 hours) job failing with
>> hudson.util.IOException2: Failed to join the process
>> ------------------------------------------------------------------------------------------------------
>>
>>                 Key: HUDSON-3412
>>                 URL: http://issues.hudson-ci.org/browse/HUDSON-3412
>>             Project: Hudson
>>          Issue Type: Bug
>>          Components: core
>>    Affects Versions: current
>>         Environment: Platform: PC, OS: Linux
>>            Reporter: chad_lyon
>>
>> We have a somewhat special CI environment where, after projects build, we
>> execute them remotely and use Hudson to monitor their progress. The remote
>> execution of these programs takes a while, and at certain points no output
>> is sent back to the master for long periods of time. During these long
>> intervals with no output (just over 2 hours), I am occasionally seeing the
>> job fail with the following:
>> FATAL: command execution failed
>> hudson.util.IOException2: Failed to join the process
>> at hudson.Proc$RemoteProc.join(Proc.java:269)
>> at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:84)
>> at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:58)
>> at hudson.model.Build$RunnerImpl.build(Build.java:195)
>> at hudson.model.Build$RunnerImpl.doRun(Build.java:151)
>> at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:272)
>> at hudson.model.Run.run(Run.java:895)
>> at hudson.model.Build.run(Build.java:112)
>> at hudson.model.ResourceController.execute(ResourceController.java:93)
>> at hudson.model.Executor.run(Executor.java:119)
>> Caused by: java.util.concurrent.ExecutionException: hudson.remoting.RequestAbortedException: java.io.EOFException
>> at hudson.remoting.Request$1.get(Request.java:188)
>> at hudson.remoting.Request$1.get(Request.java:157)
>> at hudson.remoting.FutureAdapter.get(FutureAdapter.java:55)
>> at hudson.Proc$RemoteProc.join(Proc.java:261)
>> ... 9 more
>> Caused by: hudson.remoting.RequestAbortedException: java.io.EOFException
>> at hudson.remoting.Request.abort(Request.java:223)
>> at hudson.remoting.Channel.terminate(Channel.java:528)
>> at hudson.remoting.Channel$ReaderThread.run(Channel.java:684)
>> Caused by: java.io.EOFException
>> at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2554)
>> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1297)
>> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
>> at hudson.remoting.Channel$ReaderThread.run(Channel.java:665)
>> FATAL: Unable to delete script file /tmp/hudson24564.sh
>> hudson.util.IOException2: remote file operation failed
>> at hudson.FilePath.act(FilePath.java:544)
>> at hudson.FilePath.delete(FilePath.java:741)
>> at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:94)
>> at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:58)
>> at hudson.model.Build$RunnerImpl.build(Build.java:195)
>> at hudson.model.Build$RunnerImpl.doRun(Build.java:151)
>> at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:272)
>> at hudson.model.Run.run(Run.java:895)
>> at hudson.model.Build.run(Build.java:112)
>> at hudson.model.ResourceController.execute(ResourceController.java:93)
>> at hudson.model.Executor.run(Executor.java:119)
>> Caused by: java.io.IOException: already closed
>> at hudson.remoting.Channel.send(Channel.java:342)
>> at hudson.remoting.Request.call(Request.java:104)
>> at hudson.remoting.Channel.call(Channel.java:481)
>> at hudson.FilePath.act(FilePath.java:541)
>> ... 10 more
>> FATAL: already closed
>> java.io.IOException: already closed
>> at hudson.remoting.Channel.send(Channel.java:342)
>> at hudson.remoting.Request.call(Request.java:104)
>> at hudson.remoting.Channel.call(Channel.java:481)
>> at hudson.Launcher$RemoteLauncher.kill(Launcher.java:466)
>> at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:277)
>> at hudson.model.Run.run(Run.java:895)
>> at hudson.model.Build.run(Build.java:112)
>> at hudson.model.ResourceController.execute(ResourceController.java:93)
>> at hudson.model.Executor.run(Executor.java:119)
>> However, this is not predictable or reproducible, which makes me think it
>> corresponds to an external event such as GC, or even a network or OS event
>> (e.g. a TCP error or socket timeout). Anyway, I thought I would put it up
>> here and see if anyone else is getting this too.
>> I am using Hudson ver. 1.293. The master and slave are both RHEL 4.
>> An interesting development occurred when I upgraded recently and then set
>> hudson.util.ProcessTreeKiller.disable=true. The jobs were still failing, but
>> the underlying process was eventually completing its work successfully
>> (copying a large MySQL DB, if you must know). This is the reason I reported
>> this; it hints at a bug in Hudson's remoting code.
>> --Chad
>

--

http://www.bauernberger.com/
Xing: https://www.openbc.com/hp/Joachim_Bauernberger/