[JIRA] Commented: (HUDSON-3412) For long running jobs (>2 hours) job failing with hudson.util.IOException2: Failed to join the process

Hudson issues mailing list

    [ http://issues.hudson-ci.org/browse/HUDSON-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=140481#action_140481 ]

alex_barna commented on HUDSON-3412:
------------------------------------

I am also experiencing exactly the same problem (same call stack). Hudson 1.363 on RedHat/Tomcat, with Windows XP slaves. I have a matrix job, and each configuration takes about 1.5 hours. Not all configurations fail: in my last build, 1 out of 25 configurations failed because of this problem.

> For long running jobs (>2 hours) job failing with hudson.util.IOException2: Failed to join the process
> ------------------------------------------------------------------------------------------------------
>
>                 Key: HUDSON-3412
>                 URL: http://issues.hudson-ci.org/browse/HUDSON-3412
>             Project: Hudson
>          Issue Type: Bug
>          Components: core
>    Affects Versions: current
>         Environment: Platform: PC, OS: Linux
>            Reporter: chad_lyon
>
> We have a sort of special CI environment where after projects build we execute
> them remotely and use Hudson to monitor their progress. The remote execution of
> these programs takes a while and at certain points no output is sent back to the
> master for long periods of time.  During these long intervals where no output is
> sent back (just over 2 hours) I am occasionally seeing the job fail with the
> following:
> FATAL: command execution failed
> hudson.util.IOException2: Failed to join the process
> at hudson.Proc$RemoteProc.join(Proc.java:269)
> at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:84)
> at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:58)
> at hudson.model.Build$RunnerImpl.build(Build.java:195)
> at hudson.model.Build$RunnerImpl.doRun(Build.java:151)
> at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:272)
> at hudson.model.Run.run(Run.java:895)
> at hudson.model.Build.run(Build.java:112)
> at hudson.model.ResourceController.execute(ResourceController.java:93)
> at hudson.model.Executor.run(Executor.java:119)
> Caused by: java.util.concurrent.ExecutionException: hudson.remoting.RequestAbortedException: java.io.EOFException
> at hudson.remoting.Request$1.get(Request.java:188)
> at hudson.remoting.Request$1.get(Request.java:157)
> at hudson.remoting.FutureAdapter.get(FutureAdapter.java:55)
> at hudson.Proc$RemoteProc.join(Proc.java:261)
> ... 9 more
> Caused by: hudson.remoting.RequestAbortedException: java.io.EOFException
> at hudson.remoting.Request.abort(Request.java:223)
> at hudson.remoting.Channel.terminate(Channel.java:528)
> at hudson.remoting.Channel$ReaderThread.run(Channel.java:684)
> Caused by: java.io.EOFException
> at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2554)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1297)
> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
> at hudson.remoting.Channel$ReaderThread.run(Channel.java:665)
> FATAL: Unable to delete script file /tmp/hudson24564.sh
> hudson.util.IOException2: remote file operation failed
> at hudson.FilePath.act(FilePath.java:544)
> at hudson.FilePath.delete(FilePath.java:741)
> at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:94)
> at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:58)
> at hudson.model.Build$RunnerImpl.build(Build.java:195)
> at hudson.model.Build$RunnerImpl.doRun(Build.java:151)
> at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:272)
> at hudson.model.Run.run(Run.java:895)
> at hudson.model.Build.run(Build.java:112)
> at hudson.model.ResourceController.execute(ResourceController.java:93)
> at hudson.model.Executor.run(Executor.java:119)
> Caused by: java.io.IOException: already closed
> at hudson.remoting.Channel.send(Channel.java:342)
> at hudson.remoting.Request.call(Request.java:104)
> at hudson.remoting.Channel.call(Channel.java:481)
> at hudson.FilePath.act(FilePath.java:541)
> ... 10 more
> FATAL: already closed
> java.io.IOException: already closed
> at hudson.remoting.Channel.send(Channel.java:342)
> at hudson.remoting.Request.call(Request.java:104)
> at hudson.remoting.Channel.call(Channel.java:481)
> at hudson.Launcher$RemoteLauncher.kill(Launcher.java:466)
> at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:277)
> at hudson.model.Run.run(Run.java:895)
> at hudson.model.Build.run(Build.java:112)
> at hudson.model.ResourceController.execute(ResourceController.java:93)
> at hudson.model.Executor.run(Executor.java:119)
> However, this is not predictable or reproducible, which makes me think it
> corresponds to an external event such as GC, or even a network or OS event (e.g.
> a TCP error or socket timeout). Anyway, I thought I would put it up here and see if
> anyone else is getting this too.
> I am using Hudson ver. 1.293. The master and slave are both RHEL 4.
> An interesting development occurred when I upgraded recently and then set
> hudson.util.ProcessTreeKiller.disable=true. The jobs were still failing, but the
> underlying process was eventually completing its job successfully (copying a
> large MySQL DB, if you must know). This is the reason I reported this. This hints
> at a bug in Hudson's remoting code.
> --Chad
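
One way to test the idle-connection hypothesis from the description above (it is unconfirmed, and the root cause may lie elsewhere) is to wrap the long-running command so that it emits periodic output during otherwise silent stretches. The following is a minimal sketch, not part of Hudson: the function name `run_with_heartbeat`, the heartbeat text, and the 60-second default interval are all assumptions chosen for illustration.

```python
import subprocess
import sys


def run_with_heartbeat(cmd, interval=60.0, out=sys.stdout):
    """Run cmd and write a heartbeat line to `out` every `interval` seconds.

    Hypothetical helper for diagnosing idle-connection drops.
    Returns a tuple (exit_code, heartbeats_written).
    """
    proc = subprocess.Popen(cmd)
    beats = 0
    while True:
        try:
            # wait() raises TimeoutExpired if the child is still running
            # after `interval` seconds; otherwise it returns the exit code.
            code = proc.wait(timeout=interval)
            return code, beats
        except subprocess.TimeoutExpired:
            beats += 1
            out.write("[heartbeat] still running (%d)\n" % beats)
            out.flush()
```

A build step could call `run_with_heartbeat(["some_long_command"])` (the command name is a placeholder) and exit with the returned code. If the failures stop once output flows regularly, that points toward an idle timeout somewhere between master and slave; if they continue, the cause is more likely elsewhere.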
