Automatically re-running a remote job without failing it, if the connection to the node drops

classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

Automatically re-running a remote job without failing it, if the connection to the node drops

Alexandru Băluț-2
I see jobs failing with "FATAL: command execution failed" caused by java.io.EOFException. I understand the connection to the remote node drops, so java.io.EOFException is raised. Is it possible to re-run the job remotely without failing it at all? I've seen the Naginator plugin, but it seems to start a new separate job if the current one fails, which is not what I want. The failure would bubble up to the parent job which started it and cause it to fail.

Or is it possible for the parent job to check the status of the child jobs started by it, and "manually" restart the ones failing due to java.io.EOFException until they succeed?


Building remotely on instance-1 (tag1) in workspace /var/lib/jenkins/workspace/eval
[vmu-eval-single] $ /bin/sh -xe /tmp/jenkins4616924287086740166.sh
+ /path/to/evaluation-tool
FATAL: command execution failed
java.io.EOFException
at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2681)
at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3156)
at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:862)
at java.io.ObjectInputStream.<init>(ObjectInputStream.java:358)
at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:49)
at hudson.remoting.Command.readFrom(Command.java:140)
at hudson.remoting.Command.readFrom(Command.java:126)
at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:36)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)
Caused: java.io.IOException: Unexpected termination of the channel
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:77)
Caused: java.io.IOException: Backing channel 'instance-1' is disconnected.
at hudson.remoting.RemoteInvocationHandler.channelOrFail(RemoteInvocationHandler.java:214)
at hudson.remoting.RemoteInvocationHandler.invoke(RemoteInvocationHandler.java:283)
at com.sun.proxy.$Proxy78.isAlive(Unknown Source)
at hudson.Launcher$RemoteLauncher$ProcImpl.isAlive(Launcher.java:1144)
at hudson.Launcher$RemoteLauncher$ProcImpl.join(Launcher.java:1136)
at hudson.tasks.CommandInterpreter.join(CommandInterpreter.java:155)
at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:109)
at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:66)
at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:744)
at hudson.model.Build$BuildExecution.build(Build.java:206)
at hudson.model.Build$BuildExecution.doRun(Build.java:163)
at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:504)
at hudson.model.Run.execute(Run.java:1816)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
at hudson.model.ResourceController.execute(ResourceController.java:97)
at hudson.model.Executor.run(Executor.java:429)
FATAL: Unable to delete script file /tmp/jenkins4616924287086740166.sh
java.io.EOFException
at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2681)
at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3156)
at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:862)
at java.io.ObjectInputStream.<init>(ObjectInputStream.java:358)
at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:49)
at hudson.remoting.Command.readFrom(Command.java:140)
at hudson.remoting.Command.readFrom(Command.java:126)
at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:36)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)
Caused: java.io.IOException: Unexpected termination of the channel
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:77)
Caused: hudson.remoting.ChannelClosedException: Channel "unknown": Remote call on instance-eval-worker-25 failed. The channel is closing down or has closed down
at hudson.remoting.Channel.call(Channel.java:950)
at hudson.FilePath.act(FilePath.java:1069)
at hudson.FilePath.act(FilePath.java:1058)
at hudson.FilePath.delete(FilePath.java:1539)
at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:123)
at hudson.tasks.CommandInterpreter.perform(CommandInterpreter.java:66)
at hudson.tasks.BuildStepMonitor$1.perform(BuildStepMonitor.java:20)
at hudson.model.AbstractBuild$AbstractBuildExecution.perform(AbstractBuild.java:744)
at hudson.model.Build$BuildExecution.build(Build.java:206)
at hudson.model.Build$BuildExecution.doRun(Build.java:163)
at hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:504)
at hudson.model.Run.execute(Run.java:1816)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
at hudson.model.ResourceController.execute(ResourceController.java:97)
at hudson.model.Executor.run(Executor.java:429)
Build step 'Execute shell' marked build as failure
Finished: FAILURE

--
You received this message because you are subscribed to the Google Groups "Jenkins Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/jenkinsci-users/c4aaddae-b82e-448a-893a-844bc50d515c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.