Reasons for job termination

Fabi
Hi

We've been using Jenkins for years now. Recently a problem has
come up that I can't explain. Jobs started to get terminated for
no apparent reason. With a signal handler I found that it's
apparently the Jenkins user that is sending the SIGTERM to
the running process.
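
Editor's sketch of how such a check could go one step further and report the sender's PID as well as its user. This is not the code used in the original post. It assumes Linux and Python 3, where `signal.sigwaitinfo` and `signal.sigtimedwait` expose `si_pid` and `si_uid` (they are not available on macOS); the helper names `block_sigterm`, `wait_for_sigterm` and `describe_sender` are hypothetical.

```python
import os
import pwd
import signal


def block_sigterm():
    """Block SIGTERM so it stays pending until we explicitly wait for it."""
    signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGTERM})


def wait_for_sigterm(timeout=None):
    """Wait for SIGTERM and return (sender_pid, sender_uid), or None on timeout.

    Assumption: Linux. sigwaitinfo/sigtimedwait are missing on macOS.
    """
    if timeout is None:
        info = signal.sigwaitinfo({signal.SIGTERM})
    else:
        info = signal.sigtimedwait({signal.SIGTERM}, timeout)
        if info is None:
            return None
    return info.si_pid, info.si_uid


def describe_sender(pid, uid):
    """Format the sender for a log line; fall back to the numeric UID."""
    try:
        user = pwd.getpwuid(uid).pw_name
    except KeyError:
        user = str(uid)
    return f"SIGTERM from pid={pid} user={user}"
```

A wrapper around the build step could call `block_sigterm()` at startup, then `wait_for_sigterm()` in a watcher thread and log `describe_sender(...)`. While the sending process is still alive, `ps -o pid,ppid,user,args -p <pid>` shows what it is, which helps tell Jenkins itself apart from another process running as the same user.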

What are reasons for Jenkins to stop a job?

There is no second build being started and it's throttled anyway.
The build timeout plugin is installed but this is a pipeline job
where it doesn't work. And I don't use the timeout options in
the pipeline.
I don't see anything in the Jenkins log at that time.

How can I find out why the job is killed?

Thanks

bye  Fabi

--
You received this message because you are subscribed to the Google Groups "Jenkins Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/jenkinsci-users/20200703071908.98DB040EF397%40macserver.private.

Re: Reasons for job termination

Gianluca
Hi,
what you describe sounds like something we experienced.
The issue in our case was that the Jenkins agents were VMs running on an overloaded host with network issues.
A combination of network errors, unresponsive agents and IP exhaustion caused Jenkins to terminate the jobs with SIGTERM when it was unable to restore the connection to the agent.
It was hard to find because the host running the VMs was only overloaded while the agents were busy, so the sequence was something like:
agent was ok -> agent started to build a job -> job spawned other VMs for testing -> host got overloaded -> agent could not run properly -> Jenkins lost connection to the agent -> job got terminated -> host no longer overloaded -> agent ok again -> Jenkins restored connection to the agent.


Re: Reasons for job termination

Fabi
Thanks for the hint. That's certainly something we can look into. I would have guessed
that a lost connection would show up in the system log, but it might not. At least
I can try to improve the situation now.

Thanks again

Re: Reasons for job termination

Fabi
I just wanted to add my findings in case somebody else is looking for a solution to a similar problem.

It turned out that we have a second Jenkins job running on the same machine, mostly unrelated to
the first job that was getting killed. The second job wants to start a process which can only work
if that process isn't already running. Therefore it looks for processes with a certain name and kills
them if they exist. Unfortunately this pattern also matched a process of the first job and killed
it, assuming it was its own still-running process. And as this didn't have anything to do with Jenkins
it also didn't show up in the logs.
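
Editor's sketch of the failure mode and one possible way around it, not the fix actually applied in this thread. The first helper shows how a name pattern can match more than intended; the others record the PID of the process the job started itself and signal only that PID. All function names, paths and command lines are hypothetical. A PID file can go stale, and PIDs can be reused, so this is a minimal illustration rather than a complete solution.

```python
import os
import signal
from pathlib import Path


def matches_by_name(pattern, command_lines):
    """Name-based matching: returns every command line containing `pattern`."""
    return [cmd for cmd in command_lines if pattern in cmd]


def write_pid_file(path, pid):
    """Record the PID of the process this job started itself."""
    Path(path).write_text(f"{pid}\n")


def read_pid_file(path):
    """Return the recorded PID, or None if the file is missing or malformed."""
    try:
        return int(Path(path).read_text().strip())
    except (FileNotFoundError, ValueError):
        return None


def signal_recorded_process(path, sig=signal.SIGTERM):
    """Signal only the recorded PID; return True if the signal was delivered."""
    pid = read_pid_file(path)
    if pid is None:
        return False
    try:
        os.kill(pid, sig)
    except ProcessLookupError:
        return False  # stale PID file, the process is already gone
    return True
```

For example, `matches_by_name("testserver", ["/opt/job2/bin/testserver --port 8080", "/opt/job1/run-testserver-checks.sh"])` returns both entries, although only the first one belongs to the second job. With a PID file, the second job would only ever signal the process it recorded.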

So it wasn't a Jenkins error or resource problem but simply human error.

Thanks for any help and sorry for the noise.
