[Idea] Improving master mem/cpu footprint/visualization when running pipelines on nodes

Rodrigo Ghirelli de Queiroz
Hi all,


I am a heavy Jenkins user; I help companies build pipelines distributed across several servers.

I find it very hard to tell what is really using the master's CPU versus the nodes' CPU when developing pipelines (either DSL or scripted). I think the documentation does not make the separation between them clear enough. I recently found a recommendation in the documentation to avoid the HttpRequest plugin because it uses the master's resources. I wonder how many other mistakes I have made in the last few months.

As a pipeline (and library) developer, I want to keep the master's CPU usage as low as possible and use as little memory as possible, so that the master does only the essential coordination.

What I think would greatly improve pipeline quality is to let developers know (without having to dig into the Jenkins source) how this coordination is done. I myself have never dug deep enough to understand all the underlying communication.

These are the things I think could be improved a lot:

1 - Make it clearer to developers which steps really require master resources beyond plain coordination (e.g. an HTTP request is made from the master even if the step is called on a node). The idea is: if developers are aware of that, they can look directly for other options. Maybe we could mark steps in the documentation with something like: node resources / master resources / uses both (very inefficient).

2 - Shared variables between steps. So far I see that developers create global variables to communicate between steps. Shouldn’t there be a “recommended” way, or even better a special syntax, to make clear how to do this more efficiently? My idea is that I would not like the master to keep these variables in a DSL script. Sometimes the steps run on a node, so I would like the variable's scope to be defined and “run” on that node only. This would reduce the master's memory footprint. Even if we need to keep global variables on the master, instead of keeping these variables “alive” we could add a syntax to “send value to master” and serialize the value to the stash, state control files, or whatever, so that it is clear when that happens, instead of keeping 10k instances of variables alive in the master for each node that uses them. The master could then serialize a “state”, save the updated variable, and unload it. Coordination for this variable would happen only when needed (e.g. DSL pipelines that run on different nodes and share common variables).

3 - Master / node signaling. The Jenkins UI does not make clear how heavily the Jenkins master is using resources to coordinate tasks on nodes. We do have executors, but that is all. Also, when it is processing something, bars show on the master as well, but what it is really doing and which resources it is using are not clear. What would be interesting to see: the master's internal queue and blocked resources, which plugins/steps are causing the blocking, and the memory used for each pipeline running on nodes (for example, if I am running 10k pipelines that each use a 1 MB variable, I would certainly like to review that... or remove the Jenkins Config History plugin because it is blocking everything, etc.). I am sure there is intrinsic coordination that Jenkins does under the hood, but that should be visible to users, even if that means an advanced button or similar; it is mostly developers who use these features anyway.

4 - Caching libraries / jars. Libraries are so cool. I don’t understand why caching them is still not in LTS. And that is not all: the same goes for adding jars (for example, I use jedis in my library to communicate with Redis). It would be cool to coordinate that inside Jenkins instead of adding them to my Java classpath, etc.

This is my first time writing to the group, so please let me know if I am writing in the wrong place or using the wrong channel.


Rodrigo Queiroz
[hidden email]

--
You received this message because you are subscribed to the Google Groups "Jenkins Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/jenkinsci-dev/3AB55E89-946E-4385-B312-2A45D5957F50%40gmail.com.

Re: [Idea] Improving master mem/cpu footprint/visualization when running pipelines on nodes

Gavin Mogan
Is this a list of things you want to work on? Totally on board, they sound like really great additions.
Is this a list of things you want others to work on? We are all volunteers, and while you can create tickets as suggestions for what people should do next, there are no guarantees.

Re: [Idea] Improving master mem/cpu footprint/visualization when running pipelines on nodes

Rodrigo Ghirelli de Queiroz
Yes, I totally understand that. I do not have a lot of free time either, but I wanted to share my ideas, at least to get some feedback.
I wanted to see if this is completely unreasonable or if it is something others see as useful as well.

Thanks for your feedback!
I am definitely going to start working on them!

Re: [Idea] Improving master mem/cpu footprint/visualization when running pipelines on nodes

Jesse Glick-4
I would suggest breaking this up, taking one thing at a time, starting small.
