Speed up Artifact Copy between slave and master


Speed up Artifact Copy between slave and master

Marcelo Brunken
Hello,

There are already a few tickets about this problem. Our bottleneck is the copy process between slave and master. Is a solution on the way? Is someone working on it?
I am trying to figure out how it could be made faster. I think changing the transfer protocol might help; HTTP sucks. (I am almost sure the artifacts are sent via HTTP.)

Thanks

Re: Speed up Artifact Copy between slave and master

David Karlsen

It is also slow over SSH. I saw a fix and a pull request for it here the other day that uses TCP nodelay. It has not been applied yet, AFAIK.

On 19 Oct 2011 at 11:36, "Marcelo Brunken" <[hidden email]> wrote:
> Hello,
>
> There are already a few tickets about this problem. Our bottleneck is the copy process between slave and master. Is a solution on the way? Is someone working on it?
> I am trying to figure out how it could be made faster. I think changing the transfer protocol might help; HTTP sucks. (I am almost sure the artifacts are sent via HTTP.)
>
> Thanks

Re: Speed up Artifact Copy between slave and master

Marcelo Brunken
Any idea when that release comes out?

2011/10/19 David Karlsen <[hidden email]>

It is also slow over ssh. I saw a fix and pull request for it here the other day - by using TCP nodelay. It has not been applied yet AFAIK.


Re: Speed up Artifact Copy between slave and master

David Karlsen
No idea.
I don't even know whether the pull request was handled and merged into master.

2011/10/21 Marcelo Brunken <[hidden email]>:

> Any Ideas when that release comes out ?



--
--
David J. M. Karlsen - http://www.linkedin.com/in/davidkarlsen

Re: Speed up Artifact Copy between slave and master

Tim Black
Reviving this very old thread, since this is still very much a problem in Jenkins core a decade later. As I commented here, I'm seeing massive (~13x) performance gains by replacing copyArtifact with a shell call to curl or wget in my pipelines. 

As I understand it, copyArtifact uses a single Jenkins "control channel", which has severely limited I/O and/or CPU resources, and this has been so as far back as I can see. This not only causes sluggish copying of artifacts from controller to agent, but is also a major factor in the similarly abysmal performance of archiving artifacts in the other direction (artifact compression being the other factor).

I am experimenting with workarounds. In lieu of installing a proper artifact management system and replacing all archive/copyArtifact with calls to its REST API (which I'll be doing later this year), I'm hoping to find a quick alternative. The two I'm considering ATM are:
  1. HTTP GET each artifact URL in question via curl, wget, etc.
    1. This is nice because it can use the same semantics I was already using with copyArtifact, that is, jobName, branchName, and the lastSuccessfulBuild symlinks.
    2. This is great for known individual artifacts, but HTTP requires significant extra complexity to fetch whole artifact folders or artifacts matching a wildcard/regex, as copyArtifact supports. HTTP has no notion of a directory, so you have to pre-process by fetching an artifact index page, parsing it, and looping over the results.
      1. This guy said that Jenkins supports fetching a zip of any folder over HTTP, but that's not working for me on Jenkins 2.249.2.
    3. Another problem here is that you have to deal with Jenkins authentication / API tokens.
  2. SCP/rsync support rich file/directory pattern matching, but:
    1. They require knowledge of the location of the artifacts on the controller's disk. This is non-trivial for multibranch pipeline projects (which I use liberally). scp would be an obvious choice if I could figure out how to deterministically construct the path to a multibranch pipeline branch job on the controller's disk.
    2. Authentication is trivial since all users/config in my Jenkins infra are managed by Ansible, so my jenkins user can automatically ssh to any other node in the infra without a password.

Any insight into ways of replacing copyArtifact with curl/scp would be greatly appreciated. Thanks for your time.
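
A minimal sketch of option 1, assuming the usual Jenkins URL layout in which every level of the full project name is prefixed with `job/` and a literal `%` in a branch job name is itself percent-encoded in the URL. The function names, the `JENKINS_USER` and `JENKINS_API_TOKEN` variables, and the example host are hypothetical. The glob filter stands in for copyArtifact's pattern matching and expects the list of relative artifact paths on stdin, however that list was obtained (for example from the build's JSON API).

```sh
#!/bin/sh
# Sketch only: function names, variables and hosts are illustrative assumptions.

# Turn a full project name such as "ProjectFolder/MyProject/feature%2Ffoo"
# into the URL path "job/ProjectFolder/job/MyProject/job/feature%252Ffoo".
# Only "%" and spaces are escaped; other special characters are not handled.
job_url_path() {
  printf '%s' "$1" | sed -e 's/%/%25/g' -e 's/ /%20/g' -e 's#/#/job/#g' -e 's#^#job/#'
}

# Build the URL of one artifact of the last successful build.
# $1: Jenkins base URL, $2: full project name, $3: artifact path relative to the artifact root
artifact_url() {
  printf '%s/%s/lastSuccessfulBuild/artifact/%s\n' "${1%/}" "$(job_url_path "$2")" "$3"
}

# Keep only the relative paths (one per line on stdin) that match the glob in $1.
# Note: inside a case pattern, "*" also matches "/".
filter_artifacts() {
  while IFS= read -r rel; do
    case "$rel" in
      $1) printf '%s\n' "$rel" ;;
    esac
  done
}

# Download one artifact using HTTP basic auth (user name plus API token).
# $4: local destination file. JENKINS_USER and JENKINS_API_TOKEN must be set by the caller.
fetch_artifact() {
  curl --fail --silent --show-error --location \
       --user "${JENKINS_USER}:${JENKINS_API_TOKEN}" \
       --output "$4" "$(artifact_url "$1" "$2" "$3")"
}
```

A call might look like `fetch_artifact https://jenkins.example.com 'ProjectFolder/MyProject/feature%2Ffoo' dist/app.tar.gz app.tar.gz`. Fetching a whole folder then becomes a loop over the output of `filter_artifacts`, which is the extra complexity mentioned in point 1.2.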
On Friday, October 21, 2011 at 7:44:50 AM UTC-7 David Karlsen wrote:
No idea.
Not even if the pull request was handled and put onto master.


--
You received this message because you are subscribed to the Google Groups "Jenkins Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/jenkinsci-dev/115f6473-9bff-4a95-b89f-d29579a51082n%40googlegroups.com.

Re: Speed up Artifact Copy between slave and master

Tim Black
Refining my request a bit further: SCP as a copyArtifact alternative would be a slam dunk for me if I could construct the source path correctly. The problem is that I use multibranch pipelines liberally, and Jenkins uses an algorithm to create a unique folder name both for workspaces and for branch jobs. I'm not sure whether that algorithm is consistent, so I don't know whether it would be safe to reconstruct and reference job paths on the controller's disk.

E.g., say I want to fetch artifacts from the corresponding branch of an upstream multibranch pipeline job whose full project name is "ProjectFolder/MyProject/feature%2Ffoo". In the downstream multibranch pipeline, I would do something like:

scp -r jenkins-controller:<JENKINS_HOME>/jobs/ProjectFolder/jobs/MyProject/branches/<HOW_DO_I_COMPUTE_THE_BRANCH_PORTION_OF_PATH?>/lastSuccessfulBuild/artifact/<GLOB> ./
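
The fixed part of that source path can be assembled mechanically. A minimal sketch, assuming the `jobs/<folder>/jobs/<project>/branches/<dir>` layout shown in the command above; the function name and example values are hypothetical, and the on-disk branch directory name is taken as an input because how to compute it is exactly the open question.

```sh
#!/bin/sh
# Sketch only: the layout is taken from the scp command above, the names are illustrative.

# Print the on-disk directory of one branch job.
# $1: JENKINS_HOME on the controller
# $2: full name of the multibranch project, folders separated by "/"
# $3: on-disk directory name of the branch (NOT computed here)
branch_job_dir() {
  nested=$(printf '%s' "$2" | sed -e 's#/#/jobs/#g')
  printf '%s/jobs/%s/branches/%s\n' "${1%/}" "$nested" "$3"
}
```

For example, `branch_job_dir /var/lib/jenkins ProjectFolder/MyProject some-branch-dir` prints the directory to which the build and artifact portion of the path would be appended.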


Re: Speed up Artifact Copy between slave and master

Jesse Glick-4
In reply to this post by Tim Black
On Mon, Mar 1, 2021 at 4:50 PM Tim Black <[hidden email]> wrote:
In lieu of installing a proper artifact management system and replacing all archive/copyArtifact with calls to its REST API

Suggest https://plugins.jenkins.io/artifact-manager-s3/ (or some other JEP-202 implementation) instead. 


Re: Speed up Artifact Copy between slave and master

Tim Black
I agree external artifact mgmt is the way to go, and I'll be doing that in some work later this year. (Thanks for the link to JEP-202, I've already learned a lot from skimming it.)

I have now gleaned that the branch segment of a multibranch job path is the same for all jobs using a given branch. This indicates there's a common algorithm computing it from the branch name. Can anyone point me in the right direction in the source code where the multibranch job path is created?

If I can compute that from the downstream job, I can then fully form the src path to pass to scp and I'm done. 
On Monday, March 1, 2021 at 2:49:13 PM UTC-8 Jesse Glick wrote:
On Mon, Mar 1, 2021 at 4:50 PM Tim Black <[hidden email]> wrote:
In lieu of installing a proper artifact management system and replacing all archive/copyArtifact with calls to its REST API

Suggest https://plugins.jenkins.io/artifact-manager-s3/ (or some other JEP-202 implementation) instead. 


Re: Speed up Artifact Copy between slave and master

Jesse Glick-4
On Mon, Mar 1, 2021 at 6:22 PM Tim Black <[hidden email]> wrote:
Can anyone point in the right direction in the source code where the multibranch job path is created?

Look in the `branch-api` plugin. 


Re: Speed up Artifact Copy between slave and master

Tim Black
Thanks. I see where the directory name is constructed for a workspace: 

https://github.com/jenkinsci/branch-api-plugin/blob/7d005e70758d4b5eb5d48d918aed4e32d5345857/src/main/java/jenkins/branch/WorkspaceLocatorImpl.java#L386

But not where the branch job dir is created. Any help?

On Tuesday, March 2, 2021 at 9:48:33 AM UTC-8 Jesse Glick wrote:
On Mon, Mar 1, 2021 at 6:22 PM Tim Black <[hidden email]> wrote:
Can anyone point in the right direction in the source code where the multibranch job path is created?

Look in the `branch-api` plugin. 


Re: Speed up Artifact Copy between slave and master

Jesse Glick-4
The principal class to look at is `MultiBranchProject`.


Re: Speed up Artifact Copy between slave and master

Tim Black
Think I found it: NameMangler.apply(). Would it be possible/advisable to import the NameMangler class in my Shared Library vars/scpArtifacts.groovy (assuming my Jenkins instance has the branch-api plugin installed, which it does)? Something like this:

```
import jenkins.branch.NameMangler
def mangled_branch_name = NameMangler.apply(branch_name)
```

I'll try this out in the morning, just curious if anyone can confirm whether this looks feasible or I'm way off track. Thanks.

On Tuesday, March 2, 2021 at 8:03:57 PM UTC-8 Jesse Glick wrote:
The principal class to look at is `MultiBranchProject`.


Re: Speed up Artifact Copy between slave and master

Baptiste MATHUS


On Wed, 3 Mar 2021 at 06:44, Tim Black <[hidden email]> wrote:
Think I found it: NameMangler.apply(). Would it be possible/advised to import the NameMangler class in my Shared Library vars/scpArtifacts.groovy (assuming my Jenkins instance has branch-api plugin installed, which it does.) Something like this:

```
import jenkins.branch.NameMangler
def mangled_branch_name = NameMangler.apply(branch_name)
```

I'll try this out in the morning, just curious if anyone can confirm whether this looks feasible or I'm way off track. Thanks.

Using core Java classes from a Jenkins pipeline shared library is generally strongly discouraged.
This could break from one day to the next without notice.
What you're working on looks like it should rather be done in a full-blown Jenkins plugin.
(If not, this discussion belongs on the users mailing list.)
 


Re: Speed up Artifact Copy between slave and master

Tim Black
Points taken; I'm not surprised by your response. As I have no intention of writing a plugin for this, I'll wrap this up and think about presenting it on the users list (where I already have two posts that still have no responses).

My case is a bit exceptional in that I'm in complete control of all configuration of all Jenkins clusters at my company, which are configured using Ansible and Configuration as Code. The Jenkins version and plugin versions are all locked in, so there should be no surprises if the NameMangler class changes in a subsequent release.

Importing and using the NameMangler class worked just fine. However, it turned out to be completely unnecessary, since I can just get the build directory (to construct the path to the artifacts on the controller) from the project/job object. So my shared library function is even simpler now, and safer, because it doesn't need to use any core or plugin classes.

Most importantly, I've now got a robust and highly performant workaround for the old issue of very sluggish copying of artifacts. I'm getting about a 12-15x performance boost here. (We have several large artifacts.) Thanks for your time.
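
For reference, the scp side of such a workaround might look like the sketch below. It assumes the build directory on the controller is already known (for example passed in from the pipeline), that the default artifact storage is in use so archived files sit in the build's `archive` subdirectory, and that passwordless ssh to the controller works as described earlier in the thread. The function names and example paths are hypothetical; this is not the shared library function from this post.

```sh
#!/bin/sh
# Sketch only: not the shared library code from this thread.

# Build the scp source argument for the artifacts of one build.
# $1: controller host
# $2: build directory on the controller
# $3: glob relative to the build's archive directory (assumed default storage)
artifact_source() {
  printf '%s:%s/archive/%s\n' "$1" "${2%/}" "$3"
}

# Copy the matching artifacts into the local directory $4.
# The glob stays quoted locally so that it is expanded on the remote side.
copy_artifacts() {
  scp -r -p "$(artifact_source "$1" "$2" "$3")" "$4"
}
```

A call might look like `copy_artifacts jenkins-controller /var/lib/jenkins/jobs/MyJob/builds/42 'dist/*.bin' ./artifacts`, where the build directory is whatever the job object reported.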

On Tuesday, March 2, 2021 at 11:38:17 PM UTC-8 [hidden email] wrote:


Using core java classes from Jenkins pipeline shared library is generally strongly discouraged.
This could break from one day to another without notice.
What you're working on looks like it should rather be done in a full-blown Jenkins plugin.
(If not, this discussion should be on the users mailing list)
 
