Efficiently copying artifacts


Efficiently copying artifacts

Simon Richter
Hi,

I have a project that outputs a few large files (compiled DLL and static
library) as well as a few hundred header files as artifacts for use by
the next project in the dependency chain. Copying these in and out of
workspaces takes quite a long time, and the network link is not even
near capacity, so presumably handling of multiple small files is not
really efficient.

Can this be optimized somehow, e.g. by packing and unpacking the files
for transfer? Manual inspection of artifacts is secondary, I think.

   Simon

--
You received this message because you are subscribed to the Google Groups "Jenkins Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/jenkinsci-users/553AD99B.7060308%40hogyros.de.
For more options, visit https://groups.google.com/d/optout.


RE: Efficiently copying artifacts

Matthew.Webber
Are you using "Archive Artifacts" in the upstream job, and the "Copy Artifact" plugin in the downstream job? This is the standard method.
If so, maybe the upstream job should produce a single zip file, which the downstream job can then get and unzip.
Matthew
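
A minimal sketch of that pack-once approach, assuming plain shell build steps. Directory names and the archive name are made up for illustration, and the demo uses tar+gzip rather than zip, since tar is usually available on build nodes; the idea is identical:

```shell
# Demo: the upstream job packs many small files into ONE archive, so
# "Archive Artifacts" / "Copy Artifact" moves a single file; the
# downstream job unpacks it. All paths below are placeholders.
set -e
work=$(mktemp -d)

# "Upstream" build step: a few hundred headers become one archive.
mkdir -p "$work/upstream/include"
for i in 1 2 3; do echo "// header $i" > "$work/upstream/include/h$i.h"; done
tar -C "$work/upstream" -czf "$work/artifacts.tar.gz" include

# "Downstream" build step: fetch the one archive, then unpack it.
mkdir -p "$work/downstream"
tar -C "$work/downstream" -xzf "$work/artifacts.tar.gz"
ls "$work/downstream/include"
```

In a real job the first tar would run as the upstream job's last build step (with `artifacts.tar.gz` as the archived artifact), and the second tar as the downstream job's first step after Copy Artifact.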

> -----Original Message-----
> From: [hidden email] [mailto:[hidden email]] On Behalf Of Simon Richter
> Sent: 25 April 2015 01:03
> To: [hidden email]
> Subject: Efficiently copying artifacts
>
> Hi,
>
> I have a project that outputs a few large files (compiled DLL and static
> library) as well as a few hundred header files as artifacts for use by
> the next project in the dependency chain. Copying these in and out of
> workspaces takes quite a long time, and the network link is not even
> near capacity, so presumably handling of multiple small files is not
> really efficient.
>
> Can this be optimized somehow, e.g. by packing and unpacking the files
> for transfer? Manual inspection of artifacts is secondary, I think.
>
>    Simon
>

--
This e-mail and any attachments may contain confidential, copyright and or privileged material, and are for the use of the intended addressee only. If you are not the intended addressee or an authorised recipient of the addressee please notify us of receipt by returning the e-mail and do not use, copy, retain, distribute or disclose the information in or attached to the e-mail.
Any opinions expressed within this e-mail are those of the individual and not necessarily of Diamond Light Source Ltd.
Diamond Light Source Ltd. cannot guarantee that this e-mail or any attachments are free from viruses and we cannot accept liability for any damage which you may sustain as a result of software viruses which may be transmitted in or with the message.
Diamond Light Source Limited (company no. 4375679). Registered in England and Wales with its registered office at Diamond House, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom


Re: Efficiently copying artifacts

Maciej Jaros
In reply to this post by Simon Richter

If some of the files remain unchanged between builds, the transfer can be more efficient when you do NOT pack the files. You could, for example, create a repository (SVN) for the artifacts; instead of copying all the files, you would simply run `svn update` and fetch only the changed ones. Another option would be to use rsync for synchronisation, though that might not work as well as SVN.

Regards,
Nux.


RE: Efficiently copying artifacts

Matthew.Webber
Note that in Jenkins, copying files directly from another workspace is an anti-pattern.




Re: Efficiently copying artifacts

Matt Stave
In reply to this post by Matthew.Webber
I found that the standard method is quite slow compared to scp. So I use it to copy just a few small files (plus one with GUIDs for fingerprinting), and for the big ones I do something like:

scp -v ${WORKSPACE}/bigfile.tar.gz user@jenkins_host_name:path_to_jenkins_root/jobs/${JOB_NAME}/builds/${BUILD_ID}/archive/ 2>&1 | tail -n 5

I think there's a ${JENKINS_HOME} or something similar for the path on the master. That copies a 2-3 GB file in roughly 40 seconds instead of something like 4 minutes. There was a fix put in recently (for some Maven plugin, I think) where, when copying files to the master, the master would poll the slave for each packet with far too many round trips; fixing that sped things up a ton. Perhaps there's another fix coming for how other files are transferred.

Since "big" can sometimes be > 8 GB, it would choke the normal archiver, which uses tar under the covers (or at least it did). In any case this is much faster, since pigz is multicore-aware:

tar cf ${WORKSPACE}/bigfile.tar.gz --use-compress-program=pigz [files to pack]

YMMV

--- Matt
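
A self-contained version of Matt's pack-and-unpack step. The pigz-or-gzip fallback is my addition (pigz may not be installed everywhere); the file name and contents are placeholders:

```shell
# Pack with pigz when available (parallel gzip), else fall back to gzip.
# GNU tar invokes the program with -d on extraction, so the same flag
# works both ways.
set -e
GZ=$(command -v pigz || command -v gzip)
work=$(mktemp -d)

echo "payload" > "$work/big.bin"
tar -C "$work" -cf "$work/bigfile.tar.gz" --use-compress-program="$GZ" big.bin

# Receiving side: unpack with the same compressor.
mkdir -p "$work/out"
tar -C "$work/out" -xf "$work/bigfile.tar.gz" --use-compress-program="$GZ"
```

With pigz present, compression scales across cores, which is where the speedup on multi-GB archives comes from.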



Re: Efficiently copying artifacts

Tim Black
I'm trying to do the same, but in both directions (archiving AND copying artifacts from upstream). I wonder how the scp approach to copying artifacts would work in multibranch pipelines? Can one deterministically construct the path to a branch job's artifact folder on the controller's disk?

As I commented here, I'm also seeking massive performance gains by replacing copyArtifact with a shell call in my pipelines. In lieu of installing a proper artifact management system and replacing all archive/copyArtifact with calls to its REST API (which I'll be doing later this year), I'm hoping to find a quick alternative. SCP would be a slam dunk for me if I could construct the source path correctly. The problem is that Jenkins uses an algorithm to create unique folder names, both for workspaces and for branch jobs, and I'm not sure whether that's deterministic.

E.g. to fetch artifacts from the corresponding branch of an upstream multibranch pipeline job with Full project name of "ProjectFolder/MyProject/feature%2Ffoo", in the downstream multibranch pipeline, I would do something like:

scp -r jenkins-controller:<JENKINS_HOME>/jobs/ProjectFolder/jobs/MyProject/branches/<HOW_DO_I_COMPUTE_THE_BRANCH_PORTION_OF_PATH?>/lastSuccessfulBuild/artifact/<GLOB>


Re: Efficiently copying artifacts

Tim Black
To whom it may concern: I ended up finding the code in the Jenkins branch-api plugin that creates that branch path segment (the NameMangler). However, it turned out to be completely unnecessary, since I can just get the build directory (to construct the path to the artifacts on the controller) from the project/job object, obtained by name from the Jenkins instance in Groovy. So my shared library function is even simpler now, works for any project type, and is safer because it doesn't need to use any core or plugin classes.

