While we managed to publish everything on time, I must admit that it was more painful than I expected.
So what happened?
Yesterday, as part of the release procedure, we triggered the Maven release job, but we were quickly hit by timeout errors between the Jenkins agent and the controller. Since we had faced the same issue on infra.ci last week, we had an idea of how to mitigate it. With the help of Gareth and Damien, we quickly applied the same mitigation we had used on infra.ci, which was switching the agent connection from a Jenkins tunnel to a WebSocket connection, as defined in this PR. That allowed us to finish the first part of the release.
Today we met again with Daniel Beck to finalize the release by building and publishing the packages, but this time we were hit by two additional issues.
First, we had a WebSocket connection timeout error, which we solved by increasing the timeout from 30 seconds to 60 seconds, as defined in this PR.
The second issue was that we couldn't upload the Windows artifacts from the Jenkins agent to pkg.jenkins.io using ssh-agent. The problem seems to be related to the latest ssh-agent plugin version, which removed the non-exec-based agent factories according to the changelog :p.
Thanks to Gareth's recent work on building a custom Jenkins image with our plugins, I could quickly pin the previous plugin version in this PR.
So we first focused on publishing the packages for the non-Windows distributions, and then I published the Windows ones.
During this process, I also identified two additional issues:
1) Our monitoring didn't detect that the latest Jenkins version wasn't available from get.jenkins.io, which seems to be a regression.
2) I had to manually trigger a mirrorbits scan to enable the mirrors; I don't know yet why it didn't scan them automatically.
So what's next?
Next week we'll update the AKS cluster used by the release environment, hoping that it will solve the network issues we currently have.
Regarding the ssh-agent issue with the Windows containers, I am looking for someone with better Windows skills than mine :) to save me some debugging time.
> On 7. Apr 2021, at 20:57, 'Olblak' via Jenkins Infrastructure <[hidden email]> wrote:
> So what's next?
Please consider adding the ability to stage packages to your plans. Both release-day issues would have been noticed sooner (leaving more time to fix them), or bypassed entirely, if all we needed to do on release day was a very basic copy operation.