Out-of-order Jenkins 2.100 release with JENKINS-48761/JENKINS-48754 fixes
In Jenkins 2.98, released before the new year, I integrated Remoting 3.15 into the weekly. In that Remoting release I introduced some new convenience API methods for hudson.remoting.Callables, and in PR #3145 I adopted these APIs in MasterToSlaveCallable implementations. Such callables are executed on the agent side against the agent's own Remoting version, so no wonder instances with older Remoting versions on agents fell apart. This caused JENKINS-48761 and most likely JENKINS-48754.
SSHLauncher-only instances were fine, and there is a workaround: update Remoting on agents to version 3.15. Docker agent images and the Swarm Plugin Client with Remoting 3.15 were also released along with Jenkins 2.98. Still, the community ratings are pretty bad (33 positive, 15 negative), so after a discussion we decided to go ahead with an out-of-order release. The hotfix, https://github.com/jenkinsci/jenkins/pull/3212, has been integrated, and I hope to get it released soon.
What I did:
- Fixed the API usages that are incompatible with old Remoting versions
- Added tests to the Jenkins core test suite which verify that agents with old Remoting versions can still connect. We now have smoke tests which at least confirm that such an agent is able to connect, execute jobs, and pass node monitors
  - Currently the tests run against Remoting 2.62, which is 1.5 years old
Next action items:
- Extend Remoting integration testing in the core
  - For now, just extra tests to improve coverage of MasterToSlaveCallables and other Remoting-related logic
  - In the longer term, I would like to have a "Remoting Compatibility Tester" test suite
- JENKINS-48766 - Introduce the "Minimum Supported Remoting Version" in the core and in Remoting
  - Current state: any Remoting version can theoretically connect if one of its protocols is accepted by the master. Nobody knows when it will fail due to API incompatibility in master-to-agent calls
  - With the proposed change, Jenkins admins will start getting warnings about unsupported Remoting versions, so that they will be able to investigate the issues proactively
  - The change will help us to set up more testing and maybe advanced static analysis for API compatibility
- Revisit Remoting Upgradeability stories (JENKINS-44099), which could have prevented the issue by keeping agents up to date
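To make the "Minimum Supported Remoting Version" idea from JENKINS-48766 a bit more concrete, here is a minimal sketch in Java of what such a check could look like. It is not the actual Jenkins core or Remoting API: the class RemotingVersionCheck, its methods, the agent names, and the 2.62 threshold (borrowed only from the version the smoke tests currently run against) are all hypothetical. It also assumes plain dotted numeric versions such as 3.15 and does not handle suffixes like -SNAPSHOT.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch only: none of these names exist in Jenkins core or Remoting.
final class RemotingVersionCheck {

    // Assumed threshold, borrowed from the Remoting version used by the smoke tests.
    static final String ASSUMED_MINIMUM = "2.62";

    private RemotingVersionCheck() {
    }

    /** Splits a dotted numeric version such as "3.15" into its numeric components. */
    static List<Integer> parse(String version) {
        List<Integer> parts = new ArrayList<>();
        for (String token : version.trim().split("\\.")) {
            parts.add(Integer.parseInt(token));
        }
        return parts;
    }

    /** Compares versions component by component; missing components count as 0. */
    static int compare(String left, String right) {
        List<Integer> a = parse(left);
        List<Integer> b = parse(right);
        int length = Math.max(a.size(), b.size());
        for (int i = 0; i < length; i++) {
            int x = i < a.size() ? a.get(i) : 0;
            int y = i < b.size() ? b.get(i) : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }

    /** Returns true if the agent's Remoting version is at least the minimum. */
    static boolean isSupported(String agentVersion, String minimum) {
        return compare(agentVersion, minimum) >= 0;
    }

    /** Builds an admin-facing warning for an unsupported agent, or empty if it is fine. */
    static Optional<String> warningFor(String agentName, String agentVersion, String minimum) {
        if (isSupported(agentVersion, minimum)) {
            return Optional.empty();
        }
        return Optional.of("Agent " + agentName + " runs Remoting " + agentVersion
                + ", which is older than the minimum supported version " + minimum);
    }

    public static void main(String[] args) {
        // Made-up agents, used only to exercise the check.
        String[][] agents = {{"agent-ssh", "3.15"}, {"agent-jnlp", "2.53"}};
        for (String[] agent : agents) {
            warningFor(agent[0], agent[1], ASSUMED_MINIMUM).ifPresent(System.out::println);
        }
    }
}
```

The components are compared as numbers rather than as strings, so 3.9 is treated as older than 3.15. The sketch only produces a warning and does not reject the connection, which matches the proposal of letting admins investigate proactively.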
My apologies for any inconvenience I caused. As always, I highly recommend using LTS on production instances to avoid this kind of risk in weekly releases. If you have any feedback or questions, please do not hesitate to respond to this thread or to send me a private message.