New remoting code merged to the trunk

Kohsuke Kawaguchi-2

During the holidays, I've been working in a separate branch on
integrating the remoting support into the main code base.

I've been testing it and deploying it on my production system. It's
still not tested enough (most of the people using my production system
are off too), but I think it's stable enough to be merged back to the
trunk.


With this integration, slaves no longer need to have their workspace
mounted on NFS, and therefore they no longer need to run under the same
UID. Various remoting operations that were previously done over NFS,
such as CVS changelog calculation and "rm -rf", are now done in a much
more network-efficient way.

Previously, Hudson launched every remote command via ssh/rsh. With this
integration, it uses ssh/rsh only once, to launch the slave agent
program. All the other programs are launched via this slave agent.
Because of this, the configuration for slaves has changed slightly. It
used to expect something like:

   ssh hostname

as the launch program, but now you should specify the whole command to
launch the slave agent, like:

   ssh hostname java -jar ~/bin/slave.jar

When you upgrade to the new Hudson, it does this conversion
automatically, but depending on where you put slave.jar, you might need
to adjust the command. Click the help icon next to the launch command to
learn more about this.
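
For illustration only, here is a minimal Java sketch of what such a conversion could look like. It is not Hudson's actual upgrade code: the class name, the method name, and the "already contains -jar" check are all assumptions made for this example.

```java
// Hypothetical sketch; names and logic are illustrative, not Hudson's code.
class LaunchCommandUpgrade {
    /**
     * Turns an old-style launch command such as "ssh hostname" into a
     * new-style one that starts the slave agent. slaveJarPath is wherever
     * slave.jar lives on the remote machine.
     */
    static String upgrade(String oldCommand, String slaveJarPath) {
        String trimmed = oldCommand.trim();
        // Assume a command that already passes -jar has been converted.
        if (trimmed.contains(" -jar ")) {
            return trimmed;
        }
        return trimmed + " java -jar " + slaveJarPath;
    }

    public static void main(String[] args) {
        // prints "ssh hostname java -jar ~/bin/slave.jar"
        System.out.println(upgrade("ssh hostname", "~/bin/slave.jar"));
    }
}
```

If slave.jar lives somewhere else on the slave, only the second argument changes, which is the adjustment mentioned above.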

As usual, if you see any regressions, please let me know. Since this is
a big change, I plan to take a longer "soaking time" before releasing 1.69.

--
Kohsuke Kawaguchi
Sun Microsystems                   [hidden email]

Re: New remoting code merged to the trunk

sjoerdbakker
Hi Kohsuke,
 
I've built the 1.69 locally this morning, and am looking at the new remoting system.
 
So far I like the way it's working, especially that you don't need too much set-up at the remote machine.
 
I do have a few questions:
1. If I choose "Prepare for shutdown" from the manage menu, I would expect the slaves to be terminated; this does not seem to be the case.
2. Is there a possibility to add clients that do not support ssh/rsh, by starting the slave jar on the remote machine? I think this would be nice for the more eccentric platforms (e.g. AS400...).
 
Maybe you have answered the questions already in the previous conversations, in that case, could you please provide me with a link to the answer?
 
Happy New Year,
Sjoerd
 
 
Re: New remoting code merged to the trunk

Kohsuke Kawaguchi-2
Sjoerd Bakker wrote:
> Hi Kohsuke,
>
> I've built the 1.69 locally this morning, and am looking at the new remoting system.

Thanks. Appreciated. Let me know if you encounter any problems.

> So far I like the way it's working, especially that you don't need too much set-up at the remote machine.

Yep. That was one of the goals.

> I do have a few questions:
> 1. If I choose "Prepare for shutdown" from the manage menu, I would
> expect the slaves to be terminated; this does not seem to be the case

I thought slave agents are terminated when you shut down the server (or
stop the Hudson webapp). Even if you hard-kill the server (like "kill
-9"), slave agents detect that and kill themselves.
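
One plausible way for an agent to notice that the server is gone is to watch for end-of-stream on the channel from the master, since the operating system closes a killed process's pipes and sockets. The Java sketch below shows only that idea. Whether the slave agent detects it this way is an assumption, and the class and method names are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical sketch; not the actual slave agent code.
class AgentWatchdog {
    /**
     * Reads from the channel until the other end closes it and returns the
     * number of bytes seen. A real agent would exit at that point.
     */
    static long readUntilMasterGone(InputStream fromMaster) {
        long total = 0;
        byte[] buf = new byte[1024];
        try {
            int n;
            // read() returns -1 once the other end has closed the stream.
            while ((n = fromMaster.read(buf)) != -1) {
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        // A 3-byte in-memory stream stands in for the master's side.
        InputStream fake = new ByteArrayInputStream(new byte[] {1, 2, 3});
        long seen = readUntilMasterGone(fake);
        System.out.println("channel closed after " + seen + " bytes");
    }
}
```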

Is there any reason why you'd prefer slave agents to terminate
themselves during the preparation phase? It can be done, I guess, but it
has some consequences, such as not being able to see the remote
workspace.


> 2. Is there a possibility to add clients that do not support ssh/rsh, by
> starting the slave jar on the remote machine? I think this would be nice
> for the more eccentric platforms (e.g. AS400...)?

Yes. I was thinking about the same thing. For example, launching an
agent on a Windows box from a Unix master is not very easy, so it makes
a lot of sense to do something like you suggested.

The problem right now is how to have the remote agent open a
communication channel to the server.

The first natural choice for the communication channel is HTTP. This
would work quite nicely as it allows zero configuration if a slave
agent is launched over JNLP from the same server (since as a JNLP client
you know where you came from).

The problem is that servlets can't really handle a persistent HTTP
connection, because each such connection ties up a thread, and that
thread has to come from the container's pool rather than from Hudson.
So it makes things tricky.

Another option could be to open a separate TCP port and instruct the
agent to connect to that TCP port. This could fail on networks where
the communication goes through a firewall.
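
As a rough sketch of the separate-port idea, the Java below has a "master" listen on a port and an "agent" connect back to it and announce itself. Everything here is assumed for illustration: the "HELLO" greeting, the class and method names, the loopback address, and the ephemeral port. It is not a proposed Hudson protocol.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; the greeting format is invented for this example.
class TcpAgentHandshake {
    /** Lets one agent connect to a fresh listener; returns the line the master read. */
    static String handshake(String agentName) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the OS for any free port; a real server would use a configured one.
        try (ServerSocket master = new ServerSocket(0, 1, loopback)) {
            master.setSoTimeout(5000); // do not wait forever for the agent
            int port = master.getLocalPort();
            Thread agent = new Thread(() -> {
                // The agent side: connect to the master's port and say who we are.
                try (Socket s = new Socket(loopback, port);
                     PrintWriter out = new PrintWriter(
                             new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true)) {
                    out.println("HELLO " + agentName);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            agent.start();
            // The master side: accept the connection and read the greeting.
            try (Socket conn = master.accept();
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
                String line = in.readLine();
                agent.join();
                return line;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

Because the agent opens the connection, the master never has to reach the slave, but any firewall between them must allow the agent's outbound connection to that port.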

Yet another option might be to emulate a bi-directional byte stream over
HTTP by splitting the bytes into smaller packets and making a separate
HTTP request for each of them. java.net.HttpURLConnection supports HTTP
1.1 keep-alive, so if multiple requests are sent in quick succession,
the whole thing could go over a single TCP connection, and the
performance hit might not be too bad. Nevertheless, this is tricky to
get right, especially since it has to be a bi-directional channel.
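
The packet-splitting part of this idea can be sketched in Java without any HTTP at all. In the sketch below, each Packet stands for the body of one request, and the sequence number lets the receiver restore the order. The class names and the sequence-number scheme are assumptions for the example, and the sketch covers one direction only.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch; one direction only, with no HTTP transport.
class PacketizedStream {
    /** One slice of the stream; in the HTTP scheme each would travel in its own request. */
    static final class Packet {
        final int seq;
        final byte[] payload;

        Packet(int seq, byte[] payload) {
            this.seq = seq;
            this.payload = payload;
        }
    }

    /** Cuts data into packets of at most maxPayload bytes, numbered from 0. */
    static List<Packet> split(byte[] data, int maxPayload) {
        if (maxPayload <= 0) {
            throw new IllegalArgumentException("maxPayload must be positive");
        }
        List<Packet> packets = new ArrayList<>();
        for (int off = 0, seq = 0; off < data.length; off += maxPayload, seq++) {
            int end = Math.min(off + maxPayload, data.length);
            packets.add(new Packet(seq, Arrays.copyOfRange(data, off, end)));
        }
        return packets;
    }

    /** Rebuilds the original bytes, even if the packets arrived out of order. */
    static byte[] reassemble(List<Packet> received) {
        List<Packet> ordered = new ArrayList<>(received);
        ordered.sort(Comparator.comparingInt((Packet p) -> p.seq));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Packet p : ordered) {
            out.write(p.payload, 0, p.payload.length);
        }
        return out.toByteArray();
    }
}
```

A real channel would also need the reverse direction (for example, the agent polling for packets from the server) and handling for lost or repeated requests, which is where it gets tricky.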

So I'm still thinking about this. If you have any good ideas, please let
me know.

I'm somewhat inclined to go with a separate TCP port, mainly because
it's easy to do so.

--
Kohsuke Kawaguchi
Sun Microsystems                   [hidden email]

Re: New remoting code merged to the trunk

sjoerdbakker
Happy New Year!!!

> Sjoerd Bakker wrote:
>> Hi Kohsuke,
>>
>> I've built the 1.69 locally this morning, and am looking at the new
>> remoting system.
>
> Thanks. Appreciated. Let me know if you encounter any problems.

I scheduled a job to execute automatically every few minutes on a remote
system and, before it actually got scheduled, killed the slave process (to
simulate a reboot). Hudson finds out that the slave process is missing and
marks the node "offline". If I then go to the status page of the node and
press the "launch slave agent" button, the node comes online again, but the
scheduler doesn't notice that the node is back (and the build job remains
in the queue). Only after I manually do a "mark offline" and a "mark online"
does the scheduler start using the node again.
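
The expected behaviour can be sketched as a toy Java scheduler that looks at the queue again whenever a node comes online. This is not Hudson's scheduler and makes no claim about the actual cause of the problem; all names are made up for the example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model; not Hudson's queue or node handling.
class TinyScheduler {
    private final Deque<String> queue = new ArrayDeque<>();
    private final Set<String> idleNodes = new LinkedHashSet<>();
    private final Map<String, String> assignments = new LinkedHashMap<>(); // job -> node

    void submit(String job) {
        queue.addLast(job);
        dispatch();
    }

    /** The key point: a node coming back triggers another look at the queue. */
    void nodeOnline(String node) {
        idleNodes.add(node);
        dispatch();
    }

    void nodeOffline(String node) {
        idleNodes.remove(node);
    }

    /** Hands queued jobs to idle nodes until one of the two runs out. */
    private void dispatch() {
        while (!queue.isEmpty() && !idleNodes.isEmpty()) {
            String node = idleNodes.iterator().next();
            idleNodes.remove(node);
            assignments.put(queue.removeFirst(), node);
        }
    }

    String nodeFor(String job) {
        return assignments.get(job);
    }

    int queued() {
        return queue.size();
    }
}
```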

Regards,
Sjoerd

Re: New remoting code merged to the trunk

Kohsuke Kawaguchi-2
Sjoerd Bakker wrote:

> Happy New Year!!!
>
>> Sjoerd Bakker wrote:
>>> Hi Kohsuke,
>>>
>>> I've built the 1.69 locally this morning, and am looking at the new
>>> remoting system.
>>
>> Thanks. Appreciated. Let me know if you encounter any problems.
>
> I scheduled a job to execute automatically every few minutes on a remote
> system and, before it actually got scheduled, killed the slave process (to
> simulate a reboot). Hudson finds out that the slave process is missing and
> marks the node "offline". If I then go to the status page of the node and
> press the "launch slave agent" button, the node comes online again, but the
> scheduler doesn't notice that the node is back (and the build job remains
> in the queue). Only after I manually do a "mark offline" and a "mark online"
> does the scheduler start using the node again.

Thanks. I fixed this.

--
Kohsuke Kawaguchi
Sun Microsystems                   [hidden email]

Re: New remoting code merged to the trunk

sjoerdbakker
I built it this afternoon, and indeed, the symptom is gone. Thanks!
Sjoerd
----- Original Message -----
From: "Kohsuke Kawaguchi" <[hidden email]>
To: <[hidden email]>
Sent: Tuesday, January 02, 2007 7:28 PM
Subject: Re: New remoting code merged to the trunk


> Sjoerd Bakker wrote:
>> Happy New Year!!!
>>
>>> Sjoerd Bakker wrote:
>>>> Hi Kohsuke,
>>>>
>>>> I've built the 1.69 locally this morning, and am looking at the new
>>>> remoting system.
>>>
>>> Thanks. Appreciated. Let me know if you encounter any problems.
>>
>> I scheduled a job to execute automatically every few minutes on a
>> remote system and, before it actually got scheduled, killed the slave
>> process (to simulate a reboot). Hudson finds out that the slave process
>> is missing and marks the node "offline". If I then go to the status page
>> of the node and press the "launch slave agent" button, the node comes
>> online again, but the scheduler doesn't notice that the node is back
>> (and the build job remains in the queue). Only after I manually do a
>> "mark offline" and a "mark online" does the scheduler start using the
>> node again.
>
> Thanks. I fixed this.
>
> --
> Kohsuke Kawaguchi
> Sun Microsystems                   [hidden email]
>
