Hudson incompatible with virtual machines?

owen.synge
Dear Hudson People,

I have a set of virtual machines (RHE) which I want to reinstall and
then use for deployment testing followed by functional testing.

Unfortunately, Hudson seems not to like hosts going on and off line.

I have tried many approaches.
 1) Using JNLP, I found that the nodes crash on first connection 1 in 10
    times.
 1.1) Wrote a script (Python) to auto-restart crashed headless JNLP
      clients.
 1.2) Discovered that the first job always fails with the error message
      below*.
 1.3) Experimented with time delays, keeping the node off line for a long
      time, etc.; no luck.

 2) Used to use the ssh client to auto-launch the node on demand.
 2.1) Nodes only worked once; then, when the host rebooted (even if off
      line in the Hudson web GUI), they got marked as off line and needed
      bringing on line manually.

I used to have this all working (with ssh clients, before they broke).
All I want is to deploy and test our distributed Java storage server on
multiple nodes for functional testing, and I am very confused about why
this should be hard.

All pointers welcome.

Regards

Owen Synge


* JNLP error message that occurs on the first job after a node is
reinstalled; following jobs succeed.

Started by upstream project "dCacheTestVmDeploy" build number 477
Building remotely on root_villach
FATAL: hudson.remoting.RequestAbortedException: java.net.SocketException: Connection reset
hudson.remoting.RequestAbortedException: hudson.remoting.RequestAbortedException: java.net.SocketException: Connection reset
    at hudson.remoting.Request.call(Request.java:131)
    at hudson.remoting.Channel.call(Channel.java:514)
    at hudson.FilePath.act(FilePath.java:667)
    at hudson.FilePath.act(FilePath.java:660)
    at hudson.FilePath.mkdirs(FilePath.java:724)
    at hudson.model.AbstractProject.checkout(AbstractProject.java:831)
    at hudson.model.AbstractBuild$AbstractRunner.checkout(AbstractBuild.java:314)
    at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:266)
    at hudson.model.Run.run(Run.java:949)
    at hudson.model.Build.run(Build.java:112)
    at hudson.model.ResourceController.execute(ResourceController.java:93)
    at hudson.model.Executor.run(Executor.java:116)
Caused by: hudson.remoting.RequestAbortedException: java.net.SocketException: Connection reset
    at hudson.remoting.Request.abort(Request.java:223)
    at hudson.remoting.Channel.terminate(Channel.java:561)
    at hudson.remoting.Channel$ReaderThread.run(Channel.java:819)
Caused by: java.net.SocketException: Connection reset
    at java.net.SocketInputStream.read(SocketInputStream.java:168)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:235)
    at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2200)
    at java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2490)
    at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2500)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1267)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:339)
    at hudson.remoting.Channel$ReaderThread.run(Channel.java:800)

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: Hudson incompatible with virtual machines?

Kohsuke Kawaguchi
Administrator
2009/8/4  <[hidden email]>:
> Dear Hudson People,
>
> I have a set of virtual machines (RHE) which I want to reinstall and
> then use for deployment testing followed by functional testing.
>
> Unfortunately, Hudson seems not to like hosts going on and off line.

It's actually implemented to allow this. For example, we have the EC2
plugin, where Hudson adjusts the number of nodes based on the system load.

> I have tried many approaches.
>  1) Using JNLP, I found that the nodes crash on first connection 1 in 10
>     times.
>  1.1) Wrote a script (Python) to auto-restart crashed headless JNLP
>       clients.
>  1.2) Discovered that the first job always fails with the error message
>       below*.
>  1.3) Experimented with time delays, keeping the node off line for a long
>       time, etc.; no luck.

Your stack trace didn't include newlines, which makes it very hard for
me to read. Can you repost it, perhaps as an attachment?

Also, when you say "nodes crash on first connection", what is it that
crashes? The slave JVM? What errors does that leave?


>  2) Used to use the ssh client to auto-launch the node on demand.
>  2.1) Nodes only worked once; then, when the host rebooted (even if off
>       line in the Hudson web GUI), they got marked as off line and needed
>       bringing on line manually.

If you increase the log level, I think we should be able to tell why
this is happening. You can also disable this behavior by going to
http://server/hudson/computer/configure and disabling all the node
monitoring.

That way, Hudson will stop remembering that it had trouble using
the machine (which is why it marked the slave offline to get your
attention).

--
Kohsuke Kawaguchi


Re: Hudson incompatible with virtual machines?

owen.synge

Dear Kohsuke and Hudson people.

> > Unfortunately, Hudson seems not to like hosts going on and off line.
>
> It's actually implemented to allow this. For example, we have the EC2
> plugin, where Hudson adjusts the number of nodes based on the system load.

Great, then there is hope :)

For your EC2 plugin you must use clients inside the VM to connect to
Hudson. How do you manage the client connections to Hudson?

> > I have tried many approaches.
> >  1) Using JNLP, I found that the nodes crash on first connection 1 in 10
> >     times.
> >  1.1) Wrote a script (Python) to auto-restart crashed headless JNLP
> >       clients.
> >  1.2) Discovered that the first job always fails with the error message
> >       below*.
> >  1.3) Experimented with time delays, keeping the node off line for a long
> >       time, etc.; no luck.

OK, the log files on the client look like this:

INFO: Connecting to svn.dcache.org:56300
Aug 4, 2009 5:02:21 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Handshaking
Aug 4, 2009 5:02:21 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connected
Aug 4, 2009 5:04:11 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Terminated

The clients periodically go off line, and I have no clue why. This is
why I have Python restart them.
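
A restart wrapper of the kind described above could look roughly like the sketch below. It is not the script used in this thread (that one was never posted). The function name, the restart limit, the delay, and the rule that exit code 0 means "intentional shutdown" are all assumptions made for illustration.

```python
import subprocess
import time


def run_with_restart(cmd, max_restarts=3, delay=1.0):
    """Run cmd and restart it whenever it exits with a non-zero code.

    cmd is an argument list as accepted by subprocess.call.
    Returns the list of exit codes observed, one per run.
    """
    exit_codes = []
    for attempt in range(max_restarts + 1):
        # Blocks until the client process has exited.
        code = subprocess.call(cmd)
        exit_codes.append(code)
        if code == 0:
            # Assumption: a clean exit is an intentional shutdown,
            # so the client is not restarted.
            break
        if attempt < max_restarts:
            # Short pause so a client that keeps failing does not spin.
            time.sleep(delay)
    return exit_codes
```

In practice cmd would be the java command line that starts the headless JNLP client, and max_restarts would be chosen to suit the site.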

Does Hudson have a command-line option to add more logging on the client?

On most nodes I run two clients, one as root for testing root
operations, one as a user for testing client operations.
 
> Your stack trace didn't include newlines, which makes it very hard for
> me to read. Can you repost it, perhaps as an attachment?

Yes, definitely. I have added it as an attachment, but it looks a little
uninformative, as it seems to be just a socket error.

> Also, when you say "nodes crash on first connection", what is it that
> crashes? The slave JVM? What errors does that leave?

The job, even if it is just executing "sleep" as a script.

> >  2) Used to use the ssh client to auto-launch the node on demand.
> >  2.1) Nodes only worked once; then, when the host rebooted (even if off
> >       line in the Hudson web GUI), they got marked as off line and needed
> >       bringing on line manually.
>
> If you increase the log level, I think we should be able to tell why
> this is happening. You can also disable this behaviour by going to
> http://server/hudson/computer/configure and disabling all the node
> monitoring.

OK, we will increase the logging, but I can't see how to disable the
behaviour where the node gets stuck off line and requires a restart with
the ssh client. I only see these options on the web page, and all were
enabled:

Preventive Node Monitoring
 Response Time
 Free Swap Space
 Architecture
 Clock Difference

I have now disabled them all, but that did not prevent jobs from
crashing with the JNLP client when a job is run for the first time. I may
try this again with the ssh plugin method.

> That way, Hudson will stop remembering that it had trouble using
> the machine (which is why it marked the slave off line to get your
> attention).

Unfortunately, Hudson did not log this when I used the web-page-based
logging.

Log level == All

Thank you for your help; you give me hope that I can make nodes do fresh
deployment testing with Hudson.

Owen


stacktrace (2K) Download Attachment
Re: Hudson incompatible with virtual machines?

owen.synge
Dear all,

The missing magic was to have the host OS control connecting and
disconnecting, and to shut down the Hudson slave cleanly. For this I
used the JNLP launching method:

/usr/bin/java -jar /var/lib/hudson/slave.jar \
  -jnlpUrl \
http://example.org/build/computer/${user}_${hostname}/slave-agent.jnlp

and then shut down each process nicely with:

kill -s 10 HUDSONPID

This is critical: if you kill the process in a less nice way, Hudson
will get upset and you will need to interact with Hudson.
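
From a Python wrapper that started the slave itself, the same clean shutdown could be sketched as below. The signal number 10 is taken directly from the command above. The function name and the timeout are assumptions. On x86 Linux signal 10 is SIGUSR1, but the number differs on some other platforms. The sketch deliberately does not fall back to a harder kill, since the post reports that Hudson reacts badly to that.

```python
import subprocess


def stop_slave(proc, sig=10, timeout=30):
    """Send `sig` to a process started with subprocess.Popen and wait for it.

    Returns the return code of the process. On POSIX a process ended by
    signal N reports -N. Raises subprocess.TimeoutExpired if the process
    is still running after `timeout` seconds; no forced kill is attempted.
    """
    # Equivalent of "kill -s 10 HUDSONPID" for a child we started ourselves.
    proc.send_signal(sig)
    return proc.wait(timeout=timeout)
```

Here proc is the subprocess.Popen object for the java process. A daemon that only knows the PID would use os.kill(pid, 10) instead.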

So I wrote a start/stop daemon script in Python, which sets the UID for
each Hudson user on the host and dynamically sets the user/hostname
combination. As a result, all my slave nodes have to follow a strict
naming convention, which is not a bad thing.
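
The naming convention can be illustrated with a small sketch that builds the slave command line from the user and host name, following the ${user}_${hostname} pattern in the command above. The function names are hypothetical. Using the short host name (without the domain part) is an assumption. The java and slave.jar paths are the ones from the command above.

```python
import getpass
import socket


def node_name(user, hostname):
    """Node name in the <user>_<hostname> form used in the JNLP URL."""
    return "%s_%s" % (user, hostname)


def jnlp_url(base_url, user, hostname):
    """URL of the slave-agent.jnlp file for the given user and host."""
    return "%s/computer/%s/slave-agent.jnlp" % (
        base_url.rstrip("/"), node_name(user, hostname))


def slave_command(base_url, user=None, hostname=None,
                  java="/usr/bin/java",
                  slave_jar="/var/lib/hudson/slave.jar"):
    """Argument list that launches the slave for this user and host."""
    user = user or getpass.getuser()
    # Assumption: nodes are registered under the short host name.
    hostname = hostname or socket.gethostname().split(".")[0]
    return [java, "-jar", slave_jar,
            "-jnlpUrl", jnlp_url(base_url, user, hostname)]
```

For example, slave_command("http://example.org/build", "root", "villach") targets the node root_villach, which matches the node name seen in the build log earlier in the thread.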

It's a little ugly, as it has to support Python 2.2, which is limited in
its PID management; otherwise I would post it as a follow-up.

Regards

Owen Synge


Re: Hudson incompatible with virtual machines?

Kohsuke Kawaguchi
Administrator
[hidden email] wrote:

> Dear all,
>
> The missing magic was to have the host OS control connecting and
> disconnecting, and to shut down the Hudson slave cleanly. For this I
> used the JNLP launching method:
>
> /usr/bin/java -jar /var/lib/hudson/slave.jar \
>   -jnlpUrl \
> http://example.org/build/computer/${user}_${hostname}/slave-agent.jnlp
>
> and then shut down each process nicely with:
>
> kill -s 10 HUDSONPID
>
> This is critical: if you kill the process in a less nice way, Hudson
> will get upset and you will need to interact with Hudson.

I lost the context of this thread, but this sounds like just a bug in
Hudson. It should recover gracefully from any kind of slave death.

> So I wrote a start/stop daemon script in Python, which sets the UID for
> each Hudson user on the host and dynamically sets the user/hostname
> combination. As a result, all my slave nodes have to follow a strict
> naming convention, which is not a bad thing.
>
> It's a little ugly, as it has to support Python 2.2, which is limited in
> its PID management; otherwise I would post it as a follow-up.

Thanks. I'm looking forward to that.


--
Kohsuke Kawaguchi
http://weblogs.java.net/blog/kohsuke/

Out of Office AutoReply: Hudson incompatible with virtual machines?

pcampbell

I am out of the office Monday, March 1 through Friday, March 5 and am unable to respond to your email.  I will return on Monday, March 8.

If you need immediate assistance for Hudson issues, contact Jason Collins, otherwise please contact Juan Nunez.