[Issue 936] New - multi-configuration project: deadlock as parent & child queued on the same node

[Issue 936] New - multi-configuration project: deadlock as parent & child queued on the same node

rdesgroppes-2
https://hudson.dev.java.net/issues/show_bug.cgi?id=936
                 Issue #|936
                 Summary|multi-configuration project: deadlock as parent & child queued on the same node
               Component|hudson
                 Version|current
                Platform|All
              OS/Version|All
                     URL|
                  Status|NEW
       Status whiteboard|
                Keywords|
              Resolution|
              Issue type|DEFECT
                Priority|P3
            Subcomponent|matrix
             Assigned to|issues@hudson
             Reported by|rdesgroppes






------- Additional comments from [hidden email] Mon Oct 22 11:48:42 +0000 2007 -------
Hi,

On a multi-configuration project, I consistently hit a deadlock because the
parent job and one of its children are queued on the same node.

The configuration matrix is as follows:
[x] Build on multiple nodes
  [-] Individual nodes
    [x] APOLLON (apollon)
    [x] CHRONOS (chronos)
    [x] DEMETER (demeter)
    [ ] EOS (eos)
    [x] EROS (eros)
    [x] PAN (pan)
    [x] PONTOS (pontos)
    [ ] master (the master Hudson node)
  [+] Labels
[ ] Axes

Here is the console output:

started
Building remotely on PONTOS
Triggering label=PAN
Triggering label=APOLLON
Triggering label=DEMETER
Triggering label=PONTOS
Triggering label=EROS
Triggering label=CHRONOS

All jobs except the one on "PONTOS" complete successfully.
The parent job keeps executing, waiting for all of its children to finish, but it
never stops because the child job that is supposed to run on "PONTOS" remains in
the build queue.
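
The situation described above can be reproduced in miniature outside Hudson. The following is only an analogy, assuming a node with a single executor behaves like a single-thread pool; `SingleExecutorDeadlock` and `parentFinishes` are hypothetical names, not Hudson code. The parent task holds the only thread and then waits for a child queued on the same pool, so the child can never start.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Models a node with one executor as a single-thread pool (an analogy, not Hudson code). */
final class SingleExecutorDeadlock {

    /** Returns true if the parent finished, false if it was still blocked after the timeout. */
    static boolean parentFinishes(long timeoutMillis) throws Exception {
        ExecutorService node = Executors.newSingleThreadExecutor(); // a node with one executor
        try {
            Future<?> parent = node.submit(() -> {
                // The parent occupies the only executor, then queues a child on the same node.
                Future<?> child = node.submit(() -> { /* child build */ });
                try {
                    child.get(); // blocks: the child cannot leave the queue while the parent runs
                } catch (Exception e) {
                    Thread.currentThread().interrupt();
                }
            });
            try {
                parent.get(timeoutMillis, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException stillBlocked) {
                return false;
            }
        } finally {
            node.shutdownNow(); // interrupt the stuck parent so the JVM can exit
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("parent finished: " + parentFinishes(500));
    }
}
```

With these assumptions the parent is still blocked when the timeout expires, which mirrors the child build sitting in the queue forever.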

Enabling the master Hudson node has no effect. Its corresponding job also
remains in the build queue.

Thank you for reading.

Regards,
Regis.

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


[Issue 936] Matrix parent build shouldn't consume an executor.

kohsuke-djn
https://hudson.dev.java.net/issues/show_bug.cgi?id=936



User kohsuke changed the following:

                What    |Old value                 |New value
================================================================================
                 Summary|multi-configuration projec|Matrix parent build should
                        |t: deadlock as parent & ch|n't consume an executor.
                        |ild queued on the same nod|
                        |e                         |
--------------------------------------------------------------------------------




------- Additional comments from [hidden email] Tue Oct 23 01:08:27 +0000 2007 -------
Right. The parent build needs to be run outside of the normal executors so that
it can let child builds use that executor.

In the meantime, a workaround is to tie the parent build to a node where
more executors are available.


[Issue 936] Matrix parent build shouldn't consume an executor.

rdesgroppes-2
In reply to this post by rdesgroppes-2
https://hudson.dev.java.net/issues/show_bug.cgi?id=936






------- Additional comments from [hidden email] Tue Oct 23 08:40:19 +0000 2007 -------
I don't see how to apply the workaround you propose, since the
multi-configuration project page doesn't allow tying the parent build to a
dedicated node.


[Issue 936] Matrix parent build shouldn't consume an executor.

kohsuke-djn
In reply to this post by rdesgroppes-2
https://hudson.dev.java.net/issues/show_bug.cgi?id=936






------- Additional comments from [hidden email] Tue Oct 30 00:35:22 +0000 2007 -------
*** Issue 961 has been marked as a duplicate of this issue. ***


[Issue 936] Matrix parent build shouldn't consume an executor.

Kohsuke Kawaguchi
Administrator
In reply to this post by rdesgroppes-2
https://hudson.dev.java.net/issues/show_bug.cgi?id=936






------- Additional comments from [hidden email] Thu Mar 20 05:54:26 +0000 2008 -------
*** Issue 1432 has been marked as a duplicate of this issue. ***


[Issue 936] Matrix parent build shouldn't consume an executor.

musilt2
In reply to this post by rdesgroppes-2
https://hudson.dev.java.net/issues/show_bug.cgi?id=936






------- Additional comments from [hidden email] Fri Apr 18 11:00:45 +0000 2008 -------
*** Issue 1561 has been marked as a duplicate of this issue. ***


[Issue 936] Matrix parent build shouldn't consume an executor.

musilt2
In reply to this post by rdesgroppes-2
https://hudson.dev.java.net/issues/show_bug.cgi?id=936



User musilt2 changed the following:

                What    |Old value                 |New value
================================================================================
                      CC|''                        |'musilt2'
--------------------------------------------------------------------------------




------- Additional comments from [hidden email] Fri Apr 18 11:04:57 +0000 2008 -------
What's the current status of this issue? Is there a time frame for when the fix
will be available?


[Issue 936] Matrix parent build shouldn't consume an executor.

mirilovic
In reply to this post by rdesgroppes-2
https://hudson.dev.java.net/issues/show_bug.cgi?id=936



User mirilovic changed the following:

                What    |Old value                 |New value
================================================================================
                      CC|'musilt2'                 |'mirilovic,musilt2'
--------------------------------------------------------------------------------




------- Additional comments from [hidden email] Fri Apr 18 15:35:52 +0000 2008 -------
It would be great to be able to tie the parent job to a particular
machine - and I personally think the best one would be the master.


[Issue 936] Matrix parent build shouldn't consume an executor.

lloydchang
In reply to this post by rdesgroppes-2
https://hudson.dev.java.net/issues/show_bug.cgi?id=936



User lloydchang changed the following:

                What    |Old value                 |New value
================================================================================
                      CC|'mirilovic,musilt2'       |'lloydchang,mirilovic,musi
                        |                          |lt2'
--------------------------------------------------------------------------------




------- Additional comments from [hidden email] Mon Jul  7 06:56:37 +0000 2008 -------
Kohsuke, unless I'm mistaken, a code change in the Hudson Core is needed to even
try the work-around.
 
Like others, I'm not sure how to configure Hudson to tie the parent build to a
specific node.  In node selection, I selected 1 label for multiple axes to use,
but I don't see any options for the parent build.


[Issue 936] Matrix parent build shouldn't consume an executor.

scm_issue_link
In reply to this post by rdesgroppes-2
https://hudson.dev.java.net/issues/show_bug.cgi?id=936






------- Additional comments from [hidden email] Sun Jul 19 00:39:32 +0000 2009 -------
Code changed in hudson
User: kohsuke
Path:
http://fisheye4.cenqua.com/changelog/hudson/?cs=19911
Log:
[HUDSON-936] Created a branch to experiment with the solution.



[Issue 936] Matrix parent build shouldn't consume an executor.

Kohsuke Kawaguchi
Administrator
In reply to this post by rdesgroppes-2
https://hudson.dev.java.net/issues/show_bug.cgi?id=936



User kohsuke changed the following:

                What    |Old value                 |New value
================================================================================
                  Status|NEW                       |STARTED
--------------------------------------------------------------------------------




------- Additional comments from [hidden email] Sun Jul 19 01:45:04 +0000 2009 -------
Note to myself. There are two ways to do this.

One is to just add one more executor temporarily to compensate for the effect. This
is easy, but the downside is that the added executor may end up doing something
else, and it might take a bit of time before it gets released. Plus this will
cause a UI discrepancy between what the user configured and what they see.

Another is to add a subtype of Executor and let it run just so that it can execute
the parent build. The trick is to find the situations where an executor shouldn't
run (like when Hudson is shutting down) so that we won't lose the item.
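
As a rough illustration of the second option, the sketch below runs the parent on a dedicated thread so that the node's only regular executor stays free for the child. It is an analogy built on `java.util.concurrent`, not the Hudson implementation: `OneOffParentSketch` and `parentFinishes` are hypothetical names, and the single-thread pool stands in for a node with one executor.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Analogy only: a dedicated thread plays the role of the extra Executor subtype. */
final class OneOffParentSketch {

    /** Runs the parent on its own thread; returns true if it finished within the timeout. */
    static boolean parentFinishes(long timeoutMillis) throws InterruptedException {
        ExecutorService node = Executors.newSingleThreadExecutor(); // one regular executor
        CountDownLatch parentDone = new CountDownLatch(1);

        // The parent does not take the node's regular executor slot.
        Thread oneOff = new Thread(() -> {
            try {
                Future<?> child = node.submit(() -> { /* child build */ });
                child.get(); // the regular executor is free, so the child can run
                parentDone.countDown();
            } catch (Exception e) {
                Thread.currentThread().interrupt();
            }
        }, "one-off parent");
        oneOff.setDaemon(true); // never keeps the JVM alive if something goes wrong
        oneOff.start();

        try {
            return parentDone.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } finally {
            node.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("parent finished: " + parentFinishes(2000));
    }
}
```

The trade-off noted in the comment still applies to any real implementation: the extra thread has to respect the same conditions under which a normal executor would refuse to run.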


[Issue 936] Matrix parent build shouldn't consume an executor.

ridesmet
In reply to this post by rdesgroppes-2
https://hudson.dev.java.net/issues/show_bug.cgi?id=936






------- Additional comments from [hidden email] Sun Jul 19 08:37:42 +0000 2009 -------
Kohsuke, from your last comment, I think your second suggestion is the better solution. Just add a "virtual"
executor to the master that can run the composite builds, and also make the composite build of type
"virtual".


[Issue 936] Matrix parent build shouldn't consume an executor.

scm_issue_link
In reply to this post by rdesgroppes-2
https://hudson.dev.java.net/issues/show_bug.cgi?id=936






------- Additional comments from [hidden email] Tue Jul 21 01:58:47 +0000 2009 -------
Code changed in hudson
User: kohsuke
Path:
 branches/matrix-parent/core/src/main/java/hudson/matrix/MatrixProject.java
 branches/matrix-parent/core/src/main/java/hudson/model/Computer.java
 branches/matrix-parent/core/src/main/java/hudson/model/Executor.java
 branches/matrix-parent/core/src/main/java/hudson/model/OneOffExecutor.java
 branches/matrix-parent/core/src/main/java/hudson/model/Queue.java
 branches/matrix-parent/core/src/main/resources/lib/hudson/executors.jelly
http://fisheye4.cenqua.com/changelog/hudson/?cs=19998
Log:
[HUDSON-936] I believe this should do.



[Issue 936] Matrix parent build shouldn't consume an executor.

Kohsuke Kawaguchi
Administrator
In reply to this post by rdesgroppes-2
https://hudson.dev.java.net/issues/show_bug.cgi?id=936






------- Additional comments from [hidden email] Tue Jul 21 20:57:13 +0000 2009 -------
Hudson 1.317 will include the fix for this, but because of the potential impact on
users, the fix is disabled by default for now. I'd like interested parties to
enable this (by setting the system property
hudson.model.Hudson.flyweightSupport=true on the Hudson JVM), and report back
whether this is working OK for you.

If the fix appears to work without any side effect, I'll enable the fix by default.
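
For readers unfamiliar with JVM system properties: they are usually passed with a `-D` option on the command that starts the JVM, for example `java -Dhudson.model.Hudson.flyweightSupport=true -jar hudson.war` when Hudson is run standalone (when Hudson runs inside a servlet container, the option goes into that container's JVM options instead). The sketch below shows how such an opt-in flag is commonly read in Java. The property name comes from the comment above; the class `FlyweightFlagSketch` is hypothetical, and that Hudson reads the property this way is an assumption.

```java
/** Sketch of an opt-in switch read from a JVM system property. */
final class FlyweightFlagSketch {
    // Property name taken from the comment above; how Hudson reads it internally is an assumption.
    static final String PROPERTY = "hudson.model.Hudson.flyweightSupport";

    /** True only when the property is set to "true" (case-insensitive); unset means disabled. */
    static boolean flyweightSupportEnabled() {
        return Boolean.getBoolean(PROPERTY);
    }

    public static void main(String[] args) {
        System.out.println(PROPERTY + " enabled: " + flyweightSupportEnabled());
    }
}
```

Reading the flag this way means the feature stays off unless the property is explicitly set, which matches the "disabled by default" behaviour described above.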


[Issue 936] Matrix parent build shouldn't consume an executor.

hydraswitch
In reply to this post by rdesgroppes-2
https://hudson.dev.java.net/issues/show_bug.cgi?id=936



User hydraswitch changed the following:

                What    |Old value                 |New value
================================================================================
                      CC|'lloydchang,mirilovic,musi|'hydraswitch,lloydchang,mi
                        |lt2'                      |rilovic,musilt2'
--------------------------------------------------------------------------------




------- Additional comments from [hidden email] Thu Aug 27 19:15:45 +0000 2009 -------
I set the property as described and hit Build Now for my multi-configuration job.
The one node that I did *not* select for the job is now showing a Dead (!) thread,
or something like it. I'm not sure what happened. It goes away if I stop and restart
Hudson. It comes back again each time the job runs. The job seems to run correctly
on the nodes that I did select.



[Issue 936] Matrix parent build shouldn't consume an executor.

emmulator
In reply to this post by rdesgroppes-2
https://hudson.dev.java.net/issues/show_bug.cgi?id=936






------- Additional comments from [hidden email] Wed Sep 23 17:50:25 +0000 2009 -------
I was hoping this might help with the problems I've encountered in Issue 1022.
It does seem to address the problem of the deadlock, but without the ability to
have the parent job run on a different slave node from any of the children, the
interaction with Perforce still causes one of the children to not sync its
workspace.

I found this issue when looking for how I would expose the
assignedNode/hasSlaveAffinity property of a MatrixProject, in the hopes that if
I could tie the parent to a particular node, it would no longer interfere with
the children.  I've noticed that a few other people in this ticket were thinking
along the same lines, but you have chosen this virtual executor approach
instead.  Is there a reason it would not be desirable to be able to tie the
parent to a particular node?  And even if you don't like that approach for the
general release, could you please point me to where in the source I would go to
expose that property?  I realize that the 'real' solution to Issue 1022 probably
involves modifying the Perforce plugin, but that seems more complicated, and I
was hoping this would work as a workaround in the meantime.

Thanks!


[Issue 936] Matrix parent build shouldn't consume an executor.

huybrechts
In reply to this post by rdesgroppes-2
https://hudson.dev.java.net/issues/show_bug.cgi?id=936






------- Additional comments from [hidden email] Sun Sep 27 16:24:41 +0000 2009 -------
When using the experimental flyweight support, I noticed that one of my (drools)
builds was scheduled on an offline jnlp slave. It was very hard to detect, since
the Drools plugin does not actually use the slave. Because the slave was
offline, Computer.defaultCharset was null, which results in an NPE in Run when
the logfile is created.
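
To make the failure mode concrete: a null charset passed on to code that opens a log file will fail as soon as it is dereferenced. Below is a minimal sketch of one way a caller could guard against that, assuming a fallback to the local JVM default is acceptable. `CharsetFallbackSketch` and `charsetOrDefault` are hypothetical names and not part of Hudson; whether the build should have been scheduled on an offline slave at all is a separate question that this sketch does not address.

```java
import java.nio.charset.Charset;

/** Hypothetical guard for a node charset that may be null while the node is offline. */
final class CharsetFallbackSketch {

    /** Returns the node's charset, or the local JVM default when the node reported none. */
    static Charset charsetOrDefault(Charset nodeCharset) {
        return nodeCharset != null ? nodeCharset : Charset.defaultCharset();
    }

    public static void main(String[] args) {
        // An offline node is modelled here simply as a null charset.
        System.out.println("charset used for the log: " + charsetOrDefault(null));
    }
}
```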



[Issue 936] Matrix parent build shouldn't consume an executor.

mdonohue
In reply to this post by rdesgroppes-2
https://hudson.dev.java.net/issues/show_bug.cgi?id=936






------- Additional comments from [hidden email] Fri Oct  2 06:04:37 +0000 2009 -------
*** Issue 4552 has been marked as a duplicate of this issue. ***


[Issue 936] Matrix parent build shouldn't consume an executor.

nairb774
In reply to this post by rdesgroppes-2
https://hudson.dev.java.net/issues/show_bug.cgi?id=936






------- Additional comments from [hidden email] Sat Nov  7 04:46:35 +0000 2009 -------
Digging through the code today I found this feature and turned it on.  Sadly, I
ran into the same problem hydraswitch did.  Looking into why the executor/thread
died shows nothing on the dead page, but in the main hudson log I saw the following:

Nov 6, 2009 10:40:36 PM hudson.ExpressionFactory2$JexlExpression evaluate
WARNING: Caught exception evaluating: h.printThrowable(it.causeOfDeath). Reason:
java.lang.NullPointerException
java.lang.NullPointerException
        at hudson.Functions.printThrowable(Functions.java:916)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
<<SNIP - Full stack can be provided on request, but it is mostly non-descriptive
jelly frames>>

I am running 1.330 currently and would love for this to work.  I took a look
over the code and the code changes and I can't seem to see where this might be
falling flat.  Anything I can do to help diagnose what hydraswitch and I are seeing?

Brian


[Issue 936] Matrix parent build shouldn't consume an executor.

nairb774
In reply to this post by rdesgroppes-2
https://hudson.dev.java.net/issues/show_bug.cgi?id=936






------- Additional comments from [hidden email] Sun Nov  8 18:25:58 +0000 2009 -------
I think I nailed down where the dead executor is coming from.  In OneOffExecutor
you have in the constructor:

super(owner, -1);
this.item = item;

In the super constructor (Executor), the last line is a call to Thread.start.
If the thread is able to start and complete shouldRun before the "this.item =
item" line is run, the executor finishes out and shows up dead.  On a
multi-processor computer this race condition is quite common because of the lack
of locking/fencing.

There are two solutions to this: one is to move the start call out of the Executor
constructor, and the other is to add the necessary locking around the setting of
that field. I would argue that the best solution is to move the start call, but that
might be better saved for an enhancement at a later time.
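
The first option (starting the thread only after construction has finished) can be sketched as follows. This is an illustration with hypothetical stand-in classes, `Worker` and `OneOffWorker`, not the actual Hudson `Executor` and `OneOffExecutor`, and it is not the patch mentioned below. The point is that the subclass field is assigned before `start()` is ever called, and `Thread.start()` guarantees that writes made before it are visible to the started thread.

```java
/** Hypothetical stand-ins for Executor and OneOffExecutor; not the Hudson classes. */
final class StartAfterConstruction {

    /** Base worker: its constructor deliberately does NOT call start(). */
    static class Worker extends Thread {
        volatile boolean ranWork;

        /** Subclasses decide whether there is anything to run. */
        protected boolean shouldRun() {
            return true;
        }

        @Override
        public void run() {
            ranWork = shouldRun();
        }
    }

    /** Worker bound to a single item, analogous to a one-off executor. */
    static final class OneOffWorker extends Worker {
        private final Object item;

        OneOffWorker(Object item) {
            this.item = item; // assigned before anyone can call start()
        }

        @Override
        protected boolean shouldRun() {
            return item != null;
        }
    }

    /** Factory that starts the thread only after construction has completed. */
    static OneOffWorker createAndStart(Object item) {
        OneOffWorker worker = new OneOffWorker(item);
        worker.start(); // start() happens-before the first action of the new thread
        return worker;
    }

    public static void main(String[] args) throws InterruptedException {
        OneOffWorker worker = createAndStart("parent build");
        worker.join();
        System.out.println("worker saw its item: " + worker.ranWork);
    }
}
```

If `start()` were instead called at the end of the `Worker` constructor, `run()` could execute `shouldRun()` before `OneOffWorker` had assigned `item`, which is the race described above.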

I will attach a patch in a few minutes that should fix this dead executor problem.
