[JIRA] Commented: (JENKINS-4093) Ec2 plugin can take down hudson due to lack of error checking

JIRA noreply@jenkins-ci.org

    [ http://issues.jenkins-ci.org/browse/JENKINS-4093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=145757#comment-145757 ]

mbaker000 commented on JENKINS-4093:
------------------------------------

Triage begin 2/10/2011

> Ec2 plugin can take down hudson due to lack of error checking
> -------------------------------------------------------------
>
>                 Key: JENKINS-4093
>                 URL: http://issues.jenkins-ci.org/browse/JENKINS-4093
>             Project: Jenkins
>          Issue Type: Bug
>          Components: core, ec2
>    Affects Versions: current
>         Environment: Platform: All, OS: All
>            Reporter: jehenrik
>
> While troubleshooting the ec2 plugin, I encountered a fairly common failure mode
> in which the plugin deletes a particular "Node" (subclass EC2Slave) but not the
> corresponding "Computer" (subclass EC2Computer). The orphaned Computer's
> getNode() then returns null, which causes a NullPointerException in the
> following core code, a method used in many places in the system:
> hudson/main/core/src/main/java/hudson/model/Hudson.java
>     public Computer[] getComputers() {
>         Computer[] r = computers.values().toArray(new Computer[computers.size()]);
>         Arrays.sort(r,new Comparator<Computer>() {
>             final Collator collator = Collator.getInstance();
>             public int compare(Computer lhs, Computer rhs) {
>                 if(lhs.getNode()==Hudson.this)  return -1;
>                 if(rhs.getNode()==Hudson.this)  return 1;
>                 return collator.compare(lhs.getDisplayName(), rhs.getDisplayName());
>             }
>         });
>         return r;
>     }
> My suggestion is to have the comparator check lhs.getNode() and rhs.getNode()
> for null and sort any such computers to the end of the list. This is not a
> good situation to be in, and whatever upstream error caused it should
> definitely be fixed. But in this case the ec2 plugin cannot even recover on its
> own without serious hackwork, because Hudson.computers is used in so many
> places, including:
>     /*package*/ Computer getComputer(Node n) {
>         return computers.get(n);
>     }
>     public Computer getComputer(String name) {
>         if(name.equals("(master)"))
>             name = "";
>         for (Computer c : computers.values()) {
>             if(c.getNode().getNodeName().equals(name))
>                 return c;
>         }
>         return null;
>     }
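
Here is a minimal sketch of the null check suggested above for getComputers(), built on a few assumptions. SketchNode and SketchComputer are hypothetical stand-ins for hudson.model.Node and hudson.model.Computer, so the example compiles on its own. The master node is passed in as a parameter rather than compared against Hudson.this. Computers whose getNode() returns null rank after everything else. The comparator also never asks two such orphans for their display names, in case the real getDisplayName() depends on the missing Node (also an assumption).

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for hudson.model.Node; only object identity matters here.
class SketchNode {
    final String name;
    SketchNode(String name) { this.name = name; }
}

// Hypothetical stand-in for hudson.model.Computer. getNode() returns null for an
// "orphaned" computer whose Node was deleted while the Computer was left behind.
class SketchComputer {
    private final SketchNode node;
    private final String displayName;

    SketchComputer(SketchNode node, String displayName) {
        this.node = node;
        this.displayName = displayName;
    }

    SketchNode getNode() { return node; }
    String getDisplayName() { return displayName; }
}

class NullSafeComputerOrder {
    // 0 = master, 1 = ordinary computer, 2 = orphan (no Node).
    private static int rank(SketchComputer c, SketchNode master) {
        SketchNode n = c.getNode();
        if (n == null) return 2;
        return n == master ? 0 : 1;
    }

    // Same ordering as getComputers() (master first, then by display name),
    // except that orphaned computers sort to the end instead of failing.
    static SketchComputer[] sorted(List<SketchComputer> computers, final SketchNode master) {
        SketchComputer[] r = computers.toArray(new SketchComputer[0]);
        final Collator collator = Collator.getInstance();
        Arrays.sort(r, new Comparator<SketchComputer>() {
            public int compare(SketchComputer lhs, SketchComputer rhs) {
                int lr = rank(lhs, master);
                int rr = rank(rhs, master);
                if (lr != rr) return Integer.compare(lr, rr);
                // Two orphans compare equal; Arrays.sort on objects is stable,
                // so they keep their original relative order.
                if (lr == 2) return 0;
                return collator.compare(lhs.getDisplayName(), rhs.getDisplayName());
            }
        });
        return r;
    }
}
```

Ranking on an integer first keeps the comparator consistent: unlike the original, compare(x, x) returns 0 for every computer, including the master.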

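The quoted getComputer(String) fails the same way. c.getNode().getNodeName() throws as soon as any value in computers has lost its node, whatever name is being looked up. A minimal sketch of a lookup that skips such entries follows. It uses hypothetical stand-in types (LookupNode, LookupComputer) in place of the real hudson.model classes and passes the computers collection in explicitly instead of reading the Hudson field.

```java
import java.util.Collection;

// Hypothetical stand-in for hudson.model.Node.
class LookupNode {
    private final String nodeName;
    LookupNode(String nodeName) { this.nodeName = nodeName; }
    String getNodeName() { return nodeName; }
}

// Hypothetical stand-in for hudson.model.Computer; getNode() is null when the
// Node was deleted but the Computer was not.
class LookupComputer {
    private final LookupNode node;
    LookupComputer(LookupNode node) { this.node = node; }
    LookupNode getNode() { return node; }
}

class NullSafeLookup {
    // Mirrors getComputer(String) but skips computers whose Node is gone,
    // so one orphan no longer breaks lookups for every other name.
    static LookupComputer getComputer(Collection<LookupComputer> computers, String name) {
        if (name.equals("(master)"))
            name = "";
        for (LookupComputer c : computers) {
            LookupNode n = c.getNode();
            if (n != null && n.getNodeName().equals(name))
                return c;
        }
        return null;
    }
}
```

Skipping orphans only hides the symptom. As the report says, the upstream error that leaves the Computer behind still needs to be fixed.
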
--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira