[JIRA] Commented: (JENKINS-5413) SCM polling on slaves getting hung


JIRA noreply@jenkins-ci.org

    [ http://issues.jenkins-ci.org/browse/JENKINS-5413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=146695#comment-146695 ]

eguess74 commented on JENKINS-5413:

For the record:
I was able to narrow it down to three jobs that were consistently getting stuck on the polling/fetching step.
I tried different approaches, but the only thing that actually resolved the problem was recreating those jobs from scratch, i.e. I blew away all related folders and the workspace and recreated each job. This brought the CPU usage down, and there have been no stuck threads for a full day now.

> SCM polling on slaves getting hung
> ----------------------------------
>                 Key: JENKINS-5413
>                 URL: http://issues.jenkins-ci.org/browse/JENKINS-5413
>             Project: Jenkins
>          Issue Type: Bug
>          Components: master-slave
>    Affects Versions: current
>            Reporter: Dean Yu
>         Attachments: hung_scm_pollers_02.PNG, threads.vetted.txt, thread_dump_02.txt
> This is to track the problem originally reported here: http://n4.nabble.com/Polling-hung-td1310838.html#a1310838
> What the problem boils down to is that many remote operations are performed synchronously, causing the channel object to stay locked until a response returns. In situations where a lengthy remote operation is using the channel, SCM polling can be blocked waiting for the monitor on the channel to be released. In extreme situations, all the polling threads can wind up waiting on object monitors for the channel objects, preventing further processing of polling tasks.
> Furthermore, if the slave dies, the locked channel object still exists in the master JVM. If no IOException is thrown to indicate the termination of the connection to the pipe, the channel can never be closed because Channel.close() itself is a synchronized operation.
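
The locking pattern described in the issue can be reproduced in isolation. The sketch below is only an illustration: FakeChannel and its methods are hypothetical stand-ins, not the real Jenkins remoting Channel class. It assumes a synchronized call() that waits for a reply while holding the object's monitor, and shows that a second thread calling the synchronized close() stays BLOCKED until the reply arrives.

```java
import java.util.concurrent.CountDownLatch;

public class ChannelLockSketch {

    /** Hypothetical stand-in for a remoting channel; NOT the real Jenkins class. */
    static class FakeChannel {
        private final CountDownLatch callStarted = new CountDownLatch(1);
        private final CountDownLatch response = new CountDownLatch(1);
        private volatile boolean closed = false;

        /** Synchronous remote call: keeps the channel monitor until a reply arrives. */
        synchronized void call() throws InterruptedException {
            callStarted.countDown();   // signal that the monitor is now held
            response.await();          // wait for the reply without releasing the monitor
        }

        /** Also synchronized, so it cannot run while call() is still waiting. */
        synchronized void close() {
            closed = true;
        }

        void awaitCallStarted() throws InterruptedException { callStarted.await(); }

        /** Simulates the reply from the remote side; deliberately not synchronized. */
        void deliverResponse() { response.countDown(); }

        boolean isClosed() { return closed; }
    }

    /** Returns true if close() was observed BLOCKED while call() was pending. */
    static boolean closeBlocksWhileCallPending() throws InterruptedException {
        FakeChannel channel = new FakeChannel();

        Thread caller = new Thread(() -> {
            try {
                channel.call();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "remote-call");
        caller.start();
        channel.awaitCallStarted();    // the caller now owns the channel monitor

        Thread closer = new Thread(channel::close, "closer");
        closer.start();

        // Poll for up to about 5 seconds until the closer is waiting on the monitor.
        boolean blocked = false;
        for (int i = 0; i < 500 && !blocked; i++) {
            blocked = closer.getState() == Thread.State.BLOCKED;
            if (!blocked) {
                Thread.sleep(10);
            }
        }

        // Only once the "reply" arrives can close() finally proceed.
        channel.deliverResponse();
        caller.join();
        closer.join();
        return blocked && channel.isClosed();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("close() blocked while call pending: "
                + closeBlocksWhileCallPending());
    }
}
```

If deliverResponse() were never invoked, which is the analogue of a dead slave that never produces an IOException, both threads in this sketch would wait forever. A wait with a timeout, or a close path that does not need the same monitor, would avoid that in the sketch; whether either fits the actual remoting code is not something this thread establishes.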

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira