Welcome to MilkyWay@home

Ghost processes in the task manager

RvP_LaN

Joined: 11 Oct 07
Posts: 8
Credit: 10,243,352
RAC: 0
Message 8043 - Posted: 29 Dec 2008, 13:46:36 UTC

Hello everybody, best wishes to all Boinc people!!!

I mentioned this in another post, but the title there could be confusing, so it's better to start a separate thread.

I spent some time on this before reporting, to check what happens with the new client versions (I'm currently using 6.5.0 for Windows x86_64) and the new MW version (0.7 optimized x86_64). By the way, thanks for providing us with a true 64-bit client!

I have these "ghost processes" which stay in the process list. They are clearly identified as boinc_project processes, and they are clearly children of the boinc_master process. They use just a few kilobytes of RAM, but they don't run at all (0% activity).

To purge the ghost processes, I'm forced to stop and restart the BOINC service. When I do so, all ghost processes are detached from their parent boinc_master process. After that, they have to be killed one by one in the task manager. (I'm using BoincView and Sysinternals Process Explorer to monitor this; the process parenting is shown graphically.)
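
The manual check described above (look for project processes that sit at 0% CPU and no longer hang under the client) can be approximated with a small script. This is only a minimal sketch working on a process snapshot supplied as plain data: the `Proc` record, the `find_ghosts` function, and the executable names `boinc.exe` and `milkyway_app.exe` are hypothetical, chosen for illustration, and it only covers the stage after the service restart, when the ghosts have already lost their parent.

```python
from typing import List, NamedTuple


class Proc(NamedTuple):
    """One row of a process snapshot (hypothetical record, for illustration)."""
    pid: int
    ppid: int
    name: str
    cpu_percent: float


def find_ghosts(procs: List[Proc],
                client_name: str = "boinc.exe",    # assumed client executable name
                app_prefix: str = "milkyway") -> List[Proc]:  # assumed app name prefix
    """Return project processes that are idle and whose parent is not a live client."""
    client_pids = {p.pid for p in procs if p.name.lower() == client_name}
    ghosts = []
    for p in procs:
        if not p.name.lower().startswith(app_prefix):
            continue  # not a process of the project we are looking at
        orphaned = p.ppid not in client_pids  # parent gone, or not the client
        idle = p.cpu_percent == 0.0           # "0% activity" in the task manager
        if orphaned and idle:
            ghosts.append(p)
    return ghosts


# Made-up snapshot: one client, one healthy task, one detached idle task.
snapshot = [
    Proc(100, 4, "boinc.exe", 0.1),
    Proc(210, 100, "milkyway_app.exe", 48.0),
    Proc(220, 77, "milkyway_app.exe", 0.0),
]
print([p.pid for p in find_ghosts(snapshot)])  # prints [220]
```

Before the service restart the ghosts are still children of the client, so a check like this cannot tell them apart from tasks that are merely suspended in memory; that would need the task state from the BOINC task list as well.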

It seems that when an MW process ends with a compute error, it is not (always?) able to exit cleanly! The BOINC task list correctly shows a "compute error", but the process stays in the Windows process list. I don't know whether BOINC's scheduler has an automatic mechanism to detect dead processes. If it does, it doesn't work well!

MW is not the only project that generates ghost processes, so I won't pin this on a single cause! Maybe the new BOINC scheduler (since 6.2.x for me) has something to do with this on multi-core machines. I never noticed anything like this with the very stable 5.10.45.

Could you check that? That in ALL cases, when MW exits with an error, the process actually terminates?

Regards
Alinator

Joined: 7 Jun 08
Posts: 464
Credit: 56,639,936
RAC: 0
Message 8047 - Posted: 29 Dec 2008, 19:28:46 UTC - in response to Message 8043.  
Last modified: 29 Dec 2008, 19:29:19 UTC

Hmmm...

Based on what you're seeing, you'd probably be better off to complain on the BOINC Core Client Message Board, since this seems to be a CC issue and not one with MW per se.

At least you'd be stating the issue closer to the problem anyway. ;-)

Alinator
RvP_LaN

Joined: 11 Oct 07
Posts: 8
Credit: 10,243,352
RAC: 0
Message 8059 - Posted: 30 Dec 2008, 9:32:29 UTC - in response to Message 8047.  

Based on what you're seeing, you'd probably be better off to complain on the BOINC Core Client Message Board, since this seems to be a CC issue and not one with MW per se.

Thanks for the reply. I will post this message on the BOINC forum.
RvP_LaN

Joined: 11 Oct 07
Posts: 8
Credit: 10,243,352
RAC: 0
Message 8147 - Posted: 2 Jan 2009, 13:50:45 UTC - in response to Message 8059.  

As suggested, I posted on the BOINC Core Client message board; no answer yet.

But also, as I mentioned at the start, I observed the phenomenon for some time before reporting: whatever the BOINC client version from 6.4.x on, out of the 30 projects in which I participate, only 4 cause dead processes: Milkyway, Simap, Spinhenge and Aqua.

For example, when (rarely) an Einstein or QMC process ends with a compute error, it doesn't stay stuck as described.

I understand that these dead processes are only a problem for those who have hosts participating 24 hours a day. Inevitably, when we switch off our computers, there are no more dead processes! But the point is that we should be able to leave a host crunching 24 hours a day without carefully monitoring BOINC and its projects.
Alinator

Joined: 7 Jun 08
Posts: 464
Credit: 56,639,936
RAC: 0
Message 8149 - Posted: 3 Jan 2009, 2:47:16 UTC
Last modified: 3 Jan 2009, 2:49:49 UTC

FWIW, the stock SAH MB application will do the same thing too, and the problem has been around to a greater or lesser degree in one form or another for quite a while now.

It seems to be most common on multicores, and typically happens when two or more tasks try to exit at or close to the same time.

There have been a few reports on single core machines too, and I suspect it happens on them when the preferences leave the tasks in memory when suspended and there is a lost CC heartbeat event which causes all the running tasks to do a 'forced' exit (which is by design) at close to the same time.

Alinator
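
The lost-heartbeat scenario suggested above can be illustrated with a toy watchdog. This is a rough sketch of the general technique, not BOINC's actual code: the `HeartbeatWatchdog` class, the `tasks_to_exit` helper, and the 30-second timeout are all assumptions made for the example. It shows why every task left in memory would decide to exit at nearly the same moment once the client stops sending heartbeats.

```python
from typing import Dict, List


class HeartbeatWatchdog:
    """Tracks the last heartbeat a task received from the client (illustrative only)."""

    def __init__(self, timeout_s: float = 30.0, now: float = 0.0):
        self.timeout_s = timeout_s  # assumed timeout, not taken from the BOINC source
        self.last_beat = now

    def beat(self, now: float) -> None:
        """Record a heartbeat received at time `now` (seconds)."""
        self.last_beat = now

    def expired(self, now: float) -> bool:
        """True once no heartbeat has arrived for longer than the timeout."""
        return (now - self.last_beat) > self.timeout_s


def tasks_to_exit(watchdogs: Dict[str, HeartbeatWatchdog], now: float) -> List[str]:
    """Names of the tasks whose watchdog has expired at time `now`."""
    return sorted(name for name, w in watchdogs.items() if w.expired(now))


# Two tasks kept in memory; the client's last heartbeat reaches both at t = 10 s.
dogs = {"task_a": HeartbeatWatchdog(), "task_b": HeartbeatWatchdog()}
for w in dogs.values():
    w.beat(10.0)

print(tasks_to_exit(dogs, 30.0))  # prints []
print(tasks_to_exit(dogs, 45.0))  # prints ['task_a', 'task_b']
```

Because all tasks share the same last heartbeat time, their timeouts expire together, which matches the "two or more tasks try to exit at or close to the same time" condition described in the post.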
