
Posts by RvP_LaN

1) Message boards : Number crunching : Ghost processes in the task manager (Message 8147)
Posted 2 Jan 2009 by RvP_LaN
Post:
As suggested, I posted on the BOINC Core Client message board; no answer yet.

But as I mentioned in the preamble, I observed the phenomenon for some time before reporting it: whatever the BOINC client version from 6.4.x, of the 30 projects in which I participate, only 4 cause dead processes: Milkyway, Simap, Spinhenge and Aqua.

For example, when (rarely) an Einstein or QMC process ends with a computation error, it doesn't stay stuck as described.

I understand that these dead processes are only a problem for those whose hosts participate around the clock. Inevitably, when we switch off our computers, there are no more dead processes! But the point is that we should be able to leave a host crunching around the clock without carefully monitoring BOINC and its projects.
2) Message boards : Number crunching : Ghost processes in the task manager (Message 8059)
Posted 30 Dec 2008 by RvP_LaN
Post:
Based on what you're seeing, you'd probably be better off to complain on the BOINC Core Client Message Board, since this seems to be a CC issue and not one with MW per se.

Thanks for the reply. I will post this message on the BOINC forum.
3) Message boards : Number crunching : Ghost processes in the task manager (Message 8043)
Posted 29 Dec 2008 by RvP_LaN
Post:
Hello everybody, best wishes to all Boinc people!!!

I mentioned this in another post, but the title could be confusing, so it's better to create a new thread.

I spent some time on this before reporting, just to check what happens with the new client versions (I'm currently using 6.5.0 for Windows x86_64) and the new MW version (0.7 optimized x86_64). By the way, thanks for providing us with a true 64-bit client!

I have these "ghost processes" which stay in the process list. They are clearly identified as boinc_project processes, and they are indeed children of the boinc_master process. They use just a few kilobytes of RAM, but they don't run at all (0% activity).

To purge the ghost processes, I'm forced to stop and restart the BOINC service. When I do so, all ghost processes are detached from their parent boinc_master. After that, you have to kill them one by one in the task manager. (I'm using BoincView and Sysinternals Process Explorer to monitor this; the process parenting is shown graphically.)
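
The check described above (children of boinc_master with 0% activity, and boinc_project processes left without a parent after a service restart) can be sketched in Python. This is only an illustration under stated assumptions: the `Proc` record, the function names, the PIDs and the executable names are all made up for the example, and boinc_master / boinc_project are treated here as the accounts the processes run under. A freshly started task can also show 0% CPU for a moment, so the result is a list of candidates, not a verdict.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proc:
    """One row of a process-list snapshot (hypothetical structure for this sketch)."""
    pid: int
    ppid: int           # parent process ID
    user: str           # account the process runs under
    name: str           # executable name
    cpu_percent: float  # CPU activity at the time of the snapshot


def find_ghost_candidates(procs, master_user="boinc_master"):
    """Children of a boinc_master process that show no CPU activity."""
    master_pids = {p.pid for p in procs if p.user == master_user}
    return [p for p in procs
            if p.ppid in master_pids and p.cpu_percent == 0.0]


def find_orphans(procs, project_user="boinc_project"):
    """boinc_project processes whose parent is no longer in the snapshot."""
    live_pids = {p.pid for p in procs}
    return [p for p in procs
            if p.user == project_user and p.ppid not in live_pids]


# Made-up snapshot: PIDs and executable names are illustrative only.
SNAPSHOT = [
    Proc(100, 1, "boinc_master", "boinc.exe", 0.4),
    Proc(200, 100, "boinc_project", "science_app.exe", 24.8),  # working normally
    Proc(201, 100, "boinc_project", "science_app.exe", 0.0),   # idle child
    Proc(300, 999, "boinc_project", "science_app.exe", 0.0),   # parent is gone
]

print([p.pid for p in find_ghost_candidates(SNAPSHOT)])  # prints [201]
print([p.pid for p in find_orphans(SNAPSHOT)])           # prints [300]
```

In practice the snapshot would have to come from a tool such as Process Explorer or a process-listing library; that part is deliberately left out of the sketch.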

It seems that when an MW process ends in a compute error, it is not (always?) able to die gracefully! The BOINC task list does show a "compute error", but the process stays in the Windows process list. I don't know if there's an automatic mechanism in the BOINC scheduler to detect dead processes. If there is, it doesn't work well!

MW is not the only project to generate ghost processes, so I won't draw a definite conclusion! Maybe the new BOINC scheduler (since 6.2.x for me) has something to do with this on multi-core machines. I never noticed anything like this with the very stable 5.10.45.

Could you check that? That in ALL cases, when MW exits on an error, the process actually terminates?

Regards
4) Message boards : Number crunching : Compute error: Can't acquire lockfile (Message 8025)
Posted 28 Dec 2008 by RvP_LaN
Post:
6.3.14 was an older alpha version. 6.3.21 is the current BOINC alpha version...

Hello everybody, best wishes to all Boinc people!!!

I'm coming back to this thread after a while. I wanted to spend some time on this, just to check what happens with the new client versions (I'm currently using 6.5.0) and the new MW versions (currently 0.7 optimized x86_64). By the way, thanks for providing us with a true 64-bit client!

Anyway, I still have these "ghost processes" which stay in the process list. They are clearly identified as boinc_project processes, and they are indeed children of the boinc_master process. They use just a few kilobytes of memory, but they don't run at all (0% activity).

To purge the ghost processes, I'm forced to stop and restart my BOINC service. When I do so, all ghost processes are detached from their parent boinc_master (even though they are still identified as boinc_project). You then have to kill them one by one in the task manager. (I'm using BoincView and Sysinternals Process Explorer to monitor this.)

It seems that when an MW process ends in an error, it is not able to die gracefully! The BOINC task list does show a "compute error", but the process stays in the Windows process list.

MW is not the only project to generate ghost processes, so I won't draw a definite conclusion! Maybe the new BOINC scheduler (since 6.2.x) has something to do with this on multi-core machines.

Could you check that? It may be that not all of your error exits actually terminate the process.

Regards
5) Message boards : Number crunching : Compute error: Can't acquire lockfile (Message 6070)
Posted 11 Nov 2008 by RvP_LaN
Post:
Hi there,

This is the first time I've seen this message.
<core_client_version>6.3.14</core_client_version>
<![CDATA[
<message>
too many exit(0)s
</message>
<stderr_txt>
Can't acquire lockfile - exiting
FILE_LOCK::unlock(): close failed.: No error
Can't acquire lockfile - exiting
FILE_LOCK::unlock(): close failed.: No error
Can't acquire lockfile - exiting
FILE_LOCK::unlock(): close failed.: No error
Can't acquire lockfile - exiting

Happened three times on an XP64 box: quad-core Phenom, 4 GB RAM, 18 GB free disk space, VM large enough.
MW bin: astronomy_1.22_windows_x86_64.exe
BOINC Client: 6.3.14 for Windows XP64.
The box stays up around the clock.

Any clue about this error message and issue?

By the way, on this same box, in the same context, your binary now often remains stuck in the process list. I have to stop the BOINC service to distinguish which live processes are parented to the BOINC daemon. The boinc_project processes that remain in the list afterwards are the stuck MW processes. They don't consume CPU time, only a small amount of RAM.
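
One possible way to picture how a stuck process and the "Can't acquire lockfile" message could be related, stated as an assumption and not as a diagnosis: if a process that never exits still holds the lock for its working directory, a new task started in the same directory cannot take it. The Python sketch below uses a simple exclusive-create lockfile to show that pattern. It is not BOINC's FILE_LOCK code, and the file name and function names are hypothetical.

```python
import os
import tempfile


def try_acquire(path):
    """Create the lockfile exclusively; return a file descriptor, or None if it exists."""
    try:
        return os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return None


def release(fd, path):
    """Close and delete the lockfile so that another process can acquire it."""
    os.close(fd)
    os.remove(path)


def demo():
    # The temporary directory stands in for a task's working directory.
    with tempfile.TemporaryDirectory() as work_dir:
        lock_path = os.path.join(work_dir, "demo_lockfile")  # hypothetical name

        holder = try_acquire(lock_path)    # first process takes the lock and keeps it
        newcomer = try_acquire(lock_path)  # second attempt fails while the lock is held
        release(holder, lock_path)         # the holder finally lets go (explicitly, in this sketch)
        retry = try_acquire(lock_path)     # a later attempt succeeds
        release(retry, lock_path)
        return holder is not None, newcomer is None, retry is not None


print(demo())  # prints (True, True, True)
```

With this simple scheme the lock only disappears when the holder removes the file, which is exactly what a hung process never does; whether the real client behaves the same way would need to be confirmed by the developers.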

Anyway, if I don't restart the BOINC service, after one week of stuck MW processes, one of the cores stops computing for ALL the other projects. Not acceptable...

I'm not blaming MW (not yet!), I'm just reporting a fact. Maybe the problem is related to this new version of the BOINC client on a multi-core box. I'm not sure I ever observed this kind of event with MW and the older BOINC client, 5.10.45.

My BOINC preferences state that projects should NOT "leave applications in memory while suspended". I hope you are following these rules...

Regards.
6) Message boards : Number crunching : I've had enough also (Message 5383)
Posted 8 Oct 2008 by RvP_LaN
Post:
as long as they would finally put one or two days work in their highly inefficient code!

I think a lot of things have already been said and debated!!! I don't want to play the troll, and I don't want to start another argument, but...

I'm not a coder or developer, just a System Engineer with some background. So, MilkyWay, inefficient code?!?!!

-- MilkyWay has never blocked my CPUs, nor my systems...
-- MilkyWay reports correctly percentage done, cpu perf and cpu efficiency...
-- MilkyWay terminates WUs correctly, and gives credits for that...
-- MilkyWay doesn't force the BOINC manager to go into "running high priority" because of stupid deadlines...
-- MilkyWay respects the BOINC settings for memory usage, CPU usage, and not staying in memory...
-- MilkyWay works equally well on different systems and different platforms...

So, what else? What more? Doesn't this describe what a BOINC project should be? Science projects running silently in the background of your computer, without user intervention, and earning credit for it.

Have you spent a few minutes with the new projects which recently appeared? Have you shared your CPU time with these brand new projects? Have you observed (through BoincView for instance) what happens to your CPU and to the other projects, enslaved and pushed aside by stupid coding?

As I don't want to be rude, and because I want to show the kind of respect I would like to see from these very ugly, unprepared but unleashed projects, I won't make comparisons and won't name names!

I'm glad that they released their code. I'm confident that competent users and time donors will help and bring something optimized for everyone participating. Everyone.

If MilkyWay were that inefficient for a Beta project, the badly coded projects that I have in mind should be closed right now and their "creators" should be banned from BOINC.

Friendly yours.
7) Questions and Answers : Wish list : Multiple CPID and merging hosts (Message 471)
Posted 16 Nov 2007 by RvP_LaN
Post:
Sorry... Hum! Say no more! I didn't get the meaning of the "Computer ID" link in the column header!!!
8) Questions and Answers : Wish list : Multiple CPID and merging hosts (Message 469)
Posted 16 Nov 2007 by RvP_LaN
Post:
Hi,

I'm participating in your project, and I'm experiencing CPID problems.

It appears (in my particular case) that Milkyway is one of the projects which generate a wrong CPID for my account. On some other sites that do the same, I've been able to merge the stale computers trapped in the host list, and that cures the CPID problem.

Unfortunately, your version of the BOINC server doesn't provide this merge function on the "computers on your account" page. I don't know how it works on the server side, but could you consider upgrading in order to provide this merge function?

Thanks a lot.
Regards