Welcome to MilkyWay@home

Compute error: Can't acquire lockfile


RvP_LaN

Joined: 11 Oct 07
Posts: 8
Credit: 10,243,352
RAC: 0
Message 6070 - Posted: 11 Nov 2008, 0:22:50 UTC

Hi there,

This is the first time I have seen this message.
<core_client_version>6.3.14</core_client_version>
<![CDATA[
<message>
too many exit(0)s
</message>
<stderr_txt>
Can't acquire lockfile - exiting
FILE_LOCK::unlock(): close failed.: No error
Can't acquire lockfile - exiting
FILE_LOCK::unlock(): close failed.: No error
Can't acquire lockfile - exiting
FILE_LOCK::unlock(): close failed.: No error
Can't acquire lockfile - exiting

This has happened three times on an XP64 box: quad-core Phenom, 4 GB RAM, 18 GB free disk space, VM large enough.
MW binary: astronomy_1.22_windows_x86_64.exe
BOINC client: 6.3.14 for Windows XP64.
The box stays up around the clock.

Any clue about this error message and issue?
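
For readers unfamiliar with the message: a lock file is commonly used so that only one instance of an application works in a given directory at a time. Below is a minimal generic sketch in Python, assuming exclusive-create semantics. It is not BOINC's actual FILE_LOCK code, which may work differently, and the function names and the file name are hypothetical. It only illustrates why a second instance, or a restarted task while an old process still holds the lock, would fail to acquire it.

```python
import os
import tempfile


def try_acquire_lock(path):
    """Return a file descriptor if the lock was acquired, else None."""
    try:
        # O_EXCL makes the call fail if the lock file already exists.
        return os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return None


def release_lock(path, fd):
    """Close the descriptor and delete the lock file."""
    os.close(fd)
    os.remove(path)


def demo():
    # Hypothetical lock file in a fresh temporary directory.
    lock_path = os.path.join(tempfile.mkdtemp(), "demo_lockfile")
    first = try_acquire_lock(lock_path)    # first instance gets the lock
    second = try_acquire_lock(lock_path)   # second instance is refused
    release_lock(lock_path, first)         # first instance exits cleanly
    third = try_acquire_lock(lock_path)    # now the lock is free again
    reacquired = third is not None
    if third is not None:
        release_lock(lock_path, third)
    return first is not None, second is None, reacquired
```

If the first holder never exits or never releases the lock, every later attempt keeps failing, which would be consistent with the repeated "Can't acquire lockfile - exiting" lines above.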

By the way, on this same box and in the same context, your binary now often remains stuck in the process list. I have to stop the BOINC service to work out which live processes are children of the BOINC daemon. The boinc_project processes that remain in the list afterwards are the stuck MW processes. They consume no CPU time, only a small amount of RAM.

Anyway, if I don't restart the BOINC service, then after a week of stuck MW processes one of the cores stops computing for ALL other projects. Not acceptable...

I don't blame MW (not yet!), I am just reporting a fact. Maybe the problem is related to this new version of the BOINC client on a multi-core box. I am not sure I ever observed this kind of event with MW and the older BOINC client, 5.10.45.

My BOINC preferences state that projects should NOT "leave applications in memory while suspended". I hope you are following these rules...

Regards.
Stefan Ledwina

Joined: 28 Aug 07
Posts: 16
Credit: 70,797,368
RAC: 0
Message 6073 - Posted: 11 Nov 2008, 16:23:04 UTC - in response to Message 6070.  

6.3.14 was an older alpha version. 6.3.21 is the current BOINC alpha version...
Maybe try the new version and see whether things improve...
RvP_LaN

Joined: 11 Oct 07
Posts: 8
Credit: 10,243,352
RAC: 0
Message 8025 - Posted: 28 Dec 2008, 5:52:24 UTC - in response to Message 6073.  
Last modified: 28 Dec 2008, 6:07:21 UTC

6.3.14 was an older alpha version. 6.3.21 is the current BOINC alpha version...

Hello everybody, best wishes to all BOINC people!!!

I am coming back to this post after a while. I wanted to spend some time on this, just to check what happens with the new client versions (I'm currently using 6.5.0) and the new MW versions (currently 0.7 optimized x86_64). By the way, thanks for providing us with a true 64-bit client!

Anyway, I still have these "ghost processes" that stay in the process list. They are clearly identified as boinc_project processes, and they are indeed children of the boinc_master process. They use just a few kilobytes of memory, but they don't run at all (0% activity).

To purge the ghost processes, I am forced to stop and restart my BOINC service. When I do so, all ghost processes are detached from their parent boinc_master (even though they are still identified as boinc_project). You then have to kill them one by one in the Task Manager. (I'm using BoincView and Sysinternals Process Explorer to monitor this.)
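
The manual check described above (finding project processes whose parent no longer exists) can be sketched in Python. This is only an illustration of the idea under stated assumptions: the process table is a plain list of records, the PIDs and the executable name `astronomy_app.exe` are invented for the example, and a real script would have to obtain the process list from the operating system or a monitoring tool.

```python
def find_orphans(processes, name_prefix):
    """Return PIDs of matching processes whose parent PID is no longer alive."""
    live_pids = {p["pid"] for p in processes}
    return [
        p["pid"]
        for p in processes
        if p["name"].startswith(name_prefix) and p["ppid"] not in live_pids
    ]


# Hypothetical snapshot taken after the BOINC service was restarted:
# the old client (PID 150) is gone, the new client is PID 200.
SAMPLE = [
    {"pid": 4, "ppid": 0, "name": "System"},
    {"pid": 200, "ppid": 4, "name": "boinc.exe"},
    {"pid": 310, "ppid": 200, "name": "astronomy_app.exe"},  # healthy child
    {"pid": 312, "ppid": 150, "name": "astronomy_app.exe"},  # parent is gone
    {"pid": 340, "ppid": 150, "name": "astronomy_app.exe"},  # parent is gone
]

ghosts = find_orphans(SAMPLE, "astronomy_")
```

With this sample, `ghosts` holds the two PIDs whose parent 150 is missing from the snapshot; those would be the candidates to kill by hand.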

It seems that when an MW process fails with an error, it is not able to die gracefully! The BOINC task list correctly shows a "compute error", but the process stays in the Windows process list.

MW is not the only project that generates ghost processes, so I won't pin this on any single cause. Maybe the new BOINC scheduler (since 6.2.x) has something to do with this on multi-core machines.

Could you check that? It looks as though not all of your error exit paths actually terminate the process.

Regards
Paul D. Buck

Joined: 12 Apr 08
Posts: 621
Credit: 161,934,067
RAC: 0
Message 8275 - Posted: 12 Jan 2009, 23:30:42 UTC

I only saw the lock file problem with Rosetta, and a note on the Einstein boards suggested that it was caused by using less than 100% of the CPU; in other words, the option to throttle CPU usage below 100% is bugged.

I have not gotten back to running Rosetta on the computer that was so good at demonstrating the problem, but it is something you may want to look at. Please post back if it has no effect, or if you are already using the 100% runtime option...

I am personally curious because I lost about 8 tasks to this, many of them deep into processing, so about two days' worth of run time (~48-50 hours).
