Message boards :
Number crunching :
196 (0xc4) EXIT_DISK_LIMIT_EXCEEDED
Joined: 16 Jun 08 Posts: 93 Credit: 366,882,323 RAC: 0
I'm getting a few "Maximum disk usage exceeded" errors. I don't think I've ever seen this here. For my tasks, the client_state file shows:

<rsc_disk_bound>15000000.000000</rsc_disk_bound>

An example task result shows Peak disk usage 5,741.01 MB.
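A note on units: <rsc_disk_bound> is a byte count, and 15,000,000 bytes is roughly 14.31 MB, which matches the limit quoted in the abort message later in this thread. Here is a minimal sketch of the kind of check the client performs; this is illustrative only, not the actual BOINC source, and the function names are mine:

```python
# Illustrative sketch (not actual BOINC code): the client periodically sums
# the files in a task's slot directory and aborts the task when the total
# exceeds the workunit's <rsc_disk_bound> (a byte count).
import os


def slot_disk_usage(slot_dir: str) -> int:
    """Total size in bytes of all files under a slot directory."""
    total = 0
    for root, _dirs, files in os.walk(slot_dir):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def check_disk_limit(slot_dir: str, rsc_disk_bound: float) -> bool:
    """Return True if the task would be aborted for exceeding its bound."""
    return slot_disk_usage(slot_dir) > rsc_disk_bound


# The bound from the opening post, converted from bytes:
rsc_disk_bound = 15000000.0
print(rsc_disk_bound / (1024 * 1024))  # ~14.31 MB
```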
Joined: 16 Jun 08 Posts: 93 Credit: 366,882,323 RAC: 0
These things keep coming. Updating to the latest driver doesn't seem to have helped, and others running the same tasks don't seem to have any problems, either. I'm thinking I have a hardware problem... :-(
Joined: 16 Jun 08 Posts: 93 Credit: 366,882,323 RAC: 0
Additional info, in case it matters... I have local preferences set that are pretty wide-open, I think. The host has a 1 TB HDD with only about 250 GB used. Local disk limits are set to:

Use at most -- 150 GB (most restrictive)
Leave at least -- 0.1 GB (least restrictive)
Use at most -- 50% of total (less restrictive)

The BOINC Manager says that 26 GB is used for BOINC with 124 GB available, and that MS@H is using less than 240 MB.
Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0
Additional info, in case it matters... It probably won't help, but try changing the top "use at most" line to 500 GB; if that works, back it off to just above where you get the errors again. If it doesn't work (as I suspect it won't, but who knows), you can switch it back to where it is now.
Joined: 19 Aug 08 Posts: 12 Credit: 2,500,263 RAC: 0
The project can define a maximum disk usage per workunit; any workunit that fails to stay below this limit will be aborted by the BOINC core client. The only thing you could do about it would be to patch your core client so it ignores the limit - but then a workunit running wild would not be recognized.

p.s.: usually all workunits of the same batch have the same disk limitation. Even though it is possible to calculate an individual limit for each single workunit, I doubt that many projects do that.

OOPS, sorry... I thought your problem was related to the rsc_disk_bound value (as mentioned in the starting post), but this doesn't seem to be the case. Several years ago I made this thing, maybe it helps. Note that BOINC will probably still not get along well with NTFS compression, i.e. BOINC uses the nominal (uncompressed) file size for the calculation, ignoring the compression potential.
Joined: 4 Sep 12 Posts: 219 Credit: 456,474 RAC: 0
I'm with Ananas' original thought - I think it's to do with <rsc_disk_bound>. There's a parallel thread at Einstein - Maximum disk usage exceeded - where ritterm cross-posted, and some related discussion in Results showing "Aborted by user".
Joined: 16 Jun 08 Posts: 93 Credit: 366,882,323 RAC: 0
...I think it's to do with <rsc_disk_bound>... That's what I was thinking, too, of course. However, I now think I might have a GPU hardware problem -- all the tasks I've checked that errored out for me have been completed by another host without a problem. If the tasks I ran had bad parameters, would the same task work for another host?

When I upgraded the video driver, I went to the AMD website, downloaded and ran their auto-detect tool, and let it pick and install a new driver. Is there anything else I need to install?
Joined: 4 Sep 12 Posts: 219 Credit: 456,474 RAC: 0
...I think it's to do with <rsc_disk_bound>... A bad driver by itself wouldn't cause a disk limit error - unless it's spewing out yards and yards of error messages. Look in the slot directory, as I said at Einstein.
Joined: 16 Jun 08 Posts: 93 Credit: 366,882,323 RAC: 0
A bad driver by itself wouldn't cause a disk limit error - unless it's spewing out yards and yards of error messages. Look in the slot directory... I'm afraid I'm not sure I know what to look for. These tasks are failing right away, after only 1-2 seconds of run time. I don't see anything changing in the slot directory when this happens (\ProgramData\BOINC\slots, right?).
Joined: 4 Sep 12 Posts: 219 Credit: 456,474 RAC: 0
A bad driver by itself wouldn't cause a disk limit error - unless it's spewing out yards and yards of error messages. Look in the slot directory... Well, each task gets allocated one particular numbered folder in there as it starts - which one is visible via the 'properties' button while it's active, but a couple of seconds doesn't give you much time to investigate. Each slot should be empty unless a running task is using it. It might be worth (re-)starting BOINC with GPU activity disabled and emptying any slots which should be empty but aren't. Then the next task you allow to run should occupy the lowest-numbered empty slot - watch that, and see if anything (big) appears in it.
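For anyone wanting to automate the watch described above, here is a rough helper. It is my own sketch, not a BOINC tool; the slots path is the Windows default mentioned earlier in the thread, and the 100 MB threshold is an arbitrary choice. Run it repeatedly while a fresh task spins up:

```python
# Rough helper (not part of BOINC): do a one-shot scan of the BOINC slots
# tree and flag any file over a size threshold. Run it repeatedly while a
# task starts, to catch large files during the task's brief lifetime.
import os

SLOTS_DIR = r"C:\ProgramData\BOINC\slots"  # default data dir on Windows; adjust if needed
THRESHOLD = 100 * 1024 * 1024              # flag anything over 100 MB


def scan_slots(slots_dir: str, threshold: int):
    """Yield (path, size_in_bytes) for each file above the threshold."""
    for root, _dirs, files in os.walk(slots_dir):
        for name in files:
            path = os.path.join(root, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # file vanished between listing and stat
            if size > threshold:
                yield path, size


if os.path.isdir(SLOTS_DIR):
    for path, size in scan_slots(SLOTS_DIR, THRESHOLD):
        print(f"{size / (1024 * 1024):8.1f} MB  {path}")
```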
Joined: 16 Jun 08 Posts: 93 Credit: 366,882,323 RAC: 0
I suspended all tasks, then resumed them one at a time and waited for one to crash. It didn't take too long, but all that's left in the directory is the stderr output file, and it's only 4 KB. If something "big" was generated before being deleted, I wasn't able to see it. The only message in the BOINC Manager log related to the task is something like:

Aborting task de_80_DR8_Rev_8_5_00004_1429700402_4384432_0: exceeded disk limit: 5115.01MB > 14.31MB

I just don't understand what's going on. Only about 5% of the tasks are failing for me, and others don't seem to be having any problem with them.
Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0
I suspended all tasks then resumed them one at a time and waited for one to crash. It didn't take too long, but all that's left in the directory is the stderr output file and it's only 4KB. THAT'S the file you want to look at - post it here if you can, please.
Joined: 16 Jun 08 Posts: 93 Credit: 366,882,323 RAC: 0
THAT'S the file you want to look at, post it here if you can please. Well, other than the initial remarks about the "Maximum disk usage exceeded" error and what appears to be the lack of results data at the end, the rest of the file looks virtually identical to the stderr output of a valid task.

However, I think my problem is solved. Following the suggestion of a forum post about a similar problem at another project, I checked my host's slots directories and found two "stray" VM image files left by one of the VM projects (probably CERN's CMS-dev), each of which was over 5 GB. I deleted those files and slots and have been running trouble-free for almost 12 hours. I'm not sure, though, that I understand why that was the problem.
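A plausible explanation, though this is my assumption rather than something confirmed against the BOINC source: if the disk-limit check counts everything already sitting in the slot directory against the new task's bound, a leftover ~5 GB .vdi would trip a ~14 MB rsc_disk_bound the moment the task starts - which fits both the 1-2 second failures and the 5115.01 MB figure in the abort message. A quick way to look for such strays (a hypothetical helper, not a BOINC tool):

```python
# Hypothetical cleanup check: list leftover VirtualBox disk images (*.vdi)
# in the BOINC slots tree so they can be reviewed and deleted while the
# BOINC client is stopped. The path below is an assumption; adjust it.
import glob
import os


def find_stray_vdis(slots_dir: str):
    """Return (path, size_in_bytes) for every .vdi file in any slot folder."""
    pattern = os.path.join(slots_dir, "*", "*.vdi")
    return [(p, os.path.getsize(p)) for p in sorted(glob.glob(pattern))]


if __name__ == "__main__":
    for path, size in find_stray_vdis(r"C:\ProgramData\BOINC\slots"):
        print(f"{size / (1024 * 1024):10.1f} MB  {path}")
```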
Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0
THAT'S the file you want to look at, post it here if you can please. I'm glad you got it fixed, that is strange. Could they be trying to use the same file name?
Joined: 16 Jun 08 Posts: 93 Credit: 366,882,323 RAC: 0
I'm glad you got it fixed, that is strange. Could they be trying to use the same file name? Me, too! :-) I'm really not sure what's going on with the other project, but you can read more in this thread over at CERN/CMS-dev.
Joined: 4 Sep 12 Posts: 219 Credit: 456,474 RAC: 0
I reported this problem to the BOINC developers, and got this reply from David Anderson:

I looked at this and couldn't immediately see the problem. So, help needed:
- Under what circumstances does the CMS .vdi image get left behind? Is there a difference between successful task completions and abnormal (error) exits?
- Can the .vdi be deleted manually? Immediately? Later? After a BOINC restart? After a reboot?
- Does BOINC ever clean it up by itself, say after a client restart?
- And anything else you can think of.

Could somebody pass David's message over to CERN/CMS-dev, please? I don't even have an invitation code to create a posting account.
Joined: 16 Jun 08 Posts: 93 Credit: 366,882,323 RAC: 0
Could somebody pass David's message over to CERN/CMS-dev, please? I don't even have an invitation code to create a posting account. Done.
©2025 Astroinformatics Group