"Completed, can't validate" - what's this?
Author | Message |
---|---|
Joined: 11 Apr 15 Posts: 58 Credit: 63,291,127 RAC: 0 |
Hi there, lately I've been getting more and more WUs with the status "Completed, can't validate". My cruncher is running fine and uploads valid results. I used to get a unit like that only occasionally, but now I have 4(!!) on one cruncher alone. I'm a bit Aloha, Uli |
Joined: 25 Feb 13 Posts: 580 Credit: 94,200,158 RAC: 0 |
Hey there, It looks like some people are aborting a bunch of work units for some reason. As a result, those work units accumulate too many "errors" from the aborts and never get validated. Luckily this seems to be a fairly rare occurrence, but I am sorry for your lost credits. Jake |
Joined: 4 Oct 11 Posts: 38 Credit: 309,729,457 RAC: 0 |
Question: why aren't "Aborted by user" errors ignored? |
Joined: 25 Feb 13 Posts: 580 Credit: 94,200,158 RAC: 0 |
Good question, Tom. If I had designed the system, I probably would have made it ignore them. I may look into changing it after I fix some of the more pressing issues. Jake |
Joined: 30 Apr 09 Posts: 101 Credit: 29,874,293 RAC: 0 |
de_modfit_fast_15_3s_fixedangles_simBPLfixed2_1_1469198227_3079440 http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=1239937928 Too many errors (may have bug) Three members (wingmen) aborted this task... What happens to this task? Is it lost? Will it be sent out again? Thanks. |
Joined: 11 Apr 15 Posts: 58 Credit: 63,291,127 RAC: 0 |
Wow, I got 18 invalids in a row: https://milkyway.cs.rpi.edu/milkyway/results.php?hostid=616064&offset=0&show_names=0&state=5&appid= Why do people abort these tasks and waste my and others' crunching time... X( Aloha, Uli |
Joined: 16 Mar 10 Posts: 211 Credit: 108,219,563 RAC: 4,968 |
"Wow, i got 18 invalids in a row:" For one reason or another, most of the "aborted" jobs seem to be failing with Exit status: 201 (0xc9) EXIT_MISSING_COPROC. (So it's not actually the user aborting the task, though the effect is the same...) I'm not a Windows user myself (and all the aborted tasks that have stopped my jobs from validating have been on Windows...), so I wonder whether it's a driver problem on their machines or a side effect of running BOINC as a service (which, I seem to recall reading, stops GPU jobs from working). In the meantime, there seem to be machines out there aborting thousands of jobs a day. I thought BOINC servers were supposed to automatically reduce the amount of work sent to such hosts, but apparently not. (And I notice a post about that very thing: http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=3990) Al. |
Joined: 11 Apr 15 Posts: 58 Credit: 63,291,127 RAC: 0 |
This is actually starting to annoy me a lot... Maybe it's the issue with installing BOINC as a service, or users connecting to those computers via Remote Desktop. When they do this, Windows loads a "virtual" graphics driver that can't crunch, so the GPU is no longer available to BOINC. This can only be recovered by rebooting the machine. It's a well-known problem, but it doesn't seem to bother a lot of people... :/ Aloha, Uli |
Joined: 10 Aug 09 Posts: 9 Credit: 70,518,679 RAC: 0 |
Other projects have a higher threshold. Maybe the admins could set the limit to 10? 25? |
Joined: 11 Apr 15 Posts: 58 Credit: 63,291,127 RAC: 0 |
Lately I've been getting more and more of these. I think it's related to driver issues with AMD and/or Windows 10 not recognizing/installing the required OpenCL components of the drivers. I think the threshold should be raised from 5 to at least 10, as in other projects. BTW: Will these WUs be resent in the future, or will their scientific value be lost forever? Aloha, Uli |
Joined: 25 Feb 13 Posts: 580 Credit: 94,200,158 RAC: 0 |
Hey everyone, You should see an end to the can't-validates at this point. These are pretty common whenever a new application is released, as we work out issues with it. At this point we are running pretty stable again. If you continue to see these can't-validates, please let me know. As for the scientific value of a single work unit, it is relatively small, so not much is lost. We play the large-numbers game here. For perspective, it takes 3 million+ work units for me to do a single mapping of an SDSS data wedge, and even that number is a bit misleading since we run multiple mappings of each wedge to check our results. As for the lost credits, we are pretty sad about that, but we hope the stability we offer between releases makes up for the one or two bumpy days we have here and there. (I am actively working on reducing the frequency and impact of those bumpy days.) Sorry for the issues over the last couple of days. Jake |
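The error-limit behaviour discussed in this thread (aborts counting as errors, a per-WU limit of 5 versus a proposed 10, and the idea of ignoring user aborts) can be illustrated with a minimal Python sketch. It assumes a deliberately simplified model: each failed or aborted task counts as one error, and once the error count goes above the limit the WU is abandoned as "Too many errors (may have bug)", leaving any valid uploads as "Completed, can't validate". This is not the BOINC server's actual code; the names `Outcome`, `count_errors`, `workunit_status`, `result_status` and the `ignore_user_aborts` switch are hypothetical, and the exact comparison the real server makes against its limit is an assumption.

```python
from enum import Enum


class Outcome(Enum):
    """Simplified outcome of one task (replica) of a work unit."""
    SUCCESS = "success"          # finished and uploaded a result
    CLIENT_ERROR = "error"       # failed on the host, e.g. exit status 201
    USER_ABORTED = "aborted"     # aborted on the volunteer's host


TOO_MANY_ERRORS = "Too many errors (may have bug)"
IN_PROGRESS = "in progress"


def count_errors(outcomes: list[Outcome], ignore_user_aborts: bool = False) -> int:
    """Count the outcomes this model treats as errors (hypothetical rule)."""
    counted = {Outcome.CLIENT_ERROR}
    if not ignore_user_aborts:
        counted.add(Outcome.USER_ABORTED)  # aborts count as errors unless ignored
    return sum(1 for outcome in outcomes if outcome in counted)


def workunit_status(outcomes: list[Outcome], max_error_results: int,
                    ignore_user_aborts: bool = False) -> str:
    """Abandon the WU once errors go ABOVE the limit (the '>' is an assumption)."""
    if count_errors(outcomes, ignore_user_aborts) > max_error_results:
        return TOO_MANY_ERRORS
    return IN_PROGRESS


def result_status(outcome: Outcome, wu_status: str) -> str:
    """Status a host would see for its own task in this simplified model."""
    if outcome is Outcome.USER_ABORTED:
        return "Aborted by user"
    if outcome is Outcome.CLIENT_ERROR:
        return "Error while computing"
    if wu_status == TOO_MANY_ERRORS:
        return "Completed, can't validate"  # valid upload, but the WU was abandoned
    return "Completed, waiting for validation"


if __name__ == "__main__":
    # Hypothetical WU: one good result plus six aborted replicas.
    outcomes = [Outcome.SUCCESS] + [Outcome.USER_ABORTED] * 6
    for limit, ignore in [(5, False), (5, True), (10, False)]:
        wu = workunit_status(outcomes, limit, ignore)
        print(f"limit={limit} ignore_aborts={ignore}: {wu} / "
              f"{result_status(Outcome.SUCCESS, wu)}")
```

With a limit of 5 and aborts counted, the good result ends up as "Completed, can't validate"; either ignoring aborts or raising the limit to 10 leaves the same WU "in progress", so the good result stays "Completed, waiting for validation". This mirrors the two fixes proposed in the thread, under the simplifying assumptions stated above.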
©2024 Astroinformatics Group