Welcome to MilkyWay@home

"Completed, can't validate" - what's this?



Message boards : Number crunching : "Completed, can't validate" - what's this?
Ulrich Metzner
Joined: 11 Apr 15
Posts: 58
Credit: 51,296,286
RAC: 19,858
Message 64778 - Posted: 30 Jun 2016, 11:20:42 UTC

Hi there,

lately I've been getting more and more WUs with this status: "Completed, can't validate".
My cruncher is running fine and uploads valid results. I used to get a unit like that
only from time to time, but now I have 4(!!) on one cruncher alone.
I'm a bit pi**ed ("unhappy") about wasting precious crunching time... ;)
Aloha, Uli

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist

Joined: 25 Feb 13
Posts: 580
Credit: 75,271,794
RAC: 0
Message 64779 - Posted: 30 Jun 2016, 12:30:14 UTC

Hey there,

Looks like there are some people aborting a bunch of work units for some reason. The result is that a work unit shows too many "errors" because of the aborts and never gets validated. Luckily this seems to be a pretty low percentage occurrence but I am sorry for your lost credits.

Jake
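To make the mechanism described above concrete, here is a minimal sketch of how aborted copies can push a work unit over its error limit so that a perfectly good result is never compared with anything. This is not the project's or BOINC's server code: the function name, the outcome labels, and the default limits are assumptions chosen for illustration only.

```python
from collections import Counter

def workunit_status(outcomes, max_error_results=5, min_quorum=2):
    """Classify a work unit from the outcomes of its results.

    outcomes: list of hypothetical labels: "success", "error" or "aborted".
    max_error_results, min_quorum: illustrative limits, not the project's values.
    """
    counts = Counter(outcomes)
    # In this sketch an abort counts as an error, just like a crash.
    errors = counts["error"] + counts["aborted"]
    if errors > max_error_results:
        # No further copies are sent out, so the successful
        # results already returned can never be validated.
        return "too many errors"
    if counts["success"] >= min_quorum:
        return "ready to validate"
    return "waiting for more results"

# One good result plus three aborted wingmen, with an assumed limit of 2 errors.
print(workunit_status(["success", "aborted", "aborted", "aborted"],
                      max_error_results=2))
```

With these assumed limits, the single successful result ends up in the "too many errors" branch, which is the situation a cruncher would see as "Completed, can't validate".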
Tom*
Joined: 4 Oct 11
Posts: 38
Credit: 305,140,808
RAC: 2
Message 64787 - Posted: 1 Jul 2016, 22:49:28 UTC

Rhetorical question:

Why aren't "Aborted by user" errors ignored?
Jake Weiss
Message 64792 - Posted: 2 Jul 2016, 14:13:27 UTC

Good question, Tom. If I had designed the system, I probably would have designed it to ignore them. I may look into changing it after I fix some of the more pressing issues.

Jake
Sutaru Tsureku
Joined: 30 Apr 09
Posts: 99
Credit: 26,324,467
RAC: 15,678
Message 64936 - Posted: 25 Jul 2016, 7:22:17 UTC

de_modfit_fast_15_3s_fixedangles_simBPLfixed2_1_1469198227_3079440
http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=1239937928
Too many errors (may have bug)

Three members (wingmen) aborted this task...

What happens to this task? Is it lost? Will it be sent out again?

Thanks.
Ulrich Metzner
Message 64978 - Posted: 4 Aug 2016, 8:22:16 UTC

Wow, I got 18 invalids in a row:
https://milkyway.cs.rpi.edu/milkyway/results.php?hostid=616064&offset=0&show_names=0&state=5&appid=
Why do people abort these tasks and waste my and others' crunching time... X(
Aloha, Uli

alanb1951
Joined: 16 Mar 10
Posts: 71
Credit: 69,222,800
RAC: 24,595
Message 64993 - Posted: 6 Aug 2016, 6:12:05 UTC - in response to Message 64978.  
Last modified: 6 Aug 2016, 6:12:51 UTC

Wow, I got 18 invalids in a row:
https://milkyway.cs.rpi.edu/milkyway/results.php?hostid=616064&offset=0&show_names=0&state=5&appid=
Why do people abort these tasks and waste my and others' crunching time... X(


For one reason or another, most of the "aborted" jobs seem to be failing with
Exit status: 201 (0xc9) EXIT_MISSING_COPROC
(So it's not actually the user aborting the task, though the effect's the same...)
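
For anyone who wants to check this on their own wingmen, here is a rough sketch that tallies exit-status lines copied by hand from task pages. It assumes every such line looks like the one quoted above ("Exit status: 201 (0xc9) EXIT_MISSING_COPROC"); the function name and the sample data are made up for illustration, and it does not fetch anything from the project website.

```python
import re
from collections import Counter

# Matches lines such as: "Exit status: 201 (0xc9) EXIT_MISSING_COPROC"
EXIT_LINE = re.compile(r"Exit status:\s*(\d+)\s*\((0x[0-9a-fA-F]+)\)\s*(\w+)")

def tally_exit_statuses(lines):
    """Count how often each (code, name) pair appears in the given lines."""
    tally = Counter()
    for line in lines:
        match = EXIT_LINE.search(line)
        if not match:
            continue  # not an exit-status line
        code = int(match.group(1))
        hex_code = int(match.group(2), 16)
        if code != hex_code:
            continue  # decimal and hex disagree, so skip the malformed line
        tally[(code, match.group(3))] += 1
    return tally

# Hypothetical sample: two failed tasks and one unrelated line.
sample = [
    "Exit status: 201 (0xc9) EXIT_MISSING_COPROC",
    "some other line from the task page",
    "Exit status: 201 (0xc9) EXIT_MISSING_COPROC",
]
for (code, name), count in tally_exit_statuses(sample).items():
    print(f"{code} {name}: {count}")
```

If most of the "aborted" results end up under EXIT_MISSING_COPROC rather than a user-abort status, that would support the reading that the hosts lost their GPU rather than that their owners cancelled the work.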

I am not a Windows user myself (and all the aborted tasks that have stopped my jobs from validating have been on Windows...) - I wonder whether it's a driver problem on their machines or whether it's a side-effect of running BOINC as a service (which, I seem to recall reading, stops GPU jobs from working.)

In the meantime, there seem to be machines out there aborting thousands of jobs a day. I thought BOINC servers were supposed to automatically reduce the workload, but apparently not. (And I notice a post about that very thing. http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=3990)

Al.
Ulrich Metzner
Message 64996 - Posted: 6 Aug 2016, 14:57:26 UTC

Actually, this is starting to annoy me a lot...

Maybe it is the issue with installing BOINC as a service, or the user connects to that particular computer via Remote Desktop. If they do, Windows loads a "virtual" graphics driver that isn't capable of crunching, so no GPU is available to BOINC anymore. This is only recoverable by rebooting the machine. It is a well-known fact, which doesn't seem to bother a lot of individuals... :/
Aloha, Uli

waffleironhead
Joined: 10 Aug 09
Posts: 7
Credit: 31,539,585
RAC: 59,534
Message 64997 - Posted: 6 Aug 2016, 16:20:28 UTC

Other projects have a higher threshold. Maybe the admins could set the limit to 10? 25?
Ulrich Metzner
Message 65233 - Posted: 22 Sep 2016, 14:22:41 UTC

Lately I've been getting more and more of these. I think it is related to driver issues with AMD and/or Win 10 not recognizing or installing the required OpenCL features of the drivers. I think the threshold should be raised from 5 to at least 10, like in other projects.

BTW: Will these WUs be resent in the future, or will their scientific value be lost forever?
Aloha, Uli

Jake Weiss
Message 65280 - Posted: 26 Sep 2016, 16:45:27 UTC

Hey everyone,

You should see an end to the can't validates at this point. These are pretty common whenever a new application is released as we work out issues with them. At this point we are running pretty stable again. If you continue to see these can't validates please let me know.

As for the scientific value of a single work unit, it is relatively small, so not much is lost. We work in the large numbers game here. For perspective, it takes 3 million+ work units for me to do a single mapping of an SDSS data wedge, and even that number is a bit misleading since we run multiple mappings of each wedge to check our results. As for the lost credits, we are pretty sad about that, but we hope the stability we offer between releases makes up for the one or two bumpy days we have here and there. (I am actively working on decreasing the frequency and impact of those bumpy days.)

Sorry for the issues over the last couple days.

Jake

©2022 Astroinformatics Group