Welcome to MilkyWay@home

Suddenly most results invalid


Mad Matt
Joined: 19 Sep 09
Posts: 16
Credit: 218,390,676
RAC: 0
Message 51231 - Posted: 27 Sep 2011, 10:57:35 UTC

Host ID: 312643

I did not make any changes to the OS, driver, hardware or app there (opt. 0.82). It's W7, ATI 0 (not used) = 5450, ATI 1 = 4770, ATI Catalyst 11.3. Suddenly this rig almost went into validation nirvana. Just a few WUs still get validated.

Any ideas?
Cheers

Matt Arsenault
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 8 May 10
Posts: 576
Credit: 15,979,383
RAC: 0
Message 51235 - Posted: 27 Sep 2011, 16:23:57 UTC - in response to Message 51231.  

Hmm.

A bunch of results seem to be marked as invalid on those workunits, not just yours, and across all 3 types of applications. The ones that validate match the results I get for them. The ones that don't are different enough not to validate. There seems to be some problem here, but I can't reproduce it after many runs of a few of these.

Mad Matt
Joined: 19 Sep 09
Posts: 16
Credit: 218,390,676
RAC: 0
Message 51237 - Posted: 27 Sep 2011, 18:43:03 UTC - in response to Message 51235.  

Cheers for the feedback, Matt. At least I know now I don't have to start with turning the rig upside down.

Matt Arsenault
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 8 May 10
Posts: 576
Credit: 15,979,383
RAC: 0
Message 51245 - Posted: 28 Sep 2011, 1:02:51 UTC - in response to Message 51235.  

Actually I've now managed to get it to happen on the RV770 (4850/4870) but nothing else.

Mad Matt
Joined: 19 Sep 09
Posts: 16
Credit: 218,390,676
RAC: 0
Message 51310 - Posted: 6 Oct 2011, 15:27:09 UTC - in response to Message 51245.  
Last modified: 6 Oct 2011, 15:27:51 UTC

It looks like it has been fixed? First efforts today after a while:

http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=312643

JLConawayII
Joined: 27 Apr 10
Posts: 35
Credit: 90,828,595
RAC: 0
Message 51418 - Posted: 15 Oct 2011, 0:27:07 UTC

One of your WUs tried to divide by the sqrt of a black hole, and it went downhill from there.

hampsteadpete
Joined: 18 Jul 11
Posts: 22
Credit: 4,156,557
RAC: 0
Message 51426 - Posted: 16 Oct 2011, 12:51:30 UTC

Host ID: 322328

Ran just a few WU's last night because for some reason my machine wouldn't update after running 12. Half the WU's I did were "validation inconclusive." First time that's happened to me as well.

Pete

7ri9991 [MM]
Joined: 9 Jan 09
Posts: 12
Credit: 25,252,582
RAC: 0
Message 51436 - Posted: 17 Oct 2011, 19:30:36 UTC - in response to Message 51245.  

Actually I've now managed to get it to happen on the RV770 (4850/4870) but nothing else.

Perfect. That's what I'm running.

Mad Matt
Joined: 19 Sep 09
Posts: 16
Credit: 218,390,676
RAC: 0
Message 51446 - Posted: 19 Oct 2011, 11:15:11 UTC

I should add that, in hindsight, some of those results may have come from running at the wrong GPU clocks, but that's hard to trace back.

Since posting and since definitely using my regular OC settings, the problem has been solved for me.

Beyond
Joined: 15 Jul 08
Posts: 383
Credit: 501,817,790
RAC: 0
Message 51447 - Posted: 19 Oct 2011, 15:13:05 UTC

If you look at the top computers you may notice that almost all of them regularly post invalid tasks. This is VERY unusual compared to other projects and may indicate either a problem with 0.82 or a validator problem. The former CP apps did not exhibit this behavior.

w1hue
Joined: 13 Feb 09
Posts: 48
Credit: 19,383,145
RAC: 16,619
Message 51448 - Posted: 20 Oct 2011, 3:16:55 UTC - in response to Message 51447.  

If you look at the top computers you may notice that almost all of them regularly post invalid tasks. This is VERY unusual compared to other projects and may indicate either a problem with 0.82 or a validator problem. The former CP apps did not exhibit this behavior.

I'm pretty much at the bottom of the heap with my three CPU-only (well, one has a GPU, but not gudnuf for this project...) computers, so I doubt that my participation will be missed. After all the WUs submitted by one machine over ~30 period and the majority from the other two machines got marked as "invalid", I decided to quit wasting my machines' time and removed the project. All three have no problems with several other BOINC projects.

frogger4
Joined: 24 Sep 11
Posts: 1
Credit: 19,080,726
RAC: 0
Message 51454 - Posted: 21 Oct 2011, 3:55:40 UTC

I really do apologize for being a bit of a noob, but I was wondering: how do you know if your WUs get validated or not? I don't know if there is any connection, but I noticed I'm getting considerably less credit than I used to for doing what [I think] is as much work.

Beyond
Joined: 15 Jul 08
Posts: 383
Credit: 501,817,790
RAC: 0
Message 51455 - Posted: 21 Oct 2011, 8:19:17 UTC - in response to Message 51454.  
Last modified: 21 Oct 2011, 8:19:56 UTC

I really do apologize for being a bit of a noob, but I was wondering: how do you know if your WUs get validated or not? I don't know if there is any connection, but I noticed I'm getting considerably less credit than I used to for doing what [I think] is as much work.

Look here, notice the choices at the top of the page:

http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=329377&offset=0&show_names=0&state=4&appid=

Mad Matt
Joined: 19 Sep 09
Posts: 16
Credit: 218,390,676
RAC: 0
Message 51469 - Posted: 22 Oct 2011, 11:38:22 UTC - in response to Message 51447.  

If you look at the top computers you may notice that almost all of them regularly post invalid tasks. This is VERY unusual compared to other projects and may indicate either a problem with 0.82 or a validator problem. The former CP apps did not exhibit this behavior.


Affirmative when looking at my rigs.

Matt Arsenault
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 8 May 10
Posts: 576
Credit: 15,979,383
RAC: 0
Message 51479 - Posted: 23 Oct 2011, 21:01:11 UTC - in response to Message 51469.  
Last modified: 23 Oct 2011, 21:01:52 UTC

Whenever I check out validate errors, more than 95% of the time it's because the result is missing. The workunit succeeded and everything, but the result is simply not in the output. We have been getting results back from the stderr for quite a while (this wasn't true about a year ago), and I don't think many other projects do this.

There are two issues I'm aware of (at least one is with BOINC) that result in this. First, there seems to be an arbitrary cutoff where you lose part of the stderr log if it's large, but that is much less frequent.

Second, there seems to be a (more frequent) problem where sometimes, for no reason, some of the stderr is lost (the important part with the result). There are strange things that happen with CAL and OpenCL where stderr/stdout don't get flushed correctly on program end. I've sort of noticed this for a long time, and recently it was mentioned on the AMD OpenCL forums as a known AMD issue.
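
To make the failure mode above concrete, here is a minimal Python sketch of how a missing result could be told apart from a wrong one. The `<result>` tag, the sample log text, and the function names are all hypothetical; the real application's stderr format and the real validator are not shown in this thread.

```python
import re
from typing import Optional

# Hypothetical marker: stands in for whatever line carries the result in stderr.
RESULT_RE = re.compile(r"<result>\s*(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)\s*</result>")


def extract_result(stderr_text: str) -> Optional[float]:
    """Return the reported value, or None if the result line is not in the log."""
    match = RESULT_RE.search(stderr_text)
    return float(match.group(1)) if match else None


def classify(stderr_text: str) -> str:
    """Separate 'result missing' from 'result present' before comparing numbers."""
    return "missing" if extract_result(stderr_text) is None else "present"


# A complete log, and the same log with its tail lost before reaching the server.
complete = "<app_start>\nintegrating...\n<result>-3.125</result>\n"
truncated = "<app_start>\nintegrating...\n"

print(classify(complete))
print(classify(truncated))
```

The point of the sketch is only that a task can run to completion and still fail validation when the tail of the log never arrives.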

Beyond
Joined: 15 Jul 08
Posts: 383
Credit: 501,817,790
RAC: 0
Message 51515 - Posted: 26 Oct 2011, 14:12:12 UTC - in response to Message 51479.  

We have been getting results back from the stderr for quite a while (this wasn't true about a year ago), and I don't think many other projects do this.

Since it's not working so well, can this be changed?

Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 51520 - Posted: 26 Oct 2011, 19:31:53 UTC - in response to Message 51479.  

Whenever I check out validate errors, more than 95% of the time it's because the result is missing. The workunit succeeded and everything, but the result is simply not in the output. We have been getting results back from the stderr for quite a while (this wasn't true about a year ago), and I don't think many other projects do this.

There are two issues I'm aware of (at least one is with BOINC) that result in this. First, there seems to be an arbitrary cutoff where you lose part of the stderr log if it's large, but that is much less frequent.

Second, there seems to be a (more frequent) problem where sometimes, for no reason, some of the stderr is lost (the important part with the result). There are strange things that happen with CAL and OpenCL where stderr/stdout don't get flushed correctly on program end. I've sort of noticed this for a long time, and recently it was mentioned on the AMD OpenCL forums as a known AMD issue.


I wonder if something as simple as flushing the stderr (or adding some extra newlines) before the end of the program could fix this.
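
As a rough illustration of the flushing idea, here is a small Python sketch that writes a result line and then explicitly flushes it before the program ends. The `<result>` tag and the function name are hypothetical, and this is not the project's actual code; whether an explicit flush would actually cure the CAL/OpenCL problem is untested here.

```python
import io
import os
import sys


def report_result_and_flush(value: float, stream=sys.stderr) -> None:
    """Write the result line, then push it out of every buffer we control."""
    stream.write(f"<result>{value!r}</result>\n")
    stream.flush()  # empty the userspace buffer
    try:
        os.fsync(stream.fileno())  # ask the OS to commit it, where supported
    except (OSError, ValueError, AttributeError):
        # Terminals, pipes and in-memory streams may not support fsync.
        pass


# Demonstration against an in-memory stream so the output can be inspected.
buf = io.StringIO()
report_result_and_flush(-3.125, buf)
print(buf.getvalue(), end="")
```

In a C application the analogous call would be `fflush(stderr)` just before exit.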

KAMasud
Joined: 23 Oct 11
Posts: 8
Credit: 480,330
RAC: 0
Message 51522 - Posted: 27 Oct 2011, 8:30:53 UTC

Not all machines are giving invalid results. So far I have been having a perfect run. That I can't get enough WUs is another matter.

Beyond
Joined: 15 Jul 08
Posts: 383
Credit: 501,817,790
RAC: 0
Message 51524 - Posted: 27 Oct 2011, 13:48:54 UTC - in response to Message 51522.  

Not all machines are giving invalid results. So far I have been having a perfect run. That I can't get enough WUs is another matter.

You probably won't see much of the invalids with this low turnover rate:

Number of tasks completed 10
Max tasks per day 10010
Number of tasks today 4
Consecutive valid tasks 1
Average processing rate 2.9387240425833
Average turnaround time 0.31 days

Only 1 consecutive valid task...

Beyond
Joined: 15 Jul 08
Posts: 383
Credit: 501,817,790
RAC: 0
Message 51525 - Posted: 27 Oct 2011, 13:56:54 UTC - in response to Message 51520.  

I wonder if something as simple as flushing the stderr (or adding some extra newlines) before the end of the program could fix this.

I bet it would. It looks like most of the ATI projects are using the Stderr output and this is the only one that's giving missing or truncated Stderr results.

©2020 Astroinformatics Group