Welcome to MilkyWay@home

Suddenly most results invalid


Crunch3r
Volunteer developer
Joined: 17 Feb 08
Posts: 363
Credit: 258,227,990
RAC: 0
Message 51528 - Posted: 27 Oct 2011, 21:51:40 UTC - in response to Message 51479.  

Whenever I check out validate errors, more than 95% of the time it's because the result is missing. The workunit succeeded and everything, but the result is simply not in the output. We've been getting results back from stderr for quite a while (this wasn't true about a year ago), and I don't think many other projects do this.

There are two issues I'm aware of (at least one of them is with BOINC) that result in this. First, there seems to be an arbitrary cutoff where you lose part of the stderr log if it's large, but that is much less frequent.

Second, there seems to be a (more frequent) problem where sometimes, for no apparent reason, some of the stderr is lost (the important part with the result). Strange things happen with CAL and OpenCL where stderr/stdout don't get flushed correctly on program end. I've sort of noticed this for a long time, and it was recently mentioned on the AMD OpenCL forums as a known AMD issue.



FWIW, here's my opinion on the subject...

There's a major issue with results from the CPU, ATI and NVIDIA OCL apps, since all of them are producing way too different results.

I've seen this happen myself: an ATI Cayman result was compared to a CUDA OCL result, and the two were way too different to validate (background_likelihood, all the stream integrals and so on), so another WU was sent out to a CPU, which also returned a result that was not within the precision needed to validate...
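To make "not within the needed precision" concrete, here's a minimal sketch of a tolerance-based comparison between two results. This is NOT the actual MW validator: the function name, the dict layout and the tolerance value are all assumptions made up for illustration.

```python
import math


def results_agree(a, b, rel_tol=1e-10):
    """Return True if both results have the same fields and every
    pair of values matches within the relative tolerance.

    `a` and `b` map a field name (e.g. "background_likelihood")
    to a float. The layout and default tolerance are hypothetical.
    """
    if a.keys() != b.keys():
        return False
    return all(math.isclose(a[k], b[k], rel_tol=rel_tol) for k in a)
```

With a check like this, two apps that differ only in the last few digits validate against each other, while anything further apart than `rel_tol` triggers another replica, which is the situation described above.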

So the question is which app (CUDA OCL, ATI Cayman, ATI Cypress, CPU optimized) returns the proper result...

I do know that MY source code (hand-optimized SSE2/SSE3), which is being used in the new stock 0.88 CPU app (and was ported from some old v0.18/0.20 code), did return the correct results compared to the old optimized Gipsel CPU/GPU apps and the old stock apps.

The question is which app gives wrong results as of now... What we need is some sort of "integrated integrity check": some fixed results that are hard-coded into the app and can be tested by running "mw_client_app.exe -test" or something like that (making sure that it takes less than 30 min on a CPU).
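Roughly what I have in mind for the integrity check, as a sketch only. Everything here is hypothetical: `toy_integral` is a stand-in for the real integration code, and the reference cases are toy values, not MW results. The only part taken from the idea above is the "-test" switch and the hard-coded expected values.

```python
import argparse
import math


def toy_integral(n_steps):
    """Stand-in for the real computation (assumption):
    midpoint-rule integral of x^2 over [0, 1]."""
    h = 1.0 / n_steps
    return sum(((i + 0.5) * h) ** 2 for i in range(n_steps)) * h


# Hard-coded (input, expected) pairs. In a real app these would come
# from a build that is trusted to be correct.
REFERENCE_CASES = [
    (10, 0.3325),
    (1000, 0.33333325),
]


def run_self_test(compute=toy_integral, rel_tol=1e-9):
    """Run every reference case; return a list of (input, expected, got)
    tuples for the cases that are outside the tolerance."""
    failures = []
    for n_steps, expected in REFERENCE_CASES:
        got = compute(n_steps)
        if not math.isclose(got, expected, rel_tol=rel_tol):
            failures.append((n_steps, expected, got))
    return failures


def main(argv):
    """Return 0 if the self-test passes (or was not requested), else 1."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-test", action="store_true",
                        help="run the built-in integrity check and exit")
    args = parser.parse_args(argv)
    if not args.test:
        print("normal run (not part of this sketch)")
        return 0
    failures = run_self_test()
    for n_steps, expected, got in failures:
        print(f"FAIL n_steps={n_steps}: expected {expected!r}, got {got!r}")
    print("self-test passed" if not failures else "self-test FAILED")
    return 1 if failures else 0
```

Calling `main(["-test"])` runs the check and returns a non-zero code on any mismatch, so every build (CPU, ATI, CUDA) could be checked against the same fixed numbers before it crunches real WUs.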

Besides all that, there's still the issue with using BOINC's "stderr" for sending back the results... Having an empty stderr is a long-known issue with BOINC which goes back at least 5 years. Of course that was never a problem until MW started using stderr to carry the WU results as well.

Regarding that, you'd better talk to Rom Walton or DA to get that fixed once and for all.





