testing new validator
Author | Message |
---|---|
Send message Joined: 7 Nov 08 Posts: 14 Credit: 180,768,799 RAC: 0 |
|
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
Is it possible for you to compile a test case for those of us who have both 48xx and 58xx cards so we can run them and see what is really doing the good work?
I'll run some sample WUs standalone on my laptop tonight so I can be sure of the fitness. I'll put out the input files and the expected output when they're done. |
Send message Joined: 7 Feb 09 Posts: 9 Credit: 25,983,618 RAC: 0 |
Am running down MW WUs until the problem is solved. *putting on devil's advocate hat* Could the problem not be that in the 58x0 series, an instruction has been included in the cards that actually makes them more accurate? *removing hat* EDIT: make that DELETING WUs. You should see the world from my eyes. |
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
Am running down MW wus until problem solved.
Well, unless the 58x0 series is more accurate than CPUs (which I doubt), they're the culprit. From an email Anthony just sent me (this is a 2 stream workunit with sgr coordinates):
(uses hardcoded values in atSurveyGeometry.c)
-2.558875331749281 v0.19 CPU application (SSE3)
-2.558875331749119 v0.20 apps (ati 48xx)
-2.558875331749284 v0.18 optimized
-2.558875331749081 nvidia on boinc
-2.558875355118770 (not v0.20) ati on boinc 58xx
(use computed values in atSurveyGeometry.c)
-2.558875329826787 cpu (from repository)
-2.558875329826697 nvidia (old version, circa oct 2009)
-2.558875329826689 nvidia (new unreleased version)
The 58x0 series just isn't matching up to anything we have. |
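To make the mismatch concrete, here is a minimal Python sketch that compares the numbers quoted above against the CPU result in each group. The tolerance of 1e-10, the dictionary labels and the function name are assumptions made for illustration; this is not the project's validator.

```python
import math

# Values quoted in the post above, grouped by how atSurveyGeometry.c
# obtained its values. The short labels are paraphrased from the post.
HARDCODED = {
    "v0.19 CPU (SSE3)": -2.558875331749281,
    "v0.20 ATI 48xx": -2.558875331749119,
    "v0.18 optimized": -2.558875331749284,
    "nvidia on boinc": -2.558875331749081,
    "ATI 58xx on boinc": -2.558875355118770,
}
COMPUTED = {
    "cpu (from repository)": -2.558875329826787,
    "nvidia (old version)": -2.558875329826697,
    "nvidia (new unreleased)": -2.558875329826689,
}

# Assumed relative tolerance, chosen only for this illustration.
REL_TOL = 1e-10


def outliers(results, reference_key, rel_tol=REL_TOL):
    """Return the labels whose value disagrees with the reference result."""
    reference = results[reference_key]
    return [
        name
        for name, value in results.items()
        if not math.isclose(value, reference, rel_tol=rel_tol, abs_tol=0.0)
    ]


if __name__ == "__main__":
    print(outliers(HARDCODED, "v0.19 CPU (SSE3)"))
    print(outliers(COMPUTED, "cpu (from repository)"))
```

With that tolerance only the 58xx result is flagged. The other hardcoded-value results agree with the CPU value to about 13 significant digits, while the 58xx result departs at roughly the 8th decimal place.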
Send message Joined: 27 Nov 09 Posts: 108 Credit: 430,760,953 RAC: 0 |
<removed> |
Send message Joined: 11 Nov 07 Posts: 232 Credit: 178,229,009 RAC: 0 |
A few questions.
1. The 58xx cards have been around for 6 months. Does this mean that all the results during that time can have an error twice as big as it was supposed to be?
2. I am using 4870 cards to crunch and have something like 150 results marked as 'Invalid', caused by 58xx cards I assume. Will I get credit granted for those later?
3. If this has been going on for 6 months and you spotted the problem only recently, you obviously have a serious problem with your validation method. Do you have a strategy now to prevent it from happening again?
Despite these problems I still think you guys at Milkyway@home are no. 1 in BOINC. |
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
A few questions.
I think this was maybe a more recent change? At any rate, the results we've been getting for the searches have always been validated -- it's just that the issue didn't show up as much because we were not validating the vast majority of the workunits; we were just validating the ones which improved the searches we were doing. So while they had the error, it didn't affect our results very much at all. The reason it's been a big deal lately is that, in order to fix scripting and single precision app issues, we started validating most workunits (even those that didn't improve our searches). So before we were only validating 2-5% of WUs; now we're validating 50-75%.
Right now my focus is on trying to get the server running stably again and upgrading to the new application. I'm not sure if I'm going to have time to go through the database manually and fix everyone's lost credit. Most of these workunits have also been purged from the database by now, so there's really no good way to update and grant lost credit. I think it's just something everyone is going to have to live with, and I apologize for that.
Well, the real issue here was that we went from doing nearly no validation (we were only validating the minority of results which actually improved our search populations) to doing a lot more validation, which made the problem really apparent -- so I guess the swap was a good thing :) On our end, we don't really need this extra validation, because results which don't improve our search populations aren't particularly important, other than to weed out bad applications (and in this case we were unlucky enough to have one). But at any rate, I think with the stricter validation we have in place now, this kind of thing shouldn't happen again.
Glad after all of this we aren't totally hated here :) |
Send message Joined: 27 Nov 09 Posts: 108 Credit: 430,760,953 RAC: 0 |
But at any rate, I think with the more strict validation we have in place now, this kind of thing shouldn't happen again.
For a possible counter-example, check Workunit 90623954. Two anonymous platforms sporting versions 0.20b and 0.22 out-quorumed an HD5870 running version 0.23. All of the results were from ATI Cypress boards (HD5870 and HD5850 apparently). Shouldn't 5xxx-series GPU results from applications prior to 0.23 be automatically discarded? |
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
They should... but that takes a couple of extra database queries per workunit, and the server is crashing enough as it is. I had the check in there for a while and the server couldn't keep up with it. |
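For illustration only, the check being discussed might look something like the Python sketch below. The `Result` record, the function names and the model-string matching are all hypothetical, and the sketch works on data already in memory, so it says nothing about the database cost mentioned above. It also assumes versions are written as "major.minor" with an optional letter suffix such as "0.20b".

```python
import re
from dataclasses import dataclass

# First application version reported in this thread as correct on 58xx cards.
MIN_VERSION_FOR_5XXX = (0, 23)


@dataclass
class Result:
    """Hypothetical record of one returned result."""
    gpu_model: str    # e.g. "HD5870"
    app_version: str  # e.g. "0.20b"


def parse_version(text):
    """Turn '0.20b' into (0, 20); any trailing letter suffix is ignored."""
    match = re.match(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return int(match.group(1)), int(match.group(2))


def is_5xxx_series(gpu_model):
    """Assumes model strings look like 'HD5870' or 'HD 5850'."""
    return re.search(r"\bHD\s?5\d{3}\b", gpu_model) is not None


def should_discard(result):
    """True for a 5xxx-series result produced by an app older than 0.23."""
    return (
        is_5xxx_series(result.gpu_model)
        and parse_version(result.app_version) < MIN_VERSION_FOR_5XXX
    )


if __name__ == "__main__":
    for r in (Result("HD5870", "0.20b"), Result("HD5850", "0.22"),
              Result("HD5870", "0.23"), Result("HD4870", "0.20")):
        print(r, should_discard(r))
```

Applied to the workunit described above, the 0.20b and 0.22 results would be dropped before any quorum is formed, leaving only the 0.23 result.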
Send message Joined: 1 Mar 09 Posts: 56 Credit: 1,984,937,499 RAC: 0 |
Two anonymous platforms sporting versions 0.20b and 0.22 out-quorumed an HD5870 running version 0.23. There are probably a significant number of people running AP and not paying close attention to the boards. Anybody noticing cases of 5800 series cards still running the wrong app should send a PM to the owner (if possible) since that will give them an email as well. Hopefully they are monitoring their email a bit more closely. Cheers, Gary. |
Send message Joined: 27 Nov 09 Posts: 108 Credit: 430,760,953 RAC: 0 |
Anybody noticing cases of 5800 series cards still running the wrong app should send a PM to the owner (if possible) since that will give them an email as well.
I personally think it would be more appropriate if RPI were sending out these e-mails but... ...I've notified 4 other owners as requested. |
Send message Joined: 12 Nov 07 Posts: 2425 Credit: 524,164 RAC: 0 |
Anybody noticing cases of 5800 series cards still running the wrong app should send a PM to the owner (if possible) since that will give them an email as well.
I personally think it would be more appropriate if RPI were sending out these e-mails but...
Doing a mass email to everyone should be easy. Only the ones who don't want emails from the project would be left out, and they could then get individual emails. Doesn't expecting the unexpected make the unexpected the expected? If it makes sense, DON'T do it. |
Send message Joined: 11 Nov 07 Posts: 232 Credit: 178,229,009 RAC: 0 |
Until that problem is sorted out I will run Folding@home instead. Hope you will have this fixed soon. |
Send message Joined: 20 Sep 08 Posts: 1391 Credit: 203,563,566 RAC: 0 |
I have unfortunately had to swap 7 machines running various 3850, 4850, and 4870 cards on to another project as each one was producing 90% computation errors or work not validated. I'll check back in a few days and see if this is still continuing. Don't drink water, that's the stuff that rusts pipes |
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
I have unfortunately had to swap 7 machines running various 3850, 4850, and 4870 cards on to another project as each one was producing 90% computation errors or work not validated. I'll check back in a few days and see if this is still continuing. Did your machines upgrade to the correct application (0.23) and are they running the right brook32/64.dll? If they're giving that many errors it's probably because they're using the wrong application. |
Send message Joined: 23 Mar 09 Posts: 13 Credit: 100,032,796 RAC: 0 |
hi Travis, I've got a question. My 3 hosts (2x5870, 5870, 4850) were disconnected from MW, and after you published the new app 0.23 they reconnected. Everything works fine. But from time to time I receive some WUs for app 0.19, even when I manually delete the app from the HDD. Is that supposed to happen? Join us at www.boincatpoland.org |
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
hi Travis
If you're running Windows, for CPU the highest app version is still 0.19. So if it's running 0.19 on the CPU that's not a problem. |
Send message Joined: 23 Mar 09 Posts: 13 Credit: 100,032,796 RAC: 0 |
Yes, I run Win7 and XP, but I'm not using the CPU at all (never used it for MW). Hmm, maybe it's the "fault" of the MW preference "Use CPU (enforced by 6.10+ clients)". Join us at www.boincatpoland.org |
Send message Joined: 20 Sep 08 Posts: 1391 Credit: 203,563,566 RAC: 0 |
Did your machines upgrade to the correct application (0.23) and are they running the right brook32/64.dll? Thanks for the response. I've had a bit of a change round and they seem OK now, so they are back crunching for MW after many days of lost work! See my post here. http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=1679&nowrap=true#38670 Don't drink water, that's the stuff that rusts pipes |
Send message Joined: 17 Feb 08 Posts: 363 Credit: 258,227,990 RAC: 0 |
hi Travis
I think I know what he's talking about. Yesterday, for some reason, I had the same thing happening here on two machines. Although they run on the anonymous platform, which only has the GPU app specified, the server sent some tasks assigned to the CPU app (0.19), which was neither selected in the prefs (Don't use CPU) nor specified in the app_info.xml. That shouldn't have happened at all. The BOINC client started all of those tasks at once using the GPU app (I checked that in the task manager) and labeled them as CPU app in the BOINC manager. So my 8 core machine had 2 GPU apps running (as specified in the app_info.xml) and another 8 active tasks showing up as CPU tasks in the manager, although it actually used the GPU app for them. The result was that the V8 slowed down quite a bit and the other one locked up completely. EDIT: The same thing has happened on Collatz -> http://boinc.thesonntags.com/collatz/forum_thread.php?id=370 So I think it's a server-side bug, where it doesn't honor the prefs or the apps specified in an app_info.xml. Join Support science! Join Team BOINC United now! |
©2024 Astroinformatics Group