testing new validator

Author	Message
Travis Volunteer moderator Project administrator Project developer Project tester Project scientist Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0	Message 38041 - Posted: 5 Apr 2010, 6:11:26 UTC - in response to Message 38037.
As stated in corresponding thread, 5xxx ability to work is very questionable right now. Bugs in ATI's OpenCL SDK implementation. They promised to fix those in new SDK release, will see...
I recall GPUGRID was saying that ATI OpenCL was completely unusable. Kept locking up the machine at random. Also major problems with 4xxx performance that rendered them useless for any purpose.
We have an OpenCL version of the MW@Home GPU application... and it's about 10x slower on both NVIDIA and ATI cards. OpenCL still needs a lot of work it seems...
If someone with both cards could do some comparison the numbers would be very helpful.
When I release the code for the new application I'll have some real-sized workunit examples and the output that will be required (it will have to be within at least 10e-11). Hopefully this will help us figure out the problem.

Travis Volunteer moderator Project administrator Project developer Project tester Project scientist Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0	Message 38042 - Posted: 5 Apr 2010, 6:13:21 UTC - in response to Message 38039.
Also, the use of 1,6,6 for the error/total/success numbers is a bit strange. If the min quorum is 3 then the max errors should really be 3 also since you could still get 3 successful results and form a quorum. By leaving the errors at 1, a second error will immediately junk an otherwise potentially successful quorum.
The 1 max error is because our application really shouldn't error out. Chances are if there's an error it was our fault (ie, a badly generated or specified workunit), and we don't want to send out more bad WUs. I don't mind upping it to 3 if people would prefer that, however.

Travis Volunteer moderator Project administrator Project developer Project tester Project scientist Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0	Message 38043 - Posted: 5 Apr 2010, 6:14:07 UTC - in response to Message 38040.
IS validator looking at time taken?
It knows the time taken but doesn't use this for validation. I'm not quite sure how that would be helpful.

Travis Volunteer moderator Project administrator Project developer Project tester Project scientist Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0	Message 38044 - Posted: 5 Apr 2010, 6:16:00 UTC - in response to Message 38039.
Here's a quorum that is a bit strange. There are two 48xx results that validate against each other and there are three 58xx results that have been declared invalid. These three did come in last but how did the two manage to trump them when there are supposed to be three for a quorum?
Good catch. There was a small bug in the check_set code for the validator. This shouldn't happen anymore.
EDIT: Does anyone know if this is the bit of the returned data that is used for validation purposes? probability calculation (stars) Calculated about 3.34818e+009 floatingpoint ops on FPU. If not, what exactly is used?
The only thing used is the fitness value reported by the application. If the fitness returned is within 10e-11 of 2 other fitnesses for the quorum, it's valid.
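
The rule in the post above (a result is valid if its fitness is within 10e-11 of two other fitnesses in the quorum) can be sketched roughly as follows. This is only an illustration in Python, not the project's validator code, which is described later in the thread as written in Java; the names `agrees` and `find_valid`, the sample fitness values, and the use of an absolute difference are all assumptions.

```python
# Tolerance as quoted in the thread ("within 10e-11"), read literally as a float.
TOLERANCE = 10e-11
# Assumption: a result needs two agreeing partners, i.e. a quorum of three.
MIN_QUORUM = 3


def agrees(a, b, tol=TOLERANCE):
    """True if two fitness values differ by no more than tol."""
    return abs(a - b) <= tol


def find_valid(fitnesses, quorum=MIN_QUORUM, tol=TOLERANCE):
    """Return indices of results that agree with at least quorum - 1 others.

    An empty list means no result has enough agreeing partners yet.
    """
    valid = []
    for i, fi in enumerate(fitnesses):
        partners = sum(
            1 for j, fj in enumerate(fitnesses) if j != i and agrees(fi, fj, tol)
        )
        if partners >= quorum - 1:
            valid.append(i)
    return valid


# Hypothetical fitness values: three close results and one outlier.
print(find_valid([-3.0, -3.0 + 2e-11, -3.0 - 3e-11, -3.0 + 5e-9]))  # [0, 1, 2]
# Two matching results alone do not form a quorum of three.
print(find_valid([-3.0, -3.0]))  # []
```

Under this reading, two results that match each other should never validate on their own, which is the behaviour the check_set fix is said to restore.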

Microcruncher* Joined: 1 Jul 09 Posts: 8 Credit: 1,734,500 RAC: 0	Message 38045 - Posted: 5 Apr 2010, 6:26:14 UTC Last modified: 5 Apr 2010, 6:32:13 UTC
Shouldn't it read "Completed, waiting for validation" instead of "Completed, validation inconclusive" if one result is returned and no other results are reported?
[EDIT]: Fixed a typo...

zombie67 [MM] Joined: 29 Aug 07 Posts: 115 Credit: 502,662,458 RAC: 0	Message 38046 - Posted: 5 Apr 2010, 6:29:15 UTC
Just counting pages at 20 tasks each, I am currently at 9 valid and 2 invalid. 82% valid tasks is not going to work in the long run, obviously. But I'll hang around for the shakedown.

Gary Roberts Joined: 1 Mar 09 Posts: 56 Credit: 1,984,937,499 RAC: 0	Message 38048 - Posted: 5 Apr 2010, 6:40:55 UTC - in response to Message 38044.
If not, what exactly is used?
The only thing used is the fitness value reported by the application. If the fitness returned is within 10e-11 of 2 other fitnesses for the quorum, it's valid.
Thanks very much for the reply. All we can see in the data returned is what's shown below. This is one of the invalids from the quorum I linked previously. Can't see any 'fitness' value in there so can you advise if it's possible to get that value from somewhere? I imagine you could trawl the slot directory and find it there for your own host before the result is uploaded but that doesn't help with finding the fitness for each of your wingmen.
Device 0: ATI Radeon HD5800 series (Cypress) 1024 MB local RAM (remote 2047 MB cached + 2047 MB uncached)
GPU core clock: 850 MHz, memory clock: 1200 MHz
1600 shader units organized in 20 SIMDs with 16 VLIW units (5-issue), wavefront size 64 threads
supporting double precision
Starting WU on GPU 0
main integral, 640 iterations
predicted runtime per iteration is 123 ms (33.3333 ms are allowed), dividing each iteration in 4 parts
borders of the domains at 0 400 800 1200 1600
Calculated about 3.28897e+013 floatingpoint ops on GPU, 2.47165e+008 on FPU. Approximate GPU time 84.7168 seconds.
probability calculation (stars)
Calculated about 3.34818e+009 floatingpoint ops on FPU.
WU completed.
CPU time: 3.04202 seconds, GPU time: 84.7168 seconds, wall clock time: 86.535 seconds, CPU frequency: 2.87056 GHz
</stderr_txt>
Cheers, Gary.

Microcruncher* Joined: 1 Jul 09 Posts: 8 Credit: 1,734,500 RAC: 0	Message 38049 - Posted: 5 Apr 2010, 6:45:53 UTC Last modified: 5 Apr 2010, 6:49:09 UTC
Here is another one: http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=89444466
errors: Too many success results
One 0.19 result on a CPU and two 0.20b results on a HD 47xx/48xx and on a HD 58xx lead to this weird result.

Gary Roberts Joined: 1 Mar 09 Posts: 56 Credit: 1,984,937,499 RAC: 0	Message 38050 - Posted: 5 Apr 2010, 6:59:01 UTC - in response to Message 38049.
Here is another one: http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=89444466
errors: Too many success results
Bug already fixed. Check the third post in this thread.
Cheers, Gary.

Microcruncher* Joined: 1 Jul 09 Posts: 8 Credit: 1,734,500 RAC: 0	Message 38051 - Posted: 5 Apr 2010, 7:02:43 UTC - in response to Message 38050.
Here is another one: http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=89444466
errors: Too many success results
Bug already fixed. Check the third post in this thread.
Thank you. I think I need more coffee...

magyarficko Joined: 22 Jan 09 Posts: 35 Credit: 46,731,190 RAC: 0	Message 38052 - Posted: 5 Apr 2010, 7:14:18 UTC - in response to Message 38046.
82% valid tasks is not going to work in the long run, obviously. But I'll hang around for the shakedown.
Well I'm out of here for the time being as 82% is not satisfactory for me! I realize that MilkyWay is still classed (as far as I know) as an Alpha project, but IMHO it is mature enough that they shouldn't be running tests in a production environment - at least some of these bugs (if not the majority of them) SHOULD have been caught in testing before releasing this new version validator into the wild. See y'all later.

Travis Volunteer moderator Project administrator Project developer Project tester Project scientist Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0	Message 38053 - Posted: 5 Apr 2010, 7:22:40 UTC - in response to Message 38052. Last modified: 5 Apr 2010, 7:25:39 UTC
82% valid tasks is not going to work in the long run, obviously. But I'll hang around for the shakedown.
Well I'm out of here for the time being as 82% is not satisfactory for me! I realize that MilkyWay is still classed (as far as I know) as an Alpha project, but IMHO it is mature enough that they shouldn't be running tests in a production environment - at least some of these bugs (if not the majority of them) SHOULD have been caught in testing before releasing this new version validator into the wild. See y'all later.
Right now it looks like the problem isn't the validator but the (optimized?) GPU applications. I don't think it will take us too long to sort this out. And honestly, I put the new validator out tonight only screwing up a few workunits. I don't think that's too bad :P There's a lot of things you just can't catch until you put that kind of thing out in the wild anyways.
Like I mentioned in the previous post, I rewrote the assimilator/validator code from the ground up in Java. This is going to make debugging and testing a LOT easier (yay garbage collection, exceptions and no more segmentation faults), and the validator much more stable (no memory leaks, writing to bad areas of memory). Oddly enough, it seems to be using significantly less CPU than the older version (which was C/C++).

Gary Roberts Joined: 1 Mar 09 Posts: 56 Credit: 1,984,937,499 RAC: 0	Message 38054 - Posted: 5 Apr 2010, 7:27:41 UTC - in response to Message 38042.
The 1 max error is because our application really shouldn't error out. Chances are if there's an error it was our fault (ie, a badly generated or specified workunit), and we don't want to send out more bad WUs.
With an IR of 3, if the whole WU is bad all 3 will be bad and you'll quickly hit the 3 error results limit. You shouldn't underestimate the ability of the average cruncher to trash the tasks even if the app itself really shouldn't error out :-). Also, it's very frustrating to the CPU crunchers to see many hours of work down the drain just because of a second error result in a quorum before the third success result has had a chance to come in. What problem is there in sending out an extra copy or two of the task to see if you can get a quorum?
I don't mind upping it to 3 if people would prefer that, however.
Well, at least make it 2 so as to give a bit more protection to those who have invested their resources (and put a memo on your monitor bezel to "Not send out any bad WUs" :-).
Cheers, Gary.

Travis Volunteer moderator Project administrator Project developer Project tester Project scientist Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0	Message 38055 - Posted: 5 Apr 2010, 7:32:54 UTC - in response to Message 38054. Last modified: 5 Apr 2010, 7:33:35 UTC
The 1 max error is because our application really shouldn't error out. Chances are if there's an error it was our fault (ie, a badly generated or specified workunit), and we don't want to send out more bad WUs.
With an IR of 3, if the whole WU is bad all 3 will be bad and you'll quickly hit the 3 error results limit. You shouldn't underestimate the ability of the average cruncher to trash the tasks even if the app itself really shouldn't error out :-). Also, it's very frustrating to the CPU crunchers to see many hours of work down the drain just because of a second error result in a quorum before the third success result has had a chance to come in. What problem is there in sending out an extra copy or two of the task to see if you can get a quorum?
I don't mind upping it to 3 if people would prefer that, however.
Well, at least make it 2 so as to give a bit more protection to those who have invested their resources (and put a memo on your monitor bezel to "Not send out any bad WUs" :-).
Good points. I upped the max error results to 3. This should be reflected in all the current (and new) workunits.

Gary Roberts Joined: 1 Mar 09 Posts: 56 Credit: 1,984,937,499 RAC: 0	Message 38057 - Posted: 5 Apr 2010, 7:59:07 UTC Last modified: 5 Apr 2010, 8:06:10 UTC
Here's another example of the 'Too many success results' bug. Note that one of the victims actually invested over a day of CPU time for no reward. I guess he won't be particularly impressed. I wonder why people still persist with slow CPUs on a project like this?
And here's one that is rather more important that I've just noticed. Looks like Travis has set 3,6,6 for errors/total/success and this quorum has failed with the error message "Too many total results". However there are only 6 tasks listed in the quorum, one of which is a 'client detached'. Perhaps that triggered an attempt to send out a 7th copy which junked the whole quorum. Because of the conflict between 48xx and 58xx, there mustn't have been 3 agreeing results at the time the attempt was made to send out the 7th copy. Until things are sorted regarding validation, perhaps it should be 3,9,6 rather than 3,6,6 to prevent this problem.
EDIT: If you think about it, it makes sense to have the 'total' equal to the sum of 'errors' and 'success' so that all bases are covered.
Cheers, Gary.
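
The interplay of the errors/total/success limits discussed above can be illustrated with a small model. This is a sketch of the hypothesis in the post, not the scheduler's actual logic: the `Limits` class, the `workunit_state` function, the exact comparisons, and the task counts in the example are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    max_errors: int
    max_total: int
    max_success: int
    min_quorum: int = 3


def workunit_state(limits, errors, successes, other=0, agreeing=0):
    """Classify a workunit under a simplified model of the three limits.

    errors    -- results that ended in an error
    successes -- results that finished, whether or not they agree
    other     -- results that are neither (e.g. 'client detached')
    agreeing  -- size of the largest group of mutually agreeing successes
    """
    if agreeing >= limits.min_quorum:
        return "validated"
    if errors > limits.max_errors:
        return "too many error results"
    if successes > limits.max_success:
        return "too many success results"
    if errors + successes + other >= limits.max_total:
        # No quorum yet, and no room left to send out another copy.
        return "too many total results"
    return "send another copy"


# Hypothetical quorum: 1 error, 4 successes split 2/2 between card families,
# and 1 detached client, so no group of 3 agrees.
stuck = dict(errors=1, successes=4, other=1, agreeing=2)

print(workunit_state(Limits(3, 6, 6), **stuck))      # too many total results
print(workunit_state(Limits(3, 3 + 6, 6), **stuck))  # send another copy
```

With total set to errors plus success (3 + 6 = 9), the same set of results still leaves room for further copies, which is the point of the 3,9,6 suggestion.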

kashi Joined: 30 Dec 07 Posts: 311 Credit: 149,490,184 RAC: 0	Message 38058 - Posted: 5 Apr 2010, 8:05:44 UTC
So it is possible that most of these results would be accurate to 10e-11 if compared only against an unoptimised CPU application but the results from ATI 48xx, NVIDIA and optimised CPU applications are on one side of the required fitness value and the results from ATI 58xx and 5970 are on the other side. Therefore the difference between these two sets of hardware is less accurate than 10e-11, even though individual results compared against an unoptimised CPU application may still have the required accuracy. And this is the reason that some projects that validate results with a quorum need to use homogeneous redundancy to ensure accurate results on different types of hardware?
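
The scenario described in the post above can be checked with a few lines of arithmetic. The numbers below are invented purely to illustrate the idea (a reference fitness of -3.0 and offsets of 0.8e-10 on either side); they are not taken from any real workunit.

```python
# Tolerance as quoted in the thread ("10e-11"), read literally as a float.
TOLERANCE = 10e-11

reference = -3.0               # hypothetical unoptimised CPU fitness
group_a = reference + 0.8e-10  # e.g. ATI 48xx, NVIDIA, optimised CPU
group_b = reference - 0.8e-10  # e.g. ATI 58xx, 5970


def within(a, b, tol=TOLERANCE):
    """True if two fitness values differ by no more than tol."""
    return abs(a - b) <= tol


print(within(group_a, reference))  # True
print(within(group_b, reference))  # True
print(within(group_a, group_b))    # False
```

Each group is within tolerance of the reference, but the two groups are about 1.6e-10 apart and so fail against each other. Agreement within a tolerance is not transitive, which is why grouping similar hardware into the same quorum can matter.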

Furlozza Joined: 7 Feb 09 Posts: 9 Credit: 25,983,618 RAC: 0	Message 38059 - Posted: 5 Apr 2010, 8:20:19 UTC - in response to Message 38057. Last modified: 5 Apr 2010, 8:23:39 UTC
Is the "Canonical" result used in any way in determining the validity of results? I haven't checked too many, but have noted that the first result in sometimes determines validity or invalidity.
Strangely enough, the main variance that I can see is if the first in is either 48xx or 57xx/58xx and made the Canonical result, then all WUs returned with that series card are validated whereas higher (or lower) cards are invalidated. All other data showing in the text file we get to see is usually the same.
It is especially annoying when the seen calcs return 10e-13 on the GPU FPU, the required figure on the GPU, as with the canonical whereas we are ruled invalid. Stars are usually at 10-e9.
Possibly invalid opinion, but sometimes a coincidence ......

Gary Roberts Joined: 1 Mar 09 Posts: 56 Credit: 1,984,937,499 RAC: 0	Message 38061 - Posted: 5 Apr 2010, 8:45:56 UTC - in response to Message 38059.
Is the "Canonical" result used in any way in determining the validity of results? I haven't checked too many, but have noted that the first result in sometimes determines validity or invalidity.
I think it's the other way around. The validator selects those results that agree (within specification) and one of them (perhaps the first one) is nominated as 'canonical'. Maybe it's the one whose answer is the closest to the average of all valid results for that quorum. I guess it depends on how the validator has been written.
.... It is especially annoying when the seen calcs return 10e-13 on the GPU FPU, the required figure on the GPU, as with the canonical whereas we are ruled invalid. Stars are usually at 10-e9.
Take a look more closely. The numbers you are quoting are 'flops' not 'fitness' and they are e+013 and e+009 rather than 'minus'. Travis has already said that only 'fitness' is used for validation but he hasn't answered (yet) about where we might be able to observe the actual 'fitness' values for results in a quorum. I suspect we can't access those values which will make it rather unsatisfactory for anyone trying to understand why results are being deemed invalid. Seeing as the program is being modified at the moment, it might be a good opportunity to add some code to display on the website the fitness value returned by each successful task.
Cheers, Gary.

The Gas Giant Joined: 24 Dec 07 Posts: 1947 Credit: 240,884,648 RAC: 0	Message 38062 - Posted: 5 Apr 2010, 8:57:10 UTC Last modified: 5 Apr 2010, 8:57:42 UTC
Oh well...NNT until the problems with the validator have been overcome.

Arif Mert Kapicioglu Joined: 14 Dec 09 Posts: 161 Credit: 589,318,064 RAC: 0	Message 38065 - Posted: 5 Apr 2010, 9:47:20 UTC
I'm still gathering up a lot of "can't validate" messages. What does "check skipped" mean anyway?