stock ATI 58x0 apps updated

Author	Message
kashi Send message Joined: 30 Dec 07 Posts: 311 Credit: 149,490,184 RAC: 0	Message 38295 - Posted: 7 Apr 2010, 11:59:03 UTC Processed two freshly issued no-quorum tasks with the new app, which was auto-downloaded about 4 hours ago. Both tasks completed, uploaded, reported and remain in "Completed, waiting for validation" status. Installed the new manually downloaded version to run 2 tasks concurrently. After the server started working again, downloaded a further 48 tasks: 35 newly issued tasks with no quorum remain in "Completed, waiting for validation" status. 4 reissued no-quorum tasks validated successfully. 8 quorum of 2 tasks have validated against HD 4700/4800, HD 3800 and CPU wingmen. 1 quorum of 2 task is waiting on a wingman. All quorum of 2 tasks received were reissues of tasks from 30 March or before. I did not receive any newly issued tasks with a quorum of 2. All 37 tasks I received newly issued today with no quorum remain in "Completed, waiting for validation" status. Excellent that my HD 5870 is now returning the required accuracy and is validating against other hardware classes. Well done and thank you to all involved in diagnosing the problem and coding and testing the new application. Hopefully the server can be adjusted soon so that it begins to validate newly issued tasks. ID: 38295 · Rating: 0 · rate: / Reply Quote

Cluster Physik Send message Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0	Message 38296 - Posted: 7 Apr 2010, 12:09:29 UTC - in response to Message 38288. Now that the ATI applications have been updated, I'm curious - Cluster Physik mentioned that with a few changes, the CUDA applications could give exactly the same results as the ATI and CPU applications. They are currently close (well within tolerances) but not the same. Do you intend to update the CUDA applications based on his advice, or does the current work updating them with the new science and server communication code prevent this from being a priority? I would think that the implementation of the new model takes precedence. If all applications run well with the new code (it was just released as a beta version and is not final yet), one can think about getting the results of the CPU/ATI/CUDA versions even closer together. What I've done is basically only a different method for the final summation of the integration steps (using two double values instead of a single one to hold the running sum, which limits the precision loss in this step and greatly reduces the effect of the different summation order of the values between CPUs and GPUs), plus a simple trick in the likelihood/fitness calculation which buys you one or two digits of precision there. As long as the individual mathematical operations used are precise to all but the very last bit (which is true for all GPUs), minor differences in the very last bit don't change the final likelihood result, as the limit for the precision is the summation of the individual values from all the integration points. ID: 38296 · Rating: 0 · rate: / Reply Quote
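
The "two double values instead of a single one" idea in the post above can be illustrated with a small sketch. This is not the project's actual code (the real applications are not written in Python, and the exact method is not shown in this thread); it assumes a standard error-free "two-sum" step and uses hypothetical function names. One double holds the running sum and a second collects the rounding error of each addition.

```python
def two_sum(a, b):
    """Error-free addition (Knuth's two-sum): s is the rounded sum,
    e is the rounding error, so a + b == s + e exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def sum_two_doubles(values):
    """Accumulate with a (hi, lo) pair: hi is the running sum,
    lo collects the rounding errors that a single double would lose."""
    hi = 0.0
    lo = 0.0
    for v in values:
        hi, err = two_sum(hi, v)
        lo += err
    return hi + lo


def naive_sum(values):
    """Plain single-double accumulation, for comparison."""
    total = 0.0
    for v in values:
        total += v
    return total


if __name__ == "__main__":
    # Same four values in two different orders, as a stand-in for the
    # different summation order on CPUs and GPUs.
    order_a = [1e16, 1.0, -1e16, 1.0]
    order_b = [1.0, 1e16, 1.0, -1e16]
    print(naive_sum(order_a), naive_sum(order_b))
    print(sum_two_doubles(order_a), sum_two_doubles(order_b))
```

With these inputs the naive sum depends on the order of the values, while the two-double accumulator returns the same result for both orders, which is the effect described in the post.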

David Glogau* Send message Joined: 12 Aug 09 Posts: 172 Credit: 645,240,165 RAC: 0	Message 38305 - Posted: 7 Apr 2010, 14:26:34 UTC A few early hiccups, reset all four boxes, and all is well so far. Thanks for all your work, CP et al. Kind Regards ID: 38305 · Rating: 0 · rate: / Reply Quote

Tails Send message Joined: 19 Feb 10 Posts: 17 Credit: 7,573,117 RAC: 0	Message 38308 - Posted: 7 Apr 2010, 15:43:29 UTC Good day, The new stock application v. 0.23 for ATI GPU has downloaded automatically and is running on my GPU, with several dozen WUs already finished. Some of those have been validated OK, but some were either considered invalid, or the validation is reported "inconclusive". Thus I guess the problem is still there. Not overclocking. Cheers, ivk Same here, and I think it appears on all ATI cards, 58xx and 48xx. Is this normal with the new validator, or will it ever be fixed? ID: 38308 · Rating: 0 · rate: / Reply Quote

Emanuel Send message Joined: 18 Nov 07 Posts: 280 Credit: 2,442,757 RAC: 0	Message 38311 - Posted: 7 Apr 2010, 16:24:54 UTC - in response to Message 38296. I would think that the implementation of the new model takes precedence. If all applications run well with the new code (it was just released as a beta version and is not final yet), one can think about getting the results of the CPU/ATI/CUDA versions even closer together. What I've done is basically only a different method for the final summation of the integration steps (using two double values instead of a single one to hold the running sum, which limits the precision loss in this step and greatly reduces the effect of the different summation order of the values between CPUs and GPUs), plus a simple trick in the likelihood/fitness calculation which buys you one or two digits of precision there. As long as the individual mathematical operations used are precise to all but the very last bit (which is true for all GPUs), minor differences in the very last bit don't change the final likelihood result, as the limit for the precision is the summation of the individual values from all the integration points. Sounds like it should be relatively simple to implement when the time comes. Thanks for the explanation :) ID: 38311 · Rating: 0 · rate: / Reply Quote

kashi Send message Joined: 30 Dec 07 Posts: 311 Credit: 149,490,184 RAC: 0	Message 38312 - Posted: 7 Apr 2010, 16:33:40 UTC Last modified: 7 Apr 2010, 17:03:26 UTC Excellent, of the 37 newly issued tasks with no quorum that were "Completed, waiting for validation" a second task was issued 4-7 hours later for about half of them changing them from minimum quorum of 1 to minimum quorum of 2. The other half have now validated as unchanged minimum quorum of 1. So now only 7 of the 37 remain as "Completed, validation inconclusive" and are waiting for wingmen to complete and return their tasks. No invalids so all is good, just had to get used to the strange way some no quorum tasks get converted to quorum tasks hours later and others stay as no quorum tasks and validate hours later. ID: 38312 · Rating: 0 · rate: / Reply Quote

Travis Volunteer moderator Project administrator Project developer Project tester Project scientist Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0	Message 38327 - Posted: 7 Apr 2010, 21:32:48 UTC - in response to Message 38312. Last modified: 7 Apr 2010, 21:33:23 UTC No invalids so all is good, just had to get used to the strange way some no quorum tasks get converted to quorum tasks hours later and others stay as no quorum tasks and validate hours later. We're doing validation in a rather unusual way (when I post our recent DAIS paper you can read all about it if you'd like). Since we're running evolutionary algorithms to optimize models to the observed data of the Milky Way galaxy, the server keeps a population of the best models found so far. When it generates work, it creates permutations of these models to send out to hosts. When you report work back, you give the fitness of the model you just evaluated. Before, if the model you sent back to us was a good one (it would improve the population the server knew about), we'd validate it, and if it validated we'd insert it into the population. If the model you just calculated wasn't better than what the server had in its population, we didn't validate it at all, because we were just going to discard it anyway. Because only 2-10% or so of the results that come back actually improve the population, this meant we could get away with really minimal validation. Right now, we're still validating all of the workunits that do improve the population, but also 50% of the workunits that don't, in order to catch bad applications (like single precision GPU apps) and other errors like the GPU thing we had this week. We're also going to be testing some new Milky Way models, so it's important that we have things to the appropriate degree of accuracy. So anyway, to answer your question, all workunits get sent out initially with a quorum of 1. 
When the validator looks at a workunit, it checks whether the result would improve the population, or whether random() < 0.5; if either is the case, it increases the quorum on the workunit to 2 and waits for it to be validated again. That's why some workunits start with a quorum of 1 and then later have it increased. ID: 38327 · Rating: 0 · rate: / Reply Quote
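
The quorum rule described in the post above can be written down as a short sketch. This is only an illustration of the stated rule, not the project's validator code; the function name `required_quorum` and its parameters are hypothetical, and the 0.5 threshold is the "50%" figure from the post.

```python
import random


def required_quorum(improves_population, rng=random.random, recheck_fraction=0.5):
    """Decide the final quorum for a workunit that was sent out with quorum 1.

    improves_population: True if the reported fitness would enter the
        server's population of best models (assumed to be known to the caller).
    rng: callable returning a float in [0, 1); injectable so the rule can be tested.
    recheck_fraction: share of non-improving workunits that get a second task.
    """
    if improves_population or rng() < recheck_fraction:
        return 2  # send a second task and compare the two results
    return 1      # accept the single result without a wingman


if __name__ == "__main__":
    # Improving results are always double-checked; about half of the rest are too.
    print(required_quorum(True))
    print(required_quorum(False))
```

Under this rule a workunit that looks like "no quorum" at first can turn into a quorum of 2 only after its first result has been reported, which matches what kashi observed.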

kashi Send message Joined: 30 Dec 07 Posts: 311 Credit: 149,490,184 RAC: 0	Message 38358 - Posted: 8 Apr 2010, 2:29:51 UTC - in response to Message 38327. Thank you for the explanation of how the quorum system works. My 5870 is currently working on another relatively new ATI project. I only installed the fixed 0.23 application version and processed a batch to help test. I'll be back here later when you introduce your new application which uses a different model of the Milky Way galaxy and uses parameters from the command line. It will be very pleasing if this reduces server congestion because currently when the server cannot be contacted quickly it takes a long time before pending requests to the server time out in BOINC Manager and this causes delay to server requests and replies for other projects. This also happens on some other projects with busy servers and the combination of 2 or more slow project servers can create quite an inefficient and sluggish BOINC Manager if you are attached to multiple projects. Improved server performance will also reduce the occurrence of phantom tasks. All very good, I look forward to it. ID: 38358 · Rating: 0 · rate: / Reply Quote

Arif Mert Kapicioglu Send message Joined: 14 Dec 09 Posts: 161 Credit: 589,318,064 RAC: 0	Message 38366 - Posted: 8 Apr 2010, 11:10:34 UTC Last modified: 8 Apr 2010, 11:11:12 UTC It still says "validation inconclusive" when it's compared with CUDA. Here are some links: http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=90669112 http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=90686852 http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=90669102 ID: 38366 · Rating: 0 · rate: / Reply Quote

pollux_9t Send message Joined: 26 Jan 10 Posts: 3 Credit: 1,966,092 RAC: 0	Message 38369 - Posted: 8 Apr 2010, 11:27:08 UTC Same here with a Radeon HD4890: http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=90932417 http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=90932416 http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=90932410 http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=90932400 But others seem to have worked: http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=90932409 Regards pollux_9t ID: 38369 · Rating: 0 · rate: / Reply Quote

kashi Send message Joined: 30 Dec 07 Posts: 311 Credit: 149,490,184 RAC: 0	Message 38370 - Posted: 8 Apr 2010, 12:03:02 UTC Last modified: 8 Apr 2010, 12:07:19 UTC That's because for both of you, your quorum partner (wingman) has not yet completed and reported the task that has been sent out. With a quorum of 2, two tasks are sent out to different contributors and the server compares the 2 results when they are completed and reported. If only one task has been completed and reported there is nothing to compare yet ("Completed, validation inconclusive" status). You need to wait until the other task has been completed and reported; some time after that the server will compare the 2 results (your result and that of your wingman) and see if they agree to the required accuracy. If they agree then both tasks will be validated and credit granted. If they do not agree then another task will be sent out to see which one of the two is correct. At this stage both of the tasks that do not agree will also have a status of "Completed, validation inconclusive". All newly issued work units are sent out to just one contributor at first (quorum of 1). After they are completed and reported the server then sends out another task for some of these, so the quorum changes to a quorum of 2. ID: 38370 · Rating: 0 · rate: / Reply Quote
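
The comparison steps described in the post above can be sketched roughly as follows. This is an assumption-laden illustration, not the real validator: the function name `validate`, the relative tolerance value and the status strings are all hypothetical, and the real "required accuracy" is not stated in this thread.

```python
import math
from itertools import combinations


def validate(fitnesses, rel_tol=1e-12):
    """Compare the fitness values reported so far for one workunit.

    Returns (statuses, needs_more_results). If any two results agree within
    rel_tol (a made-up tolerance), every result matching that pair is
    "valid" and the rest are "invalid". Otherwise everything stays
    "inconclusive" and another result is needed.
    """
    for i, j in combinations(range(len(fitnesses)), 2):
        if math.isclose(fitnesses[i], fitnesses[j], rel_tol=rel_tol):
            canonical = fitnesses[i]
            statuses = [
                "valid" if math.isclose(f, canonical, rel_tol=rel_tol) else "invalid"
                for f in fitnesses
            ]
            return statuses, False
    return ["inconclusive"] * len(fitnesses), True


if __name__ == "__main__":
    print(validate([-3.0]))              # only one result: nothing to compare yet
    print(validate([-3.0, -3.1]))        # disagreement: a third task is needed
    print(validate([-3.0, -3.1, -3.1]))  # third result decides which one was right
```

Note that a scheme like this only checks agreement between results; it trusts whichever pair agrees first, so two hosts that agree with each other decide the outcome for the third.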

Arif Mert Kapicioglu Send message Joined: 14 Dec 09 Posts: 161 Credit: 589,318,064 RAC: 0	Message 38371 - Posted: 8 Apr 2010, 12:29:34 UTC - in response to Message 38370. If they do not agree then another task will be sent out to see which one of the two is correct. At this stage both of the tasks that do not agree will also have a status of "Completed, validation inconclusive". What if, at this stage, the computer that the latest WU was sent to detaches? ID: 38371 · Rating: 0 · rate: / Reply Quote

Gary Roberts Send message Joined: 1 Mar 09 Posts: 56 Credit: 1,984,937,499 RAC: 0	Message 38375 - Posted: 8 Apr 2010, 13:23:48 UTC - in response to Message 38371. What if, at this stage, the computer that the latest WU was sent to detaches? If it was the proper BOINC 'detach' operation, the client would advise the server that the task was not going to be completed and the server would immediately send out a new copy to a 4th machine. A worse scenario would be if the owner of the 3rd machine happened to turn his machine off and go on holiday for a month; the server would not know and would have to wait for the deadline to expire before it could send out the 4th copy. The server can be infinitely patient and the quorum would eventually get completed. By that time it would be unlikely that such a delayed result would be of any use to the project and it would most likely be discarded. However, the hosts which were deemed to have supplied the agreeing results would still get credit even if the result was discarded. Cheers, Gary. ID: 38375 · Rating: 0 · rate: / Reply Quote

pollux_9t Send message Joined: 26 Jan 10 Posts: 3 Credit: 1,966,092 RAC: 0	Message 38382 - Posted: 8 Apr 2010, 14:26:58 UTC Thanks for the quick answer! Regards, pollux_9t ID: 38382 · Rating: 0 · rate: / Reply Quote

Beyond Send message Joined: 15 Jul 08 Posts: 383 Credit: 729,293,740 RAC: 0	Message 38383 - Posted: 8 Apr 2010, 14:28:12 UTC Last modified: 8 Apr 2010, 15:16:52 UTC Something still doesn't seem to be working correctly on the validator. Here's 5 new WUs in which in each case a 58xx card running v.23 and another 58xx running .20b validated against each other and invalidated good v.23 results from 48xx cards: http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=90934041 http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=90924553 http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=90902501 http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=90873943 http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=90757381 As I understand it this should not be happening. Interesting coincidences that both 58xx machines have 4 GPUs and are from the same team & sub-team... Is the validator still broken or? Both of these 4 x 58xx machines running v.23 are putting out a massive number of invalid results. The above are just a few examples. Here's the 2 machines: http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=123623&offset=0&show_names=0&state=4 http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=39218&offset=0&show_names=0&state=4 Edit: Here's another WU with the same scenario: http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=90970028 but with yet a 3rd 58xx with v.23 spewing out a massive number of bad results. This time a machine with only 2 x 58xx cards: http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=112491&offset=0&show_names=0&state=4 ID: 38383 · Rating: 0 · rate: / Reply Quote

Tails Send message Joined: 19 Feb 10 Posts: 17 Credit: 7,573,117 RAC: 0	Message 38384 - Posted: 8 Apr 2010, 14:59:50 UTC Last modified: 8 Apr 2010, 15:03:34 UTC Something still doesn't seem to be working correctly on the validator. Here's 5 new WUs in which in each case a 58xx card running v.23 and another 58xx running .20b validated against each other and invalidated good v.23 results from 48xx cards: http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=90934041 http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=90924553 http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=90902501 http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=90873943 http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=90757381 As I understand it this should not be happening. Interesting coincidences that both 58xx machines have 4 GPUs and are from the same team & sub-team... Is the validator still broken or? Both of these 4 x 58xx machines running v.23 are putting out a massive number of invalid results. The above are just a few examples. Here's the 2 machines: http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=123623&offset=0&show_names=0&state=4 http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=39218&offset=0&show_names=0&state=4 I have random invalid wus, I suspect some driver problems. Try to update the ati driver to the newest. I'll do the same. ID: 38384 · Rating: 0 · rate: / Reply Quote

Beyond Send message Joined: 15 Jul 08 Posts: 383 Credit: 729,293,740 RAC: 0	Message 38386 - Posted: 8 Apr 2010, 15:07:56 UTC - in response to Message 38384. Something still doesn't seem to be working correctly on the validator. Here's 5 new WUs in which in each case a 58xx card running v.23 and another 58xx running .20b validated against each other and invalidated good v.23 results from 48xx cards: http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=90934041 http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=90924553 http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=90902501 http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=90873943 http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=90757381 As I understand it this should not be happening. Interesting coincidences that both 58xx machines have 4 GPUs and are from the same team & sub-team... Is the validator still broken or? Both of these 4 x 58xx machines running v.23 are putting out a massive number of invalid results. The above are just a few examples. Here's the 2 machines: http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=123623&offset=0&show_names=0&state=4 http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=39218&offset=0&show_names=0&state=4 I have random invalid wus, I suspect some driver problems. Try to update the ati driver to the newest. I'll do the same. On some of the above WUs all machines have the latest drivers. The driver version is irrelevant. ID: 38386 · Rating: 0 · rate: / Reply Quote

John Clark Send message Joined: 4 Oct 08 Posts: 1734 Credit: 64,228,409 RAC: 0	Message 38389 - Posted: 8 Apr 2010, 15:16:15 UTC Last modified: 8 Apr 2010, 15:18:03 UTC I am finding that about 25% of the output of my HD5850 GPU is not validated immediately, and about 50% to 60% is in the same state very shortly after returning the results. A rather interesting situation. But, none of them seems to be rejected after the wingmen report. Much better than a few days ago. Go away, I was asleep ID: 38389 · Rating: 0 · rate: / Reply Quote

Crunch3r Volunteer developer Send message Joined: 17 Feb 08 Posts: 363 Credit: 258,227,990 RAC: 0	Message 38390 - Posted: 8 Apr 2010, 15:19:03 UTC - in response to Message 38386. Last modified: 8 Apr 2010, 15:19:52 UTC http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=90934041 As I understand it this should not be happening. Is the validator still broken or? Both of these 4 x 58xx machines running v.23 are putting out a massive number of invalid results. The above are just a few examples. On some of the above WUs all machines have the latest drivers. The driver version is irrelevant. That's what I'm seeing here too. I doubt that the validator does what it should. If it worked correctly, it would NOT declare results from two 58xx cards running 0.20b and 0.23 a match and VALID while discarding the most likely correct result from a 48xx (marked as invalid) ... Join Support science! Joinc Team BOINC United now! ID: 38390 · Rating: 0 · rate: / Reply Quote