Welcome to MilkyWay@home

stock ATI 58x0 apps updated

kashi
Joined: 30 Dec 07
Posts: 311
Credit: 149,490,184
RAC: 0
Message 38295 - Posted: 7 Apr 2010, 11:59:03 UTC

I processed two freshly issued no-quorum tasks with the new app, which was auto-downloaded about 4 hours ago. Both tasks completed, uploaded, and reported, and both remain in "Completed, waiting for validation" status.

I then installed the new manually downloaded version to run 2 tasks concurrently. After the server started working again, I downloaded a further 48 tasks:

35 newly issued tasks with no quorum remain in "Completed, waiting for validation" status.

4 reissued no-quorum tasks validated successfully.

8 quorum-of-2 tasks have validated against HD 4700/4800, HD 3800 and CPU wingmen.

1 quorum-of-2 task is waiting on a wingman.


All quorum of 2 tasks received were reissues of tasks from 30 March or before.
I did not receive any newly issued tasks with a quorum of 2.
All 37 tasks I received newly issued today with no quorum remain in "Completed, waiting for validation" status.

Excellent that my HD 5870 is now returning the required accuracy and is validating against other hardware classes. Well done and thank you to all involved in diagnosing the problem and coding and testing the new application.

Hopefully the server can be adjusted soon so that it begins to validate newly issued tasks.
Cluster Physik
Joined: 26 Jul 08
Posts: 627
Credit: 94,940,203
RAC: 0
Message 38296 - Posted: 7 Apr 2010, 12:09:29 UTC - in response to Message 38288.  

Now that the ATI applications have been updated, I'm curious - Cluster Physik mentioned that with a few changes, the CUDA applications could give exactly the same results as the ATI and CPU applications. They are currently close (well within tolerances) but not the same. Do you intend to update the CUDA applications based on his advice, or does the current work updating them with the new science and server communication code prevent this from being a priority?

I would think that implementing the new model takes precedence. Once all applications run well with the new code (it was just released as a beta version and is not final yet), one can think about getting the results of the CPU/ATI/CUDA versions even closer together.

What I've done is basically two things. First, I use a different method for the final summation of the integration steps: holding the running value in two doubles instead of a single one limits the precision loss in this step and greatly reduces the effect of the different summation order between CPUs and GPUs. Second, I use a simple trick in the likelihood/fitness calculation that buys one or two digits of precision there. As long as the individual mathematical operations are precise to all but the very last bit (which is true for all GPUs), minor differences in the very last bit don't change the final likelihood result, because the limit on precision is the summation of the individual values from all the integration points.
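
To make the two-double idea concrete, here is a minimal Python sketch of compensated summation. It is not the project's application code, which is not shown in this thread; the names `two_sum`, `compensated_sum` and `naive_sum` are made up for illustration, and the error-free `two_sum` step (Knuth's TwoSum) is only one common way to carry the lost low-order bits in a second double. The three-value input is contrived to force a visible rounding error.

```python
def two_sum(a, b):
    """Error-free addition: returns (s, e) where s is the rounded sum
    and e is the rounding error, so that a + b == s + e exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e


def compensated_sum(values):
    """Sum using two doubles: `hi` is the running sum, `lo` collects
    the rounding errors that a single double would have dropped."""
    hi = 0.0
    lo = 0.0
    for v in values:
        hi, err = two_sum(hi, v)
        lo += err
    return hi + lo


def naive_sum(values):
    """Plain single-double accumulation, for comparison."""
    total = 0.0
    for v in values:
        total += v
    return total


# The same three values added in two different orders (illustrative only).
order_a = [1e16, 1.0, -1e16]
order_b = [1e16, -1e16, 1.0]

print("naive:      ", naive_sum(order_a), naive_sum(order_b))
print("compensated:", compensated_sum(order_a), compensated_sum(order_b))
```

With plain summation the two orderings give 0.0 and 1.0, while the compensated version returns 1.0 for both. That is the kind of order independence that matters when CPUs and GPUs add up the integration points in different sequences.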
David Glogau*
Joined: 12 Aug 09
Posts: 172
Credit: 645,240,165
RAC: 0
Message 38305 - Posted: 7 Apr 2010, 14:26:34 UTC

A few early hiccups, reset all four boxes, and all is well so far.

Thanks for all your work, CP et al.

Kind Regards
Tails
Joined: 19 Feb 10
Posts: 17
Credit: 7,573,117
RAC: 0
Message 38308 - Posted: 7 Apr 2010, 15:43:29 UTC

Good day,

The new stock application v. 0.23 for ATI GPU has downloaded automatically and is running on my GPU, with several dozens of WU's already finished. Some of those have been validated OK, but some were either considered invalid, or the validation is reported "inconclusive". Thus I guess the problem is still there.

Not overclocking.

Cheers, ivk


Same here, and I think it appears on all ATI cards, 58xx and 48xx. Is this normal with the new validator, or will it ever be fixed?
Emanuel
Joined: 18 Nov 07
Posts: 280
Credit: 2,442,757
RAC: 0
Message 38311 - Posted: 7 Apr 2010, 16:24:54 UTC - in response to Message 38296.  

Sounds like it should be relatively simple to implement when the time comes. Thanks for the explanation :)
kashi
Joined: 30 Dec 07
Posts: 311
Credit: 149,490,184
RAC: 0
Message 38312 - Posted: 7 Apr 2010, 16:33:40 UTC
Last modified: 7 Apr 2010, 17:03:26 UTC

Excellent. Of the 37 newly issued no-quorum tasks that were "Completed, waiting for validation", about half had a second task issued 4-7 hours later, changing them from a minimum quorum of 1 to a minimum quorum of 2. The other half have now validated with an unchanged minimum quorum of 1.

So now only 7 of the 37 remain as "Completed, validation inconclusive" and are waiting for wingmen to complete and return their tasks.

No invalids so all is good, just had to get used to the strange way some no quorum tasks get converted to quorum tasks hours later and others stay as no quorum tasks and validate hours later.
Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 38327 - Posted: 7 Apr 2010, 21:32:48 UTC - in response to Message 38312.  
Last modified: 7 Apr 2010, 21:33:23 UTC

No invalids so all is good, just had to get used to the strange way some no quorum tasks get converted to quorum tasks hours later and others stay as no quorum tasks and validate hours later.


We're doing validation in a rather unique way (when I post our recent DAIS paper, you can read all about it if you'd like).

Since we're running evolutionary algorithms to optimize models to the observed data of the Milky Way galaxy, the server keeps a population of the best models found. When it generates work, it creates permutations of these models to send out to hosts.

When you report work back, you give the fitness of the model you just evaluated. Previously, if the model you sent back was a good one (it would improve the population the server knew about), we'd validate it and, if it validated, insert it into the population. If the model you just calculated wasn't better than what the server had in its population, we didn't validate it at all, because we were just going to discard it anyway. Because only 2-10% or so of the results that come back actually improve the population, this meant we could get away with really minimal validation.

Right now, we're still validating all of the workunits that do improve the population, but also 50% of the workunits that don't, in order to catch bad applications (like single-precision GPU apps) and other errors like the GPU thing we had this week. We're also going to be testing some new Milky Way models, so it's important that we have things to the appropriate degree of accuracy.

So anyway, to answer your question: all workunits are initially sent out with a quorum of 1. When the validator looks at a result, it checks whether it will improve the population or whether random() < 0.5; if either is the case, it increases the quorum on the workunit to 2 and waits for it to be validated again. That's why some workunits start with a quorum of 1 and later have it increased.
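
The quorum rule described above could be sketched roughly as follows. This is only an illustration, not the project's validator: the function name `target_quorum`, the representation of the population as a plain list of fitness values, the "higher fitness is better" convention, and the "beats the worst member" test for improving the population are all assumptions made for the example.

```python
import random


def target_quorum(fitness, population_fitnesses,
                  recheck_fraction=0.5, rand=random.random):
    """Decide the quorum for a workunit whose first result just came back.

    Assumptions (hypothetical, not taken from the project code):
    - the population is a list of fitness values, higher is better;
    - a result "improves the population" if it beats the worst member.
    """
    improves_population = fitness > min(population_fitnesses)
    spot_check = rand() < recheck_fraction  # the random() < 0.5 check
    if improves_population or spot_check:
        return 2  # raise the quorum and wait for a second result
    return 1      # accept with the original quorum of 1


# Illustrative values only.
population = [-3.2, -3.1, -3.0]
print(target_quorum(-2.9, population))  # improves the population
print(target_quorum(-5.0, population))  # depends on the random draw
```

Every improving result gets a second opinion, and on average half of the remaining results are spot-checked as well, which is why only some tasks switch from a quorum of 1 to a quorum of 2 hours after they were reported.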
kashi
Joined: 30 Dec 07
Posts: 311
Credit: 149,490,184
RAC: 0
Message 38358 - Posted: 8 Apr 2010, 2:29:51 UTC - in response to Message 38327.  

Thank you for the explanation of how the quorum system works.

My 5870 is currently working on another relatively new ATI project. I only installed the fixed 0.23 application version and processed a batch to help test.

I'll be back here later when you introduce your new application which uses a different model of the Milky Way galaxy and uses parameters from the command line.

It will be very pleasing if this reduces server congestion because currently when the server cannot be contacted quickly it takes a long time before pending requests to the server time out in BOINC Manager and this causes delay to server requests and replies for other projects. This also happens on some other projects with busy servers and the combination of 2 or more slow project servers can create quite an inefficient and sluggish BOINC Manager if you are attached to multiple projects. Improved server performance will also reduce the occurrence of phantom tasks. All very good, I look forward to it.
Arif Mert Kapicioglu
Joined: 14 Dec 09
Posts: 161
Credit: 589,318,064
RAC: 0
Message 38366 - Posted: 8 Apr 2010, 11:10:34 UTC
Last modified: 8 Apr 2010, 11:11:12 UTC

pollux_9t
Joined: 26 Jan 10
Posts: 3
Credit: 1,966,092
RAC: 0
Message 38369 - Posted: 8 Apr 2010, 11:27:08 UTC

kashi
Joined: 30 Dec 07
Posts: 311
Credit: 149,490,184
RAC: 0
Message 38370 - Posted: 8 Apr 2010, 12:03:02 UTC
Last modified: 8 Apr 2010, 12:07:19 UTC

That's because, for both of you, your quorum partner (wingman) has not yet completed and reported the task that was sent out.

With a quorum of 2, two tasks are sent out to different contributors and the server compares the 2 results when they are completed and reported. If only one task has been completed and reported, there is nothing to compare yet ("Completed, validation inconclusive" status).

You need to wait until the other task has been completed and reported. Some time after that, the server will compare the 2 results (your result and that of your wingman) and see if they agree to the required accuracy.

If they agree, then both tasks will be validated and credit granted.

If they do not agree, then another task will be sent out to see which one of the two is correct. At this stage both of the tasks that do not agree will also have a status of "Completed, validation inconclusive".

All newly issued work units are sent out to just one contributor at first (quorum of 1). After they are completed and reported, the server then sends out another task for some of these, so the quorum changes to a quorum of 2.
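
As a rough sketch of the comparison steps above (not the project's validator code), the logic might look like this in Python. The function name `evaluate_workunit`, the dictionary it returns, and the default tolerance `rel_tol=1e-10` are all hypothetical; the thread does not state the actual "required accuracy".

```python
import math
from itertools import combinations


def evaluate_workunit(reported, quorum=2, rel_tol=1e-10):
    """Compare the results reported so far for one workunit.

    `reported` is a list of result values (for example fitness values).
    `rel_tol` is an assumed tolerance, chosen only for illustration.
    """
    if len(reported) < quorum:
        # Still waiting on a wingman: nothing to compare yet.
        return {"status": "inconclusive", "valid": [], "send_another": False}
    if quorum == 1:
        return {"status": "validated", "valid": [0], "send_another": False}
    # Look for any pair of results that agree to the required accuracy.
    for i, j in combinations(range(len(reported)), 2):
        if math.isclose(reported[i], reported[j], rel_tol=rel_tol):
            agreeing = [k for k, v in enumerate(reported)
                        if math.isclose(v, reported[i], rel_tol=rel_tol)]
            return {"status": "validated", "valid": agreeing,
                    "send_another": False}
    # Results disagree: all stay inconclusive and another task goes out.
    return {"status": "inconclusive", "valid": [], "send_another": True}


# Illustrative values only.
print(evaluate_workunit([-3.0]))              # waiting on wingman
print(evaluate_workunit([-3.0, -3.1]))        # disagreement, send a third
print(evaluate_workunit([-3.0, -3.1, -3.1]))  # third result breaks the tie
```

In the last call the second and third results agree, so those two would be validated and the first would be the odd one out.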
Arif Mert Kapicioglu
Joined: 14 Dec 09
Posts: 161
Credit: 589,318,064
RAC: 0
Message 38371 - Posted: 8 Apr 2010, 12:29:34 UTC - in response to Message 38370.  

If they do not agree then another task will be sent out to see which one of the two is correct. At this stage both of the tasks that do not agree will also have a status of "Completed, validation inconclusive"


What if, at this stage, the computer that the latest WU was sent to detaches?
Gary Roberts
Joined: 1 Mar 09
Posts: 56
Credit: 1,984,937,499
RAC: 0
Message 38375 - Posted: 8 Apr 2010, 13:23:48 UTC - in response to Message 38371.  

What if, at this stage, the computer that the latest WU was sent to detaches?

If it was the proper BOINC 'detach' operation, the client would advise the server that the task was not going to be completed and the server would immediately send out a new copy to a 4th machine.

A worse scenario would be if the owner of the 3rd machine happened to turn his machine off and go on holiday for a month: the server would not know and would have to wait for the deadline to expire before it could send out the 4th copy. The server can be infinitely patient, and the quorum would eventually be completed. By that time it would be unlikely that such a delayed result would be of any use to the project, and it would most likely be discarded. However, the hosts which were deemed to have supplied the agreeing results would still get credit even if the result was discarded.

Cheers,
Gary.
pollux_9t
Joined: 26 Jan 10
Posts: 3
Credit: 1,966,092
RAC: 0
Message 38382 - Posted: 8 Apr 2010, 14:26:58 UTC

Thanks for the quick answer!

Regards,
pollux_9t
Beyond
Joined: 15 Jul 08
Posts: 383
Credit: 729,293,740
RAC: 0
Message 38383 - Posted: 8 Apr 2010, 14:28:12 UTC
Last modified: 8 Apr 2010, 15:16:52 UTC

Something still doesn't seem to be working correctly in the validator. Here are 5 new WUs; in each case a 58xx card running v.23 and another 58xx running .20b validated against each other and invalidated good v.23 results from 48xx cards:

http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=90934041
http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=90924553
http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=90902501
http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=90873943
http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=90757381

As I understand it, this should not be happening. It's an interesting coincidence that both 58xx machines have 4 GPUs and are from the same team and sub-team...

Is the validator still broken? Both of these 4 x 58xx machines running v.23 are putting out a massive number of invalid results. The above are just a few examples.

Here are the 2 machines:

http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=123623&offset=0&show_names=0&state=4
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=39218&offset=0&show_names=0&state=4

Edit: Here's another WU with the same scenario:

http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=90970028

but with yet a 3rd 58xx with v.23 spewing out a massive number of bad results. This time a machine with only 2 x 58xx cards:

http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=112491&offset=0&show_names=0&state=4
Tails
Joined: 19 Feb 10
Posts: 17
Credit: 7,573,117
RAC: 0
Message 38384 - Posted: 8 Apr 2010, 14:59:50 UTC
Last modified: 8 Apr 2010, 15:03:34 UTC

I have random invalid WUs; I suspect a driver problem. Try updating the ATI driver to the newest version. I'll do the same.
Beyond
Joined: 15 Jul 08
Posts: 383
Credit: 729,293,740
RAC: 0
Message 38386 - Posted: 8 Apr 2010, 15:07:56 UTC - in response to Message 38384.  

I have random invalid WUs; I suspect a driver problem. Try updating the ATI driver to the newest version. I'll do the same.

On some of the above WUs all machines have the latest drivers. The driver version is irrelevant.

John Clark
Joined: 4 Oct 08
Posts: 1734
Credit: 64,228,409
RAC: 0
Message 38389 - Posted: 8 Apr 2010, 15:16:15 UTC
Last modified: 8 Apr 2010, 15:18:03 UTC

I am finding that about 25% of the output of my HD5850 GPU is not validated immediately, and about 50% to 60% is in the same state very shortly after returning the results. A rather interesting situation. But, none of them seems to be rejected after the wingmen report.

Much better than a few days ago.
Crunch3r
Volunteer developer
Joined: 17 Feb 08
Posts: 363
Credit: 258,227,990
RAC: 0
Message 38390 - Posted: 8 Apr 2010, 15:19:03 UTC - in response to Message 38386.  
Last modified: 8 Apr 2010, 15:19:52 UTC



http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=90934041

As I understand it this should not be happening.

Is the validator still broken or? Both of these 4 x 58xx machines running v.23 are putting out a massive number of invalid results. The above are just a few examples.

On some of the above WUs all machines have the latest drivers. The driver version is irrelevant.


That's what I'm seeing here too. I doubt that the validator does what it should.
If it worked correctly, it would NOT declare results from two 58xx cards running 0.20b and 0.23 a VALID match and discard the most likely correct one, done on a 48xx (marked as invalid)...


©2024 Astroinformatics Group