Welcome to MilkyWay@home

**UPDATE: ATI 58x issue resolved!


Message boards : Number crunching : **UPDATE: ATI 58x issue resolved!
Blurf
Volunteer moderator
Project administrator

Joined: 13 Mar 08
Posts: 804
Credit: 26,380,161
RAC: 0
Message 38252 - Posted: 7 Apr 2010, 1:54:37 UTC
Last modified: 7 Apr 2010, 4:09:21 UTC

ATI 58x0 GPU fix released

A big round of thanks goes to Cluster Physik for quickly updating the 58x0 ATI GPU application; the old version was causing the validation problems we've been experiencing lately.
If you're running MilkyWay@Home on a 58x0 ATI GPU, please upgrade your application. A link to the application is below.
Please take the time to update: not only are the bad ATI 58x0 GPU applications often getting their own workunits flagged invalid, they also report results quickly enough that they have been forming quorums against valid results and causing the valid ones to be flagged invalid.
Thanks, Travis 7 Apr 2010 1:39:17 UTC


Link to Milkyway Version 0.23
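
To illustrate the quorum effect described in the announcement, here is a minimal Python sketch. It is not the project's validator: the `Result` class, the `validate` function, the scalar `value` and the quorum size of 2 are all assumptions made for this example. It also assumes that results are checked in arrival order and that two faulty results can agree with each other.

```python
from dataclasses import dataclass


@dataclass
class Result:
    host: str            # hypothetical host label
    value: float         # hypothetical scalar output of a workunit
    status: str = "pending"


def agree(a, b, tol=1e-9):
    """Two results agree when their values match within a tolerance."""
    return abs(a.value - b.value) <= tol


def validate(results_in_arrival_order, min_quorum=2):
    """Mark each result valid or invalid.

    The first group of min_quorum agreeing results sets the canonical
    result; everything else is judged against it.
    """
    canonical = None
    seen = []
    for r in results_in_arrival_order:
        if canonical is None:
            seen.append(r)
            matches = [s for s in seen if agree(s, r)]
            if len(matches) >= min_quorum:
                canonical = r
                for s in seen:
                    s.status = "valid" if agree(s, canonical) else "invalid"
        else:
            r.status = "valid" if agree(r, canonical) else "invalid"
    return canonical


# Two fast hosts return the same wrong value before the correct host reports.
batch = [Result("fast-a", 1.5), Result("fast-b", 1.5), Result("slow-ok", 1.0)]
validate(batch)
for r in batch:
    print(r.host, r.status)
```

In this toy model the two fast, matching wrong answers form the quorum first, so the correct result that arrives later is the one marked invalid.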

BarryAZ

Joined: 1 Sep 08
Posts: 519
Credit: 282,827,859
RAC: 1,870
Message 38277 - Posted: 7 Apr 2010, 5:37:37 UTC

This is excellent news, but it appears that everyone returning at once overloaded the server - so the feeder is now not running. I suspect things will settle down on the morrow.


Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 38278 - Posted: 7 Apr 2010, 5:59:12 UTC - in response to Message 38277.  

This is excellent news, but it appears that everyone returning at once overloaded the server - so the feeder is now not running. I suspect things will settle down on the morrow.




Fixed it. The problem we're seeing now is just that, because we're using validation, the number of parameter files on the server for results has gone up by 2-3x. So the server is just buckling under the strain when the file deleter runs.

When we swap over to the new application, which doesn't generate these files, the server should run much more smoothly.

BarryAZ

Joined: 1 Sep 08
Posts: 519
Credit: 282,827,859
RAC: 1,870
Message 38280 - Posted: 7 Apr 2010, 6:07:00 UTC - in response to Message 38278.  

Good news -- I just did a detach/re-attach on a couple of 4850 workstations to ensure the current application was in place.

Thanks for the quick attention -- seems you do 'university hours' (i.e. late nights = good, morning = sleep) <smile>



Fixed it. The problem we're seeing now is just that, because we're using validation, the number of parameter files on the server for results has gone up by 2-3x. So the server is just buckling under the strain when the file deleter runs.

When we swap over to the new application, which doesn't generate these files, the server should run much more smoothly.


Zydor

Joined: 24 Feb 09
Posts: 620
Credit: 100,587,625
RAC: 0
Message 38281 - Posted: 7 Apr 2010, 6:08:01 UTC

Maybe a throwback to the problems: I have 40+ in pending, but none showing as awaiting on the server stats page.

On the positive side, validations are happening very quickly again and seem to have caught up.

Regards
Zy

BulldogPO

Joined: 30 Jun 09
Posts: 17
Credit: 40,702,094
RAC: 0
Message 38282 - Posted: 7 Apr 2010, 6:40:19 UTC
Last modified: 7 Apr 2010, 6:40:40 UTC

Seems like it is working, I did have some errors and now new packets with opti app 0.22 yesterday.

dyeman

Joined: 5 Mar 09
Posts: 6
Credit: 60,945,378
RAC: 26,224
Message 38354 - Posted: 8 Apr 2010, 1:26:17 UTC

I am running the "stock" GPU app on a 4770. With the new 0.23 app, processing time has increased by a factor of 3 - from 4 minutes or so to 11+ minutes. Anyone else seeing this??

dyeman

Joined: 5 Mar 09
Posts: 6
Credit: 60,945,378
RAC: 26,224
Message 38359 - Posted: 8 Apr 2010, 2:57:28 UTC - in response to Message 38354.  

I am running the "stock" GPU app on a 4770. With the new 0.23 app, processing time has increased by a factor of 3 - from 4 minutes or so to 11+ minutes. Anyone else seeing this??


OK, looks like processing times are back to normal with some later WUs.

JStateson

Joined: 18 Nov 08
Posts: 173
Credit: 1,046,888,404
RAC: 2,934,927
Message 38392 - Posted: 8 Apr 2010, 15:23:52 UTC

I have a machine that I cannot easily access. Can I just wait out the problem? ie: will the new app show up eventually?

If not, then I guess I can tell bam! to detach it and later on I can re-attach it. That assumes the defective code is not sticky.

Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 38416 - Posted: 8 Apr 2010, 19:46:06 UTC - in response to Message 38392.  

I have a machine that I cannot easily access. Can I just wait out the problem? ie: will the new app show up eventually?

If not, then I guess I can tell bam! to detach it and later on I can re-attach it. That assumes the defective code is not sticky.


The new app is probably there, the problem is that it may not be downloading the brook32.dll or brook64.dll you need for it. If it hasn't updated then you're going to need to detach and reattach somehow so it gets the new dll.

dnolan

Joined: 26 Oct 09
Posts: 55
Credit: 352,166,802
RAC: 0
Message 38427 - Posted: 8 Apr 2010, 21:55:58 UTC - in response to Message 38416.  


The new app is probably there, the problem is that it may not be downloading the brook32.dll or brook64.dll you need for it. If it hasn't updated then you're going to need to detach and reattach somehow so it gets the new dll.


On my only system that has an HD58xx card, it left the old brook32.dll in place and downloaded a new version called brook32a_ati.dll, is that what should happen? This was done automatically, ie. no app_info on that machine.

-Dave

Cluster Physik

Joined: 26 Jul 08
Posts: 627
Credit: 94,940,203
RAC: 0
Message 38432 - Posted: 9 Apr 2010, 0:34:26 UTC - in response to Message 38427.  


The new app is probably there, the problem is that it may not be downloading the brook32.dll or brook64.dll you need for it. If it hasn't updated then you're going to need to detach and reattach somehow so it gets the new dll.


On my only system that has an HD58xx card, it left the old brook32.dll in place and downloaded a new version called brook32a_ati.dll, is that what should happen? This was done automatically, ie. no app_info on that machine.

No, the downloaded file should have been renamed to brook32.dll. Maybe this didn't work because the old one was still there. If you don't want to detach and reattach, you can delete the old brook32.dll and rename brook32a_ati.dll to brook32.dll (you should quit BOINC first). That should work, too.

I think it should be possible to configure the application on the server so that this is done automatically. Replacing a file on a host can't be something the BOINC developers haven't thought of, can it?

kashi

Joined: 30 Dec 07
Posts: 311
Credit: 149,489,957
RAC: 6,420
Message 38434 - Posted: 9 Apr 2010, 3:57:49 UTC
Last modified: 9 Apr 2010, 4:11:13 UTC

I could be wrong but I think the automatically downloaded Windows 32-bit 0.23 version will need the file to be called "brook32a_ati.dll" because the brook32a_ati.dll file is downloaded and copied and used as brook32.dll:
<file_ref>
<file_name>brook32a_ati.dll</file_name>
<open_name>brook32.dll</open_name>
<copy_file/>
</file_ref>

I don't have the automatically downloaded Windows 32-bit v0.23 currently installed but going by the previous version the above code or something like it will be found in the "sched_request_milkyway.cs.rpi.edu_milkyway.xml" file in the BOINC folder.

I'm not certain why the _ati was added to the filename, but it was possibly to distinguish it from planned _amd versions of the brook file (for Cat 8.12 and Cat 9.1). The auto _amd versions ended up not being deployed at MilkyWay. Just a guess but the extra "a" in the auto downloaded brook file probably denotes the recently updated brook version that includes the Cypress fix.
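
For anyone unsure what the file_ref fragment quoted in this post implies, here is a rough Python sketch of the behaviour as described here: the file stays in the project folder under its downloaded name and is copied under the open name when it is used. This is not BOINC's code. The `stage_file` function and the `project_dir` and `slot_dir` names are made up for illustration, and the assumption that the copy lands in a separate working folder is mine.

```python
import shutil
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# The fragment from the post, with a closing tag added so it parses.
FILE_REF = """
<file_ref>
    <file_name>brook32a_ati.dll</file_name>
    <open_name>brook32.dll</open_name>
    <copy_file/>
</file_ref>
""".strip()


def stage_file(file_ref_xml, project_dir, slot_dir):
    """Copy the downloaded file into slot_dir under its open name.

    The original in project_dir is left untouched (hypothetical helper).
    """
    ref = ET.fromstring(file_ref_xml)
    if ref.find("copy_file") is None:
        raise ValueError("this sketch only handles <copy_file/> references")
    source = Path(project_dir) / ref.findtext("file_name")
    target = Path(slot_dir) / ref.findtext("open_name")
    shutil.copyfile(source, target)
    return target


# Demonstration with a dummy file in temporary folders.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "project"
    slot = Path(tmp) / "slot"
    project.mkdir()
    slot.mkdir()
    (project / "brook32a_ati.dll").write_bytes(b"dummy contents")
    staged = stage_file(FILE_REF, project, slot)
    print(staged.name, sorted(p.name for p in project.iterdir()))
```

Under these assumptions the application opens brook32.dll in its working folder, while the project folder only ever contains brook32a_ati.dll.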

Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 38435 - Posted: 9 Apr 2010, 4:22:55 UTC - in response to Message 38434.  

I could be wrong but I think the automatically downloaded Windows 32-bit 0.23 version will need the file to be called "brook32a_ati.dll" because the brook32a_ati.dll file is downloaded and copied and used as brook32.dll:

<file_ref>
<file_name>brook32a_ati.dll</file_name>
<open_name>brook32.dll</open_name>
<copy_file/>
</file_ref>


I don't have the automatically downloaded Windows 32-bit v0.23 currently installed but going by the previous version the above code or something like it will be found in the "sched_request_milkyway.cs.rpi.edu_milkyway.xml" file in the BOINC folder.

I'm not certain why the _ati was added to the filename, but it was possibly to distinguish it from planned _amd versions of the brook file (for Cat 8.12 and Cat 9.1). The auto _amd versions ended up not being deployed at MilkyWay. Just a guess but the extra "a" in the auto downloaded brook file probably denotes the recently updated brook version that includes the Cypress fix.



You're exactly right. The boinc client should download brook32a_ati.dll from the server (which is at http://milkyway.cs.rpi.edu/milkyway/download/brook32a_ati.dll) and then rename it to brook32.dll

I think I might email the dev lists to see if there's some bug that would make this not overwrite a previous brook32.dll

Microcruncher*

Joined: 1 Jul 09
Posts: 8
Credit: 1,734,500
RAC: 0
Message 38437 - Posted: 9 Apr 2010, 4:36:16 UTC - in response to Message 38435.  
Last modified: 9 Apr 2010, 4:39:02 UTC

I could be wrong but I think the automatically downloaded Windows 32-bit 0.23 version will need the file to be called "brook32a_ati.dll" because the brook32a_ati.dll file is downloaded and copied and used as brook32.dll:
<file_ref>
<file_name>brook32a_ati.dll</file_name>
<open_name>brook32.dll</open_name>
<copy_file/>
</file_ref>

I don't have the automatically downloaded Windows 32-bit v0.23 currently installed but going by the previous version the above code or something like it will be found in the "sched_request_milkyway.cs.rpi.edu_milkyway.xml" file in the BOINC folder.

I'm not certain why the _ati was added to the filename, but it was possibly to distinguish it from planned _amd versions of the brook file (for Cat 8.12 and Cat 9.1). The auto _amd versions ended up not being deployed at MilkyWay. Just a guess but the extra "a" in the auto downloaded brook file probably denotes the recently updated brook version that includes the Cypress fix.



You're exactly right. The boinc client should download brook32a_ati.dll from the server (which is at http://milkyway.cs.rpi.edu/milkyway/download/brook32a_ati.dll) and then rename it to brook32.dll

I think I might email the dev lists to see if there's some bug that would make this not overwrite a previous brook32.dll


Maybe this isn't a bug. If a MW WU was running during the download then the "old" DLL was in use and so it couldn't be replaced by the "new" one.

kashi

Joined: 30 Dec 07
Posts: 311
Credit: 149,489,957
RAC: 6,420
Message 38438 - Posted: 9 Apr 2010, 4:50:18 UTC - in response to Message 38435.  

Yes the code in the "sched_request_milkyway.cs.rpi.edu_milkyway.xml" file automatically copies and opens the brook file with the correct name format when it is used. However it remains as "brook32a_ati.dll" in the BOINC\project\milkyway.cs.rpi.edu_milkyway folder.

Just making sure that it is understood that if someone manually renames brook32a_ati.dll to brook32.dll in their MilkyWay projects folder then the auto downloaded v0.23 application will no longer work.
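
If you would rather check the folder than rename anything, a small read-only script along these lines could report what is there. It is only a sketch: the `check_brook_files` function and its status strings are invented for this post, and it assumes the auto-downloaded v0.23 app expects brook32a_ati.dll to keep its downloaded name, as described above.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def check_brook_files(project_dir, downloaded="brook32a_ati.dll", plain="brook32.dll"):
    """Report the state of the brook DLLs without modifying anything.

    Status strings are hypothetical labels used only in this sketch.
    """
    project_dir = Path(project_dir)
    new = project_dir / downloaded
    old = project_dir / plain
    if not new.exists():
        # The name the auto-downloaded app refers to is gone (e.g. renamed).
        return "missing-download"
    if not old.exists():
        return "ok"
    if sha256_of(new) == sha256_of(old):
        # Both names exist and hold identical contents.
        return "ok-duplicate"
    # An older brook32.dll with different contents is still in the folder.
    return "stale-copy-present"


# Demonstration against a temporary folder with dummy files.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "brook32a_ati.dll").write_bytes(b"new build")
    (folder / "brook32.dll").write_bytes(b"old build")
    print(check_brook_files(folder))
```

Point it at your own MilkyWay project folder instead of the temporary one. It only reads the files, so it is safe to run while BOINC is up.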

dnolan

Joined: 26 Oct 09
Posts: 55
Credit: 352,166,802
RAC: 0
Message 38464 - Posted: 9 Apr 2010, 13:16:16 UTC - in response to Message 38437.  


Maybe this isn't a bug. If a MW WU was running during the download then the "old" DLL was in use and so it couldn't be replaced by the "new" one.


In my case, I ran out of MW work and wasn't doing any for a while before it was updated, so no MW work was being processed.
Also, I didn't rename the a_ati version; I removed the older version and copied this into its place, so both files are there now. I noticed I'm still seeing some invalid work, more than I would expect.

-Dave

Patrick Vo

Joined: 7 Jan 10
Posts: 2
Credit: 87,865,132
RAC: 0
Message 38519 - Posted: 10 Apr 2010, 3:31:42 UTC

I am not sure whether the new version is causing me problems, so I am going to tell you what is happening in the hope that someone can help.

I did detach and re-attach to the new version of BOINC. But after running it on a 5870 graphics card for about 3 hours, my PC would slow to a halt. The system would tell me that I had run out of resources. Nothing would work, and the mouse would not respond. I had to reboot my machine, and then Windows 7 would start checking my disk drive because I had to do a reset.

I am running the latest version of BOINC and only use my graphics card to compute. Running on an MSI board with an i7 920 and 6 GB of memory.

I knew it had to be the new version of BOINC, but I cannot prove it.

I then detached MilkyWay and attached to Collatz, and it has been running for about 5 hours without any problems. So much for it being BOINC's problem. I do not understand what the real culprit is, then.

Please help.

Zydor

Joined: 24 Feb 09
Posts: 620
Credit: 100,587,625
RAC: 0
Message 38520 - Posted: 10 Apr 2010, 6:28:59 UTC - in response to Message 38519.  
Last modified: 10 Apr 2010, 6:35:26 UTC

I notice at Collatz you are on 850/1200 with that PC. It's a long shot, but it's possible that 850 is a tad too high for you at MW with the new app without a slight overvolt. Try resetting to default clocks, then run MW; if it goes OK, you'll know, and in that case bringing it down to 800 or 825 would probably get it working.

If it does, reduce the memory clock as low as you can get it: memory speed is irrelevant at MW (I am running a 5970 on 300 memory with no issues, and I have seen others running 4xxx cards below 200 with no issues). A low memory setting will save you a bunch on power, run cooler, and/or allow higher GPU clocks without overvolting.

If all that holds true ..... a check on PSU capacity/output/power draw may be a further check once you get going.

Regards
Zy

Patrick Vo

Joined: 7 Jan 10
Posts: 2
Credit: 87,865,132
RAC: 0
Message 38530 - Posted: 10 Apr 2010, 16:03:48 UTC - in response to Message 38520.  

Thanks Zy, but I did not overclock my 5870. That is the stock memory speed. Also, I have been running Collatz for 24 hours now without any problems. So it has to be the Milky Way process.

I was also running my i920 at 4.00 GHz. I am now running at 3.8 GHz and will try Milky Way again to see whether it messes up my system again. If only I can get some work units...