Welcome to MilkyWay@home

Computational Error?

mikey
Joined: 8 May 09
Posts: 3315
Credit: 519,947,431
RAC: 22,257
Message 58469 - Posted: 1 Jun 2013, 12:24:09 UTC - in response to Message 58462.  

AMD is again changing their software to favor gamers and not crunchers. They are working on it, but don't hold your breath! If you ever find a version that is stable and just works AND you are not a gamer, then don't upgrade unless you hear the new one is MUCH faster and has been tested by others with your configuration. I am still running version 11.9 on one of my PCs and it works here just fine.


I agree completely!! Updating GPU software/drivers is almost always in favour of the gamer. 13.4 is currently on my "thou shall not pass" software list.


I agree too, but there was an issue with my 12.x version and Einstein@home.
You have the 13.4 drivers and MilkyWay is working with no errors; do I understand your post correctly?
Then I must update at once.
Thanks.


No, he is downgrading to 12.8 to see if it works for him like it does for John Clark.
TLSI2000
Joined: 15 Mar 10
Posts: 17
Credit: 1,221,936,867
RAC: 0
Message 58474 - Posted: 1 Jun 2013, 16:07:06 UTC

To get back to an operational state, I downgraded to 13.1 and it works fine.

To be successful at this downgrade, you *must* use the separate Catalyst Uninstall utility, which is downloadable from the same page as the installs. The Catalyst Install Manager's included Uninstall option does not do the clean-up necessary to get back to a solid re-install condition.

I am now back in operation.
Good Luck.

Uninstall Link

http://support.amd.com/us/gpudownload/windows/Pages/catalyst-uninstall-utility.aspx
Un4given
Joined: 14 Feb 09
Posts: 19
Credit: 62,373,513
RAC: 0
Message 58477 - Posted: 1 Jun 2013, 22:20:46 UTC - in response to Message 58474.  

Thanks for the link, that was very helpful in removing the old drivers. Like many others I was having issues with the 13.4 drivers, but the 13.1 drivers, under a completely clean install, seem to be working again.

Oddly enough my 6950 was the only one having issues with the 13.4. My 7950 is happily crunching units with the 13.4 drivers.

Oh well. Both of my machines are crunching again, and that's what counts!
Barblovesroses
Joined: 20 May 10
Posts: 3
Credit: 59,404
RAC: 0
Message 58676 - Posted: 11 Jun 2013, 4:53:04 UTC - in response to Message 58477.  

Thanks for posting this thread.

I can't believe how many different projects this AMD Catalyst driver series is a problem for. I have had to download and reinstall it several times now in the hope that it would resolve issues I was having at different projects. Some of the projects had me go back to the older version 12.8 (volpex), and you seem to want 13.1 for MilkyWay to work properly, while I believe that T4T wants the latest version, 13.4. I'm not sure what version keeps Poem happy, but it's running smoothly at present for me so I'm not complaining! (And to think I didn't even know about Catalyst when I first was running Poem and got this new computer and got introduced to the world of GPUs almost a year ago...hahaha)

I just switched back down to Catalyst 13.1 and I hope my other projects will run OK too for now...I should know within a few days. It gets confusing not knowing why you are getting all of these computation errors every time there is a system upgrade, and not knowing if it's something you need to change or if it's a programming issue. Do you post something on the board yourself or do you wait to see if someone else does? Then sometimes you get busy and forget, and the problem is still there a month later repeating itself, and no one else has complained about it either.

Anyway, thanks everyone for talking about this issue and figuring it out before I looked into it myself!!! Hugs and cyberblessings to all of you!

GregO
Joined: 29 Jun 11
Posts: 6
Credit: 620,411
RAC: 0
Message 58692 - Posted: 11 Jun 2013, 13:14:51 UTC - in response to Message 58369.  

RESOLVED, at least on my machine. I thought I'd go back in and test the water this morning and ended up receiving several new applications, one of which was milkyway_separation_1.02_windows_x86_64_ati14. It appears to be crunching without any of the Computational Error messages I had a month ago.

As a volunteer I just didn't feel that going into my machine, removing drivers and downgrading to previous Catalyst versions should be necessary, especially when this was the only application that I could see which was affected.

Happy crunching everyone!
[Win7x64, Boinc 7.0.64(x64), Radeon HD5870 w/Catalyst 13.4]
GregO
Joined: 29 Jun 11
Posts: 6
Credit: 620,411
RAC: 0
Message 58798 - Posted: 12 Jun 2013, 20:31:31 UTC - in response to Message 58692.  

RESOLVED, at least on my machine. I thought I'd go back in and test the water this morning and ended up receiving several new applications, one of which was milkyway_separation_1.02_windows_x86_64_ati14. It appears to be crunching without any of the Computational Error messages I had a month ago.

(Win7x64, Boinc 7.0.64(x64), Xeon W3580, Radeon HD5870 w/Catalyst 13.4)


Just to clarify, it's the milkyway_separation_1.02_windows_x86_64_opencl_amd_ati that was, and still is, causing the Computational Error on my machine.
Seraphim401
Joined: 3 Apr 10
Posts: 8
Credit: 27,124,250
RAC: 0
Message 59057 - Posted: 23 Jun 2013, 14:52:06 UTC - in response to Message 58798.  

Yes, that work unit always errors out on me, but only on my x64 rig with an HD 5770 and HD 7970.
My two HD 5870s running on an x86 machine have no such problems.

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 580
Credit: 94,200,158
RAC: 0
Message 59059 - Posted: 23 Jun 2013, 15:12:01 UTC

Hey there,

It looks to me like one of your GPUs doesn't support double precision. We require that GPUs have double precision support for Milkyway@home apps. Sometimes the problem is a driver that doesn't support it even though the card does, because for some silly reason implementing it is deemed optional in the OpenCL standard. I recommend you ensure all drivers are up to date, because there may be a new driver that does support double precision.
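
For anyone who wants to check this on their own card, here is a minimal sketch in Python. It assumes you already have the device's OpenCL extension string (for example from a tool such as clinfo). An OpenCL device advertises double precision through the `cl_khr_fp64` extension, or the older AMD-specific `cl_amd_fp64`. The function name and the sample strings are made up for illustration, and this is not necessarily the check the Milkyway@home app performs.

```python
# Extension names an OpenCL device can use to advertise double precision.
FP64_EXTENSIONS = ("cl_khr_fp64", "cl_amd_fp64")


def supports_double(extensions: str) -> bool:
    """Return True if a space-separated OpenCL extension string lists an fp64 extension."""
    advertised = set(extensions.split())
    return any(name in advertised for name in FP64_EXTENSIONS)


# Hypothetical extension strings, shortened for illustration.
with_fp64 = "cl_khr_global_int32_base_atomics cl_khr_fp64 cl_amd_fp64"
without_fp64 = "cl_khr_global_int32_base_atomics cl_khr_byte_addressable_store"

print(supports_double(with_fp64))
print(supports_double(without_fp64))
```

Splitting the string into whole names, rather than doing a substring search, avoids a false match on an extension whose name merely contains `fp64`.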

Sorry for the long response time,

Jake W
Len LE/GE
Joined: 8 Feb 08
Posts: 261
Credit: 104,050,322
RAC: 0
Message 59066 - Posted: 23 Jun 2013, 21:06:22 UTC - in response to Message 59057.  

You need to exclude your HD 5770 from being used for MilkyWay. It cannot do double precision calculations.
See Updated GPU Requirements
Seraphim401
Joined: 3 Apr 10
Posts: 8
Credit: 27,124,250
RAC: 0
Message 59083 - Posted: 24 Jun 2013, 20:51:23 UTC - in response to Message 59066.  
Last modified: 24 Jun 2013, 20:52:18 UTC

The program does this automatically.
It only uses the 7970.
Thanks though :)
Seraphim401
Joined: 3 Apr 10
Posts: 8
Credit: 27,124,250
RAC: 0
Message 59085 - Posted: 24 Jun 2013, 21:36:58 UTC - in response to Message 59059.  

Thanks for the response.
It looks like I spoke too soon; now all WUs are failing.
Milkyway never uses the 5770.
I did a project reset and it looks like the WUs are not failing anymore.
What I don't understand is that the WUs always used to fail at the end of the calculations.
This was particularly annoying when doing the N-body calculation, since it would use the CPU for hours only to give an error at the end.

Len LE/GE
Joined: 8 Feb 08
Posts: 261
Credit: 104,050,322
RAC: 0
Message 59089 - Posted: 25 Jun 2013, 10:05:38 UTC

host 521265

Right now you have like 40% of your tasks working (HD7970) and 60% error tasks (HD5770).

The error task logs are showing
for separation:
Found 2 CL devices
Device 'Juniper' (Advanced Micro Devices, Inc.:0x1002) (CL_DEVICE_TYPE_GPU)
Driver version: 1124.2 (VM)
Version: OpenCL 1.2 AMD-APP (1124.2)
Compute capability: 0.0
Max compute units: 10
Clock frequency: 500 Mhz
Global mem size: 1073741824
Local mem size: 32768
Max const buf size: 65536
Double extension: (none)
Device doesn't support double precision
Failed to calculate likelihood


and for separation modified fit:
Found 2 CL devices
Device 'Juniper' (Advanced Micro Devices, Inc.:0x1002) (CL_DEVICE_TYPE_GPU)
Board: ATI Radeon HD 5700 Series

Driver version: 1124.2 (VM)
Version: OpenCL 1.2 AMD-APP (1124.2)
Compute capability: 0.0
Max compute units: 10
Clock frequency: 500 Mhz
Global mem size: 1073741824
Local mem size: 32768
Max const buf size: 65536
Double extension: (none)
Device doesn't support double precision
Failed to calculate likelihood
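
If you want to scan a pile of task logs for this, here is a rough Python sketch. It assumes the log lines look exactly like the ones quoted above: a `Device '<name>'` line followed later by a `Double extension:` line, with `(none)` meaning no double precision. The function name is my own and the sample is trimmed from the log above; what the line looks like for a card that does support doubles is not shown in this thread, so the sketch only looks for `(none)`.

```python
import re
from typing import List

# Trimmed copy of the task log quoted above.
SAMPLE_LOG = """\
Found 2 CL devices
Device 'Juniper' (Advanced Micro Devices, Inc.:0x1002) (CL_DEVICE_TYPE_GPU)
Driver version: 1124.2 (VM)
Version: OpenCL 1.2 AMD-APP (1124.2)
Max compute units: 10
Double extension: (none)
Device doesn't support double precision
Failed to calculate likelihood
"""

DEVICE_RE = re.compile(r"^Device '([^']+)'")
DOUBLE_RE = re.compile(r"^Double extension:\s*(.+)$")


def devices_without_double(log_text: str) -> List[str]:
    """Return names of devices whose 'Double extension' line reads '(none)'."""
    missing = []
    current = None  # name from the most recent "Device '...'" line
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        device_match = DEVICE_RE.match(line)
        if device_match:
            current = device_match.group(1)
            continue
        double_match = DOUBLE_RE.match(line)
        if double_match and current is not None:
            if double_match.group(1).strip() == "(none)":
                missing.append(current)
    return missing


print(devices_without_double(SAMPLE_LOG))
```

Note that the name reported is the OpenCL device name (here the chip codename), not the marketing name of the card.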
Seraphim401
Joined: 3 Apr 10
Posts: 8
Credit: 27,124,250
RAC: 0
Message 59101 - Posted: 25 Jun 2013, 17:33:16 UTC - in response to Message 59089.  

That is weird.
The device it detects as a Juniper is the 7970!
I downclocked the 7970 to 500 MHz and left the 5770 at stock (850 MHz).
So how do I disable the 5770 for this project only?
The other projects I have running have no problem with the 5770.

Thanks for your help.
Seraphim401
Joined: 3 Apr 10
Posts: 8
Credit: 27,124,250
RAC: 0
Message 59108 - Posted: 25 Jun 2013, 19:31:52 UTC - in response to Message 59101.  
Last modified: 25 Jun 2013, 19:46:48 UTC

<cc_config>
  <options>
    <use_all_gpus>1</use_all_gpus>
    <exclude_gpu>
      <url>http://milkyway.cs.rpi.edu/milkyway/</url>
      <device_num>1</device_num>
      <type>ATI</type>
      <app><name>milkyway_separation__modified_fit_1.22_windows_x86_64__opencl_amd_ati.exe</name></app>
      <app><name>milkyway_separation_1.02_windows_x86_64__opencl_amd_ati.exe</name></app>
    </exclude_gpu>
  </options>
</cc_config>


I'm using this setting in my cc_config.xml.
So far so good.
I will report back if the errors persist.
mikey
Joined: 8 May 09
Posts: 3315
Credit: 519,947,431
RAC: 22,257
Message 59119 - Posted: 26 Jun 2013, 11:22:07 UTC - in response to Message 59101.  


Try this one:
<cc_config>
  <options>
    <use_all_gpus>1</use_all_gpus>
    <exclude_gpu>
      <url>http://moowrap.net/</url>
      <device_num>0</device_num>
    </exclude_gpu>
    <exclude_gpu>
      <url>http://boinc.thesonntags.com/collatz/</url>
      <device_num>1</device_num>
    </exclude_gpu>
    <report_results_immediately>1</report_results_immediately>
  </options>
</cc_config>

What you have to do is figure out which gpu is device zero and which is device 1. To do this, open Boinc Manager, click Advanced and then Event Log. Scroll to the top; about 5 lines down it will list the gpus Boinc found. Make a note of which is which and then adjust the file above to make it work.

What the file above is doing is first telling Boinc to use all the gpus it finds through this line:
<use_all_gpus>1</use_all_gpus>

Then it is telling Boinc to exclude gpu zero from Moo:
<exclude_gpu>
<url>http://moowrap.net/</url>
<device_num>0</device_num>
</exclude_gpu>

That means gpu zero can now work on any other project I am attached to EXCEPT Moo.

The file then tells Boinc to exclude gpu one from Collatz:
<exclude_gpu>
<url>http://boinc.thesonntags.com/collatz/</url>
<device_num>1</device_num>
</exclude_gpu>

This means that gpu zero CAN run Collatz but gpu one CANNOT!!

The next line prevents some possible side effects in some cases by telling Boinc to report all results back to the project immediately:
<report_results_immediately>1</report_results_immediately>

If you do not have a problem you can safely remove this line as it can cause excessive 'banging on the door' at each project. Each project only has so much Server time for each person and if you, or I, are ALWAYS connecting there is less time for everyone else.

Adjust the file to fit your programs and you should be good to go.
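
If you would rather generate the file than edit it by hand, here is a minimal Python sketch using the standard library's ElementTree. It only writes the `<use_all_gpus>`, `<url>` and `<device_num>` elements discussed above. The function name is hypothetical, and the device number 1 in the example is taken from Seraphim401's config and has to be checked against your own event log. As far as I know, leaving out the optional `<app>` element excludes that gpu for every app of the project, but treat that as an assumption and check the Boinc client configuration docs.

```python
import xml.etree.ElementTree as ET


def build_cc_config(exclusions, use_all_gpus=True):
    """Build cc_config.xml text from (project_url, device_num) pairs."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    ET.SubElement(options, "use_all_gpus").text = "1" if use_all_gpus else "0"
    for url, device_num in exclusions:
        # One <exclude_gpu> block per (project, device) pair.
        exclude = ET.SubElement(options, "exclude_gpu")
        ET.SubElement(exclude, "url").text = url
        ET.SubElement(exclude, "device_num").text = str(device_num)
    if hasattr(ET, "indent"):  # pretty-printing needs Python 3.9 or newer
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# Example: keep device 1 away from MilkyWay only (device number is an assumption).
print(build_cc_config([("http://milkyway.cs.rpi.edu/milkyway/", 1)]))
```

Save the output as cc_config.xml in the Boinc data directory and restart the client (or re-read the config files from the Manager) for it to take effect.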
Seraphim401
Joined: 3 Apr 10
Posts: 8
Credit: 27,124,250
RAC: 0
Message 59122 - Posted: 26 Jun 2013, 13:30:31 UTC - in response to Message 59119.  
Last modified: 26 Jun 2013, 13:46:05 UTC

Thanks mikey, but it seems that the settings I'm using now did the trick.
I'm keeping your settings in mind in case the madness returns :)

Thank you.
mikey
Joined: 8 May 09
Posts: 3315
Credit: 519,947,431
RAC: 22,257
Message 59135 - Posted: 27 Jun 2013, 11:48:16 UTC - in response to Message 59122.  

No problem. It will work on any project, so if you ever get more PCs or just switch projects, just change the website and you will be good to go.