All Milkyway@Home 1.02 tasks ending in computation error on HD6950.

[AF>FAH-Addict.net]toTOW
Joined: 30 Oct 10
Posts: 6
Credit: 10,334,227
RAC: 0

Message 60361 - Posted: 11 Nov 2013, 17:42:28 UTC
Last modified: 11 Nov 2013, 17:48:03 UTC

Hi all,

I've restarted MW on my ATI card, which I hadn't used for a very long time. It used to work fine, but now I'm getting errors on every MilkyWay@Home v1.02 (opencl_amd_ati) task.

The WUs seem to run fine (GPU usage 100%), but when they reach 100% progress, they error out.

The error always seems to be the same:

<core_client_version>7.0.64</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -1073741819 (0xc0000005)
</message>


Using AMD IL kernel
Binary status (0): CL_SUCCESS
Estimated AMD GPU GFLOP/s: 2580 SP GFLOP/s, 645 DP FLOP/s
Using a target frequency of 60.0
Using a block size of 6144 with 52 blocks/chunk
Using clWaitForEvents() for polling (mode -1)
Range: { nu_steps = 320, mu_steps = 1600, r_steps = 1400 }
Iteration area: 2240000
Chunk estimate: 7
Num chunks: 8
Chunk size: 319488
Added area: 315904
Effective area: 2555904
Initial wait: 12 ms
Integration time: 36.739142 s. Average time per iteration = 114.809818 ms


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x000007FEE9E71DCD read attempt to address 0x00000010

Engaging BOINC Windows Runtime Debugger...


These tasks were computed on this host: http://milkyway.cs.rpi.edu/milkyway/show_host_detail.php?hostid=250340 ...

Note that Milkyway@Home Separation (Modified Fit) v1.28 (opencl_amd_ati) tasks are doing fine on the same system, and so are other BOINC apps.

The host is running Windows 7 64-bit with Catalyst 13.4 (I noticed that 13.9 wouldn't update the OpenCL driver, so I didn't install it).

Any advice?
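
The figures in the task log above are internally consistent, which suggests the integration itself ran to completion before the crash. The sketch below recomputes them; the relationships between the fields (for example, that the chunk size is block size times blocks per chunk) are inferred from the printed numbers, not taken from the application's source.

```python
import math

# Values copied from the task log; the formulas are inferred, not official.
nu_steps, mu_steps, r_steps = 320, 1600, 1400
block_size, blocks_per_chunk = 6144, 52
integration_time_s = 36.739142

iteration_area = mu_steps * r_steps                    # log: Iteration area
chunk_size = block_size * blocks_per_chunk             # log: Chunk size
num_chunks = math.ceil(iteration_area / chunk_size)    # log: Num chunks
effective_area = num_chunks * chunk_size               # log: Effective area
added_area = effective_area - iteration_area           # log: Added area
avg_ms = integration_time_s / nu_steps * 1000.0        # log: Average time per iteration

print(iteration_area, chunk_size, num_chunks, effective_area, added_area)
print(f"{avg_ms:.6f} ms per iteration")
```

Every recomputed value matches the log, with one iteration per nu step.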

mikey
Joined: 8 May 09
Posts: 2182
Credit: 231,022,148
RAC: 208,392

Message 60368 - Posted: 12 Nov 2013, 11:25:58 UTC - in response to Message 60361.

Your PC with the 69xx GPU in it is working just fine, although when I click on your link it says the PC was not found. When I click on your name and then view your computers, I only see one PC.

Matt
Joined: 22 Aug 10
Posts: 32
Credit: 86,011,750
RAC: 0

Message 60376 - Posted: 12 Nov 2013, 21:20:28 UTC

Every MilkyWay@Home WU is failing on my machine as well, though the Separation (Modified Fit) runs complete successfully. I have both a 6950 and a 7950 in my rig.

[AF>FAH-Addict.net]toTOW
Joined: 30 Oct 10
Posts: 6
Credit: 10,334,227
RAC: 0

Message 60377 - Posted: 12 Nov 2013, 22:38:19 UTC

mikey> Yes, I forgot to include the last 0 inside the URL tags ... but it's the only host I have on the project with a DP-capable GPU, so it's not hard to find in my profile :D

I unchecked the standard Milkyway@Home application in my profile to avoid the failing application, but I'd rather figure out what's going wrong with my setup; that would be more useful to the project ...

mikey
Joined: 8 May 09
Posts: 2182
Credit: 231,022,148
RAC: 208,392

Message 60380 - Posted: 13 Nov 2013, 12:21:26 UTC - in response to Message 60376.

Your PCs are hidden, so I can't see them. What version of the Catalyst software are you using? Are you overclocking?

mikey
Joined: 8 May 09
Posts: 2182
Credit: 231,022,148
RAC: 208,392

Message 60381 - Posted: 13 Nov 2013, 12:27:22 UTC - in response to Message 60377.

That worked for me, thanks!
The only thing that looks funky to me is the lack of system RAM in your i7 8-CPU machine: 4 GB of RAM is just shabby these days, when 8 GB or even 16 GB is the norm. Another possibility is that you aren't leaving a CPU core free just for the GPU to use. GPUs can be sensitive to not being fed new data when they want it, and if the CPU is busy crunching it can delay things and cause problems.

Matt
Joined: 22 Aug 10
Posts: 32
Credit: 86,011,750
RAC: 0

Message 60389 - Posted: 14 Nov 2013, 4:27:15 UTC - in response to Message 60380.
Last modified: 14 Nov 2013, 4:29:52 UTC

Didn't realize I had my PCs hidden. Fixed now. I only run MW on the AMD/ATI machine. I'm using Catalyst 13.4. I do have my cards overclocked. They never error out on Separation runs, only on the MilkyWay 1.02 WUs.

mikey
Joined: 8 May 09
Posts: 2182
Credit: 231,022,148
RAC: 208,392

Message 60390 - Posted: 14 Nov 2013, 11:49:23 UTC - in response to Message 60389.

The overclocking is probably the reason, then. Some units are much more sensitive to ever-so-slight differences as they go through their crunching and can error out in a heartbeat. Don't worry about it; others will pick those units up and do them. Just keep an eye on the News section, and when new units are released, allow them again. If they work, great; if not, that's okay too.

[AF>FAH-Addict.net]toTOW
Joined: 30 Oct 10
Posts: 6
Credit: 10,334,227
RAC: 0

Message 60397 - Posted: 15 Nov 2013, 19:11:52 UTC - in response to Message 60390.

I'm pretty sure that the issue doesn't come from the overclocking or the card itself.

If I remember correctly, error code 0xc0000005 on Windows means that a memory access violation occurred ... but this is a Windows error, not something coming from the GPU.
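
The two forms of the exit code shown in the task log are the same number: -1073741819 is the signed 32-bit reading of 0xc0000005, the Windows access-violation status. A minimal sketch of the conversion follows; the helper name `to_ntstatus` is made up for illustration.

```python
def to_ntstatus(exit_code: int) -> str:
    """Reinterpret a signed 32-bit exit code as an unsigned 32-bit hex value."""
    return f"0x{exit_code & 0xFFFFFFFF:08x}"

print(to_ntstatus(-1073741819))
```

The faulting read at address 0x00000010 in the same log would be consistent with a null pointer plus a small offset being dereferenced in host-side code, though the log alone does not prove that.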

Matt
Joined: 22 Aug 10
Posts: 32
Credit: 86,011,750
RAC: 0

Message 60404 - Posted: 16 Nov 2013, 6:53:44 UTC
Last modified: 16 Nov 2013, 6:59:11 UTC

Some MW 1.02 WUs came through today even though I have only Separation runs checked in my preferences. Of these, roughly half seemed to process alright. The others resulted in errors as usual. Still, this is the first time in quite a while I've seen ANY of those WUs successful. No changes to my setup. The MilkyWay@Home site shows my machine as having two HD 7900 series cards when in reality it's one HD 6950 and one HD 7950.

mikey
Joined: 8 May 09
Posts: 2182
Credit: 231,022,148
RAC: 208,392

Message 60409 - Posted: 16 Nov 2013, 12:27:29 UTC - in response to Message 60404.

The MilkyWay@Home site shows my machine as having two HD 7900 series cards when in reality it's one HD 6950 and one HD 7950.


This happens for me too. I have a PC with dual GPUs in it, and the site says both are the higher-end card when in fact they are not.

Ascension
Joined: 20 Nov 07
Posts: 2
Credit: 7,215,939
RAC: 4,738

Message 60421 - Posted: 18 Nov 2013, 12:10:04 UTC

Hi,

I am getting the same errors on about 95% of the Milkyway jobs.

I have had this issue with the last 3 official versions of BOINC, and only on the GPU runs.

My PC has Windows 8.1 Pro 64-bit (it did this on Win 7 and regular Win 8 too).

I have 3 graphics cards (4 if you count the Intel GPU that some apps can use), 2x ATI and 1 NVIDIA, plus an i7-3770 CPU with 32 GB RAM.

http://milkyway.cs.rpi.edu/milkyway/show_host_detail.php?hostid=357704

I hope you can see the host.

It seems to affect only my PCs with two or more graphics cards.

mikey
Joined: 8 May 09
Posts: 2182
Credit: 231,022,148
RAC: 208,392

Message 60424 - Posted: 18 Nov 2013, 12:34:50 UTC - in response to Message 60421.

Are you using a cc_config.xml file kind of like this one:
<cc_config>
<options>
<use_all_gpus>1</use_all_gpus>
<skip_cpu_benchmarks>1</skip_cpu_benchmarks>
</options>
</cc_config>

Essentially it tells BOINC to use all the GPUs it finds, not just one for everything. I am not SURE that is your problem, though, since some units are working and some aren't.

What you might end up doing is much more complicated, but that shouldn't be a problem for you, as running multiple kinds of GPUs seems to be a piece of cake for you! One of the more complicated ways would be to use another cc_config.xml file like this one:
<cc_config>
<options>
<use_all_gpus>1</use_all_gpus>
<exclude_gpu>
<url>http://boinc.fzk.de/poem/</url>
<device_num>1</device_num>
<!-- optional: <type>NVIDIA|ATI|intel_gpu</type> -->
</exclude_gpu>
</options>
</cc_config>


Or simply add <type>ATI</type> inside <exclude_gpu>, without <device_num> specified, to exclude all ATI cards from that project.

That way you can exclude the Nvidia or AMD GPU that is causing the problems here, attach to a second project, and exclude the cards that work here from that project.

One other, much more complicated way would be to install BOINC TWICE on the PC, in two different locations, and exclude the Nvidia GPU in one installation and the AMD GPUs in the other.

Ascension
Joined: 20 Nov 07
Posts: 2
Credit: 7,215,939
RAC: 4,738

Message 60426 - Posted: 18 Nov 2013, 13:01:33 UTC - in response to Message 60424.

Thanks for the reply. The GPUs all work by themselves, but when they are combined in one setup the issue comes up, and only for MW.

My other apps run fine with this config and use all the GPUs like it's Xmas.

<cc_config>

<options>
<http_1_0>1</http_1_0>
<use_all_gpus>1</use_all_gpus>
</options>

</cc_config>

This is mine, straightforward.

SETI, Collatz, Einstein and a few others run perfectly with all the resources at their disposal.

mikey
Joined: 8 May 09
Posts: 2182
Credit: 231,022,148
RAC: 208,392

Message 60427 - Posted: 18 Nov 2013, 15:59:23 UTC - in response to Message 60426.
Last modified: 18 Nov 2013, 16:08:52 UTC

I don't know, then; you are way beyond what I have ever done.
As part of the troubleshooting, though, you might start excluding GPUs one at a time until you figure out which one is giving you trouble, then see if you can fix it or just use it elsewhere. Right now it seems you don't know which one is the problem, right?

[AF>FAH-Addict.net]toTOW
Joined: 30 Oct 10
Posts: 6
Credit: 10,334,227
RAC: 0

Message 60434 - Posted: 19 Nov 2013, 23:22:46 UTC

I'm pretty sure that something is wrong with the MW application, but it looks like no project developer is watching the forums ... that's a shame for such a big project :(

Tom*
Joined: 4 Oct 11
Posts: 38
Credit: 283,140,578
RAC: 0

Message 60441 - Posted: 20 Nov 2013, 2:41:48 UTC

I PM'd Ascension that all his failures are on device 0 on platform 0, which is a Turks; device 1 on platform 0 is a Cape Verde.

According to Wikipedia, Turks GPUs have no double precision, while Cape Verde GPUs do.

So I wonder how his Turks worked when it was by itself, unless BOINC or Einstein is reporting an incorrect device?
Example of a failing task:
Found 2 CL devices
Device 'Turks' (Advanced Micro Devices, Inc.:0x1002) (CL_DEVICE_TYPE_GPU)
Driver version: 1268.1 (VM)
Version: OpenCL 1.2 AMD-APP (1268.1)
Compute capability: 0.0
Max compute units: 6
Clock frequency: 800 Mhz
Global mem size: 1073741824
Local mem size: 32768
Max const buf size: 65536
Double extension: (none)
Device doesn't support double precision
Failed to calculate likelihood

Is there any way to just disable device 0 on platform 0, leaving device 1 to process MilkyWay?
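
The check described above (spotting a device whose "Double extension" line reads "(none)") can be automated for a saved task log. This is only a sketch that assumes the stderr layout shown in the failing task; the function name `devices_without_double` is hypothetical, and the sample is an abridged copy of that log.

```python
import re

def devices_without_double(log_text):
    """Return names of devices whose 'Double extension' line reads '(none)'."""
    missing = []
    current = None
    for line in log_text.splitlines():
        # A line such as: Device 'Turks' (Advanced Micro Devices, ...)
        match = re.match(r"\s*Device '([^']+)'", line)
        if match:
            current = match.group(1)
            continue
        if current and re.match(r"\s*Double extension:\s*\(none\)", line):
            missing.append(current)
    return missing

sample = """Found 2 CL devices
Device 'Turks' (Advanced Micro Devices, Inc.:0x1002) (CL_DEVICE_TYPE_GPU)
Driver version: 1268.1 (VM)
Double extension: (none)
Device doesn't support double precision
"""
print(devices_without_double(sample))
```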

mikey
Joined: 8 May 09
Posts: 2182
Credit: 231,022,148
RAC: 208,392

Message 60444 - Posted: 20 Nov 2013, 12:41:08 UTC - in response to Message 60441.

You are correct, the 66?? series does NOT have DP. Good catch!!
As for not using it here: yes, he can use the exclude_gpu option like this:

<cc_config>
<options>
<use_all_gpus>1</use_all_gpus>
<exclude_gpu>
<url>http://milkyway.cs.rpi.edu/milkyway/</url>
<device_num>0</device_num>
</exclude_gpu>
</options>
</cc_config>

If he replaces his current cc_config.xml file with the one above, it should work just fine and exclude GPU zero from MilkyWay.

To use GPU zero on another project such as Poem, he will have to add lines such as:

<exclude_gpu>
<url>http://boinc.fzk.de/poem/</url>
<device_num>1</device_num>
</exclude_gpu>
<exclude_gpu>
<url>http://boinc.fzk.de/poem/</url>
<device_num>2</device_num>
</exclude_gpu>

Adding the above lines would exclude GPUs 1 and 2 from Poem. The homepage of every BOINC project shows the project URL to enter when the project isn't in the list of projects; use that address in place of the one above if Poem is not your project of choice.
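
For anyone who would rather generate the combined file than edit it by hand, here is a rough sketch that puts the MilkyWay and Poem exclusions from this post into a single cc_config.xml. The helper `build_cc_config` is hypothetical; the element names are the ones used in the examples above, and the device numbers are specific to this host.

```python
import xml.etree.ElementTree as ET

def build_cc_config(exclusions):
    """Build a cc_config.xml string from (project_url, device_num) pairs."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    ET.SubElement(options, "use_all_gpus").text = "1"
    for url, device_num in exclusions:
        exclude = ET.SubElement(options, "exclude_gpu")
        ET.SubElement(exclude, "url").text = url
        ET.SubElement(exclude, "device_num").text = str(device_num)
    return ET.tostring(root, encoding="unicode")

xml_text = build_cc_config([
    ("http://milkyway.cs.rpi.edu/milkyway/", 0),  # keep GPU 0 away from MilkyWay
    ("http://boinc.fzk.de/poem/", 1),             # keep GPUs 1 and 2 away from Poem
    ("http://boinc.fzk.de/poem/", 2),
])
print(xml_text)
```

The output is written on one line, which is still valid XML. The client would need to re-read its config files or be restarted before the change takes effect.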

Blainer
Joined: 7 Dec 10
Posts: 1
Credit: 72,892,302
RAC: 0

Message 60504 - Posted: 2 Dec 2013, 15:47:31 UTC - in response to Message 60444.
Last modified: 2 Dec 2013, 15:49:08 UTC

Hi.

I've also been getting computation errors from all of the 'de_separation' units I've been running, for a couple weeks now. I'm running an ATI 6950 w/ Cat 13.9 in my system. Should I exclude Milkyway from using it? The 'modified fit' units run without errors.

Thanks.

mikey
Joined: 8 May 09
Posts: 2182
Credit: 231,022,148
RAC: 208,392

Message 60506 - Posted: 3 Dec 2013, 12:51:12 UTC - in response to Message 60504.

I have a similar PC and am also running the 'modified fit' units, but on my CPU, not my GPU, and they are working just fine. If it were me, yes, I would exclude the GPU from MilkyWay, then sign on to another project and use it there. I see you also run SETI; I think they have GPU units, so you could run CPU units here and GPU units there, contributing to two projects at once. There are several other GPU projects out there that would love to have your GPU help them out. In no particular order, there is Collatz, Moo, PrimeGrid, DistrRTgen, Poem, GPUGrid, World Community Grid (although they do not always have a GPU project running), Einstein, and I am sure I am forgetting some others too.
