Welcome to MilkyWay@home

All Milkyway@Home 1.02 tasks ending in computation error on HD6950.

Randall Hose
Joined: 18 Oct 12
Posts: 8
Credit: 145,940,032
RAC: 0
Message 60517 - Posted: 5 Dec 2013, 4:30:55 UTC

Hi,
I can confirm that I see the same WU failures, but only with the "normal" 1.02 application.
My system has dual 5850s and has been running MilkyWay for well over a year. I don't check it religiously, but a number of weeks ago I noticed that the machine's output had dropped, i.e. it wasn't crunching on the GPUs. It usually earns 300k credit a day, but that had dropped to around 80k. The reason was that it was receiving 1.02 units, which all failed within 10 seconds, so it hit its cap for the day within a few hours and no more work was sent.
The units appear to start OK but rapidly progress through the percentages, reaching 100% and ending with a computation error after roughly 7 to 10 seconds.
Neither card is overclocked, and neither ever has been. I also updated to Catalyst 13.9 and BOINC 7.2.28, but this has not helped. The separation runs progress with no issues.
As others have mentioned, it may be a dual-GPU thing. If I get time I will remove one of the cards and see what happens, although this is hardly an optimal solution.

Thanks.

Matt
Joined: 22 Aug 10
Posts: 32
Credit: 86,014,800
RAC: 0
Message 60542 - Posted: 9 Dec 2013, 6:48:41 UTC - in response to Message 60517.  

@Randall

Let us know what you find out if you remove one of the GPUs. I also began seeing a near 100% fail rate with MW 1.02 after installation of a second GPU (actually a re-installation of my 6950 after upgrading to a 7950). I've tried everything I can think of short of removing a card.

I don't know why removing a card would help, though. The system doesn't seem to be struggling for resources. I cut out CPU tasks entirely for a while to see if that made a difference, but the MW 1.02 tasks kept failing while the Separation runs went through without a problem.

mikey
Joined: 8 May 09
Posts: 3315
Credit: 519,946,492
RAC: 22,242
Message 60547 - Posted: 10 Dec 2013, 13:02:23 UTC - in response to Message 60542.  

[quote]@Randall

Let us know what you find out if you remove one of the GPUs. I also began seeing a near 100% fail rate with MW 1.02 after installation of a second GPU (actually a re-installation of my 6950 after upgrading to a 7950). I've tried everything I can think of short of removing a card.

I don't know why removing a card would help, though. The system doesn't seem to be struggling for resources. I cut out CPU tasks entirely for a while to see if that made a difference, but the MW 1.02 tasks kept failing while the Separation runs went through without a problem.[/quote]


Are you talking about your AMD machine or your Nvidia machine? Because there IS a BOINC bug that pops up sometimes when two Nvidia cards are both crunching for the same project and neither ends up working. The best thing in that case is to split the cards up so that one of each type is in each machine, or just use a project exclude entry in a cc_config.xml file that you create, like this example:

<cc_config>
  <options>
    <use_all_gpus>1</use_all_gpus>
    <exclude_gpu>
      <url>http://milkyway.cs.rpi.edu/milkyway/</url>
      <device_num>0</device_num>
    </exclude_gpu>
    <exclude_gpu>
      <url>http://boinc.thesonntags.com/collatz/</url>
      <device_num>1</device_num>
    </exclude_gpu>
  </options>
</cc_config>

That file excludes GPU #0 from Milkyway AND it excludes GPU #1 from Collatz, meaning each card can now work on a different project with no problem.
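
For anyone who would rather generate this file than type it by hand, here is a minimal Python sketch. It uses only the standard library and the same elements as the example above (use_all_gpus, exclude_gpu, url, device_num). The function name build_cc_config is made up for illustration, and saving the output is left out because the BOINC data directory differs between installs.

```python
import xml.etree.ElementTree as ET


def build_cc_config(exclusions, use_all_gpus=True):
    """Return cc_config.xml text with one <exclude_gpu> block per (url, device_num) pair.

    build_cc_config is a hypothetical helper, not part of BOINC.
    """
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    ET.SubElement(options, "use_all_gpus").text = "1" if use_all_gpus else "0"
    for url, device_num in exclusions:
        block = ET.SubElement(options, "exclude_gpu")
        ET.SubElement(block, "url").text = url
        ET.SubElement(block, "device_num").text = str(device_num)
    # ET.indent only exists on Python 3.9+; skip pretty-printing on older versions.
    if hasattr(ET, "indent"):
        ET.indent(root, space="  ")
    return ET.tostring(root, encoding="unicode")


# Same two exclusions as the hand-written example above.
config_text = build_cc_config([
    ("http://milkyway.cs.rpi.edu/milkyway/", 0),
    ("http://boinc.thesonntags.com/collatz/", 1),
])
print(config_text)
```

Each (url, device_num) pair becomes one exclude_gpu block, so keeping a card away from a third project is just one more tuple in the list.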

[AF>FAH-Addict.net]toTOW
Joined: 30 Oct 10
Posts: 6
Credit: 10,368,264
RAC: 0
Message 60598 - Posted: 15 Dec 2013, 15:09:38 UTC

I really feel like the project developers don't give a shit about their project!

Aren't you interested in getting fewer computation errors, which would speed up the project and waste fewer resources?

Also mikey, if you don't have a solution to the issue described in this thread, please stay out of it and stop going off topic. That would be more useful!

To the other contributors reporting an issue: can you confirm that the error codes you get are similar to those reported in the first post?

I think I have given all the details I could to help someone who knows how to investigate the issue. If you need more details, feel free to contact me.

Phil
Joined: 29 Aug 10
Posts: 25
Credit: 2,172,252,217
RAC: 0
Message 60600 - Posted: 15 Dec 2013, 17:05:19 UTC

I get the same error with my 6990s when using 13.9 drivers. I have been using 11.12 and these work fine.

Black~Mystic
Joined: 13 Nov 10
Posts: 10
Credit: 212,710,651
RAC: 0
Message 60613 - Posted: 17 Dec 2013, 15:06:10 UTC

Having the same issue using a 5850: single GPU, all 1.02s failing.

Jintho
Joined: 11 Dec 11
Posts: 6
Credit: 4,823,436
RAC: 0
Message 60663 - Posted: 30 Dec 2013, 13:03:45 UTC - in response to Message 60613.  
Last modified: 30 Dec 2013, 13:08:04 UTC

Exactly the same as Black~Mystic - just rejoined the project today only to see all WUs fail. Card not overclocked; used to work fine over a year ago. Driver 13.12 (most recent).
Edit: Not only 1.02 but also "modified fit" fail immediately. Card worked fine on POEM; out of WUs atm...

Randall Hose
Joined: 18 Oct 12
Posts: 8
Credit: 145,940,032
RAC: 0
Message 60667 - Posted: 31 Dec 2013, 3:17:22 UTC

Hi all,
I finally got time over the Christmas break to remove one card, and voila! Fails with just the one card, as well. All 1.02 units fail.

Randall.

Randall Hose
Joined: 18 Oct 12
Posts: 8
Credit: 145,940,032
RAC: 0
Message 60668 - Posted: 31 Dec 2013, 3:18:06 UTC - in response to Message 60600.  

[quote]I get the same error with my 6990s when using 13.9 drivers. I have been using 11.12 and these work fine.[/quote]


Phil, do you mean a really old version of the Catalyst driver?

Randall Hose
Joined: 18 Oct 12
Posts: 8
Credit: 145,940,032
RAC: 0
Message 60673 - Posted: 2 Jan 2014, 3:28:55 UTC - in response to Message 60547.  
Last modified: 2 Jan 2014, 3:29:28 UTC

[quote]Are you talking about your AMD machine or your Nvidia machine?[/quote]


Mikey, I'm pretty sure I said 5850's.
I don't crunch MW on any Nvidias, not that I own any Nvidias anyway.

I'm hoping Phil replies soon to the question of older/newer Catalyst drivers.

mikey
Joined: 8 May 09
Posts: 3315
Credit: 519,946,492
RAC: 22,242
Message 60674 - Posted: 2 Jan 2014, 12:00:09 UTC - in response to Message 60673.  

[quote]I'm hoping Phil replies soon to the question of older/newer Catalyst drivers.[/quote]

You can get non-beta AMD/ATI drivers here; the download links are at the very bottom of the page: http://www.hal6000.com/seti/boinc_ati_gpu_cheat_sheet.htm

Randall Hose
Joined: 18 Oct 12
Posts: 8
Credit: 145,940,032
RAC: 0
Message 60678 - Posted: 3 Jan 2014, 3:35:06 UTC

Hi,
Today I did some more troubleshooting.
Uninstalled 13.12, installed 12.1 of Catalyst.
OpenCL wrong version, so apps wouldn't run.
Updated to 12.6, same.
Updated to 13.1, OpenCL correct version, however 1.02 units still erroring.
Removed 1 card, ran with 13.1, same issue.
Uninstalled 13.1, rebooted, installed 13.1, single card, same issue.

So it seems I'm snookered. I'll now swap these cards over to Einstein as I don't think it's right to run them when only about 40% of the work is being accepted. Seems a really strange issue, I might have thought it was just my system, but others are having the same issue too.

Thanks Mikey for the drivers suggestion

mikey
Joined: 8 May 09
Posts: 3315
Credit: 519,946,492
RAC: 22,242
Message 60679 - Posted: 3 Jan 2014, 12:46:50 UTC - in response to Message 60678.  

[quote]Hi,
Today I did some more troubleshooting.
Uninstalled 13.12, installed 12.1 of Catalyst.
OpenCL wrong version, so apps wouldn't run.
Updated to 12.6, same.
Updated to 13.1, OpenCL correct version, however 1.02 units still erroring.
Removed 1 card, ran with 13.1, same issue.
Uninstalled 13.1, rebooted, installed 13.1, single card, same issue.

So it seems I'm snookered. I'll now swap these cards over to Einstein as I don't think it's right to run them when only about 40% of the work is being accepted. Seems a really strange issue, I might have thought it was just my system, but others are having the same issue too.

Thanks Mikey for the drivers suggestion[/quote]


Do both cards work in the same machine at Einstein? If so that may be for the best at least right now, as this problem seems to be a difficult one to solve.

Randall Hose
Joined: 18 Oct 12
Posts: 8
Credit: 145,940,032
RAC: 0
Message 60695 - Posted: 6 Jan 2014, 23:37:35 UTC - in response to Message 60679.  

yep, both cards now crunching away no issues on Einstein.

mikey
Joined: 8 May 09
Posts: 3315
Credit: 519,946,492
RAC: 22,242
Message 60697 - Posted: 7 Jan 2014, 11:41:32 UTC - in response to Message 60695.  

[quote]yep, both cards now crunching away no issues on Einstein.[/quote]


Oh well... I really have no more ideas for you. Although one person at another project was having problems with a 6950/6970, switched to the newest beta of BOINC, and it worked for him. You can get version 7.2.36 here:
http://boinc.berkeley.edu/dl/?C=M;O=D

[TA]Assimilator1
Joined: 22 Jan 11
Posts: 375
Credit: 64,657,871
RAC: 0
Message 60699 - Posted: 7 Jan 2014, 18:30:37 UTC - in response to Message 60695.  
Last modified: 7 Jan 2014, 18:38:59 UTC

Installed a 5850 yesterday to replace a 4870, now I notice I'm getting loads of errors with MW 1.02, as you other folk have discovered.

Unfortunately (for diagnosis reasons) I also updated the driver when I swapped cards: the 4870 was on 13.1, my 5850 is on 13.12 ([edit] nm, I see Randall's tried 13.1 on his 5850s).

Btw I personally don't buy someone else's earlier comment that not leaving a free cpu core can cause problems, yea it'll slow it down but I don't believe it'd cause errors, it didn't with my 4870.
Rather this seems to be a problem with MW 1.02, 58xx, 69xx & 79xx(?).

Is there a way to stop just that app for now?
Team AnandTech - SETI@H, DPAD, F@H, MW@H, A@H, LHC, POGS, R@H, Einstein@H, DHEP, WCG

Main rig - Ryzen 5 3600, MSI B450 G.Pro C. AC, RTX 3060Ti 8GB, 32GB DDR4 3200, Win 10 64bit
2nd rig - i7 4930k @4.1 GHz, HD 7870 XT 3GB(DS), 16GB DDR3 1866, Win7

Matt
Joined: 22 Aug 10
Posts: 32
Credit: 86,014,800
RAC: 0
Message 60702 - Posted: 8 Jan 2014, 2:19:41 UTC - in response to Message 60699.  

[quote]Installed a 5850 yesterday to replace a 4870, now I notice I'm getting loads of errors with MW 1.02, as you other folk have discovered.

Unfortunately (for diagnosis reasons) I also updated the driver when I swapped cards: the 4870 was on 13.1, my 5850 is on 13.12 ([edit] nm, I see Randall's tried 13.1 on his 5850s).

Btw I personally don't buy someone else's earlier comment that not leaving a free cpu core can cause problems, yea it'll slow it down but I don't believe it'd cause errors, it didn't with my 4870.
Rather this seems to be a problem with MW 1.02, 58xx, 69xx & 79xx(?).

Is there a way to stop just that app for now?[/quote]


Yes, you can deselect the MW 1.02 app in the project preferences in your account page. This way you will only receive Separation and Separation (Modified Fit) runs.
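
A client-side alternative, offered as an untested sketch: as far as the BOINC client configuration documentation describes it, an exclude_gpu entry in cc_config.xml can also carry an app element, and leaving out device_num applies the exclusion to every GPU. The Python below only builds such an entry. The helper name is made up, and "milkyway" is a placeholder for the app's short name, which is an unverified assumption here (the real short name should appear in client_state.xml).

```python
import xml.etree.ElementTree as ET


def exclude_app_from_gpus(project_url, app_name):
    """Build an <exclude_gpu> element naming one app and no device_num.

    Hypothetical helper for illustration; it only constructs XML.
    """
    block = ET.Element("exclude_gpu")
    ET.SubElement(block, "url").text = project_url
    ET.SubElement(block, "app").text = app_name
    return block


# "milkyway" is a placeholder app short name, not confirmed for this project.
element = exclude_app_from_gpus("http://milkyway.cs.rpi.edu/milkyway/", "milkyway")
fragment = ET.tostring(element, encoding="unicode")
print(fragment)
```

The printed fragment would go inside the options section of cc_config.xml. Deselecting the app in the project preferences remains the simpler route.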

[TA]Assimilator1
Joined: 22 Jan 11
Posts: 375
Credit: 64,657,871
RAC: 0
Message 60705 - Posted: 8 Jan 2014, 18:01:03 UTC - in response to Message 60702.  

Hi, thx for that :), as it happens I stumbled across that later on.

No more errors now, just 1 invalid mod fit v1.28 WU - http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=482270478
Which errored out on 2 different GTS 450 PCs.

So I'm thinking this might be a driver issue (wouldn't be the 1st time IIRC), so I'm going to try other drivers (tonight if I get a chance).
13.1 & 13.12 have already been tried with no luck, so I figured I'd start with 13.11 & work my way down until I have some luck or hit 13.2!

Anyone else fancy trying a few with me to save me some time??

swiftmallard
Joined: 18 Jul 09
Posts: 300
Credit: 303,562,776
RAC: 0
Message 60706 - Posted: 8 Jan 2014, 19:59:05 UTC - in response to Message 60705.  

[quote]Anyone else fancy trying a few with me to save me some time??[/quote]

Try 12.8

[TA]Assimilator1
Joined: 22 Jan 11
Posts: 375
Credit: 64,657,871
RAC: 0
Message 60711 - Posted: 9 Jan 2014, 18:28:43 UTC - in response to Message 60706.  

Blimey that's old, but I'll give it a shot if none of the 13s work, cheers.

©2024 Astroinformatics Group