All Milkyway@Home 1.02 tasks ending in computation error on HD6950.
Joined: 18 Oct 12 Posts: 8 Credit: 145,940,032 RAC: 0
Hi, I can confirm that I see the same errors: WUs failing, but only with the "normal" 1.02 application. My system has dual 5850s and has been running Milkyway for well over a year. I don't check it religiously, but a number of weeks ago I noticed that the machine's output had dropped, i.e. it wasn't crunching on the GPUs. It usually earns 300k credit a day, but that had dropped to around 80k. The reason was that it was receiving 1.02 units which all failed within 10 seconds, so it used up a day's worth of work in a few hours and wasn't sent any more. The units appear to start OK but rapidly progress through the percentages and end at 100% with a computation error after roughly 7-10 seconds. Neither card is overclocked, and neither ever has been. I also updated to Catalyst 13.9 and BOINC 7.2.28, but this has not helped. The Separation runs progress with no issues. As others have mentioned, it may be a dual-GPU thing. If I get time I will remove one of the cards and see what happens, although this is hardly an optimal solution. Thanks.
Joined: 22 Aug 10 Posts: 32 Credit: 86,014,800 RAC: 0
@Randall Let us know what you find out if you remove one of the GPUs. I also began seeing a near 100% failure rate with MW 1.02 after installing a second GPU (actually a re-installation of my 6950 after upgrading to a 7950). I've tried everything I can think of short of removing a card. I don't know why removing a card would help, though; the system doesn't seem to be struggling for resources. I cut out CPU tasks entirely for a while to see if that made a difference, but the MW 1.02 tasks kept failing while the Separation runs went through without a problem.
Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0
@Randall Are you talking about your AMD machine or your Nvidia machine? Because there IS a BOINC bug that sometimes pops up when two Nvidia cards are both crunching for the same project, and neither ends up working. The best fix in that case is to split the cards so that each machine has one card of each type, or to use project exclude lines in a cc_config.xml file you create, like this example:

<cc_config>
  <options>
    <use_all_gpus>1</use_all_gpus>
    <exclude_gpu>
      <url>http://milkyway.cs.rpi.edu/milkyway/</url>
      <device_num>0</device_num>
    </exclude_gpu>
    <exclude_gpu>
      <url>http://boinc.thesonntags.com/collatz/</url>
      <device_num>1</device_num>
    </exclude_gpu>
  </options>
</cc_config>

That file excludes GPU #0 from Milkyway AND excludes GPU #1 from Collatz, meaning each card can now work on its own project with no problem.
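If you run several boxes and want to produce that file without hand-editing XML, here is a minimal sketch in Python (3.9+ for `ET.indent`). The function `build_cc_config` is a hypothetical helper, not part of BOINC; it only emits the `<use_all_gpus>` and `<exclude_gpu>` (`<url>`, `<device_num>`) elements already used in the example above. The result still has to be saved as cc_config.xml in the BOINC data directory and picked up by restarting the client.

```python
import xml.etree.ElementTree as ET


def build_cc_config(exclusions, use_all_gpus=True):
    """Return cc_config.xml text (hypothetical helper, not a BOINC tool).

    exclusions: list of (project_url, device_num) pairs; each pair becomes
    one <exclude_gpu> element telling the client not to run that project's
    GPU work on that device number.
    """
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    if use_all_gpus:
        # Same flag as in the hand-written example: let BOINC use every GPU.
        ET.SubElement(options, "use_all_gpus").text = "1"
    for url, device_num in exclusions:
        excl = ET.SubElement(options, "exclude_gpu")
        ET.SubElement(excl, "url").text = url
        ET.SubElement(excl, "device_num").text = str(device_num)
    ET.indent(root)  # pretty-print with two-space indentation (Python 3.9+)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Reproduces the example: GPU 0 off Milkyway, GPU 1 off Collatz.
    print(build_cc_config([
        ("http://milkyway.cs.rpi.edu/milkyway/", 0),
        ("http://boinc.thesonntags.com/collatz/", 1),
    ]))
```

Building the tree with ElementTree rather than string concatenation guarantees the tags are always properly closed, which is the usual reason a hand-edited cc_config.xml gets ignored.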
Joined: 30 Oct 10 Posts: 6 Credit: 10,437,789 RAC: 0
I really feel like the project developers don't give a shit about their project! Aren't you interested in getting fewer computation errors? That would speed up the project and waste fewer resources. Also, mikey, if you don't have a solution to the issue described in this thread, please stay out of it and stop going off topic; that would be more useful! To the other contributors reporting an issue: can you confirm that the error codes you get are similar to those reported in the first post? I think I have given all the details I could to help someone who knows how to investigate the issue. If you need more details, feel free to contact me.
Joined: 29 Aug 10 Posts: 25 Credit: 2,172,252,217 RAC: 0
I get the same error with my 6990s when using 13.9 drivers. I have been using 11.12 and these work fine.
Joined: 13 Nov 10 Posts: 10 Credit: 212,710,651 RAC: 0
Having the same issue with a 5850. Single GPU, all 1.02s failing.
Joined: 11 Dec 11 Posts: 6 Credit: 4,823,436 RAC: 0
Exactly the same as Black~Mystic: I just rejoined the project today only to see all WUs fail. The card is not overclocked and used to work fine over a year ago. Driver 13.12 (most recent). Edit: not only 1.02 but also "Modified Fit" tasks fail immediately. The card worked fine on POEM, which is out of WUs at the moment...
Joined: 18 Oct 12 Posts: 8 Credit: 145,940,032 RAC: 0
Hi all, I finally got time over the Christmas break to remove one card, and voila! It fails with just the one card as well. All 1.02 units fail. Randall.
Joined: 18 Oct 12 Posts: 8 Credit: 145,940,032 RAC: 0
[quote]I get the same error with my 6990s when using 13.9 drivers. I have been using 11.12 and these work fine.[/quote]
Phil, do you mean a really old version of the Catalyst driver?
Joined: 18 Oct 12 Posts: 8 Credit: 145,940,032 RAC: 0
[quote]Are you talking about your AMD machine or your Nvidia machine?[/quote]
Mikey, I'm pretty sure I said 5850s. I don't crunch MW on any Nvidia cards, not that I own any anyway. I'm hoping Phil replies soon to the question of older/newer Catalyst drivers.
Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0
[quote]I'm hoping Phil replies soon to the question of older/newer Catalyst drivers.[/quote]
You can get non-beta AMD/ATI drivers here; the download links are at the very bottom of the page. http://www.hal6000.com/seti/boinc_ati_gpu_cheat_sheet.htm
Joined: 18 Oct 12 Posts: 8 Credit: 145,940,032 RAC: 0
Hi, today I did some more troubleshooting. I uninstalled Catalyst 13.12 and installed 12.1: wrong OpenCL version, so the apps wouldn't run. Updated to 12.6: same. Updated to 13.1: correct OpenCL version, but the 1.02 units still errored. Removed one card and ran with 13.1: same issue. Uninstalled 13.1, rebooted, reinstalled 13.1 with a single card: same issue. So it seems I'm snookered. I'll now swap these cards over to Einstein, as I don't think it's right to run them when only about 40% of the work is being accepted. It's a really strange issue; I might have thought it was just my system, but others are having the same issue too. Thanks, Mikey, for the drivers suggestion.
Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0
Hi, do both cards work in the same machine at Einstein? If so, that may be for the best, at least for now, as this problem seems to be a difficult one to solve.
Joined: 18 Oct 12 Posts: 8 Credit: 145,940,032 RAC: 0
Yep, both cards are now crunching away on Einstein with no issues.
Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0
[quote]yep, both cards now crunching away no issues on Einstein.[/quote]
Oh well... I really have no more ideas for you. Although, one person at another project was having problems with a 6950/6970, switched to the newest beta of BOINC, and it worked for him. You can get version 7.2.36 here: http://boinc.berkeley.edu/dl/?C=M;O=D
Joined: 22 Jan 11 Posts: 375 Credit: 64,707,164 RAC: 10
I installed a 5850 yesterday to replace a 4870, and now I notice I'm getting loads of errors with MW 1.02, as you other folk have discovered. Unfortunately (for diagnosis purposes) I also updated the driver when I swapped cards: the 4870 was on 13.1, my 5850 is on 13.12. [edit: nm, I see Randall's tried 13.1 on his 5850s.] Btw, I personally don't buy someone else's earlier comment that not leaving a free CPU core can cause problems; yes, it'll slow things down, but I don't believe it would cause errors, and it didn't with my 4870. Rather, this seems to be a problem with MW 1.02 on 58xx, 69xx & 79xx(?) cards. Is there a way to stop just that app for now?
Team AnandTech - SETI@H, DPAD, F@H, MW@H, A@H, LHC, POGS, R@H, Einstein@H, DHEP, WCG
Main rig - Ryzen 5 3600, MSI B450 G.Pro C. AC, RTX 3060Ti 8GB, 32GB DDR4 3200, Win 10 64bit
2nd rig - i7 4930k @4.1 GHz, HD 7870 XT 3GB(DS), 16GB DDR3 1866, Win7
Joined: 22 Aug 10 Posts: 32 Credit: 86,014,800 RAC: 0
[quote]Installed a 5850 yesterday to replace a 4870, now I notice I'm getting loads of errors with MW 1.02, as you other folk have discovered.[/quote]
Yes, you can deselect the MW 1.02 app in the project preferences on your account page. That way you will only receive Separation and Separation (Modified Fit) runs.
Joined: 22 Jan 11 Posts: 375 Credit: 64,707,164 RAC: 10
Hi, thx for that :), as it happens I stumbled across that later on. No more errors now, just one invalid Modified Fit v1.28 WU (http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=482270478), which also errored out on two different GTS 450 PCs. So I'm thinking this might be a driver issue (wouldn't be the first time, IIRC), and I'm going to try other drivers (tonight if I get a chance). 13.1 & 13.12 have already been tried with no luck, so I figured I'd start with 13.11 & work my way down until I have some luck or hit 13.2! Anyone else fancy trying a few with me to save me some time??
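The newest-first plan above amounts to a simple linear search over driver versions. A minimal sketch, assuming Python; `first_working_driver` and `tasks_pass` are hypothetical names, with `tasks_pass` standing in for the manual step of installing a driver and checking whether MW 1.02 tasks still error out. The candidate list is generated mechanically from 13.11 down to 13.2, so not every entry is necessarily a real Catalyst release.

```python
from typing import Callable, Optional, Sequence

# Hypothetical candidate list: "13.x" numbers from 13.11 down to 13.2,
# generated mechanically; not every number is necessarily a real release.
CANDIDATES = [f"13.{minor}" for minor in range(11, 1, -1)]


def first_working_driver(
    candidates: Sequence[str],
    tasks_pass: Callable[[str], bool],
) -> Optional[str]:
    """Walk the candidates newest-first and return the first version for
    which tasks_pass(version) is True, or None if none of them work.

    tasks_pass is a hypothetical stand-in for the manual step: install that
    driver, reboot, run a few MW 1.02 tasks and check that they don't error.
    """
    for version in candidates:
        if tasks_pass(version):
            return version  # stop at the newest driver that works
    return None


if __name__ == "__main__":
    # Made-up outcome for illustration: pretend only 13.4 and older behave.
    known_good = {"13.4", "13.3", "13.2"}
    print(first_working_driver(CANDIDATES, lambda v: v in known_good))
```

A linear walk installs each candidate at most once and stops at the newest working one. Bisection would need fewer installs, but only if there is a single version after which the tasks start failing, which nobody in this thread has established.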
Joined: 18 Jul 09 Posts: 300 Credit: 303,583,449 RAC: 696
[quote]Anyone else fancy trying a few with me to save me some time??[/quote]
Try 12.8
Joined: 22 Jan 11 Posts: 375 Credit: 64,707,164 RAC: 10
Blimey, that's old, but I'll give it a shot if none of the 13s work, cheers.
©2024 Astroinformatics Group