Welcome to MilkyWay@home

Posts by Tex1954

21) Message boards : Number crunching : What is the cause of these 'validate errors' (Message 62986)
Posted 11 Jan 2015 by Tex1954
Post:
That IS a lot of testing I wouldn't have the patience for. I run two identical Nvidia cards in Sabertooth 990FX motherboards. I had the issue with the GTX 670's and now still with the GTX 970's. The cards will run the tasks from SETI and Einstein with NO issues. Just have the problem with the Modified Fit tasks ... about a 3% error rate. No problem with the standard GPU tasks. I ran with the problem with the 670's, stopped the Modified Fit and retried again with the 970's to see if the change in cards might have fixed the issue. Still have the problem, so I have turned off the Modified Fit again. I think, like you do, that the problem is with BOINC or the application itself. Running the latest BOINC and only a point version or two behind the current drivers. Had to update the Nvidia drivers so they could handle the 970's.

As I stated, the error rate is only about 3% on the Modified Fit. The majority of tasks finish correctly. I think there is some kind of contention problem going on. Whether it is with the application, BOINC or the hardware.... I don't know. I tried running single task, single project on the cards and it made no difference in the error rate. From your experiments, I kinda don't think it is a hardware issue. Wish the project app developers would chime in on this observed behavior and issue some kind of statement on what the issue is.

Cheers, Keith


With Einstein being down, I've been running MW on another system... a totally stable Xeon E3-1230 V2 3.5 GHz on a new Z77 mobo, and have 182 invalid and 4155 valid on a 7970 using AMD 14.12 drivers and BOINC 7.4.36... looks like about a 4.2% error rate (182 of 4337 tasks) at slightly less than stock speeds.
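As a quick check of the arithmetic, the error rate from the counts above can be worked out two ways: invalid tasks as a share of all validated tasks, or invalid tasks relative to valid ones. This is only a sketch; the function name `error_rate` is made up for illustration and is not part of BOINC or the MilkyWay@home site.

```python
def error_rate(invalid: int, valid: int) -> float:
    """Return invalid tasks as a fraction of all validated tasks."""
    total = invalid + valid
    if total == 0:
        raise ValueError("no tasks counted")
    return invalid / total


# Counts quoted in the post for the 7970 host.
invalid, valid = 182, 4155

rate_of_total = error_rate(invalid, valid)  # invalid / (invalid + valid)
ratio_to_valid = invalid / valid            # invalid / valid

print(f"invalid / (invalid + valid) = {rate_of_total:.1%}")
print(f"invalid / valid             = {ratio_to_valid:.1%}")
```

Either way the figure lands a little over 4%, well above what the other projects on the same hardware show.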

Soo, there is for sure a problem with the tasks somehow and it would be really nice if some developer could figure it out... All these errors can't be helping the project...

8-)
22) Message boards : Number crunching : What is the cause of these 'validate errors' (Message 62943)
Posted 3 Jan 2015 by Tex1954
Post:
Well, I did also check to see if I can still run my 6990 with 12.4 and it still works, but anything past 12.8 fails miserably.

I wish someone somewhere (maybe the development folks at MW?) would nail down the 69xx series problems and dual GPU problems so more folks could participate.

Seems to ME we sorta PAID them to do it, didn't we?

How about some help guys!

Good crunching and Happy New Year!

8-)
23) Questions and Answers : Windows : Work Units with Computation Error (Message 62814)
Posted 10 Dec 2014 by Tex1954
Post:
Y'all really need to see what is up with the Modified Fit code... the error rates are way too high and there is an obvious problem when running 2 7970 GPU's on a single Mobo even at stock settings or even reduced speed settings.

Other than that, I have to agree that at least "I" donate for the project, not to receive any gift in return. It's nice, but a cheapo case badge or something cheaper would be better from my point of view.

Keeping you all going is payback in itself!

JMHO

8-)
24) Message boards : Number crunching : What is the cause of these 'validate errors' (Message 62803)
Posted 8 Dec 2014 by Tex1954
Post:
I just wanted to narrow it down. I've also tested some overclocking and it's very, very sensitive to anything not perfect. A very slight change up/down can make the WU's in question fail... and it doesn't seem to be a real problem per se; rather, I think it's a timing problem somehow on the PCIe bus. But that is speculation.

It could easily be a base OpenCL driver problem that only shows up when two cards are being used. Personally, I have neither the debug tools nor the expertise to even begin to assess the problem area.

In any case, I've about done all I can... I can always make the Modified Fit WU's fail with a marginal overclock... it's very touchy... but I can also overclock the boards 10% without problems.

Sooo, somebody else will have to figure out the rest.

8-)
25) Message boards : Number crunching : What is the cause of these 'validate errors' (Message 62801)
Posted 8 Dec 2014 by Tex1954
Post:
Okay, moved two 7970's to my Sabertooth 990FX and tried all the tests again... same kind of failures on this DUAL PCIe 16x16 setup. The PCIe bus width seems not to be an issue, merely the fact that two AMD cards are installed. IF I run ONLY one card or the other, it works fine.

IF I run MW on either card and anything else on the other, it screws up also.

Sooo, seems there is definitely something wrong with Modified Fit tasks since no problems of any sort are exhibited on any other MW tasks or other projects with DUAL 7970's.

To solve the problem, I swapped a GTX 670 from one system with one of the 7970's and swapped tasks on systems. This seems to work fine so far... MW on the 7970 and another project on the 670.

Again, I am only running Modified Fit tasks...

8-)

PS: FWIW, install NVidia drivers, then re-install the AMD drivers and make sure the AMD/ATI card is the MAIN display in the FIRST PCIe slot and it all works fine. Doesn't seem to work in the reverse... AMD insists it be king I suppose... LOL!
26) Message boards : Number crunching : What is the cause of these 'validate errors' (Message 62800)
Posted 8 Dec 2014 by Tex1954
Post:
Okay, 6 hours running on one GPU and then the other... no problems at all. Not a single error from either GPU.

The INSTANT I run BOTH GPU's, it starts spitting out errors... ONLY does this with Modified fit tasks. Runs PrimeGrid, Einstein, Collatz just fine.

Seems to me I either have a unique and weird Mobo problem or there is something in the Modified Fit tasks interfering with each other.

8-)

Edit: Running PrimeGrid on one of the cards while MW is running on the other card (only Modified Fit WUs) causes no problems!!!! Both run fine! I think that eliminates any motherboard PCIe problem, as the PG tasks use a LOT more bandwidth on the PCIe bus compared to MW tasks.

I think it is certain now that some global parameter is getting clobbered or some sort of variable/name/label is messing up when MW runs on BOTH GPU's at the same time... I think I have to leave it to developers to solve this one.

8-)
27) Message boards : Number crunching : What is the cause of these 'validate errors' (Message 62799)
Posted 7 Dec 2014 by Tex1954
Post:
Well, I had thought to check if one of the 7970's is causing problems... then forgot about it.

And yes, I don't like to update if things are working well...

I'll give it a go and see if it's board specific..

Thanks!

8-)
28) Message boards : Number crunching : What is the cause of these 'validate errors' (Message 62795)
Posted 7 Dec 2014 by Tex1954
Post:
I have gone through 5 versions of AMD drivers, lowered clock speeds a lot, raised them a little, re-installed Windoz, changed the MTU, went from SSD to hard disk, underclocked and overclocked the CPU, changed memory sticks, disabled HD caching, tried several versions of BOINC, and pretty much tried everything except changing the CPU & motherboard, and I still get tons of errors on the Modified Fit tasks.

The thing is, I see others are running it fine!

The only two things that differ are the PCIe bus, which is running in x8 mode because I have 2 GPU's plugged into it, and the CPU on some of the invalid tasks.

I've done everything I can otherwise and only managed to slightly reduce the error rate, which seems to be about 10% or so...

Any more ideas????

8-)
29) Message boards : Number crunching : What is the cause of these 'validate errors' (Message 62782)
Posted 5 Dec 2014 by Tex1954
Post:
I get a ton of them as well, but ONLY with the modified fit WUs... everything else runs fine..

8-)
30) Message boards : Number crunching : 1.44 N-Body Validation Inconclusive (Message 62705)
Posted 16 Nov 2014 by Tex1954
Post:
Well, seems it is normal until they get consensus!

I didn't wait long enough...

8-)
31) Message boards : Number crunching : 1.44 N-Body Validation Inconclusive (Message 62703)
Posted 15 Nov 2014 by Tex1954
Post:
Sure seems like 2/3 of the new N-Body tasks are messing up...

Is it my system or is this normal?

8-)

http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=506176&offset=0&show_names=0&state=3&appid=
32) Message boards : Number crunching : 50% "Validation Inconclusive" Running N-Body (Message 61465)
Posted 4 Apr 2014 by Tex1954
Post:
Howdy.

I just ran 26 N-Body tasks and 50% got the stated error; didn't matter if it was a single or multi thread.

Here is a for instance:

http://milkyway.cs.rpi.edu/milkyway/result.php?resultid=705830538

Any idea if it's hardware or a WU flaw?

Thanks!

8-)
33) Message boards : Number crunching : AMD GPU Computation errors (Message 60062)
Posted 30 Sep 2013 by Tex1954
Post:
As stated before, you must install AMD drivers properly to ensure success.

The method DR and I use is as follows:

Uninstall AMD Display drivers using control panel.

Reboot.

Use Driver Sweeper to uninstall AMD Display.

Use Driver Fusion to catch what Driver Sweeper didn't get.

Reboot.

Run DS and DF again to make sure. If all good, install 12.10 with 12.4 OpenCL drivers.

Reboot.

Enjoy.

8-)
34) Message boards : Number crunching : AMD GPU Computation errors (Message 60034)
Posted 27 Sep 2013 by Tex1954
Post:
Using the following drivers (Going by Boinc) and getting mixed results for Modified Fit, but no out and out 'computation errors'.

CAL: ATI GPU 0: ATI Radeon HD 5800 series (Cypress) (CAL version 1.4.1848, 1024MB, 991MB available, 4176 GFLOPS peak)
OpenCL: AMD/ATI GPU 0: ATI Radeon HD 5800 series (Cypress) (driver version 1272.2 (VM), device version OpenCL 1.2 AMD-APP (1272.2), 1024MB, 991MB available, 4176 GFLOPS peak)



Looks like different APP version but same OpenCL version as I use?

This is what is reported by BOINC running the MOD 12.10 drivers and I have zero problems...

Win7-F2

12 9/27/2013 12:20:36 PM CAL: ATI GPU 0: AMD Radeon HD 6900 series (Cayman) (CAL version 1.4.1720, 2048MB, 2016MB available, 6144 GFLOPS peak)
13 9/27/2013 12:20:36 PM CAL: ATI GPU 1: AMD Radeon HD 6900 series (Cayman) (CAL version 1.4.1720, 2048MB, 2016MB available, 6144 GFLOPS peak)
14 9/27/2013 12:20:36 PM OpenCL: AMD/ATI GPU 0: AMD Radeon HD 6900 series (Cayman) (driver version CAL 1.4.1720 (VM), device version OpenCL 1.2 AMD-APP (923.1), 2048MB, 2016MB available, 6144 GFLOPS peak)
15 9/27/2013 12:20:36 PM OpenCL: AMD/ATI GPU 1: AMD Radeon HD 6900 series (Cayman) (driver version CAL 1.4.1720 (VM), device version OpenCL 1.2 AMD-APP (923.1), 2048MB, 2016MB available, 6144 GFLOPS peak)

Have you tried the 12.10 modded version? Maybe it would work better?
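To compare the two setups at a glance, the OpenCL and AMD-APP versions can be pulled out of the BOINC startup lines with a small script. This is a rough sketch that assumes the lines look exactly like the ones pasted above; the helper name `versions` and the regular expression are made up for illustration, not anything BOINC provides.

```python
import re

# Two of the OpenCL lines reported by BOINC above (timestamps dropped).
LOG_LINES = [
    "OpenCL: AMD/ATI GPU 0: ATI Radeon HD 5800 series (Cypress) "
    "(driver version 1272.2 (VM), device version OpenCL 1.2 AMD-APP (1272.2), "
    "1024MB, 991MB available, 4176 GFLOPS peak)",
    "OpenCL: AMD/ATI GPU 0: AMD Radeon HD 6900 series (Cayman) "
    "(driver version CAL 1.4.1720 (VM), device version OpenCL 1.2 AMD-APP (923.1), "
    "2048MB, 2016MB available, 6144 GFLOPS peak)",
]

# Assumed layout: "device version OpenCL <x.y> AMD-APP (<build>)".
PATTERN = re.compile(
    r"device version OpenCL (?P<opencl>[\d.]+) AMD-APP \((?P<app>[\d.]+)\)"
)


def versions(line):
    """Return (OpenCL version, AMD-APP build) or None if the line doesn't match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return match.group("opencl"), match.group("app")


parsed = [versions(line) for line in LOG_LINES]
for opencl, app in parsed:
    print(f"OpenCL {opencl}, AMD-APP {app}")
```

Both hosts report OpenCL 1.2, but the AMD-APP build differs (1272.2 versus 923.1), which is the difference noted above.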

8-)
35) Message boards : Number crunching : AMD GPU Computation errors (Message 60021)
Posted 27 Sep 2013 by Tex1954
Post:
Well, I discovered something...

An older version of the drivers seems to work on the 6990... This is a special version: 12.10 with the 12.4 OpenCL driver.

You can get it by clicking here:

http://darkryder.com/files/Drivers/Amd/12-10_vista_win7_win8_64_dd_ccc_whql_net4_opencl_12.4.rar

Or just go to DarkRyder.com and look at bottom/right area.

Also, be sure to use Driver Sweeper and Driver Fusion to remove ALL old AMD Display drivers from the system before installing... and do that after you uninstall the AMD display drivers...

Good Luck!

8-)

36) Message boards : Number crunching : AMD GPU Computation errors (Message 60020)
Posted 27 Sep 2013 by Tex1954
Post:


All well and good... and exactly what does that have to do with the topic of computational errors?

Are you suggesting some other version works?

8-)


Yes, lots of versions will work; the key is to find the best for your machine that still works. 99% of my machines are Boinc only machines, meaning all they do all day long is crunch. That gives me more latitude to tweak them than other people who use their machines for other things too.


So exactly what AMD/ATI driver version do YOU run that works?

Do tell!

8-)
37) Message boards : Number crunching : AMD GPU Computation errors (Message 59999)
Posted 26 Sep 2013 by Tex1954
Post:

Although I do not use my gpu's here, I too upgraded to the 13.10 Beta and it is faster at Rainbow. It is however SLOWER at Collatz, so it sort of depends on your focus. For gaming yes it is faster, and for at least some projects it is faster too, but not all projects. I am a cruncher, not a gamer, so am not tied to any particular version or other.


All well and good... and exactly what does that have to do with the topic of computational errors?

Are you suggesting some other version works?

8-)
38) Message boards : Number crunching : AMD GPU Computation errors (Message 59996)
Posted 26 Sep 2013 by Tex1954
Post:
I have the same problem. Modified Fit is the only OpenCL that doesn't generate a compute error on my 6990... Doesn't matter what version I run in the 13.xx... and I ain't going backward because some serious improvements were made in 13.9/13.10 beta.

However, everything runs on my 7970's...

8-)
39) Message boards : Number crunching : After driver update all gpu wu's fail (Message 57717)
Posted 29 Mar 2013 by Tex1954
Post:
For Windoze 7/8 64b, we use the 13-1_vista_win7_win8_64_dd_c-cc_whql--opencl-12.4.exe that can be downloaded at http://www.DarkRyder.com at the bottom-right of the main page.

This driver is a fusion of the modules, the 12.4 OpenCL and the latest other stuff.

Darkryder and I are editors at the www.overclock.net BOINC Team

Works great for us!

8-)

PS: 13.3 beta crashes everything.... Also, use Driver Sweeper and Driver Fusion to clean the whole old install after a normal uninstall/reboot process.
40) Message boards : Number crunching : GPU Requirements (Message 57716)
Posted 29 Mar 2013 by Tex1954
Post:
Hi,
I use a HD6990 and a HD7950 But I am not able to use them both on the same MB using WIN7 or WIN8. When I have them on one machine the work units are done at the same speed as having only the HD7950 in on its own. I have removed all the AMD drivers and re installed them but the result is the same. with or without crossfire. Is there a trick to using different GPU cards on one machine?
Thanks for your reply.


I have the same setup, an HD6990 and HD7970 in the same box with windoze 7.

The only thing I had to do to make it work was make sure the 7970 was the primary card. The 7970 takes about 54s to finish and the 6990 takes about 115s at stock speeds.

8-)



©2022 Astroinformatics Group