Welcome to MilkyWay@home

ATI generate some invalids and nVidia not

Message boards : Number crunching : ATI generate some invalids and nVidia not

mikey
Joined: 8 May 09
Posts: 3315
Credit: 519,943,276
RAC: 22,449
Message 57144 - Posted: 2 Feb 2013, 13:14:50 UTC - in response to Message 57140.  

The drivers I had in September (before the update to OpenCL) didn't have the OpenCL feature, so they won't work. That was the reason I updated the drivers from ATI.


On the SETI forums I've read that the current ATI drivers have some issues with OpenCL. Not sure if that applies here, but the latest version they recommended there is 12.8. Maybe you could try these.


Aha, nice to read that some issues with ATI are possible (likely).
Indeed the 12.8 drivers were good; I had them until a few days ago, when I updated to the latest in the hope of getting better results. But thanks anyway.


I am using the newer 13.1 drivers on an AMD GPU at Moo and it IS crunching units faster, not a lot faster but some. It seems to be using 20 fewer GPU seconds and 10 fewer CPU seconds too; a unit that normally takes closer to 60 minutes is now around 57 minutes. Both numbers are averages and NOT done with a calculator.
ID: 57144 · Rating: 0
JLConawayII
Joined: 27 Apr 10
Posts: 35
Credit: 90,828,595
RAC: 0
Message 57155 - Posted: 3 Feb 2013, 19:08:09 UTC

I have run a few hundred thousand WUs on my 5850, with very few failures. Most of those were due to system crashes or other issues. They're great cards for projects that actually support them.
ID: 57155 · Rating: 0
Bill Brumbaugh
Joined: 1 Aug 08
Posts: 4
Credit: 115,508,956
RAC: 0
Message 57177 - Posted: 6 Feb 2013, 4:30:27 UTC

Yes, there seems to be an issue with the code of the N-Body Simulation on ATI/AMD for me as well. It does not seem to be the GPU either, as all of my MilkyWay@Home v1.02 (opencl_nvidia) and MilkyWay@Home v1.00 tasks complete without errors. Here are two work unit results:
name: de_nbody_100K_1_30_13_1358941502_20715
application: MilkyWay@Home N-Body Simulation
created: 31 Jan 2013 | 15:35:16 UTC
minimum quorum: 2
initial replication: 3
max # of error/total/success tasks: 3, 9, 6
errors: Too many errors (may have bug)


Name: de_nbody_100K_1_30_13_1358941502_56031_0
Workunit: 304634204
Created: 2 Feb 2013 | 9:44:23 UTC
Sent: 2 Feb 2013 | 9:51:28 UTC
Received: 2 Feb 2013 | 15:38:31 UTC
Server state: Over
Outcome: Computation error
Client state: Compute error
Exit status: -1073741511 (0xffffffffc0000139) Unknown error number
Computer ID: 356381
Report deadline: 14 Feb 2013 | 9:51:28 UTC
Run time: 0.00
CPU time: 0.00
Validate state: Invalid
Credit: 0.00
Application version: MilkyWay@Home N-Body Simulation v1.06


Stderr output
<core_client_version>7.0.28</core_client_version>
<![CDATA[
<message>
- exit code -1073741511 (0xc0000139)
</message>
]]>
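
For reference, the two forms of the exit status shown above are the same number: the signed decimal value is the 32-bit Windows status code read as a negative integer. A minimal Python sketch of the conversion follows; the helper name `to_ntstatus_hex` is made up for illustration. On Windows, 0xC0000139 is STATUS_ENTRYPOINT_NOT_FOUND, which usually means the program could not find a function it needs in a DLL. Whether that is the actual cause on this host is an assumption, although it would fit a task that fails with 0.00 seconds of run time.

```python
def to_ntstatus_hex(exit_code: int) -> str:
    """Render a signed 32-bit process exit code as unsigned hex."""
    # Masking with 0xFFFFFFFF reinterprets the two's-complement value
    # as the unsigned 32-bit number used in Windows documentation.
    return f"0x{exit_code & 0xFFFFFFFF:08X}"


if __name__ == "__main__":
    # Exit status taken from the task details above.
    print(to_ntstatus_hex(-1073741511))  # 0xC0000139
```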





Task       Work unit  Computer  Sent                       Time reported              Status                 Run time (sec)  CPU time (sec)  Credit  Application
396318720  306279395  29021     5 Feb 2013 | 17:18:45 UTC  5 Feb 2013 | 23:47:01 UTC  Error while computing  151.15          0.00            ---     MilkyWay@Home N-Body Simulation v1.06
396318719  306279394  29021     5 Feb 2013 | 17:18:45 UTC  5 Feb 2013 | 23:47:01 UTC  Error while computing  154.50          0.00            ---     MilkyWay@Home N-Body Simulation v1.06
396318718  306279393  29021     5 Feb 2013 | 17:18:45 UTC  5 Feb 2013 | 23:47:01 UTC  Error while computing  153.25          0.00            ---     MilkyWay@Home N-Body Simulation v1.06
396318717  306279392  29021     5 Feb 2013 | 17:18:45 UTC  5 Feb 2013 | 23:47:01 UTC  Error while computing  149.31          0.00            ---     MilkyWay@Home N-Body Simulation v1.06

My video card on this machine is a factory-overclocked Gigabyte HD6850.
My CPU is an AMD X4 640 and it is overclocked a bit, but I am sure that is not the issue. This computer crunches all my other projects fine and I 5 others with AMD cards and rarely ever have any units error out. The other thing that makes me think it is a bug is that all the work units with errors are almost exactly 150 seconds long. I didn't really think much about the errors until I read this post, so I decided to give my feedback. I guess I will just uncheck the N-Body app until they get it sorted out.
ID: 57177 · Rating: 0
Bill Brumbaugh
Joined: 1 Aug 08
Posts: 4
Credit: 115,508,956
RAC: 0
Message 57178 - Posted: 6 Feb 2013, 4:34:56 UTC - in response to Message 57177.  

Typo from the end. Just to let everyone know I think it is an ATI issue:
"This computer crunches all my other projects fine and I 5 others with AMD cards" Should read "This computer crunches all my other projects fine and I have 5 others with NVIDIA cards that do not have errors"
ID: 57178 · Rating: 0
Bill Brumbaugh
Joined: 1 Aug 08
Posts: 4
Credit: 115,508,956
RAC: 0
Message 57179 - Posted: 6 Feb 2013, 4:54:56 UTC - in response to Message 57178.  

Just one more post and then I will wait for a reply. Looking over some of the failed work units, I discovered that it does not seem to be an ATI or AMD issue, as I found a bunch of others with Nvidia cards and Intel CPUs that are having errors. It definitely seems to be just that application. I count 4 others on this work unit that errored out.


http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=299078972

Workunit 299078972

name: de_nbody_orphan_real_CHISQ_1356215205_558387
application: MilkyWay@Home N-Body Simulation
created: 21 Jan 2013 | 12:35:11 UTC
minimum quorum: 2
initial replication: 3
max # of error/total/success tasks: 3, 9, 6
errors: Too many errors (may have bug)




Task       Computer  Sent                        Time reported or deadline   Status                     Run time (sec)  CPU time (sec)  Credit  Application
386004668  459250    21 Jan 2013 | 12:44:45 UTC  21 Jan 2013 | 20:40:33 UTC  Completed, can't validate  15.08           9.41            0.00    MilkyWay@Home N-Body Simulation v1.04
386239118  495021    21 Jan 2013 | 20:50:20 UTC  24 Jan 2013 | 12:38:12 UTC  Completed, can't validate  7.53            5.52            0.00    MilkyWay@Home N-Body Simulation v1.04
387885762  362466    24 Jan 2013 | 12:38:25 UTC  5 Feb 2013 | 12:38:25 UTC   Timed out - no response    0.00            0.00            ---     MilkyWay@Home N-Body Simulation v1.04
396167838  29021     5 Feb 2013 | 12:45:39 UTC   5 Feb 2013 | 17:18:45 UTC   Error while computing      150.13          0.00            ---     MilkyWay@Home N-Body Simulation v1.06
396321853  496301    5 Feb 2013 | 17:23:08 UTC   5 Feb 2013 | 17:25:41 UTC   Error while computing      0.00            0.00            ---     MilkyWay@Home N-Body Simulation v1.06
396325924  351074    5 Feb 2013 | 17:29:53 UTC   6 Feb 2013 | 2:14:19 UTC    Error while computing      0.00            0.00            ---     MilkyWay@Home N-Body Simulation v1.06
396611703  479231    6 Feb 2013 | 2:22:03 UTC    6 Feb 2013 | 4:15:12 UTC    Error while computing      0.00            0.00            ---     MilkyWay@Home N-Body Simulation v1.06
ID: 57179 · Rating: 0
Link
Joined: 19 Jul 10
Posts: 578
Credit: 18,845,239
RAC: 856
Message 57183 - Posted: 6 Feb 2013, 9:36:09 UTC - in response to Message 57179.  

N-Body runs on the CPU, not the GPU, so it has nothing to do with ATI or nVidia. Issues with those WUs should be posted in the corresponding thread in the news forum.
ID: 57183 · Rating: 0
homeboy
Joined: 22 Feb 13
Posts: 7
Credit: 131,958,276
RAC: 0
Message 57328 - Posted: 23 Feb 2013, 16:04:32 UTC - in response to Message 57183.  

Hi all,


I've been reading this thread anxiously, hoping to find the reason why my new 7970 has a success rate of roughly 15.9% on GPU-crunched work units. Total WUs for the GPU: 395, of which 63 are valid, 197 invalid, and 135 pending. Settings are no more than 6 CPU cores used, 16 GB RAM. I was using a GT 520 before and everything was fine. I know it's CUDA, but I wanted what was considered the best cruncher around and am very disappointed by the failure rate. I'm using the latest official drivers from AMD. I just started MilkyWay to see if the invalids were app-specific, because most happen with Einstein. Unfortunately I can't seem to see all my MilkyWay results through BOINC, so I cannot tell whether the WUs are valid or not. I have a total score, but I've also crunched WUs with the CPU, so I can't conclude that my total is solely from GPU WUs. Can someone help me?
ID: 57328 · Rating: 0
Tom*
Joined: 4 Oct 11
Posts: 38
Credit: 309,729,457
RAC: 0
Message 57329 - Posted: 23 Feb 2013, 18:44:59 UTC - in response to Message 57328.  
Last modified: 23 Feb 2013, 18:51:30 UTC

State: All (131) | In progress (25) | Pending (64) | Valid (37) | Invalid (1) | Error (4)

Looks good to me so far

All new and returning users for this project need to prove themselves.
At least that's how it works for me.

While the pending tasks all show "validation inconclusive", that is 100% normal for this
project. Once MilkyWay has validated some more (not sure how many more) and you still keep the real invalids down to 0 or 1, I believe your results will then validate on their own, since MilkyWay is one of the few projects
(that I participate in) that have a minimum quorum of 1 (not sure how they do this).

Have patience and watch your real INVALIDS.

As far as Einstein goes: if you check Programs and Features and you have Intel's
OpenCL installed, try removing it as a test, as it sometimes interferes with
OpenCL on other devices. 7.0.52 is supposed to detect this, but I'm not sure if it fixes it.
ID: 57329 · Rating: 0
homeboy
Joined: 22 Feb 13
Posts: 7
Credit: 131,958,276
RAC: 0
Message 57331 - Posted: 23 Feb 2013, 21:56:33 UTC - in response to Message 57329.  

Where in BOINC can I access the status of a particular WU? It's easy in Einstein and SETI, but in MilkyWay... the only tab I have is Home Page.

ID: 57331 · Rating: 0
Tom*
Joined: 4 Oct 11
Posts: 38
Credit: 309,729,457
RAC: 0
Message 57332 - Posted: 23 Feb 2013, 22:57:45 UTC - in response to Message 57331.  

Homeboy

Homepage --> Returning participants
•Your account - view stats, modify preferences
Tasks

ID: 57332 · Rating: 0
homeboy
Joined: 22 Feb 13
Posts: 7
Credit: 131,958,276
RAC: 0
Message 57333 - Posted: 24 Feb 2013, 2:33:33 UTC - in response to Message 57332.  

Thanks... it's unfortunate that we can't get those stats directly through tabs in the BOINC app.


Here goes. Does this look like normal behavior?


All (254) | In progress (49) | Pending (45) | Valid (152) | Invalid (1) | Error (7)
Application: All (254) | MilkyWay@Home (253) | MilkyWay@Home N-Body Simulation (1) | Milkyway@Home Separation (0)
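
A status line like the one above can be turned into a failure rate with a few lines of Python. This is only a sketch: the function names are made up for illustration, and counting both Invalid and Error as failures while ignoring tasks that are still in progress or pending is an assumption about what "normal behavior" should be measured against.

```python
import re


def parse_state_line(line: str) -> dict:
    """Turn 'Valid (152) | Invalid (1) | ...' into {'Valid': 152, ...}."""
    pairs = re.findall(r"([A-Za-z ]+?)\s*\((\d+)\)", line)
    return {name.strip(): int(count) for name, count in pairs}


def failure_rate(counts: dict) -> float:
    """Share of finished tasks that ended up Invalid or Error."""
    finished = counts["Valid"] + counts["Invalid"] + counts["Error"]
    return (counts["Invalid"] + counts["Error"]) / finished


if __name__ == "__main__":
    line = ("All (254) | In progress (49) | Pending (45) | "
            "Valid (152) | Invalid (1) | Error (7)")
    counts = parse_state_line(line)
    # 8 failed tasks out of 160 finished ones
    print(f"{failure_rate(counts):.1%}")  # 5.0%
```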

ID: 57333 · Rating: 0
Tom*
Joined: 4 Oct 11
Posts: 38
Credit: 309,729,457
RAC: 0
Message 57334 - Posted: 24 Feb 2013, 3:21:49 UTC - in response to Message 57333.  

No Invalids since that first one

The errors come in groups of three; possibly something is resetting
your 79xx ATI. Since I run Nvidia, I cannot help with driver versions
and which are best or worst for MilkyWay. Someone else who has been through
Catalyst versions should jump in now.

Are you running more than one GPU job at a time?

Good luck, it seems to be running well otherwise.

ID: 57334 · Rating: 0
Len LE/GE
Joined: 8 Feb 08
Posts: 261
Credit: 104,050,322
RAC: 0
Message 57335 - Posted: 24 Feb 2013, 3:51:57 UTC - in response to Message 57333.  

Your invalid WU is only valid up to 7 or 8 digits and your error WUs are showing lots of "NAN" in the results.
What type of 7970 is it exactly?
"Clock frequency: 1000 Mhz"
Is it factory overclocked or done by yourself?
Try setting it back to default clock and see if you still get errors.
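
To illustrate why a result that is "only valid up to 7 or 8 digits" or that contains "NAN" gets rejected, here is a minimal Python sketch of a result comparison. It is not the project's validator: the function name `results_agree`, the sample values, and the default of 12 significant digits are all assumptions made for illustration.

```python
import math


def results_agree(a: float, b: float, digits: int = 12) -> bool:
    """True if a and b match to roughly `digits` significant digits."""
    # NaN never compares close to anything, so a result containing NaN
    # can never match a healthy result from another host.
    if math.isnan(a) or math.isnan(b):
        return False
    return math.isclose(a, b, rel_tol=10.0 ** -digits)


if __name__ == "__main__":
    good = -2.98765432101
    drifted = -2.98765432999  # differs from the 10th significant digit on
    print(results_agree(good, drifted, digits=8))   # True
    print(results_agree(good, drifted, digits=12))  # False
    print(results_agree(good, float("nan")))        # False
```

A card that is slightly unstable at its clock speed can produce exactly this kind of small drift or outright NaN values, which is why resetting the clock is a cheap first test.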
ID: 57335 · Rating: 0
homeboy
Joined: 22 Feb 13
Posts: 7
Credit: 131,958,276
RAC: 0
Message 57336 - Posted: 24 Feb 2013, 4:48:12 UTC - in response to Message 57335.  

It's an XFX at 1000 MHz, not overclocked by me or by hardware. It's not the GHz Edition at 1050 MHz. Pretty stable, no artifacts in benchmarks.
ID: 57336 · Rating: 0
homeboy
Joined: 22 Feb 13
Posts: 7
Credit: 131,958,276
RAC: 0
Message 57337 - Posted: 24 Feb 2013, 5:07:21 UTC - in response to Message 57336.  

I've downclocked from the manufacturer's 1000 MHz to the original 925 MHz. I'll try it overnight and see the results. Thanks for the tips. I'll keep you posted on the results.
ID: 57337 · Rating: 0
mikey
Joined: 8 May 09
Posts: 3315
Credit: 519,943,276
RAC: 22,449
Message 57338 - Posted: 24 Feb 2013, 12:11:22 UTC - in response to Message 57337.  

I've downclocked from the manufacturer's 1000 MHz to the original 925 MHz. I'll try it overnight and see the results. Thanks for the tips. I'll keep you posted on the results.


One other thing: are you using all of your CPU cores to crunch with, or are you leaving one or more free? Also, how many GPU units are you running at one time?
ID: 57338 · Rating: 0
homeboy
Joined: 22 Feb 13
Posts: 7
Credit: 131,958,276
RAC: 0
Message 57346 - Posted: 25 Feb 2013, 1:35:57 UTC - in response to Message 57338.  

Well, I only have 2 invalids and 10 compute errors on over 1200 units since Friday. That seems to be within a tolerable error rate. As I stated before, I only use 6 cores out of 8. My error rate is way too high with Einstein (85% plus errors), so I think I'll stick with MilkyWay. Got to say that it's really crunching away with this baby. ;-)
ID: 57346 · Rating: 0
mikey
Joined: 8 May 09
Posts: 3315
Credit: 519,943,276
RAC: 22,449
Message 57348 - Posted: 25 Feb 2013, 12:09:02 UTC - in response to Message 57346.  

Well, I only have 2 invalids and 10 compute errors on over 1200 units since Friday. That seems to be within a tolerable error rate. As I stated before, I only use 6 cores out of 8. My error rate is way too high with Einstein (85% plus errors), so I think I'll stick with MilkyWay. Got to say that it's really crunching away with this baby. ;-)


That is GOOD news!!!

But how many MW units do you crunch at the same time with that GPU? I know it is a 3 GB card; do you crunch multiple units at once? I am thinking of getting one and am wondering how you are doing it. And YES, I too like the XFX ones; the warranty is just too good to pass up! Although, 'knock on wood', I have NEVER had a GPU totally die on me. I do have to spray the Power Lube in mine when the gunk builds up and the fans stop spinning, but they are usually back up and running in a day or so.
ID: 57348 · Rating: 0
homeboy
Joined: 22 Feb 13
Posts: 7
Credit: 131,958,276
RAC: 0
Message 57352 - Posted: 25 Feb 2013, 19:02:14 UTC - in response to Message 57348.  

Just one, but it takes 48 seconds per unit, so I don't think it's necessary to overdo it. BTW, can it crunch more than one at a time? I don't have a CrossFire setup, so one GPU one unit, 2 GPUs 2 units... although it would be costly and pretty hot in that case. The CPU is crunching 6 others with a mix of Einstein and SETI (but not the GPU projects). I've downclocked the GPU to original factory specs and it's been fine so far. The RAC went from 0 to 47k in 3 days... guess it is working.

;-)
ID: 57352 · Rating: 0
mikey
Joined: 8 May 09
Posts: 3315
Credit: 519,943,276
RAC: 22,449
Message 57362 - Posted: 26 Feb 2013, 12:38:49 UTC - in response to Message 57352.  
Last modified: 26 Feb 2013, 12:39:40 UTC

Just one, but it takes 48 seconds per unit, so I don't think it's necessary to overdo it. BTW, can it crunch more than one at a time? I don't have a CrossFire setup, so one GPU one unit, 2 GPUs 2 units... although it would be costly and pretty hot in that case. The CPU is crunching 6 others with a mix of Einstein and SETI (but not the GPU projects). I've downclocked the GPU to original factory specs and it's been fine so far. The RAC went from 0 to 47k in 3 days... guess it is working.

;-)


YES, it IS possible, but no, it doesn't always help. I tried it on my 5870s and it did NOT help. But I only have 1 GB of RAM and each unit takes a couple of minutes or so to finish; when I ran 2 units at once it jumped to almost 9 minutes per unit.

To run more than one unit in Windows, use Notepad to create this file:

<app_config>
    <app>
        <name>milkyway</name>
        <gpu_versions>
            <gpu_usage>0.5</gpu_usage>
            <cpu_usage>0.05</cpu_usage>
        </gpu_versions>
    </app>
</app_config>

Save the text file as app_config.xml in your Milkyway directory
(….\BOINC\projects\milkyway.cs.rpi.edu_milkyway)

The line <gpu_usage>0.5</gpu_usage> tells BOINC that each task uses half a GPU, so 2 units run at once; to run 3 you would use 0.33, to run 4 units you would use 0.25, etc. You also MUST be using a version of BOINC at least as new as 7.0.40 for this type of file to work. You can always get the latest version of BOINC here:
http://boinc.berkeley.edu/dl/?C=M;O=D
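
The relation between the number of simultaneous units and the gpu_usage value can also be sketched in Python. This is an illustration only: the helper name `build_app_config` is made up, and rounding the fraction down to two decimals (so that the shares of N tasks never add up to more than one GPU) is a design choice of this sketch, not something BOINC requires.

```python
import math
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, tasks_per_gpu: int, cpu_usage: float) -> str:
    """Return app_config.xml text that runs `tasks_per_gpu` tasks per GPU."""
    if not 1 <= tasks_per_gpu <= 100:
        raise ValueError("tasks_per_gpu must be between 1 and 100")
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu = ET.SubElement(app, "gpu_versions")
    # Each task claims 1/N of a GPU; round down so N shares still fit.
    share = math.floor(100 / tasks_per_gpu) / 100
    ET.SubElement(gpu, "gpu_usage").text = f"{share:.2f}"
    ET.SubElement(gpu, "cpu_usage").text = f"{cpu_usage:.2f}"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Two units at once, matching the hand-written file above.
    print(build_app_config("milkyway", tasks_per_gpu=2, cpu_usage=0.05))
```

The printed text would still need to be saved as app_config.xml in the project directory, exactly as with the Notepad approach.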
ID: 57362 · Rating: 0

©2024 Astroinformatics Group