Welcome to MilkyWay@home

Posts by XJR-Maniac

21) Message boards : Number crunching : Sudden mass of WU's finishing with Computation Error (Message 34147)
Posted 3 Dec 2009 by Profile XJR-Maniac
Post:
I've found an old backup with a short MW WU and it finishes successfully. Fitness is a real number, like the results posted by starfire.

My GPUs are NOT overclocked, at least not by me.

Here are the clock settings:

Core: 576MHz
Shader: 1242MHz
Memory: 1000MHz

Clocks read by RivaTuner

OK, I've tried the following:

Core: 400MHz
Shader: 862MHz
Memory: 800MHz

And, what a surprise, the new long WUs still finish invalid.
22) Message boards : Number crunching : Sudden mass of WU's finishing with Computation Error (Message 34142)
Posted 3 Dec 2009 by Profile XJR-Maniac
Post:
Fitness is the result of the calculation, in your case the result is -1.#QNAN
a quiet NaN (Not a Number) due to an error in the computation. For what can
cause these things see : http://en.wikipedia.org/wiki/QNaN
What the cause is in your case I haven't got a clue (hardware ?)


OK, that is a hint of a calculation error. But a hardware problem seems very unlikely, because two GPUs would have had to fail on exactly the same day. And it's not only my GPUs that fail. Are the GTX260 cards some kind of degraded material, e.g. GTX280 chips that didn't pass QA or something? As Paul said in his last post, it's more likely that there's something wrong with the application that only comes to light with the increased WU size.

Would it be possible to reduce the WU size to e.g. twice the size of the former ones instead of four times?
23) Message boards : Number crunching : Sudden mass of WU's finishing with Computation Error (Message 34132)
Posted 3 Dec 2009 by Profile XJR-Maniac
Post:
OK folks, this was the last driver test I've done for you ;-)))

Even with the 195.62 my results are invalid.

@David: This is what's driving me nuts. All looks fine, only the line with the fitness argument looks a bit weird. But where are the coders when you need them? There must be someone who knows what this fitness thingy is all about.

I'm out now. Have to get the cruncher up again for the other projects and then I'll have another Talisker for some sweet dreams or it will become a MW nightmare!
24) Message boards : Number crunching : Sudden mass of WU's finishing with Computation Error (Message 34128)
Posted 3 Dec 2009 by Profile XJR-Maniac
Post:
The following results were calculated using Gipsel's ATI app - fitness is displaying a real number here:

fitness: -3.200465856044408
Gipsel_GPU_CAL_0.20_x64: 0.20



OK, thanks Starfire, this is something completely different. On my results, this is:

fitness: -1.#QNAN000000000000000
stock_win32_gpu: 0.21 double


I'm using the stock application like others do, and it never caused me any trouble until Tuesday, the 1st of December.

OK, very last chance for tonight. I'll try the latest driver before I go to bed, even though the 191.07 is running fine for others.
CUL8R
25) Message boards : Number crunching : Sudden mass of WU's finishing with Computation Error (Message 34119)
Posted 2 Dec 2009 by Profile XJR-Maniac
Post:
FWIW, try using the 190.62 drivers. I've seen the 191.07 driver crash my 8800GT on Collatz. Since I reverted to 190.62, everything has been fine again.


Nope, it's not about drivers.

I have an emergency install of WinXP x86 on another partition so I booted up this one, installed driver 190.62 and BOINC 6.10.18 but still invalid.

And I did NOT install BOINC as a service because I remembered that there was an issue with CUDA errors related to service installation but it was only on Vista and Windows 7. This was due to security and services running in different sessions on those versions. That's why I'm still using XP.

Can someone please suspend network activity for one WU and look into the result file of the finished WU to see what this "fitness" thing is good for? If fitness on a valid WU is set to -1 too then it means nothing, I think.

This was my last shot for now, so I'm out of MW for today. It's a real shame, because MW is the project I bought these two GTX260 crunchers for.
26) Message boards : Number crunching : Sudden mass of WU's finishing with Computation Error (Message 34113)
Posted 2 Dec 2009 by Profile XJR-Maniac
Post:
OK, did a clean install of BOINC 6.10.18, rebooted machine, attached to Milkyway and crunched 1 WU. Result = invalid

Your turn, I'm runnin' out of ideas.
27) Message boards : Number crunching : Sudden mass of WU's finishing with Computation Error (Message 34112)
Posted 2 Dec 2009 by Profile XJR-Maniac
Post:
The only thing I can think of is that, now that the WU size has increased 'dramatically', the CUDA app is reaching the maximum time that a CUDA kernel is allowed to run and therefore crashes.

One could try dividing the 'domain size' to prevent that.
Anyway, that's just a guess without having a look at the CUDA code at all.

Might be something totally different. Only Anthony will know for sure.
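
The 'domain size' idea can be illustrated with a minimal sketch. It is written in Python rather than CUDA, and split_domain, integrate_in_chunks and the per-launch step limit are hypothetical names made up for illustration; nothing here is taken from the actual MilkyWay code. The assumption is that the work is one long sum over an index range, so it can be cut into pieces that each finish quickly.

```python
def split_domain(total_steps, max_steps_per_launch):
    """Yield (start, stop) pairs covering range(total_steps) in pieces
    of at most max_steps_per_launch steps each."""
    if total_steps < 0 or max_steps_per_launch <= 0:
        raise ValueError("total_steps must be >= 0 and the chunk size > 0")
    for start in range(0, total_steps, max_steps_per_launch):
        yield start, min(start + max_steps_per_launch, total_steps)


def integrate_in_chunks(f, total_steps, max_steps_per_launch):
    """Sum f(i) over the whole domain, one bounded chunk at a time."""
    total = 0.0
    for start, stop in split_domain(total_steps, max_steps_per_launch):
        # Each loop iteration stands in for one short kernel launch
        # (assumption: the real app could be restructured this way).
        total += sum(f(i) for i in range(start, stop))
    return total


if __name__ == "__main__":
    print(list(split_domain(10, 4)))
    print(integrate_in_chunks(lambda i: 1.0, 10, 4))
```

If the guess about the run-time limit is right, a WU four times as large would simply need four times as many chunks, while each single launch stays as short as before.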



But why is it not crashing on all machines? I found some GTX260 GPUs that are crunching fine.

I did another WU with only BOINC running. Disabled antivirus and all other running programs, but still invalid. Next I will try a clean install of BOINC 6.10.18.

Here's the content of the WU parameter file:


de_s222_3s_best_1p_01r_41
parameters [20]: 0.849478381012573 7.954547382437530 -5.675839216148361 151.090487200445807 12.349743231700344 4.081479068461466 2.303908188777022 3.287842683092169 -1.305747455655341 169.772682049654520 24.374000713727746 6.032829617788646 2.985739186116882 6.146475181027592 -11.078468810318160 180.234612303055911 17.070581989881532 0.558447994066062 0.000000000000000 1.000000000000000
metadata: i: 94


Here's the content of the result file:

de_s222_3s_best_1p_01r_41
parameters[20]: 0.84947838101257300000, 7.95454738243753030000, -5.67583921614836130000, 151.09048720044581000000, 12.34974323170034400000, 4.08147906846146570000, 2.30390818877702190000, 3.28784268309216900000, -1.30574745565534100000, 169.77268204965452000000, 24.37400071372774600000, 6.03282961778864560000, 2.98573918611688200000, 6.14647518102759170000, -11.07846881031816000000, 180.23461230305591000000, 17.07058198988153200000, 0.55844799406606205000, 0.00000000000000000000, 1.00000000000000000000
metadata: i: 94
fitness: -1.#QNAN000000000000000
stock_win32_gpu: 0.21 double


Maybe this is of some use to you. Does fitness: -1 mean the result is invalid?
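
On reading that fitness line: "-1.#QNAN" is not the number -1, it is how older Microsoft C runtimes print a quiet NaN. A rough sketch of how a result file could be checked for this in Python follows. It assumes the file layout pasted above (one "fitness:" line per file), and parse_fitness is a hypothetical helper written for this post, not part of BOINC or the MilkyWay application.

```python
import math


def parse_fitness(text):
    """Return the fitness value from the text of a result file as a float.

    Python's float() does not accept tokens such as '-1.#QNAN' or '1.#INF',
    so they are mapped to nan/inf by hand.
    """
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "fitness":
            token = value.strip().upper()
            if "#QNAN" in token or "#SNAN" in token or "#IND" in token:
                return math.nan
            if "#INF" in token:
                return -math.inf if token.startswith("-") else math.inf
            return float(token)
    raise ValueError("no fitness line found")


if __name__ == "__main__":
    sample = "metadata: i: 94\nfitness: -1.#QNAN000000000000000\n"
    value = parse_fitness(sample)
    print("usable fitness" if math.isfinite(value) else "fitness is not a finite number")
```

A finite value like the -3.200465856044408 from the ATI app passes the check; the NaN from the stock app does not. Whether the server-side validator applies the same test is an assumption on my side.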
28) Message boards : Number crunching : Sudden mass of WU's finishing with Computation Error (Message 34108)
Posted 2 Dec 2009 by Profile XJR-Maniac
Post:
OK, now that I've tried some more things like project reset, reboot, detach and reattach, I'm completely stumped about what to do next. It seems that I can no longer deliver valid results.

I browsed through the user stats, and it can be seen there that it's not a global problem. All of the top users seem to have no problems crunching the new WUs, and it's independent of BOINC client version or GPU type.

My GPUs are almost new. One has been crunching for two weeks, the other has been running since the beginning of November, and it's very unlikely that both cards would die in the same second. The fact that I'm not the only one having this problem makes it even less likely that all those GPUs are malfunctioning.

Other projects like SETI and Collatz are running fine on both machines, so I assume the GPUs are all fine.

So what's going on here? Due to excessive DB purging here at MW there is not much of a history, but I'm almost sure that it all began when the WU size was increased. And the only thing that changed on both machines was the antivirus pattern files.

Here are the machines again:

Machine 1: intel Q9650, WinXP SP2, GTX260 191.07, BOINC 6.10.17
Machine 2: intel Pentium D, Win2003 SP2, GTX260 191.07, BOINC 6.10.17

Please help, any suggestion will be appreciated.
29) Message boards : Number crunching : Sudden mass of WU's finishing with Computation Error (Message 34067)
Posted 1 Dec 2009 by Profile XJR-Maniac
Post:
I have had no errors/invalid tasks yet with my W7 64 bit, Q6600, NVIDIA GeForce GTX 260 (896MB) driver: 19107...


Hidden computers are always very helpful for diagnostics. How big is your cache? Maybe you're still crunching older WUs. When did you get them?
30) Message boards : Number crunching : Sudden mass of WU's finishing with Computation Error (Message 34065)
Posted 1 Dec 2009 by Profile XJR-Maniac
Post:
All new WUs seem to finish invalid on both of my cuda machines. Have no ATI here.

Machine 1: intel Q9650 WinXP SP2 GTX260 191.07 BOINC 6.10.17
Machine 2: intel Pentium D Win2003 SP2 GTX260 191.07 BOINC 6.10.17

Project degraded to NNT until this is fixed.

Oh boy, this reminds me of the song Flakes by Frank Zappa ;-)
31) Message boards : Number crunching : Sudden mass of WU's finishing with Computation Error (Message 33914)
Posted 29 Nov 2009 by Profile XJR-Maniac
Post:
Not a mass, but I had four of them on my GTX260 with 191.07 on Friday afternoon (UTC), too. OS is WinXP32 SP2 with BOINC 6.10.17. All errors were 0x1, invalid function.
32) Message boards : Number crunching : Donating to Milkyway@Home (Message 33343)
Posted 18 Nov 2009 by Profile XJR-Maniac
Post:
OK, here's another 25 bucks from good ol' Germany. Hope the new drives are on their way. I'm awaiting a brand new GTX260 to start the burn-in test when the WUs are flowing again ;-)
33) Message boards : Number crunching : application v1.21/v1.22 errors/memory leaks/crashes here (Message 2110)
Posted 8 Mar 2008 by Profile XJR-Maniac
Post:
W2K and NT4 issue Fixed (hopefully)! Well done!
I successfully finished a WU with 1.22 on my NT4 box, so I hope it will run on W2K, too.
34) Message boards : Number crunching : application v1.21/v1.22 errors/memory leaks/crashes here (Message 2050)
Posted 7 Mar 2008 by Profile XJR-Maniac
Post:
Hello,

I don't know if this helps you:
On one of my W2000 hosts the following message appears immediately after the start of the 1.21 app:

The procedure entry point LogonUserExA could not be located in the dynamic link library ADVAPI32.dll.




Dave compiled the 1.21 Windows apps, so I'll have him take a look into these.


Same problem here on Win2000 SP4 and WinNT Server SP6. WinXP SP1 works fine.

Yesterday, I suspended all other projects to check v1.19 and it worked fine on all machines, including WinNT 4.

Sometimes, I get a pop up window that locks the system so that no more work will be done until someone clicks OK!

ADVAPI32.dll Versions:

Windows 2000 SP4: 5.0.2195.7038
Windows NT 4 Terminal Server with Citrix Metaframe 1.8: 4.00 (File by Citrix)

Maybe this is of interest:

http://support.microsoft.com/kb/142606/EN-US/

BTW, what are all those error messages on your website about, complaining about "Undefined variables" or "non given properties"? Examples:

Message board posts:
Notice: Undefined variable: out in /export/share0/www/boinc/milkyway/html/inc/text_transform.inc on line 236

View a result:
Notice: Trying to get property of non-object in /export/share0/www/boinc/milkyway/html/inc/result.inc on line 79

View a user profile:
Fatal error: Call to undefined method stdClass::hasImagesAsLinks() in /export/share0/www/boinc/milkyway/html/inc/text_transform.inc on line 109




Could you let us know of any workunits that cause a Windows popup? We've added code in the new version of the application that should spit out what's causing the error, so if you can point us to the right workunits we should be able to diagnose and fix the problem.



The last WU that crashed with a popup was this one:

resultid=4795441 (gs_281_1204884506_90429_0)

Also, have a look at my last post, I added something I found at MSDN.
35) Message boards : Number crunching : application v1.21/v1.22 errors/memory leaks/crashes here (Message 2047)
Posted 7 Mar 2008 by Profile XJR-Maniac
Post:
Hello,

I don't know if this helps you:
On one of my W2000 hosts the following message appears immediately after the start of the 1.21 app:

The procedure entry point LogonUserExA could not be located in the dynamic link library ADVAPI32.dll.




Dave compiled the 1.21 Windows apps, so I'll have him take a look into these.


Same problem here on Win2000 SP4 and WinNT Server SP6. WinXP SP1 works fine.

Yesterday, I suspended all other projects to check v1.19 and it worked fine on all machines, including WinNT 4.

Sometimes, I get a pop up window that locks the system so that no more work will be done until someone clicks OK!

ADVAPI32.dll Versions:

Windows 2000 SP4: 5.0.2195.7038
Windows NT 4 Terminal Server with Citrix Metaframe 1.8: 4.00 (File by Citrix)

Maybe this is of interest:

MS Knowledge Base article 142606

or

MSDN article aa378189

The LogonUserExA function isn't available in Win2k or WinNT. It's only available on WinXP and Vista.

BTW, what are all those error messages on your website about, complaining about "Undefined variables" or "non given properties"? Examples:

Message board posts:
Notice: Undefined variable: out in /export/share0/www/boinc/milkyway/html/inc/text_transform.inc on line 236

View a result:
Notice: Trying to get property of non-object in /export/share0/www/boinc/milkyway/html/inc/result.inc on line 79

View a user profile:
Fatal error: Call to undefined method stdClass::hasImagesAsLinks() in /export/share0/www/boinc/milkyway/html/inc/text_transform.inc on line 109



