Welcome to MilkyWay@home

Compute Errors

Message boards : Number crunching : Compute Errors

Profile [AF>HFR>RR] ThierryH
Joined: 2 Jan 08
Posts: 23
Credit: 495,882,464
RAC: 0
Message 24593 - Posted: 8 Jun 2009, 19:09:30 UTC - in response to Message 24268.  

Looks like the searches are stopped, we'll not do 3 stream runs until the ATI code is fixed :)


Thanks Travis. It was a very good decision.
Cluster Physik gave us fixed code 3 days ago, so perhaps it's time to restart the 3 stream runs. These WUs take longer to calculate than the others, so they could give more work for everyone.

Thank you,
Thierry.
ID: 24593 · Rating: 0
Profile Yankton
Joined: 28 Sep 08
Posts: 1
Credit: 18,371,048
RAC: 71,619
Message 24725 - Posted: 9 Jun 2009, 21:08:38 UTC

I'm getting a lot of SIGSEGV errors on 2s-4 and 2s-6.

Is this a related problem?
ID: 24725 · Rating: 0
Profile Crunch3r
Volunteer developer
Joined: 17 Feb 08
Posts: 363
Credit: 258,227,990
RAC: 0
Message 24726 - Posted: 9 Jun 2009, 21:18:25 UTC - in response to Message 24725.  
Last modified: 9 Jun 2009, 21:19:23 UTC

I'm getting a lot of SIGSEGV errors on 2s-4 and 2s-6.

Is this a related problem?


No. It's Ubuntu causing it. Get a proper Linux distribution or downgrade to 8.xx.

ID: 24726 · Rating: 0
Profile verstapp
Joined: 26 Jan 09
Posts: 589
Credit: 497,834,261
RAC: 0
Message 24728 - Posted: 9 Jun 2009, 21:28:15 UTC

_3s_ WUs crunching madly with Cluster's v.0.19f.
Cheers,

PeterV

.
ID: 24728 · Rating: 0
Profile Phil
Joined: 13 Feb 08
Posts: 1124
Credit: 46,740
RAC: 0
Message 24731 - Posted: 9 Jun 2009, 21:33:15 UTC - in response to Message 24726.  
Last modified: 9 Jun 2009, 21:34:36 UTC

I'm getting a lot of SIGSEGV errors on 2s-4 and 2s-6.

Is this a related problem?


No. It's Ubuntu causing it. Get a proper Linux distribution or downgrade to 8.xx.

Ha, that was the conclusion I came to as well ;-)
It runs einstein okay but practically nothing else.
ID: 24731 · Rating: 0
Profile KWSN imcrazynow
Joined: 22 Nov 08
Posts: 136
Credit: 319,414,799
RAC: 0
Message 24795 - Posted: 10 Jun 2009, 11:58:38 UTC
Last modified: 10 Jun 2009, 12:18:24 UTC

Something caused both of my GPU crunchers to freeze up around 5:30 UTC. I say that time because that was when the last result remaining on my systems was sent out to me. I have no idea what it was. That was 12:30 AM my time. I just got up this morning and found it. I don't see any errors, but insta-purge probably took the units away before I could see them. Might be something worth looking into.

<edit>
It was only Milkyway that froze. I'm also running Prime Grid on both systems, and it was still running.
<edit 2> It just froze up on one machine again. It was running ps_sgr_208_3s_6, ps_sgr_210_3s_5 and ps_sgr_235_2s_6 when it froze. This is definitely worth looking into.
<edit 3> It was either the 210_3 or the 235_2 that locked it up.

4870 GPU
4870 GPU
ID: 24795 · Rating: 0
John Vickers
Volunteer moderator
Project developer
Project scientist
Joined: 11 May 09
Posts: 30
Credit: 81,093
RAC: 0
Message 24912 - Posted: 11 Jun 2009, 2:10:13 UTC
Last modified: 11 Jun 2009, 2:11:07 UTC

KWSN,

Are you using the code recently released by Cluster Physik ( http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=886#24282 ) that included a fix for 3 stream runs on ATI GPUs? This is most likely the issue if it's the *_3s_* runs crashing and only on GPU.

Thanks,
John Vickers
ID: 24912 · Rating: 0
Profile Lord Tedric
Joined: 9 Nov 07
Posts: 151
Credit: 8,391,608
RAC: 0
Message 24924 - Posted: 11 Jun 2009, 7:00:06 UTC

I'm not so much getting a system freeze as a GPU reset! These seem to be happening with the 3s WUs, specifically the '3s 6'. I have seen this happen myself. I haven't noticed it on '3s 5', but will be watching. It's difficult to catch, as it has only happened on three separate occasions in the last 36 hours.
ID: 24924 · Rating: 0
Profile KWSN imcrazynow
Joined: 22 Nov 08
Posts: 136
Credit: 319,414,799
RAC: 0
Message 24950 - Posted: 11 Jun 2009, 12:14:51 UTC

Yes, I am using the latest version 0.19f. I was able to play around with it a little more last night. It seems that absolutely everything is running high priority for some reason. I made no changes to my BOINC preferences either. I also found that if I suspend my other project (Prime Grid) everything starts back up. That will, however, have a negative impact on PG. None of this started happening until a recent Windows update. I'm very much open to suggestions on how to correct it. I thought I might try to reinstall 0.19f as soon as I get a chance, in case something got messed up with the update. If that doesn't work, maybe reinstalling BOINC. The two systems are running 6.4.7.

4870 GPU
4870 GPU
ID: 24950 · Rating: 0
Profile [KWSN]John Galt 007
Joined: 12 Dec 08
Posts: 56
Credit: 269,889,439
RAC: 0
Message 24971 - Posted: 11 Jun 2009, 13:56:45 UTC - in response to Message 24950.  

Yes, I am using the latest version 0.19f. I was able to play around with it a little more last night. It seems that absolutely everything is running high priority for some reason. I made no changes to my BOINC preferences either. I also found that if I suspend my other project (Prime Grid) everything starts back up. That will, however, have a negative impact on PG. None of this started happening until a recent Windows update. I'm very much open to suggestions on how to correct it. I thought I might try to reinstall 0.19f as soon as I get a chance, in case something got messed up with the update. If that doesn't work, maybe reinstalling BOINC. The two systems are running 6.4.7.


I have seen that with my 4850 in my i7. It seems like the WU hangs at some point, either from the CPU getting overloaded (all MW tasks 'running' but only 3 crunching) or BOINC trying to do task switching.
ID: 24971 · Rating: 0
JAMC
Joined: 9 Sep 08
Posts: 96
Credit: 336,443,946
RAC: 0
Message 24976 - Posted: 11 Jun 2009, 14:34:29 UTC

I am getting the 0.19f GPU lock-up as well, on 2 different PCs this morning. The MW GPU app locks up but the SETI AP CPU app keeps going without problems. Stopping and restarting BOINC gets the GPU app going again. I am also seeing GPU WUs running in High Priority.
ID: 24976 · Rating: 0
Profile KWSN imcrazynow
Joined: 22 Nov 08
Posts: 136
Credit: 319,414,799
RAC: 0
Message 25004 - Posted: 11 Jun 2009, 17:24:11 UTC
Last modified: 11 Jun 2009, 17:26:07 UTC

Look at this one. Note the GPU time and the wall clock time.
<EDIT>
I would definitely call that a hang!

Task ID 77725727
Name ps_sgr_235_2s_6_1603847_1244722735_0
Workunit 76446733
Created 11 Jun 2009 12:18:59 UTC
Sent 11 Jun 2009 12:20:07 UTC
Received 11 Jun 2009 15:50:10 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x0)
Computer ID 39176
Report deadline 14 Jun 2009 12:20:07 UTC
CPU time 11878.53
stderr out <core_client_version>6.4.7</core_client_version>
<![CDATA[
<stderr_txt>
Running Milkyway@home ATI GPU application version 0.19f by Gipsel
CPU: Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz (4 cores/threads) 3.54598 GHz (227ms)

CAL Runtime: 1.3.145
Found 1 CAL device

Device 0: ATI Radeon HD 4800 (RV770) 1024 MB local RAM (remote 28 MB cached + 1024 MB uncached)
GPU core clock: 750 MHz, memory clock: 900 MHz
800 shader units organized in 10 SIMDs with 16 VLIW units (5-issue), wavefront size 64 threads
supporting double precision

3 WUs already running on GPU 0
No free GPU! Waiting ... 93.7969 seconds.
Starting WU on GPU 0

main integral, 160 iterations
predicted runtime per iteration is 145 ms (33.3333 ms are allowed), dividing each iteration in 5 parts
borders of the domains at 0 320 640 960 1280 1600
Calculated about 3.70012e+012 floatingpoint ops on GPU, 6.34181e+007 on FPU. Approximate GPU time 11878.5 seconds.

probability calculation (stars)
Calculated about 1.20373e+009 floatingpoint ops on FPU.

WU completed.
CPU time: 1.35938 seconds, GPU time: 11878.5 seconds, wall clock time: 12032.2 seconds, CPU frequency: 3.546 GHz

</stderr_txt>
]]>

Validate state Valid
Claimed credit 82.9045335151938
Granted credit 27.75994
application version 0.19

4870 GPU
4870 GPU
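
The numbers in the stderr output above can be checked with a little arithmetic. The sketch below is only a reading of the log, not the application's actual code: it assumes the app picks the smallest number of parts that keeps each part under the allowed time, and that the domain borders are evenly spaced. Both assumptions happen to reproduce the "5 parts" and the borders shown in the log. It then compares the predicted run time with the reported GPU time.

```python
import math

# Figures copied from the stderr output above.
iterations = 160
predicted_ms_per_iteration = 145.0
allowed_ms = 33.3333
integral_size = 1600            # last border reported in the log
reported_gpu_seconds = 11878.5

# Assumption: each iteration is split into the smallest number of parts
# that keeps every part under the allowed time.
parts = math.ceil(predicted_ms_per_iteration / allowed_ms)

# Assumption: the domain borders are evenly spaced.
step = integral_size // parts
borders = list(range(0, integral_size + 1, step))

# Run time the app itself predicted for the main integral, in seconds.
expected_seconds = iterations * predicted_ms_per_iteration / 1000.0

# How much longer the task actually took than predicted.
slowdown = reported_gpu_seconds / expected_seconds

print("parts per iteration:", parts)
print("borders:", borders)
print("expected seconds:", round(expected_seconds, 1))
print("slowdown factor:", round(slowdown))
```

Under these assumptions the main integral was predicted to need well under a minute, while the reported GPU time is several hundred times longer, which supports calling it a hang rather than a slow work unit.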
ID: 25004 · Rating: 0
Profile Kevint
Joined: 22 Nov 07
Posts: 285
Credit: 1,076,786,368
RAC: 0
Message 25009 - Posted: 11 Jun 2009, 17:59:36 UTC
Last modified: 11 Jun 2009, 18:01:01 UTC

Imcrazy,

This happens quite a bit on hosts that are shared with other projects.

It seems that the shorter the other projects' WUs, the more MW hangs.

I believe it has something to do with the way BOINC handles debt.

I think you mentioned you are also crunching Prime Grid and Aqua. The shorter WUs will suspend your MW WUs until your short term and long term debt is cleared.

The new multi-threaded Aqua can play havoc with the ATI app, since an Aqua WU now wants to use multiple CPUs and will occasionally put MW in suspend mode.



To test this, when you see MW hung up, just suspend the other projects. MW should take off and start crunching again without you having to reset or reboot your box.
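
For anyone who prefers to run this test from a script instead of the BOINC Manager, here is a minimal sketch. It assumes the `boinccmd` command-line tool that ships with BOINC is installed and on the PATH, and it uses its `--project URL suspend` and `--project URL resume` operations. The project URL below is a placeholder assumption; use the URL your own client lists for the project. The helper name `project_command` is made up for this example.

```python
import subprocess


def project_command(project_url, operation):
    """Build a boinccmd call that applies one operation to one project."""
    if operation not in ("suspend", "resume"):
        raise ValueError("operation must be 'suspend' or 'resume'")
    return ["boinccmd", "--project", project_url, operation]


# Placeholder URL (an assumption); copy the real one from your client.
OTHER_PROJECT_URL = "http://www.primegrid.com/"

if __name__ == "__main__":
    cmd = project_command(OTHER_PROJECT_URL, "suspend")
    print(" ".join(cmd))
    # Uncomment to really run it on a machine where boinccmd is available:
    # subprocess.run(cmd, check=True)
```

If MW starts crunching again right after the suspend, that points at scheduling between projects rather than at the GPU application. Run the same call with "resume" afterwards to put the other project back to work.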
ID: 25009 · Rating: 0
Profile [KWSN]John Galt 007
Joined: 12 Dec 08
Posts: 56
Credit: 269,889,439
RAC: 0
Message 25011 - Posted: 11 Jun 2009, 18:12:26 UTC - in response to Message 25009.  

Imcrazy,

This happens quite a bit on hosts that are shared with other projects.

It seems that the shorter the other projects' WUs, the more MW hangs.

I believe it has something to do with the way BOINC handles debt.

I think you mentioned you are also crunching Prime Grid and Aqua. The shorter WUs will suspend your MW WUs until your short term and long term debt is cleared.

The new multi-threaded Aqua can play havoc with the ATI app, since an Aqua WU now wants to use multiple CPUs and will occasionally put MW in suspend mode.



To test this, when you see MW hung up, just suspend the other projects. MW should take off and start crunching again without you having to reset or reboot your box.


Thanks, Kevin...a good explanation, since I am running PG on my i7 with the GPU doing MW, and I see the PSP sieve WUs jumping into EDF mode, even though the due date is 7 days off and I have a .5 day cache.
ID: 25011 · Rating: 0
Profile Spankinmonkee [TopGun] Divisio...
Joined: 22 Mar 08
Posts: 38
Credit: 48,762,331
RAC: 0
Message 25031 - Posted: 11 Jun 2009, 18:48:30 UTC
Last modified: 11 Jun 2009, 18:48:56 UTC

I've had 3 systems hang up also...and I'm not running any other project.

Do you think the shorter WU that Travis took care of was causing this?
ID: 25031 · Rating: 0
Profile KWSN imcrazynow
Joined: 22 Nov 08
Posts: 136
Credit: 319,414,799
RAC: 0
Message 25054 - Posted: 11 Jun 2009, 20:11:10 UTC - in response to Message 25009.  

The only other project that I have running on that system is Prime Grid. After the Windows update it seems like everything, PG and MW, went into high priority. It did that on both quads. Last night I reset my debts to 0 on one system using BOINC DV. I'll try that on the other (the one this long-winded WU came from) tonight.

4870 GPU
4870 GPU
ID: 25054 · Rating: 0
Profile KWSN imcrazynow
Joined: 22 Nov 08
Posts: 136
Credit: 319,414,799
RAC: 0
Message 25853 - Posted: 18 Jun 2009, 0:12:51 UTC - in response to Message 25009.  

Thanks for the info! I did notice that I could suspend the Prime Grid PSP Sieve units and MW would immediately start back up and run for an extended period. Now it's happening with Prime Grid PSP LLR units (Prime Grid Challenge). Those are not exactly short-running tasks, at 30+ hours each. I'm only running 3 at a time to leave 1 core free for MW. I did start a new thread, "Hanging Work Units". Please look there for any new developments or suggestions.

4870 GPU
4870 GPU
ID: 25853 · Rating: 0
PeteS
Joined: 19 Mar 09
Posts: 27
Credit: 117,670,452
RAC: 0
Message 26680 - Posted: 29 Jun 2009, 6:30:11 UTC

Can you people using a 3850 tell me what settings you are using in app_info.xml? I have tried many different settings, but still get VPU Recovery events.

I am running on Win7 64bit, ATI 0.19f and BOINC 6.6.36

Everything is fine and the WUs are processed at peak efficiency IF I don't do anything on the computer. But if it is used normally (watching videos, browsing the web with Firefox etc.) then I constantly get a blank screen, a VPU recovery and a jammed WU that I have to either kill with Task Manager or clear by restarting BOINC. I don't seem to have this problem on my 48xx series cards, only on the 3850.

I get somewhat better functionality (still problems but less) with f60 w1.7 n1, but then GPU utilization is only 50-60%.
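
As a rough illustration of what a setting string like this might mean, the sketch below splits "f60 w1.7 n1" into its letter and number pairs. The interpretation of `f` is a guess and not confirmed anywhere in this thread: it assumes `f` is a target rate in Hz, so that each GPU call is allowed 1000 / f milliseconds. That reading would match the "33.3333 ms are allowed" line in the stderr output posted earlier (1000 / 30), but treat it as an assumption. The helper name `parse_settings` is made up for this example.

```python
def parse_settings(text):
    """Split a string like 'f60 w1.7 n1' into {letter: number}."""
    settings = {}
    for token in text.split():
        settings[token[0]] = float(token[1:])
    return settings


settings = parse_settings("f60 w1.7 n1")

# Assumption: f is a target rate in Hz, so each GPU call gets 1000 / f ms.
allowed_ms = 1000.0 / settings["f"]

print(settings)
print("allowed ms per GPU call (assumed):", round(allowed_ms, 2))
```

If the guess is right, f60 halves the time slice compared with the 33.3333 ms seen in the earlier log, which would fit both the better desktop responsiveness and the lower GPU utilization described above.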
ID: 26680 · Rating: 0
Haksu
Joined: 28 Nov 08
Posts: 4
Credit: 47,239,720
RAC: 418
Message 26683 - Posted: 29 Jun 2009, 7:34:32 UTC - in response to Message 26680.  

Hi
this might be of only little help, as both of my PCs with a 3850 are old and dedicated to crunching, but anyway.
I have two AGP 3850s on MW, one in an AMD Athlon and the other in an Intel Celeron, both running XP Home.
Both are running on standard settings except n2, as the cards have 512 MB and the motherboards have only 256 MB each.
Basic functionality (web etc.) is OK, even though, as I said, these machines are dedicated to MW. I pretty much took them back into use when I noticed that I could fit an AGP 3850 into them and do some crunching.
ID: 26683 · Rating: 0

©2024 Astroinformatics Group