Welcome to MilkyWay@home

Separation updated to 0.82


Beyond
Joined: 15 Jul 08
Posts: 383
Credit: 501,817,790
RAC: 0
Message 49288 - Posted: 14 Jun 2011, 14:27:14 UTC - in response to Message 49272.  

All I get is "Maximum time limit exceeded" with v.82, 64bit ATI on an HD 4770. That's after around 3 minutes.

Turns out that installing .82 on the first 2 machines coincided with the rash of bad test WUs that were all failing with "Maximum time limit exceeded".

ExtraTerrestrial Apes
Joined: 1 Sep 08
Posts: 204
Credit: 219,354,537
RAC: 0
Message 49289 - Posted: 14 Jun 2011, 14:32:13 UTC

Next: it's a bit difficult to say due to insta-purge, but it seems the ps_test are OK now, after manually correcting the "result duration correction factor".

Which makes me think.. if you're including this factor into your completion time estimation you're bound to get all sorts of seemingly random problems, since this value sometimes gets totally screwed (e.g. application changes etc.). Might this explain the recent problems with "exceeded elapsed time limit"?

(I don't want to spam this thread, but I think these observations are worth publishing)

MrS
Scanning for our furry friends since Jan 2002

Beyond
Joined: 15 Jul 08
Posts: 383
Credit: 501,817,790
RAC: 0
Message 49290 - Posted: 14 Jun 2011, 14:39:09 UTC - in response to Message 49289.  

Next: it's a bit difficult to say due to insta-purge, but it seems the ps_test are OK now, after manually correcting the "result duration correction factor".

Which makes me think.. if you're including this factor into your completion time estimation you're bound to get all sorts of seemingly random problems, since this value sometimes gets totally screwed (e.g. application changes etc.). Might this explain the recent problems with "exceeded elapsed time limit"? MrS

I hope you've found the problem. If it wasn't for insta-purge all these bugs would be found and corrected more easily with fewer headaches and ill will.

Beyond
Joined: 15 Jul 08
Posts: 383
Credit: 501,817,790
RAC: 0
Message 49297 - Posted: 14 Jun 2011, 17:32:47 UTC

I'm getting these error lines in the output of all my .82 tasks:

<stderr_txt>
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4' 
Error reading astronomy parameters from file 'astronomy_parameters.txt'
  Trying old parameters file
Using SSE3 path
Found 2 CAL devices
Chose device 0

They run and validate fine, but is something wrong with the setup?

kashi
Joined: 30 Dec 07
Posts: 311
Credit: 149,489,957
RAC: 6,420
Message 49301 - Posted: 14 Jun 2011, 18:35:38 UTC - in response to Message 49297.  


Matt Arsenault
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 8 May 10
Posts: 576
Credit: 15,979,383
RAC: 0
Message 49302 - Posted: 14 Jun 2011, 18:36:12 UTC - in response to Message 49289.  

Which makes me think.. if you're including this factor into your completion time estimation you're bound to get all sorts of seemingly random problems, since this value sometimes gets totally screwed (e.g. application changes etc.). Might this explain the recent problems with "exceeded elapsed time limit"?

(I don't want to spam this thread, but I think these observations are worth publishing)

MrS

The maximum time exceeded thing has nothing to do with the client code or the time estimates used for the GPU. Workunits have to be assigned some flops values, and BOINC uses those to estimate how long they should take, so that broken tasks can't run forever. It used to be a frequent problem with N-body workunits since they're hard to estimate, but after the server update it started happening to some separation workunits.
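
A rough sketch of the mechanism described above, in Python. This is not the BOINC source: the field names `fpops_est` and `fpops_bound` are modelled on the workunit's `rsc_fpops_est` and `rsc_fpops_bound` values, the function names are made up for illustration, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Workunit:
    fpops_est: float    # estimated floating-point operations for the task
    fpops_bound: float  # upper bound on operations before the task is given up


def estimated_runtime(wu: Workunit, device_flops: float) -> float:
    """Seconds the task is expected to take on a device of this speed."""
    return wu.fpops_est / device_flops


def elapsed_time_limit(wu: Workunit, device_flops: float) -> float:
    """Seconds after which the task would be aborted as over the limit."""
    return wu.fpops_bound / device_flops


def should_abort(wu: Workunit, device_flops: float, elapsed_s: float) -> bool:
    """True once the task has run longer than its limit allows."""
    return elapsed_s > elapsed_time_limit(wu, device_flops)


# Hypothetical workunit: the bound is ten times the estimate.
wu = Workunit(fpops_est=1.0e13, fpops_bound=1.0e14)

# With a sensible speed figure, a task that runs for 180 s is fine.
print(should_abort(wu, device_flops=1.0e11, elapsed_s=180.0))

# If the assumed device speed is ten times too high, the limit shrinks
# to 100 s and the same task is aborted.
print(should_abort(wu, device_flops=1.0e12, elapsed_s=180.0))
```

The point of the sketch is that a bad flops value on either side of the division moves the limit, regardless of how long the task really needs.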

Matt Arsenault
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 8 May 10
Posts: 576
Credit: 15,979,383
RAC: 0
Message 49303 - Posted: 14 Jun 2011, 18:36:55 UTC - in response to Message 49297.  

I'm getting these error lines in the output of all my .82 tasks:

<stderr_txt>
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4' 
Error reading astronomy parameters from file 'astronomy_parameters.txt'
  Trying old parameters file
Using SSE3 path
Found 2 CAL devices
Chose device 0

They run and validate fine, but is something wrong with the setup?

That's fine. It first tries to read the parameters file as a new-style Lua script and falls back to the old format when that fails. We haven't actually switched to the new format yet, so the error is expected.

ExtraTerrestrial Apes
Joined: 1 Sep 08
Posts: 204
Credit: 219,354,537
RAC: 0
Message 49308 - Posted: 14 Jun 2011, 21:18:53 UTC - in response to Message 49302.  

The maximum time exceeded thing has nothing to do with the client code or the time estimates used for the GPU. Workunits have to be assigned some flops values and then BOINC uses those to estimate how long they take to prevent broken things from never finishing. It used to be a frequent problem with N-body workunits since they're hard to estimate, but after the server update it started happening to some separation workunits.


Thanks Matt. So it seems BOINC factors in the result duration correction factor when it calculates whether a task is overdue. That is correct as long as the factor is correct. However, if the correction factor is much too large (in my case it started at ~100 when going from 0.62 to 0.82), BOINC assumes the tasks should finish in 1/(correction factor) of the expected time, which may lead to WU aborts.
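
To put numbers on that hypothesis, here is a tiny Python sketch. It models only the guess made in this post, not confirmed BOINC behaviour; the function name and the 1000 s and 100 s figures are hypothetical.

```python
def hypothesized_time_limit(base_limit_s: float, correction_factor: float) -> float:
    """Time limit under the hypothesis above: the limit shrinks to
    1/(correction factor) of its intended value."""
    return base_limit_s / correction_factor


base_limit_s = 1000.0  # hypothetical intended limit
runtime_s = 100.0      # hypothetical real runtime of a task

for factor in (1.0, 100.0):
    limit = hypothesized_time_limit(base_limit_s, factor)
    print(factor, limit, "abort" if runtime_s > limit else "ok")
```

With a factor of 1 the task fits comfortably; with a factor of ~100 the limit would drop to 10 s and the same task would be aborted.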

MrS
Scanning for our furry friends since Jan 2002

Len LE/GE
Joined: 8 Feb 08
Posts: 261
Credit: 104,050,322
RAC: 0
Message 49310 - Posted: 14 Jun 2011, 21:48:04 UTC - in response to Message 49281.  

On my 5850 I get quite different figures for a unit that finished in 151 secs for a credit of 267.


Can you give your device info from the log?
Mine are

Device target: CAL_TARGET_CYPRESS
Revision: 2
CAL Version: 1.4.1332
Engine clock: 775 Mhz
Memory clock: 1125 Mhz
GPU RAM: 1024
Wavefront size: 64
Double precision: CAL_TRUE
Compute shader: CAL_TRUE
Number SIMD: 18
Number shader engines: 2
Pitch alignment: 256
Surface alignment: 256
Max size 2D: { 16384, 16384 }


Maybe this will give us an idea why the ratio of estimated to average iteration time varies even between GPUs of the same type.

The Gas Giant
Joined: 24 Dec 07
Posts: 1947
Credit: 240,884,648
RAC: 0
Message 49316 - Posted: 15 Jun 2011, 0:52:09 UTC

I've seen an increase in completion times on my Q9450, Win7/64bit, 5970 (also has a 4850 in it, but these times are not reported here)

Previously (0.62) I was seeing the following (app_info, target frequency 90, 2 wu's at a time)

159 credits -> 132 to 137 seconds
213 credits -> 184 to 192 seconds
267 credits -> 232 to 239 seconds

Now (0.82) I'm seeing (app_info, target frequency 90, 2 wu's at a time)

159 credits -> 141 to 145 seconds
213 credits -> 194 to 197 seconds
267 credits -> 241 to 246 seconds.

Tonight I'll reduce the target frequency to see if there is a difference, but this is a fairly large increase in calculation time.
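
For reference, the size of the increase can be worked out from the ranges listed above. The Python sketch below compares the midpoints of each range, pairing the three workunit sizes by position; using the midpoint is an assumption made here for illustration, not something measured.

```python
# Completion-time ranges in seconds, copied from the two lists above,
# smallest workunit type first.
old_times = [(132, 137), (184, 192), (232, 239)]  # 0.62
new_times = [(141, 145), (194, 197), (241, 246)]  # 0.82


def midpoint(time_range):
    """Middle of a (low, high) range in seconds."""
    low, high = time_range
    return (low + high) / 2


def slowdown_percent(old_range, new_range):
    """Relative increase of the midpoint from old to new, in percent."""
    old_mid = midpoint(old_range)
    return (midpoint(new_range) - old_mid) / old_mid * 100


slowdowns = [
    round(slowdown_percent(old, new), 1)
    for old, new in zip(old_times, new_times)
]
print(slowdowns)  # [6.3, 4.0, 3.4]
```

So on these figures the 0.82 tasks take roughly 3 to 6 percent longer, with the smallest workunit type affected most.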

robertmiles
Joined: 30 Sep 09
Posts: 208
Credit: 18,470,894
RAC: 21,961
Message 49318 - Posted: 15 Jun 2011, 1:38:45 UTC - in response to Message 49289.  

Next: it's a bit difficult to say due to insta-purge, but it seems the ps_test are OK now, after manually correcting the "result duration correction factor".

Which makes me think.. if you're including this factor into your completion time estimation you're bound to get all sorts of seemingly random problems, since this value sometimes gets totally screwed (e.g. application changes etc.). Might this explain the recent problems with "exceeded elapsed time limit"?

(I don't want to spam this thread, but I think these observations are worth publishing)

MrS


I remember a few messages on boinc_dev saying that at least some of the 6.12.* versions of BOINC never initialize one of the values that many BOINC projects use for calculating time limits. Could this mean you've identified which one?

ExtraTerrestrial Apes
Joined: 1 Sep 08
Posts: 204
Credit: 219,354,537
RAC: 0
Message 49324 - Posted: 15 Jun 2011, 8:19:41 UTC

@Gas Giant: interesting.. I tried to check the time for my HD6950, but couldn't find any WUs still within the database for which credits were given. Nevertheless, I was seeing 94 s for some WU type previously (running 1 at a time), now I've got a few at 96 - 99 s. Previously I used target frequency 60, now I'm running without app_info and it's smoother than before. This improved responsiveness might directly lead to the slight drop in performance. Could also be checkpointing, though.

MrS
Scanning for our furry friends since Jan 2002

The Gas Giant
Joined: 24 Dec 07
Posts: 1947
Credit: 240,884,648
RAC: 0
Message 49327 - Posted: 15 Jun 2011, 11:31:12 UTC

On my 3850 0.62 wu's completed in 530-560 seconds for 213 cs. Now completing 0.82 wu's in 590 seconds. No app_info.

banditwolf
Joined: 12 Nov 07
Posts: 2425
Credit: 524,164
RAC: 0
Message 49328 - Posted: 15 Jun 2011, 11:52:37 UTC

On my P4 XP de_separation_13_3s ran 28140 seconds/7.75 hours. So a vast improvement over the previous application, but still a tad slow in comparison to the old opti apps.
Doesn't expecting the unexpected make the unexpected the expected?
If it makes sense, DON'T do it.

Beyond
Joined: 15 Jul 08
Posts: 383
Credit: 501,817,790
RAC: 0
Message 49342 - Posted: 15 Jun 2011, 19:29:26 UTC - in response to Message 49316.  

I've seen an increase in completion times on my Q9450, Win7/64bit, 5970 (also has a 4850 in it, but these times are not reported here)

Previously (0.62) I was seeing the following (app_info, target frequency 90, 2 wu's at a time)

159 credits -> 132 to 137 seconds
213 credits -> 184 to 192 seconds
267 credits -> 232 to 239 seconds

Now (0.82) I'm seeing (app_info, target frequency 90, 2 wu's at a time)

159 credits -> 141 to 145 seconds
213 credits -> 194 to 197 seconds
267 credits -> 241 to 246 seconds.

Tonight I'll reduce the target frequency to see if there is a difference, but this is a fairly large increase in calculation time.

I downgraded one machine (with 2 x HD5850 cards) back to v.62 from v.82 to test this. The older version completed WUs 1-2 seconds faster on average. That's running 2 WUs per GPU with the same command-line parameters.

Nigel Garvey
Joined: 2 Apr 11
Posts: 13
Credit: 2,778,724
RAC: 2,237
Message 49348 - Posted: 15 Jun 2011, 21:07:21 UTC

PowerPC, Mac OS 10.4.11, BOINC 6.10.58. Same for 0.82 as for 0.80. Completed in about 14.5 hours, inconclusive validation, Stderr output shows several iterations of the "Error loading Lua script…" message.

Chris S
Joined: 20 Sep 08
Posts: 1387
Credit: 186,726,858
RAC: 0
Message 49354 - Posted: 16 Jun 2011, 7:29:56 UTC

Can you give your device info from the log?
Mine are


Device target: CAL_TARGET_CYPRESS
Revision: 2
CAL Version: 1.4.1016
Engine clock: 775 Mhz
Memory clock: 1125 Mhz
GPU RAM: 1024
Wavefront size: 64
Double precision: CAL_TRUE
Compute shader: CAL_TRUE
Number SIMD: 18
Number shader engines: 2
Pitch alignment: 256
Surface alignment: 4096
Max size 2D: { 16384, 16384 }
Don't drink water, that's the stuff that rusts pipes

banditwolf
Joined: 12 Nov 07
Posts: 2425
Credit: 524,164
RAC: 0
Message 49357 - Posted: 16 Jun 2011, 12:37:22 UTC - in response to Message 49328.  

On my P4 XP de_separation_13_3s ran 28140 seconds/7.75 hours. So a vast improvement over the previous application, but still a tad slow in comparison to the old opti apps.


Had 2 of the same tasks complete in 34800 seconds.
Doesn't expecting the unexpected make the unexpected the expected?
If it makes sense, DON'T do it.

Len LE/GE
Joined: 8 Feb 08
Posts: 261
Credit: 104,050,322
RAC: 0
Message 49382 - Posted: 16 Jun 2011, 22:10:00 UTC - in response to Message 49354.  

Can you give your device info from the log?
Mine are


Device target: CAL_TARGET_CYPRESS
Revision: 2
CAL Version: 1.4.1016
Engine clock: 775 Mhz
Memory clock: 1125 Mhz
GPU RAM: 1024
Wavefront size: 64
Double precision: CAL_TRUE
Compute shader: CAL_TRUE
Number SIMD: 18
Number shader engines: 2
Pitch alignment: 256
Surface alignment: 4096
Max size 2D: { 16384, 16384 }


Interesting: Same gpu, same clocks, same mem size but 10% slower on the calculations.
You are using cat 11.2 right? Mine runs with cat 11.3.
Wonder if it's the different cat version or something else slowing your gpu down.
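
To see exactly where the two device dumps posted in this thread differ, they can be compared field by field. A small Python sketch (the helper names are made up for illustration; the two text blocks are copied from the posts above):

```python
def parse_device_info(text):
    """Turn 'Key: value' log lines into a dict of strings."""
    info = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(":")
        info[key.strip()] = value.strip()
    return info


def diff_device_info(a, b):
    """Return {field: (value_in_a, value_in_b)} for fields that differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


first_dump = """
Device target: CAL_TARGET_CYPRESS
Revision: 2
CAL Version: 1.4.1332
Engine clock: 775 Mhz
Memory clock: 1125 Mhz
GPU RAM: 1024
Wavefront size: 64
Double precision: CAL_TRUE
Compute shader: CAL_TRUE
Number SIMD: 18
Number shader engines: 2
Pitch alignment: 256
Surface alignment: 256
Max size 2D: { 16384, 16384 }
"""

# Only the two fields that change are replaced; everything else is identical.
second_dump = first_dump.replace("1.4.1332", "1.4.1016").replace(
    "Surface alignment: 256", "Surface alignment: 4096"
)

differences = diff_device_info(
    parse_device_info(first_dump), parse_device_info(second_dump)
)
for field, (mine, yours) in differences.items():
    print(f"{field}: {mine} vs {yours}")
```

Only the CAL version and the surface alignment differ, which fits the idea that the driver version is the variable to look at first.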

Nigel Garvey
Joined: 2 Apr 11
Posts: 13
Credit: 2,778,724
RAC: 2,237
Message 49390 - Posted: 17 Jun 2011, 10:08:22 UTC - in response to Message 49348.  

I wrote:
PowerPC, Mac OS 10.4.11, BOINC 6.10.58. Same for 0.82 as for 0.80. Completed in about 14.5 hours, inconclusive validation, Stderr output shows several iterations of the "Error loading Lua script…" message.


A 0.82 task reported this morning has validated OK, although with the same error messages.

NG

©2019 Astroinformatics Group