Author | Message |
Conan
Send message Joined: 2 Jan 08 Posts: 123 Credit: 69,734,841 RAC: 932
|
G'Day Travis,
Here is something different to credit (I might contribute to that later if this problem keeps happening).
I have just returned a result 5048610 , that has gone way past normal run time with optimised app. Normal time is around 990 to 1,100 seconds for the s21 and s20 work units but this one took 6,301 seconds.
I have another s86 running at the moment that has been going for 33 minutes and is at 30%, this one should of finished after 670 seconds.
If this becomes normal then I will have to stop this computer as the recent credit cuts make this a 5 credit/ hour machine and that is not worth the electricity.
Thanks, Conan.
|
|
Travis Volunteer moderator Project administrator Project developer Project tester Project scientist
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0
|
G'Day Travis,
Here is something different to credit (I might contribute to that later if this problem keeps happening).
I have just returned a result 5048610 , that has gone way past normal run time with optimised app. Normal time is around 990 to 1,100 seconds for the s21 and s20 work units but this one took 6,301 seconds.
I have another s86 running at the moment that has been going for 33 minutes and is at 30%, this one should of finished after 670 seconds.
If this becomes normal then I will have to stop this computer as the recent credit cuts make this a 5 credit/ hour machine and that is not worth the electricity.
Thanks, Conan.
Have you tried using the stock app? I'm not quite sure what would be bugging your optimized application out... especially since it might not necessarily be our code. If anything it shouldn't be normal...
|
|
Cluster Physik
Send message Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0
|
Here is something different to credit (I might contribute to that later if this problem keeps happening).
I have just returned a result 5048610 , that has gone way past normal run time with optimised app. Normal time is around 990 to 1,100 seconds for the s21 and s20 work units but this one took 6,301 seconds.
I have another s86 running at the moment that has been going for 33 minutes and is at 30%, this one should of finished after 670 seconds.
If this becomes normal then I will have to stop this computer as the recent credit cuts make this a 5 credit/ hour machine and that is not worth the electricity.
Thanks, Conan.
I have seen this, too. I couldn't find the reason yet, but it appears to happen only for certain WUs (but for all types, so it is no single/dual stream issue).
@Travis:
Are there some WUs with an unusual parameter set (some angles outside of [-2Pi, 2Pi] or whatever) and/or are you seeing some unusual results? The affected WUs just take longer, they still come out as valid, so at least the outlier detection does not kick in.
I will try to narrow it down and will report, if the stock app also shows this behaviour. At least if I can find an affected WU on my computers.
|
|
Travis Volunteer moderator Project administrator Project developer Project tester Project scientist
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0
|
Here is something different to credit (I might contribute to that later if this problem keeps happening).
I have just returned a result 5048610 , that has gone way past normal run time with optimised app. Normal time is around 990 to 1,100 seconds for the s21 and s20 work units but this one took 6,301 seconds.
I have another s86 running at the moment that has been going for 33 minutes and is at 30%, this one should of finished after 670 seconds.
If this becomes normal then I will have to stop this computer as the recent credit cuts make this a 5 credit/ hour machine and that is not worth the electricity.
Thanks, Conan.
I have seen this, too. I couldn't find the reason yet, but it appears to happen only for certain WUs (but for all types, so it is no single/dual stream issue).
@Travis:
Are there some WUs with an unusual parameter set (some angles outside of [-2Pi, 2Pi] or whatever) and/or are you seeing some unusual results? The affected WUs just take longer, they still come out as valid, so at least the outlier detection does not kick in.
I will try to narrow it down and will report, if the stock app also shows this behaviour. At least if I can find an affected WU on my computers.
Angles outside of +/- 2PI shouldn't effect the calculation time at all.
What might be happening is there's a line in the optimized client code that does the following check:
if (alpha==delta==1) {
double rs = rg + r0;
(*bg_prob) += qw_r3_N[i] / (rg*rs*rs*rs);
} else (*bg_prob) += qw_r3_N[i] / (pow(rg, alpha) * pow(rg + r0, alpha_delta3));
What might be happening is that since alpha and delta are doubles (not integers) they might not always evaluate == 1, which would make the application do the more expensive calculation.
It'd be doing 2 pows and a multiply, instead of 3 multiplies in the middle of our integral loop, which could make it run quite a bit longer.
|
|
Cluster Physik
Send message Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0
|
Angles outside of +/- 2PI shouldn't effect the calculation time at all. Who knows? But the trigonometric stuff is really unlikely. Maybe these WUs are hitting a limit of the argument range for some other function causing some additional work. I have no idea in the moment.
What might be happening is there's a line in the optimized client code that does the following check:
if (alpha==delta==1) {
double rs = rg + r0;
(*bg_prob) += qw_r3_N[i] / (rg*rs*rs*rs);
} else (*bg_prob) += qw_r3_N[i] / (pow(rg, alpha) * pow(rg + r0, alpha_delta3));
What might be happening is that since alpha and delta are doubles (not integers) they might not always evaluate == 1, which would make the application do the more expensive calculation.
It'd be doing 2 pows and a multiply, instead of 3 multiplies in the middle of our integral loop, which could make it run quite a bit longer.
Have you seen this behaviour also with the stock application?
If this is the reason, it would surprise me as 1.0 has an exact binary representation and the values in alpha and delta are just read from the parameters file and not the result of a calculation. I think the runtime difference is slightly too small for this being the reason. Furthermore it should have been visible much earlier, not only in the last days. Already the public version of the optimized 1.22 app has used this trick.
Maybe I find some time to look into it later today. First I need to find an affected WU to play around with. But in the moment I'm busy trying to get the Win32 ATI GPU app running ;)
|
|
Cluster Physik
Send message Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0
|
It is definitely not the selection of the code path going wrong.
Just found two neighbouring WUs:
ps_s23_9_search_parameters_132749_1235002485
ps_s23_9_search_parameters_132750_1235002485
The first being a slow one, the second takes the normal time. The difference is that for the slow one the first stream integral (it is a dual stream WU) is very close to zero (10^-269 after half of the WU, just looked in a checkpoint). The second one shows more common values (some hundred).
I guess what happens is an underflow within the calculations. The handling of the denormalized floating point numbers most likely cause the speed difference.
I have saved the parameter files, so one can investigate that later. Have to get some sleep now (it's 2:13 a.m. here).
|
|
arkayn
Send message Joined: 14 Feb 09 Posts: 999 Credit: 74,932,619 RAC: 0
|
Yep, just got a pair as well.
One is blazing and the other iced.
|
|
Paul D. Buck
Send message Joined: 12 Apr 08 Posts: 621 Credit: 161,934,067 RAC: 0
|
What might be happening is that since alpha and delta are doubles (not integers) they might not always evaluate == 1, which would make the application do the more expensive calculation.
It'd be doing 2 pows and a multiply, instead of 3 multiplies in the middle of our integral loop, which could make it run quite a bit longer.
Should you not be then doing an ABS difference and then checking to see if they are within a tolerance?
Probably not C code but:
if ABS( alpha - delta) < 0.0001 then ... (or whatever the error bound is ...
|
|
Cluster Physik
Send message Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0
|
Yep, just got a pair as well.
One is blazing and the other iced.
Is that your Linux Notebook or the OSX one? For your Windows Core2Duo it's too slow I guess.
That would show it happens also with the stock app (OSX) and the Linux variant (compiled from unaltered sources). That would make it a general problem.
Maybe not for the GPU app (at least performance wise) as GPUs flush denormals to zero. In fact, I have not found slow WUs running on a GPU, yet.
So we are back to some strange parameters of the affected WUs or some quirk of the algorithm present in all applications. Have to work now, a more thorough look at the problem has to wait a bit.
|
|
Cluster Physik
Send message Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0
|
What might be happening is that since alpha and delta are doubles (not integers) they might not always evaluate == 1, which would make the application do the more expensive calculation.
It'd be doing 2 pows and a multiply, instead of 3 multiplies in the middle of our integral loop, which could make it run quite a bit longer.
Should you not be then doing an ABS difference and then checking to see if they are within a tolerance?
Probably not C code but:
if ABS( alpha - delta) < 0.0001 then ... (or whatever the error bound is ...
As I explained above, the values alpha and delta are not the result of a calculation, they are read from a parameter file. It is okay to test them for being equal to exactly 1.0.
|
|
Travis Volunteer moderator Project administrator Project developer Project tester Project scientist
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0
|
What might be happening is that since alpha and delta are doubles (not integers) they might not always evaluate == 1, which would make the application do the more expensive calculation.
It'd be doing 2 pows and a multiply, instead of 3 multiplies in the middle of our integral loop, which could make it run quite a bit longer.
Should you not be then doing an ABS difference and then checking to see if they are within a tolerance?
Probably not C code but:
if ABS( alpha - delta) < 0.0001 then ... (or whatever the error bound is ...
As I explained above, the values alpha and delta are not the result of a calculation, they are read from a parameter file. It is okay to test them for being equal to exactly 1.0.
Yeah, I haven't had any problems with this either, however if we compile our code using MPI it does throw a warning. That's why I thought it might be a problem.
|
|
Travis Volunteer moderator Project administrator Project developer Project tester Project scientist
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0
|
So we are back to some strange parameters of the affected WUs or some quirk of the algorithm present in all applications. Have to work now, a more thorough look at the problem has to wait a bit.
Yeah this is very strange, especially since the code is pretty deterministic, just a bunch of for loops with some calculations in the middle. I guess the next step would be to recreate the problem.
If anyone sees a WU running slowly and can copy the parameter files for it and post them here (or in the code discussion forum), that would be really helpful in seeing if it's some problem with the parameter files or something else.
|
|
Cluster Physik
Send message Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0
|
So we are back to some strange parameters of the affected WUs or some quirk of the algorithm present in all applications. Have to work now, a more thorough look at the problem has to wait a bit.
Yeah this is very strange, especially since the code is pretty deterministic, just a bunch of for loops with some calculations in the middle. I guess the next step would be to recreate the problem.
If anyone sees a WU running slowly and can copy the parameter files for it and post them here (or in the code discussion forum), that would be really helpful in seeing if it's some problem with the parameter files or something else.
Done that already.
slow WU (ps_s23_9_132749_1235002485):
ps_s23_9
parameters [14]: 0.384982866758338 11.716938315232948 -9.530803058571149 244.000000000000000 24.879482473428617 0.671980681029048 1.195094522806092 0.347010963442642 -2.503350179338553 127.081445210280563 42.589815649952655 -0.773784478366408 -0.170601219297428 4.217621990463419
metadata: p: 25, v: -0.32212604431145441808 -7.24983019171141052084 -9.99016639791811300597 83.60989900627373572206 -3.96856150450361511162 -0.89646890334008133561 -0.14631965843522620752 -3.41733213852705741687 -2.50228714667047125175 -12.67721075492640459004 -1.45595380931267825986 -0.55616183961568199656 -0.11059700452165788231 1.41831072203777797114
normal WU (ps_s23_9_132750_1235002485):
ps_s23_9
parameters [14]: 0.500328464135606 14.146488926207834 -7.482278426832348 125.656033665464520 34.791471228744534 -0.409786454609398 0.197787903906911 10.587824794891745 -10.612079867124347 120.000000000000000 39.700270546547692 -2.369933143104002 0.456520722526815 11.994485610071056
metadata: p: 26, v: -0.00658038053454158313 -0.42932120142078844527 2.49709041383319219065 -10.91291757291236130811 -10.26014421603925796944 -0.96618751761879506645 -0.09580232216327819184 7.53826578563810478784 -1.87679150831014185385 0.00000000000000000000 -5.57224893664772302770 -3.01263491905762048617 0.65262994540488483697 1.64374790339396126271
Both should use the astronomy parameters:
number_parameters: 4
background_weight: 0.000000
background_parameters[4]: 1.00000000000000000000, 0.56999999999999995115, 12.00000000000000000000, 1.00000000000000000000
background_step[4]: 0.02000000000000000042, 0.00000400000000000000, 0.00008000000000000001, 0.02000000000000000042
background_min[4]: 0.00000000000000000000, 0.29999999999999998890, 1.00000000000000000000, 0.10000000000000000555
background_max[4]: 3.00000000000000000000, 1.00000000000000000000, 30.00000000000000000000, 3.00000000000000000000
optimize_parameter[4]: 0, 1, 1, 0
number_streams: 2, 5
stream_weight: -3.300000
stream_weight_step: 0.000001
stream_weight_min: -20.000000
stream_weight_max: 20.000000
optimize_weight: 1
stream_parameters[5]: 148.00000000000000000000, 22.00000000000000000000, 0.41999999999999998446, -0.46999999999999997335, 0.80000000000000004441
stream_step[5]: 0.00003000000000000000, 0.00004000000000000000, 0.00006000000000000000, 0.00004000000000000000, 0.00000400000000000000
stream_min[5]: 120.00000000000000000000, 2.29999999999999982236, -6.28318530717958623200, -6.28318530717958623200, 0.10000000000000000555
stream_max[5]: 244.00000000000000000000, 45.60000000000000142109, 6.28318530717958623200, 6.28318530717958623200, 20.00000000000000000000
optimize_parameter[5]: 1, 1, 1, 1, 1
stream_weight: -1.400000
stream_weight_step: 0.000001
stream_weight_min: -20.000000
stream_weight_max: 20.000000
optimize_weight: 1
stream_parameters[5]: 177.00000000000000000000, 23.00000000000000000000, 1.19999999999999995559, -1.60000000000000008882, 8.50000000000000000000
stream_step[5]: 0.00003000000000000000, 0.00004000000000000000, 0.00006000000000000000, 0.00004000000000000000, 0.00000400000000000000
stream_min[5]: 120.00000000000000000000, 2.39999999999999991118, -6.28318530717958623200, -6.28318530717958623200, 0.10000000000000000555
stream_max[5]: 244.00000000000000000000, 45.60000000000000142109, 6.28318530717958623200, 6.28318530717958623200, 20.00000000000000000000
optimize_parameter[5]: 1, 1, 1, 1, 1
convolve: 60
sgr_coordinates: 0
wedge: 23
r[min,max,steps]: 16.000000, 22.500000, 700
mu[min,max,steps]: 120.000000, 244.000000, 1600
nu[min,max,steps]: -1.250000, 1.250000, 160
number_cuts: 1
r_cut[min,max,steps][3]: 16.000000, 22.500000, 350
mu_cut[min,max,steps][3]: 207.000000, 209.000000, 400
nu_cut[min,max,steps][3]: -1.250000, -0.800000, 40
As already mentioned the stream_integral[0] comes out very low for the slow WU (which gives a quite bad fitness value). Most likely the application experiences an underflow condition and the handling of the denormalized numbers slows things down.
|
|
Travis Volunteer moderator Project administrator Project developer Project tester Project scientist
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0
|
slow WU (ps_s23_9_132749_1235002485):
ps_s23_9
parameters [14]: 0.384982866758338 11.716938315232948 -9.530803058571149 244.000000000000000 24.879482473428617 0.671980681029048 1.195094522806092 0.347010963442642 -2.503350179338553 127.081445210280563 42.589815649952655 -0.773784478366408 -0.170601219297428 4.217621990463419
metadata: p: 25, v: -0.32212604431145441808 -7.24983019171141052084 -9.99016639791811300597 83.60989900627373572206 -3.96856150450361511162 -0.89646890334008133561 -0.14631965843522620752 -3.41733213852705741687 -2.50228714667047125175 -12.67721075492640459004 -1.45595380931267825986 -0.55616183961568199656 -0.11059700452165788231 1.41831072203777797114
normal WU (ps_s23_9_132750_1235002485):
ps_s23_9
parameters [14]: 0.500328464135606 14.146488926207834 -7.482278426832348 125.656033665464520 34.791471228744534 -0.409786454609398 0.197787903906911 10.587824794891745 -10.612079867124347 120.000000000000000 39.700270546547692 -2.369933143104002 0.456520722526815 11.994485610071056
metadata: p: 26, v: -0.00658038053454158313 -0.42932120142078844527 2.49709041383319219065 -10.91291757291236130811 -10.26014421603925796944 -0.96618751761879506645 -0.09580232216327819184 7.53826578563810478784 -1.87679150831014185385 0.00000000000000000000 -5.57224893664772302770 -3.01263491905762048617 0.65262994540488483697 1.64374790339396126271
The v in the metadata shouldn't effect what the application is doing. The metadata is basically just cut and pasted from the parameters to the output. The v is just that particular parameter sets velocity for particle swarm optimization.
|
|
Travis Volunteer moderator Project administrator Project developer Project tester Project scientist
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0
|
It might be because one of the stream parameters is at the bound in the main integral (ie, the expensive integral):
slow WU (ps_s23_9_132749_1235002485):
ps_s23_9
parameters [14]: 0.384982866758338 11.716938315232948 -9.530803058571149 244.000000000000000 24.879482473428617 0.671980681029048 1.195094522806092 0.347010963442642 -2.503350179338553 127.081445210280563 42.589815649952655 -0.773784478366408 -0.170601219297428 4.217621990463419
metadata: p: 25, v: -0.32212604431145441808 -7.24983019171141052084 -9.99016639791811300597 83.60989900627373572206 -3.96856150450361511162 -0.89646890334008133561 -0.14631965843522620752 -3.41733213852705741687 -2.50228714667047125175 -12.67721075492640459004 -1.45595380931267825986 -0.55616183961568199656 -0.11059700452165788231 1.41831072203777797114
normal WU (ps_s23_9_132750_1235002485):
ps_s23_9
parameters [14]: 0.500328464135606 14.146488926207834 -7.482278426832348 125.656033665464520 34.791471228744534 -0.409786454609398 0.197787903906911 10.587824794891745 -10.612079867124347 120.000000000000000 39.700270546547692 -2.369933143104002 0.456520722526815 11.994485610071056
metadata: p: 26, v: -0.00658038053454158313 -0.42932120142078844527 2.49709041383319219065 -10.91291757291236130811 -10.26014421603925796944 -0.96618751761879506645 -0.09580232216327819184 7.53826578563810478784 -1.87679150831014185385 0.00000000000000000000 -5.57224893664772302770 -3.01263491905762048617 0.65262994540488483697 1.64374790339396126271
|
|
Cluster Physik
Send message Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0
|
It might be because one of the stream parameters is at the bound in the main integral (ie, the expensive integral):
slow WU (ps_s23_9_132749_1235002485):
ps_s23_9
parameters [14]: 0.384982866758338 11.716938315232948 -9.530803058571149 244.000000000000000 [snip]
normal WU (ps_s23_9_132750_1235002485):
ps_s23_9
parameters [14]: 0.500328464135606 14.146488926207834 -7.482278426832348 125.656033665464520 [snip]
So maybe the bound can be lowered a bit to prevent these cases (it appears to be a bad fit either way from the fitness values of the affected WUs).
|
|
arkayn
Send message Joined: 14 Feb 09 Posts: 999 Credit: 74,932,619 RAC: 0
|
It was the OSX host, too bad it happened just before I went to bed.
|
|
Travis Volunteer moderator Project administrator Project developer Project tester Project scientist
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0
|
It might be because one of the stream parameters is at the bound in the main integral (ie, the expensive integral):
slow WU (ps_s23_9_132749_1235002485):
ps_s23_9
parameters [14]: 0.384982866758338 11.716938315232948 -9.530803058571149 244.000000000000000 [snip]
normal WU (ps_s23_9_132750_1235002485):
ps_s23_9
parameters [14]: 0.500328464135606 14.146488926207834 -7.482278426832348 125.656033665464520 [snip]
So maybe the bound can be lowered a bit to prevent these cases (it appears to be a bad fit either way from the fitness values of the affected WUs).
I'll get a change in the boundary checking code today which will keep things slightly inside the bounds, which hopefully will fix the problem.
|
|
Cluster Physik
Send message Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0
|
It might be because one of the stream parameters is at the bound in the main integral (ie, the expensive integral):
slow WU (ps_s23_9_132749_1235002485):
ps_s23_9
parameters [14]: 0.384982866758338 11.716938315232948 -9.530803058571149 244.000000000000000 [snip]
normal WU (ps_s23_9_132750_1235002485):
ps_s23_9
parameters [14]: 0.500328464135606 14.146488926207834 -7.482278426832348 125.656033665464520 [snip]
So maybe the bound can be lowered a bit to prevent these cases (it appears to be a bad fit either way from the fitness values of the affected WUs).
I'll get a change in the boundary checking code today which will keep things slightly inside the bounds, which hopefully will fix the problem.
Just tested it, one needs quite some distance to the bounds, just setting this parameter to 243.0 is not a solution. Furthermore, the slowdown doesn't disappear suddenly, it just gets less and less pronounced.
|
|