Possible Bug in stPbxConvolved

Author	Message
Cluster Physik Send message Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0	Message 5267 - Posted: 30 Sep 2008, 16:01:34 UTC Hello! First post here it seems. The discussion about the optimized binary compiled by Crunch3r sparked my interest and I had a look at the source code. I noticed an inconsistency in the declaration of the stPsgConvolved in probability.h and .c respectively. In the header it is declared double stPbxConvolved(const double* coordpar, const double* bpars, int numpoints, int wedge); double stPsgConvolved(const double* coordpar, const double* spars, int numpoints, int wedge); But in the .c file it looks like double stPbxConvolved(const double* coordpar, const double* bpars, int numpoints, int wedge) and double stPsgConvolved(const double* coordpar, const double* spars, int wedge, int numpoints) Note the switched positions of the parameters numpoints and wedge. When looking at how these two function are invoked and call qgaus, I come to the conclusion that the header file simply doesn't reflect the correct ordering of the parameters and in the implementation it is in the wrong order for stPbxConvolved. Effectively this leads to the calculation of the integral in qgaus with a wrong number of interpolation points (wedge instead of numpoints, that means 82 instead of 30 with current WUs). In the moment, it may not make much of a difference to the numerical results, but it will certainly break if the parameters of the WUs may change in the future. A side effect is that the integration loops more often in qgaus as requested by the WU, which costs some calculation speed. ID: 5267 · Rating: 0 · rate: / Reply Quote

Dave Przybylo Send message Joined: 5 Feb 08 Posts: 236 Credit: 49,648 RAC: 0	Message 5271 - Posted: 30 Sep 2008, 18:31:39 UTC - in response to Message 5267. Hello! First post here it seems. The discussion about the optimized binary compiled by Crunch3r sparked my interest and I had a look at the source code. I noticed an inconsistency in the declaration of the stPsgConvolved in probability.h and .c respectively. In the header it is declared double stPbxConvolved(const double* coordpar, const double* bpars, int numpoints, int wedge); double stPsgConvolved(const double* coordpar, const double* spars, int numpoints, int wedge); But in the .c file it looks like double stPbxConvolved(const double* coordpar, const double* bpars, int numpoints, int wedge) and double stPsgConvolved(const double* coordpar, const double* spars, int wedge, int numpoints) Note the switched positions of the parameters numpoints and wedge. When looking at how these two function are invoked and call qgaus, I come to the conclusion that the header file simply doesn't reflect the correct ordering of the parameters and in the implementation it is in the wrong order for stPbxConvolved. Effectively this leads to the calculation of the integral in qgaus with a wrong number of interpolation points (wedge instead of numpoints, that means 82 instead of 30 with current WUs). In the moment, it may not make much of a difference to the numerical results, but it will certainly break if the parameters of the WUs may change in the future. A side effect is that the integration loops more often in qgaus as requested by the WU, which costs some calculation speed. wow good catch. let me see if this has been fixed. Dave Przybylo MilkyWay@home Developer Department of Computer Science Rensselaer Polytechnic Institute ID: 5271 · Rating: 0 · rate: / Reply Quote

Cluster Physik Send message Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0	Message 5272 - Posted: 30 Sep 2008, 19:57:38 UTC - in response to Message 5271. Last modified: 30 Sep 2008, 20:05:09 UTC wow good catch. let me see if this has been fixed. You are welcome. May I add a follow-on question? Besides the general inefficiency of the implementation of your algorithm that was highlighted by Crunch3r in no unclear terms and is evident by the huge gains possible with an optimized application, it appears to me the application spends an awful lot of time in converting between different coordinate sytems. This can only be alleviated by a more efficient implementation, not eliminated. Have you thought about a different algorithm, that makes this conversions unnecessary? In layman terms this means: just stick to a cleverly chosen coordinate system, calculate your stuff and you are done faster than even with Crunch3rs app that still convert values between 3 different coordinate systems (I don't think he changed the algorithm). May this be somehow possible? I know, this probably means a lot of work to do, but the return would be a much more streamlined version. ID: 5272 · Rating: 0 · rate: / Reply Quote

Travis Volunteer moderator Project administrator Project developer Project tester Project scientist Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0	Message 5273 - Posted: 30 Sep 2008, 20:07:11 UTC - in response to Message 5271. Well this was weird, i took a look over the code and even though the headers were wrong the right values were being passed to the functions :P I've updated the code to at least make it consistent. A lot of the astronomy code is pretty old legacy code (written by astronomy grad students and not computer science students) so i'm willing to bet there'll be more things like this in there :( ID: 5273 · Rating: 0 · rate: / Reply Quote

Cluster Physik Send message Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0	Message 5274 - Posted: 30 Sep 2008, 20:37:29 UTC - in response to Message 5273. Last modified: 30 Sep 2008, 20:38:02 UTC Well this was weird, i took a look over the code and even though the headers were wrong the right values were being passed to the functions :P I've updated the code to at least make it consistent. Well, that's not true for the version I am just looking at. Let's trace the function calls: In evalution.c it is called as stPbxConvolved(integral_point, ap->background_parameters, ap->wedge, ap->convolve); // wedge in penultimate position Within stPbxConvolved wedge is passed to the numpoints variable (because of the wrong ordering) that reads in probability.c double stPbxConvolved(const double* coordpar, const double* bpars, int numpoints, int wedge) // wedge and numpoints get switched! it is passed to qgaus also in the penultimate position as pbx = qgaus(backgroundConvolve, a, b, numpoints, wedge); // numpoints and wedge still switched And, you may already guess it, qgaus is declared as double qgaus(double (*func)(double, int), double a, double b, int n, int wedge) // numpoints and wedge stay switched This leads to the integration with a number of interpolation points of wedge (normally set to 82) instead of numpoints, respectively ap->convolve (set to 30 in current WUs). Or does it look different over at you? ID: 5274 · Rating: 0 · rate: / Reply Quote

Dave Przybylo Send message Joined: 5 Feb 08 Posts: 236 Credit: 49,648 RAC: 0	Message 5275 - Posted: 30 Sep 2008, 22:03:54 UTC - in response to Message 5274. Last modified: 30 Sep 2008, 22:07:12 UTC Well this was weird, i took a look over the code and even though the headers were wrong the right values were being passed to the functions :P I've updated the code to at least make it consistent. Well, that's not true for the version I am just looking at. Let's trace the function calls: In evalution.c it is called as stPbxConvolved(integral_point, ap->background_parameters, ap->wedge, ap->convolve); // wedge in penultimate position Within stPbxConvolved wedge is passed to the numpoints variable (because of the wrong ordering) that reads in probability.c double stPbxConvolved(const double* coordpar, const double* bpars, int numpoints, int wedge) // wedge and numpoints get switched! it is passed to qgaus also in the penultimate position as pbx = qgaus(backgroundConvolve, a, b, numpoints, wedge); // numpoints and wedge still switched And, you may already guess it, qgaus is declared as double qgaus(double (*func)(double, int), double a, double b, int n, int wedge) // numpoints and wedge stay switched This leads to the integration with a number of interpolation points of wedge (normally set to 82) instead of numpoints, respectively ap->convolve (set to 30 in current WUs). Or does it look different over at you? This looks to be the case. Good eye. Travis, when you check in the final code, I can change the function calls to match if that's what you want. I did follow it through and it does look like the wrong variables get passed to qgaus. Dave Przybylo MilkyWay@home Developer Department of Computer Science Rensselaer Polytechnic Institute ID: 5275 · Rating: 0 · rate: / Reply Quote

Cluster Physik Send message Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0	Message 5277 - Posted: 30 Sep 2008, 22:18:25 UTC - in response to Message 5275. Last modified: 30 Sep 2008, 22:18:59 UTC This looks to be the case. Good eye. Travis, when you check in the final code, I can change the function calls to match if that's what you want. I did follow it through and it does look like the wrong variables get passed to qgaus. When you are at it, you could strip the unnecessary weight out of qgaus. I guess this will help the performance a lot. ID: 5277 · Rating: 0 · rate: / Reply Quote

Dave Przybylo Send message Joined: 5 Feb 08 Posts: 236 Credit: 49,648 RAC: 0	Message 5278 - Posted: 30 Sep 2008, 22:35:01 UTC - in response to Message 5277. This looks to be the case. Good eye. Travis, when you check in the final code, I can change the function calls to match if that's what you want. I did follow it through and it does look like the wrong variables get passed to qgaus. When you are at it, you could strip the unnecessary weight out of qgaus. I guess this will help the performance a lot. I'll take a look at qgaus. Travis and Nate usually handle all the astronomy related coding. So I'd be looking at code I didn't know much about. Dave Przybylo MilkyWay@home Developer Department of Computer Science Rensselaer Polytechnic Institute ID: 5278 · Rating: 0 · rate: / Reply Quote

Milksop at try Send message Joined: 1 Oct 08 Posts: 106 Credit: 24,162,445 RAC: 0	Message 5283 - Posted: 1 Oct 2008, 17:38:47 UTC - in response to Message 5278. I'll take a look at qgaus. Travis and Nate usually handle all the astronomy related coding. So I'd be looking at code I didn't know much about. That's good. Frankly speaking, considering the code quality, I am surprised there are not more users with an optimized application around. You should hurry with the new version! ID: 5283 · Rating: 0 · rate: / Reply Quote

Webmaster Yoda Send message Joined: 21 Dec 07 Posts: 69 Credit: 7,048,412 RAC: 0	Message 5334 - Posted: 7 Oct 2008, 4:34:38 UTC - in response to Message 5283. Last modified: 7 Oct 2008, 4:35:25 UTC Frankly speaking, considering the code quality, I am surprised there are not more users with an optimized application around. You should hurry with the new version! Yeah, I'd like yours - I can't match the output of your Athlon X2 with more than a dozen cores of mine using the standard app... Glad to see you're not in a team, as it would really upset the balance of team statistics. Join the #1 Aussie Alliance on MilkyWay! ID: 5334 · Rating: 0 · rate: / Reply Quote

Milksop at try Send message Joined: 1 Oct 08 Posts: 106 Credit: 24,162,445 RAC: 0	Message 5346 - Posted: 7 Oct 2008, 22:30:20 UTC - in response to Message 5334. Yeah, I'd like yours - I can't match the output of your Athlon X2 with more than a dozen cores of mine using the standard app... Glad to see you're not in a team, as it would really upset the balance of team statistics. As mentioned in our profile, we use Virtual Machines to get enough WUs to crunch. So don't take the entries in the top computer list too serious. To give you some numbers of the total possible output for different CPUs: Athlon64 X2 3.2GHz: 140-145k/day 65nm Core2Quad 3.2GHz: 330-340k/day 45nm Core2Quad 3.2GHz: >400k/day And of course I am personally in a team. I only devoted some part of my farm to this Milksop account (together with a partner, the majority of the crunching power is actually his), which is intentionally a teamless account. We thought it would be an unfair advantage our team would get from the situation, so we refrained from joining our team with this account. I hope this will help to convince the people that we don't just try to grab a lot of credits. We really want to end the incredible waste of energy and crunching power by this project. ID: 5346 · Rating: 0 · rate: / Reply Quote

banditwolf Send message Joined: 12 Nov 07 Posts: 2425 Credit: 524,164 RAC: 0	Message 5355 - Posted: 8 Oct 2008, 2:14:52 UTC - in response to Message 5346. We really want to end the incredible waste of energy and crunching power by this project. I think many/most people want to use Cruncher's app(or any other) to speed along the wu's. Doesn't expecting the unexpected make the unexpected the expected? If it makes sense, DON'T do it. ID: 5355 · Rating: 0 · rate: / Reply Quote

Brian Silvers Send message Joined: 21 Aug 08 Posts: 625 Credit: 558,425 RAC: 0	Message 5404 - Posted: 9 Oct 2008, 14:46:35 UTC - in response to Message 5355. We really want to end the incredible waste of energy and crunching power by this project. I think many/most people want to use Cruncher's app(or any other) to speed along the wu's. Well, I don't know what all was done by Crunch3r, and these people don't have a V8 running (at least I don't think they do), but there was even more of a speedup for what he did. As I understand it, the claim is that all that has been "tuned" are redundant or unnecessary lines of code and/or ways to do the code better, but NOT the use of SSE or any of the SIMD optimizations and no compiler optimizations to favor a specific processor architecture. ID: 5404 · Rating: 0 · rate: / Reply Quote

speedimic Send message Joined: 22 Feb 08 Posts: 260 Credit: 57,387,048 RAC: 0	Message 5414 - Posted: 9 Oct 2008, 17:27:10 UTC Glad to see you're not in a team, as it would really upset the balance of team statistics. Fine. But they mess up the country stats. Go internetional! mic. ID: 5414 · Rating: 0 · rate: / Reply Quote

Milksop at try Send message Joined: 1 Oct 08 Posts: 106 Credit: 24,162,445 RAC: 0	Message 5416 - Posted: 9 Oct 2008, 18:59:08 UTC - in response to Message 5404. Last modified: 9 Oct 2008, 19:06:00 UTC Well, I don't know what all was done by Crunch3r, and these people don't have a V8 running (at least I don't think they do), but there was even more of a speedup for what he did. As I understand it, the claim is that all that has been "tuned" are redundant or unnecessary lines of code and/or ways to do the code better, but NOT the use of SSE or any of the SIMD optimizations and no compiler optimizations to favor a specific processor architecture. Yes, we don't have a V8 running. The list of computers in our profile is correct (some of the Quads don't even run 24/7). Our application is faster than Crunch3rs original app, but just two days ago or so, he had another look to the source and is now again more than a factor 2 faster than our app. He only didn't take the effort to install multiple VirtualMachines on his V8, so he runs into a number of limits. Potentially, he could calculate almost 6,000 260cr WUs a day with his V8 (worth 1.5 million). But you would need A LOT of VMs to get that to work. Furthermore he noted with some more effort, it could be even another factor of five to six faster (about 20s for a 260cr WU). And I believe that. Concerning the optimization towards a specific architecture one has to say, that it is not done in the code. But of course you can tell the compiler for what architecture it should optimize. But the additional speedup going from a P3 target to a Core 2 target (instruction scheduling optimized for Core2 and allowing SSSE3 intructions) only leads to a 15% to 20% improvement. I haven't looked if it actually generates SSE/2/3 instructions (I would think so as it is often faster even when not vectorized), but according to Crunch3r, his app has ~99% of its current speed even without the SSE stuff. So I guess the relatively minor speed increase comes mainly from the optimized instruction scheduling when specifying a Core 2 as target not from SSE/2/3 instructions. ID: 5416 · Rating: 0 · rate: / Reply Quote

Milksop at try Send message Joined: 1 Oct 08 Posts: 106 Credit: 24,162,445 RAC: 0	Message 5417 - Posted: 9 Oct 2008, 19:05:04 UTC - in response to Message 5414. But they mess up the country stats. You are not a patriot! I think in the US that short sentence would do it. lol ID: 5417 · Rating: 0 · rate: / Reply Quote

koschi Send message Joined: 10 Aug 08 Posts: 5 Credit: 12,071,988 RAC: 31,888	Message 5418 - Posted: 9 Oct 2008, 19:07:19 UTC Last modified: 9 Oct 2008, 19:08:20 UTC Then international guys come and complain, hey, don't f*k up international statistics, switch to the country you come from ;-) Guys, check the profile of Milksop at try and check out what boxes they are running and how big the improvements by some code clean up actually is! It is up to 75 faster than the projects standard app... Since I learned to know how inefficient the current official app is, I removed my C2D and C2Q from the project. Just an old AthlonXP remains to complete the 100.000 Credits, than it will also be removed if there is no progress from the project leaders side... edit: I was way to slow... ID: 5418 · Rating: 0 · rate: / Reply Quote

Milksop at try Send message Joined: 1 Oct 08 Posts: 106 Credit: 24,162,445 RAC: 0	Message 5419 - Posted: 9 Oct 2008, 19:12:26 UTC - in response to Message 5418. Guys, check the profile of Milksop at try and check out what boxes they are running and how big the improvements by some code clean up actually is! It is up to 75* faster than the projects standard app... But as I said, this is not the limit. Crunch3r thinks it is more at a factor of 1000 or so. ID: 5419 · Rating: 0 · rate: / Reply Quote

speedimic Send message Joined: 22 Feb 08 Posts: 260 Credit: 57,387,048 RAC: 0	Message 5420 - Posted: 9 Oct 2008, 19:36:24 UTC Last modified: 9 Oct 2008, 19:36:45 UTC You are not a patriot! I think in the US that short sentence would do it. lol Where is the patriotism when push all German users down one rank? I was talking about This Stat, not that one. But wait - keep on as German user - we might overtake the US within some days.... :-)) mic. ID: 5420 · Rating: 0 · rate: / Reply Quote