Message boards :
Application Code Discussion :
Possible Bug in stPbxConvolved
Message board moderation
Send message Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0 |
Hello! First post here, it seems. The discussion about the optimized binary compiled by Crunch3r sparked my interest, so I had a look at the source code. I noticed an inconsistency between the declarations of stPbxConvolved and stPsgConvolved in probability.h and their definitions in probability.c. In the header they are declared as

double stPbxConvolved(const double* coordpar, const double* bpars, int numpoints, int wedge);
double stPsgConvolved(const double* coordpar, const double* spars, int numpoints, int wedge);

But in the .c file they look like

double stPbxConvolved(const double* coordpar, const double* bpars, int numpoints, int wedge)

and

double stPsgConvolved(const double* coordpar, const double* spars, int wedge, int numpoints)

Note the switched positions of the parameters numpoints and wedge. Looking at how these two functions are invoked and how they call qgaus, I come to the conclusion that the header file simply doesn't reflect the correct ordering of the parameters, and that the implementation of stPbxConvolved has them in the wrong order. Effectively this leads to the integral in qgaus being calculated with the wrong number of interpolation points (wedge instead of numpoints, that means 82 instead of 30 with current WUs). At the moment it may not make much of a difference to the numerical results, but it will certainly break if the parameters of the WUs change in the future. A side effect is that the integration loop in qgaus runs more often than the WU requests, which costs some calculation speed.
Send message Joined: 5 Feb 08 Posts: 236 Credit: 49,648 RAC: 0 |
Hello! wow good catch. let me see if this has been fixed. Dave Przybylo MilkyWay@home Developer Department of Computer Science Rensselaer Polytechnic Institute |
Send message Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0 |
wow good catch. let me see if this has been fixed.

You are welcome. May I add a follow-up question? Besides the general inefficiency of the implementation of your algorithm, which Crunch3r highlighted in no uncertain terms and which is evident from the huge gains possible with an optimized application, it appears to me that the application spends an awful lot of time converting between different coordinate systems. A more efficient implementation can only alleviate this, not eliminate it. Have you thought about a different algorithm that makes these conversions unnecessary? In layman's terms: just stick to one cleverly chosen coordinate system, calculate your stuff, and you are done, faster even than Crunch3r's app, which still converts values between three different coordinate systems (I don't think he changed the algorithm). Might this be possible somehow? I know this probably means a lot of work, but the return would be a much more streamlined version.
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
Well, this was weird: I took a look over the code, and even though the headers were wrong, the right values were being passed to the functions :P I've updated the code to at least make it consistent. A lot of the astronomy code is pretty old legacy code (written by astronomy grad students, not computer science students), so I'm willing to bet there'll be more things like this in there :(
Send message Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0 |
Well this was weird, i took a look over the code and even though the headers were wrong the right values were being passed to the functions :P I've updated the code to at least make it consistent.

Well, that's not true for the version I am looking at right now. Let's trace the function calls. In evaluation.c it is called as

stPbxConvolved(integral_point, ap->background_parameters, ap->wedge, ap->convolve); // wedge in penultimate position

Within stPbxConvolved, wedge is passed into the numpoints variable (because of the wrong ordering), which in probability.c reads

double stPbxConvolved(const double* coordpar, const double* bpars, int numpoints, int wedge) // wedge and numpoints get switched!

From there it is passed on to qgaus, also in the penultimate position, as

pbx = qgaus(backgroundConvolve, a, b, numpoints, wedge); // numpoints and wedge still switched

And, as you may already guess, qgaus is declared as

double qgaus(double (*func)(double, int), double a, double b, int n, int wedge) // numpoints and wedge stay switched

This leads to an integration with wedge interpolation points (normally set to 82) instead of numpoints, i.e. ap->convolve (set to 30 in current WUs). Or does it look different on your end?
Send message Joined: 5 Feb 08 Posts: 236 Credit: 49,648 RAC: 0 |
Well this was weird, i took a look over the code and even though the headers were wrong the right values were being passed to the functions :P I've updated the code to at least make it consistent. This looks to be the case. Good eye. Travis, when you check in the final code, I can change the function calls to match if that's what you want. I did follow it through and it does look like the wrong variables get passed to qgaus. Dave Przybylo MilkyWay@home Developer Department of Computer Science Rensselaer Polytechnic Institute |
Send message Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0 |
This looks to be the case. Good eye. Travis, when you check in the final code, I can change the function calls to match if that's what you want. I did follow it through and it does look like the wrong variables get passed to qgaus.

While you are at it, you could strip the unnecessary weight out of qgaus. I guess this would help performance a lot.
Send message Joined: 5 Feb 08 Posts: 236 Credit: 49,648 RAC: 0 |
This looks to be the case. Good eye. Travis, when you check in the final code, I can change the function calls to match if that's what you want. I did follow it through and it does look like the wrong variables get passed to qgaus. I'll take a look at qgaus. Travis and Nate usually handle all the astronomy related coding. So I'd be looking at code I didn't know much about. Dave Przybylo MilkyWay@home Developer Department of Computer Science Rensselaer Polytechnic Institute |
Send message Joined: 1 Oct 08 Posts: 106 Credit: 24,162,445 RAC: 0 |
I'll take a look at qgaus. Travis and Nate usually handle all the astronomy related coding. So I'd be looking at code I didn't know much about.

That's good. Frankly speaking, considering the code quality, I am surprised there are not more users running an optimized application. You should hurry with the new version!
Send message Joined: 21 Dec 07 Posts: 69 Credit: 7,048,412 RAC: 0 |
Frankly speaking, considering the code quality, I am surprised there are not more users with an optimized application around. You should hurry with the new version! Yeah, I'd like yours - I can't match the output of your Athlon X2 with more than a dozen cores of mine using the standard app... Glad to see you're not in a team, as it would really upset the balance of team statistics. Join the #1 Aussie Alliance on MilkyWay! |
Send message Joined: 1 Oct 08 Posts: 106 Credit: 24,162,445 RAC: 0 |
Yeah, I'd like yours - I can't match the output of your Athlon X2 with more than a dozen cores of mine using the standard app... Glad to see you're not in a team, as it would really upset the balance of team statistics.

As mentioned in our profile, we use virtual machines to get enough WUs to crunch, so don't take the entries in the top computer list too seriously. To give you some numbers on the total possible output for different CPUs:

Athlon64 X2 3.2GHz: 140-145k/day
65nm Core2Quad 3.2GHz: 330-340k/day
45nm Core2Quad 3.2GHz: >400k/day

And of course I am personally in a team. I only devoted part of my farm to this Milksop account (together with a partner; the majority of the crunching power is actually his), which is intentionally a teamless account. We thought our team would get an unfair advantage from the situation, so we refrained from joining our team with this account. I hope this helps convince people that we are not just trying to grab a lot of credits. We really want to end the incredible waste of energy and crunching power by this project.
Send message Joined: 12 Nov 07 Posts: 2425 Credit: 524,164 RAC: 0 |
We really want to end the incredible waste of energy and crunching power by this project. I think many/most people want to use Crunch3r's app (or any other) to speed along the WUs. Doesn't expecting the unexpected make the unexpected the expected? If it makes sense, DON'T do it.
Send message Joined: 21 Aug 08 Posts: 625 Credit: 558,425 RAC: 0 |
We really want to end the incredible waste of energy and crunching power by this project.

Well, I don't know everything that was done by Crunch3r, and these people don't have a V8 running (at least I don't think they do), but his changes gave an even bigger speedup. As I understand it, the claim is that all that has been "tuned" is redundant or unnecessary code, and better ways of writing it, but NOT the use of SSE or any other SIMD optimizations, and no compiler optimizations favoring a specific processor architecture.
Send message Joined: 22 Feb 08 Posts: 260 Credit: 57,387,048 RAC: 0 |
Glad to see you're not in a team, as it would really upset the balance of team statistics. Fine. But they mess up the country stats. Go internetional! mic. |
Send message Joined: 1 Oct 08 Posts: 106 Credit: 24,162,445 RAC: 0 |
Well, I don't know what all was done by Crunch3r, and these people don't have a V8 running (at least I don't think they do), but there was even more of a speedup for what he did.

Yes, we don't have a V8 running. The list of computers in our profile is correct (some of the Quads don't even run 24/7). Our application is faster than Crunch3r's original app, but just two days ago or so he had another look at the source and is now again more than a factor of 2 faster than our app. He only didn't take the effort to install multiple virtual machines on his V8, so he runs into a number of limits. Potentially, he could calculate almost 6,000 260cr WUs a day with his V8 (worth 1.5 million). But you would need A LOT of VMs to get that to work. Furthermore, he noted that with some more effort it could be another factor of five to six faster still (about 20s for a 260cr WU). And I believe that. Concerning optimization towards a specific architecture, one has to say that it is not done in the code. But of course you can tell the compiler what architecture it should optimize for. Still, the additional speedup going from a P3 target to a Core 2 target (instruction scheduling optimized for Core 2 and allowing SSSE3 instructions) only amounts to a 15% to 20% improvement. I haven't checked whether it actually generates SSE/2/3 instructions (I would think so, as it is often faster even when not vectorized), but according to Crunch3r, his app has ~99% of its current speed even without the SSE stuff. So I guess the relatively minor speed increase comes mainly from the instruction scheduling optimized for a Core 2 target, not from SSE/2/3 instructions.
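For readers wanting to reproduce this kind of target-specific build, the distinction above maps onto compiler flags rather than source changes. A hypothetical gcc invocation (file names are placeholders; the project's real build system may differ):

```shell
# Generic baseline, roughly comparable to the stock application's target
gcc -O2 -o milkyway_generic probability.c evaluation.c -lm

# Core 2 target: SSSE3 instructions allowed, scheduling tuned for Core 2
gcc -O3 -march=core2 -mtune=core2 -o milkyway_core2 probability.c evaluation.c -lm
```

With -march=core2 the compiler may emit SSE/SSE2/SSE3/SSSE3 instructions freely, but as the post notes, most of the observed 15-20% gain is attributed to scheduling, not vectorization.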
Send message Joined: 1 Oct 08 Posts: 106 Credit: 24,162,445 RAC: 0 |
But they mess up the country stats. You are not a patriot! I think in the US that short sentence would do it. *lol* |
Send message Joined: 10 Aug 08 Posts: 5 Credit: 10,415,493 RAC: 24,561 |
Then the international guys come and complain: hey, don't f**k up the international statistics, switch to the country you come from ;-) Guys, check the profile of Milksop at try and check out what boxes they are running and how big the improvements from some code cleanup actually are! It is up to 75× faster than the project's standard app... Since I learned how inefficient the current official app is, I removed my C2D and C2Q from the project. Just an old AthlonXP remains to complete the 100,000 credits; then it will also be removed if there is no progress from the project leaders' side... edit: I was way too slow...
Send message Joined: 1 Oct 08 Posts: 106 Credit: 24,162,445 RAC: 0 |
Guys, check the profile of Milksop at try and check out what boxes they are running and how big the improvements from some code cleanup actually are!

But as I said, this is not the limit. Crunch3r thinks it is more like a factor of 1000 or so.
©2024 Astroinformatics Group