Welcome to MilkyWay@home

Possible Bug in stPbxConvolved

Message boards : Application Code Discussion : Possible Bug in stPbxConvolved
Message board moderation

To post messages, you must log in.

AuthorMessage
Cluster Physik

Send message
Joined: 26 Jul 08
Posts: 627
Credit: 94,940,203
RAC: 0
Message 5267 - Posted: 30 Sep 2008, 16:01:34 UTC

Hello!

First post here it seems. The discussion about the optimized binary compiled by Crunch3r sparked my interest and I had a look at the source code.
I noticed an inconsistency in the declaration of the stPsgConvolved in probability.h and .c respectively.

In the header it is declared

double stPbxConvolved(const double* coordpar, const double* bpars, int numpoints, int wedge);
double stPsgConvolved(const double* coordpar, const double* spars, int numpoints, int wedge);

But in the .c file it looks like

double stPbxConvolved(const double* coordpar, const double* bpars, int numpoints, int wedge)
and
double stPsgConvolved(const double* coordpar, const double* spars, int wedge, int numpoints)

Note the switched positions of the parameters numpoints and wedge. When looking at how these two function are invoked and call qgaus, I come to the conclusion that the header file simply doesn't reflect the correct ordering of the parameters and in the implementation it is in the wrong order for stPbxConvolved.
Effectively this leads to the calculation of the integral in qgaus with a wrong number of interpolation points (wedge instead of numpoints, that means 82 instead of 30 with current WUs).

In the moment, it may not make much of a difference to the numerical results, but it will certainly break if the parameters of the WUs may change in the future.
A side effect is that the integration loops more often in qgaus as requested by the WU, which costs some calculation speed.
ID: 5267 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Profile Dave Przybylo
Avatar

Send message
Joined: 5 Feb 08
Posts: 236
Credit: 49,648
RAC: 0
Message 5271 - Posted: 30 Sep 2008, 18:31:39 UTC - in response to Message 5267.  

Hello!

First post here it seems. The discussion about the optimized binary compiled by Crunch3r sparked my interest and I had a look at the source code.
I noticed an inconsistency in the declaration of the stPsgConvolved in probability.h and .c respectively.

In the header it is declared

double stPbxConvolved(const double* coordpar, const double* bpars, int numpoints, int wedge);
double stPsgConvolved(const double* coordpar, const double* spars, int numpoints, int wedge);

But in the .c file it looks like

double stPbxConvolved(const double* coordpar, const double* bpars, int numpoints, int wedge)
and
double stPsgConvolved(const double* coordpar, const double* spars, int wedge, int numpoints)

Note the switched positions of the parameters numpoints and wedge. When looking at how these two function are invoked and call qgaus, I come to the conclusion that the header file simply doesn't reflect the correct ordering of the parameters and in the implementation it is in the wrong order for stPbxConvolved.
Effectively this leads to the calculation of the integral in qgaus with a wrong number of interpolation points (wedge instead of numpoints, that means 82 instead of 30 with current WUs).

In the moment, it may not make much of a difference to the numerical results, but it will certainly break if the parameters of the WUs may change in the future.
A side effect is that the integration loops more often in qgaus as requested by the WU, which costs some calculation speed.



wow good catch. let me see if this has been fixed.
Dave Przybylo
MilkyWay@home Developer
Department of Computer Science
Rensselaer Polytechnic Institute
ID: 5271 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Cluster Physik

Send message
Joined: 26 Jul 08
Posts: 627
Credit: 94,940,203
RAC: 0
Message 5272 - Posted: 30 Sep 2008, 19:57:38 UTC - in response to Message 5271.  
Last modified: 30 Sep 2008, 20:05:09 UTC

wow good catch. let me see if this has been fixed.

You are welcome.

May I add a follow-on question?

Besides the general inefficiency of the implementation of your algorithm that was highlighted by Crunch3r in no unclear terms and is evident by the huge gains possible with an optimized application, it appears to me the application spends an awful lot of time in converting between different coordinate sytems. This can only be alleviated by a more efficient implementation, not eliminated. Have you thought about a different algorithm, that makes this conversions unnecessary?
In layman terms this means: just stick to a cleverly chosen coordinate system, calculate your stuff and you are done faster than even with Crunch3rs app that still convert values between 3 different coordinate systems (I don't think he changed the algorithm).

May this be somehow possible? I know, this probably means a lot of work to do, but the return would be a much more streamlined version.
ID: 5272 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Profile Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Send message
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 5273 - Posted: 30 Sep 2008, 20:07:11 UTC - in response to Message 5271.  

Well this was weird, i took a look over the code and even though the headers were wrong the right values were being passed to the functions :P I've updated the code to at least make it consistent. A lot of the astronomy code is pretty old legacy code (written by astronomy grad students and not computer science students) so i'm willing to bet there'll be more things like this in there :(
ID: 5273 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Cluster Physik

Send message
Joined: 26 Jul 08
Posts: 627
Credit: 94,940,203
RAC: 0
Message 5274 - Posted: 30 Sep 2008, 20:37:29 UTC - in response to Message 5273.  
Last modified: 30 Sep 2008, 20:38:02 UTC

Well this was weird, i took a look over the code and even though the headers were wrong the right values were being passed to the functions :P I've updated the code to at least make it consistent.

Well, that's not true for the version I am just looking at.

Let's trace the function calls:
In evalution.c it is called as

stPbxConvolved(integral_point, ap->background_parameters, ap->wedge, ap->convolve); // wedge in penultimate position

Within stPbxConvolved wedge is passed to the numpoints variable (because of the wrong ordering) that reads in probability.c

double stPbxConvolved(const double* coordpar, const double* bpars, int numpoints, int wedge) // wedge and numpoints get switched!

it is passed to qgaus also in the penultimate position as

pbx = qgaus(backgroundConvolve, a, b, numpoints, wedge); // numpoints and wedge still switched

And, you may already guess it, qgaus is declared as

double qgaus(double (*func)(double, int), double a, double b, int n, int wedge) // numpoints and wedge stay switched

This leads to the integration with a number of interpolation points of wedge (normally set to 82) instead of numpoints, respectively ap->convolve (set to 30 in current WUs).

Or does it look different over at you?
ID: 5274 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Profile Dave Przybylo
Avatar

Send message
Joined: 5 Feb 08
Posts: 236
Credit: 49,648
RAC: 0
Message 5275 - Posted: 30 Sep 2008, 22:03:54 UTC - in response to Message 5274.  
Last modified: 30 Sep 2008, 22:07:12 UTC

Well this was weird, i took a look over the code and even though the headers were wrong the right values were being passed to the functions :P I've updated the code to at least make it consistent.

Well, that's not true for the version I am just looking at.

Let's trace the function calls:
In evalution.c it is called as

stPbxConvolved(integral_point, ap->background_parameters, ap->wedge, ap->convolve); // wedge in penultimate position

Within stPbxConvolved wedge is passed to the numpoints variable (because of the wrong ordering) that reads in probability.c

double stPbxConvolved(const double* coordpar, const double* bpars, int numpoints, int wedge) // wedge and numpoints get switched!

it is passed to qgaus also in the penultimate position as

pbx = qgaus(backgroundConvolve, a, b, numpoints, wedge); // numpoints and wedge still switched

And, you may already guess it, qgaus is declared as

double qgaus(double (*func)(double, int), double a, double b, int n, int wedge) // numpoints and wedge stay switched

This leads to the integration with a number of interpolation points of wedge (normally set to 82) instead of numpoints, respectively ap->convolve (set to 30 in current WUs).

Or does it look different over at you?


This looks to be the case. Good eye. Travis, when you check in the final code, I can change the function calls to match if that's what you want. I did follow it through and it does look like the wrong variables get passed to qgaus.
Dave Przybylo
MilkyWay@home Developer
Department of Computer Science
Rensselaer Polytechnic Institute
ID: 5275 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Cluster Physik

Send message
Joined: 26 Jul 08
Posts: 627
Credit: 94,940,203
RAC: 0
Message 5277 - Posted: 30 Sep 2008, 22:18:25 UTC - in response to Message 5275.  
Last modified: 30 Sep 2008, 22:18:59 UTC

This looks to be the case. Good eye. Travis, when you check in the final code, I can change the function calls to match if that's what you want. I did follow it through and it does look like the wrong variables get passed to qgaus.

When you are at it, you could strip the unnecessary weight out of qgaus. I guess this will help the performance a lot.
ID: 5277 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Profile Dave Przybylo
Avatar

Send message
Joined: 5 Feb 08
Posts: 236
Credit: 49,648
RAC: 0
Message 5278 - Posted: 30 Sep 2008, 22:35:01 UTC - in response to Message 5277.  

This looks to be the case. Good eye. Travis, when you check in the final code, I can change the function calls to match if that's what you want. I did follow it through and it does look like the wrong variables get passed to qgaus.

When you are at it, you could strip the unnecessary weight out of qgaus. I guess this will help the performance a lot.


I'll take a look at qgaus. Travis and Nate usually handle all the astronomy related coding. So I'd be looking at code I didn't know much about.
Dave Przybylo
MilkyWay@home Developer
Department of Computer Science
Rensselaer Polytechnic Institute
ID: 5278 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Milksop at try

Send message
Joined: 1 Oct 08
Posts: 106
Credit: 24,162,445
RAC: 0
Message 5283 - Posted: 1 Oct 2008, 17:38:47 UTC - in response to Message 5278.  

I'll take a look at qgaus. Travis and Nate usually handle all the astronomy related coding. So I'd be looking at code I didn't know much about.

That's good.
Frankly speaking, considering the code quality, I am surprised there are not more users with an optimized application around. You should hurry with the new version!
ID: 5283 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Profile Webmaster Yoda
Avatar

Send message
Joined: 21 Dec 07
Posts: 69
Credit: 7,048,412
RAC: 0
Message 5334 - Posted: 7 Oct 2008, 4:34:38 UTC - in response to Message 5283.  
Last modified: 7 Oct 2008, 4:35:25 UTC

Frankly speaking, considering the code quality, I am surprised there are not more users with an optimized application around. You should hurry with the new version!


Yeah, I'd like yours - I can't match the output of your Athlon X2 with more than a dozen cores of mine using the standard app... Glad to see you're not in a team, as it would really upset the balance of team statistics.
Join the #1 Aussie Alliance on MilkyWay!
ID: 5334 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Milksop at try

Send message
Joined: 1 Oct 08
Posts: 106
Credit: 24,162,445
RAC: 0
Message 5346 - Posted: 7 Oct 2008, 22:30:20 UTC - in response to Message 5334.  

Yeah, I'd like yours - I can't match the output of your Athlon X2 with more than a dozen cores of mine using the standard app... Glad to see you're not in a team, as it would really upset the balance of team statistics.

As mentioned in our profile, we use Virtual Machines to get enough WUs to crunch. So don't take the entries in the top computer list too serious. To give you some numbers of the total possible output for different CPUs:

Athlon64 X2 3.2GHz: 140-145k/day
65nm Core2Quad 3.2GHz: 330-340k/day
45nm Core2Quad 3.2GHz: >400k/day

And of course I am personally in a team. I only devoted some part of my farm to this Milksop account (together with a partner, the majority of the crunching power is actually his), which is intentionally a teamless account. We thought it would be an unfair advantage our team would get from the situation, so we refrained from joining our team with this account.
I hope this will help to convince the people that we don't just try to grab a lot of credits. We really want to end the incredible waste of energy and crunching power by this project.
ID: 5346 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Profile banditwolf
Avatar

Send message
Joined: 12 Nov 07
Posts: 2425
Credit: 524,164
RAC: 0
Message 5355 - Posted: 8 Oct 2008, 2:14:52 UTC - in response to Message 5346.  

We really want to end the incredible waste of energy and crunching power by this project.


I think many/most people want to use Cruncher's app(or any other) to speed along the wu's.

Doesn't expecting the unexpected make the unexpected the expected?
If it makes sense, DON'T do it.
ID: 5355 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Brian Silvers

Send message
Joined: 21 Aug 08
Posts: 625
Credit: 558,425
RAC: 0
Message 5404 - Posted: 9 Oct 2008, 14:46:35 UTC - in response to Message 5355.  

We really want to end the incredible waste of energy and crunching power by this project.


I think many/most people want to use Cruncher's app(or any other) to speed along the wu's.


Well, I don't know what all was done by Crunch3r, and these people don't have a V8 running (at least I don't think they do), but there was even more of a speedup for what he did.

As I understand it, the claim is that all that has been "tuned" are redundant or unnecessary lines of code and/or ways to do the code better, but NOT the use of SSE or any of the SIMD optimizations and no compiler optimizations to favor a specific processor architecture.
ID: 5404 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Profile speedimic
Avatar

Send message
Joined: 22 Feb 08
Posts: 260
Credit: 57,387,048
RAC: 0
Message 5414 - Posted: 9 Oct 2008, 17:27:10 UTC

Glad to see you're not in a team, as it would really upset the balance of team statistics.


Fine.
But they mess up the country stats.

Go internetional!

mic.


ID: 5414 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Milksop at try

Send message
Joined: 1 Oct 08
Posts: 106
Credit: 24,162,445
RAC: 0
Message 5416 - Posted: 9 Oct 2008, 18:59:08 UTC - in response to Message 5404.  
Last modified: 9 Oct 2008, 19:06:00 UTC

Well, I don't know what all was done by Crunch3r, and these people don't have a V8 running (at least I don't think they do), but there was even more of a speedup for what he did.

As I understand it, the claim is that all that has been "tuned" are redundant or unnecessary lines of code and/or ways to do the code better, but NOT the use of SSE or any of the SIMD optimizations and no compiler optimizations to favor a specific processor architecture.

Yes, we don't have a V8 running. The list of computers in our profile is correct (some of the Quads don't even run 24/7). Our application is faster than Crunch3rs original app, but just two days ago or so, he had another look to the source and is now again more than a factor 2 faster than our app. He only didn't take the effort to install multiple VirtualMachines on his V8, so he runs into a number of limits. Potentially, he could calculate almost 6,000 260cr WUs a day with his V8 (worth 1.5 million). But you would need A LOT of VMs to get that to work. Furthermore he noted with some more effort, it could be even another factor of five to six faster (about 20s for a 260cr WU). And I believe that.

Concerning the optimization towards a specific architecture one has to say, that it is not done in the code. But of course you can tell the compiler for what architecture it should optimize. But the additional speedup going from a P3 target to a Core 2 target (instruction scheduling optimized for Core2 and allowing SSSE3 intructions) only leads to a 15% to 20% improvement. I haven't looked if it actually generates SSE/2/3 instructions (I would think so as it is often faster even when not vectorized), but according to Crunch3r, his app has ~99% of its current speed even without the SSE stuff. So I guess the relatively minor speed increase comes mainly from the optimized instruction scheduling when specifying a Core 2 as target not from SSE/2/3 instructions.
ID: 5416 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Milksop at try

Send message
Joined: 1 Oct 08
Posts: 106
Credit: 24,162,445
RAC: 0
Message 5417 - Posted: 9 Oct 2008, 19:05:04 UTC - in response to Message 5414.  

But they mess up the country stats.

You are not a patriot!

I think in the US that short sentence would do it. *lol*
ID: 5417 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
koschi

Send message
Joined: 10 Aug 08
Posts: 5
Credit: 10,367,691
RAC: 22,144
Message 5418 - Posted: 9 Oct 2008, 19:07:19 UTC
Last modified: 9 Oct 2008, 19:08:20 UTC

Then international guys come and complain, hey, don't f**k up international statistics, switch to the country you come from ;-)

Guys, check the profile of Milksop at try and check out what boxes they are running and how big the improvements by some code clean up actually is!
It is up to 75* faster than the projects standard app...

Since I learned to know how inefficient the current official app is, I removed my C2D and C2Q from the project.
Just an old AthlonXP remains to complete the 100.000 Credits, than it will also be removed if there is no progress from the project leaders side...

edit:

I was way to slow...
ID: 5418 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Milksop at try

Send message
Joined: 1 Oct 08
Posts: 106
Credit: 24,162,445
RAC: 0
Message 5419 - Posted: 9 Oct 2008, 19:12:26 UTC - in response to Message 5418.  

Guys, check the profile of Milksop at try and check out what boxes they are running and how big the improvements by some code clean up actually is!
It is up to 75* faster than the projects standard app...

But as I said, this is not the limit. Crunch3r thinks it is more at a factor of 1000 or so.
ID: 5419 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Profile speedimic
Avatar

Send message
Joined: 22 Feb 08
Posts: 260
Credit: 57,387,048
RAC: 0
Message 5420 - Posted: 9 Oct 2008, 19:36:24 UTC
Last modified: 9 Oct 2008, 19:36:45 UTC

You are not a patriot!

I think in the US that short sentence would do it. *lol*


Where is the patriotism when push all German users down one rank?

I was talking about This Stat, not that one.

But wait - keep on as German user - we might overtake the US within some days....

:-))
mic.


ID: 5420 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote

Message boards : Application Code Discussion : Possible Bug in stPbxConvolved

©2024 Astroinformatics Group