Welcome to MilkyWay@home

source v0.14 released



Message boards : Application Code Discussion : source v0.14 released


Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 8938 - Posted: 24 Jan 2009, 19:22:01 UTC
Last modified: 24 Jan 2009, 19:37:43 UTC

I've released the source for v0.14 in the code release directory.

This should fix the checkpoint errors as discussed in this thread.

Additionally, I was able to remove a multiply and a divide from the inner loop of calculate_integrals, which gave about a 4% performance increase for the application here:

ir[i] = ((next_r * next_r * next_r) - (r * r * r))/3.0;
to
line 401: irv[i] = (((next_r * next_r * next_r) - (r * r * r))/3.0) * ia->mu_step_size / deg;

and
V = ir[ia->r_step_current] * ia->mu_step_size / deg;
to
line 477: V = irv[ia->r_step_current] * id;
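A minimal C sketch of the change above, with the two versions side by side. The integral_area struct is a hypothetical stand-in holding only the fields named in the post. The helper names fill_ir, fill_irv and inner_v_old are made up. The extra factor id on line 477 is left out because its definition isn't shown here. The point is that mu_step_size / deg doesn't change inside the integration loops, so it can be folded into the per-r table once:

```c
#include <assert.h>

/* Hypothetical stand-in for the application's integral-area struct:
   only the fields named in the post are modelled. */
typedef struct {
    int    r_steps;
    double mu_step_size;
} integral_area;

/* Old line 401: store only the shell volume.
   r_edges holds r_steps + 1 radii. */
void fill_ir(double *ir, const double *r_edges, const integral_area *ia)
{
    for (int i = 0; i < ia->r_steps; i++) {
        double r = r_edges[i], next_r = r_edges[i + 1];
        ir[i] = ((next_r * next_r * next_r) - (r * r * r)) / 3.0;
    }
}

/* Old inner loop: a multiply and a divide on every iteration. */
double inner_v_old(const double *ir, const integral_area *ia,
                   int r_step_current, double deg)
{
    return ir[r_step_current] * ia->mu_step_size / deg;
}

/* New line 401: fold mu_step_size / deg in once, before the loops,
   so the inner loop only needs to read irv[r_step_current]. */
void fill_irv(double *irv, const double *r_edges, const integral_area *ia,
              double deg)
{
    assert(deg != 0.0);
    for (int i = 0; i < ia->r_steps; i++) {
        double r = r_edges[i], next_r = r_edges[i + 1];
        irv[i] = (((next_r * next_r * next_r) - (r * r * r)) / 3.0)
                 * ia->mu_step_size / deg;
    }
}
```

Because fill_irv performs the same operations in the same order as the old per-iteration code, irv[i] should match inner_v_old for the same index. The saving is that the work happens r_steps times instead of once per inner-loop pass.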

I also added two new #defines, NEW_FORMULA and WEDGE_ALLOW_ZERO, which removed some of the conditionals from the inner loop. I also split calculate_integral into calculate_integral_convolved and calculate_integral_unconvolved (calculate_integral_unconvolved is pretty much deprecated), and made a similar change in calculate_likelihood. This removed the warnings evaluation_optimized.c was throwing and gave a slight performance improvement as well.
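Here is a rough sketch of how a #define can take a conditional out of the inner loop. The macro name NEW_FORMULA comes from the post. The function names and both formula bodies below are placeholders, not the application's actual expressions:

```c
#include <assert.h>

/* Compile-time switch (name from the post); comment this line out to
   build the placeholder "old" formula instead. */
#define NEW_FORMULA

/* Before: a run-time flag is tested on every pass through the loop. */
double term_runtime(double x, int use_new_formula)
{
    if (use_new_formula)
        return x * x;           /* placeholder "new" formula */
    return x * x + 1.0;         /* placeholder "old" formula */
}

/* After: the preprocessor keeps exactly one version, so no branch is
   compiled into the loop body. */
double term_compiletime(double x)
{
#ifdef NEW_FORMULA
    return x * x;
#else
    return x * x + 1.0;
#endif
}

/* Example inner loop using the branch-free version. */
double sum_terms(const double *xs, int n)
{
    assert(n >= 0);
    double total = 0.0;
    for (int i = 0; i < n; i++)
        total += term_compiletime(xs[i]);
    return total;
}
```

The trade-off is that switching formulas now means rebuilding the binary rather than changing an input flag.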

Again, when compiling the new version, please use the

APP_VERSION = 0.14
APP_NAME = your_app_name

-DBOINC_APP_VERSION=$(APP_VERSION)
and
-DBOINC_APP_NAME='"$(APP_NAME)"'

flags to make sure your binary gets credit. I've also updated the set_parameters.sh script, so to test a WU you just need to run:
./set_parameters

The results should be the same as in the last post:

79:
searchname
parameters [8]: 0.342173733203920 25.951791084662300 -2.170941473882660 38.272511356953906 30.225190442596112 2.214906001337289 0.323161690642917 2.774024471628528
metadata: this is the metadata
fitness: -2.946683357256020
your_app_name: 0.14

82:
searchname
parameters [8]: 0.405879611547422 17.529961843393409 -1.857514527214484 29.360893891378243 31.228263575178566 -1.551741065334000 0.064096152599308 2.554282099127810
metadata: this is the metadata
fitness: -2.985569777902147
your_app_name: 0.14

86:
searchname
parameters [8]: 0.733171635575244 14.657212876628332 -1.705465347395041 16.911711745343634 28.077212666463502 -1.203290851581461 3.527360643924728 2.224821450587501
metadata: this is the metadata
fitness: -3.027909854710189
your_app_name: 0.14
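The "your_app_name: 0.14" line at the end of each result is where the two compile flags show up. The post doesn't show how the application consumes BOINC_APP_VERSION and BOINC_APP_NAME, so the following is only an assumed sketch. The helper format_app_line and the fallback values are hypothetical. The '"$(APP_NAME)"' quoting in the makefile matters because, once the shell strips the single quotes, the compiler sees the name as a C string literal:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed fallbacks so the file also builds without -D; normally set by
     -DBOINC_APP_VERSION=$(APP_VERSION) -DBOINC_APP_NAME='"$(APP_NAME)"' */
#ifndef BOINC_APP_VERSION
#define BOINC_APP_VERSION 0.14
#endif
#ifndef BOINC_APP_NAME
#define BOINC_APP_NAME "your_app_name"
#endif

/* Hypothetical helper: writes "name: version" into buf. */
int format_app_line(char *buf, size_t len)
{
    int n = snprintf(buf, len, "%s: %.2f", BOINC_APP_NAME,
                     (double)BOINC_APP_VERSION);
    assert(n >= 0 && (size_t)n < len);   /* output was not truncated */
    return n;
}
```

If APP_NAME were passed without the inner double quotes, BOINC_APP_NAME would expand to a bare identifier and this sketch would fail to compile.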
ebahapo

Joined: 6 Sep 07
Posts: 66
Credit: 636,861
RAC: 0
Message 8973 - Posted: 24 Jan 2009, 22:44:37 UTC - in response to Message 8938.  

ir[i] = ((next_r * next_r * next_r) - (r * r * r))/3.0;
to
line 401: irv[i] = (((next_r * next_r * next_r) - (r * r * r))/3.0) * ia->mu_step_size / deg;

You could remove yet another division by changing line 401 to:

irv [i]  = ((next_r * next_r * next_r) - (r * r * r)) * ia->mu_step_size / (3.0 * deg);

Since a division is typically 10x slower than a multiplication, it could improve the performance of this line alone by about 40%.

HTH
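Below is a small C sketch comparing line 401 as released with ebahapo's one-division variant. The function names and sample arguments are hypothetical. The two forms round at different points, so in general they are not bit-for-bit identical. They should differ only by rounding error:

```c
#include <assert.h>

/* Line 401 as released in v0.14: two divisions. */
double irv_two_divs(double r, double next_r, double mu_step_size, double deg)
{
    return (((next_r * next_r * next_r) - (r * r * r)) / 3.0)
           * mu_step_size / deg;
}

/* ebahapo's variant: 3.0 folded into the denominator, one division.
   (3.0 * deg) does not depend on the cubes, so it can be computed
   alongside them. */
double irv_one_div(double r, double next_r, double mu_step_size, double deg)
{
    assert(deg != 0.0);
    return ((next_r * next_r * next_r) - (r * r * r))
           * mu_step_size / (3.0 * deg);
}
```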
Cluster Physik

Joined: 26 Jul 08
Posts: 627
Credit: 94,940,203
RAC: 0
Message 8975 - Posted: 24 Jan 2009, 22:58:57 UTC - in response to Message 8973.  

You could remove yet another division by changing line 401 to:

irv [i]  = ((next_r * next_r * next_r) - (r * r * r)) * ia->mu_step_size / (3.0 * deg);

Since a division is typically 10x slower than a multiplication, it could improve the performance of this line alone by about 40%.

As it is divided by a constant that is known at compile time (the 3.0 is hardcoded), any decent compiler will replace it with a multiplication by 1/3 (calculated at compile time) either way. These kinds of changes are only necessary if one uses an ancient compiler or turns off optimizations. With -O2 or even -O3 I bet it won't make a difference.
And even if it were not a literal, with the fast-math option the compiler should do this kind of optimization even for variables. Compilers have gotten quite clever at such things.

But from my old days, when compilers didn't optimize that well, I also prefer doing such things by hand. One never knows ;)
ebahapo

Joined: 6 Sep 07
Posts: 66
Credit: 636,861
RAC: 0
Message 8977 - Posted: 24 Jan 2009, 23:23:43 UTC - in response to Message 8975.  

As it is divided by a constant that is known at compile time (the 3.0 is hardcoded), any decent compiler will replace it with a multiplication by 1/3 (calculated at compile time) either way. These kinds of changes are only necessary if one uses an ancient compiler or turns off optimizations.

Actually, for floating-point data, the compiler only performs this optimization automatically with -ffast-math. And since this option cannot be used for this project, tipping the scales for the compiler by hand is a good rule of thumb.
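This example makes the point concrete. Under strict IEEE-754 double arithmetic (no -ffast-math, and doubles evaluated in double precision as on x86-64 with SSE2), x / 3.0 and x * (1.0 / 3.0) are not always equal, because 1.0 / 3.0 is itself rounded. Division by a power of two such as 2.0 is different: its reciprocal is exact, so compilers such as GCC can swap in a multiplication without -ffast-math. The sketch below counts integer inputs where the two forms disagree; x = 5 is one of them. The function names are hypothetical:

```c
#include <assert.h>

/* The two forms a compiler would have to treat as interchangeable. */
double div3(double x)      { return x / 3.0; }
double mul_third(double x) { return x * (1.0 / 3.0); }

/* Counts integers 1..n for which the two forms give different doubles. */
int count_mismatches(int n)
{
    assert(n >= 0);
    int count = 0;
    for (int i = 1; i <= n; i++)
        if (div3((double)i) != mul_third((double)i))
            count++;
    return count;
}
```

Because the results can differ in the last bit, a standards-conforming compiler has to keep the division unless it is told that such differences are acceptable.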

Moreover, since the compiler does not change the order of floating-point computations, this code should have an edge too:

irv [i] = ((next_r * next_r * next_r * ia->mu_step_size) - (r * r * r * ia->mu_step_size)) / (3.0 * deg);

Shortening the dependency chain reduces the latency of long operations like these on out-of-order processors.

HTH
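The compiler won't regroup this expression on its own because floating-point arithmetic is not associative: changing the grouping can change the result. The sketch below shows the classic 0.1 + 0.2 + 0.3 case and writes out ebahapo's regrouped line 401 as a function. The names are hypothetical, and the code does not measure whether the regrouping actually lowers latency on a given CPU:

```c
#include <assert.h>

/* The same three values summed with different grouping. */
double sum_left(double a, double b, double c)  { return (a + b) + c; }
double sum_right(double a, double b, double c) { return a + (b + c); }

/* ebahapo's regrouped line 401: each cube is scaled by mu_step_size
   before the subtraction, then one division by (3.0 * deg). */
double irv_regrouped(double r, double next_r, double mu_step_size, double deg)
{
    assert(deg != 0.0);
    double outer = next_r * next_r * next_r * mu_step_size;
    double inner = r * r * r * mu_step_size;
    return (outer - inner) / (3.0 * deg);
}
```

With these inputs, (0.1 + 0.2) + 0.3 rounds to 0.6000000000000001 while 0.1 + (0.2 + 0.3) gives 0.6. This is the kind of difference a strict compiler must preserve.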

John Clark

Joined: 4 Oct 08
Posts: 1734
Credit: 64,228,409
RAC: 0
Message 8983 - Posted: 25 Jan 2009, 0:22:14 UTC
Last modified: 25 Jan 2009, 0:32:23 UTC

Is this release one that will download on a detach and reattach, or will 0.13 for Windows be pulled down again?

No need to answer, as I see I have downloaded 0.14 WUs. So I assume BOINC has already downloaded the latest Windows MW client.
Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 9015 - Posted: 25 Jan 2009, 2:08:22 UTC - in response to Message 8977.  
Last modified: 25 Jan 2009, 2:08:57 UTC

As it is divided by a constant that is known at compile time (the 3.0 is hardcoded), any decent compiler will replace it with a multiplication by 1/3 (calculated at compile time) either way. These kinds of changes are only necessary if one uses an ancient compiler or turns off optimizations.

Actually, for floating-point data, the compiler only performs this optimization automatically with -ffast-math. And since this option cannot be used for this project, tipping the scales for the compiler by hand is a good rule of thumb.

Moreover, since the compiler does not change the order of floating-point computations, this code should have an edge too:

irv [i] = ((next_r * next_r * next_r * ia->mu_step_size) - (r * r * r * ia->mu_step_size)) / (3.0 * deg);

Shortening the dependency chain reduces the latency of long operations like these on out-of-order processors.

HTH


The Linux and OS X binaries are compiled with -ffast-math. I'm pretty sure Dave is compiling Windows with it as well.

Also, the irv[i] = ... line is calculated before the three main integral loops, so it is only executed ia->r_steps times, not once per iteration of the innermost loop, which runs ia->r_steps * ia->mu_steps * ia->nu_steps times. So this change would probably not be noticeable.
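Travis's point about where the work happens can be checked with a simple count. This C sketch uses made-up step counts, since the real ones come from the search parameters; r_steps = 700 merely echoes the "~700 times" mentioned later in the thread. It counts how often the hoisted irv[i] computation and the innermost loop body would run:

```c
#include <assert.h>
#include <stddef.h>

/* Counts how often each piece of work runs; the step counts passed in
   are placeholders, not the project's actual parameters. */
long count_evaluations(long r_steps, long mu_steps, long nu_steps,
                       long *hoisted_count)
{
    assert(hoisted_count != NULL);
    long inner = 0;
    *hoisted_count = 0;

    /* Precomputation pass: irv[i] = ... runs once per r step. */
    for (long i = 0; i < r_steps; i++)
        (*hoisted_count)++;

    /* The three main integral loops: the body runs once per (r, mu, nu). */
    for (long i = 0; i < r_steps; i++)
        for (long j = 0; j < mu_steps; j++)
            for (long k = 0; k < nu_steps; k++)
                inner++;

    return inner;
}
```

Even with these small placeholder values (mu_steps = 10, nu_steps = 5), the inner body runs 50 times as often as the precomputation. That is why shaving a division off line 401 matters far less than the changes made inside the loops.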
speedimic

Joined: 22 Feb 08
Posts: 260
Credit: 57,387,048
RAC: 0
Message 9018 - Posted: 25 Jan 2009, 2:16:21 UTC - in response to Message 9015.  

The Linux and OS X binaries are compiled with -ffast-math.

[...]


hmm, no signs of that in the makefiles...

what about your post from November, where you told us not to use it?
mic.


Logan

Joined: 15 Aug 08
Posts: 163
Credit: 3,876,869
RAC: 0
Message 9019 - Posted: 25 Jan 2009, 2:19:58 UTC - in response to Message 9018.  
Last modified: 25 Jan 2009, 2:20:38 UTC

The Linux and OS X binaries are compiled with -ffast-math.

[...]


hmm, no signs of that in the makefiles...

what about your post from November, where you told us not to use it?


:D :D :D :D :D.....
Logan.

BOINC FAQ Service (Now also available in Spanish)
Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 9020 - Posted: 25 Jan 2009, 2:21:10 UTC - in response to Message 9019.  

The Linux and OS X binaries are compiled with -ffast-math.

[...]


hmm, no signs of that in the makefiles...

what about your post from November, where you told us not to use it?


:) :) :) :) :).....


lol, my bad. I think I was thinking of -funroll-loops.

Either way, that line is only executed ~700 times, so I don't think optimizing it will have much effect ;P


©2021 Astroinformatics Group