source v0.14 released
Author | Message |
---|---|
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
I've released the source for v0.14 in the code release directory. This should fix the checkpoint errors as discussed in this thread.

Additionally, I was able to remove a multiply and a divide from the inner loop of calculate_integrals, which gave about a 4% performance increase in the application here. I changed:

    ir[i] = ((next_r * next_r * next_r) - (r * r * r))/3.0;

to (line 401):

    irv[i] = (((next_r * next_r * next_r) - (r * r * r))/3.0) * ia->mu_step_size / deg;

and:

    V = ir[ia->r_step_current] * ia->mu_step_size / deg;

to (line 477):

    V = irv[ia->r_step_current] * id;

I also added two new #defines, NEW_FORMULA and WEDGE_ALLOW_ZERO, which removed some of the conditionals from the inner loop, and split calculate_integral into calculate_integral_convolved and calculate_integral_unconvolved (calculate_integral_unconvolved is pretty much deprecated). I made a similar change in calculate_likelihood as well. This removed the warnings evaluation_optimized.c was throwing and gave a slight performance improvement as well.

Again, when compiling the new version, please use:

    APP_VERSION = 0.14
    APP_NAME = your_app_name

and the -DBOINC_APP_VERSION=$(APP_VERSION) and -DBOINC_APP_NAME='"$(APP_NAME)"' flags to make sure your binary gets credit.
I've also updated the_parameters.sh script, so to test a WU you just need to run:

    ./set_parameters

The results should be the same as in the last post:

79: searchname
parameters [8]: 0.342173733203920 25.951791084662300 -2.170941473882660 38.272511356953906 30.225190442596112 2.214906001337289 0.323161690642917 2.774024471628528
metadata: this is the metadata
fitness: -2.946683357256020
your_app_name: 0.14

82: searchname
parameters [8]: 0.405879611547422 17.529961843393409 -1.857514527214484 29.360893891378243 31.228263575178566 -1.551741065334000 0.064096152599308 2.554282099127810
metadata: this is the metadata
fitness: -2.985569777902147
your_app_name: 0.14

86: searchname
parameters [8]: 0.733171635575244 14.657212876628332 -1.705465347395041 16.911711745343634 28.077212666463502 -1.203290851581461 3.527360643924728 2.224821450587501
metadata: this is the metadata
fitness: -3.027909854710189
your_app_name: 0.14 |
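As an illustration of the inner-loop change described in the release post above, here is a minimal C sketch of folding a loop-invariant factor into a precomputed table. It is not the project's code: the helper names `build_ir` and `build_irv`, and the plain arguments `r_min`, `r_step`, `mu_step_size`, and `deg`, are assumptions made for the example. Only the two formulas are taken from the post.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: the old table, ir[i] = (next_r^3 - r^3) / 3.
 * Any use of it in the inner loop still has to apply mu_step_size / deg. */
void build_ir(double *ir, size_t r_steps, double r_min, double r_step)
{
    assert(ir != NULL);
    for (size_t i = 0; i < r_steps; i++) {
        double r = r_min + (double)i * r_step;
        double next_r = r + r_step;
        ir[i] = ((next_r * next_r * next_r) - (r * r * r)) / 3.0;
    }
}

/* Hypothetical helper: the new table, with the loop-invariant factor
 * mu_step_size / deg applied once per r step instead of once per
 * inner-loop iteration. */
void build_irv(double *irv, size_t r_steps, double r_min, double r_step,
               double mu_step_size, double deg)
{
    assert(irv != NULL);
    assert(deg != 0.0);
    for (size_t i = 0; i < r_steps; i++) {
        double r = r_min + (double)i * r_step;
        double next_r = r + r_step;
        irv[i] = (((next_r * next_r * next_r) - (r * r * r)) / 3.0)
                 * mu_step_size / deg;
    }
}
```

With tables built this way, an inner loop that previously computed `ir[k] * mu_step_size / deg` can read `irv[k]` directly, which is where the saved multiply and divide come from.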
Send message Joined: 6 Sep 07 Posts: 66 Credit: 636,861 RAC: 0 |
> ir[i] = ((next_r * next_r * next_r) - (r * r * r))/3.0;

You could remove yet another division by changing line 401 to:

    irv[i] = ((next_r * next_r * next_r) - (r * r * r)) * ia->mu_step_size / (3.0 * deg);

Since a division is typically 10x slower than a multiplication, it could improve the performance of this line alone by about 40%.

HTH |
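The suggestion above can be sketched as two small C functions. The names `irv_two_div` and `irv_one_div` are hypothetical, and the values are passed as plain arguments rather than through the `ia` structure. The two forms are algebraically equal, but because floating-point rounding happens at different points, they may differ in the last bits.

```c
#include <assert.h>

/* Form used at line 401 in v0.14: two divisions per element. */
double irv_two_div(double r, double next_r, double mu_step_size, double deg)
{
    assert(deg != 0.0);
    return (((next_r * next_r * next_r) - (r * r * r)) / 3.0)
           * mu_step_size / deg;
}

/* Suggested form: the two divisors are merged, leaving one division. */
double irv_one_div(double r, double next_r, double mu_step_size, double deg)
{
    assert(deg != 0.0);
    return ((next_r * next_r * next_r) - (r * r * r))
           * mu_step_size / (3.0 * deg);
}
```

Comparing the two with a small tolerance rather than `==` is the safer check when validating a rewrite like this against known results.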
Send message Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0 |
> You could remove yet another division by changing line 401 to:

As it is divided by a constant that is known at compile time (the 3.0 is hardcoded), any decent compiler will exchange it for a multiplication by 1/3 (calculated at compile time) either way. These kinds of changes are only necessary if one uses an ancient compiler or turns off optimizations. With -O2 or even -O3 I bet it won't make a difference. And even if it were not a literal, with the fast-math option the compiler should do this kind of optimization even for variables. Compilers have gotten quite clever at such things.

But from my old days, when compilers didn't optimize that well, I also prefer doing such things by hand. One never knows ;) |
Send message Joined: 6 Sep 07 Posts: 66 Credit: 636,861 RAC: 0 |
> As it is divided by a constant that is known at compile time (the 3.0 is hardcoded), any decent compiler will exchange it for a multiplication by 1/3 (calculated at compile time) either way. These kinds of changes are only necessary if one uses an ancient compiler or turns off optimizations.

Actually, for floating-point data, this optimization would only be performed automatically by the compiler with -ffast-math. And since this option cannot be used for this project, tipping the scale for the compiler is a good rule of thumb.

Moreover, since the compiler does not change the order of floating-point computations, this code should have an edge too:

    irv[i] = ((next_r * next_r * next_r * ia->mu_step_size) - (r * r * r * ia->mu_step_size)) / (3.0 * deg);

Reducing the dependency sequence on out-of-order processors reduces the latency of long operations like these.

HTH |
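The point about floating-point data can be checked with a short C sketch. It assumes IEEE 754 double arithmetic without value-changing optimizations, and the function names are hypothetical. Because 1.0 / 3.0 is not exactly representable, multiplying by the rounded reciprocal is not always bit-identical to dividing by 3.0, which is why a compiler in its default mode has to keep the division.

```c
#include <assert.h>
#include <stddef.h>

/* Correctly rounded division by the constant 3.0. */
double div_by_three(double x)
{
    return x / 3.0;
}

/* Multiplication by the rounded reciprocal. The constant 1.0 / 3.0
 * carries a rounding error, so the product can differ from x / 3.0
 * in the last bit. */
double mul_by_third(double x)
{
    const double third = 1.0 / 3.0;
    return x * third;
}

/* Counts the inputs for which the two forms are not bit-identical. */
size_t count_mismatches(const double *xs, size_t n)
{
    assert(xs != NULL);
    size_t mismatches = 0;
    for (size_t i = 0; i < n; i++) {
        if (div_by_three(xs[i]) != mul_by_third(xs[i]))
            mismatches++;
    }
    return mismatches;
}
```

For example, under these assumptions the two forms agree for x = 3.0 but differ in the last bit for x = 5.0, so results computed one way cannot be expected to match the other digit for digit.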
Send message Joined: 4 Oct 08 Posts: 1734 Credit: 64,228,409 RAC: 0 |
Is this release one that will download on a detach and reattach, or will 0.13 for Windows be pulled down again? No need to answer, as I see I have downloaded 0.14 WUs. So I assume the latest Windows MW client has already been downloaded by BOINC. |
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
> As it is divided by a constant that is known at compile time (the 3.0 is hardcoded), any decent compiler will exchange it for a multiplication by 1/3 (calculated at compile time) either way. These kinds of changes are only necessary if one uses an ancient compiler or turns off optimizations.

The Linux and OS X binaries are compiled with -ffast-math. I'm pretty sure Dave is compiling Windows with it as well.

Also, the irv[i] = ... line is calculated before the 3 main integral loops, so it's only calculated 'ia->r_steps' times, not once per inner-loop iteration, which happens ia->r_steps * ia->mu_steps * ia->nu_steps times. So this would probably not be noticeable. |
Send message Joined: 22 Feb 08 Posts: 260 Credit: 57,387,048 RAC: 0 |
> The Linux and OS X binaries are compiled with -ffast-math.

Hmm, no signs of that in the makefiles... What about your post from November, where you told us not to use it?

mic. |
Send message Joined: 15 Aug 08 Posts: 163 Credit: 3,876,869 RAC: 0 |
> The Linux and OS X binaries are compiled with -ffast-math.

:D :D :D :D :D.....

Logan.
BOINC FAQ Service (now also available in Spanish) |
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
> The Linux and OS X binaries are compiled with -ffast-math.

Lol, my bad, I think I was thinking of -funroll-loops. Either way, that line is only executed ~700 times, so I don't think optimizing it will have much effect ;P |
©2024 Astroinformatics Group