Welcome to MilkyWay@home

Posts by ebahapo

1) Message boards : Number crunching : Intel GPU (Message 60353)
Posted 10 Nov 2013 by ebahapo
Post:
Now that BOINC supports both CPU and GPU Intel OpenCL, when will the Intel OpenCL applications be supported?

TIA
2) Message boards : Number crunching : ARM Devices (Message 57532)
Posted 17 Mar 2013 by ebahapo
Post:
Would it be possible to build this application for generic ARM Linux platforms such as arm-unknown-linux-gnueabi (without hardware support for floating-point, like ARMv5) and arm-unknown-linux-gnueabihf (with support for floating-point, like ARMv6)?

Though ARM is not a high-performance processor by today's standards, it may be as fast as the typical PC of a few years ago and comparable to a current Intel Atom.

Other projects like Enigma, OProject, QCN, Radioactive, Yoyo and WUProp already provide an application for such platforms.

I've helped out other projects getting the applications built and tested, as can be seen here. Please, let me know if I can help.

TIA
3) Message boards : Number crunching : Run only selected applications (Message 51295)
Posted 4 Oct 2011 by ebahapo
Post:
Any feedback about this request, please?
4) Message boards : Number crunching : Run only selected applications (Message 51197)
Posted 23 Sep 2011 by ebahapo
Post:
Could the project please add an option to allow volunteers to choose which applications they prefer to run?

TIA
5) Message boards : Application Code Discussion : Recompiled Linux 32/64 apps (Message 17677)
Posted 5 Apr 2009 by ebahapo
Post:
The linux client is simply inefficient compared to win32.

Different compilers perhaps?
6) Message boards : Application Code Discussion : Recompiled Linux 32/64 apps (Message 17665)
Posted 5 Apr 2009 by ebahapo
Post:
Still a little disappointed with the linux apps...

Whereas the Windows op app yields a time of 20 minutes on a slower processor (1.8 GHz T2390).

The difference seems a lot to me, or is this typical?

It might be because Linux manages power differently from Windows, running BOINC applications at a slow CPU frequency in order to save energy. See more details here.
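
A minimal sketch of how to check this on a given host, assuming the kernel exposes the cpufreq sysfs interface (the exact paths below are an assumption and vary between kernels and drivers; the helper names are made up for illustration). With the ondemand governor, the `ignore_nice_load` tunable determines whether low-priority work such as BOINC tasks counts toward raising the clock.

```c
#include <stdio.h>
#include <string.h>

/* Read the first line of a small text file (such as a sysfs attribute)
 * into buf, stripping the trailing newline.
 * Returns 0 on success, -1 if the file cannot be opened or read. */
int read_first_line(const char *path, char *buf, size_t size)
{
    FILE *f;

    if (size == 0)
        return -1;
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, (int)size, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Print the governor and current frequency of cpu0, if available.
 * The sysfs paths are assumptions about a typical cpufreq setup. */
void print_cpu0_freq_state(void)
{
    char line[64];

    if (read_first_line("/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor",
                        line, sizeof line) == 0)
        printf("governor: %s\n", line);
    if (read_first_line("/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq",
                        line, sizeof line) == 0)
        printf("current frequency: %s kHz\n", line);
}
```

If the reported frequency stays at the minimum while a work unit is running, power management is a likely suspect.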

HTH
7) Message boards : Application Code Discussion : Recompiled Linux 32/64 apps (Message 12403)
Posted 22 Feb 2009 by ebahapo
Post:
Not sure of the difference in speed between SSE3 and SSSE3 versions...

It should be zilch, since it's unlikely that the compiler will find opportunities in MW code to fit the multimedia-like SSSE3 instructions.

HTH
8) Message boards : Number crunching : v0.18/v0.19 issues here (Message 11027)
Posted 16 Feb 2009 by ebahapo
Post:
But SSE3 is quite a lot faster. ;-)
I don't know how many old 64-bit systems are out there though.

Per my off-the-cuff analysis here, only 17% of hosts support SSE3. However, I couldn't break this figure out between 32 and 64-bit hosts.

HTH
9) Message boards : Application Code Discussion : Recompiled Linux 32/64 apps (Message 9302)
Posted 28 Jan 2009 by ebahapo
Post:
cat /proc/cpuinfo flags shows -
fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc pni monitor ds_cpl est tm2 xtpr

I know pni = SSE3 but how is SSE4.1 indicated?

Update your kernel.
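
For reference, kernels that report it list SSE4.1 as `sse4_1` on the flags line (SSSE3 shows up as `ssse3`). A small sketch of checking for a flag as a whole word, so that `sse` does not match `sse2`; the function name is hypothetical:

```c
#include <stdbool.h>
#include <string.h>

/* Return true if `name` appears as a whole whitespace-delimited token
 * in the flags line taken from /proc/cpuinfo. */
bool has_cpu_flag(const char *flags, const char *name)
{
    size_t len = strlen(name);
    const char *p = flags;

    if (len == 0)
        return false;
    while ((p = strstr(p, name)) != NULL) {
        bool starts = (p == flags) || p[-1] == ' ' || p[-1] == '\t';
        bool ends = p[len] == '\0' || p[len] == ' ' ||
                    p[len] == '\t' || p[len] == '\n';
        if (starts && ends)
            return true;
        p++; /* partial match only: keep searching after this position */
    }
    return false;
}
```

Applied to the flags quoted above, `has_cpu_flag(flags, "pni")` is true while `has_cpu_flag(flags, "sse4_1")` is false.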

HTH
10) Message boards : Application Code Discussion : Recompiled Linux 32/64 apps (Message 9271)
Posted 27 Jan 2009 by ebahapo
Post:
Normally the 64bit (stock) apps are compiled with SSE2 enabled, because a 64bit-capable cpu is also capable of at least SSE2.

And, if the compiler is capable of auto-vectorization, it should always be enabled for x86-64. For GCC, the option is -ftree-vectorize, which is implied by -O3 on versions 4.3 and later. Unfortunately, MS VS does not support auto-vectorization. For Windows the Intel compiler could be used instead.
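
As an illustration, here is the kind of loop the vectorizer handles well. This is a generic sketch, not code from the MW application; `axpy` and its parameters are made up for the example.

```c
#include <stddef.h>

/* y[i] += a * x[i] for i in [0, n).
 * The restrict qualifiers (C99) promise that x and y do not overlap,
 * which removes an aliasing doubt that can block vectorization.
 *
 * Possible build line, assuming GCC 4.3 or later:
 *   gcc -std=c99 -O3 -msse2 -c axpy.c
 * With -O2, add -ftree-vectorize explicitly. */
void axpy(size_t n, double a, const double *restrict x, double *restrict y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}
```

Recent GCC versions can report which loops were vectorized with -fopt-info-vec.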

HTH
11) Message boards : Application Code Discussion : Recompiled Linux 32/64 apps (Message 9265)
Posted 27 Jan 2009 by ebahapo
Post:
Here are some results from this host:

  • 0.16: 82min
  • 0.16: 84min
  • 0.16: 82min
  • 0.16: 81min
  • 0.16: 84min
  • 0.16: 81min
  • 0.15: 83min


I didn't record the times for 0.14, but IIRC they were roughly in the same ballpark, i.e., no noticeable improvement between versions.

Note that this host is a production server which idles most of the time, therefore Linux power-management runs it at the slowest speed of 1GHz most of the time.

12) Message boards : Application Code Discussion : Recompiled Linux 32/64 apps (Message 9219)
Posted 26 Jan 2009 by ebahapo
Post:
Since x86-64 guarantees that at least SSE2 is available, did you make sure to enable vectorization through GCC's -ftree-vectorize option (implied by -O3 in versions 4.3 and later)? For that matter, any SSE build could benefit from vectorization.

HTH

13) Message boards : Number crunching : app v12 (Message 9124)
Posted 25 Jan 2009 by ebahapo
Post:
Yes, you are right, with one small exception: scientific applications whose code and structure support SSE3 and SSSE3 can get much better performance...

Note that I said that only SSSE3 is useless for scientific applications. Such MMX-like instructions are not well suited to scientific applications or to compiler-generated code, and taking advantage of them requires hand-written assembly.

HTH
14) Message boards : Number crunching : app v12 (Message 9112)
Posted 25 Jan 2009 by ebahapo
Post:
I explored the differences among the several SSE flavors starting here.

SSSE3 is probably useless for scientific applications.

HTH
15) Message boards : Application Code Discussion : source v0.14 released (Message 8977)
Posted 24 Jan 2009 by ebahapo
Post:
As it is divided by a constant that is known at compile time (the 3.0 is hardcoded), any decent compiler will exchange it with a multiplication by 1/3 (calculated at compile time) either way. This kind of change is only necessary if one uses an ancient compiler or turns off optimizations.

Actually, for floating-point data, only with -ffast-math would this optimization be automatically performed by the compiler. And, since this option cannot be used for this project, tipping the scale for the compiler is a good rule-of-thumb.
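
The reason the compiler holds back is that the two forms do not always round to the same double. A small sketch, with hypothetical function names, assuming IEEE 754 double arithmetic (for example SSE2 on x86-64) and no -ffast-math:

```c
/* Divide by the constant directly: a single correctly rounded operation. */
double div_by_three(double x)
{
    return x / 3.0;
}

/* Multiply by the reciprocal: 1.0 / 3.0 is rounded first, then the
 * product is rounded again, so the result can be off in the last bit. */
double mul_by_third(double x)
{
    const double third = 1.0 / 3.0;
    return x * third;
}
```

For x = 5.0 the two results differ in the last bit, while for x = 3.0 they agree. In GCC the substitution is permitted by -freciprocal-math, which -ffast-math turns on.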

Moreover, since the compiler does not change the order of floating-point computations, this code should have an edge too:

irv[i] = ((next_r * next_r * next_r * ia->mu_step_size) - (r * r * r * ia->mu_step_size)) / (3.0 * deg);

Reducing the dependency sequence on out-of-order processors reduces the latency of long operations like these.

HTH
16) Message boards : Application Code Discussion : Recompiled Linux 32/64 apps (Message 8976)
Posted 24 Jan 2009 by ebahapo
Post:
SSE3 might be doing some other optimizations (or have some changes in optimizations) which are better than what was in SSE2, because it's newer.

Yes, but the Intel compiler checks if the code is running on an Intel CPU and, if it's not, it runs an alternative SSE2 code instead. It'll run SSE3 or later only on Intel processors. As these results are on an AMD CPU, it's not benefiting from the SSE3 optimizations.
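
To make the distinction concrete, here is an illustrative sketch of vendor-gated versus feature-gated dispatch. It is not Intel's actual dispatcher; the names are hypothetical, and the vendor strings are the ones CPUID returns.

```c
#include <string.h>

typedef enum { PATH_SSE2, PATH_SSE3 } code_path;

/* Vendor-gated: the SSE3 path is taken only when the CPU both supports
 * SSE3 and identifies itself as "GenuineIntel". An "AuthenticAMD" CPU
 * with SSE3 still gets the SSE2 path. */
code_path pick_path_by_vendor(const char *vendor, int has_sse3)
{
    if (has_sse3 && strcmp(vendor, "GenuineIntel") == 0)
        return PATH_SSE3;
    return PATH_SSE2;
}

/* Feature-gated: only the capability bit matters, not the vendor. */
code_path pick_path_by_feature(int has_sse3)
{
    return has_sse3 ? PATH_SSE3 : PATH_SSE2;
}
```

With the first scheme, `pick_path_by_vendor("AuthenticAMD", 1)` yields `PATH_SSE2`, which matches the behavior described above.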

HTH

17) Message boards : Application Code Discussion : source v0.14 released (Message 8973)
Posted 24 Jan 2009 by ebahapo
Post:
ir[i] = ((next_r * next_r * next_r) - (r * r * r))/3.0;
to
line 401: irv[i] = (((next_r * next_r * next_r) - (r * r * r))/3.0) * ia->mu_step_size / deg;

You could remove yet another division by changing line 401 to:

irv[i] = ((next_r * next_r * next_r) - (r * r * r)) * ia->mu_step_size / (3.0 * deg);

Since a division is typically 10x slower than a multiplication, it could improve the performance of this line alone by about 40%.
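
A standalone sketch of the two forms for side-by-side testing. The plain double parameters stand in for `ia->mu_step_size` and the loop variables, and the function names are made up for the example.

```c
/* Form from line 401: two divisions. */
double irv_two_divisions(double next_r, double r,
                         double mu_step_size, double deg)
{
    return (((next_r * next_r * next_r) - (r * r * r)) / 3.0)
           * mu_step_size / deg;
}

/* Suggested form: one division, with 3.0 folded into the divisor. */
double irv_one_division(double next_r, double r,
                        double mu_step_size, double deg)
{
    return ((next_r * next_r * next_r) - (r * r * r))
           * mu_step_size / (3.0 * deg);
}
```

The two are mathematically equal, but because the operations are rounded in a different order the results may differ in the last bits, so it is worth checking the outputs against the project's validation tolerance.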

HTH
18) Message boards : Application Code Discussion : Recompiled Linux 32/64 apps (Message 8923)
Posted 24 Jan 2009 by ebahapo
Post:

2. But I've made more tests:

averaged boost in calculation times for 126 runs of milkyway app on idle machine
SSE3 app: 121.86%
SSE2 app: 119.03%
base app: 100.00%

I don't think this is 'noise' only...
I'm confused now...
Maybe this is milkyway specific...

Hard to explain why. Maybe even though the processor doesn't get to run SSE3 code, the code is different, though SSE2, and the outcome is better, perhaps because of something as mundane as some branches getting aligned favorably. Regardless, I agree that it's more than noise.

Thanks.
19) Message boards : Application Code Discussion : Recompiled Linux 32/64 apps (Message 8903)
Posted 23 Jan 2009 by ebahapo
Post:
1. This article is very old: 11:58 AM on July 13, 2005
2. I've got a lot better performance with SSE2 (20% boost) than without it, and slightly better performance with SSE3 than SSE2 (another 1% boost) and I'm talking about AMD chip and milkyway app of course.

1 - Yet, it's still true. It's been known in the open source community and Intel's response was that they cannot guarantee their compiler except on their processors, fair enough. Is this new enough for you?

2 - 1% is too close to noise to call a boost.
20) Message boards : Application Code Discussion : Recompiled Linux 32/64 apps (Message 8850)
Posted 22 Jan 2009 by ebahapo
Post:
The SSE3 code runs very well on AMD LE-1600, like C2D with the same clock.
I use icpc -xO for this.

At run-time the processor is probed and if it's by AMD, then degraded code is run instead of the SSE3 code.

See http://techreport.com/discussions.x/8547 for a snippet.

HTH


