Windows and Linux compiler optimization differences.
Author | Message |
---|---|
Joined: 22 May 11 Posts: 71 Credit: 5,685,114 RAC: 0 |
Why does milkyway_nbody_1.83_windows_x86_64__mt.exe have only BMI, CMOV, MODE64, SSE1 and SSE2 instruction set extensions, but milkyway_nbody_1.83_x86_64-pc-linux-gnu__mt has AVX, AVX2, AVX512, BMI, CMOV, FMA4, MODE64, NOT64BITMODE, NOVLX, RTM, SSE1, SSE2, SSE3, SSE41 and SSE42 extensions? Modern cpu users running Windows aren't using their cpu to full potential. |
Joined: 19 Jul 10 Posts: 624 Credit: 19,299,762 RAC: 2,614 |
"Modern cpu users running Windows aren't using their cpu to full potential." Apparently the application can't even use the full potential of my Core 2 Duo, which I don't consider modern anymore. But have you checked the current version, i.e. 1.86 with Orbit Fitting? |
Joined: 22 May 11 Posts: 71 Credit: 5,685,114 RAC: 0 |
milkyway_nbody_orbit_fitting_1.86_windows_x86_64__mt.exe shows BMI, CMOV, MODE64, SSE1, SSE2 |
Joined: 1 Jan 17 Posts: 37 Credit: 111,041,580 RAC: 35,436 |
I only have Linux, so I can't compare Linux and Windows performance myself. But I am pretty sure that vector math, at least, is not used to any extent that would matter, for two reasons. 1.) Zen 4, Zen 2, and Broadwell-EP computers draw rather little electric power when running nbody or nbody with orbit fitting, compared to when they run other projects. 2.) Broadwell-EP does not engage its negative AVX2 clock rate offset when it runs nbody or nbody with orbit fitting. |
Joined: 8 Sep 07 Posts: 7 Credit: 2,357,594 RAC: 104 |
elfx86exts is a handy indicator of what's in the binary, but its output can be misinterpreted. Reporting an AVX instruction doesn't mean the binary is optimized for AVX. It means there's at least 1 AVX instruction somewhere, but it could be 1 instruction out of millions... The current app has no manual SIMD optimizations. It uses SSE2 (which is a requirement for x86_64) thanks to auto-vectorization done by the compiler. Recompiling it with AVX+ makes no difference. The additional instructions on Linux are likely from dependent libraries like glibc (for memcpy etc.). The algorithm (where the app spends its time) must use SIMD efficiently to make the app faster, and that usually requires much more work than just enabling an AVX flag :) |
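
To put the "1 instruction out of millions" point in numbers, one rough approach is to disassemble the binary and count how many instructions touch each vector register width. The Python sketch below assumes the text comes from GNU `objdump -d` (tab-separated address, bytes, instruction). The names `tally_register_classes`, `share` and `SAMPLE` are made up for illustration, and the sample listing is hand-written, not taken from the MilkyWay@home binaries. Classifying by register name is a heuristic of this sketch and makes no claim about how elfx86exts works.

```python
import re
from collections import Counter

# Heuristic classes, checked from widest to narrowest register.
# Assumption of this sketch: the register name is a good enough proxy
# for the instruction set extension in use.
_CLASSES = (
    ("512-bit (zmm)", re.compile(r"zmm\d+")),
    ("256-bit (ymm)", re.compile(r"ymm\d+")),
    ("128-bit (xmm)", re.compile(r"xmm\d+")),
)


def tally_register_classes(disassembly: str) -> Counter:
    """Count instructions per register class in `objdump -d` style text."""
    counts = Counter()
    for line in disassembly.splitlines():
        fields = line.split("\t")
        # Instruction lines look like "  addr:<TAB>bytes<TAB>mnemonic operands".
        # Symbol labels, blank lines and byte-continuation lines are skipped.
        if len(fields) < 3 or not fields[2].strip():
            continue
        instruction = fields[2]
        for name, pattern in _CLASSES:
            if pattern.search(instruction):
                counts[name] += 1
                break
        else:
            counts["other"] += 1
    return counts


def share(counts: Counter, name: str) -> float:
    """Fraction of all counted instructions that fall into class `name`."""
    total = sum(counts.values())
    return counts[name] / total if total else 0.0


# Hand-written illustrative listing (hypothetical, not from a real app).
SAMPLE = "\n".join([
    "0000000000401000 <kernel>:",
    "  401000:\t55                   \tpush   %rbp",
    "  401001:\t66 0f 58 c1          \taddpd  %xmm1,%xmm0",
    "  401005:\tf2 0f 59 c2          \tmulsd  %xmm2,%xmm0",
    "  401009:\tc5 fd 58 c1          \tvaddpd %ymm1,%ymm0,%ymm0",
    "  40100d:\tc3                   \tret",
])

if __name__ == "__main__":
    counts = tally_register_classes(SAMPLE)
    for name in sorted(counts):
        print(f"{name}: {counts[name]} ({share(counts, name):.1%})")
```

To try it on a real binary, capture the output of `objdump -d` on the executable and pass that text to `tally_register_classes`. Two caveats apply: x86_64 also does scalar floating-point math in xmm registers (e.g. `mulsd`), so the xmm count overstates how much code is actually vectorized, and a static count says nothing about where the app spends its time. A handful of ymm instructions in a rarely called routine would show up here without affecting run time.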
©2024 Astroinformatics Group