Welcome to MilkyWay@home

Request for Windows ARM64 support - Snapdragon X2 Elite Extreme (Oryon)

Message boards : Application Code Discussion : Request for Windows ARM64 support - Snapdragon X2 Elite Extreme (Oryon)

kasdashdfjsah
Message 77926 - Posted: 22 Apr 2026, 13:09:19 UTC

Just got a new X2 Elite Extreme Snapdragon laptop. The CPU's single- and multi-core speed is insane. Performance could be much better with native ARM64 support.

x86 emulation hits these chips harder than it does Apple Silicon on macOS. Please consider adding native Windows ARM64 support. I would be happy to test this out.

I can also do the work myself if I get access to relevant project files. My 18-core setup is ready for testing. Data shows native apps run much faster than emulated ones on Oryon.

kasdashdfjsah
Message 77927 - Posted: 24 Apr 2026, 8:51:26 UTC

Update:

Subject: Windows ARM64 support - Benchmarking the Snapdragon X2 Elite (Oryon)

I see that Nbody v1.95 was recently released. I am currently running native Windows ARM64 builds on Asteroids@home and Einstein@home with great results on the new Snapdragon X2 Elite Extreme.

My 18-core setup is delivering high-end desktop throughput at only 25-30W CPU power. I would love to bring this efficiency to MilkyWay@home.

Since the Adreno GPU lacks FP64 support, I am looking for a native Windows ARM64 CPU build (MSVC/CMake). Given that the current source supports OpenMP and double precision, a native Oryon build should be extremely efficient for N-body simulations.

Is there a windows_arm64 plan class in the works, or could a test binary be provided? I am ready to provide benchmarks and stability data immediately.

ahorek's team
Message 77936 - Posted: 26 Apr 2026, 15:55:29 UTC

The current CPU application can be built on ARM, but it produces incorrect results. I tried to fix it, but I wasn’t successful:
https://github.com/Milkyway-at-home/milkywayathome_client/pull/224/changes
Without a proper fix to make it match the x64 results, the ARM version won’t be useful.

The admins don’t seem to have much interest in those platforms either, so it is unlikely there will be an ARM version for Apple, Android, or Windows on ARM.

Windows on ARM x64 emulation may still work, though.

> should be extremely efficient for N-body simulations
Don’t make those assumptions if you haven’t tested it. Just because your brand-new laptop’s CPU only consumes 30W doesn’t necessarily mean it’s efficient. If you compare the performance-per-watt ratio with recent x64 CPUs, you’ll see it’s not that impressive.

gimmyk
Volunteer moderator
Project administrator
Project developer
Project scientist
Message 77937 - Posted: 27 Apr 2026, 17:25:42 UTC

I would love to be able to release a version on ARM. I've looked into it a little bit, but I don't think it's something that's feasible. The simulations we run are very chaotic, so even a small difference in a single calculation will propagate and produce a very different result. Chances are that the only way we could get results within the precision needed to validate against each other on the server is by forcing identical results between the two architectures for every calculation. I'm not even sure if that is possible, and if it is, it would take more work than we are able to dedicate to it at this time.

As mentioned above, the application can be built on ARM, but it can't be used for much but a low accuracy approximation. You're free to look at it and try to fix it if you want, but I wouldn't recommend it.

ahorek's team
Message 77938 - Posted: 30 Apr 2026, 16:45:36 UTC

> it can't be used for much but a low accuracy approximation

ARM chips do support double-precision arithmetic, so they can absolutely be used to compute results with sufficient accuracy. However, your current code depends on undefined behaviour, as explained here:
https://github.com/Milkyway-at-home/milkywayathome_client/pull/224#issuecomment-4354305606

This is a bug, not a precision issue or a hardware limitation.

Unfortunately, my fix alone isn’t enough. Either there are other places with the same bug, or there is another problem. Comparing results at each step between two architectures is pretty time-consuming...

gimmyk
Message 77940 - Posted: 1 May 2026, 16:45:25 UTC

Just to clarify, when I talk about the accuracy of the result I am referring to our final simulation result. While individual calculations may be extremely close, any errors we have will grow exponentially in this kind of simulation. I worry that the only way to ensure results close enough to validate would be to have exact bitwise identity, which would be difficult to do and likely cost a lot of performance. I'm certainly no expert on the topic, but I think we would need to change our math in many places to ensure this kind of strict reproducibility. From my understanding nbody applications typically do not ensure this level of consistency between architectures.

It may be possible to keep many things the same and fix bugs like what you have pointed out to get statistically similar results that we can still consider "good", but I don't know if these will consistently be close enough that we could have them validate against each other.
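
To illustrate the point about error growth, here is a toy sketch. It is not the MilkyWay@home N-body code: it iterates the logistic map, a standard textbook chaotic system, from two starting values that differ by a tiny amount and reports how many steps pass before the two trajectories differ by more than a chosen threshold. The function name `steps_until_divergence` and all parameter values are made up for this illustration.

```c
/* Toy illustration only: the logistic map x -> 4x(1-x) stands in for a
 * chaotic simulation. It is NOT part of the MilkyWay@home source. */

/* Returns the first step (1-based) at which two trajectories started
 * `eps` apart differ by more than `threshold`, or -1 if that does not
 * happen within `max_steps` iterations. */
int steps_until_divergence(double x0, double eps, double threshold, int max_steps)
{
    double x = x0;
    double y = x0 + eps;
    int step;

    for (step = 1; step <= max_steps; ++step)
    {
        double diff;

        /* Advance both trajectories by one iteration of the map. */
        x = 4.0 * x * (1.0 - x);
        y = 4.0 * y * (1.0 - y);

        /* Absolute difference without needing libm. */
        diff = x - y;
        if (diff < 0.0)
            diff = -diff;

        if (diff > threshold)
            return step;
    }

    return -1;
}
```

For example, `steps_until_divergence(0.3, 1e-15, 0.1, 200)` starts the two runs one part in 10^15 apart. The gap can grow by at most a factor of 4 per step, so it takes more than 20 steps to reach 0.1, and since it roughly doubles per step on average, a count on the order of 50 is what to expect. With `eps` set to 0.0 the trajectories never separate and the function returns -1, which is the bitwise-identical case discussed above.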

Ian&Steve C.
Message 77941 - Posted: 1 May 2026, 20:19:17 UTC - in response to Message 77940.  
Last modified: 1 May 2026, 20:21:34 UTC

> Just to clarify, when I talk about the accuracy of the result I am referring to our final simulation result. While individual calculations may be extremely close, any errors we have will grow exponentially in this kind of simulation. I worry that the only way to ensure results close enough to validate would be to have exact bitwise identity, which would be difficult to do and likely cost a lot of performance. I'm certainly no expert on the topic, but I think we would need to change our math in many places to ensure this kind of strict reproducibility. From my understanding nbody applications typically do not ensure this level of consistency between architectures.
>
> It may be possible to keep many things the same and fix bugs like what you have pointed out to get statistically similar results that we can still consider "good", but I don't know if these will consistently be close enough that we could have them validate against each other.

I have (with much AI help) fixed the accuracy issue on aarch64 in Linux. A single small change to the nbody_histogram.c file corrects the issue for aarch64 Linux builds (not counting other small changes needed to target aarch64).

The short test WU I ran with this is accurate to 4 decimal places on all 8 output parameters. I'm not sure how strict the validator is, but I am testing this now via anonymous platform.

Edit: I tried to post the code fix here, but the forum freaks out. Maybe something on the forums prevents posting code snippets?

Ian&Steve C.
Message 77942 - Posted: 1 May 2026, 20:24:36 UTC - in response to Message 77941.  

diff --git a/nbody/src/nbody_histogram.c b/nbody/src/nbody_histogram.c
index 47fb6715..1731984d 100644
--- a/nbody/src/nbody_histogram.c
+++ b/nbody/src/nbody_histogram.c
@@ -915,12 +915,21 @@ MainStruct* nbCreateHistogram(const NBodyCtx* ctx,        /* Simulation context
             mu_ras[ub_counter] = DEFAULT_NOT_USE;
             mu_decs[ub_counter] = DEFAULT_NOT_USE;
 
-            /* Find the indices */
-            lambdaIndex = (unsigned int) mw_floor((lambda - lambdaStart) / lambdaSize);
-            betaIndex = (unsigned int) mw_floor((beta - betaStart) / betaSize);
+            /* Find the indices. Casting a negative double to unsigned int is
+             * undefined behavior, and x86_64 vs aarch64 implement it
+             * differently: x86_64 wraps to a huge unsigned (correctly fails the
+             * < lambdaBins check), aarch64 saturates to 0 (incorrectly bins
+             * out-of-range particles into bin 0). Bound-check on the float
+             * first to keep behavior identical across architectures. */
+            real lambdaIdxF = mw_floor((lambda - lambdaStart) / lambdaSize);
+            real betaIdxF   = mw_floor((beta  - betaStart)  / betaSize);
+            mwbool inRange  = (lambdaIdxF >= 0.0 && lambdaIdxF < (real) lambdaBins
+                            && betaIdxF   >= 0.0 && betaIdxF   < (real) betaBins);
+            lambdaIndex = inRange ? (unsigned int) lambdaIdxF : lambdaBins;
+            betaIndex   = inRange ? (unsigned int) betaIdxF   : betaBins;
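
The idea in the patch above can be shown in isolation. The following is a minimal standalone sketch, not the project's code: the function name `bin_index` and its parameters are hypothetical, and returning `bins` as the out-of-range value simply mirrors how the patch uses `lambdaBins` and `betaBins`. The range check is done on the floating-point value, so the cast only ever sees a value that fits in an unsigned int. For non-negative values the cast's truncation gives the same result as floor, so no floor call is needed here.

```c
/* Standalone sketch of "range-check in floating point, then cast".
 * Hypothetical helper; not taken from nbody_histogram.c. */

/* Maps `value` to a bin in [0, bins). Returns `bins` as an
 * out-of-range sentinel, including when the position is NaN. */
unsigned int bin_index(double value, double start, double width, unsigned int bins)
{
    double pos = (value - start) / width;

    /* Written as !(pos >= 0.0) so that NaN is rejected as well.
     * Anything below zero is rejected here, including values in
     * (-1, 0) that a bare cast would truncate into bin 0. */
    if (!(pos >= 0.0) || pos >= (double) bins)
        return bins;

    /* pos is now in [0, bins), so the conversion is well defined and
     * gives the same answer on every architecture. */
    return (unsigned int) pos;
}
```

With 10 bins of width 1.0 starting at 0.0, `bin_index(2.5, 0.0, 1.0, 10)` gives 2, while `bin_index(-3.0, 0.0, 1.0, 10)` and `bin_index(10.0, 0.0, 1.0, 10)` both give the sentinel 10 instead of relying on what the hardware does with an unrepresentable value.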


Ian&Steve C.
Message 77943 - Posted: 1 May 2026, 20:36:34 UTC
Last modified: 1 May 2026, 20:48:36 UTC

This is the first workunit with the newest code (ignore earlier runs with more error):

https://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=1020117489 (mine is the anonymous platform result)

Very close, but the validator still doesn't accept it: the result wasn't judged valid.

Is this really an invalid result, or can the validator strictness be loosened to allow this?

  Metric                       aarch64                  x86_64                   Delta (x86_64 - aarch64)         
  ---------------------------  -----------------------  -----------------------  ------------------------         
  search_likelihood            -751.206790741635700     -751.232077413926845       -0.025286672291145             
  search_likelihood_EMD         -16.071059557630502      -16.099033208117227       -0.027973650486725             
  search_likelihood_Mass        -48.396293126839204      -48.396293126839204        0.000000000000000             
  search_likelihood_Beta       -112.139917748761746     -111.501316752592331       +0.638600996169415             
  search_likelihood_BetaAvg    -142.601938056101801     -143.279383333977933       -0.677445277876132             
  search_likelihood_VelAvg     -101.435569456358564     -101.418970908206191       +0.016598548152373             
  search_likelihood_Dist       -112.500000000000000     -112.500000000000000        0.000000000000000             
  search_likelihood_Momentum   -218.062012795943843     -218.037080084194088       +0.024932711749755
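
As a rough way to reduce the table above to one number, the sketch below computes the largest absolute difference across the eight metrics and compares it with a tolerance. The real validator's comparison rule and tolerance are not stated in this thread, so `max_abs_delta`, `within_tolerance`, and any tolerance passed in are hypothetical. The two arrays just hold the values from the table.

```c
#include <stddef.h>

/* Values copied from the table above, in the same row order. */
static const double aarch64_result[8] = {
    -751.206790741635700, -16.071059557630502, -48.396293126839204,
    -112.139917748761746, -142.601938056101801, -101.435569456358564,
    -112.500000000000000, -218.062012795943843
};
static const double x86_64_result[8] = {
    -751.232077413926845, -16.099033208117227, -48.396293126839204,
    -111.501316752592331, -143.279383333977933, -101.418970908206191,
    -112.500000000000000, -218.037080084194088
};

/* Hypothetical helper, not the project's validator:
 * largest |a[i] - b[i]| over n entries. */
double max_abs_delta(const double *a, const double *b, size_t n)
{
    double worst = 0.0;
    size_t i;

    for (i = 0; i < n; ++i)
    {
        double d = a[i] - b[i];
        if (d < 0.0)
            d = -d;
        if (d > worst)
            worst = d;
    }
    return worst;
}

/* Nonzero if every entry agrees to within the absolute tolerance `tol`. */
int within_tolerance(const double *a, const double *b, size_t n, double tol)
{
    return max_abs_delta(a, b, n) <= tol;
}
```

For these eight values the largest gap is about 0.68 (search_likelihood_BetaAvg), so under this kind of check any absolute tolerance below that would reject the pair, even though two of the metrics match exactly.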


gimmyk
Message 77944 - Posted: 1 May 2026, 21:05:22 UTC

This looks promising. I guess none of us ever caught that behavior with the casting!

I'll have to look into testing this with some other cases and see how well it does, particularly around the important regions of the likelihood surface. If it is consistently this good we may be able to increase the range on the validator, but that's something I'd need to bring up and get permission for.

Ian&Steve C.
Message 77945 - Posted: 1 May 2026, 21:11:42 UTC - in response to Message 77944.  

Thanks.

I just realized that my build for that sample WU didn't include some FP math strictness flags that might also be necessary on aarch64. I added those in and will keep testing to see if the results come out closer.

Ian&Steve C.
Message 77947 - Posted: 2 May 2026, 1:56:49 UTC - in response to Message 77945.  

https://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=1019638235

I added three build arguments:
-ffp-contract=off
-fno-associative-math
-fno-finite-math-only

Now my results seem to be bitwise identical to the x86_64 results, and they are validating.

-ffp-contract=off alone might be enough to get by the validator limits. I'll try that tomorrow.

Ian&Steve C.
Message 77948 - Posted: 2 May 2026, 18:14:15 UTC - in response to Message 77947.  

Looks like -ffp-contract=off alone is enough.

Ian&Steve C.
Message 77952 - Posted: 3 May 2026, 14:21:48 UTC - in response to Message 77948.  
Last modified: 3 May 2026, 14:27:32 UTC

I now have 6 different aarch64 SBCs running my app, all producing valid results against the stock v1.95 x86_64 app.

I think you should be good to push out an official Linux aarch64 build, if you have an appropriate device to build it on.

My hosts running this app:
Nvidia Jetson Orin Nano: https://milkyway.cs.rpi.edu/milkyway/show_host_detail.php?hostid=1066634
Nvidia Jetson Orin NX: https://milkyway.cs.rpi.edu/milkyway/show_host_detail.php?hostid=1066909
Nvidia Jetson Orin NX: https://milkyway.cs.rpi.edu/milkyway/show_host_detail.php?hostid=1066906
Radxa Rock 5C: https://milkyway.cs.rpi.edu/milkyway/show_host_detail.php?hostid=1066913
Radxa Rock 5C: https://milkyway.cs.rpi.edu/milkyway/show_host_detail.php?hostid=1066911
Raspberry Pi 5: https://milkyway.cs.rpi.edu/milkyway/show_host_detail.php?hostid=1066934

Maybe ahorek can also build a Windows on ARM version for kasdashdfjsah to test (on topic for this thread).

gimmyk
Message 77953 - Posted: 6 May 2026, 21:28:16 UTC

This seems to be working for me as well. I'll begin testing out the Linux version, but I don't know if we'll be able to build for Windows on ARM.

Ian&Steve C.
Message 77956 - Posted: 7 May 2026, 13:39:59 UTC - in response to Message 77953.  

thanks!
