Welcome to MilkyWay@home

Posts by ahorek's team

1) Message boards : Number crunching : One WU using only 16 threads. (Message 75755)
Posted 20 Jun 2023 by ahorek's team
Post:
just use app_config.xml

<app_config>
<app>
<name>milkyway_nbody</name>
<max_concurrent>1</max_concurrent>
</app>
<app_version>
<app_name>milkyway_nbody</app_name>
<plan_class>mt</plan_class>
<avg_ncpus>32</avg_ncpus>
<cmdline>-n 32</cmdline>
</app_version>
</app_config>

2) Message boards : Number crunching : One WU using only 16 threads. (Message 75747)
Posted 20 Jun 2023 by ahorek's team
Post:
the app does support more than 16 threads, but it doesn't scale well. Going from 16 to 32 threads gives only about a 10% improvement on a Ryzen 5950X, so running multiple WUs (e.g. 2 x 16 threads) is more efficient.
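
A quick back-of-the-envelope check of that claim, as a minimal Python sketch. The 10% figure is the one quoted above; the assumption that two 16-thread tasks run side by side without slowing each other down is an idealisation and has not been measured here. The function name `throughput` is made up for illustration.

```python
# Relative throughput: one 32-thread task vs. two 16-thread tasks.
# Baseline (1.00) = one task running on 16 threads.

def throughput(tasks_in_parallel: int, speedup_vs_16_threads: float) -> float:
    """Work units finished per unit time, relative to one 16-thread task."""
    return tasks_in_parallel * speedup_vs_16_threads

# From the post: 32 threads finish a single WU only about 10% faster.
one_by_32 = throughput(1, 1.10)

# ASSUMPTION: two concurrent 16-thread tasks each keep the baseline speed.
two_by_16 = throughput(2, 1.00)

print(f"1 x 32 threads: {one_by_32:.2f}")
print(f"2 x 16 threads: {two_by_16:.2f}")
print(f"advantage of 2 x 16: {two_by_16 / one_by_32:.2f}x")
```

Under that idealised assumption 2 x 16 comes out roughly 1.8x ahead; in practice the two tasks compete for cores and cache, so the real gain is likely smaller.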

it would be nice to be able to set "Max # of CPUs for this project" in the administration, like other projects (Amicable, PrimeGrid...) allow.
app_config.xml is another alternative...
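
For the app_config.xml route, here is a minimal Python sketch that writes out a config for 2 concurrent tasks with 16 threads each. It only reuses the tags and the `-n` option that appear in the 1 x 32 config posted in message 75755; whether the client and this project honor them the same way for other values is an assumption, and the helper name `build_app_config` is made up for illustration.

```python
import xml.etree.ElementTree as ET

def build_app_config(app_name: str, tasks: int, threads_per_task: int) -> str:
    """Return app_config.xml text limiting concurrency and threads per task."""
    root = ET.Element("app_config")

    # How many tasks of this app may run at the same time.
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(tasks)

    # How many CPUs each multithreaded ("mt") task is budgeted and told to use.
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    ET.SubElement(version, "plan_class").text = "mt"
    ET.SubElement(version, "avg_ncpus").text = str(threads_per_task)
    ET.SubElement(version, "cmdline").text = f"-n {threads_per_task}"

    if hasattr(ET, "indent"):  # pretty-printing is available on Python 3.9+
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")

print(build_app_config("milkyway_nbody", tasks=2, threads_per_task=16))
```

Save the printed text as app_config.xml in the project's folder and have the client re-read its config files.
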

3) Message boards : News : Separation Project Coming To An End (Message 75567)
Posted 15 Jun 2023 by ahorek's team
Post:
if the main parts are changing all the time, it's hard to write an efficient GPU implementation without enough manpower. It's not like optimizing a well-known algorithm like FFT...

what about a hybrid approach? accelerate only hot parts that don't change too often? it could ease the maintenance, but on the other hand, the overhead may be too large, so it won't be efficient.

thank you for your feedback and efforts. Much appreciated!

4) Message boards : News : Separation Project Coming To An End (Message 75549)
Posted 14 Jun 2023 by ahorek's team
Post:
> it loads up the GPU to 100%, but I can't be sure that it's actually doing anything useful or it's just spinning its wheels


yeah, I've retested it on my Linux + NVIDIA (Turing) PC, and it seems the current nbody OpenCL version doesn't work at all. Maybe the developers here know whether it ever worked? Anyway, if the current code is buggy, it would take much more effort to fix and optimize it. It would be great to have a GPU version, but I don't think I'll be able to fix it...

5) Message boards : News : Separation Project Coming To An End (Message 75539)
Posted 14 Jun 2023 by ahorek's team
Post:
> how did you compile the binary?
I'm testing the standalone version because dealing with BOINC dependencies is painful... it's easier to build it on Linux:
1/ install prerequisites (depends on your system, not sure if this is the full list)
apt-get install -y git build-essential cmake ocl-icd-opencl-dev ninja-build
2/ git clone
git clone https://github.com/Milkyway-at-home/milkywayathome_client.git
cd milkywayathome_client
3/ enable OPENCL build
sed -i 's/DNBODY_OPENCL=OFF/DNBODY_OPENCL=ON/g' make_nbody_lite.sh
4/ compile it
sh make_nbody_lite.sh
cd ../../build
5/ the binary should be here
./milkyway_nbody


also, I had to change the code to use the right device (the CPU OpenCL platform is preferred for some reason), and the [-p|--platform=INT][-d|--device=INT] options don't seem to work as expected.

6) Message boards : News : Separation Project Coming To An End (Message 75535)
Posted 14 Jun 2023 by ahorek's team
Post:
the code is available here
https://github.com/Milkyway-at-home/milkywayathome_client/blob/master/nbody/kernels/nbody_kernels.cl

I wanted to test the CPU vs GPU difference myself

enable OpenCL
-DNBODY_OPENCL=ON
+ specify libraries
-DOPENCL_LIBRARIES=C:/mingw/msys64/mingw64/bin/OpenCL.dll -DOPENCL_INCLUDE_DIRS=C:/mingw/msys64/mingw64/include/CL

the GPU load rises, but after a few seconds, it crashes on a driver timeout.

It could be a problem with my environment and I'm on Windows, so...
milkyway_nbody -f settings.lua -o output_0gy.out -h correct_hist.hist -z hist_test.hist -n 32 -b  -i 3.0 1.0 0.2 0.2 12 0.2 -p 0 -d 0
Using OpenMP 32 max threads on a system with 32 processors
Found 1 platform
Platform 0 information:
  Name:       AMD Accelerated Parallel Processing
  Version:    OpenCL 2.1 AMD-APP (3516.0)
  Vendor:     Advanced Micro Devices, Inc.
  Extensions: cl_khr_icd cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_amd_event_callback cl_amd_offline_devices
  Profile:    FULL_PROFILE
Using device 0 on platform 0
Found 1 CL device
Device 'gfx1100' (Advanced Micro Devices, Inc.:0x1002) (CL_DEVICE_TYPE_GPU)
Board: AMD Radeon RX 7900 XTX
Driver version:      3516.0 (PAL,LC)
Version:             OpenCL 2.0 AMD-APP (3516.0)
Compute capability:  0.0
Max compute units:   48
Clock frequency:     2482 Mhz
Global mem size:     25753026560
Local mem size:      65536
Max const buf size:  25753026560
Double extension:    cl_khr_fp64
Running MilkyWay@home Nbody v1.85
Optimal Softening Length = 0.112929680735593 kpc
Dwarf Initial Position: [-34.375055953159666,104.152234974946268,-20.716946238453204]
Dwarf Initial Velocity: [9.468491711421557,92.047887884146391,-59.319309910185652]
Initial LMC position: [82.245240275588799,509.425796198476291,-150.407601305508365]
Initial LMC velocity: [-16.741582632867392,-129.695727529343941,7.340647956328183]
Kernel Compile Flags: -DDEBUG=0 -DDOUBLEPREC=1 -cl-mad-enable -DNBODY=40000 -DEFFNBODY=40000 -DNNODE=79999 -DWARPSIZE=64 -DNOSORT=0 -DTHREADS1=256 -DTHREADS2=256 -DTHREADS3=256 -DTHREADS4=256 -DTHREADS5=256 -DTHREADS6=256 -DTHREADS7=256 -DTHREADS8=256 -DMAXDEPTH=128 -DTIMESTEP=0x1.eed840c14c795p-12 -DEPS2=0x1.a1e4dd2e0b9bdp-7 -DTHETA=0x1p+0 -DUSE_QUAD=1 -DTREECODE=1 -DSW93=0 -DBH86=0 -DEXACT=0 -DUSE_EXTERNAL_POTENTIAL=1 -DDISK_TYPE=1 -DDISK_2_TYPE=0 -DHALO_TYPE=1 -DSPHERE_TYPE=1 -DSPHERICAL_MASS=0x1.2abd3374bc6a8p+17 -DSPHERICAL_SCALE=0x1.6666666666666p-1 -DDISK_MASS=0x1.b36a78d4fdf3bp+18 -DDISK_SCALE_LENGTH=0x1.ap+2 -DDISK_SCALE_HEIGHT=0x1.0a3d70a3d70a4p-2 -DHALO_VHALO=0x1.2a70a3d70a3d7p+6 -DHALO_SCALE_LENGTH=0x1.8p+3 -DHALO_FLATTEN_Z=0x1p+0 -DHALO_FLATTEN_Y=0x0p+0 -DHALO_FLATTEN_X=0x0p+0 -DHALO_TRIAX_ANGLE=0x0p+0 -DHALO_C1=0x0p+0 -DHALO_C2=0x0p+0 -DHALO_C3=0x0p+0 -DHALO_MASS=0x0p+0 -DHALO_GAMMA=0x0p+0 -DHALO_LAMBDA=0x0p+0 -DHALO_RHO0=0x0p+0   -DHAVE_INLINE_PTX=0 -DHAVE_CONSISTENT_MEMORY=0
19:47:17: Process 136840 created scene instance 0

--------------------------------------------------------------------------------
Total timing over 6357 steps:
                         Average             Total            Fraction
                    ----------------   ----------------   ----------------
  boundingBox:              0.000000           0.000000               nan%
  buildTree:                0.000000           0.000000               nan%
  summarization:            0.000000           0.000000               nan%
  sort:                     0.000000           0.000000               nan%
  quad moments:             0.000000           0.000000               nan%
  forceCalculation:         0.000000           0.000000               nan%
  integration:              0.000000           0.000000               nan%
  ==============================================================================
  total                     0.000000           0.000000               nan%

--------------------------------------------------------------------------------

19:47:37: Making final checkpoint
Running MilkyWay@home Nbody v1.85
Running MilkyWay@home Nbody v1.85
Error opening histogram file 'correct_hist.hist'
19:47:38: Removing checkpoint file 'nbody_checkpoint'
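
A side note on reading the "Kernel Compile Flags" line in the log: values such as `-DTIMESTEP=0x1.eed840c14c795p-12` are C-style hexadecimal floating-point literals. A minimal Python sketch to decode a few of them, with the strings copied from the log above. That EPS2 is the squared softening length is my inference from the numbers, not something confirmed from the source code.

```python
import math

# A few values copied verbatim from the "Kernel Compile Flags" line in the log.
flags = {
    "TIMESTEP": "0x1.eed840c14c795p-12",
    "EPS2": "0x1.a1e4dd2e0b9bdp-7",
    "THETA": "0x1p+0",
    "DISK_SCALE_LENGTH": "0x1.ap+2",
}

# float.fromhex parses the same hexadecimal notation that C uses.
values = {name: float.fromhex(text) for name, text in flags.items()}

for name, value in values.items():
    print(f"{name} = {value!r}")

# ASSUMPTION: EPS2 is the square of the softening length printed in the log.
print("sqrt(EPS2) =", math.sqrt(values["EPS2"]))
```

The square root of EPS2 comes out at about 0.1129, in line with the "Optimal Softening Length" reported near the top of the log.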



