1)
Message boards :
Number crunching :
One WU using only 16 threads.
(Message 75755)
Posted 20 Jun 2023 by ahorek's team Post: just use app_config.xml

<app_config>
  <app>
    <name>milkyway_nbody</name>
    <max_concurrent>1</max_concurrent>
  </app>
  <app_version>
    <app_name>milkyway_nbody</app_name>
    <plan_class>mt</plan_class>
    <avg_ncpus>32</avg_ncpus>
    <cmdline>-n 32</cmdline>
  </app_version>
</app_config>
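As a rough sketch (not part of the original post), the same file could be generated for any thread count with Python's standard library. The helper name `build_app_config` and its parameters are made up for illustration; the element names and the `-n` flag are copied from the config in the post.

```python
import xml.etree.ElementTree as ET


def build_app_config(threads: int, max_concurrent: int = 1) -> str:
    """Return app_config.xml text for milkyway_nbody (hypothetical helper)."""
    root = ET.Element("app_config")

    # <app>: how many tasks of this application may run at once.
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = "milkyway_nbody"
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)

    # <app_version>: thread count per task for the multithreaded plan class.
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = "milkyway_nbody"
    ET.SubElement(version, "plan_class").text = "mt"
    ET.SubElement(version, "avg_ncpus").text = str(threads)
    ET.SubElement(version, "cmdline").text = f"-n {threads}"

    ET.indent(root, space="  ")  # pretty-printing; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Prints the same settings as the post: one task using 32 threads.
    print(build_app_config(threads=32))
```

Calling `build_app_config(16, max_concurrent=2)` would instead describe two tasks of 16 threads each. Whether that suits a given machine is an assumption to check against its core count.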
2)
Message boards :
Number crunching :
One WU using only 16 threads.
(Message 75747)
Posted 20 Jun 2023 by ahorek's team Post: The app does support more than 16 threads, but it doesn't scale well. On a Ryzen 5950X, going from 16 to 32 threads gives only a 10% improvement, so running multiple WUs, such as 2x16 threads, is more efficient. It would be nice to be able to set "Max # of CPUs for this project" in the administration, like other projects (Amicable, PrimeGrid...) have. app_config.xml is another alternative...
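A back-of-the-envelope check of that claim, using the 10% figure from the post and assuming that two 16-thread tasks run side by side without slowing each other down (an assumption; real runs share cache and memory bandwidth, so the gap is likely smaller):

```python
def relative_throughput(tasks: int, speed_per_task: float) -> float:
    """Work finished per unit time, relative to a single 16-thread task."""
    return tasks * speed_per_task


# One task on 32 threads: 10% faster than a 16-thread task (figure from the post).
one_by_32 = relative_throughput(tasks=1, speed_per_task=1.10)

# Two tasks on 16 threads each: baseline speed, assuming no mutual slowdown.
two_by_16 = relative_throughput(tasks=2, speed_per_task=1.00)

print(f"1x32: {one_by_32:.2f}  2x16: {two_by_16:.2f}  ratio: {two_by_16 / one_by_32:.2f}")
# prints: 1x32: 1.10  2x16: 2.00  ratio: 1.82
```

Under those assumptions the 2x16 layout finishes roughly 1.8 times as much work in the same time.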
3)
Message boards :
News :
Separation Project Coming To An End
(Message 75567)
Posted 15 Jun 2023 by ahorek's team Post: If the main parts are changing all the time, it's hard to write an efficient GPU implementation without enough manpower. It's not like optimizing a well-known algorithm like FFT... What about a hybrid approach: accelerating only the hot parts that don't change too often? It could ease maintenance, but on the other hand, the overhead may be too large for it to be efficient. Thank you for your feedback and efforts. Much appreciated!
4)
Message boards :
News :
Separation Project Coming To An End
(Message 75549)
Posted 14 Jun 2023 by ahorek's team Post: > it loads up the GPU to 100%, but I can't be sure that it's actually doing anything useful or it's just spinning its wheels
Yeah, I've retested it on my Linux + NVIDIA (Turing) PC, and it seems the current nbody OpenCL version doesn't work at all. Maybe the developers here have some insight into whether it ever worked? Anyway, if the current code is buggy, it would take much more effort to fix and optimize it. It would be great to have a GPU version, but I don't think I'll be able to fix it...
5)
Message boards :
News :
Separation Project Coming To An End
(Message 75539)
Posted 14 Jun 2023 by ahorek's team Post: > how did you compile the binary?
I'm testing the standalone version because dealing with BOINC dependencies is painful... It's easier to build it on Linux:
1/ install prerequisites (depends on your system; not sure if this is the full list)
    apt-get install -y git build-essential cmake ocl-icd-opencl-dev ninja-build
2/ git clone
    git clone https://github.com/Milkyway-at-home/milkywayathome_client.git
    cd milkywayathome_client
3/ enable the OpenCL build
    sed -i 's/DNBODY_OPENCL=OFF/DNBODY_OPENCL=ON/g' make_nbody_lite.sh
4/ compile it
    sh make_nbody_lite.sh
    cd ../../build
5/ the binary should be here
    ./milkyway_nbody
Also, I had to change the code to use the right device (the CPU OpenCL platform is preferred for some reason), and the [-p|--platform=INT] [-d|--device=INT] options don't seem to work as expected.
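One caveat with the `sed` step is that it changes nothing, and says nothing, if the flag text is not present in the script. A minimal Python sketch of the same substitution that reports how many occurrences it changed follows; the helper name `enable_opencl` is hypothetical, and the script name and flag text are taken from the post.

```python
from pathlib import Path


def enable_opencl(script_text: str) -> tuple[str, int]:
    """Flip the OpenCL build flag; return the new text and the number of hits."""
    old, new = "DNBODY_OPENCL=OFF", "DNBODY_OPENCL=ON"
    hits = script_text.count(old)
    return script_text.replace(old, new), hits


if __name__ == "__main__":
    # Assumes it is run from the repository checkout, next to the build script.
    script = Path("make_nbody_lite.sh")
    if script.exists():
        patched, hits = enable_opencl(script.read_text())
        if hits == 0:
            print("flag not found; script left unchanged")
        else:
            script.write_text(patched)
            print(f"enabled OpenCL in {hits} place(s)")
```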
6)
Message boards :
News :
Separation Project Coming To An End
(Message 75535)
Posted 14 Jun 2023 by ahorek's team Post: The code is available here:
https://github.com/Milkyway-at-home/milkywayathome_client/blob/master/nbody/kernels/nbody_kernels.cl
I wanted to test the CPU vs GPU difference myself.
Enable OpenCL:
    -DNBODY_OPENCL=ON
and specify the libraries:
    -DOPENCL_LIBRARIES=C:/mingw/msys64/mingw64/bin/OpenCL.dll
    -DOPENCL_INCLUDE_DIRS=C:/mingw/msys64/mingw64/include/CL
The GPU load rises, but after a few seconds it crashes on a driver timeout. It could be a problem with my environment, and I'm on Windows, so...

milkyway_nbody -f settings.lua -o output_0gy.out -h correct_hist.hist -z hist_test.hist -n 32 -b -i 3.0 1.0 0.2 0.2 12 0.2 -p 0 -d 0

Using OpenMP 32 max threads on a system with 32 processors
Found 1 platform
Platform 0 information:
  Name: AMD Accelerated Parallel Processing
  Version: OpenCL 2.1 AMD-APP (3516.0)
  Vendor: Advanced Micro Devices, Inc.
  Extensions: cl_khr_icd cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_amd_event_callback cl_amd_offline_devices
  Profile: FULL_PROFILE
Using device 0 on platform 0
Found 1 CL device
Device 'gfx1100' (Advanced Micro Devices, Inc.:0x1002) (CL_DEVICE_TYPE_GPU)
  Board: AMD Radeon RX 7900 XTX
  Driver version: 3516.0 (PAL,LC)
  Version: OpenCL 2.0 AMD-APP (3516.0)
  Compute capability: 0.0
  Max compute units: 48
  Clock frequency: 2482 Mhz
  Global mem size: 25753026560
  Local mem size: 65536
  Max const buf size: 25753026560
  Double extension: cl_khr_fp64
Running MilkyWay@home Nbody v1.85
Optimal Softening Length = 0.112929680735593 kpc
Dwarf Initial Position: [-34.375055953159666,104.152234974946268,-20.716946238453204]
Dwarf Initial Velocity: [9.468491711421557,92.047887884146391,-59.319309910185652]
Initial LMC position: [82.245240275588799,509.425796198476291,-150.407601305508365]
Initial LMC velocity: [-16.741582632867392,-129.695727529343941,7.340647956328183]
Kernel Compile Flags: -DDEBUG=0 -DDOUBLEPREC=1 -cl-mad-enable -DNBODY=40000 -DEFFNBODY=40000 -DNNODE=79999 -DWARPSIZE=64 -DNOSORT=0 -DTHREADS1=256 -DTHREADS2=256 -DTHREADS3=256 -DTHREADS4=256 -DTHREADS5=256 -DTHREADS6=256 -DTHREADS7=256 -DTHREADS8=256 -DMAXDEPTH=128 -DTIMESTEP=0x1.eed840c14c795p-12 -DEPS2=0x1.a1e4dd2e0b9bdp-7 -DTHETA=0x1p+0 -DUSE_QUAD=1 -DTREECODE=1 -DSW93=0 -DBH86=0 -DEXACT=0 -DUSE_EXTERNAL_POTENTIAL=1 -DDISK_TYPE=1 -DDISK_2_TYPE=0 -DHALO_TYPE=1 -DSPHERE_TYPE=1 -DSPHERICAL_MASS=0x1.2abd3374bc6a8p+17 -DSPHERICAL_SCALE=0x1.6666666666666p-1 -DDISK_MASS=0x1.b36a78d4fdf3bp+18 -DDISK_SCALE_LENGTH=0x1.ap+2 -DDISK_SCALE_HEIGHT=0x1.0a3d70a3d70a4p-2 -DHALO_VHALO=0x1.2a70a3d70a3d7p+6 -DHALO_SCALE_LENGTH=0x1.8p+3 -DHALO_FLATTEN_Z=0x1p+0 -DHALO_FLATTEN_Y=0x0p+0 -DHALO_FLATTEN_X=0x0p+0 -DHALO_TRIAX_ANGLE=0x0p+0 -DHALO_C1=0x0p+0 -DHALO_C2=0x0p+0 -DHALO_C3=0x0p+0 -DHALO_MASS=0x0p+0 -DHALO_GAMMA=0x0p+0 -DHALO_LAMBDA=0x0p+0 -DHALO_RHO0=0x0p+0 -DHAVE_INLINE_PTX=0 -DHAVE_CONSISTENT_MEMORY=0
19:47:17: Process 136840 created scene instance 0
--------------------------------------------------------------------------------
Total timing over 6357 steps:
                        Average            Total           Fraction
                    ---------------- ---------------- ----------------
      boundingBox:      0.000000         0.000000          nan%
        buildTree:      0.000000         0.000000          nan%
    summarization:      0.000000         0.000000          nan%
             sort:      0.000000         0.000000          nan%
     quad moments:      0.000000         0.000000          nan%
 forceCalculation:      0.000000         0.000000          nan%
      integration:      0.000000         0.000000          nan%
==============================================================================
            total       0.000000         0.000000          nan%
--------------------------------------------------------------------------------
19:47:37: Making final checkpoint
Running MilkyWay@home Nbody v1.85
Running MilkyWay@home Nbody v1.85
Error opening histogram file 'correct_hist.hist'
19:47:38: Removing checkpoint file 'nbody_checkpoint'
©2024 Astroinformatics Group