Welcome to MilkyWay@home

New Poll Regarding GPU Application of N-Body

Sebastian*

Joined: 8 Apr 09
Posts: 70
Credit: 11,027,167,827
RAC: 0
Message 71014 - Posted: 23 Jul 2021, 17:19:14 UTC - in response to Message 71005.  

To clarify, the GPU was tested doing the same task as the CPU. From a pure teraflop standpoint, AMD GPUs should perform much faster than the RTX 3070 on the spec sheet, since AMD invests more in this area, but it remains to be seen whether that advantage is realized in practice. The main idea is that if the GPU and CPU are doing the same amount of work, and an average computer has both a GPU and a CPU, the total computation performed across the network is likely to roughly double. I would like to point out that, although this slightly affects CPU performance, the cost is minimal, because the CPU portion of the GPU code is designed to be lightweight and serve only for control purposes; we confirmed this when testing both running at the same time. If anyone wants an idea of the performance on their specific system, the GitHub repo does have a working version of the GPU code, although at the moment it does not support all the features, such as the LMC.


Thank you for that info.
About the 10900K: did it run with the Intel-suggested PL1/PL2 and tau time limits, after which it stays at 125 W, or was it allowed to boost all day at the maximum sustainable frequency?
If it stayed at 125 W we could judge the comparison more accurately, even in terms of performance per watt. But if it was allowed to boost all day, then it would be less efficient than a GPU.
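(As a rough illustration of the point, with made-up numbers: an unconstrained 10900K can draw on the order of twice its 125 W PL1 under an all-core load, so even if boosting all day bought, say, 30% more throughput, results per watt would drop to roughly 1.3 / 2 = 0.65x of the power-limited case.)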

I am not sure how far you are willing to optimize the app (CPU and GPU), but if you do, you can get far more results in the same time compared to before the optimizations. What compiler do you use, by the way? And are all the flags set to support the Ryzen CPUs to their full potential?
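(For illustration only, assuming GCC and a Zen 2 part such as the 3950X: the kind of flags meant here would be along the lines of -O3 -march=znver2 -flto, so the build targets the CPU's full instruction set, AVX2 included, rather than a generic baseline.)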
Sebastian*

Joined: 8 Apr 09
Posts: 70
Credit: 11,027,167,827
RAC: 0
Message 71015 - Posted: 23 Jul 2021, 17:22:47 UTC - in response to Message 71008.  

There is an "anomaly" in the latest version of the BOINC client's handling of tasks that use the GPU option. It has something to do with the Intel GPU. It can hang a task indefinitely. Tread carefully before checking this out. As Redbeard the pirate might say if he were still alive, "Matey, ye be warned!"


I have had that "hanging indefinitely" on AMD GPUs as well. It seems to depend on the driver. On my R9 390X it was fixed in the 19.7.2 driver, I think. Not sure if it is still working. On the FirePro W9100, with the professional drivers, it still hangs. And on the Radeon VII the last working driver is the professional 20Q4 driver; the desktop drivers always cause hanging.
That is why I would like to see longer-running tasks like the N-body ones on the GPU. A single WU at a time runs fine, though.
Toby Broom

Joined: 13 Jun 09
Posts: 24
Credit: 137,536,729
RAC: 0
Message 71016 - Posted: 23 Jul 2021, 17:36:13 UTC - in response to Message 71010.  

If I have a GPU with high FP64 performance, then I assume the speedup would be similar to the current WUs?
WMD

Joined: 15 Jun 13
Posts: 15
Credit: 2,069,756,183
RAC: 49,499
Message 71018 - Posted: 23 Jul 2021, 18:35:45 UTC

I've been waiting for this for years! I only run MilkyWay on GPU, as the CPU is used for other things - generally, projects that don't have a GPU version. And I do have a Titan V, so I imagine the performance will be quite good. :) The performance differential between CPU and GPU won't matter so much to me, though, since GPU is all I do here. It would just be nice to be able to contribute to both applications!

As an aside, if the workunits do take longer, it'll at least be able to help with the never-fixed problem this project has with sending workunits - namely, the one where you get 300 tasks, it runs through them all without receiving any more, and then it sits for almost 15 minutes with nothing to do (since the server has a bad setting somewhere related to intervals between sending units, and the BOINC client backs off for ~10 minutes upon hitting up against that). Was tolerable with my old video card, but when your Titan V can do 300 tasks in about 35 minutes, it becomes kind of a pain. I've set up some backup projects with a work share of 0 to keep the system busy during that time, but... when you spend the money on an FP64 card, you want to do FP64 work! :)
Sebastian*

Joined: 8 Apr 09
Posts: 70
Credit: 11,027,167,827
RAC: 0
Message 71019 - Posted: 23 Jul 2021, 18:48:04 UTC - in response to Message 71016.  

If I have a GPU with high FP64 performance, then I assume the speedup would be similar to the current WUs?


Not really. N-body is pretty complex to calculate. I don't know exactly how it works, though. But if there are a lot of parts that have to be finished before you can continue your calculations, then you have to wait. CPUs handle branches and dependencies well; GPUs, not so much. You can stall most of the GPU waiting for one important calculation to finish.

Video encoding is such a case; that is why GPUs and some Intel CPUs have dedicated hardware for video encoding. Even CPUs are faster at video encoding than a GPU when you don't use the GPU's dedicated encoding hardware.
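As a toy illustration of the kind of dependency chain meant here (this is not the actual N-body code, just a sketch), the second loop below spreads trivially across thousands of GPU threads, while the first cannot, because every step needs the previous step's result:

    /* Each iteration depends on the one before it, so the work is
       inherently serial and a GPU gains nothing. */
    void serial_dependency(double *x, int n)
    {
        for (int i = 1; i < n; i++)
            x[i] = 0.5 * x[i - 1] + 1.0;   /* must wait for x[i-1] */
    }

    /* Every element is independent, so the loop maps cleanly onto
       one GPU thread per element. */
    void independent_work(double *x, int n)
    {
        for (int i = 0; i < n; i++)
            x[i] = x[i] * x[i] + 1.0;
    }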
Toby Broom

Joined: 13 Jun 09
Posts: 24
Credit: 137,536,729
RAC: 0
Message 71023 - Posted: 24 Jul 2021, 10:39:42 UTC - in response to Message 71019.  
Last modified: 24 Jul 2021, 10:39:54 UTC

Thanks for the comment.

If the share of N-body calculations that are FP64 is 1/8, while a consumer GPU's FP64 rate is 1/64 of its FP32 rate and a high-FP64 GPU's is 1/2, then you would expect some speedup.
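(One way to put numbers on that, taking the 1/8 FP64 share and the two ratios at face value: relative run time scales roughly as (1 - f) + f/r in FP32-op units, so with f = 1/8 a 1:64 card gives 0.875 + 8 ≈ 8.9 while a 1:2 card gives 0.875 + 0.25 ≈ 1.1, i.e. something like an 8x gap between the two GPUs on this workload, all else being equal.)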

Anyway, I would vote for whatever the project team thinks gives the best return on investment.
robertmiles

Joined: 30 Sep 09
Posts: 211
Credit: 36,977,315
RAC: 0
Message 71035 - Posted: 28 Jul 2021, 19:29:00 UTC

Nvidia provides a CUDA sample that looks closely related to the N-body problem.

On my computer, it is in directory:

C:\ProgramData\NVIDIA Corporation\CUDA Samples\v11.4\5_Simulations

Its name is nbody.

Have you checked if it contains anything useful for the N-body problem?

I've seen no sign that programs partly in CUDA and partly in OpenCL are supported.

I've found a program named swan that is supposed to translate CUDA into OpenCL. However, it hasn't been updated in over 10 years, so I haven't put much effort into checking further.

I'm trying to devote my GPU to BOINC projects related to medical research. However, few enough such tasks are available that I'm also getting the shorter GPU tasks from other BOINC projects to avoid changes in the speed of my computer fans at night.

Folding@Home doesn't use BOINC, or it would be a suitable choice.

I don't consider whether you offer GPU N-body tasks to be important for me. However, I'll consider running them if I like their running times.

Is the GPU N-body program in the same computer language as the CPU version? If so, you may be able to simply copy the code for the unimplemented features from the CPU version to the GPU version.
Jim1348

Joined: 9 Jul 17
Posts: 100
Credit: 16,967,906
RAC: 0
Message 71036 - Posted: 28 Jul 2021, 20:19:46 UTC - in response to Message 71035.  

Nvidia provides a CUDA sample that looks closely related to the N-body problem.

On my computer, it is in directory:

C:\ProgramData\NVIDIA Corporation\CUDA Samples\v11.4\5_Simulations

Its name is nbody.

That must be in the toolkit. They have some info on it:
https://docs.nvidia.com/cuda/cuda-samples/index.html#cuda-n-body-simulation

If this could make it go faster on CUDA, I am all for it. There are plenty of Nvidia cards around.
They could use the AMD cards for the separation work units.
Dunx

Joined: 13 Feb 11
Posts: 31
Credit: 1,403,524,537
RAC: 0
Message 71037 - Posted: 29 Jul 2021, 9:16:41 UTC

Please run this code on a Radeon VII....

But mine sits idle waiting for WUs anyway, so why bother?

Seriously, if it will keep my PC running 100% of the time, then YES, do it!

D.C.
mikey

Joined: 8 May 09
Posts: 3315
Credit: 519,938,432
RAC: 22,812
Message 71039 - Posted: 29 Jul 2021, 10:22:11 UTC - in response to Message 71037.  
Last modified: 29 Jul 2021, 10:24:17 UTC

Please run this code on a Radeon VII....

But mine sits idle waiting for WUs anyway, so why bother?

Seriously, if it will keep my PC running 100% of the time, then YES, do it!

D.C.


Have you tried using the alternate version of BOINC posted here:
https://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=4532#69225

Another option is joining the GPU Users Group team, as they also have a 'fix' that seems to work, but I don't know the criteria for getting access to it. I know that Keith Myers can tell you some of that, though: https://milkyway.cs.rpi.edu/milkyway/view_profile.php?userid=147145

The new admin here at MilkyWay has said that if he could find the timer setting he would fix it, but the former admin, who made the setting the way it is now, didn't write things down.
Wee Todd Didd

Joined: 28 Jun 15
Posts: 3
Credit: 11,475,553
RAC: 0
Message 71044 - Posted: 2 Aug 2021, 20:52:08 UTC - in response to Message 70990.  

When you say it's not much faster than a CPU, do you mean a single-core CPU vs a single GPU? Or do you mean an 8- or 12-thread CPU vs a 512-CUDA-core GPU? Because N-Body can scale itself to multiple threads, and I imagine the GPU version also uses all the CUDA cores on a GPU....


Wondering this as well. I could not find any recent N-Body tasks in my results to check whether results are verified. Current GPUs use about 2x the power of a CPU, so a GPU version with verification would have to be at least 4x as fast to break even on power consumption.
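(Presumably the arithmetic is: with a validation quorum of 2 each result is computed twice, and at roughly twice the power per host that makes 2 x 2 = 4, so the GPU needs at least a 4x speedup before its energy per validated result matches the CPU's.)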
[VENETO] boboviz

Joined: 10 Feb 09
Posts: 52
Credit: 16,286,597
RAC: 0
Message 71056 - Posted: 6 Aug 2021, 12:59:11 UTC

I think the important part is the science.
If the new app helps the researchers, it's welcome!
Joseph Donahue

Joined: 7 Dec 16
Posts: 2
Credit: 2,216,975
RAC: 0
Message 71060 - Posted: 11 Aug 2021, 17:09:39 UTC - in response to Message 70988.  

Is there a difference in Watts/result?
Magiceye04

Joined: 14 Feb 10
Posts: 6
Credit: 109,713,505
RAC: 452
Message 71120 - Posted: 18 Sep 2021, 13:44:22 UTC - in response to Message 71001.  

I would be very happy to see longer-running work units on GPUs. Especially on high-performance AMD cards (with a lot of double-precision performance) I have to run several WUs in parallel, which causes driver issues.
I got some of them fixed by AMD, but not all.
If I could run an N-Body WU on my GPU that takes several hours and loads the GPU well, that would be great.

The comparison specifically performed was between an i9-10900K and an RTX 3070, using all available compute cores for both.


If that is the comparison, then AMD cards with a lot of double-precision performance should do well, as should Nvidia cards. A 3070 has roughly 0.3 TFLOPS of double-precision performance. A 10900K should turn out about the same, since memory bandwidth is the limiting factor on the CPU.
I would expect an R9 280X to be about 3 times as fast as a 3070 then.
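(For reference, rough spec-sheet numbers: the 3070 is about 20 TFLOPS FP32 at a 1:64 FP64 ratio, i.e. roughly 0.3 TFLOPS FP64, while the R9 280X is about 4 TFLOPS FP32 at 1:4, i.e. roughly 1 TFLOPS FP64, which is where the factor of three comes from.)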

I totally agree.
Even with 4 WUs in parallel, my VII finishes the WUs in less than 1 minute.
Why are the WUs so short?

In my opinion a new GPU app only makes sense if the work is done more efficiently than on the CPU.
My VII and my 3950X are both set to ~90 W, and in this config I would expect the VII to be at minimum 2-3x faster than the 32 CPU threads. Otherwise the next jump from AMD to 32 CPU cores on the consumer platform would make this GPU advantage obsolete. Threadrippers already have more CPU cores, but also much higher power consumption.
Maliar Ivan

Joined: 15 Apr 21
Posts: 1
Credit: 50,001
RAC: 0
Message 71123 - Posted: 20 Sep 2021, 8:55:02 UTC

Approximately how many calculations are left?
jackielan2000

Joined: 6 Nov 19
Posts: 8
Credit: 140,349
RAC: 0
Message 71262 - Posted: 23 Oct 2021, 0:05:08 UTC

It would be appreciated if you could make a GPU app for Android devices. Currently no such app exists for any other project, yet the population of Android devices is the largest in the world, much larger than that of PC, Mac, or Linux computers. I have 7 Android devices: 4 mobile phones and 3 TV boxes. 5 of them are running WCG's OpenPandemics - COVID-19 and 1 is running Universe@Home. I think it is time for MilkyWay@Home to embrace the power of Android.

Well, if you cannot make a GPU app at the moment, how about making one for the CPU on Android?
WMD

Joined: 15 Jun 13
Posts: 15
Credit: 2,069,756,183
RAC: 49,499
Message 73144 - Posted: 28 Apr 2022, 1:24:02 UTC

Bump!

Any progress on this?
Septimus

Joined: 8 Nov 11
Posts: 205
Credit: 2,882,763
RAC: 293
Message 73150 - Posted: 28 Apr 2022, 9:45:16 UTC - in response to Message 73144.  
Last modified: 28 Apr 2022, 9:52:17 UTC

Any chance of using Intel GPUs in both applications? Clearly they are not as fast as some, but there are a lot of them around. It certainly made a difference on Einstein and Collatz; on Collatz especially it reduced the run time dramatically.
poppinfresh99

Joined: 28 Feb 22
Posts: 16
Credit: 2,400,538
RAC: 0
Message 74952 - Posted: 29 Jan 2023, 0:09:48 UTC

My understanding is that N-body uses the Barnes-Hut algorithm. I came across the following Barnes-Hut GPU paper, where the GPU is MUCH faster than the CPU:
https://iss.oden.utexas.edu/Publications/Papers/burtscher11.pdf

MilkyWay's N-body probably also models gas flow (such as dark matter distributions), which could be why your GPU code is so much slower? Regardless, the paper might be useful, and I'd be curious why your N-body GPU code is not seeing a large speedup.
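For readers unfamiliar with it: Barnes-Hut builds an octree over the bodies and, when computing the force on one body, replaces a distant cell's contents with their centre of mass whenever the cell looks small enough from that body. A minimal sketch of that acceptance test (illustrative names only, not MilkyWay's actual code):

    #include <math.h>

    /* Decide whether a tree cell may stand in for all the bodies it
       contains when computing the force on the body at (x, y, z).
       'size' is the cell's side length, (cx, cy, cz) its centre of
       mass, and theta the opening-angle parameter (often ~0.5-1.0). */
    static int cell_is_far_enough(double x, double y, double z,
                                  double cx, double cy, double cz,
                                  double size, double theta)
    {
        double dx = cx - x, dy = cy - y, dz = cz - z;
        double dist = sqrt(dx * dx + dy * dy + dz * dz);
        /* Far enough: use the cell's centre of mass as one pseudo-body.
           Otherwise the caller recurses into the cell's children. */
        return size < theta * dist;
    }

The irregular, recursive tree walk that this test drives is the part that maps poorly onto GPUs, and restructuring it appears to be what the linked paper addresses.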
Keith Myers

Joined: 24 Jan 11
Posts: 696
Credit: 539,987,910
RAC: 87,082
Message 74955 - Posted: 29 Jan 2023, 2:47:52 UTC - in response to Message 74952.  

I think you are confused. There is no N-body GPU app, so there is no N-body GPU code.

The N-body app is a multi-threaded CPU application only.

©2024 Astroinformatics Group