Welcome to MilkyWay@home

New Poll Regarding GPU Application of N-Body

Sebastian*

Joined: 8 Apr 09
Posts: 70
Credit: 11,027,167,827
RAC: 0
Message 71014 - Posted: 23 Jul 2021, 17:19:14 UTC - in response to Message 71005.  

To clarify, the GPU was tested doing the same task as the CPU. From a pure teraflop standpoint, AMD GPUs should perform much faster than the RTX 3070 on the spec sheet, since AMD invests more in this area, but it remains to be seen whether that advantage is realized in practice. The main idea is that if the GPU and CPU are doing the same amount of work, and an average computer has both a GPU and a CPU, the total computation performed across the network is likely to roughly double. I would like to point out that, although this slightly affects CPU performance, the cost is minimal, because the CPU portion of the GPU code is designed to be lightweight and serve only for control purposes; we confirmed this when testing both running at the same time. If anyone wants an idea of the performance on their specific system, the GitHub repo does have a working version of the GPU code, although at the moment it does not support all the features, such as the LMC.


Thank you for that info.
About the 10900K: did it run with the Intel-suggested PL1/PL2 and tau time limits, after which it stays at 125 W, or was it allowed to boost all day at the maximum sustainable frequency?
If it stayed at 125 W we could judge the comparison more accurately, even in terms of performance per watt. But if it was allowed to boost all day, then it would be less efficient than a GPU.
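(As a rough illustration of the point, with made-up numbers: an unconstrained 10900K can draw on the order of twice its 125 W PL1 under an all-core load, so even if boosting all day bought, say, 30% more throughput, results per watt would drop to roughly 1.3 / 2 = 0.65x of the power-limited case.)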

I am not sure how far you are willing to optimize the app (CPU and GPU), but if you do, you can get far more results in the same time compared to before the optimizations. What compiler do you use, by the way? And are all the flags set to support the Ryzen CPUs to their full potential?
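(For illustration only, assuming GCC and a Zen 2 part such as the 3950X: the kind of flags meant here would be along the lines of -O3 -march=znver2 -flto, so the build targets the CPU's full instruction set, AVX2 included, rather than a generic baseline.)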
Sebastian*

Joined: 8 Apr 09
Posts: 70
Credit: 11,027,167,827
RAC: 0
Message 71015 - Posted: 23 Jul 2021, 17:22:47 UTC - in response to Message 71008.  

There is an "anomaly" in the latest version of the BOINC client's handling of tasks that use the GPU option. It has something to do with the Intel GPU. It can hang a task indefinitely. Tread carefully before checking this out. As Redbeard the pirate might say if he were still alive, "Matey, ye be warned!"


I have had that "hanging indefinitely" on AMD GPUs as well. It seems to depend on the driver. On my R9 390X it was fixed in the 19.7.2 driver, I think. Not sure if it is still working. On the FirePro W9100, with the professional drivers, it still hangs. And on the Radeon VII the last working driver is the professional 20Q4 driver; the desktop drivers always cause hanging.
That is why I would like to see longer-running tasks like the N-body ones on the GPU. A single WU at a time runs fine, though.
Toby Broom

Joined: 13 Jun 09
Posts: 24
Credit: 137,536,729
RAC: 0
Message 71016 - Posted: 23 Jul 2021, 17:36:13 UTC - in response to Message 71010.  

If I have a GPU with high FP64 performance, then I assume the speedup would be similar to the current WUs?
WMD

Joined: 15 Jun 13
Posts: 15
Credit: 2,069,756,183
RAC: 49,499
Message 71018 - Posted: 23 Jul 2021, 18:35:45 UTC

I've been waiting for this for years! I only run MilkyWay on GPU, as the CPU is used for other things - generally, projects that don't have a GPU version. And I do have a Titan V, so I imagine the performance will be quite good. :) The performance differential between CPU and GPU won't matter so much to me, though, since GPU is all I do here. It would just be nice to be able to contribute to both applications!

As an aside, if the workunits do take longer, it'll at least be able to help with the never-fixed problem this project has with sending workunits - namely, the one where you get 300 tasks, it runs through them all without receiving any more, and then it sits for almost 15 minutes with nothing to do (since the server has a bad setting somewhere related to intervals between sending units, and the BOINC client backs off for ~10 minutes upon hitting up against that). Was tolerable with my old video card, but when your Titan V can do 300 tasks in about 35 minutes, it becomes kind of a pain. I've set up some backup projects with a work share of 0 to keep the system busy during that time, but... when you spend the money on an FP64 card, you want to do FP64 work! :)
Sebastian*

Joined: 8 Apr 09
Posts: 70
Credit: 11,027,167,827
RAC: 0
Message 71019 - Posted: 23 Jul 2021, 18:48:04 UTC - in response to Message 71016.  

If I have a GPU with high FP64 performance, then I assume the speedup would be similar to the current WUs?


Not really. N-body is pretty complex to calculate. I don't know exactly how it works, though. But if there are a lot of parts that have to be finished before you can continue your calculations, then you have to wait. CPUs handle branches and dependencies well; GPUs, not so much. You can stall most of the GPU waiting for one important calculation to finish.

Video encoding is such a case; that is why GPUs and some Intel CPUs have dedicated hardware for video encoding. Even CPUs are faster at video encoding than a GPU when you don't use the GPU's dedicated encoding hardware.
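As a toy illustration of the kind of dependency chain meant here (this is not the actual N-body code, just a sketch), the second loop below spreads trivially across thousands of GPU threads, while the first cannot, because every step needs the previous step's result:

    /* Each iteration depends on the one before it, so the work is
       inherently serial and a GPU gains nothing. */
    void serial_dependency(double *x, int n)
    {
        for (int i = 1; i < n; i++)
            x[i] = 0.5 * x[i - 1] + 1.0;   /* must wait for x[i-1] */
    }

    /* Every element is independent, so the loop maps cleanly onto
       one GPU thread per element. */
    void independent_work(double *x, int n)
    {
        for (int i = 0; i < n; i++)
            x[i] = x[i] * x[i] + 1.0;
    }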
Toby Broom

Joined: 13 Jun 09
Posts: 24
Credit: 137,536,729
RAC: 0
Message 71023 - Posted: 24 Jul 2021, 10:39:42 UTC - in response to Message 71019.  
Last modified: 24 Jul 2021, 10:39:54 UTC

Thanks for the comment.

If the share of N-body calculations that are FP64 is 1/8, while a consumer GPU's FP64 rate is 1/64 of its FP32 rate and a high-FP64 GPU's is 1/2, then you would expect some speedup.
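(One way to put numbers on that, taking the 1/8 FP64 share and the two ratios at face value: relative run time scales roughly as (1 - f) + f/r in FP32-op units, so with f = 1/8 a 1:64 card gives 0.875 + 8 ≈ 8.9 while a 1:2 card gives 0.875 + 0.25 ≈ 1.1, i.e. something like an 8x gap between the two GPUs on this workload, all else being equal.)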

Anyway, I would vote for whatever the project team thinks gives the best return on investment.
robertmiles

Joined: 30 Sep 09
Posts: 211
Credit: 36,977,315
RAC: 0
Message 71035 - Posted: 28 Jul 2021, 19:29:00 UTC

Nvidia provides a CUDA sample that looks closely related to the N-body problem.

On my computer, it is in directory:

C:\ProgramData\NVIDIA Corporation\CUDA Samples\v11.4\5_Simulations

Its name is nbody.

Have you checked if it contains anything useful for the N-body problem?

I've seen no sign that programs partly in CUDA and partly in OpenCL are supported.

I've found a program named swan that is supposed to translate CUDA into OpenCL. However, it hasn't been updated in over 10 years, so I haven't put much effort into checking further.

I'm trying to devote my GPU to BOINC projects related to medical research. However, few enough such tasks are available that I'm also getting the shorter GPU tasks from other BOINC projects to avoid changes in the speed of my computer fans at night.

Folding@Home doesn't use BOINC, or it would be a suitable choice.

I don't consider whether you offer GPU N-body tasks to be important for me. However, I'll consider running them if I like their running times.

Is the GPU N-body program in the same computer language as the CPU version? If so, you may be able to simply copy the code for the unimplemented features from the CPU version to the GPU version.
Jim1348

Joined: 9 Jul 17
Posts: 100
Credit: 16,967,906
RAC: 0
Message 71036 - Posted: 28 Jul 2021, 20:19:46 UTC - in response to Message 71035.  

Nvidia provides a CUDA sample that looks closely related to the N-body problem.

On my computer, it is in directory:

C:\ProgramData\NVIDIA Corporation\CUDA Samples\v11.4\5_Simulations

Its name is nbody.

That must be in the toolkit. They have some info on it:
https://docs.nvidia.com/cuda/cuda-samples/index.html#cuda-n-body-simulation

If this could make it go faster on CUDA, I am all for it. There are plenty of Nvidia cards around.
They could use the AMD cards for the separation work units.
Dunx

Joined: 13 Feb 11
Posts: 31
Credit: 1,403,524,537
RAC: 0
Message 71037 - Posted: 29 Jul 2021, 9:16:41 UTC

Please run this code on a Radeon VII....

But mine sits idle waiting for WUs anyway, so why bother?

Seriously, if it will keep my PC running 100% of the time, then YES, do it!

D.C.
mikey

Joined: 8 May 09
Posts: 3315
Credit: 519,938,432
RAC: 22,812
Message 71039 - Posted: 29 Jul 2021, 10:22:11 UTC - in response to Message 71037.  
Last modified: 29 Jul 2021, 10:24:17 UTC

Please run this code on a Radeon VII....

But mine sits idle waiting for WUs anyway, so why bother?

Seriously, if it will keep my PC running 100% of the time, then YES, do it!

D.C.


Have you tried using the alternate version of BOINC posted here:
https://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=4532#69225

Another option is joining the GPU Users Group team, as they also have a 'fix' that seems to work, but I don't know the criteria for getting access to it. I know that Keith Myers can tell you some of that, though: https://milkyway.cs.rpi.edu/milkyway/view_profile.php?userid=147145

The new admin here at MilkyWay has said that if he could find the timer setting he would fix it, but the former admin, who made the setting the way it is now, didn't write things down.
Wee Todd Didd

Joined: 28 Jun 15
Posts: 3
Credit: 11,475,553
RAC: 0
Message 71044 - Posted: 2 Aug 2021, 20:52:08 UTC - in response to Message 70990.  

When you say it's not much faster than a CPU, do you mean a single-core CPU vs a single GPU? Or do you mean an 8- or 12-thread CPU vs a 512-CUDA-core GPU? Because N-Body can scale itself to multiple threads, and I imagine the GPU version also uses all the CUDA cores on a GPU....


Wondering this as well. I could not find any recent N-Body tasks in my results to check whether results are verified. Current GPUs use about 2x the power of a CPU, so a GPU version with verification would have to be at least 4x as fast to break even on power consumption.
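(Presumably the arithmetic is: with a validation quorum of 2 each result is computed twice, and at roughly twice the power per host that makes 2 x 2 = 4, so the GPU needs at least a 4x speedup before its energy per validated result matches the CPU's.)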
[VENETO] boboviz

Joined: 10 Feb 09
Posts: 52
Credit: 16,286,597
RAC: 0
Message 71056 - Posted: 6 Aug 2021, 12:59:11 UTC

I think the important part is the science.
If the new app helps the researchers, it's welcome!
Joseph Donahue

Joined: 7 Dec 16
Posts: 2
Credit: 2,216,975
RAC: 0
Message 71060 - Posted: 11 Aug 2021, 17:09:39 UTC - in response to Message 70988.  

Is there a difference in Watts/result?
Magiceye04

Joined: 14 Feb 10
Posts: 6
Credit: 109,713,505
RAC: 452
Message 71120 - Posted: 18 Sep 2021, 13:44:22 UTC - in response to Message 71001.  

I would be very happy to see longer-running work units on GPUs. Especially on high-performance AMD cards (with a lot of double-precision performance) I have to run several WUs in parallel, which causes driver issues.
I got some of them fixed by AMD, but not all.
If I could run an N-Body WU on my GPU that takes several hours and loads the GPU well, that would be great.

The comparison specifically performed was between an i9-10900K and an RTX 3070, using all available compute cores for both.


If that is the comparison, then AMD cards with a lot of double-precision performance should do well, as should Nvidia cards. A 3070 has roughly 0.3 TFLOPS of double-precision performance. A 10900K should turn out about the same, since memory bandwidth is the limiting factor on the CPU.
I would expect an R9 280X to be about 3 times as fast as a 3070 then.
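(For reference, rough spec-sheet numbers: the 3070 is about 20 TFLOPS FP32 at a 1:64 FP64 ratio, i.e. roughly 0.3 TFLOPS FP64, while the R9 280X is about 4 TFLOPS FP32 at 1:4, i.e. roughly 1 TFLOPS FP64, which is where the factor of three comes from.)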

I totally agree.
Even with 4 WUs in parallel, my VII finishes the WUs in less than 1 minute.
Why are the WUs so short?

In my opinion a new GPU app only makes sense if the work is done more efficiently than on the CPU.
My VII and my 3950X are both set to ~90 W, and in this config I would expect the VII to be at minimum 2-3x faster than the 32 CPU threads. Otherwise the next jump from AMD to 32 CPU cores on the consumer platform would make this GPU advantage obsolete. Threadrippers already have more CPU cores, but also much higher power consumption.
Maliar Ivan

Joined: 15 Apr 21
Posts: 1
Credit: 50,001
RAC: 0
Message 71123 - Posted: 20 Sep 2021, 8:55:02 UTC

Approximately how many calculations are left?
jackielan2000

Joined: 6 Nov 19
Posts: 8
Credit: 140,349
RAC: 0
Message 71262 - Posted: 23 Oct 2021, 0:05:08 UTC

It would be appreciated if you could make a GPU app for Android devices. Currently no such app exists for any other project, yet the population of Android devices is the largest in the world, much larger than that of PC, Mac, or Linux computers. I have 7 Android devices: 4 mobile phones and 3 TV boxes. 5 of them are running WCG's OpenPandemics - COVID-19 and 1 is running Universe@Home. I think it is time for MilkyWay@Home to embrace the power of Android.

Well, if you cannot make a GPU app at the moment, how about making one for the CPU on Android?
WMD

Joined: 15 Jun 13
Posts: 15
Credit: 2,069,756,183
RAC: 49,499
Message 73144 - Posted: 28 Apr 2022, 1:24:02 UTC

Bump!

Any progress on this?
Septimus

Joined: 8 Nov 11
Posts: 205
Credit: 2,882,763
RAC: 293
Message 73150 - Posted: 28 Apr 2022, 9:45:16 UTC - in response to Message 73144.  
Last modified: 28 Apr 2022, 9:52:17 UTC

Any chance of using Intel GPUs in both applications? Clearly they are not as fast as some, but there are a lot of them around. It certainly made a difference on Einstein and Collatz; on Collatz especially it reduced the run time dramatically.
poppinfresh99

Joined: 28 Feb 22
Posts: 16
Credit: 2,400,538
RAC: 0
Message 74952 - Posted: 29 Jan 2023, 0:09:48 UTC

My understanding is that N-body uses the Barnes-Hut algorithm. I came across the following Barnes-Hut GPU paper, where the GPU is MUCH faster than the CPU:
https://iss.oden.utexas.edu/Publications/Papers/burtscher11.pdf

MilkyWay's N-body probably also models gas flow (such as dark matter distributions), which could be why your GPU code is so much slower? Regardless, the paper might be useful, and I'd be curious why your N-body GPU code is not seeing a large speedup.
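For readers unfamiliar with it: Barnes-Hut builds an octree over the bodies and, when computing the force on one body, replaces a distant cell's contents with their centre of mass whenever the cell looks small enough from that body. A minimal sketch of that acceptance test (illustrative names only, not MilkyWay's actual code):

    #include <math.h>

    /* Decide whether a tree cell may stand in for all the bodies it
       contains when computing the force on the body at (x, y, z).
       'size' is the cell's side length, (cx, cy, cz) its centre of
       mass, and theta the opening-angle parameter (often ~0.5-1.0). */
    static int cell_is_far_enough(double x, double y, double z,
                                  double cx, double cy, double cz,
                                  double size, double theta)
    {
        double dx = cx - x, dy = cy - y, dz = cz - z;
        double dist = sqrt(dx * dx + dy * dy + dz * dz);
        /* Far enough: use the cell's centre of mass as one pseudo-body.
           Otherwise the caller recurses into the cell's children. */
        return size < theta * dist;
    }

The irregular, recursive tree walk that this test drives is the part that maps poorly onto GPUs, and restructuring it appears to be what the linked paper addresses.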
Keith Myers

Joined: 24 Jan 11
Posts: 696
Credit: 539,987,910
RAC: 87,082
Message 74955 - Posted: 29 Jan 2023, 2:47:52 UTC - in response to Message 74952.  

I think you are confused. There is no N-body GPU app, so there is no N-body GPU code.

The N-body app is a multi-threaded CPU application only.

©2024 Astroinformatics Group