Welcome to MilkyWay@home

New Poll Regarding GPU Application of N-Body

Eric Mendelsohn
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 21 Aug 18
Posts: 59
Credit: 5,350,675
RAC: 0
Message 70988 - Posted: 21 Jul 2021, 17:46:03 UTC

Hey everyone,

We are currently looking at making a GPU version of N-Body. This code has been under development for quite some time, and the base code is finally working, though we would still need to implement some other features to run it alongside the CPU version. However, due to the complexity of our code and our need for double precision, the GPU version has a similar runtime to that of the CPU version, though there may be some speed-up on professional-grade GPU cards. For reference, the GPU version of the Separation code is roughly 50-60 times faster than its CPU counterpart, depending on the machine. Keeping that in mind, do you guys still want a GPU version of N-Body? I have put up a basic straw poll at https://www.strawpoll.me/45510486. If you wish to elaborate on your choice, please feel free to comment below.

Thank you all for your input, time, and consideration,

-Eric
DJStarfox

Joined: 29 Sep 10
Posts: 54
Credit: 1,342,886
RAC: 0
Message 70990 - Posted: 21 Jul 2021, 21:06:11 UTC - in response to Message 70988.  
Last modified: 21 Jul 2021, 21:06:37 UTC

When you say it's not much faster than a CPU, do you mean a single-core CPU vs. a single GPU? Or do you mean an 8- or 12-thread CPU vs. a 512-CUDA-core GPU? Because N-Body can scale itself to multiple threads, and I imagine the GPU version also uses all the CUDA cores on a GPU...
Speedy51

Joined: 12 Jun 10
Posts: 57
Credit: 6,163,587
RAC: 156
Message 70991 - Posted: 21 Jul 2021, 22:28:02 UTC

Eric, if you want to increase the speed of the N-Body project, you believe you have the server resources available, and you are able to increase the speed, I say go for it. I have voted accordingly.
FurryGuy

Joined: 1 Aug 11
Posts: 10
Credit: 51,374,490
RAC: 0
Message 70992 - Posted: 21 Jul 2021, 23:08:07 UTC

Not just yes for a GPU app for N-Body, but HELL YES!

Faster, slower, same time, is no big deal for me.
dylansheils0241

Joined: 10 Jan 21
Posts: 4
Credit: 56
RAC: 0
Message 70994 - Posted: 22 Jul 2021, 3:41:32 UTC - in response to Message 70990.  

The specific comparison performed was between an i9-10900K and an RTX 3070, using all available compute cores for both.
bozz4science

Joined: 3 May 20
Posts: 1
Credit: 2,681,424
RAC: 0
Message 70995 - Posted: 22 Jul 2021, 8:41:43 UTC

If you already have the code (almost) ready to be deployed, why not just go for it? I'd definitely like to see for myself how a 10/12/14-threaded CPU WU compares to a mid- to low-end NVIDIA GPU. I'm especially curious about utilization and power consumption. After running it for a while, everyone can decide for themselves which app to run. Anyway, this will likely boost overall productivity!

Thanks for your effort!
zioriga

Joined: 30 Aug 07
Posts: 17
Credit: 65,973,233
RAC: 0
Message 70996 - Posted: 22 Jul 2021, 9:07:36 UTC

And do you mean NVidia or ATI GPUs?

Usually ATI cards are faster than NVidia cards when double precision is required.
dylansheils0241

Joined: 10 Jan 21
Posts: 4
Credit: 56
RAC: 0
Message 70997 - Posted: 22 Jul 2021, 11:55:10 UTC - in response to Message 70996.  

The comparison was between an RTX 3090 and an i9-10900K, using all compute cores for each platform. You are also correct that AMD typically invests more in FP64 performance for consumer cards than Nvidia does; I am quite curious to see how AMD cards perform.
Ironslug

Joined: 4 Nov 10
Posts: 1
Credit: 563,717,626
RAC: 0
Message 70999 - Posted: 22 Jul 2021, 14:36:59 UTC - in response to Message 70988.  
Last modified: 22 Jul 2021, 15:05:21 UTC

My current AMD GPUs complete a Separation 1.46 WU in about 2.5 minutes. My 16-core AMD CPU takes approximately 57 minutes to complete a single WU. If you are suggesting the adoption of a GPU implementation that will keep my GPU tied up for nearly an hour to complete only one WU, I would consider it a considerable waste of power and efficiency. Please correct me if I'm missing something here, but I'm failing to see any advantage in throughput or efficiency. Both CPU and GPU tasks are of similar computational size (approximately 42.6 GFLOPs), and my GPUs are each powering through 24-plus WUs an hour. I run several projects, and at the moment MilkyWay@home is primarily a GPU-focused task for me precisely because of the efficiency of the GPU work.

I suppose that I will have to wait until you actually release your GPU product to test it live, but if I lose the efficiency I'm presently estimating, I will likely have to push my GPUs into other distributed-science projects.
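The throughput figures in the post above can be checked with a short calculation. This is only a sketch: it uses the two run times quoted for Separation work units (not N-Body) and assumes tasks run back to back, one at a time per device, which the post does not state explicitly.

```python
# Run times quoted in the post above (Separation 1.46 work units).
GPU_MINUTES_PER_WU = 2.5   # one AMD GPU
CPU_MINUTES_PER_WU = 57.0  # 16-core AMD CPU


def wus_per_hour(minutes_per_wu: float) -> float:
    """Work units finished per hour, assuming tasks run back to back."""
    return 60.0 / minutes_per_wu


gpu_rate = wus_per_hour(GPU_MINUTES_PER_WU)
cpu_rate = wus_per_hour(CPU_MINUTES_PER_WU)
throughput_ratio = gpu_rate / cpu_rate

print(f"GPU: {gpu_rate:.1f} WU/hour")
print(f"CPU: {cpu_rate:.2f} WU/hour")
print(f"GPU/CPU throughput ratio: {throughput_ratio:.1f}x")
```

Under those assumptions the GPU rate works out to 24 WU per hour, consistent with the "24-plus" figure in the post, and the GPU-to-CPU ratio is a little under 23.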
[AF>Le_Pommier] Jerome_C2005

Joined: 1 Apr 08
Posts: 30
Credit: 84,549,863
RAC: 0
Message 71000 - Posted: 22 Jul 2021, 15:02:26 UTC

I'm afraid the poll is a bit light: as said above, we need more details to be able to compare.

- Would 1 GPU task do the same *amount of work* as 1 CPU thread task? (since you already say the processing time will be equivalent)
- Would 1 GPU task also need a percentage of a CPU thread? (This is often the case with GPU tasks, especially with OpenCL; I can only run AMD OpenCL tasks on my iMac.) If so, do you know what percentage? And again, would that be to do the same amount of work as the regular 1-thread CPU task? (If yes, it would be counterproductive and not equivalent!)
Sebastian*

Joined: 8 Apr 09
Posts: 70
Credit: 11,027,167,827
RAC: 0
Message 71001 - Posted: 22 Jul 2021, 15:14:52 UTC
Last modified: 22 Jul 2021, 15:15:23 UTC

I would be very happy to see longer-running work units on GPUs. Especially on high-performance AMD cards (with a lot of double precision performance), I have to run several WUs in parallel, which causes driver issues.
I got some fixed by AMD, but not all.
If I could run an N-Body WU on my GPU and it took several hours and loaded the GPU well, that would be great.

dylansheils0241 wrote: "The specific comparison performed was between an i9-10900K and an RTX 3070, using all available compute cores for both."


If this is the comparison, then AMD cards with a lot of double precision performance should do well, as well as Nvidia cards. A 3070 has roughly 0.3 TFLOPS of double precision performance. A 10900K should turn out about the same, since memory bandwidth is limited on the CPU.
I would expect an R9 280X to be 3 times as fast as a 3070 then.
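The peak FP64 estimate in the post above can be reproduced from spec-sheet numbers. The sketch below is not from the thread: the shader counts, boost clocks, and FP64-to-FP32 ratios are assumed public spec values (RTX 3070: 5888 cores, 1.725 GHz, 1/64; R9 280X: 2048 stream processors, 1.0 GHz, 1/4), and it counts two floating-point operations per shader per cycle for a fused multiply-add. Peak figures say nothing about how the actual N-Body code would run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuSpec:
    """Spec-sheet values; all numbers used below are assumptions."""
    name: str
    shaders: int        # CUDA cores or stream processors
    boost_ghz: float    # boost clock in GHz
    fp64_ratio: float   # FP64 rate as a fraction of the FP32 rate

    def peak_fp32_tflops(self) -> float:
        # 2 floating-point operations per shader per cycle (fused multiply-add)
        return self.shaders * 2 * self.boost_ghz / 1000.0

    def peak_fp64_tflops(self) -> float:
        return self.peak_fp32_tflops() * self.fp64_ratio


rtx_3070 = GpuSpec("RTX 3070", shaders=5888, boost_ghz=1.725, fp64_ratio=1 / 64)
r9_280x = GpuSpec("R9 280X", shaders=2048, boost_ghz=1.0, fp64_ratio=1 / 4)

for gpu in (rtx_3070, r9_280x):
    print(f"{gpu.name}: {gpu.peak_fp64_tflops():.3f} TFLOPS peak FP64")

fp64_ratio = r9_280x.peak_fp64_tflops() / rtx_3070.peak_fp64_tflops()
print(f"R9 280X / RTX 3070 peak FP64 ratio: {fp64_ratio:.2f}")
```

With these assumed specs the RTX 3070 comes out at roughly 0.3 TFLOPS and the R9 280X at roughly 1.0 TFLOPS, a ratio slightly above 3, which is in line with the estimate in the post.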
pututu

Joined: 24 Aug 17
Posts: 8
Credit: 223,957,930
RAC: 0
Message 71002 - Posted: 22 Jul 2021, 15:55:09 UTC

Voted.

Some of us are credit-whores, so if you set the credit accordingly, I'm sure these crunchers will gladly participate and help out with their GPUs ;).

I don't recommend using the BOINC CreditScrew, I mean CreditNew, system.
adrianxw

Joined: 25 May 14
Posts: 31
Credit: 56,750,059
RAC: 0
Message 71003 - Posted: 22 Jul 2021, 16:18:52 UTC

If the GPU version is doing the same work as the CPU version in roughly the same time, then, no, don't distribute it. GPU versions have a habit of also needing some CPU, so the net result is some CPU plus a whole GPU for the same effect as a CPU. Waste of resources.
Sebastian*

Joined: 8 Apr 09
Posts: 70
Credit: 11,027,167,827
RAC: 0
Message 71004 - Posted: 22 Jul 2021, 16:54:51 UTC - in response to Message 71003.  
Last modified: 22 Jul 2021, 16:55:20 UTC

adrianxw wrote: "If the GPU version is doing the same work as the CPU version in roughly the same time, then, no, don't distribute it. GPU versions have a habit of also needing some CPU, so the net result is some CPU plus a whole GPU for the same effect as a CPU. Waste of resources."

That is the reason why we asked for the comparison: which CPU was used and which GPU. The consumer Nvidia cards in particular don't have a lot of double precision performance. Even so, they have a lot more memory bandwidth and will be more efficient that way.
A 10900K is also not a very common CPU. People will likely have worse CPUs, but with GPUs that have even more double precision performance than the 3070. So it makes sense to release a GPU app :)
dylansheils0241

Joined: 10 Jan 21
Posts: 4
Credit: 56
RAC: 0
Message 71005 - Posted: 22 Jul 2021, 16:55:11 UTC

To clarify, the GPU was tested doing the same task as the CPU. From a purely teraflop performance standpoint, AMD GPUs should perform much faster than the RTX 3070 when simply looking at the spec sheet, as AMD invests more in this feature; however, it remains to be seen whether this advantage will be realized in practice. The main idea here is that if the GPU and CPU do the same amount of work, and an average computer has both a GPU and a CPU, a GPU version is likely to double the amount of computation performed overall on part of the network. I would like to point out that, although this slightly affects CPU performance, the cost is minimal, as the CPU section of the GPU code is designed to be lightweight and serves only for control purposes; this was observed when testing both running at the same time. If anyone wants to get an idea of performance on their specific system, the GitHub repo does have a working version of the GPU code, although it does not yet support as many features, such as the LMC.
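The "double the computation" argument in the post above can be sketched as a simple rate model. Everything numeric here is hypothetical: the thread gives no measured task times or CPU overhead, so the one-hour task time and the 5% overhead are placeholders chosen only to illustrate the reasoning.

```python
def combined_tasks_per_hour(
    cpu_hours_per_task: float,
    gpu_hours_per_task: float,
    cpu_overhead_fraction: float,
) -> float:
    """Tasks per hour when the CPU app and the GPU app run side by side.

    cpu_overhead_fraction is the share of CPU capacity consumed by the
    GPU app's host-side control code (a hypothetical parameter).
    """
    cpu_rate = (1.0 - cpu_overhead_fraction) / cpu_hours_per_task
    gpu_rate = 1.0 / gpu_hours_per_task
    return cpu_rate + gpu_rate


# Hypothetical values: equal run times, small host-side overhead.
cpu_only = 1.0 / 1.0
both = combined_tasks_per_hour(
    cpu_hours_per_task=1.0,
    gpu_hours_per_task=1.0,
    cpu_overhead_fraction=0.05,
)
print(f"CPU only: {cpu_only:.2f} tasks/hour")
print(f"CPU + GPU: {both:.2f} tasks/hour ({both / cpu_only:.2f}x)")
```

In this model, equal run times and a small overhead give a combined rate just under twice the CPU-only rate; a larger overhead or a slower GPU would shrink the gain accordingly.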
kk4jo

Joined: 17 Jul 21
Posts: 1
Credit: 418,570
RAC: 0
Message 71006 - Posted: 22 Jul 2021, 17:21:08 UTC - in response to Message 70988.  

Is this GPU code targeted at a specific GPU, or is it generic enough that it would run on my M1-based Apple machine? That would be great!

Thanks,
Kerry
dylansheils0241

Joined: 10 Jan 21
Posts: 4
Credit: 56
RAC: 0
Message 71007 - Posted: 22 Jul 2021, 18:51:11 UTC - in response to Message 71006.  

At the moment, the OpenCL code is targeted to AMD and Nvidia cards only.
Wisesooth

Joined: 2 Oct 14
Posts: 43
Credit: 54,799,597
RAC: 1,574
Message 71008 - Posted: 23 Jul 2021, 2:20:45 UTC - in response to Message 70988.  

There is an "anomaly" in how the latest version of the BOINC client handles tasks that use the GPU option. It has something to do with the Intel GPU. It can hang a task indefinitely. Tread carefully before checking this out. As Redbeard the pirate might say if he were still alive, "Matey, ye be warned!"
DaiKiwi

Joined: 2 Apr 10
Posts: 5
Credit: 161,937,717
RAC: 19,742
Message 71010 - Posted: 23 Jul 2021, 8:10:13 UTC

Well, it'd be a 'maybe' for me. AMD's older Cypress/Cayman/Tahiti/Hawaii cards have 2x-4x the raw FP64 performance of that 3070. Perhaps a test with a 'Tahiti' generation card of the same workunit would give us a bit more of an idea?
Astro 1940

Joined: 29 Aug 12
Posts: 2
Credit: 204,504,206
RAC: 0
Message 71013 - Posted: 23 Jul 2021, 16:32:46 UTC - in response to Message 70988.  

When will you be addressing the Apple M1 GPU?

©2024 Astroinformatics Group