New Poll Regarding GPU Application of N-Body
Joined: 21 Aug 18 Posts: 59 Credit: 5,350,675 RAC: 0
Hey everyone,
We are currently looking at making a GPU version of N-Body. This code has been under development for quite some time, and the base code is finally working, though we would still need to implement some other features to run it alongside the CPU version. However, due to the complexity of our code and our need for double precision, the GPU version has a similar runtime to that of the CPU version, though there may be some speed-up on professional-grade GPU cards. For reference, the GPU version of the Separation code is roughly 50-60 times faster than its CPU counterpart, depending on the machine.
Keeping that in mind, do you guys still want a GPU version of N-Body? I have put up a basic straw poll at https://www.strawpoll.me/45510486. If you wish to elaborate on your choice, please feel free to comment below.
Thank you all for your input, time, and consideration,
-Eric
Joined: 29 Sep 10 Posts: 54 Credit: 1,386,559 RAC: 1
When you say it's not much faster than a CPU, do you mean a single-core CPU vs a single GPU? Or do you mean an 8- or 12-thread CPU vs a 512-CUDA-core GPU? Because N-Body can scale itself to multiple threads, and I imagine the GPU version also uses all the CUDA cores on a GPU...
Joined: 12 Jun 10 Posts: 57 Credit: 6,163,587 RAC: 0
Eric, if you want to increase the speed of the N-Body project, believe you have the server resources available, and are able to increase the speed, I say go for it. I have voted accordingly.
Joined: 1 Aug 11 Posts: 10 Credit: 51,374,490 RAC: 0
Not just yes for a GPU app for N-Body, but HELL YES! Faster, slower, or the same time, it's no big deal for me.
Joined: 10 Jan 21 Posts: 4 Credit: 56 RAC: 0
The comparison specifically performed was between an i9-10900K and an RTX 3070, using all available compute cores for both.
Joined: 3 May 20 Posts: 1 Credit: 2,681,424 RAC: 0
If you already have the code (almost) ready to be deployed, why not just go for it? I'd definitely like to see for myself how a 10-, 12-, or 14-thread CPU WU compares to a mid- to low-end NVIDIA GPU. I'm especially curious about utilization and power consumption. After it runs for a while, everyone can decide for themselves which app to run. Anyway, this will likely boost overall productivity! Thanks for your effort!
Joined: 30 Aug 07 Posts: 17 Credit: 66,031,055 RAC: 0
And do you mean NVIDIA or ATI GPUs? ATI cards are usually faster than NVIDIA cards when double precision is required.
Joined: 10 Jan 21 Posts: 4 Credit: 56 RAC: 0
The comparison was between an RTX 3090 and an i9-10900K, using all compute cores on each platform. You are also correct that AMD typically invests more in FP64 performance for consumer cards than NVIDIA; I am quite curious myself to see how AMD cards perform.
Joined: 4 Nov 10 Posts: 1 Credit: 563,717,626 RAC: 0
My current AMD GPUs complete a Separation 1.46 WU in about 2.5 minutes. My 16-core AMD CPU takes approximately 57 minutes to complete a single WU. If you are suggesting a GPU implementation that will keep my GPU tied up for nearly an hour to complete only one WU, I would consider it a considerable waste of power and efficiency. Please correct me if I'm missing something here, but I'm failing to see any advantage in throughput or efficiency. Both CPU and GPU tasks are of similar computational size (approximately 42.6 GFLOPs), and my GPUs are each powering through 24-plus WUs an hour. I run several projects, and at the moment MilkyWay@home is primarily a GPU-focused project for me precisely because of the efficiency of the GPU work. I suppose I will have to wait until you actually release your GPU product to test it live, but if I lose the efficiency I'm presently estimating, I will likely have to move my GPUs to other distributed-science projects.
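The throughput argument in this post can be checked with a few lines of arithmetic. This is a minimal sketch that assumes one WU per device at a time, no CPU overhead, and, as the post does, comparable work per WU across apps; the 57-minute GPU case is hypothetical, modeling a GPU task that runs about as long as the CPU one.

```python
# Back-of-envelope throughput check using the timings quoted in the post.
# Assumptions: one work unit (WU) at a time per device, no CPU overhead,
# and comparable work per WU across apps (the post cites ~42.6 GFLOPs each).

MINUTES_PER_HOUR = 60.0


def wus_per_hour(minutes_per_wu: float) -> float:
    """Work units completed per hour for a given per-WU runtime in minutes."""
    if minutes_per_wu <= 0:
        raise ValueError("runtime must be positive")
    return MINUTES_PER_HOUR / minutes_per_wu


gpu_separation = wus_per_hour(2.5)   # AMD GPU running Separation 1.46
cpu_separation = wus_per_hour(57.0)  # 16-core AMD CPU running the same WU size
# Hypothetical: a GPU task that takes about as long as the CPU task.
gpu_like_cpu = wus_per_hour(57.0)

# How many times fewer WUs per GPU-hour the slower GPU task would deliver.
gpu_slowdown = gpu_separation / gpu_like_cpu

if __name__ == "__main__":
    print(f"GPU Separation:       {gpu_separation:.1f} WU/h")
    print(f"CPU Separation:       {cpu_separation:.2f} WU/h")
    print(f"GPU at CPU-like time: {gpu_like_cpu:.2f} WU/h")
    print(f"Loss per GPU-hour:    {gpu_slowdown:.1f}x")
```

With these numbers the GPU does 24 WU/h versus about 1.05 WU/h for the CPU, so a GPU task as long as the CPU one would cut that GPU's WU count by roughly 22.8x, which is the efficiency loss the post describes.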
Joined: 1 Apr 08 Posts: 30 Credit: 84,632,635 RAC: 349
I'm afraid the poll is a bit light: as said above, we need more details to be able to compare.
- Would one GPU task do the same *amount of work* as one single-thread CPU task? (Since you already say the processing time will be equivalent.)
- Would one GPU task also need a percentage of a CPU thread? (As is often the case with GPU tasks, especially with OpenCL; I can only run AMD OpenCL tasks on my iMac.) If so, do you know what percentage? And again, would it do the same amount of work as the regular single-thread CPU task? (If so, it would be counterproductive and not equivalent!)
Joined: 8 Apr 09 Posts: 70 Credit: 11,027,167,827 RAC: 0
I would be very happy to see longer-running work units on GPUs. Especially on high-performance AMD cards (with a lot of double-precision performance), I have to run several WUs in parallel, which causes driver issues. I got some of those fixed by AMD, but not all. If I could run an N-Body WU on my GPU that takes several hours and loads the GPU well, it would be great.
"The comparison specifically performed was between an i9-10900K and an RTX 3070, using all available compute cores for both."
If that is the comparison, then AMD cards with a lot of double-precision performance should do well, as should NVIDIA cards. A 3070 has roughly 0.3 TFLOPS of double-precision performance. A 10900K should turn out about the same, since memory bandwidth is limited on the CPU. I would then expect an R9 280X to be three times as fast as a 3070.
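The 0.3 TFLOPS and "three times as fast" estimates can be reproduced from spec-sheet numbers: theoretical peak is roughly shader count × 2 FLOPs per clock (one fused multiply-add) × clock, scaled by the card's FP64:FP32 ratio. The shader counts, boost clocks, and ratios below are assumptions taken from commonly published reference specs, not from this thread, and peak figures ignore memory bandwidth, occupancy, and drivers, so real N-Body speed may differ.

```python
# Rough theoretical peak FP64 throughput from spec-sheet figures.
# Assumptions (not from the thread): shader counts, reference boost clocks and
# FP64:FP32 ratios are commonly published specs; actual cards vary.

from dataclasses import dataclass


@dataclass
class Gpu:
    name: str
    shaders: int       # CUDA cores / stream processors
    boost_ghz: float   # reference boost clock in GHz
    fp64_ratio: float  # FP64 rate as a fraction of the FP32 rate

    def fp32_tflops(self) -> float:
        # 2 FLOPs per shader per clock (one fused multiply-add)
        return self.shaders * 2 * self.boost_ghz / 1000.0

    def fp64_tflops(self) -> float:
        return self.fp32_tflops() * self.fp64_ratio


rtx_3070 = Gpu("RTX 3070", 5888, 1.725, 1 / 64)
r9_280x = Gpu("R9 280X", 2048, 1.0, 1 / 4)

if __name__ == "__main__":
    for gpu in (rtx_3070, r9_280x):
        print(f"{gpu.name}: {gpu.fp32_tflops():.2f} TFLOPS FP32, "
              f"{gpu.fp64_tflops():.2f} TFLOPS FP64")
    ratio = r9_280x.fp64_tflops() / rtx_3070.fp64_tflops()
    print(f"R9 280X / RTX 3070 (FP64): {ratio:.1f}x")
```

Under these assumptions the RTX 3070 peaks at about 0.32 TFLOPS FP64 and the R9 280X at about 1.02 TFLOPS, a ratio of roughly 3.2x, in line with the estimate in the post.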
Joined: 24 Aug 17 Posts: 8 Credit: 224,675,577 RAC: 1
Voted. Some of us are credit-whores, so if you set the credit accordingly, I'm sure these crunchers will gladly participate and help out with their GPUs ;). I don't recommend using the BOINC CreditScrew, I mean CreditNew, system.
Joined: 25 May 14 Posts: 31 Credit: 56,750,059 RAC: 0
If the GPU version is doing the same work as the CPU version in roughly the same time, then, no, don't distribute it. GPU versions have a habit of also needing some CPU, so the net result is some CPU plus a whole GPU for the same effect as a CPU. Waste of resources.
Joined: 8 Apr 09 Posts: 70 Credit: 11,027,167,827 RAC: 0
"If the GPU version is doing the same work as the CPU version in roughly the same time, then, no, don't distribute it. GPU versions have a habit of also needing some CPU, so the net result is some CPU plus a whole GPU for the same effect as a CPU. Waste of resources."
That is why we asked for the comparison: which CPU and which GPU were used. Consumer NVIDIA cards in particular don't have a lot of double-precision performance, but they do have a lot more memory bandwidth and will be more efficient that way. A 10900K is also not a very common CPU. People will likely have weaker CPUs but GPUs with even more double-precision performance than the 3070. So it makes sense to release a GPU app :)
Joined: 10 Jan 21 Posts: 4 Credit: 56 RAC: 0
To clarify, the GPU was tested running the same task as the CPU. From a purely teraflop standpoint, AMD GPUs should perform much faster than the RTX 3070 when simply looking at spec sheets, as AMD invests more into this feature; however, it remains to be seen whether this advantage will be realized in practice. The main idea here is that if the GPU and CPU do the same amount of work, and an average computer has both a GPU and a CPU, a GPU app is likely to double the amount of computation performed overall on that part of the network. I would also point out that although running the GPU code slightly affects CPU performance, the cost is minimal, as the CPU portion of the GPU code is designed to be lightweight and serve only for control purposes; this was observed when testing both running at the same time. If anyone wants to get an idea of performance on their specific system, the GitHub repo does have a working version of the GPU code, although at the moment it does not support as many features (such as the LMC).
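The "likely to double" estimate can be expressed as a toy model: per-host throughput is the CPU task rate plus the GPU task rate, with the CPU rate reduced by whatever share of the CPU the GPU task's control thread consumes. This is a sketch under stated assumptions: the overhead fraction is a hypothetical parameter, not a measured value, and the model says nothing about power draw, which is the concern raised in some replies.

```python
# Toy model of per-host throughput when a GPU task runs next to the CPU task.
# Assumptions (hypothetical, not measured): runtimes are for each task running
# alone, and the GPU task's control thread uses a fixed fraction of the CPU.

from typing import Optional


def host_tasks_per_hour(
    t_cpu_h: float, t_gpu_h: Optional[float] = None, overhead: float = 0.0
) -> float:
    """Tasks completed per hour by one host.

    t_cpu_h:  hours for the multithreaded CPU task to finish alone.
    t_gpu_h:  hours for the GPU task to finish, or None if no GPU task runs.
    overhead: hypothetical fraction of CPU capacity used by the GPU task's
              control thread (only applied when a GPU task is running).
    """
    if t_cpu_h <= 0 or (t_gpu_h is not None and t_gpu_h <= 0):
        raise ValueError("runtimes must be positive")
    if not 0.0 <= overhead < 1.0:
        raise ValueError("overhead must be in [0, 1)")
    cpu_share = 1.0 - overhead if t_gpu_h is not None else 1.0
    cpu_rate = cpu_share / t_cpu_h
    gpu_rate = 1.0 / t_gpu_h if t_gpu_h is not None else 0.0
    return cpu_rate + gpu_rate


if __name__ == "__main__":
    cpu_only = host_tasks_per_hour(1.0)
    both = host_tasks_per_hour(1.0, 1.0, overhead=0.05)
    print(f"CPU only: {cpu_only:.2f} tasks/h")
    print(f"CPU + GPU (5% overhead): {both:.2f} tasks/h ({both / cpu_only:.2f}x)")
```

With equal runtimes and a hypothetical 5% overhead, throughput rises from 1.00 to 1.95 tasks per hour, close to the doubling described; the gain shrinks if the GPU task is slower than the CPU task.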
Joined: 17 Jul 21 Posts: 1 Credit: 418,570 RAC: 0
Is this GPU code targeted at a specific GPU, or is it generic so that it would run on my M1-based Apple? That would be great! Thanks, Kerry
Joined: 10 Jan 21 Posts: 4 Credit: 56 RAC: 0
At the moment, the OpenCL code targets AMD and NVIDIA cards only.
Joined: 2 Oct 14 Posts: 43 Credit: 55,029,894 RAC: 869
There is an "anomaly" in the latest version of the BOINC client's handling of tasks that use the GPU option. It has something to do with the Intel GPU and can hang a task indefinitely. Tread carefully before checking this out. As Redbeard the pirate might say if he were still alive, "Matey, ye be warned!"
Joined: 2 Apr 10 Posts: 5 Credit: 164,112,090 RAC: 3,273
Well, it'd be a 'maybe' for me. AMD's older Cypress/Cayman/Tahiti/Hawaii cards have 2x-4x the raw FP64 performance of that 3070. Perhaps a test of the same workunit on a 'Tahiti'-generation card would give us a better idea?
Joined: 29 Aug 12 Posts: 2 Credit: 204,504,206 RAC: 0
When will the Apple M1 GPU be addressed?
©2024 Astroinformatics Group