Welcome to MilkyWay@home

New Poll Regarding GPU Application of N-Body

Link
Joined: 19 Jul 10
Posts: 578
Credit: 18,845,239
RAC: 856
Message 74958 - Posted: 29 Jan 2023, 14:17:12 UTC - in response to Message 74955.  
Last modified: 29 Jan 2023, 14:19:56 UTC

I think you are confused. There is no N-body gpu app so no N-body gpu code.

I'm afraid you are the one who is confused this time: there is an N-Body GPU application, but it was never released, because it takes about as long on the GPU as the CPU application does on the CPU, while Separation is 50-60 times faster on GPU than on CPU. See the first message of this thread. Releasing this application as is would slow down the overall throughput of the project.
ID: 74958
Speedy51

Joined: 12 Jun 10
Posts: 57
Credit: 6,163,587
RAC: 156
Message 74961 - Posted: 29 Jan 2023, 20:23:28 UTC - in response to Message 74958.  

Releasing this application as is would slow down the overall throughput of the project.

Exactly the reason why the N-Body application does not have a GPU version.
ID: 74961
mikey
Joined: 8 May 09
Posts: 3315
Credit: 519,947,628
RAC: 22,118
Message 74962 - Posted: 30 Jan 2023, 1:00:51 UTC - in response to Message 74961.  

Releasing this application as is would slow down the overall throughput of the project.


Exactly the reason why the N-Body application does not have a GPU version.


On top of that, it makes no sense to even have it around: what's the point if it's not as efficient as the current CPU app? They already have a GPU app here, so they don't NEED to release a slow, inefficient one.
ID: 74962
Link
Joined: 19 Jul 10
Posts: 578
Credit: 18,845,239
RAC: 856
Message 74971 - Posted: 31 Jan 2023, 10:59:36 UTC - in response to Message 74962.  

On top of that, it makes no sense to even have it around: what's the point if it's not as efficient as the current CPU app? They already have a GPU app here, so they don't NEED to release a slow, inefficient one.

Of course, to speed up N-Body processing they could offer the option to not "Run CPU versions of applications for which GPU versions are available", and credits similar to Separation would surely help too. Right now, when crunching N-Body, one might get the impression that it isn't a very valuable type of work, considering the "pay" we get for it. ;-)
ID: 74971
mikey
Joined: 8 May 09
Posts: 3315
Credit: 519,947,628
RAC: 22,118
Message 74973 - Posted: 31 Jan 2023, 12:35:25 UTC - in response to Message 74971.  

On top of that, it makes no sense to even have it around: what's the point if it's not as efficient as the current CPU app? They already have a GPU app here, so they don't NEED to release a slow, inefficient one.


Of course, to speed up N-Body processing they could offer the option to not "Run CPU versions of applications for which GPU versions are available", and credits similar to Separation would surely help too. Right now, when crunching N-Body, one might get the impression that it isn't a very valuable type of work, considering the "pay" we get for it. ;-)


I agree the credits aren't up to par. Several admins have said they will look at it, but they always come back with "we're happy where they are", which isn't helpful at all for those doing the actual crunching.
ID: 74973
reiner

Joined: 25 May 23
Posts: 13
Credit: 58,073
RAC: 0
Message 75687 - Posted: 18 Jun 2023, 21:09:57 UTC

I'd imagine that many current GPU users have a lot of FP64-capable GPUs for MW@H, since the Separation tasks needed FP64.
If I understand correctly, the N-Body calculations also need FP64.
Leaving out these GPUs would leave a lot of compute power unused.

Regarding efficiency - I am new to BOINC/MW@H, but in the time I have contributed so far I compared my older GCN1 cards with older Threadripper CPUs on the Separation tasks... and the GPU was A LOT more efficient - even an older R9 280X with a 1:4 FP64 ratio was around 30 times faster than my Threadripper while consuming only a fraction of the watts when tuned down a little. The efficiency of GPUs goes up quite a bit if the power target is lowered a little... At around 40 watts I was able to complete one task in around one minute on the GPU...

So here is a thought from someone who is not a programmer:

How about converting the N-Body code to OpenCL? That way, CPUs and all GPUs could contribute - one universal codebase that is easy to maintain. (I don't know if this is applicable to the MW@H variant of N-Body, but I have seen a lot of OpenCL benchmarks running N-Body on both GPU and CPU, so it seems possible...)

This would also make the integrated GPUs inside CPUs usable - some Intel Iris Xe iGPUs actually pack a lot of punch in FP64 with very high efficiency (around 600 GFLOPS of FP64 at roughly 20 watts), and even some UHD variants are very efficient. While inferior to some dedicated GPUs in raw numbers, per watt they can easily hold up against 1080s, smaller RTX cards, GCN 5, etc. Imagine every currently unused small notebook contributing as much FP64 power (per watt) as a midrange gaming PC.

I know that CUDA is often favoured because it is usually easier to work with and sometimes offers better speeds - but as far as I understand it, OpenCL can also be very fast and powerful, it just needs more fiddling when coding.

Going with CUDA would leave out all AMD / Intel / mobile chips, and CPUs as well. With OpenCL, even the FP64-powerful oldies of GCN 1 and 2 could still be used, and the Intel Xeon Phis could contribute (they are actually not bad at FP64...).

I recently saw some fluid dynamics simulations from this guy; he also provides an OpenCL wrapper that makes some things in OpenCL a lot easier. It's a long shot and I have no idea if this makes sense (not being a programmer), but maybe he has some ideas on how to accelerate the N-Body implementation of MW@H in OpenCL?

https://github.com/ProjectPhysX/FluidX3D
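
To give a rough idea of what I mean by N-Body in OpenCL, a naive double-precision direct-summation kernel might look something like the sketch below. This is only a generic illustration with made-up names, not the MW@H N-Body code (which is more involved than a brute-force loop):

// Generic illustration only - NOT the MilkyWay@home N-Body code.
// Naive O(N^2) direct-summation gravity in double precision.
#pragma OPENCL EXTENSION cl_khr_fp64 : enable

__kernel void nbody_accel(__global const double4 *pos,  /* xyz = position, w = mass */
                          __global double4 *acc,        /* output: acceleration per body */
                          const double eps2,            /* softening length squared */
                          const int n)                  /* number of bodies */
{
    int i = get_global_id(0);
    if (i >= n) return;

    double4 pi = pos[i];
    double3 ai = (double3)(0.0, 0.0, 0.0);

    for (int j = 0; j < n; ++j) {
        double4 pj = pos[j];
        double3 d  = pj.xyz - pi.xyz;
        double r2  = dot(d, d) + eps2;       /* softened distance squared */
        double inv = rsqrt(r2);
        ai += pj.w * (inv * inv * inv) * d;  /* G folded into the mass units */
    }

    acc[i] = (double4)(ai, 0.0);
}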
ID: 75687
Falconet

Joined: 9 Mar 09
Posts: 16
Credit: 173,221
RAC: 0
Message 75688 - Posted: 18 Jun 2023, 21:49:55 UTC - in response to Message 75687.  

They did build an OpenCL app for Nvidia and AMD. However, as you can read in the OP, it wasn't faster than the CPU app so it did not make sense to deploy it.

They are open to it but only if it's significantly faster than CPUs. From this message:


For anyone who wants to play around with the GPU code, the current GPU code for Nbody can be found at the sheilsGPU branch (https://github.com/Milkyway-at-home/milkywayathome_client/tree/sheilsGPU). It is out of date compared to the CPU version of Nbody though, so you will need to compare against master to see what changes have been made since then.

If you are able to produce a meaningful speedup for widely-used GPU architecture (and share the source code), we would be happy to consider implementing any upgrades you come up with! It is hard to make speedups for generic GPU architecture (which is what we would prefer to do as a BOINC project), but if most people are using a specific type of GPU that can get 50-100x speed up compared to the CPU code, we would definitely be interested.
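
If you want to try that comparison yourself, something along these lines should do it (plain git commands, using the branch names from the quoted message):

# fetch the repository and the GPU branch mentioned above
git clone https://github.com/Milkyway-at-home/milkywayathome_client.git
cd milkywayathome_client
git checkout sheilsGPU
# list and inspect what has changed on master since the GPU branch diverged
git log --oneline sheilsGPU..master
git diff sheilsGPU...master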
ID: 75688
reiner

Joined: 25 May 23
Posts: 13
Credit: 58,073
RAC: 0
Message 75689 - Posted: 18 Jun 2023, 22:00:43 UTC

I have no clue whether I am on the wrong path here, but I just remembered that CompuBench runs N-Body as part of its benchmark suite in OpenCL and CUDA - and in OpenCL, both CPUs and GPUs are tested... Looking at the table of results, even older or smaller GPUs outperform CPUs in this task, so to me it seems that doing N-Body on GPUs is worth it.

Or am I missing something here?

https://compubench.com/result.jsp?benchmark=compu20d&test=725&text-filter=&order=median&ff-desktop=true&os-Windows_cl=true&os-Windows_cu=true&pu-dGPU=true&pu-iGPU=true&pu-ACC=true&arch-x86=true&base=device
ID: 75689
Falconet

Joined: 9 Mar 09
Posts: 16
Credit: 173,221
RAC: 0
Message 75690 - Posted: 18 Jun 2023, 22:50:45 UTC - in response to Message 75689.  

Read the message I typed before yours.
ID: 75690
reiner

Joined: 25 May 23
Posts: 13
Credit: 58,073
RAC: 0
Message 75692 - Posted: 19 Jun 2023, 8:56:36 UTC - in response to Message 75690.  

Read the message I typed before yours.

I did :)
ID: 75692
Falconet

Joined: 9 Mar 09
Posts: 16
Credit: 173,221
RAC: 0
Message 75701 - Posted: 19 Jun 2023, 12:19:11 UTC - in response to Message 75692.  

We can only hope someone will be kind enough to improve the OpenCL app into something much, much faster than it is right now.
ID: 75701
alanb1951

Joined: 16 Mar 10
Posts: 208
Credit: 105,450,396
RAC: 36,426
Message 75728 - Posted: 19 Jun 2023, 23:51:02 UTC - in response to Message 75701.  

We can only hope someone will be kind enough to improve the OpenCL app into something much, much faster than it is right now.
And someone would probably have to commit to amending the OpenCL code and/or the OpenCL-related code in the CPU-based part of the application whenever there was a relevant change to the science code in the "CPU-only" application code :-)

I get the impression that this application isn't one of those simple ones where the GPU-based calculations are more or less unchangeable and can be completely controlled via parameter settings. If all it is doing is (for instance) FFTs or simple optimization of a matrix, the only issue would be "Can it be made efficient enough to make it worth doing?" However, if it would entail either lots of shuffling of data around on the GPU or frequent movement of data to and from the GPU between GPU-worthy sections of computation, that might be a completely different matter!
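
To illustrate that second concern with a generic sketch (the usual OpenCL host-side pattern, not anything from the MilkyWay@home sources): whether the particle data is read back across the PCIe bus every step, or only once at the end, can dominate the run time for small problem sizes.

#include <CL/cl.h>

/* Generic sketch, not MilkyWay@home code: the same simulation loop with the
 * per-step readback (variant A) left as a comment and a single readback at
 * the end (variant B). Variant A pays a host<->GPU transfer every step. */
static cl_int run_steps(cl_command_queue queue, cl_kernel accel, cl_kernel integrate,
                        cl_mem pos_buf, void *host_pos, size_t nbytes,
                        size_t gsize, int nsteps)
{
    cl_int err = CL_SUCCESS;
    for (int step = 0; step < nsteps && err == CL_SUCCESS; ++step) {
        err = clEnqueueNDRangeKernel(queue, accel, 1, NULL, &gsize, NULL, 0, NULL, NULL);
        if (err == CL_SUCCESS)
            err = clEnqueueNDRangeKernel(queue, integrate, 1, NULL, &gsize, NULL, 0, NULL, NULL);
        /* Variant A (costly): blocking readback of pos_buf here, every step. */
    }
    /* Variant B: one blocking readback after the whole run. */
    if (err == CL_SUCCESS)
        err = clEnqueueReadBuffer(queue, pos_buf, CL_TRUE, 0, nbytes, host_pos, 0, NULL, NULL);
    return err;
}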

And, of course, if part of making it efficient enough entails "messing" with support libraries or adding hacks to facilitate using the GPU for one task whilst another one is doing CPU-intensive stuff, that would have to be done very carefully, especially if end users might be using their GPUs for more than one BOINC application -- I recall an issue over at WCG a while back which was to do with something in that area :-(

Without an expert eye on the code, we can't know what performance issues (global memory usage, bandwidth between motherboard and GPU, et cetera) there might be. Whilst I share the hope that it might be possible to do something GPU-wise, I would be unsurprised if an [unbiased] expert outsider decides it isn't worth it...

Cheers - Al.
ID: 75728
.clair.

Joined: 3 Mar 13
Posts: 84
Credit: 779,527,512
RAC: 26,552
Message 75734 - Posted: 20 Jun 2023, 0:59:57 UTC

One thing I would like to know is: does the N-Body MT CPU app use SSE or AVX optimisation, if that is of any use for what they do? I have not seen any mention of it.
It sure does speed up CPU crunching at other projects that have developed app code for it.
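
For example, the sort of thing I mean (just a generic illustration, not taken from the MW@H sources) is an inner loop hand-vectorised with AVX, processing four double-precision values per instruction:

#include <immintrin.h>
#include <math.h>

/* Generic illustration of AVX vectorisation, not MilkyWay@home code:
 * compute 1/r^3 from an array of squared distances, four doubles at a time. */
void inv_r3(const double *r2, double *out, int n)
{
    const __m256d one = _mm256_set1_pd(1.0);
    int i = 0;
    for (; i + 4 <= n; i += 4) {              /* 4 doubles per 256-bit register */
        __m256d v  = _mm256_loadu_pd(&r2[i]);
        __m256d r  = _mm256_sqrt_pd(v);       /* r   = sqrt(r^2) */
        __m256d r3 = _mm256_mul_pd(v, r);     /* r^3 = r^2 * r   */
        _mm256_storeu_pd(&out[i], _mm256_div_pd(one, r3));
    }
    for (; i < n; ++i)                        /* scalar tail */
        out[i] = 1.0 / (r2[i] * sqrt(r2[i]));
}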
ID: 75734
Speedy51

Joined: 12 Jun 10
Posts: 57
Credit: 6,163,587
RAC: 156
Message 75740 - Posted: 20 Jun 2023, 3:31:59 UTC - in response to Message 75701.  

We can only hope someone will be kind enough to improve the OpenCL app into something much, much faster than it is right now.

If somebody is able to provide me with the code and tell me where to put it, I would be more than happy to do so :-)
ID: 75740
Keith Myers
Joined: 24 Jan 11
Posts: 696
Credit: 540,044,586
RAC: 86,680
Message 75741 - Posted: 20 Jun 2023, 6:33:21 UTC - in response to Message 75740.  

Tom provided the source code location for you already.

https://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=5007&postid=75603

Have at it.
ID: 75741
reiner

Joined: 25 May 23
Posts: 13
Credit: 58,073
RAC: 0
Message 75742 - Posted: 20 Jun 2023, 9:32:44 UTC - in response to Message 75728.  

We can only hope someone will be kind enough to improve the OpenCL app into something much, much faster than it is right now.
And someone would probably have to commit to amending the OpenCL code and/or the OpenCL-related code in the CPU-based part of the application whenever there was a relevant change to the science code in the "CPU-only" application code :-)



I have no idea how difficult it would be to port the CPU code to GPU - my idea was to stay in the OpenCL realm; a CPU can perfectly well run OpenCL code if OpenCL drivers are installed... But I admit that it's not as straightforward as using a GPU: since every GPU driver already has OpenCL bundled, it's a no-brainer there, whereas with a CPU one would have to separately install an OpenCL CPU runtime... Not sure if a portable OpenCL CPU driver could be an option...
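
For example, a few lines of generic OpenCL host code are enough to check whether a CPU shows up as a compute device at all - it only will if a CPU runtime (a vendor one, or something portable like PoCL) is installed. This is just an illustrative sketch, not anything from the MW@H client:

#include <stdio.h>
#include <CL/cl.h>

/* Minimal sketch: list every OpenCL device (CPUs included) visible on the host. */
int main(void)
{
    cl_platform_id platforms[8];
    cl_uint nplat = 0;
    clGetPlatformIDs(8, platforms, &nplat);

    for (cl_uint p = 0; p < nplat; ++p) {
        cl_device_id devs[16];
        cl_uint ndev = 0;
        if (clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_ALL, 16, devs, &ndev) != CL_SUCCESS)
            continue;
        for (cl_uint d = 0; d < ndev; ++d) {
            char name[256];
            cl_device_type type;
            clGetDeviceInfo(devs[d], CL_DEVICE_NAME, sizeof(name), name, NULL);
            clGetDeviceInfo(devs[d], CL_DEVICE_TYPE, sizeof(type), &type, NULL);
            printf("%s (%s)\n", name, (type & CL_DEVICE_TYPE_CPU) ? "CPU" : "GPU/other");
        }
    }
    return 0;
}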

Regarding SSE/AVX etc.: I have some workloads that really benefit from AVX-512 on one of my CPUs - but I am no expert on whether N-Body would benefit from it. Looking at the code with my amateur eyes, I see SSE2 as the bare minimum... This is probably also a strategic decision: the lower the minimum requirements are set, the more people can contribute; the more modern CPU / GPU features are supported, the more efficient modern gear becomes. I have seen this with inference workloads running image upscalers - once the tensor cores are supported, even number-crunching beasts like the Radeon VII Pro stand no chance. (But FP64 on tensor cores is not available in consumer cards, so this example is probably not valid for this specific N-Body case...)
ID: 75742
reiner

Joined: 25 May 23
Posts: 13
Credit: 58,073
RAC: 0
Message 75743 - Posted: 20 Jun 2023, 9:39:01 UTC - in response to Message 75740.  

We can only hope someone will be kind enough to improve the OpenCL app into something much, much faster than it is right now.

If somebody is able to provide me with the code and tell me where to put it I would be more than happy to do so :-)

Great! Thanks! If something emerges that can simply be run on Windows, I'd be happy to toss it onto some CPUs/GPUs for comparisons.
ID: 75743

©2024 Astroinformatics Group