
Posts by reiner

1) Message boards : News : New Poll Regarding GPU Application of N-Body (Message 75743)
Posted 20 Jun 2023 by reiner
Post:
We can only hope someone will be kind enough to improve the OpenCL app into something much, much faster than it is right now.

If somebody is able to provide me with the code and tell me where to put it I would be more than happy to do so :-)

Great! Thanx! If something emerges that can simply be run on Windows, I'd be happy to toss it onto some CPUs/GPUs for comparisons.
2) Message boards : News : New Poll Regarding GPU Application of N-Body (Message 75742)
Posted 20 Jun 2023 by reiner
Post:
We can only hope someone will be kind enough to improve the OpenCL app into something much, much faster than it is right now.
And someone would probably have to commit to amending the OpenCL code and/or the OpenCL-related code in the CPU-based part of the application whenever there was a relevant change to the science code in the "CPU-only" application code :-)



I have no idea how difficult it would be to port the CPU code to GPU - my idea was to stay in the OpenCL realm: a CPU can run OpenCL code perfectly well, as long as an OpenCL driver is installed... But I admit it's not as straightforward as with a GPU - since every GPU driver already bundles OpenCL, it's a no-brainer there... On a CPU, one would have to separately install an OpenCL CPU runtime... Not sure whether a portable OpenCL CPU driver could be an option...
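Just to illustrate (I am not a programmer, so take this as a rough sketch): once a CPU OpenCL runtime is installed - Intel's CPU Runtime for OpenCL or the portable PoCL project, for example - standard host code sees the CPU as just another device:

    /* Sketch: list all OpenCL *CPU* devices; needs a CPU runtime installed. */
    #include <stdio.h>
    #include <CL/cl.h>

    int main(void) {
        cl_platform_id platforms[8];
        cl_uint num_platforms = 0;
        clGetPlatformIDs(8, platforms, &num_platforms);

        for (cl_uint p = 0; p < num_platforms; p++) {
            cl_device_id dev;
            cl_uint found = 0;
            /* ask each platform specifically for CPU devices */
            if (clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_CPU,
                               1, &dev, &found) != CL_SUCCESS || found == 0)
                continue;  /* this platform has no CPU runtime */
            char name[256];
            clGetDeviceInfo(dev, CL_DEVICE_NAME, sizeof(name), name, NULL);
            printf("CPU OpenCL device: %s\n", name);
        }
        return 0;
    }

The very same kernels that run on a GPU then run on the CPU device, just scheduled across the cores.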

Regarding SSE/AVX etc.: I have some workloads that really benefit from AVX-512 on one of my CPUs, but I am no expert on whether N-body would benefit from it. Looking at the code with my amateur eyes, SSE2 seems to be the bare minimum... This is probably also a strategic decision: the lower the minimum requirements are set, the more people can contribute; the more modern CPU/GPU features are supported, the more efficient modern gear becomes. I have seen this with inference workloads running image upscalers - once tensor cores are supported, even number-crunching beasts like the Radeon VII Pro stand no chance. (But FP64 on tensor cores is not available on consumer cards, so that example probably doesn't apply to this specific N-body case.) Something like the runtime dispatch sketched below could keep SSE2 as the baseline while still using newer instructions where the CPU has them.
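A rough sketch of that dispatch idea on GCC/Clang - the two nbody_step_* functions are made-up placeholders, not anything from the MW@H code:

    /* Sketch: keep SSE2 (the x86-64 baseline) as the default code path and
       only dispatch to an AVX-512 build of the hot loop when the CPU has it. */
    #include <stdio.h>

    void nbody_step_sse2(void);    /* hypothetical baseline kernel    */
    void nbody_step_avx512(void);  /* hypothetical wide-vector kernel */

    int main(void) {
        __builtin_cpu_init();
        if (__builtin_cpu_supports("avx512f")) {
            puts("dispatching the AVX-512 kernel");
            /* nbody_step_avx512(); */
        } else {
            puts("falling back to the SSE2 kernel");
            /* nbody_step_sse2(); */
        }
        return 0;
    }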
3) Message boards : News : Separation Project Coming To An End (Message 75711)
Posted 19 Jun 2023 by reiner
Post:
But can you run TensorFlow on an AMD GPU?

With an Nvidia GPU, machine learning stuff is still A LOT easier, and tensor core support is still much better than AMD's or Intel's matrix/vector counterparts...

I am not saying this as an Nvidia fanboy, but as someone who thinks this has to change - Nvidia's dominance is not healthy in the long run. But I know enough people who do programming whose attitude simply is: "well, with Nvidia it just works, we can get stuff done"...

Having said that, I personally am always on the lookout for ways to make use of AMD or Intel cards: my Intel Arc happily runs Stable Diffusion, my GCN AMD cards crunch inferences via DirectML/ONNX, and fractal movies with OpenCL acceleration run on everything I have...

So there are ways to use AMD for machine learning, inference, etc. nowadays - and it is getting better; more and more becomes possible...

But Nvidia is still king here. Sadly... Especially if you are more of an end user than a programmer - if you simply want to "play around" or "download precompiled stuff and just use it"...
4) Message boards : News : New Poll Regarding GPU Application of N-Body (Message 75692)
Posted 19 Jun 2023 by reiner
Post:
Read the message I typed before yours.

I did :)
5) Message boards : News : New Poll Regarding GPU Application of N-Body (Message 75689)
Posted 18 Jun 2023 by reiner
Post:
I have no clue whether I am on the wrong path here, but I just remembered that CompuBench runs N-body as part of its benchmark suite, in OpenCL and CUDA - and in OpenCL, both CPUs and GPUs are tested... Looking at the result table, even older or smaller GPUs outperform CPUs in this task... So to me it seems that running N-body on GPUs is worth it...

Or am I missing something here?

https://compubench.com/result.jsp?benchmark=compu20d&test=725&text-filter=&order=median&ff-desktop=true&os-Windows_cl=true&os-Windows_cu=true&pu-dGPU=true&pu-iGPU=true&pu-ACC=true&arch-x86=true&base=device
6) Message boards : News : New Poll Regarding GPU Application of N-Body (Message 75687)
Posted 18 Jun 2023 by reiner
Post:
I'd imagine that many current GPU users have a lot of FP64-capable GPUs for MW@H, since the Separation tasks needed FP64.
If I understand correctly, the N-body calculations also need FP64.
Leaving these GPUs out would leave a lot of compute power unused.
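(For what it's worth, whether a card can do this at all is easy to query: OpenCL devices advertise double-precision support through the cl_khr_fp64 extension. A minimal check - just a sketch, no error handling:)

    /* Sketch: does the first OpenCL GPU expose double precision (cl_khr_fp64)? */
    #include <stdio.h>
    #include <string.h>
    #include <CL/cl.h>

    int main(void) {
        cl_platform_id platform;
        cl_device_id dev;
        if (clGetPlatformIDs(1, &platform, NULL) != CL_SUCCESS ||
            clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &dev, NULL) != CL_SUCCESS) {
            puts("no OpenCL GPU found");
            return 1;
        }
        char ext[8192];
        clGetDeviceInfo(dev, CL_DEVICE_EXTENSIONS, sizeof(ext), ext, NULL);
        printf("FP64 %s\n", strstr(ext, "cl_khr_fp64") ? "supported" : "NOT supported");
        return 0;
    }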

Regarding efficiency - I am new to BOINC/MW@H, but in the time I have contributed so far, I compared my older GCN1 cards with older Threadripper CPUs on the Separation tasks... And the GPU was A LOT more efficient - even an older R9 280X with a 1:4 FP64 ratio was around 30 times faster than my Threadripper while consuming only a fraction of the watts when tuned down a little. GPU efficiency goes up quite a bit if the power target is lowered slightly... At around 40 watts I was able to complete one task in about one minute on the GPU...

So here is a thought from someone not being a programmer:

How about converting N-body to OpenCL? That way CPUs and all GPUs could contribute - one universal codebase, easy to maintain. (I don't know whether this is applicable to the MW@H variant of N-body, but I have seen a lot of OpenCL benchmarks running N-body on both GPU and CPU, so it seems possible...)
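To show what I mean by "one universal codebase": a textbook direct-summation N-body step in OpenCL is only a few lines, and the very same kernel source runs on CPUs and on any FP64-capable GPU. This is NOT the MW@H science code - just the generic O(N²) gravity kernel the benchmarks use; dt, eps (softening) and the packed position/mass layout are my assumptions:

    // Generic O(N^2) N-body step in double precision (sketch, not MW@H code).
    #pragma OPENCL EXTENSION cl_khr_fp64 : enable

    __kernel void nbody_step(__global const double4 *pos,  // xyz = position, w = mass
                             __global double4 *vel,
                             const double dt,
                             const double eps,              // softening length
                             const int n)
    {
        int i = get_global_id(0);
        if (i >= n) return;

        double3 acc = (double3)(0.0, 0.0, 0.0);
        for (int j = 0; j < n; j++) {
            double3 r = pos[j].xyz - pos[i].xyz;            // vector to body j
            double d2 = dot(r, r) + eps * eps;              // softened distance^2
            double inv_d = rsqrt(d2);
            acc += pos[j].w * inv_d * inv_d * inv_d * r;    // G folded into the masses
        }
        vel[i].xyz += dt * acc;  // position update would follow in a second pass
    }

The real MW@H N-body presumably uses something smarter than brute-force pairwise summation (a tree code or similar), so this is only meant to show that the language itself is not the obstacle.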

This would also make CPUs' integrated GPUs usable - some Intel Iris Xe iGPUs actually pack a lot of FP64 punch with very high efficiency (roughly 600 GFLOPS of FP64 at around 20 watts), and even some UHD variants are very efficient. While inferior to some dedicated GPUs in raw numbers, per watt they can easily hold up against a 1080, smaller RTX cards, GCN 5, etc... Imagine every currently unused small notebook contributing as much FP64 power (per watt) as a midrange gaming PC...

I know CUDA is often favoured because it is easier to work with and sometimes offers better speeds - but as far as I understand it, OpenCL can also be very fast and powerful; it just needs more fiddling when coding...

Going CUDA would leave out all AMD/Intel/mobile chips, and CPUs as well. With OpenCL, even the FP64-powerful oldies of GCN 1 and 2 could still be used, or Intel Xeon Phis could contribute (they are actually not bad at FP64...).

I just recently saw some fluid dynamics simulations from this guy; he also provides an OpenCL wrapper that makes a lot of OpenCL work easier - it's a long shot and I have no idea whether this makes sense (not being a programmer), but maybe he has some ideas on how to accelerate the N-body implementation of MW@H in OpenCL? (A rough sketch of the kind of host boilerplate such a wrapper hides follows below the link.)

https://github.com/ProjectPhysX/FluidX3D
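This is roughly the host-side boilerplate such a wrapper hides - a sketch with all error handling omitted, using the toy kernel from above (clCreateCommandQueue is the pre-OpenCL-2.0 call, which the old GCN cards still understand):

    /* Sketch: set up any OpenCL device, build the toy kernel, run one step. */
    #include <stdio.h>
    #include <CL/cl.h>

    static const char *kernel_src =
        "#pragma OPENCL EXTENSION cl_khr_fp64 : enable\n"
        "__kernel void nbody_step(__global const double4 *pos, __global double4 *vel,\n"
        "                         const double dt, const double eps, const int n) {\n"
        "    int i = get_global_id(0); if (i >= n) return;\n"
        "    double3 a = (double3)(0.0);\n"
        "    for (int j = 0; j < n; j++) {\n"
        "        double3 r = pos[j].xyz - pos[i].xyz;\n"
        "        double d2 = dot(r, r) + eps * eps;\n"
        "        double id = rsqrt(d2);\n"
        "        a += pos[j].w * id * id * id * r;\n"
        "    }\n"
        "    vel[i].xyz += dt * a;\n"
        "}\n";

    int main(void) {
        enum { N = 4096 };
        cl_platform_id plat; cl_device_id dev;
        clGetPlatformIDs(1, &plat, NULL);
        clGetDeviceIDs(plat, CL_DEVICE_TYPE_ALL, 1, &dev, NULL);  /* CPU or GPU */

        cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);
        cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, NULL);

        cl_program prog = clCreateProgramWithSource(ctx, 1, &kernel_src, NULL, NULL);
        clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
        cl_kernel k = clCreateKernel(prog, "nbody_step", NULL);

        /* device buffers; real code would fill pos with positions/masses first */
        cl_mem pos = clCreateBuffer(ctx, CL_MEM_READ_ONLY,  N * sizeof(cl_double4), NULL, NULL);
        cl_mem vel = clCreateBuffer(ctx, CL_MEM_READ_WRITE, N * sizeof(cl_double4), NULL, NULL);

        cl_double dt = 1e-3, eps = 1e-4;
        cl_int n = N;
        clSetKernelArg(k, 0, sizeof(cl_mem), &pos);
        clSetKernelArg(k, 1, sizeof(cl_mem), &vel);
        clSetKernelArg(k, 2, sizeof(cl_double), &dt);
        clSetKernelArg(k, 3, sizeof(cl_double), &eps);
        clSetKernelArg(k, 4, sizeof(cl_int), &n);

        size_t global = N;
        clEnqueueNDRangeKernel(q, k, 1, NULL, &global, NULL, 0, NULL, NULL);
        clFinish(q);
        puts("one N-body step done");
        return 0;
    }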
7) Message boards : News : Separation Project Coming To An End (Message 75601)
Posted 16 Jun 2023 by reiner
Post:
It contains around 150 entries at the moment, and I am also including different power states (some older GCN cards can be set to a 50% power target, sipping only 50 watts instead of almost 200 while losing only 40% compute power...). Stuff like DX feature level will also be included, Vulkan versions, etc... It's a work in progress and will become more complete over time...
In the past I found myself manually browsing TechPowerUp etc. to look up stuff like this - so I finally decided to put everything I ever needed into one big spreadsheet :)

The work is not lost; I use it for other stuff, too.
8) Message boards : News : Separation Project Coming To An End (Message 75600)
Posted 16 Jun 2023 by reiner
Post:
This page might have saved you a lot of spreadsheet time: https://en.wikipedia.org/wiki/List_of_Nvidia_graphics_processing_units#Compute_capability_table
Scroll down to see the FP64/FP32 ratios.

Thanx, I know that chart and have taken it as a starting point, along with other sources (TechPowerUp, Reddit, etc.), for my table. I added stuff like efficiency, number of GPUs on the card, prices, etc... This way I can quickly decide how much power per dollar or per watt I can get out of a certain card for a specific task in different applications. The table also includes things like CUDA version, compute/shader model, etc., which come in handy for inference tasks, general computation, and so on...
9) Message boards : News : Separation Project Coming To An End (Message 75575)
Posted 16 Jun 2023 by reiner
Post:
Asteroids makes use of FP64, but it's not 100% dependent on it like MW Separation is. So you get *some* benefit from good FP64 performance, but it can be overshadowed by even stronger FP32 numbers and raw clock speed, like on the 40-series cards. Plus their GPU CUDA app isn't very well optimized, and while it's faster per task on GPU, you get better overall production per watt on CPU, because the CPU can run so many tasks in parallel.

A Titan V is a decent performer on Asteroids, but older cards with only OK FP64 and outdated FP32 (like Kepler cards) don't perform very well.

Old AMD cards won't work on Asteroids since it only has CUDA apps.


Thanx for this helpful answer.

It's too bad - I just spent the last couple of days compiling a spreadsheet of FP64 numbers, including efficiency and FP64 per watt, looking for older GCN2 cards with a 2:1 FP32:FP64 ratio, Kepler cards, etc., and was actually ready to spend some money on these oldies to put them to good use (I like the thought of giving this otherwise more or less useless gear a new purpose) - sustainability, and keeping it from being dumped in landfill or recycling...

I guess I saved some money now :)
10) Message boards : Number crunching : Top GPU models (Message 75574)
Posted 16 Jun 2023 by reiner
Post:
Yes, for others this is an option - my intent was more like "give old gear a new purpose" - sustainability thinking...
Of course, new GPUs can still contribute their FP32 compute power.
11) Message boards : News : Separation Project Coming To An End (Message 75558)
Posted 15 Jun 2023 by reiner
Post:
Nice to see that all the donated computation time helped bring the research project to a conclusion...

Like others here, I was just about to get a lot of FP64-capable GPUs (I like the thought of putting these otherwise useless GPUs to a meaningful purpose)...

Is there really no other GPU FP64 project out there?
12) Message boards : Number crunching : Top GPU models (Message 75557)
Posted 15 Jun 2023 by reiner
Post:
Thanx for mentioning the shutdown of the Separation tasks...

I was just about to get more FP64-capable GPUs - seems I don't have to any more :)

I have been reading the whole announcement thread, trying to find out whether there are any other projects that would benefit from FP64 GPUs, but I can't get a clear picture - it seems FP64 is obsolete nowadays?

I loved the idea of putting old gear to good use, and it was somewhat "sexy" to have an otherwise pretty useless old GPU contribute to something meaningful because it is good at FP64. For FP32/FP16 tasks the situation is simpler, but way less fun: simply get the biggest modern GPU...
13) Message boards : Number crunching : Top GPU models (Message 75484)
Posted 13 Jun 2023 by reiner
Post:
Noob question: according to this list, why is, for example, the AMD Radeon Pro 580X (1.0) faster than the AMD Radeon VII (0.506)? According to TechPowerUp, the Pro 580X has 5.5 TFLOPS FP32 and 345.6 GFLOPS FP64 (https://www.techpowerup.com/gpu-specs/radeon-pro-580x.c3398), whereas the Radeon VII has 13.44 TFLOPS FP32 and 3.360 TFLOPS FP64 (https://www.techpowerup.com/gpu-specs/radeon-vii.c3358). Am I missing something here?

I am puzzled, too. I made myself a list of GPUs and their FP64 and FP32 compute power. Looking at the list, I find a lot of "weak FP64" cards at the top.

For example:

1st place: Radeon Pro 580X (5.5 TFLOPS FP32 / 345 GFLOPS FP64)
15th place: Radeon VII (13.4 TFLOPS FP32 / 3.3 TFLOPS FP64)

It almost seems the list is summing contributions across all GPUs of a given type - if far more people run GCN4 cards than Radeon VIIs, the total compute power in the grid can be higher. (Say, 100 Pro 580Xs at ~0.35 TFLOPS FP64 add up to ~35 TFLOPS, more than 10 Radeon VIIs at ~3.4 TFLOPS each.)

If the list is meant to represent the "fastest cards for MW@H", I am missing something obvious here...



