Welcome to MilkyWay@home

Posts by ProDigit

1) Message boards : Number crunching : Ampere (Message 70293)
Posted 30 Dec 2020 by ProDigit
Here we can see an NVIDIA GeForce RTX 3080 crunching:

Which is pretty pathetic, actually, assuming 1X tasks per card.
My old GTX 1080Ti does Separation in ~90 seconds or less.
The 1/64 DP FP pipeline on the Ampere cards hurts MW processing even more than on the Turing cards.

One thing you'd have to keep in mind:
The 3080 and 3090 have significantly more shaders, so their boost frequency is lower.
If an RTX card is utilized less than 50%, the shader clock drops even further, to ~1350 MHz.
If that's the case, you can expect a 75% increase in performance just by letting the shaders run at their rated frequency.
More still if your GPU is a third-party card: Nvidia's own factory cards are great for small cases (they push roughly 50-66% of their heat out of the case), but they run less efficiently (hotter) than third-party cards with triple-fan heat sinks.

Add to that 2, 3, or 4 WUs per GPU (which will most likely make the GPU run at its maximum frequency), and I think you could potentially see a 200-300% improvement (unless, for some reason, the GPU still draws less than 150W).
2) Message boards : Number crunching : GPU upgrade time (Message 70292)
Posted 30 Dec 2020 by ProDigit
What about Gromacs?

Or is it a single multipurpose shader that's counted as 2, and can do INT calculations as long as FLOAT isn't being utilized (or the reverse)?
3) Message boards : Number crunching : GPU upgrade time (Message 70283)
Posted 28 Dec 2020 by ProDigit
Please provide solid proof that ANY of our apps can actually use the second FMA/AVX/FP pipeline in the RTX 3000 series cards.

I don't believe that to be the case UNTIL new apps are written to use the secondary path at the same time as the original RTX 2000 path.

Where did you read that they have 2 separate pipelines?
Even if they do, doesn't CUDA redirect work to wherever resources are available?

But I agree, until the first person runs the GPUs, we won't know for sure.
At the moment, all we know is that, in theory, the RTX 3000 cards could be twice as fast as the equivalent 2000-series model.
We probably won't know until February or so, when prices will hopefully drop (and RTX 4000 GPUs will be announced), more people start owning them, and more bugs get ironed out.
I think most of the data processing will be handled by OpenCL, the Nvidia drivers, the underlying platforms, and/or CUDA. It shouldn't require many changes to the code.
4) Message boards : Number crunching : GPU upgrade time (Message 70276)
Posted 26 Dec 2020 by ProDigit
Since RTX 3000 GPUs have double the DPP (64-bit) processing shaders of RTX 2000-series GPUs,
I suspect that the RTX 3060 Ti, 3070 and 3080 would run best at 3 WUs.
Possibly the 3080 might do 4 WUs.
The 3090 would run best at 4 WUs (possibly 5 WUs, since the 2080 Ti, with 50% less DPP, does run 3 WUs faster than 2 WUs).

When you compare AMD GPUs to Nvidia GPUs, do consider that 2060 to 2080 Super GPUs run between 100W and 150W doing Milkyway (or Einstein).
They don't run at the full 200 Watts they're rated for. Some 2060s only hit 50-80W avg (when capped to 129W; when I cap them to their factory-rated 175W, they run <100W avg, with peaks up to 120W).
5) Message boards : Number crunching : Run Multiple WU's on Your GPU (Message 70275)
Posted 26 Dec 2020 by ProDigit
Perhaps related:
For Nvidia RTX 2000-series GPUs, anything above 2 WUs per GPU is useless.
The RTX 2080 Ti runs 3 WUs in about the same time as 2 WUs, a tiny bit faster, but it also consumes about 10W (~5%) more power, while finishing 2x3 WUs takes less than 5% less time than 3x2 WUs.
AMD GPUs are a bit better for Einstein and Milkyway, because they have more DPP.
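To see why 3 WUs buys nothing over 2, here is a small sketch of the energy-per-WU arithmetic, using only the ratios quoted above (~5% more power, at best ~5% less time). Both setups finish the same 6 WUs, so energy per WU scales with power times time; the function name is just for illustration.

```python
def relative_energy_per_wu(power_ratio: float, time_ratio: float) -> float:
    """Energy per WU of setup B relative to setup A.

    Both setups finish the same number of WUs (6 here: 2 batches of 3
    versus 3 batches of 2), so energy per WU is proportional to power x time.
    """
    return power_ratio * time_ratio


# Ratios from the post: 3 WUs/GPU draws ~10 W (~5%) more than 2 WUs/GPU,
# and is "less than 5%" faster, so 0.95 is the best case for 3 WUs.
best_case = relative_energy_per_wu(1.05, 0.95)
# A smaller time saving (hypothetical 3%) makes 3 WUs cost more energy per WU.
smaller_saving = relative_energy_per_wu(1.05, 0.97)

print(f"best case:      {best_case:.4f}x the energy per WU")
print(f"3% time saving: {smaller_saving:.4f}x the energy per WU")
```

Even in the best case the two setups come out roughly even on energy per WU, which fits the "anything above 2 is useless" conclusion.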

I think this is because Milkyway uses more DPP than the GPUs actually have to offer, which bottlenecks the GPU by a lot (80-140W usage out of 150-195W limits).
If that's the case, you could set the GPU usage to 0.33 on an RTX 3080 and 0.25 on an RTX 3090.
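In BOINC, per-app overrides like these usually go in an app_config.xml file in the project's directory, and GPU tasks share a card as long as their gpu_usage fractions add up to no more than 1. Below is a minimal Python sketch that writes such a file for N WUs per GPU. It assumes the Separation app's short name is `milkyway` (check the app names in your own client_state.xml), and the `cpu_usage` default of 1.0 is a placeholder, not a setting from this thread.

```python
import math
import xml.etree.ElementTree as ET

# ASSUMPTION: short name of the Separation GPU app; verify on your host.
APP_NAME = "milkyway"


def gpu_usage_for(tasks_per_gpu: int) -> float:
    """Fraction of a GPU each task claims so that `tasks_per_gpu` tasks fit.

    Rounded *down* to two decimals (1/3 -> 0.33) so the fractions never add
    up to more than 1.0 and all tasks can still share one GPU.
    """
    if tasks_per_gpu < 1:
        raise ValueError("tasks_per_gpu must be >= 1")
    return math.floor(100 / tasks_per_gpu) / 100


def build_app_config(tasks_per_gpu: int, cpu_usage: float = 1.0) -> str:
    """Return app_config.xml text for running `tasks_per_gpu` WUs per GPU."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = APP_NAME
    gpu = ET.SubElement(app, "gpu_versions")
    ET.SubElement(gpu, "gpu_usage").text = f"{gpu_usage_for(tasks_per_gpu):.2f}"
    ET.SubElement(gpu, "cpu_usage").text = f"{cpu_usage:.2f}"  # placeholder value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_app_config(3))  # e.g. RTX 3080 at 3 WUs -> gpu_usage 0.33
    print(build_app_config(4))  # e.g. RTX 3090 at 4 WUs -> gpu_usage 0.25
```

Rounding down rather than to the nearest value matters for odd counts: 1/6 rounded to 0.17 would sum to 1.02 and only 5 tasks would fit.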

My main issue is that I want to run only 1 Milkyway WU on my GPU and combine it with another project that doesn't use a lot of 64-bit (DPP) instructions.
I have 2 GPUs in my system, and want to run 1 Milkyway WU per GPU.
With my current setup, 2 MW WUs are running on one GPU, and none on the second.

6) Message boards : Number crunching : GPU upgrade time (Message 70233)
Posted 9 Dec 2020 by ProDigit
I would go with Nvidia, for 4 reasons.
1- Nvidia GPUs perform better on OpenCL computations than AMD GPUs, while AMD GPUs are more gaming-oriented.
2- Nvidia GPUs can be throttled down by a large margin on power consumption, meaning a 300W GPU could be run at 200-225W with about 98% of stock performance.
3- The RTX 3000 GPUs have much higher DPP throughput than the RTX 2000 GPUs. It was occasionally mentioned that a Vega 64 would outperform an RTX 2080 Ti in some DPP-heavy workloads.
That's not going to be an issue anymore.
4- Nvidia has CUDA support now, which, where an app supports it, gives a guaranteed speed increase. FAH supports it and gets a ~20% speed boost 'for free' just by enabling it.
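Point 2 is easier to judge as work per watt. A quick sketch using the figures from that point (300W stock, 200-225W cap, ~98% of stock performance), assuming the card actually draws close to its cap in both cases; on Linux the cap itself is typically set with `nvidia-smi -pl <watts>` as root.

```python
def work_per_watt_gain(perf_fraction: float, power_limit_w: float,
                       stock_power_w: float) -> float:
    """Work done per watt after lowering the power limit, relative to stock.

    perf_fraction is throughput at the lower limit as a fraction of stock
    (0.98 for "about 98% of stock performance").
    """
    return perf_fraction / (power_limit_w / stock_power_w)


for limit in (200, 225):
    gain = work_per_watt_gain(0.98, limit, 300)
    print(f"{limit} W cap: {gain:.2f}x the work per watt of the 300 W stock setting")
```

If a task never pulls the card up to its cap (as several posts here report for Milkyway), the real gain will be smaller than this estimate.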

If your aim is to crunch data occasionally (say, 2 days a week or less), AMD could be the better buy, because their cards are cheaper.
If you are crunching 24/7, months on end, Nvidia will be cheaper, as their running cost is lower.
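The trade-off above is a break-even calculation: the card that is cheaper to run pays back its higher purchase price once the electricity saved exceeds the price difference. A sketch with purely hypothetical inputs ($100 price difference, 50W lower draw, $0.15/kWh), not figures from this thread:

```python
def break_even_hours(price_difference: float, watt_difference: float,
                     price_per_kwh: float) -> float:
    """Crunching hours after which the lower-power card has paid back its
    higher purchase price through lower electricity cost."""
    savings_per_hour = (watt_difference / 1000) * price_per_kwh
    return price_difference / savings_per_hour


# HYPOTHETICAL inputs, for illustration only.
hours = break_even_hours(price_difference=100.0, watt_difference=50.0,
                         price_per_kwh=0.15)
print(f"break-even after {hours:.0f} hours of crunching")
print(f"  crunching 24/7:     {hours / 24:.0f} days")
print(f"  crunching 2 d/week: {hours / 48:.0f} weeks")
```

The same break-even point takes twice as many weeks to reach at 2 days a week as at 4, which is why occasional crunchers may never recoup a higher purchase price.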
7) Message boards : Number crunching : Increase performance (watts)? (Message 70232)
Posted 9 Dec 2020 by ProDigit
Instead of looking at power consumption, what does card utilization show for 2X or 3X? That would be the limiting factor. Also, are you overclocking the card's P2 power state under compute load to get back to what the card's P0 power state would be if it weren't penalized by the drivers?

keith@Serenity:~$ nvidia-smi
Mon May 11 18:17:24 2020       
| NVIDIA-SMI 440.64       Driver Version: 440.64       CUDA Version: 10.2     |
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  GeForce RTX 2080    On   | 00000000:08:00.0 Off |                  N/A |
|100%   55C    P2   188W / 225W |    315MiB /  7982MiB |     90%      Default |
|   1  GeForce RTX 2080    On   | 00000000:0A:00.0  On |                  N/A |
|100%   41C    P2   158W / 225W |   1107MiB /  7979MiB |     97%      Default |
|   2  GeForce RTX 2080    On   | 00000000:0B:00.0 Off |                  N/A |
|100%   38C    P2    98W / 225W |    446MiB /  7982MiB |    100%      Default |
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|    0      1502      G   /usr/lib/xorg/Xorg                             6MiB |
|    0      9408      C   ./keepP2                                     111MiB |
|    0     20858      C   acemd3                                       185MiB |
|    1      1502      G   /usr/lib/xorg/Xorg                           120MiB |
|    1      2229      G   /usr/bin/gnome-shell                         102MiB |
|    1      9409      C   ./keepP2                                     111MiB |
|    1     23651      C   ...86_64-pc-linux-gnu__FGRPopenclTV-nvidia   769MiB |
|    2      1502      G   /usr/lib/xorg/Xorg                             6MiB |
|    2      9410      C   ./keepP2                                     111MiB |
|    2     23709      C   ..._x86_64-pc-linux-gnu__opencl_nvidia_101   157MiB |
|    2     23764      C   ..._x86_64-pc-linux-gnu__opencl_nvidia_101   157MiB |

GPU #2 is running 2X MW Separation tasks: 100% utilization at 98W.

No, GPU utilization is at 92-100%. The wattage clearly shows it's only using a fraction of the power it has available.
Is Milkyway using a lot of DPP (64-bit) calculations? That might be a bottleneck on RTX 2000-series GPUs.
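Rather than reading the full table, the same numbers can be pulled with `nvidia-smi --query-gpu=index,power.draw,power.limit,utilization.gpu --format=csv,noheader,nounits`. The sketch below parses that CSV shape; the sample rows are transcribed from the table above. Note that nvidia-smi's utilization is the share of time any kernel was running, not how many shader units were busy, so 100% utilization at 98W is not a contradiction.

```python
import csv
import io

# Rows transcribed from the nvidia-smi table above, in the shape printed by
#   nvidia-smi --query-gpu=index,power.draw,power.limit,utilization.gpu \
#              --format=csv,noheader,nounits
SAMPLE = """0, 188.00, 225.00, 90
1, 158.00, 225.00, 97
2, 98.00, 225.00, 100
"""


def power_report(text: str) -> list:
    """Parse the CSV and compute unused power headroom per GPU."""
    rows = []
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not row:  # skip blank lines
            continue
        idx, draw, limit, util = row
        draw_w, limit_w = float(draw), float(limit)
        rows.append({
            "gpu": int(idx),
            "util_pct": int(util),
            "draw_w": draw_w,
            "headroom_w": limit_w - draw_w,
            "pct_of_cap": 100 * draw_w / limit_w,
        })
    return rows


for r in power_report(SAMPLE):
    print(f"GPU {r['gpu']}: {r['util_pct']}% util at {r['draw_w']:.0f} W "
          f"({r['pct_of_cap']:.0f}% of cap, {r['headroom_w']:.0f} W unused)")
```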
8) Message boards : Number crunching : Increase performance (watts)? (Message 69804)
Posted 11 May 2020 by ProDigit
I've noticed that when running 1 WU on an RTX 2060 or above, my GPU draws ~80W.
When I run 2 WUs it draws a little more, and with 3 WUs my GPU runs at around 105W.
Running more than 3 WUs doesn't speed up crunching and drastically slows down desktop responsiveness.
Same results with the 2070/2080/2080 Ti.

I was wondering if there was a way to make better use of my GPUs, seeing that even with 3WUs, they're only using 105W out of the available 150/225W.
Is there a hard cap on power consumption built into these WUs?
Why doesn't simply doubling the WUs double the power draw, like it does with other projects?
I mean, if 1 WU draws 80W, 2 WUs should draw 160W and 3 WUs per GPU 240W.
But it doesn't.
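The gap can be put in numbers using only the figures in this post (80W for 1 WU, ~105W for 3; the 2-WU draw isn't given, so it's left out). This is plain arithmetic, not a diagnosis: draw this far below linear is consistent with a shared bottleneck, but these numbers alone don't show what it is.

```python
def scaling_efficiency(watts_one_wu: float, tasks: int,
                       observed_watts: float) -> float:
    """Observed power draw as a fraction of what linear scaling predicts."""
    return observed_watts / (watts_one_wu * tasks)


expected = 80 * 3        # linear expectation from the post: 240 W
observed = 105           # reported draw with 3 WUs
eff = scaling_efficiency(80, 3, observed)
marginal = (observed - 80) / (3 - 1)  # extra watts per additional WU

print(f"expected {expected} W, observed {observed} W -> {eff:.0%} of linear")
print(f"each WU beyond the first adds ~{marginal:.1f} W")
```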

It would be great to get some help from the knowledgeable minds here!
9) Message boards : Number crunching : Milkyway settings for RTX 2080 Ti (Message 69767)
Posted 5 May 2020 by ProDigit
app_config.xml settings for multiple RTX 2080 Tis, running with a 9th-gen Core i processor at 3.9-4 GHz in Linux:
This setting will make the display look sluggish; reducing GPU usage to 0.5 will make the desktop experience more fluid.

©2023 Astroinformatics Group