Welcome to MilkyWay@home

GPU upgrade time


ProDigit
Joined: 13 Nov 19
Posts: 9
Credit: 32,117,570
RAC: 0
Message 70276 - Posted: 26 Dec 2020, 0:38:53 UTC
Last modified: 26 Dec 2020, 0:40:26 UTC

Since RTX 3000 GPUs have double the DPP processing shaders of the RTX 2000 series GPUs,
I would expect the RTX 3060 Ti, 3070 and 3080 to run best with 3 WUs.
Possibly the 3080 might do 4 WUs.
The 3090 would run best at 4 WUs (possibly 5 WUs, as the 2080 Ti with 50% less DPP does run 3 WUs faster than 2 WUs).

When you compare AMD GPUs to Nvidia GPUs, do consider that the 2060 to 2080 Super GPUs run between 100W and 150W doing Milkyway (or Einstein).
They don't draw the full 200 Watts they're rated for. Some 2060s only hit 50-80W avg (when capped to 129W; when I cap them to their factory rated 175W, they'll run <100W avg, with peaks up to 120W).
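
For anyone wanting to try several WUs per GPU as described above: BOINC normally reads this from an app_config.xml file in the project's folder, where gpu_usage is the fraction of a GPU each task claims. Below is a minimal Python sketch that generates such a file. The app name "milkyway", the cpu_usage default and the helper name build_app_config are assumptions for illustration, so check the app name your own client reports before using it.

```python
import math
import xml.etree.ElementTree as ET


def build_app_config(app_name, tasks_per_gpu, cpu_per_task=1.0):
    """Return app_config.xml text that lets `tasks_per_gpu` tasks share one GPU."""
    if tasks_per_gpu < 1:
        raise ValueError("tasks_per_gpu must be at least 1")
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name  # assumed app name, verify locally
    gpu = ET.SubElement(app, "gpu_versions")
    # Each task claims 1/N of a GPU. Round down so N tasks never add up to more than 1.0.
    share = math.floor(1000 / tasks_per_gpu) / 1000
    ET.SubElement(gpu, "gpu_usage").text = f"{share:.3f}"
    ET.SubElement(gpu, "cpu_usage").text = f"{cpu_per_task:.2f}"
    if hasattr(ET, "indent"):  # pretty-print where available (Python 3.9+)
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Three concurrent WUs per GPU, the starting point suggested for a 3070/3080.
    print(build_app_config("milkyway", tasks_per_gpu=3))
```

After saving the output as app_config.xml, Options, Read config files in the BOINC Manager should pick it up. Whether 3, 4 or 5 WUs is actually fastest on a given card still has to be measured.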

Keith Myers
Joined: 24 Jan 11
Posts: 670
Credit: 516,193,690
RAC: 174,028
Message 70277 - Posted: 26 Dec 2020, 8:21:34 UTC - in response to Message 70276.  

Please provide solid proof that ANY of our apps can actually use the second FMA/AVX/FP pipeline in the RTX 3000 series cards.

I don't believe that to be the case UNTIL new apps are written to use the secondary path at the same time as the original RTX 2000 path.

SBretz
Joined: 3 May 11
Posts: 18
Credit: 133,104,941
RAC: 0
Message 70282 - Posted: 27 Dec 2020, 20:12:13 UTC

Update. The power supply that came as part of my Newegg bundle arrived. I got that installed in my wife's PC along with one of my EVGA 970s. She has a dual monitor setup and didn't want to lose it, so I had to leave the older Radeon 6670 in there, as the 970 had two DVI ports and two HDMIs. One of the monitors (the junky one) uses the older VGA with no other options for connection.

I do not see her PC running any GPU tasks. Is it a coincidence that there are no GPU tasks, or does BOINC not like having two different brand GPUs?

Next, the home theater PC will be getting my old i7-6700K and the other EVGA 970. I am super stoked to get rid of the old Phenom II 555 (two cores, no hyper-threading). The new CPU and DDR4 should really make that PC responsive.

ProDigit
Joined: 13 Nov 19
Posts: 9
Credit: 32,117,570
RAC: 0
Message 70283 - Posted: 28 Dec 2020, 2:10:39 UTC - in response to Message 70277.  
Last modified: 28 Dec 2020, 2:17:23 UTC

Where did you read they have 2 separate pipelines?
Even if they do, doesn't CUDA redirect traffic to wherever resources are available?

But I agree, until the first person runs the GPUs, we won't know for sure.
At the moment, all we know is that in theory the RTX 3000 cards have the capacity to be twice as fast as the same GPU model of the 2000 series.
We possibly will not know until February or so, when prices will hopefully drop (and RTX 4000 GPUs will be announced), more people start having them, and more bugs can be fixed/ironed out.
I think most of the processing of the data will be handled by OpenCL, the Nvidia drivers, underlying platforms, and/or CUDA. It shouldn't require much difference in coding.

mikey
Joined: 8 May 09
Posts: 3075
Credit: 517,393,658
RAC: 34,467
Message 70284 - Posted: 28 Dec 2020, 10:57:24 UTC - in response to Message 70282.  

You need a cc_config.xml file that tells BOINC to use all the GPUs it finds in the machine before it will use both GPUs all the time.

This one will work great for that:

<cc_config>
<options>
<use_all_gpus>1</use_all_gpus>
<report_results_immediately>1</report_results_immediately>
</options>
</cc_config>

Copy that into Notepad in Windows and save it as cc_config.xml in the BOINC data folder, located at C:\ProgramData\BOINC by default. Then in the BOINC Manager go to Options, Read config files, and BOINC should start using both GPUs right away, assuming MilkyWay still supports both of your GPUs.

Keith Myers
Joined: 24 Jan 11
Posts: 670
Credit: 516,193,690
RAC: 174,028
Message 70287 - Posted: 28 Dec 2020, 18:22:11 UTC - in response to Message 70283.  

Until someone shows me otherwise, none of our current apps have a way of splitting the data into the second dual-purpose FP/INT pipeline, so everything goes through the normal FP-only pipeline.
Here is an RTX 3080 doing Separation in ~101-115 seconds. My GTX 1080 Ti does Separation in under 90 seconds.
https://milkyway.cs.rpi.edu/milkyway/show_host_detail.php?hostid=856552
So the Ampere cards are slower than Pascal. Blame the 1:64 FP64 ratio.

Keith Myers
Joined: 24 Jan 11
Posts: 670
Credit: 516,193,690
RAC: 174,028
Message 70288 - Posted: 28 Dec 2020, 22:46:05 UTC - in response to Message 70287.  

Where did you read they have 2 separate pipelines?
Even if, doesn't CUDA redirect traffic to where there are resources available?

Maybe give a read and look at the floating point diagrams for Turing and Ampere on this website.
https://www.hardwaretimes.com/nvidia-rtx-3080-3090-ampere-architectural-deep-dive-2x-fp32-2nd-gen-rt-cores-3rd-gen-tensor-cores-and-rtx-io/

Of note, this is part of the article.
Each of the four partitions in an SM has two datapaths or pipelines; One with a cluster of 32 CUDA cores purely dedicated to FP32 operations while another that can do both, FP32 or INT32. This means that the 2x FP32 or 128 FMA per clock of performance that NVIDIA is touting will only be true when the workloads are purely composed of FP32 instructions which is rarely the case. This is why we don’t see an increase of 2x in performance despite the fact that the core count increases by the same figure.

The only way to get the second INT/FP pipeline in use is to keep any INT operation out of it. This would have to be implemented in the app coding for the warp scheduler. I don't think ANY of our current apps have made that adjustment yet. The architecture is just too new.

The only project app that MIGHT be coded properly so far, I think, is the PrimeGrid CUDA application, which is showing the theoretical improvement in FP performance that using both FP pipelines at the same time should enable.
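
The article's point that the 2x FP32 figure only holds for pure FP32 workloads can be put into numbers with a toy model. The Python sketch below is only an idealized illustration. It assumes 16 lanes per datapath (the SIMD16 figure from the article), perfect scheduling, and no memory or latency effects, and the function names are made up for this example. It counts FP32 and INT32 instructions only, so it says nothing about FP64 work.

```python
def cycles_turing(n_instr, int_fraction, lanes=16):
    """Turing-style partition: one FP32-only path plus one INT32-only path."""
    n_int = n_instr * int_fraction
    n_fp = n_instr - n_int
    # Both paths run side by side, so the busier one sets the time.
    return max(n_fp, n_int) / lanes


def cycles_ampere(n_instr, int_fraction, lanes=16):
    """Ampere-style partition: one FP32-only path plus one FP32-or-INT32 path."""
    n_int = n_instr * int_fraction
    # All INT work must go through the shared path. FP work fills the rest,
    # so the best case is an even split, limited by the INT load.
    return max(n_int, n_instr / 2) / lanes


def ampere_speedup(int_fraction, n_instr=1000):
    """Idealized Ampere-over-Turing speedup for a given share of INT instructions."""
    return cycles_turing(n_instr, int_fraction) / cycles_ampere(n_instr, int_fraction)


if __name__ == "__main__":
    for frac in (0.0, 0.25, 0.5):
        print(f"INT share {frac:.2f}: speedup {ampere_speedup(frac):.2f}x")
```

In this model the speedup is 2x only with no INT instructions at all, and it shrinks towards 1x as the INT share approaches half of the instruction stream.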

SBretz
Joined: 3 May 11
Posts: 18
Credit: 133,104,941
RAC: 0
Message 70289 - Posted: 29 Dec 2020, 0:40:38 UTC - in response to Message 70284.  

Thanks for the reply, Mikey. I did poke around the config thinking it was set to zero. I even changed the "no_gpu" setting to see if that made a change. It ended up removing the GPU options in the computer preferences. I put that back to the way it was.
It might just be coincidence. IDK if the old GPU was even capable of running GPU tasks, so there may have been only workloads left in the queue from running CPU only. I will keep an eye on it and see if any GPU workloads come up.

ProDigit
Joined: 13 Nov 19
Posts: 9
Credit: 32,117,570
RAC: 0
Message 70292 - Posted: 30 Dec 2020, 1:00:27 UTC

What about Gromacs?

Or is it a single multipurpose shader, that's counted as 2, and can do INT calculations, as long as FLOAT isn't being utilized (or reverse)?

Keith Myers
Joined: 24 Jan 11
Posts: 670
Credit: 516,193,690
RAC: 174,028
Message 70297 - Posted: 30 Dec 2020, 19:41:43 UTC - in response to Message 70292.  

Go back and read the article I linked. It depends on whether your code sets up the warp scheduler to process two FP32 operations at the same time. As soon as an INT operation flows through, there is only a single FP32 pipeline processing the workload. So it depends entirely on your application's coding. If you don't specifically set up the warp scheduler to process two FP32 operations at the same time and prevent an INT operation from jumping in, you are crunching on a single FP32 pipeline, just as you did with Turing.

Ampere goes a step back and uses a common pipeline or CUDA core cluster for both integer and floating-point workloads.


Every FP32 CUDA core in an SM is an SIMD16 unit that takes two cycles to resolve a warp or a thread-group just like Turing. Unlike Turing, one set of cores is specifically for FP32 workloads while the other can do either a warp of INT or FP threads per two cycles.


mmonnin
Joined: 2 Oct 16
Posts: 162
Credit: 1,004,076,615
RAC: 909,973
Message 70374 - Posted: 16 Jan 2021, 4:00:47 UTC

MW scales right along with FP64 performance, not shader count. This is an old list, but a good reference for most of the top FP64 cards, since double precision keeps being cut.
https://www.geeks3d.com/20140305/amd-radeon-and-nvidia-geforce-fp32-fp64-gflops-table-computing/

A 3080 has 465.1 GFLOPS of FP64 performance, which is in between a Radeon HD 5850 and a Radeon HD 5870. So yeah, NV 30xx cards will suck here, just like their desktop parts always have.
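
For reference, the 465.1 figure above falls out of simple peak-rate arithmetic: shaders times 2 FLOPs per FMA times clock, scaled by the FP64:FP32 ratio. Here is a short Python sketch. The shader counts, clocks and ratios are the commonly published reference specs as I recall them, so treat them as assumptions rather than measured values.

```python
def peak_gflops(shaders, clock_ghz, fp64_ratio=1.0):
    """Peak GFLOPS = shaders * 2 FLOPs per FMA * clock in GHz * FP64:FP32 ratio."""
    return shaders * 2 * clock_ghz * fp64_ratio


# (shaders, clock in GHz, FP64:FP32 ratio), assumed reference specs
SPECS = {
    "Radeon HD 5850": (1440, 0.725, 1 / 5),
    "GeForce RTX 3080": (8704, 1.710, 1 / 64),
    "Radeon HD 5870": (1600, 0.850, 1 / 5),
}

FP64_GFLOPS = {name: peak_gflops(*spec) for name, spec in SPECS.items()}

if __name__ == "__main__":
    for name, gflops in sorted(FP64_GFLOPS.items(), key=lambda item: item[1]):
        print(f"{name}: {gflops:.1f} GFLOPS FP64")
```

These are theoretical peaks only. Actual MilkyWay run times also depend on the app, drivers and how many WUs run at once.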

SBretz
Joined: 3 May 11
Posts: 18
Credit: 133,104,941
RAC: 0
Message 70380 - Posted: 17 Jan 2021, 2:30:45 UTC

Update: The wife's computer is running GPU tasks on the 970 I installed. The older AMD GPU does not run tasks. It probably isn't compatible with GPU workloads. IDK the model number, but I remember it was in the 6000 range from about 8-10 yrs ago. It was the highest-end model that AMD recommended to run with their first gen of APUs. The only reason it is in the computer is to run the second monitor. As I may have mentioned, the 970 doesn't have the VGA adaptor for her second panel. I think this will eventually give me a reason to upgrade my monitor from a 1080 to a 1440. Then I could use the DVI and USB outputs on the 970 for her dual monitors instead of two GPUs.

As far as GPU computing not running on it earlier, I just had to wait for it to run out of some of the CPU workloads and let it download GPU work units.

mikey
Joined: 8 May 09
Posts: 3075
Credit: 517,393,658
RAC: 34,467
Message 70382 - Posted: 17 Jan 2021, 11:45:45 UTC - in response to Message 70380.  

Update: The older AMD GPU does not run tasks. It probably isn't compatible for GPU workloads. IDK the model number but I remember it was in the 6000 range from about 8-10 yrs ago. It was the highest end model that AMD recommended to run with their first gen of APU's. The only reason it is in the computer is to run the second monitor.


Add an exclude for MilkyWay in your cc_config.xml file for the AMD GPU, or change the project preferences so you don't get AMD work units, and try putting it on SRBase doing the TF tasks. Remember to do the opposite as well, so the Nvidia GPU doesn't get tasks from SRBase. I had a GPU that would not load the OpenCL stuff and it still crunched TF tasks just fine. I've since gotten a driver to work that loads the OpenCL stuff, so it now works on every project.
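
A rough idea of what those excludes could look like, as a small Python sketch that builds the cc_config.xml text with one exclude_gpu block per project. The MilkyWay URL is taken from the host link earlier in the thread and the second URL is a placeholder, so both are assumptions. Copy the exact project URLs from your own client, since they have to match what BOINC has on record. The helper names are made up for this example.

```python
import xml.etree.ElementTree as ET

# Assumed values: replace with the exact project URLs shown in your BOINC client.
MILKYWAY_URL = "https://milkyway.cs.rpi.edu/milkyway/"
OTHER_PROJECT_URL = "https://example.invalid/other_project/"  # placeholder, not a real URL


def add_exclude_gpu(options, project_url, gpu_type):
    """Append an <exclude_gpu> block that keeps one project off one GPU vendor."""
    exclude = ET.SubElement(options, "exclude_gpu")
    ET.SubElement(exclude, "url").text = project_url
    ET.SubElement(exclude, "type").text = gpu_type  # "NVIDIA", "ATI" or "intel_gpu"


def build_cc_config(amd_excluded_url, nvidia_excluded_url):
    """Return cc_config.xml text with one exclude per GPU vendor."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    ET.SubElement(options, "use_all_gpus").text = "1"
    add_exclude_gpu(options, amd_excluded_url, "ATI")        # keep this project off the AMD card
    add_exclude_gpu(options, nvidia_excluded_url, "NVIDIA")  # keep this project off the Nvidia card
    if hasattr(ET, "indent"):  # pretty-print where available (Python 3.9+)
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_cc_config(MILKYWAY_URL, OTHER_PROJECT_URL))
```

Save the output as cc_config.xml in the BOINC data folder and use Options, Read config files to reload it.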

©2023 Astroinformatics Group