
GPU upgrade time

ProDigit

Joined: 13 Nov 19
Posts: 9
Credit: 32,117,570
RAC: 0
Message 70233 - Posted: 9 Dec 2020, 21:12:08 UTC

I would go with Nvidia, for four reasons:
1- Nvidia GPUs perform better on OpenCL computations than AMD GPUs, while AMD GPUs are more gaming oriented.
2- Nvidia GPUs can have their power consumption throttled down by a large margin, meaning a 300W GPU can be run at 200-225W with about 98% of stock performance (see the example after this list).
3- The RTX 3000 GPUs have much higher double-precision (FP64) throughput than the RTX 2000 GPUs. It was occasionally noted that a Vega 64 would outperform an RTX 2080 Ti in some FP64-heavy workloads; that shouldn't be an issue anymore.
4- Nvidia GPUs support CUDA, which, where a project supports it, is a guaranteed speed increase. Folding@home supports it and gets roughly a 20% speed boost 'for free' just by enabling it.
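On point 2, the usual tool for this is nvidia-smi, which ships with the driver. A minimal sketch (the 200W figure is just an example; it needs root/admin rights, and each card has its own allowed range):

    nvidia-smi -q -d POWER    # show current, default, min and max power limits
    sudo nvidia-smi -pl 200   # cap the board at 200W (resets on reboot/driver reload)

On Windows, run the same commands from an elevated prompt without sudo.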

If your aim is to crunch data occasionally (say two days a week or less), AMD could be the better buy, because the cards are cheaper.
If you are crunching 24/7, months on end, Nvidia will be cheaper, as the running cost is lower.
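To put a rough number on the running cost: assuming a 100W difference at the wall and an example rate of $0.13/kWh, that is 0.1 kW x 24 h x 365 d = 876 kWh a year, or about $114 per card per year, which can quickly eat up a purchase-price advantage.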
ID: 70233
Chooka

Joined: 13 Dec 12
Posts: 101
Credit: 1,782,658,327
RAC: 0
Message 70234 - Posted: 10 Dec 2020, 7:22:24 UTC - in response to Message 70233.  

Um... if we are talking about crunching projects like MilkyWay or Einstein, AMD cards crush NVIDIA for those work units.
You can also undervolt the cards if power is a concern. My Radeon VIIs, undervolted, drop from well over 200W down to about 180W with barely any performance drop.

If you are crunching projects like PrimeGrid, it's NVIDIA all the way.

@mikey - Interesting regarding your older cards. That's why I'm not an early adopter of new cards. I'll let someone else stump up the cash first :)

ID: 70234
SBretz

Joined: 3 May 11
Posts: 18
Credit: 133,104,941
RAC: 0
Message 70246 - Posted: 18 Dec 2020, 2:42:58 UTC

Update: Yesterday I was able to secure a new CPU; I landed a 5900X. Tonight I was trying to get a GPU; I missed out on the 6800 XT but was able to land a 6900 XT. Yeah, it is overpriced for its performance, but it should be good for any gaming for a good while, plus it should be pretty decent at crunching numbers.

I am not sure how you guys "dyno" these cards for these workloads. I will look into it when I get the new system up and running. I may need your help getting numbers if I am too "stoopid" to figure it out myself.
ID: 70246
mikey

Joined: 8 May 09
Posts: 3315
Credit: 519,941,509
RAC: 22,513
Message 70247 - Posted: 18 Dec 2020, 4:00:58 UTC - in response to Message 70246.  

Update: Yesterday I was able to secure a new CPU; I landed a 5900X. Tonight I was trying to get a GPU; I missed out on the 6800 XT but was able to land a 6900 XT. Yeah, it is overpriced for its performance, but it should be good for any gaming for a good while, plus it should be pretty decent at crunching numbers.

I am not sure how you guys "dyno" these cards for these workloads. I will look into it when I get the new system up and running. I may need your help getting numbers if I am too "stoopid" to figure it out myself.


Most people run Prime95 for the CPU. For the GPU, people used to use FurMark, but it has fried some GPUs, so a lot of people have moved to Unigine Heaven. All of these programs are free to run and will give good numbers, though some features are disabled unless you pay, which most people never do. Sure, businesses will pay, but home users just don't need to test things that often, and the free versions work great for us.
ID: 70247
mikey

Joined: 8 May 09
Posts: 3315
Credit: 519,941,509
RAC: 22,513
Message 70248 - Posted: 18 Dec 2020, 4:04:50 UTC - in response to Message 70234.  


@mikey - Interesting regarding your older cards. That's why I'm not an early adopter of new cards. I'll let someone else stump up the cash first :)


Me neither; in fact, my best GPUs overall are 1080 Tis. I move up through sheer volume of results, not the speed of each individual result, i.e. I have three 1080 Tis mixed into my total of 17 PCs running, with over 250 CPU cores and a gaming-capable GPU in all of them.
ID: 70248
SBretz

Joined: 3 May 11
Posts: 18
Credit: 133,104,941
RAC: 0
Message 70251 - Posted: 18 Dec 2020, 23:35:32 UTC - in response to Message 70248.  

17 PCs! That sounds like you are overseeing a computer lab.
ID: 70251
mikey

Joined: 8 May 09
Posts: 3315
Credit: 519,941,509
RAC: 22,513
Message 70252 - Posted: 18 Dec 2020, 23:44:05 UTC - in response to Message 70251.  

17 PCs! That sounds like you are overseeing a computer lab.


No, just a bunch of PCs I've kept running over the years; they're in a room in the garage with its own A/C system. Some I picked up on trash day on the side of the road, others I bought parts for and have kept running that way, and others I bought on eBay and then enhanced to get the most out of them.
ID: 70252
koschi

Joined: 10 Aug 08
Posts: 5
Credit: 10,366,849
RAC: 22,124
Message 70257 - Posted: 21 Dec 2020, 22:59:32 UTC

For Linux users who want to crunch MilkyWay, "Big Navi" is problematic.
The 20.45 driver, the first to officially support the Radeon RX 6800/6900 (XT), no longer uses the PAL OpenCL implementation for Vega and newer cards.
20.45 switched to ROCm as the OpenCL backend for Vega 10/20, Navi and Big Navi.

My Vega 56 crunched MilkyWay on 20.40 just fine, but on 20.45 it fails with OpenCL errors:
Error creating command queue (-6): CL_OUT_OF_HOST_MEMORY
Error getting device and context (-6): CL_OUT_OF_HOST_MEMORY


I tried MilkyWay on a 6900 XT: same issue. When I tested the free Mesa/Clover and ROCm OpenCL implementations last year, the same error occurred on this project.

So with Big Navi becoming more widespread over the coming years, and people updating the GPU drivers on their Vegas and Radeon VIIs, more people will bump into this issue.
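For anyone who wants to test their driver stack outside of BOINC, here is a minimal probe (my own sketch using only the standard OpenCL API, not the project's code) that exercises the same two calls that fail above. Build with: gcc probe.c -lOpenCL

/* probe.c - enumerate GPU devices, then try to create a context and a
 * command queue on each, printing the raw error codes.
 * A -6 from either call is CL_OUT_OF_HOST_MEMORY, matching the MilkyWay
 * log above (which presumably uses the older clCreateCommandQueue entry
 * point; the 2.0+ spelling is used here). */
#define CL_TARGET_OPENCL_VERSION 220
#include <stdio.h>
#include <CL/cl.h>

int main(void) {
    cl_platform_id plats[8];
    cl_uint nplat = 0;
    if (clGetPlatformIDs(8, plats, &nplat) != CL_SUCCESS || nplat == 0) {
        fprintf(stderr, "no OpenCL platforms found\n");
        return 1;
    }
    for (cl_uint p = 0; p < nplat; p++) {
        cl_device_id devs[8];
        cl_uint ndev = 0;
        if (clGetDeviceIDs(plats[p], CL_DEVICE_TYPE_GPU, 8, devs, &ndev) != CL_SUCCESS)
            continue;  /* platform has no GPUs */
        for (cl_uint d = 0; d < ndev; d++) {
            char name[256] = {0};
            clGetDeviceInfo(devs[d], CL_DEVICE_NAME, sizeof(name), name, NULL);
            cl_int err = CL_SUCCESS;
            cl_context ctx = clCreateContext(NULL, 1, &devs[d], NULL, NULL, &err);
            printf("%s: clCreateContext -> %d\n", name, (int)err);
            if (ctx == NULL)
                continue;
            cl_command_queue q =
                clCreateCommandQueueWithProperties(ctx, devs[d], NULL, &err);
            printf("%s: clCreateCommandQueueWithProperties -> %d\n", name, (int)err);
            if (q != NULL)
                clReleaseCommandQueue(q);
            clReleaseContext(ctx);
        }
    }
    return 0;
}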
ID: 70257
mikey

Joined: 8 May 09
Posts: 3315
Credit: 519,941,509
RAC: 22,513
Message 70258 - Posted: 22 Dec 2020, 0:13:47 UTC - in response to Message 70257.  

For Linux users who want to crunch MilkyWay, "Big Navi" is problematic.
The 20.45 driver, the first to officially support the Radeon RX 6800/6900 (XT), no longer uses the PAL OpenCL implementation for Vega and newer cards.
20.45 switched to ROCm as the OpenCL backend for Vega 10/20, Navi and Big Navi.

My Vega 56 crunched MilkyWay on 20.40 just fine, but on 20.45 it fails with OpenCL errors:
Error creating command queue (-6): CL_OUT_OF_HOST_MEMORY
Error getting device and context (-6): CL_OUT_OF_HOST_MEMORY


I tried MilkyWay on a 6900 XT: same issue. When I tested the free Mesa/Clover and ROCm OpenCL implementations last year, the same error occurred on this project.

So with Big Navi becoming more widespread over the coming years, and people updating the GPU drivers on their Vegas and Radeon VIIs, more people will bump into this issue.


Try this thread on Einstein; the latest posts talk about ROCm and making it work:
https://einsteinathome.org/content/all-things-navi-10
ID: 70258
koschi

Joined: 10 Aug 08
Posts: 5
Credit: 10,366,849
RAC: 22,124
Message 70259 - Posted: 22 Dec 2020, 7:35:56 UTC - in response to Message 70258.  

Well, driver 20.45 computes Einstein FGRP1G WUs within the usual run times, but fails entirely at MilkyWay.
I have no problems with DKMS or leftovers of e.g. 20.40.
ID: 70259
mikey

Joined: 8 May 09
Posts: 3315
Credit: 519,941,509
RAC: 22,513
Message 70260 - Posted: 22 Dec 2020, 11:27:00 UTC - in response to Message 70259.  

Well, driver 20.45 computes Einstein FGRP1G WUs within the usual run times, but fails entirely at MilkyWay.
I have no problems with DKMS or leftovers of e.g. 20.40.


I guess MilkyWay is slow on the uptake in making things work for the new cards, then; that's sad, as it means people may not come back.
ID: 70260
Keith Myers

Joined: 24 Jan 11
Posts: 696
Credit: 540,006,713
RAC: 86,815
Message 70261 - Posted: 22 Dec 2020, 18:31:00 UTC - in response to Message 70260.  

MilkyWay isn't the only project slow to support new hardware.

GPUGrid can't handle the latest Nvidia cards either.
ID: 70261
mikey

Joined: 8 May 09
Posts: 3315
Credit: 519,941,509
RAC: 22,513
Message 70262 - Posted: 22 Dec 2020, 23:54:21 UTC - in response to Message 70261.  

MilkyWay isn't the only project slow to support new hardware.

GPUGrid can't handle the latest Nvidia cards either.


Now that surprises me.
ID: 70262
SBretz

Joined: 3 May 11
Posts: 18
Credit: 133,104,941
RAC: 0
Message 70263 - Posted: 23 Dec 2020, 2:17:46 UTC

The 6900 XT came in today.
I currently have it set up with 4 instances running at 0.25 CPUs each. It looks like it is running through the 1.46 (opencl_ati_101) work units in around 2:10 to 2:30.

This is the first time I have run multiple instances on one GPU... do I have too much CPU allocated to them? The base setup was 0.988 CPUs / 1 GPU, but I see the modified text in the config file has 0.05 CPUs / 0.5 GPUs.
Should I be running the CPU share a factor of 10 lower than I have it now?
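For reference, those per-task shares come from an app_config.xml in the project's folder under the BOINC data directory. A minimal sketch for four tasks per GPU (the app name "milkyway" is an assumption here; check the <name> entries in your client_state.xml):

<app_config>
  <app>
    <name>milkyway</name>  <!-- assumed app name; verify in client_state.xml -->
    <gpu_versions>
      <gpu_usage>0.25</gpu_usage>  <!-- 1 / 0.25 = 4 tasks share one GPU -->
      <cpu_usage>0.05</cpu_usage>  <!-- CPU share the client budgets per task -->
    </gpu_versions>
  </app>
</app_config>

Note that these values only steer the BOINC client's scheduling; they are not hard caps on what the science app actually uses.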

PS: It is awesome seeing this new rig chew through so much in so little time.
ID: 70263
SBretz

Joined: 3 May 11
Posts: 18
Credit: 133,104,941
RAC: 0
Message 70264 - Posted: 23 Dec 2020, 2:18:42 UTC
Last modified: 23 Dec 2020, 2:30:52 UTC

Oh, no computational errors. I am running the latest update of Windows 10 Pro for now.
I will be downgrading to the Home version when my NVMe drive shows up.


PS: sorry, I didn't see the "Edit" button until after I made the second post.
ID: 70264
SBretz

Joined: 3 May 11
Posts: 18
Credit: 133,104,941
RAC: 0
Message 70265 - Posted: 23 Dec 2020, 2:56:24 UTC
Last modified: 23 Dec 2020, 3:05:19 UTC

Interesting...
I noticed that when running one instance there was downtime on the GPU, so I set the ratio to 4 CPUs / 1 GPU and usage stayed up around 99% with some drop-offs.
Running 12 CPUs made no difference, so I dropped to 2 CPUs. At 4 CPUs it knocks out a work unit in about 36 sec; at 2 CPUs that slows to 48 sec. With one instance running there are still occasional dips in the GPU usage graph.
I will play with this some more. I think 2 instances will keep the GPU usage near 99% all the time; I will just need to dial in the number of CPU cores needed to feed the GPU consistently so that it doesn't drop off. This "should" keep the GPU pegged. If not, I will go back up to 4 GPU instances and turn the CPU ratio down until I see a drop in GPU usage, then go back to the last known 99%-usage setting.

With 1 CPU / 1 GPU, the GPU usage spikes up to around 45-60% and then falls off to 0-6%. I suspect the GPU is waiting for the CPU to feed it data.
ID: 70265
SBretz

Joined: 3 May 11
Posts: 18
Credit: 133,104,941
RAC: 0
Message 70266 - Posted: 23 Dec 2020, 3:47:41 UTC

It appears that 4 instances is the sweet spot. I am running only 0.05 CPUs per WU and the GPU is staying near 100%. It occasionally dips to around 60% and then goes right back up to 99% usage. Times per WU are coming in between 2:08 and 2:36, the lowest and highest I have seen so far.

I had one computational error, but it was on a CPU-only task.
ID: 70266
mikey

Joined: 8 May 09
Posts: 3315
Credit: 519,941,509
RAC: 22,513
Message 70267 - Posted: 23 Dec 2020, 11:13:48 UTC - in response to Message 70266.  

It appears that 4 instances is the sweet spot. I am running only 0.05 CPUs per WU and the GPU is staying near 100%. It occasionally dips to around 60% and then goes right back up to 99% usage. Times per WU are coming in between 2:08 and 2:36, the lowest and highest I have seen so far.

I had one computational error, but it was on a CPU-only task.


That was some VERY good testing to find that; most people just set a number and then expect the app to use what they set. The dip could just be a couple of units finishing at the same time, so the GPU is slightly underused as it switches to the next task.

The only GPU I have that's even really close to your numbers is my old 7970, and it's running them one at a time in just under 60 seconds each. Due to its age I'm not willing to push it beyond one unit at a time though. I have one GPU taking over 2,800 seconds per task, but at least it's crunching, and I did not expect that when I put it in the machine!
ID: 70267
SBretz

Joined: 3 May 11
Posts: 18
Credit: 133,104,941
RAC: 0
Message 70268 - Posted: 23 Dec 2020, 14:27:13 UTC - in response to Message 70267.  


The dip could just be a couple of units finishing at the same time, so the GPU is slightly underused as it switches to the next task.


I thought that at first too. When I saw a dip I would check back in the BOINC manager expecting to see a new work unit starting, but they would all be in the middle of a run.

IDK what is causing it. More CPUs per GPU helped a little bit, but the largest impact on keeping the GPU busy was to throw more instances at it.
I was surprised that I could use such a low CPU/GPU ratio.

With 4 instances running it needs only 0.2 CPUs total, divided among all four workloads. That pretty much leaves a whole CPU thread open.
ID: 70268
mikey

Joined: 8 May 09
Posts: 3315
Credit: 519,941,509
RAC: 22,513
Message 70269 - Posted: 24 Dec 2020, 11:05:24 UTC - in response to Message 70268.  


The dip could just be a couple of units finishing at the same time, so the GPU is slightly underused as it switches to the next task.


I thought that at first too. When I saw a dip I would check back in the BOINC manager expecting to see a new work unit starting, but they would all be in the middle of a run.

IDK what is causing it. More CPUs per GPU helped a little bit, but the largest impact on keeping the GPU busy was to throw more instances at it.
I was surprised that I could use such a low CPU/GPU ratio.

With 4 instances running it needs only 0.2 CPUs total, divided among all four workloads. That pretty much leaves a whole CPU thread open.


You do know that setting is not something the app respects... right? The app is designed to use X amount of CPU and it will use that no matter what settings we make. BUT what it does mean, like you said, is that instead of just reserving a whole CPU core for each GPU like most people do, you have found that you can now run some NCI projects with no problem at all. NCI projects are 'non-computationally intensive' projects that run in the background; Ithena, WUProp and the now-retired GoofyxGrid are examples. I forget what Goofy was tracking, but Ithena tracks internet speeds around the world, while WUProp tracks which projects and apps we are crunching, what hardware we are using, and how much time it takes to run the different apps. It also tracks a bunch of other metrics as well, like the RAM needed for each app, etc.
ID: 70269