Welcome to MilkyWay@home

GPU upgrade time

Message boards : Number crunching : GPU upgrade time

ProDigit

Joined: 13 Nov 19
Posts: 9
Credit: 32,117,570
RAC: 0
Message 70233 - Posted: 9 Dec 2020, 21:12:08 UTC

I would go with Nvidia, for four reasons.
1- Nvidia GPUs perform better on OpenCL computations than AMD GPUs, while AMD GPUs are more gaming-oriented.
2- Nvidia GPUs can be power-limited by a large margin. That means a 300 W GPU could be run at 200-225 W with about 98% of stock performance.
3- The RTX 3000 GPUs have much higher double-precision (DP) performance than the RTX 2000 GPUs. It was occasionally mentioned that a Vega 64 would outperform an RTX 2080 Ti in some DP-heavy workloads.
That should no longer be an issue.
4- Nvidia cards support CUDA, which, where a project supports it, gives a guaranteed speed increase. FAH (Folding@home) now supports it and gets about a 20% speed boost 'for free', just by enabling it.
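To put point 2 in numbers, the sketch below computes how much performance per watt improves under a power limit, using only the figures quoted above (300 W stock, a 200-225 W limit, about 98% of stock performance). The function name is hypothetical, and real results will vary by card and workload.

```python
def efficiency_gain(stock_watts: float, limited_watts: float, relative_perf: float) -> float:
    """Performance-per-watt under a power limit, relative to stock performance-per-watt.

    relative_perf is the limited card's throughput as a fraction of stock (e.g. 0.98).
    """
    if stock_watts <= 0 or limited_watts <= 0:
        raise ValueError("wattages must be positive")
    stock_ppw = 1.0 / stock_watts              # stock throughput normalised to 1.0
    limited_ppw = relative_perf / limited_watts
    return limited_ppw / stock_ppw

for limit in (225, 200):
    gain = efficiency_gain(300, limit, 0.98)
    print(f"{limit} W limit: {gain:.2f}x stock performance per watt")
```

For the two limits quoted, this comes out to roughly 1.31x and 1.47x the stock performance per watt.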

If you only crunch occasionally (say, two days a week or less), AMD could be the better buy, because the cards are cheaper.
If you crunch 24/7 for months on end, Nvidia will be cheaper, as its running cost is lower.
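The trade-off above is a break-even calculation: the card that costs more up front pays for itself once the electricity it saves exceeds the price difference. The sketch below assumes a flat electricity price and a constant power draw while crunching; the function name and all numbers in the example are hypothetical, not measurements of any real card.

```python
import math

def break_even_hours(price_cheap: float, watts_cheap: float,
                     price_efficient: float, watts_efficient: float,
                     usd_per_kwh: float) -> float:
    """Hours of crunching after which the pricier, lower-power card becomes cheaper overall."""
    extra_upfront = price_efficient - price_cheap        # extra purchase cost of the efficient card
    saved_kw = (watts_cheap - watts_efficient) / 1000.0  # power saved while crunching, in kW
    if extra_upfront <= 0:
        return 0.0                                       # efficient card is also cheaper to buy
    if saved_kw <= 0:
        return math.inf                                  # it never pays back
    return extra_upfront / (saved_kw * usd_per_kwh)

# Hypothetical example: the cheaper card costs $100 less but draws 50 W more, at $0.15/kWh.
hours = break_even_hours(600, 300, 700, 250, 0.15)
print(f"break-even after {hours:.0f} h, about {hours / (24 * 365):.1f} years of 24/7 crunching")
```

With those made-up numbers the break-even point is about 13,333 hours, roughly 1.5 years of 24/7 crunching; someone crunching two days a week would take far longer to get there.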
Chooka
Joined: 13 Dec 12
Posts: 101
Credit: 1,782,758,310
RAC: 0
Message 70234 - Posted: 10 Dec 2020, 7:22:24 UTC - in response to Message 70233.  

Um... if we are talking about crunching projects like Milkyway or Einstein, AMD cards crush NVIDIA for those work units.
You can also undervolt the cards if power is a concern. Undervolting my Radeon VIIs drops their power draw from well over 200 W to about 180 W with barely any performance loss.

If you are crunching projects like PrimeGrid, it's NVIDIA all the way.

@mikey - Interesting regarding your older cards. That's why I'm not an early adopter of new cards. I'll let someone else stump up the cash first :)

SBretz

Joined: 3 May 11
Posts: 18
Credit: 133,104,941
RAC: 0
Message 70246 - Posted: 18 Dec 2020, 2:42:58 UTC

Update: Yesterday I was able to secure a new CPU: I landed a 5900X. Tonight I was trying to get a GPU. I missed out on the 6800 XT but was able to land a 6900 XT. Yes, it is overpriced for its performance, but it should be good for any gaming for a good while, and it should be pretty decent at crunching numbers.

I am not sure how you guys "dyno" these cards for these workloads. I will look into it when I get the new system up and running. I may need your help getting numbers if I am too "stoopid" to figure it out myself.
mikey
Joined: 8 May 09
Posts: 3339
Credit: 524,010,781
RAC: 0
Message 70247 - Posted: 18 Dec 2020, 4:00:58 UTC - in response to Message 70246.  

Most people run Prime95 for the CPU. People used to use FurMark for GPUs, but it has fried some cards, so a lot of people have moved to Unigine Heaven. These programs are free to run and give good numbers, though some, such as Unigine Heaven, keep certain features for paid editions, which most people never buy. Sure, businesses will pay, but home users just don't need to test things that often, and the free versions work great for us.
mikey
Joined: 8 May 09
Posts: 3339
Credit: 524,010,781
RAC: 0
Message 70248 - Posted: 18 Dec 2020, 4:04:50 UTC - in response to Message 70234.  



Me neither; in fact my best GPUs overall are 1080 Tis. I move up through sheer volume of results rather than the speed of each individual result: I have three 1080 Tis mixed into a total of 17 PCs running over 250 CPU cores, with a gaming-capable GPU in every one of them.
SBretz

Joined: 3 May 11
Posts: 18
Credit: 133,104,941
RAC: 0
Message 70251 - Posted: 18 Dec 2020, 23:35:32 UTC - in response to Message 70248.  

17 PCs! That sounds like you are overseeing a computer lab.
mikey
Joined: 8 May 09
Posts: 3339
Credit: 524,010,781
RAC: 0
Message 70252 - Posted: 18 Dec 2020, 23:44:05 UTC - in response to Message 70251.  

No, just a bunch of PCs I've kept running over the years, in a room in the garage with its own A/C system. Some I picked up from the roadside on trash day, others I've kept running by buying parts for them, and others I bought on eBay and then upgraded to get the most out of them.
koschi

Joined: 10 Aug 08
Posts: 5
Credit: 20,660,583
RAC: 41,478
Message 70257 - Posted: 21 Dec 2020, 22:59:32 UTC

For Linux users who want to crunch Milkyway, "Big Navi" is problematic.
The 20.45 driver, the first to officially support the Radeon 6800/6900 (XT), no longer uses the PAL OpenCL implementation for Vega and newer cards.
20.45 switched to ROCm as the OpenCL backend for Vega 10/20, Navi and Big Navi.

My Vega 56 crunched Milkyway on 20.40 just fine, but on 20.45 it fails with OpenCL errors:
Error creating command queue (-6): CL_OUT_OF_HOST_MEMORY
Error getting device and context (-6): CL_OUT_OF_HOST_MEMORY


I tried Milkyway on a 6900 XT: same issue. When I tested the free Mesa/Clover and ROCm OpenCL implementations last year, the same error occurred on this project.

So with Big Navi becoming more widespread over the coming years, and people updating the GPU drivers on their Vegas and Radeon VIIs, more people will bump into this issue.
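The number in parentheses in those log lines is the raw OpenCL status code. The sketch below decodes such lines using a subset of the codes defined in the Khronos OpenCL header `cl.h`; the regular expression assumes the exact `Error ... (-N): NAME` layout shown above, and the helper name is hypothetical. It only translates the code; it does not explain why the ROCm backend returns it.

```python
import re

# A subset of status codes defined in the Khronos OpenCL header cl.h.
OPENCL_STATUS = {
    0: "CL_SUCCESS",
    -1: "CL_DEVICE_NOT_FOUND",
    -2: "CL_DEVICE_NOT_AVAILABLE",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -30: "CL_INVALID_VALUE",
    -33: "CL_INVALID_DEVICE",
    -34: "CL_INVALID_CONTEXT",
    -36: "CL_INVALID_COMMAND_QUEUE",
}

# Matches lines such as "Error creating command queue (-6): CL_OUT_OF_HOST_MEMORY".
LINE_RE = re.compile(r"^Error (?P<action>.+?) \((?P<code>-?\d+)\): (?P<name>\w+)$")

def decode_error_lines(log: str) -> list[tuple[str, int, str]]:
    """Return (action, code, name) for each recognised error line in a task log."""
    results = []
    for line in log.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            code = int(match.group("code"))
            # Fall back to the name printed in the log if the code is not in our table.
            name = OPENCL_STATUS.get(code, match.group("name"))
            results.append((match.group("action"), code, name))
    return results

log = """Error creating command queue (-6): CL_OUT_OF_HOST_MEMORY
Error getting device and context (-6): CL_OUT_OF_HOST_MEMORY"""
for action, code, name in decode_error_lines(log):
    print(f"{action}: {code} = {name}")
```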
mikey
Joined: 8 May 09
Posts: 3339
Credit: 524,010,781
RAC: 0
Message 70258 - Posted: 22 Dec 2020, 0:13:47 UTC - in response to Message 70257.  

Try this thread on Einstein@Home; the latest posts discuss ROCm and how to make it work.
https://einsteinathome.org/content/all-things-navi-10
koschi

Joined: 10 Aug 08
Posts: 5
Credit: 20,660,583
RAC: 41,478
Message 70259 - Posted: 22 Dec 2020, 7:35:56 UTC - in response to Message 70258.  

Well, driver 20.45 runs Einstein FGRP1G WUs within the usual run times, but fails entirely on Milkyway.
I have no problems with DKMS or with leftovers from, e.g., 20.40.
mikey
Joined: 8 May 09
Posts: 3339
Credit: 524,010,781
RAC: 0
Message 70260 - Posted: 22 Dec 2020, 11:27:00 UTC - in response to Message 70259.  

I guess MilkyWay is slow on the uptake, then, in making things work with the new cards. That's sad, as it means people may not come back.
Keith Myers
Joined: 24 Jan 11
Posts: 715
Credit: 556,843,899
RAC: 44,759
Message 70261 - Posted: 22 Dec 2020, 18:31:00 UTC - in response to Message 70260.  

MilkyWay isn't the only project slow to support new hardware.

GPUGrid can't handle the latest Nvidia cards either.
mikey
Joined: 8 May 09
Posts: 3339
Credit: 524,010,781
RAC: 0
Message 70262 - Posted: 22 Dec 2020, 23:54:21 UTC - in response to Message 70261.  

Now that surprises me.
SBretz

Joined: 3 May 11
Posts: 18
Credit: 133,104,941
RAC: 0
Message 70263 - Posted: 23 Dec 2020, 2:17:46 UTC

The 6900xt came in today.
I currently have it set up with 4 instances running with 0.25 CPU cores per card. It looks like it is running through the 1.46 (opencl_ati_101) packets in around 2:10 to 2:30.

This is the first time I have run multiple instances on one GPU... do I have too much CPU allocated to them? The base setup was 0.988 CPUs/1 GPU, but I see the modified text in the config file has 0.05 CPU/0.5 GPUs.
Should I set the CPU value a factor of 10 lower than I have now?

PS- This is awesome seeing this new rig chew through so much in so little time.
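For reference, these CPU/GPU fractions are set in BOINC's `app_config.xml` in the project's data directory. The sketch below generates one for N tasks per GPU: `gpu_usage` is 1/N, and the total CPU budgeted per GPU is N × `cpu_usage` (4 × 0.05 = 0.2 CPU here). The app name `milkyway` and the helper name are assumptions; check the app names shown for your host on the project site, then have the BOINC Manager re-read its config files (or restart the client) for the change to take effect.

```python
import xml.etree.ElementTree as ET

def build_app_config(tasks_per_gpu: int, cpu_per_task: float, app_name: str = "milkyway") -> str:
    """Build an app_config.xml string that runs `tasks_per_gpu` tasks on each GPU.

    app_name is an assumption; it must match the project's app name exactly.
    """
    if tasks_per_gpu < 1:
        raise ValueError("tasks_per_gpu must be at least 1")
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu = ET.SubElement(app, "gpu_versions")
    # Each task claims 1/N of a GPU, so the client runs N of them at once.
    ET.SubElement(gpu, "gpu_usage").text = f"{1 / tasks_per_gpu:g}"
    # CPU the scheduler budgets per GPU task (bookkeeping, not a hard limit on the app).
    ET.SubElement(gpu, "cpu_usage").text = f"{cpu_per_task:g}"
    ET.indent(root)  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")

print(build_app_config(tasks_per_gpu=4, cpu_per_task=0.05))
print("total CPU budgeted per GPU:", round(4 * 0.05, 3))
```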
SBretz

Joined: 3 May 11
Posts: 18
Credit: 133,104,941
RAC: 0
Message 70264 - Posted: 23 Dec 2020, 2:18:42 UTC
Last modified: 23 Dec 2020, 2:30:52 UTC

Oh, and no computation errors. I am running the latest update of Windows 10 Pro for now.
I will be downgrading to the Home version when my NVMe drive shows up.


PS: sorry, I didn't see the "Edit" button until after I made the second post.
SBretz

Joined: 3 May 11
Posts: 18
Credit: 133,104,941
RAC: 0
Message 70265 - Posted: 23 Dec 2020, 2:56:24 UTC
Last modified: 23 Dec 2020, 3:05:19 UTC

Interesting...
When running one instance, I noticed the GPU had some idle time, so I set the ratio to 4 CPUs/1 GPU, and usage stayed around 99% with some drop-offs.
Running 12 CPUs made no difference, so I dropped to 2 CPUs. At 4 CPUs it knocks out a data packet in about 36 s; at 2 CPUs that slows to 48 s. With one instance running there are still occasional dips in the GPU usage graph.
I will play with this some more. I think 2 instances will keep GPU usage near 99% all the time; I will just need to dial in the number of CPU cores needed to feed the GPU consistently so that it doesn't drop off. This "should" keep the GPU pegged. If not, I will go back up to 4 GPU instances and turn down the CPU ratio until I see a drop in GPU usage, then go back to the last setting that held 99% usage.

With 1 CPU per GPU, GPU usage spikes to around 45-60% and then falls off to 0-6%. I suspect the GPU is waiting for the CPU to feed it data.
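The tuning procedure described above (lower the CPU allocation until GPU usage drops, then return to the last setting that held near 99%) can be sketched as a simple search over recorded measurements. The readings below are made-up placeholders, not data from this thread, and the 95% threshold and function name are arbitrary choices for illustration.

```python
def lowest_sufficient_cpu(measurements: dict[float, float], threshold: float = 0.95) -> float:
    """Walk down from the highest CPU allocation and return the last one whose
    average GPU utilisation stayed at or above `threshold`.

    measurements maps CPUs-per-GPU-task -> observed average GPU utilisation (0..1).
    """
    last_good = None
    for cpus in sorted(measurements, reverse=True):
        if measurements[cpus] >= threshold:
            last_good = cpus          # still keeping the GPU busy; try a lower setting
        else:
            break                     # usage dropped: stop and keep the previous setting
    if last_good is None:
        raise ValueError("no tested setting reached the threshold")
    return last_good

# Hypothetical readings for a single task per GPU.
readings = {4.0: 0.99, 2.0: 0.97, 1.0: 0.55, 0.5: 0.40}
print(lowest_sufficient_cpu(readings))
```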
SBretz

Joined: 3 May 11
Posts: 18
Credit: 133,104,941
RAC: 0
Message 70266 - Posted: 23 Dec 2020, 3:47:41 UTC

It appears that 4 instances is the sweet spot. I am running only 0.05 CPUs per WU and the GPU is staying near 100%. It occasionally dips to 60-ish% and then goes right back up to 99% usage. Times per WU range from 2:08 to 2:36, the lowest and highest I have seen so far.

I had one computational error, but it was on a CPU only task.
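Per-task times are not directly comparable across different instance counts; what matters is throughput. The sketch below converts the times reported in this thread into tasks per hour (instances × 3600 / seconds per task). It assumes the quoted times are steady-state wall-clock times per task, and it uses the two ends of the reported 2:08-2:36 range for the 4-instance case; the function name is hypothetical.

```python
def tasks_per_hour(instances: int, seconds_per_task: float) -> float:
    """Steady-state throughput when `instances` tasks run concurrently on one GPU."""
    return instances * 3600.0 / seconds_per_task

configs = {
    "1 task, 4 CPUs (36 s)": (1, 36),
    "1 task, 2 CPUs (48 s)": (1, 48),
    "4 tasks, best case (2:08)": (4, 2 * 60 + 8),
    "4 tasks, worst case (2:36)": (4, 2 * 60 + 36),
}
for label, (n, secs) in configs.items():
    print(f"{label}: {tasks_per_hour(n, secs):.1f} tasks/h")
```

With those inputs the results are 100, 75, 112.5 and about 92.3 tasks per hour, so four concurrent tasks land in roughly the same range as one well-fed task; more samples would be needed to say which setup is faster.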
mikey
Joined: 8 May 09
Posts: 3339
Credit: 524,010,781
RAC: 0
Message 70267 - Posted: 23 Dec 2020, 11:13:48 UTC - in response to Message 70266.  

That was some VERY good testing to find that; most people just set a number and then expect the app to use what they set. The dip could just be a couple of units finishing at the same time, so the GPU is slightly underused as it switches to the next task.

The only GPU I have that's even close to your numbers is my old 7970, and it's running them one at a time in just under 60 seconds each. Due to its age I'm not willing to push it beyond one unit at a time, though. I have one GPU taking over 2,800 seconds per task, but at least it's crunching, and I did not expect that when I put it in the machine!
SBretz

Joined: 3 May 11
Posts: 18
Credit: 133,104,941
RAC: 0
Message 70268 - Posted: 23 Dec 2020, 14:27:13 UTC - in response to Message 70267.  


I thought that at first too. I was checking the BOINC Manager whenever I saw a dip, expecting to see a new work unit starting, but the units would be in the middle of a run.

IDK what is causing it. More CPUs per GPU helped a little bit, but what did the most to keep GPU usage up was throwing more instances at it.
I was surprised that I could use such a low CPU/GPU ratio.

With 4 instances running, it needs only 0.2 CPUs divided among all four workloads. That pretty much leaves another CPU thread free.
mikey
Joined: 8 May 09
Posts: 3339
Credit: 524,010,781
RAC: 0
Message 70269 - Posted: 24 Dec 2020, 11:05:24 UTC - in response to Message 70268.  


You do know the app doesn't actually respect that setting... right? The app is designed to use X amount of CPU, and it will use that no matter what settings we try. BUT what it does do, as you said, is this: instead of reserving a whole CPU core for each GPU like most people do, you have found that you can now run some NCI projects with no problem at all. NCI projects are 'non-CPU-intensive' projects that run in the background; Ithena, WUProp and the now-retired GoofyxGrid are examples. I forget what Goofy was tracking, but Ithena tracks internet speeds around the world, while WUProp tracks which projects and apps we are crunching, what hardware we are using and how long the different apps take to run. It also tracks a BUNCH of other metrics, like the RAM each app needs, etc.
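To illustrate the point that `cpu_usage` is scheduler bookkeeping rather than a limit on the app, here is a simplified model of how a BOINC client might decide how many single-threaded CPU tasks to start alongside GPU tasks. It assumes the client keeps adding CPU tasks while the summed CPU budget of running tasks is below the number of CPU threads; the real client applies additional rules, so treat this as an approximation, not BOINC's actual code. The function name is hypothetical.

```python
def cpu_tasks_started(cpu_threads: int, gpu_task_cpu_usage: list[float]) -> int:
    """Simplified model: count single-threaded CPU tasks started after the GPU tasks.

    Assumption: CPU tasks are added one at a time while the total budgeted CPU
    (the GPU tasks' cpu_usage plus 1.0 per CPU task) is still below cpu_threads.
    """
    budgeted = sum(gpu_task_cpu_usage)
    started = 0
    while budgeted < cpu_threads:
        budgeted += 1.0
        started += 1
    return started

threads = 24  # e.g. a 12-core/24-thread CPU such as a 5900X
print(cpu_tasks_started(threads, [0.05] * 4))   # four GPU tasks at 0.05 CPU each
print(cpu_tasks_started(threads, [0.25] * 4))   # four GPU tasks at 0.25 CPU each
```

Under this model, going from 0.25 to 0.05 CPU per GPU task (with four tasks) frees one thread for a CPU or NCI task, consistent with the observation above; the GPU app itself still uses whatever CPU time it actually needs.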

©2024 Astroinformatics Group