Welcome to MilkyWay@home

Help gpu mystery

Message boards : Number crunching : Help gpu mystery
zoom314
Joined: 29 Jul 08
Posts: 267
Credit: 188,848,188
RAC: 0
Message 69665 - Posted: 8 Apr 2020, 1:51:02 UTC

Ok, I have 2 gpu cards, a pair of Nvidia 970 cards. Before 12 noon I was doing 2 tasks at a time, 1 per gpu, yet now only gpu 0 is doing any work and gpu 1 is doing nothing. The cpu went into high priority mode, of course, after the internet was offline for a few hours; it's a 1650 v2 cpu. The temp of gpu 1 at idle is 32C; gpu 0 is 57C while running.

I crunched numbers for Seti@Home for years and didn't have this problem. The cc config is set to use all gpus since one gpu is slower than the other.
Both gpus are seen by Boinc 7.14.2.

Keith Myers
Joined: 24 Jan 11
Posts: 709
Credit: 549,584,151
RAC: 56,221
Message 69666 - Posted: 8 Apr 2020, 2:39:14 UTC - in response to Message 69665.  

The use_all_gpus setting should be unnecessary, and ignored, since the cards are the same type. I ran 3 1070Ti cards all the time at Seti with no use_all_gpus in the cc_config just fine.
It sounds like the high priority mode is what is getting in the way. Once that happens, BOINC will preempt everything else to clear the debt to that task or set of tasks.
The other possibility is that there is not enough cpu resource to support the task on the other gpu, since it is being preempted by the high priority debt.
Once that clears, I suspect everything will return to normal.
zoom314
Joined: 29 Jul 08
Posts: 267
Credit: 188,848,188
RAC: 0
Message 69667 - Posted: 8 Apr 2020, 14:44:38 UTC - in response to Message 69666.  

So the 300MHz core clock and the 405MHz memory clock on the lower 2 cards, and those two cards sharing a pcie power cable, probably don't mean much beyond raising my anxiety?

Ok.

On the debt, I set tasks to No New Work until the cpu debt is resolved. With the gpu being as fast as it is @ 1995-2010MHz (only the top card so far), the gpus do not have any debt yet, though there are 798 wu's waiting for the gpus and 100 for the hex core cpu. I think I might set Boinc to use more cpus than the 33% it is set for now, maybe 45-50%, to try to reduce the cpu debt quicker.

One thing I wish I could do is set the cpu to no new work and let the gpu do what it wants, but I don't think Boinc is going to allow that. At least not yet.

I do know the "new" top card seems to be doing the numbers at about the same rate as the one it replaced, and with less heat. The new card is cooler in more ways than one.

I do notice in gpu-z v2.30.0 that the bottom card says 1500MHz for a couple of seconds, then drops to 300MHz, the same as the middle card is running at. I could power the PC down later when there is more light and plug two more pcie cables in; the 4th cable would be just in case I want to try to tackle a 4th card again.

I mean, 42-44C vs 54-60C, it's no contest. These are 12nm gpus. My cpu is running at 41-44C w/a 200MHz overclock. I can imagine 7nm ought to be really cool.

I'm curious would 4 of these cards in the same case hurt anything thermally?

With 4 cards, that's about 1/8 inch of separation between the cards, instead of a slot's worth of space with 3 cards, as seen in the pic below.

This case bows out in the back, getting the case and the brackets to line up for a screw is hard.

Here are the 3 cards in the case; click on the pic for a larger 1920x1440 version.
That's a Seasonic 1050w SnowSilent Platinum, a good psu. Not a lick of problems before this debt reared its head, which I suspect was caused by the internet outage Spectrum had on Tuesday.


Keith Myers
Joined: 24 Jan 11
Posts: 709
Credit: 549,584,151
RAC: 56,221
Message 69668 - Posted: 8 Apr 2020, 15:48:57 UTC

Well, what you posted and what your picture shows are very different. You stated you had a pair of 970s, then said one was slower than the other. That made no sense, because both are 970s.

But your picture shows 3 cards of indeterminate type. If the third card is different, slower or faster than the 970 pair, you will have to use the use_all_gpus parameter in cc_config.
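For reference, here is a minimal sketch of where that option lives. The `<use_all_gpus>` element inside `<options>` is the setting cc_config.xml uses for this; the Python wrapper and the `build_cc_config` name are only an illustration. The output would still need to be saved as cc_config.xml in the BOINC data directory and re-read by the client.

```python
import xml.etree.ElementTree as ET

def build_cc_config(use_all_gpus):
    """Return cc_config.xml text with only the use_all_gpus option set."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    # 1 = use every detected gpu, 0 = stock behaviour (only the most capable ones)
    ET.SubElement(options, "use_all_gpus").text = "1" if use_all_gpus else "0"
    if hasattr(ET, "indent"):  # pretty-printing is only available on Python 3.9+
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")

print(build_cc_config(True))
```

With identical cards the stock value should be enough, which matches what was said earlier in the thread.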

You should never share PCIe cables between cards. A card needs either 1 or 2 cables directly from the power supply to satisfy the power requirements of the card.

The card you say is not showing any load is probably power limiting itself because of insufficient input PCIe power.

The first thing you should always check is whether all cards get detected by BOINC at startup in the log. If not, investigate why. Drivers not being loaded is usually the issue.
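That startup check can be scripted. This is a rough sketch, assuming the log has been saved as text and that each detection line contains `NVIDIA GPU <n>: <name> (`. The sample lines are invented to show the assumed shape, not copied from any host, and `detected_gpus` is a hypothetical helper name.

```python
import re

# Assumed shape of a startup detection line (check it against your own log):
#   ... NVIDIA GPU 0: GeForce GTX 1660 Ti (driver version ...)
GPU_LINE = re.compile(r"NVIDIA GPU (\d+): (.+?) \(")

def detected_gpus(log_text):
    """Map gpu index -> model name for every detection line found."""
    found = {}
    for line in log_text.splitlines():
        match = GPU_LINE.search(line)
        if match:
            found[int(match.group(1))] = match.group(2)
    return found

# Invented sample text, for illustration only.
sample_log = """\
[---] CUDA: NVIDIA GPU 0: GeForce GTX 1660 Ti (driver version ...)
[---] CUDA: NVIDIA GPU 1: GeForce GTX 1660 Ti (driver version ...)
[---] OpenCL: NVIDIA GPU 0: GeForce GTX 1660 Ti (driver version ...)
"""

expected_cards = 3  # how many cards are physically installed
gpus = detected_gpus(sample_log)
if len(gpus) < expected_cards:
    print(f"only {len(gpus)} of {expected_cards} cards detected: {gpus}")
```

If the count comes up short, that points at drivers or hardware rather than at BOINC scheduling.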
zoom314
Joined: 29 Jul 08
Posts: 267
Credit: 188,848,188
RAC: 0
Message 69669 - Posted: 8 Apr 2020, 16:04:08 UTC - in response to Message 69668.  

The 970s, since removed, were an Asus Turbo and a Palit Blower card; the Turbo is somewhat faster.

zoom314
Joined: 29 Jul 08
Posts: 267
Credit: 188,848,188
RAC: 0
Message 69670 - Posted: 8 Apr 2020, 16:09:16 UTC - in response to Message 69668.  
Last modified: 8 Apr 2020, 16:12:40 UTC

I didn't have the energy to take a pic of the two 970s, please forgive me.

The 3 pictured are what is in the case now.

Actually, with the 970s, the lower (2nd) card was not showing any load after high priority showed up.

The two lower 1660Ti cards are not doing anything either.

I'll correct the cable situation asap.

All 3 cards are detected by Boinc and by Windows.

Keith Myers
Joined: 24 Jan 11
Posts: 709
Credit: 549,584,151
RAC: 56,221
Message 69672 - Posted: 9 Apr 2020, 1:07:56 UTC

So what do you actually have installed? Is it two 1660Tis and a 970? BOINC should detect and report the 1660Tis as GPU0 and GPU1 and the 970 as GPU2, in order of capability.
Again, you can't really say whether anything is working incorrectly while a cpu High Priority event is occurring.

This is what is needed to run the Milkyway Separation gpu tasks as the tasks are sent by the project. You can see each task needs almost an entire cpu thread to support the task.

Resources 0.976 CPUs + 1 NVIDIA GPU

I don't know what other projects you are running on your host, but the cpu resource needed for gpu tasks is very similar for most projects on Nvidia cards. Somewhere just under a full cpu thread.

So with your cpu cores occupied by the High Priority Event, there would not have been sufficient cpu resources to run any gpu tasks.
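To put rough numbers on that argument, here is a back-of-the-envelope sketch. It is not BOINC's actual scheduler: it only compares the cpu threads the client may use against the running cpu tasks plus the cpu share each gpu task wants. The 0.976 figure is from the Resources line above; the thread counts are assumed for illustration, and `gpu_tasks_that_fit` is a made-up name.

```python
import math

def gpu_tasks_that_fit(allowed_threads, busy_cpu_tasks, cpu_per_gpu_task=0.976):
    """How many gpu tasks can still get their cpu share (simple budget, not BOINC's real logic)."""
    free_threads = allowed_threads - busy_cpu_tasks
    if free_threads <= 0:
        return 0
    return math.floor(free_threads / cpu_per_gpu_task)

# Assumed example: the client is allowed 6 threads.
print(gpu_tasks_that_fit(6, 6))  # every allowed thread is taken by high priority cpu tasks
print(gpu_tasks_that_fit(6, 3))  # three threads are left over for gpu support
```

Under this simple budget, gpu tasks only start once enough cpu tasks finish, or once the allowed cpu percentage goes up.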
zoom314
Joined: 29 Jul 08
Posts: 267
Credit: 188,848,188
RAC: 0
Message 69673 - Posted: 9 Apr 2020, 1:50:44 UTC - in response to Message 69672.  
Last modified: 9 Apr 2020, 1:56:07 UTC

It's just the 3 Gigabyte GTX 1660Ti cards and each card now has its own pcie power cable.

And I changed the cc config "use all gpus" setting back to stock.

The two 970 cards are off on a cart and not installed in any PC at present.

I'm going to pull the 3rd 1660Ti card on Thursday and insert a 1660Ti card that needs minor testing to make sure Windows and Boinc see the card.

Just changing the cables was all I could do today.

For now just Milkyway is being run on this host.

I do have a hex core Intel Xeon 1650 v2 @ 3.7GHz cpu, and I'm down to 74 cpu wu's with Milkyway, which is set to no new tasks to get rid of the debt. I never had debt problems during my 17 years @ Seti.
I'm at 50% of cpus. I could push the Xeon to 7 or 8 cpus (58% or 66%) if needed, instead of just 6.
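A quick way to check those percentages, as a sketch under two assumptions that are not confirmed in this thread: that the Xeon presents 12 logical cpus to Boinc (six cores with Hyper-Threading on), and that the client rounds the "use at most N% of the cpus" result down to a whole number. If both hold, 58% and 66% would land on 6 and 7 cpus rather than 7 and 8. The function names are made up for this example.

```python
def usable_cpus(logical_cpus, percent):
    """Cpu count for a whole-number percent setting, assuming the result is truncated (minimum 1)."""
    return max(1, (logical_cpus * percent) // 100)

def min_percent_for(logical_cpus, wanted):
    """Smallest whole percent that yields at least `wanted` cpus under the same assumption."""
    return -(-100 * wanted // logical_cpus)  # ceiling division

LOGICAL_CPUS = 12  # assumed: 6 cores x 2 threads

for pct in (33, 50, 58, 66):
    print(pct, "% ->", usable_cpus(LOGICAL_CPUS, pct), "cpus")
for wanted in (7, 8):
    print(wanted, "cpus needs at least", min_percent_for(LOGICAL_CPUS, wanted), "%")
```

If the client rounds differently, only the two helper functions need to change.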

The deadlines are 4-18-2020, 4-19-2020, and 4-20-2020.

Boinc here uses 0.966 cpus on the 1660Ti cards. My cable company had an outage when I had the Asus 970 Turbo and the Palit 970 Blower installed, and after the outage this debt happened; the two 970 cards were doing 0.953.

The Asus 970 Turbo is a White and Red card.
The Palit 970 Blower is an all Black card.

zoom314
Joined: 29 Jul 08
Posts: 267
Credit: 188,848,188
RAC: 0
Message 69674 - Posted: 9 Apr 2020, 2:09:38 UTC

2 of my 3 Gigabyte 1660Ti cards are now crunching at the same time and both are doing 0.966.

zoom314
Joined: 29 Jul 08
Posts: 267
Credit: 188,848,188
RAC: 0
Message 69675 - Posted: 9 Apr 2020, 2:20:50 UTC

Now all 3 Gigabyte GTX 1660Ti Gaming OC video cards are crunching Milkyway wu's and each is doing 0.966.

I could install a 4th card, but the spacing between cards would be about 1/8 of an inch which is pretty narrow.



©2024 Astroinformatics Group