Message boards : Number crunching : Help gpu mystery
Joined: 29 Jul 08 | Posts: 267 | Credit: 188,848,188 | RAC: 0
OK, I have two GPU cards, a pair of Nvidia 970s. Before 12 noon I was doing two tasks at a time, one per GPU, yet now only GPU 0 is doing any work and GPU 1 is doing nothing. The CPU (a Xeon 1650 v2) is in high-priority mode, of course, after the internet was offline for a few hours. GPU 1 is idling at 32C, while GPU 0 is at 57C under load. I crunched numbers for Seti@Home for years and never had this problem. cc_config is set to use all GPUs since one GPU is slower than the other, and both GPUs are seen by BOINC 7.14.2.
Joined: 24 Jan 11 | Posts: 715 | Credit: 555,880,145 | RAC: 42,608
The use_all_gpus flag should be unnecessary since the cards are the same type; I ran three 1070 Tis at Seti all the time with no use_all_gpus in cc_config and they worked just fine. It sounds like the high-priority mode is what is getting in the way. Once that happens, BOINC will preempt everything else to clear the debt to that task or set of tasks. The other possibility is that there is not enough CPU resource left to support the task on the other GPU, since the CPU threads are being taken by the high-priority work. Once that clears, I suspect everything will return to normal.
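For reference, a minimal sketch of what that option looks like in cc_config.xml (the standard BOINC client configuration file in the BOINC data directory; the client has to re-read config files or be restarted for it to take effect):

    <cc_config>
      <options>
        <!-- tell the client to use every detected GPU, not just the most capable one -->
        <use_all_gpus>1</use_all_gpus>
      </options>
    </cc_config>

With identical cards the client already runs work on all of them, so this only matters when the GPUs differ in model or capability.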
Joined: 29 Jul 08 | Posts: 267 | Credit: 188,848,188 | RAC: 0
So the 300MHz core clock and 405MHz memory clock on the lower two cards, and the fact that those two cards are sharing a PCIe power cable, probably doesn't mean much beyond raising my anxiety? OK. On the debt, I set tasks to No New Work until the CPU debt is resolved. With the GPU as fast as it is (1995-2010MHz, only on the top card so far), the GPUs do not have any debt yet, though there are 798 WUs waiting for the GPUs and 100 for the hex-core CPU.

I think I might set BOINC to use more CPUs than the 33% it is set to now, maybe 45-50%, to try and reduce the CPU debt quicker. One thing I wish I could do is set the CPU to No New Work and let the GPUs do what they want, but I don't think BOINC is going to allow that. At least not yet.

I do know the "new" top card seems to be doing the numbers at about the same rate as the one it replaced, and for less heat; the new card is cooler in more than one way. I do notice in GPU-Z v2.30.0 that the bottom card shows 1500MHz for a couple of seconds, then drops to 300MHz, the same clock the middle card is running at. I could power the PC down later, when there is more light, and plug two more PCIe cables in; the fourth cable would be just in case I want to try and tackle a fourth card again. I mean 42-44C vs 54-60C, it's no contest. These are 12nm GPUs, and my CPU is running at 41-44C with a 200MHz overclock; I can imagine 7nm ought to be really cool.

I'm curious: would four of these cards in the same case hurt anything thermally? With four cards there would be about 1/8 inch of separation between the cards, instead of a slot's worth of space with three cards as seen in the pic below. This case bows out in the back, so getting the case and the brackets to line up for a screw is hard.

Here are the three cards in the case (click on the pic for a larger 1920x1440 version). That's a Seasonic 1050W SnowSilent Platinum, a good PSU; not a lick of problems before this debt showed its head, which I suspect was caused by the internet outage Spectrum had on Tuesday.
Joined: 24 Jan 11 | Posts: 715 | Credit: 555,880,145 | RAC: 42,608
Well, what you posted and what your picture shows are very different. You stated you had a pair of 970's, then said one was slower than the other; that made no sense because both are 970's. But your picture shows three cards of indeterminate type. If the third card is different, slower or faster than the 970 pair, you will have to use the use_all_gpus parameter in cc_config.

You should never share PCIe cables between cards. A card needs either one or two cables directly from the power supply to satisfy its power requirements. The card you say is not showing any load is probably power-limiting itself because of insufficient PCIe power input.

The first thing you should always check is: do all cards get detected by BOINC at startup in the event log? If not, investigate why. Drivers not being loaded is usually the issue.
Joined: 29 Jul 08 | Posts: 267 | Credit: 188,848,188 | RAC: 0
The 970's, since removed, were an Asus Turbo and a Palit Blower card; the Turbo is somewhat faster.
Joined: 29 Jul 08 | Posts: 267 | Credit: 188,848,188 | RAC: 0
"Well, what you posted and what your picture shows are very different. You stated you had a pair of 970's, then said one was slower than the other. That made no sense because both are 970's."

I didn't have the energy to take a pic of the two 970's, please forgive. The three pictured are what is in the case now. Actually, with the 970's it was the lower (second) card that stopped showing any load after high priority showed up. The two lower 1660 Ti's are not doing anything either. I'll correct the cable situation ASAP. All three cards are detected by BOINC and by Windows.
Joined: 24 Jan 11 | Posts: 715 | Credit: 555,880,145 | RAC: 42,608
So what do you actually have installed? Is it two 1660 Ti's and a 970? BOINC should detect and report the 1660 Ti's as GPU 0 and GPU 1 and the 970 as GPU 2, in order of capability. Again, you can't really say whether anything is not working correctly while you have a CPU high-priority event occurring.

This is what is needed to run the Milkyway Separation GPU tasks as they are sent by the project; you can see each task needs almost an entire CPU thread to support it:

Resources: 0.976 CPUs + 1 NVIDIA GPU

I don't know what other projects you are running on your host, but the CPU resource needed for GPU tasks is very similar for most projects on Nvidia cards, somewhere just under a full CPU thread. So with your CPU cores occupied by the high-priority event, there would not have been sufficient CPU resources to run any GPU tasks.
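If that CPU reservation ever needs to be loosened so GPU tasks keep running while the CPU is busy, BOINC's standard app_config.xml mechanism can override it per project. A minimal sketch, assuming the Separation application's internal name is "milkyway" (verify the real name against the project's application list before using this):

    <app_config>
      <app>
        <!-- application name is an assumption; check the project's app list -->
        <name>milkyway</name>
        <gpu_versions>
          <!-- run one task per GPU, but reserve only half a CPU thread per task -->
          <gpu_usage>1.0</gpu_usage>
          <cpu_usage>0.5</cpu_usage>
        </gpu_versions>
      </app>
    </app_config>

The file goes in that project's folder under the BOINC data directory, and the client has to re-read config files (or restart) to pick it up. Note that cpu_usage only changes what the scheduler reserves, not how much CPU the task actually consumes.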
Joined: 29 Jul 08 | Posts: 267 | Credit: 188,848,188 | RAC: 0
It's just the three Gigabyte GTX 1660 Ti cards, and each card now has its own PCIe power cable. I also changed the cc_config "use_all_gpus" setting back to stock. The two 970 cards are off on a cart and not installed in any PC at present. I'm going to pull the third 1660 Ti card on Thursday and insert another 1660 Ti that needs minor testing, to make sure Windows and BOINC see the card. Just changing the cables was all I could do today.

For now just Milkyway is being run on this host. I have a hex-core Intel Xeon 1650 v2 @ 3.7GHz, and I'm down to 74 CPU WUs with Milkyway, which is set to No New Tasks to get rid of the debt. I never had debt problems during my 17 years at Seti. I'm at 50% of CPUs; I could push the Xeon to 7 or 8 CPUs (58% or 66%) if needed, instead of just 6. The deadlines are 4-18-2020, 4-19-2020, and 4-20-2020.

BOINC here uses 0.966 CPUs on the 1660 Ti cards. My cable company had an outage while I had the Asus 970 Turbo and the Palit 970 Blower installed, and after the outage this debt happened; the two 970 cards were using 0.953. The Asus 970 Turbo is a white and red card; the Palit 970 Blower is all black.
Joined: 29 Jul 08 | Posts: 267 | Credit: 188,848,188 | RAC: 0
Two of my three Gigabyte 1660 Ti cards are now crunching at the same time, and both are using 0.966 CPUs.
Joined: 29 Jul 08 | Posts: 267 | Credit: 188,848,188 | RAC: 0
Now all three Gigabyte GTX 1660 Ti Gaming OC video cards are crunching Milkyway WUs, and each is using 0.966 CPUs. I could install a fourth card, but the spacing between cards would be about 1/8 of an inch, which is pretty narrow.