Welcome to MilkyWay@home

GPU not at 100% when all CPU cores crunch

Pavel Hanak

Joined: 30 Apr 10
Posts: 4
Credit: 38,050,684
RAC: 0
Message 59053 - Posted: 23 Jun 2013, 12:55:49 UTC
Last modified: 23 Jun 2013, 13:17:03 UTC

Hi all, I know the thread title might sound familiar, but I think I encountered a different problem/bug than discussed previously here, so please bear with me.

You see, I have this water-cooled gaming rig with a 6-core/12-thread CPU (Intel i7-970) and a Radeon HD7970 GPU. Of course, I use it to crunch for several BOINC projects, but MilkyWay@home is (currently) the only project I run on the GPU. Normally, the HD7970 can burn through one MilkyWay@home workunit in about 50 seconds, and the GPU is utilized at 100% the entire time. But that happens only if at least one CPU core/thread is idle. If all CPU cores/threads are crunching, the GPU slows down considerably: most of the time, GPU utilization fluctuates between 40 and 80%, and the MilkyWay@home workunits take about twice as long to crunch. When I disable CPU tasks, GPU utilization almost immediately jumps back to 100%. When I enable them again, it falls back into that 40 to 80% range, so there is definitely a pattern there.

I updated to the latest BOINC Manager and graphics drivers, but nothing helped. Is this a known problem?

Oh, I almost forgot: the PC runs Windows 7 Professional 64-bit and has 12 GB of RAM.
ID: 59053 · Rating: 0
Alinator

Joined: 7 Jun 08
Posts: 464
Credit: 56,639,936
RAC: 0
Message 59054 - Posted: 23 Jun 2013, 13:12:40 UTC
Last modified: 23 Jun 2013, 13:14:36 UTC

It's not a problem per se, it's just the nature of the beast.

There's a lot of memory I/O involved in running GPU apps, so if you have all the cores busy doing other tasks (including using the machine yourself to do your work), something has to give. Therefore the graphics card sometimes has to wait around for its turn to get to main memory, as your from-the-hip experiment demonstrated. ;-)
ID: 59054 · Rating: 0
mikey

Joined: 8 May 09
Posts: 3319
Credit: 520,257,388
RAC: 20,689
Message 59069 - Posted: 24 Jun 2013, 11:17:44 UTC - in response to Message 59054.  

It's not a problem per se, it's just the nature of the beast.

There's a lot of memory I/O involved in running GPU apps, so if you have all the cores busy doing other tasks (including using the machine yourself to do your work), something has to give. Therefore the graphics card sometimes has to wait around for its turn to get to main memory, as your from-the-hip experiment demonstrated. ;-)


AND it is especially noticeable the better/faster the GPU is. The better/faster the GPU is, the more it needs stuff to do, and all that stuff comes from the CPU, so if you don't leave a CPU core free the GPU will bog down A LOT! Some projects are able to fit the whole workunit into GPU memory, making this a non-problem; this project can't do that, as the workunit is just too big. DistRTgen can do that, as can Collatz and Moo, but at most projects the workunit is just too big to fit all of it into GPU memory and still have enough left over to crunch with.
ID: 59069 · Rating: 0
Pavel Hanak

Joined: 30 Apr 10
Posts: 4
Credit: 38,050,684
RAC: 0
Message 59077 - Posted: 24 Jun 2013, 18:34:07 UTC - in response to Message 59069.  
Last modified: 24 Jun 2013, 19:07:40 UTC

To tell the truth, I found your "high I/O and memory access" explanation a bit fishy. It would simply be a very unlikely coincidence that 11 CPU cores crunching were fine, but 12 cores suddenly created such a bottleneck that it slowed the GPU down to half speed. So I poked around a bit more and I think I found the true source of the problem. I think the MW@H app jumps wildly between CPU threads, which (among other undesirable things) causes those GPU slowdowns. When I use Windows Task Manager to force the "milkyway_separation__modified_fit_1.22_windows_x86_64__opencl_amd_ati" process to use only one thread (this is called "process affinity" in Windows), the GPU jumps to 100% even if all CPU cores are crunching at full blast. Forcing the MW@H app to use just one CPU thread of course eliminates that wild jumping. The bad thing is, the affinity setting lasts for only one workunit, so unless it is fixed in the app itself, this solution is useless.

I found no bug-report thread here, so if some moderator sees this, please forward this information to MW@H programmers.

BTW, this is not the first time I have encountered a problem like this, though AFAIR it would be the first with BOINC apps. I still vividly remember how many programs crashed or ran extremely slowly on the (then bleeding-edge desktop CPU) Athlon X2. Even Windows XP itself needed a special patch to run properly. It rarely happens with modern programs, though. The last time I needed to mess with process affinity like this was when I had random crashes in Fallout 3. It ran fine on my previous 4-thread machine, but its programmers obviously never expected that 12-thread machines would come so soon...

...
...
...

Hmm, is it possible somehow to force process affinity in BOINC when it starts the apps, by any chance?
ID: 59077 · Rating: 0
Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist

Joined: 25 Feb 13
Posts: 580
Credit: 94,200,158
RAC: 0
Message 59079 - Posted: 24 Jun 2013, 18:47:30 UTC

Hey there Pavel.

We are looking into this. There is a chance it might be something weird with the BOINC scheduler, but we won't know until we get a chance to look into it some more. Sorry if it takes a little while to fix; we have a couple of other bug fixes to release this week, and then we will focus on this.

Thanks for the report,

Jake W.
ID: 59079 · Rating: 0
Pavel Hanak

Joined: 30 Apr 10
Posts: 4
Credit: 38,050,684
RAC: 0
Message 59080 - Posted: 24 Jun 2013, 19:20:42 UTC - in response to Message 59079.  
Last modified: 24 Jun 2013, 19:22:58 UTC

Oh wow, I didn't expect somebody from the MW@H team to notice so soon. That CPU thread affinity problem is no big deal; I can easily leave 1 core idle via "local computing preferences" in BOINC Manager. My current water-cooling solution struggles a bit with the almost 400-watt heat load from the GPU and CPU anyway. I primarily designed it to be as quiet as possible, not to dissipate that much heat 24/7. So until I solve that, I can't safely run the MW@H GPU app for longer than a few hours a day anyway.
ID: 59080 · Rating: 0
JLConawayII

Joined: 27 Apr 10
Posts: 35
Credit: 90,828,595
RAC: 0
Message 59092 - Posted: 25 Jun 2013, 12:00:14 UTC

I had problems with this a few years ago on Folding@home. The system SHOULD give the GPU the cycles it needs in order to run properly, but sometimes it doesn't if you have the CPU running at 100%. The simplest fix is to just leave one core free (this is normally recommended anyway). You aren't going to lose much work from that one core, and it will be an overall net gain in production because your GPU won't be starving to death.
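
As a rough back-of-the-envelope check of that trade-off, the sketch below uses only the figures reported earlier in this thread: about 50 seconds per workunit when a CPU thread is free, roughly twice that when all 12 threads are busy. The function and variable names are illustrative, and it assumes every CPU thread contributes equally, which is a simplification on a hyper-threaded CPU.

```python
def workunits_per_hour(seconds_per_workunit: float) -> float:
    """GPU workunits completed per hour at a given time per workunit."""
    return 3600.0 / seconds_per_workunit


CPU_THREADS = 12  # i7-970: 6 cores / 12 threads, as reported in the first post

gpu_fed = workunits_per_hour(50.0)       # one CPU thread left idle
gpu_starved = workunits_per_hour(100.0)  # "about twice as long" with all threads busy

cpu_share_given_up = 1 / CPU_THREADS     # fraction of CPU threads left idle
gpu_speedup = gpu_fed / gpu_starved

print(f"GPU: {gpu_starved:.0f} -> {gpu_fed:.0f} workunits/hour ({gpu_speedup:.1f}x)")
print(f"CPU threads given up: 1 of {CPU_THREADS} ({cpu_share_given_up:.1%})")
```

With these numbers, giving up about 8% of the CPU threads roughly doubles GPU throughput. Whether that is a net gain overall depends on how much each project's CPU and GPU work is worth to you.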
ID: 59092 · Rating: 0
mikey

Joined: 8 May 09
Posts: 3319
Credit: 520,257,388
RAC: 20,689
Message 59094 - Posted: 25 Jun 2013, 12:02:14 UTC - in response to Message 59077.  

Hmm, is it possible somehow to force process affinity in BOINC when it starts the apps, by any chance?


Yes, but every new unit goes back to the default settings. So YES, you can set a currently running unit to a certain CPU core, but as soon as that unit finishes, the next unit will revert to the defaults. You can do this by going into Task Manager, right-clicking on the task, and clicking the affinity option. I am NOT a Linux guy, so you will have to figure out the Linux equivalents if you use Linux. Supposedly there are some interesting tweaks coming in some future BOINC versions, but I don't know whether that is one of them. I am not a member of the BOINC mailing list or a programmer, so I don't know those kinds of things.
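
For anyone on Linux, a minimal sketch of one possible equivalent follows. It uses Python's standard `os.sched_setaffinity`, which is available on Linux only. The helper names and the "milkyway" name fragment are assumptions for illustration, not something confirmed for the Linux build of the app, and this has not been tested against the MW@H client.

```python
import os


def find_pids(name_fragment: str) -> list[int]:
    """Return PIDs whose executable name contains name_fragment (Linux /proc only)."""
    pids = []
    if not os.path.isdir("/proc"):
        return pids  # not a Linux-style /proc; nothing to scan
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/cmdline", "rb") as f:
                argv0 = f.read().split(b"\0", 1)[0].decode(errors="replace")
        except OSError:
            continue  # process exited or is not readable
        if name_fragment in os.path.basename(argv0):
            pids.append(int(entry))
    return pids


def pin_to_cpu(name_fragment: str, cpu: int) -> list[int]:
    """Pin every thread of each matching process to one CPU; return the PIDs pinned."""
    pinned = []
    for pid in find_pids(name_fragment):
        try:
            # sched_setaffinity works per thread, so walk the process's threads.
            for tid in os.listdir(f"/proc/{pid}/task"):
                os.sched_setaffinity(int(tid), {cpu})
        except OSError:
            continue  # no permission, or the process exited meanwhile
        pinned.append(pid)
    return pinned
```

Calling `pin_to_cpu("milkyway", 0)` would pin any matching process to CPU 0. Because every workunit starts a new process, it would have to be re-run periodically (for example from a loop or a cron job) to keep working.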
ID: 59094 · Rating: 0
Pavel Hanak

Joined: 30 Apr 10
Posts: 4
Credit: 38,050,684
RAC: 0
Message 59140 - Posted: 27 Jun 2013, 19:13:38 UTC

As a temporary fix, I found a little program that can set process affinity automatically:

http://bitsum.com/processlasso/

In case anyone else wants to try it, you will find "Configure default CPU affinities" in its Options menu. Just write *milkyway* (including the asterisks) in the "name match" field, select only one CPU on the right and press "Add to list". Works like a charm; my GPU now crunches at 100% even when all CPU cores are busy.
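
To illustrate what such a rule amounts to, here is a small sketch of wildcard name matching plus the bitmask form in which Windows represents CPU affinity (bit i set means logical CPU i is allowed). It is not Process Lasso's implementation: the function names are made up for illustration, and case-insensitive matching is an assumption.

```python
import fnmatch


def matches_rule(process_name: str, pattern: str) -> bool:
    """Shell-style wildcard match; case-insensitivity is an assumption here."""
    return fnmatch.fnmatchcase(process_name.lower(), pattern.lower())


def affinity_mask(cpus: list[int]) -> int:
    """Build an affinity bitmask: bit i is set when logical CPU i is allowed."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


app = "milkyway_separation__modified_fit_1.22_windows_x86_64__opencl_amd_ati"
print(matches_rule(app, "*milkyway*"))  # the rule from the post matches the app name
print(bin(affinity_mask([0])))          # a single allowed CPU gives a one-bit mask
```

The asterisks matter: without them the pattern would have to equal the whole process name, version number included, so the rule would stop matching whenever the app is updated.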
ID: 59140 · Rating: 0


©2024 Astroinformatics Group