Welcome to MilkyWay@home

How to reduce VRAM usage for NVIDIA GPU tasks?



Message boards : Number crunching : How to reduce VRAM usage for NVIDIA GPU tasks?
Cautilus
Joined: 29 Jul 14 · Posts: 9 · Credit: 973,012,010 · RAC: 1,678,362
500 million credit badge · 4 year member badge
Message 67084 - Posted: 15 Feb 2018, 7:20:56 UTC

So I have a TITAN V that I want to use on this project while it isn't doing anything important. The problem is that I can't max out its usage by running more WUs simultaneously, because I hit the 12 GB VRAM limit on the TITAN and all of the work units end in 'computation error'. I can run about 8 WUs simultaneously if I micromanage them so they don't hit 12 GB of VRAM usage, but surely there's a way to make the WUs use less VRAM somehow, right?
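Since the per-WU footprint itself isn't configurable, the usual workaround is to cap how many tasks run at once with a BOINC app_config.xml rather than micromanaging by hand. A sketch; the app name "milkyway" here is an assumption (check client_state.xml for the real name on your install):

```xml
<!-- app_config.xml, placed in the project's folder under BOINC/projects/ -->
<!-- Assumption: the GPU app is named "milkyway"; verify in client_state.xml -->
<app_config>
  <app>
    <name>milkyway</name>
    <gpu_versions>
      <!-- gpu_usage is the fraction of a GPU one task claims:
           0.125 = 8 concurrent tasks; use 0.166 for 6 if 8 overruns VRAM -->
      <gpu_usage>0.125</gpu_usage>
      <cpu_usage>0.05</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```

After saving, choose "Read config files" in the BOINC Manager (Options menu) so the client picks it up without a restart.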
ID: 67084
mmonnin
Joined: 2 Oct 16 · Posts: 133 · Credit: 130,397,138 · RAC: 139
100 million credit badge · 2 year member badge
Message 67085 - Posted: 15 Feb 2018, 11:32:47 UTC
Last modified: 15 Feb 2018, 11:35:46 UTC

Don't run 8x...

I'm running 4x with 330 MB of memory usage. Is it really that much higher on NV cards?

Is that really 70-90 seconds at 8x?
ID: 67085
Cautilus
Joined: 29 Jul 14 · Posts: 9 · Credit: 973,012,010 · RAC: 1,678,362
500 million credit badge · 4 year member badge
Message 67086 - Posted: 15 Feb 2018, 15:24:46 UTC

Yeah, trust me, I'd probably need 16 WUs running simultaneously to saturate the TITAN V's FP64. I know VRAM usage is significantly lower on 280Xs; for some reason, on the TITAN each WU uses about 1.5 GB of VRAM. I'm not sure if this is because of the new architecture or if it's just an NVIDIA thing. The WUs process in about 55-65 seconds even with 10 WUs running simultaneously, and that still peaks at only 70-75% usage, so there's headroom left.
ID: 67086
mmonnin
Joined: 2 Oct 16 · Posts: 133 · Credit: 130,397,138 · RAC: 139
100 million credit badge · 2 year member badge
Message 67089 - Posted: 15 Feb 2018, 22:54:25 UTC

That's ridiculous. They probably complete too fast to keep it busy. A longer task, like an individual piece of the bundle, would probably suit that card very well. Then it could stay busy with fewer simultaneous tasks.

I don't run MW on any of my Pascal/Maxwell cards because... well, they suck at FP64, ha. I'd guess it comes down to NVIDIA's OpenCL implementation differing from one NV card to another. Just a guess.
ID: 67089
Keith Myers
Joined: 24 Jan 11 · Posts: 219 · Credit: 108,346,956 · RAC: 16,290
100 million credit badge · 8 year member badge · extraordinary contributions badge
Message 67094 - Posted: 16 Feb 2018, 20:16:43 UTC - in response to Message 67084.  
Last modified: 16 Feb 2018, 20:18:35 UTC

Generally, most current OpenCL applications are limited to allocating 25% of the VRAM on a graphics card, so only approximately 3 GB of the 12 GB of VRAM on your Titan V is accessible to MW tasks.

If and when applications start using the OpenCL 2.0 specification, which opens up the global_work_size memory space, you would be able to fully access the 12 GB.

From the Nvidia driver release notes:


Experimental OpenCL 2.0 Features

Select features in OpenCL 2.0 are available in the driver for evaluation purposes only. The following are the features, as well as a description of known issues with these features in the driver:

Device side enqueue
• The current implementation is limited to 64-bit platforms only.
• OpenCL 2.0 allows kernels to be enqueued with a global_work_size larger than the compute capability of the NVIDIA GPU. The current implementation supports only combinations of global_work_size and local_work_size that are within the compute capability of the NVIDIA GPU. The maximum supported CUDA grid and block size of NVIDIA GPUs is available at http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#computecapabilities. For a given grid dimension, the global_work_size can be determined by CUDA grid size x CUDA block size.
• For executing kernels (whether from the host or the device), OpenCL 2.0 supports non-uniform ND-ranges where global_work_size does not need to be divisible by the local_work_size. This capability is not yet supported in the NVIDIA driver, and is therefore not supported for device-side kernel enqueues.
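A quick sketch of the arithmetic behind the two limits above, in Python. The 25% figure is the single-allocation cap typically reported by drivers as CL_DEVICE_MAX_MEM_ALLOC_SIZE, and the grid/block sizes below are made-up illustrations, not values from the MW application:

```python
# Illustrative arithmetic only; the 25% single-allocation cap is the common
# CL_DEVICE_MAX_MEM_ALLOC_SIZE default, not a value measured on this card.

def max_single_alloc(vram_bytes, fraction=0.25):
    """Largest single OpenCL buffer a driver with a 25% cap will allow."""
    return int(vram_bytes * fraction)

def global_work_size(grid_size, block_size):
    """OpenCL global_work_size per dimension = CUDA grid size x CUDA block size."""
    return tuple(g * b for g, b in zip(grid_size, block_size))

titan_v_vram = 12 * 1024**3                         # 12 GiB of VRAM
print(max_single_alloc(titan_v_vram) // 1024**3)    # -> 3 (GiB per allocation)
print(global_work_size((1024,), (256,)))            # -> (262144,)
```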

ID: 67094
Cautilus
Joined: 29 Jul 14 · Posts: 9 · Credit: 973,012,010 · RAC: 1,678,362
500 million credit badge · 4 year member badge
Message 67101 - Posted: 18 Feb 2018, 3:43:06 UTC
Last modified: 18 Feb 2018, 3:44:00 UTC

Well, maybe that's how it's set out in NVIDIA's guidelines, but Milkyway still allocates up to 12 GB of VRAM. Here's a graph from HWiNFO64 showing my VRAM usage with 8 WUs running simultaneously; it's clearly above the 3 GB threshold (the Y-axis runs from 0 to 12,500 MB of VRAM allocation).

https://i.imgur.com/665amcH.png
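For anyone wanting to check the same number without HWiNFO64, nvidia-smi can report it directly, and a few lines of Python can parse the output. A sketch; the query flags are standard nvidia-smi options, but the sample string below is hypothetical, not real output from this card:

```python
import subprocess

def parse_memory_csv(csv_text):
    """Parse `nvidia-smi --query-gpu=memory.used,memory.total
    --format=csv,noheader,nounits` output into (used_mib, total_mib)
    tuples, one per GPU."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows

def query_gpu_memory():
    """Run nvidia-smi (must be on PATH) and return per-GPU (used, total) in MiB."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True).stdout
    return parse_memory_csv(out)

# Hypothetical sample: a 12 GB card nearly full, as in the graph above
sample = "11986, 12288\n"
print(parse_memory_csv(sample))   # -> [(11986, 12288)]
```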
ID: 67101
mmonnin
Joined: 2 Oct 16 · Posts: 133 · Credit: 130,397,138 · RAC: 139
100 million credit badge · 2 year member badge
Message 67102 - Posted: 19 Feb 2018, 0:22:14 UTC

Maybe 3 GB per task, then. Otherwise, why put 4+ GB on any AMD card?
ID: 67102
mikey
Joined: 8 May 09 · Posts: 2236 · Credit: 259,748,137 · RAC: 43,216
200 million credit badge · 10 year member badge · extraordinary contributions badge
Message 67107 - Posted: 19 Feb 2018, 16:36:43 UTC - in response to Message 67102.  

Maybe 3 GB per task, then. Otherwise, why put 4+ GB on any AMD card?

Because crunching is NOT their primary market; it's gaming, and games can access all of it. Building supercomputers is a big market too, and those can access it all as well.
ID: 67107
Chooka
Joined: 13 Dec 12 · Posts: 52 · Credit: 146,521,953 · RAC: 5
100 million credit badge · 6 year member badge · extraordinary contributions badge
Message 67132 - Posted: 23 Feb 2018, 20:18:06 UTC

A Titan V as in the $3000 Volta card Cautilus???
Jeezus. I thought I had the BOINC bug bad...LOL.

ID: 67132
mmonnin
Joined: 2 Oct 16 · Posts: 133 · Credit: 130,397,138 · RAC: 139
100 million credit badge · 2 year member badge
Message 67133 - Posted: 23 Feb 2018, 22:12:22 UTC - in response to Message 67132.  

A Titan V as in the $3000 Volta card Cautilus???
Jeezus. I thought I had the BOINC bug bad...LOL.


Yes, two of them, actually. NV put a hefty price tag on their top compute card.
ID: 67133
Chooka
Joined: 13 Dec 12 · Posts: 52 · Credit: 146,521,953 · RAC: 5
100 million credit badge · 6 year member badge · extraordinary contributions badge
Message 67134 - Posted: 24 Feb 2018, 8:55:14 UTC - in response to Message 67133.  

WOW... that has a staggering DP score: 6144.0 (7449.6) GFLOPS.
Crazy! (Guess it comes with a crazy price tag too.)

Cautilus couldn't have done any better in trying to catch Gary Roberts ;)

ID: 67134


©2019 Astroinformatics Group