How to reduce VRAM usage for NVIDIA GPU tasks?

Author	Message
Cautilus Joined: 29 Jul 14 Posts: 19 Credit: 3,451,802,406 RAC: 0	Message 67084 - Posted: 15 Feb 2018, 7:20:56 UTC So I have a TITAN V that I want to use on this project while it's not doing anything important. The problem is that I can't max out its usage by running more WUs simultaneously, because that maxes out the VRAM on the TITAN and all of the work units end in 'computation error'. I can run about 8 or so WUs simultaneously if I micromanage them so they don't hit 12GB of VRAM usage, but surely there's a way to set the WUs to use less VRAM somehow, right?

mmonnin Joined: 2 Oct 16 Posts: 167 Credit: 1,006,769,984 RAC: 11,900	Message 67085 - Posted: 15 Feb 2018, 11:32:47 UTC Last modified: 15 Feb 2018, 11:35:46 UTC Don't run 8x... I am running 4x with 330MB of memory usage. Is it so much higher on NV cards? Is that really 70-90 seconds at 8x?

Cautilus Joined: 29 Jul 14 Posts: 19 Credit: 3,451,802,406 RAC: 0	Message 67086 - Posted: 15 Feb 2018, 15:24:46 UTC Yeah look, trust me, I'd probably need 16 WUs simultaneously to saturate the TITAN V's FP64. I know that on 280Xs the VRAM usage is significantly lower; for some reason, on the TITAN each WU uses about 1.5GB of VRAM. I'm not sure if this is because of the new architecture or if it's just an NVIDIA thing. The WUs process in about 55 - 65 seconds even with 10 WUs running simultaneously, and that's still peaking at only 70 - 75% usage, indicating there's still headroom left.
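
The arithmetic behind this exchange can be sketched in a few lines of Python. The 12GB capacity and the roughly 1.5GB per WU are taken from the posts above; the function names and the `reserve_gb` safety margin are assumptions made for illustration only. The per-task fraction corresponds to the `gpu_usage` value that BOINC's app_config.xml accepts for running several tasks on one GPU.

```python
import math

def max_concurrent_wus(total_vram_gb, per_wu_gb, reserve_gb=0.5):
    """Return how many WUs fit in VRAM while leaving reserve_gb unused.

    reserve_gb is a hypothetical safety margin, not a value from the thread.
    """
    usable = total_vram_gb - reserve_gb
    if usable <= 0 or per_wu_gb <= 0:
        return 0
    return math.floor(usable / per_wu_gb)

def gpu_usage_fraction(n_wus):
    """GPU share per task that lets BOINC run n_wus tasks on one GPU."""
    if n_wus < 1:
        raise ValueError("need at least one WU")
    return 1.0 / n_wus

def vram_needed_gb(n_wus, per_wu_gb):
    """Total VRAM that n_wus tasks would allocate at per_wu_gb each."""
    return n_wus * per_wu_gb

# Figures quoted in the thread: 12GB card, about 1.5GB per WU.
n = max_concurrent_wus(12.0, 1.5)
print("WUs that fit with a margin:", n)
print("gpu_usage per task:", gpu_usage_fraction(n))
print("VRAM needed for 16 WUs (GB):", vram_needed_gb(16, 1.5))
```

With no margin at all the same figures give 8 WUs, which matches the point where the tasks start failing, and 16 WUs would need about twice the memory the card has.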

mmonnin Joined: 2 Oct 16 Posts: 167 Credit: 1,006,769,984 RAC: 11,900	Message 67089 - Posted: 15 Feb 2018, 22:54:25 UTC That's ridiculous. They probably complete too fast to keep it busy. A longer task, the individual piece of the bundle, would probably suit that very well. Then it could probably stay busy with fewer tasks. I don't run MW on any of my Pascal/Maxwell cards 'cause... well, they suck at FP64, ha. I'd guess it's NV's implementation of OpenCL rather than a difference between one NV card and another. Just a guess.

Keith Myers Joined: 24 Jan 11 Posts: 709 Credit: 549,584,151 RAC: 56,221	Message 67094 - Posted: 16 Feb 2018, 20:16:43 UTC - in response to Message 67084. Last modified: 16 Feb 2018, 20:18:35 UTC Generally most current OpenCL applications are limited to 25% of VRAM on graphics cards. So you only have approximately 3GB of the 12GB of VRAM accessible on your Titan V for MW tasks to use. If and when applications start using the OpenCL 2.0 specification that opens up global_work_size memory space, then you would be able to fully access the 12GB. From the Nvidia driver release notes:

Experimental OpenCL 2.0 Features

Select features in OpenCL 2.0 are available in the driver for evaluation purposes only. The following are the features as well as a description of known issues with these features in the driver:

Device side enqueue
• The current implementation is limited to 64-bit platforms only.
• OpenCL 2.0 allows kernels to be enqueued with global_work_size larger than the compute capability of the NVIDIA GPU. The current implementation supports only combinations of global_work_size and local_work_size that are within the compute capability of the NVIDIA GPU. The maximum supported CUDA grid and block size of NVIDIA GPUs is available at http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#computecapabilities. For a given grid dimension, the global_work_size can be determined by CUDA grid size x CUDA block size.
• For executing kernels (whether from the host or the device), OpenCL 2.0 supports non-uniform ND-ranges where global_work_size does not need to be divisible by the local_work_size. This capability is not yet supported in the NVIDIA driver, and therefore not supported for device side kernel enqueues.
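
The two calculations in this post can be written out as a minimal Python sketch. The 25% figure and the 12GB capacity are the poster's numbers, and the grid and block limits below are illustrative values only; the compute capability table linked in the release notes is the place to check them for a given GPU. As a hedged side note, the quarter-of-memory figure appears to correspond to the minimum that the OpenCL 1.x specification requires for `CL_DEVICE_MAX_MEM_ALLOC_SIZE`, which bounds a single buffer allocation; whether it also caps total usage here is not established by the thread.

```python
def quarter_of_vram_gb(total_vram_gb):
    """A quarter of the card's memory, the limit described in the post above."""
    return total_vram_gb / 4.0

def global_work_size(grid_size, block_size):
    """global_work_size in one dimension = CUDA grid size x CUDA block size."""
    return grid_size * block_size

# Illustrative x-dimension limits (assumed for this sketch); consult the
# CUDA programming guide table for the values of a specific GPU.
MAX_GRID_X = 2**31 - 1
MAX_BLOCK_X = 1024

print("25% of a 12GB card (GB):", quarter_of_vram_gb(12.0))
print("largest global_work_size in x:", global_work_size(MAX_GRID_X, MAX_BLOCK_X))
```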

Cautilus Joined: 29 Jul 14 Posts: 19 Credit: 3,451,802,406 RAC: 0	Message 67101 - Posted: 18 Feb 2018, 3:43:06 UTC Last modified: 18 Feb 2018, 3:44:00 UTC Well maybe that's how it's set out in NVIDIA's guidelines, but Milkyway still allocates up to 12GB of VRAM. Here's a graph from HWiNFO64 that shows my VRAM usage with 8 WUs running simultaneously, clearly showing it's above the 3GB threshold (Y-axis is from 0 to 12500MB of VRAM allocation). https://i.imgur.com/665amcH.png

mmonnin Joined: 2 Oct 16 Posts: 167 Credit: 1,006,769,984 RAC: 11,900	Message 67102 - Posted: 19 Feb 2018, 0:22:14 UTC Maybe 3GB per task then. Otherwise why put 4+ GB on any AMD card?

mikey Joined: 8 May 09 Posts: 3334 Credit: 524,010,781 RAC: 962	Message 67107 - Posted: 19 Feb 2018, 16:36:43 UTC - in response to Message 67102. "Maybe 3gb per task then. Otherwise why put 4+ GB on any AMD card." Because crunching is NOT their primary market; it's gaming, and they can access it all. Building supercomputers is a big market too, and they can access it all too.

Chooka Joined: 13 Dec 12 Posts: 101 Credit: 1,782,758,310 RAC: 526	Message 67132 - Posted: 23 Feb 2018, 20:18:06 UTC A Titan V as in the $3000 Volta card Cautilus??? Jeezus. I thought I had the BOINC bug bad...LOL.

mmonnin Joined: 2 Oct 16 Posts: 167 Credit: 1,006,769,984 RAC: 11,900	Message 67133 - Posted: 23 Feb 2018, 22:12:22 UTC - in response to Message 67132. "A Titan V as in the $3000 Volta card Cautilus??? Jeezus. I thought I had the BOINC bug bad...LOL." Yes, x2 cards. NV put a hefty price tag on their top compute card.

Chooka Joined: 13 Dec 12 Posts: 101 Credit: 1,782,758,310 RAC: 526	Message 67134 - Posted: 24 Feb 2018, 8:55:14 UTC - in response to Message 67133. WOW...that has a staggering DP score. 6144.0 (7449.6) Crazy! (Guess it comes with a crazy price tag too) Cautilus couldn't do any better to try and catch Gary Roberts ;)