Welcome to MilkyWay@home

Posts by kevinjos

1) Message boards : Application Code Discussion : GPU RAM Requirements (Message 67327)
Posted 13 Apr 2018 by kevinjos
Post:
Yeah, it seems like the work units use up to 1.5 GB of VRAM on NVIDIA cards for whatever reason. On AMD cards they only use around 100 MB per work unit; I'm not sure why the NVIDIA work units use so much more VRAM. I would also like to know if there's a way to reduce the VRAM usage on NVIDIA cards.


Thanks for sharing! This provides a good lead into where the issue may be. The GPUs I am using are from Nvidia. The 1.5 GB VRAM observation also lines up with what I'm seeing: 8 WUs/GPU * 1.5 GB/WU = 12 GB, right at the limit of a 12 GB card.
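The arithmetic above can be written as a quick budget check. This is only a sketch; the 1.5 GB/WU figure is the observation from the quoted post, not a documented requirement, and the 512 MB driver headroom is an assumption:

```python
def max_concurrent_wus(card_vram_mb, vram_per_wu_mb, headroom_mb=512):
    """Largest WU count that fits in VRAM, leaving headroom for the driver.

    All figures are in MB; the headroom default is an assumption, not a
    measured value.
    """
    return (card_vram_mb - headroom_mb) // vram_per_wu_mb

# 12 GB card, ~1.5 GB per WU (the NVIDIA observation from the quoted post):
print(max_concurrent_wus(12 * 1024, 1536))  # → 7, so 8 WUs pushes past the limit

# ~100 MB per WU (the AMD observation) leaves ample room:
print(max_concurrent_wus(12 * 1024, 100))   # → 117
```

Under these assumptions, 8 concurrent WUs on a 12 GB card is exactly at or over budget, which would explain the out-of-memory errors.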
2) Message boards : Application Code Discussion : GPU RAM Requirements (Message 67313)
Posted 7 Apr 2018 by kevinjos
Post:
I am seeing errors such as the following when trying to run 8 WUs per GPU:

Error creating context (-6): CL_OUT_OF_HOST_MEMORY

https://milkyway.cs.rpi.edu/milkyway/result.php?resultid=2304133664

How much GPU RAM do WUs require on average? Is it unreasonable to expect 8 WUs to run on a GPU with only 12 GB of RAM? I ask because when I run only 4 WUs per GPU, GPU utilization is nowhere near 100%, which is why I would like to run more at once. Could someone point me to the relevant lines in the code? I'd be happy to take a look to better understand the GPU RAM allocations.
3) Message boards : Number crunching : Run Multiple WU's on Your GPU (Message 67312)
Posted 6 Apr 2018 by kevinjos
Post:
I think this is a probable lead https://github.com/BOINC/boinc/issues/1773
4) Message boards : Number crunching : Run Multiple WU's on Your GPU (Message 67307)
Posted 4 Apr 2018 by kevinjos
Post:
No, I don't have it in front of me, but the BOINC server-side software isn't capable of seeing all the memory the newer GPUs have. It's not a "bug in the source code" either; it's older code that hasn't caught up yet. BOINC is maintained by a group of volunteers right now, and even though they are very dedicated, they all have "real jobs" too, so they are mostly just fixing bugs in the BOINC software, both client and server side. The money ran out a while ago, and things aren't being done as quickly as they used to be.


Very interesting. I understand the challenge of maintaining an open-source project without the support of full-time staff. I will take a look at the source and see if I can identify where the issue may be. If you have any recommendations on where to begin, that would be much appreciated! :)
5) Message boards : Number crunching : Run Multiple WU's on Your GPU (Message 67298)
Posted 3 Apr 2018 by kevinjos
Post:
You have run into a BOINC software limitation, not a GPU limitation. BOINC itself can't see 12 GB of RAM on the GPU; it will in time, but not now, so running that many work units that each take that much memory will be a problem.


How could this be a BOINC limitation? Do you have a citation, or a link to the bug in the source code? It seems to me that if I ask BOINC to schedule 8 tasks per GPU, BOINC will do so without trying to determine whether the GPU has enough RAM. Additionally, the errors I am seeing come from the Milkyway WUs themselves, and they are intermittent: the computer successfully handles 8 WUs per GPU most of the time, but even a 5% error rate is too high.
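The intermittency described above is consistent with per-WU peak VRAM varying from task to task: eight WUs usually fit in 12 GB, but occasionally their combined peaks exceed it. A toy simulation of that idea, in which every number (the mean, the jitter, the seed) is an illustrative assumption rather than a measurement:

```python
import random

random.seed(42)  # fixed seed so the run is reproducible

def trial(n_wus=8, mean_mb=1470, jitter_mb=200, card_mb=12 * 1024):
    """Simulate one batch: True if the combined peak allocations exceed VRAM.

    Peak usage per WU is drawn uniformly around an assumed mean; a failure
    here stands in for a CL_OUT_OF_HOST_MEMORY-style error.
    """
    total = sum(random.uniform(mean_mb - jitter_mb, mean_mb + jitter_mb)
                for _ in range(n_wus))
    return total > card_mb

failures = sum(trial() for _ in range(10_000))
print(f"simulated failure rate: {failures / 10_000:.1%}")
```

The point of the sketch is qualitative: a scheduler that counts tasks but not memory will intermittently overcommit whenever per-task usage fluctuates near the budget line.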
6) Message boards : Number crunching : Run Multiple WU's on Your GPU (Message 67293)
Posted 3 Apr 2018 by kevinjos
Post:
I have observed that running too many WUs per GPU can cause the WUs to error out with out-of-memory (OOM) errors. Is there any guidance on how much is too much? For instance, with a GPU that has 12 GB of RAM, what is the maximum number of WUs I can put on it? In practice, 8 leads to errors and 4 does not. That said, the GPU is not crunching at 100% utilization with only 4 WUs scheduled at a time.




©2020 Astroinformatics Group