Message boards :
Number crunching :
Run Multiple WU's on Your GPU
Send message Joined: 13 Oct 16 Posts: 112 Credit: 1,174,293,644 RAC: 0 |
Not that I know of, I always have a crap load as well and so do most people I think. Nothing to worry about me thinks ;) |
Send message Joined: 24 Jan 11 Posts: 715 Credit: 557,043,112 RAC: 42,579 |
> Quick question..... ATM I have about 186 WU's which are validation inconclusive. Could that be a result of running multiple WU's?

Chooka, your app_info.xml looks fine for elements and syntax. Your inconclusives are less than 10% of your valid tasks, which matches my own inconclusive percentage. I see nothing to worry about. I didn't see any glaring, completely wrong numbers in several of your inconclusive and valid tasks' stderr.txt outputs. Just keep motorin' along. ;^}
Send message Joined: 13 Dec 12 Posts: 101 Credit: 1,782,758,310 RAC: 0 |
Hi guys, I'm back again. I bought a new video card and am getting multiple computational errors. Does this file still look correct? I seriously can't get my head around this stuff every time it comes up. Quite depressing really :(

<app_info>
  <app>
    <name>milkyway</name>
  </app>
  <file_info>
    <name>milkyway_1.43_windows_x86_64__opencl_ati_101.exe</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>milkyway</app_name>
    <version_num>143</version_num>
    <platform>windows_x86_64</platform>
    <avg_ncpus>0.5</avg_ncpus>
    <max_ncpus>0.567833</max_ncpus>
    <plan_class>opencl_ati_101</plan_class>
    <cmdline></cmdline>
    <coproc>
      <type>ATI</type>
      <count>0.5</count>
    </coproc>
    <file_ref>
      <file_name>milkyway_1.43_windows_x86_64__opencl_ati_101.exe</file_name>
      <main_program/>
    </file_ref>
  </app_version>
  <app>
    <name>milkyway_separation__modified_fit</name>
    <user_friendly_name>Milkyway Sep. (Mod. Fit)</user_friendly_name>
  </app>
  <file_info>
    <name>milkyway_separation__modified_fit_1.43_windows_x86_64.exe</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>milkyway_separation__modified_fit</app_name>
    <version_num>143</version_num>
    <platform>windows_x86_64</platform>
    <file_ref>
      <file_name>milkyway_separation__modified_fit_1.43_windows_x86_64.exe</file_name>
      <main_program/>
    </file_ref>
  </app_version>
  <app>
    <name>milkyway_separation__modified_fit</name>
  </app>
  <file_info>
    <name>milkyway_separation__modified_fit_1.43_windows_x86_64__opencl_ati_101.exe</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>milkyway_separation__modified_fit</app_name>
    <version_num>143</version_num>
    <platform>windows_x86_64</platform>
    <avg_ncpus>0.05000</avg_ncpus>
    <max_ncpus>0.0567833</max_ncpus>
    <plan_class>opencl_ati_101</plan_class>
    <cmdline></cmdline>
    <coproc>
      <type>ATI</type>
      <count>0.5</count>
    </coproc>
    <file_ref>
      <file_name>milkyway_separation__modified_fit_1.43_windows_x86_64__opencl_ati_101.exe</file_name>
      <main_program/>
    </file_ref>
  </app_version>
</app_info>

Is this an old...um...file name? 1.43? If so, where do I download a newer version? Or would I just change the 1.43 to 1.46? Thank you once again.
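As a general BOINC aside (not something discussed in this thread): on recent clients you can often avoid hand-maintaining an app_info.xml for this purpose and instead drop an app_config.xml into the project's data directory; its <gpu_usage> fraction plays the same role as the <count> element above. A minimal sketch, assuming the project app name is "milkyway" and using illustrative values, not project-mandated ones:

```xml
<!-- app_config.xml — placed in the Milkyway project data directory.
     gpu_usage 0.5 lets two tasks share one GPU, like <count>0.5</count> above.
     cpu_usage here is an illustrative guess, not a required value. -->
<app_config>
  <app>
    <name>milkyway</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>0.05</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```

Unlike app_info.xml, an app_config.xml doesn't pin executable versions, so a project-side app update won't invalidate it.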
Send message Joined: 13 Dec 12 Posts: 101 Credit: 1,782,758,310 RAC: 0 |
Sorry all, I got it sorted. The errors are being reported by other users too, so it's not my card.
Send message Joined: 21 Mar 15 Posts: 3 Credit: 47,175,569 RAC: 0 |
I'm currently using an AMD Radeon RX Vega 64. I'm using the same settings from message 65387 for the AMD Radeon R9 Fury X, but instead of 3 WUs per GPU I'm running 4 WUs per GPU. Before the change, a single WU took about 90 seconds. After the change, four WUs run on the GPU simultaneously and all finish together at about 3 minutes. I may be able to go higher, but the time may increase a lot more.
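To sanity-check whether packing more WUs onto the GPU actually helps, it's the effective time per WU that matters, not the wall time of a batch. A small sketch using the numbers above (90 s for one WU alone, four concurrent WUs finishing together in 3 minutes); the helper name is my own:

```python
def effective_seconds_per_wu(batch_seconds: float, wus_in_batch: int) -> float:
    """Wall time of a batch divided by the number of WUs that finish in it."""
    return batch_seconds / wus_in_batch

# Numbers reported above: one WU alone takes ~90 s;
# four concurrent WUs finish together in ~3 minutes (180 s).
single = effective_seconds_per_wu(90, 1)    # 90.0 s per WU
packed = effective_seconds_per_wu(180, 4)   # 45.0 s per WU

# Each batch takes twice as long, but throughput still doubles.
speedup = single / packed
print(speedup)  # 2.0
```

The same comparison tells you when to stop: if going from 4 to 5 WUs stretches the batch past 5/4 of its old wall time, the extra instance is costing you throughput.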
Send message Joined: 24 Oct 11 Posts: 1 Credit: 100,090,867 RAC: 0 |
I've monitored the task flow on my client (ATI HD5870) over the last two days. Based on the current statistics, there are three kinds of MW tasks now:

-- de_modfit_fast_XX_... -- reports an error;
-- de_modfit_XX_... -- OK;
-- de_modfit_fast_SimXX_... -- OK.

The erroneous tasks fail across the entire quorum, regardless of video card type. Moreover, CPU tasks report an error too.
Send message Joined: 21 Mar 15 Posts: 3 Credit: 47,175,569 RAC: 0 |
Are you referring to http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=4172? If so, there are two WUs that are causing all sorts of problems: de_modfit_fast_18_3s_146_bundle5_ModfitConstraintsWithDisk_Bouncy and de_modfit_fast_20_3s_146_bundle5_ModfitConstraintsWithDisk_Bouncy.
Send message Joined: 13 Mar 18 Posts: 9 Credit: 66,232,294 RAC: 0 |
I have observed that running too many WUs per GPU can cause the WUs to error out with OOMs. Is there any guidance on how much is too much? For instance, I have a GPU with 12GB RAM, what's the maximum number of WUs I can put on this GPU? In practice, 8 leads to errors and 4 does not. That said, the GPU is not crunching at 100% utilization with only 4 WUs scheduled at a time. |
Send message Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0 |
> I have observed that running too many WUs per GPU can cause the WUs to error out with OOMs. Is there any guidance on how much is too much? For instance, I have a GPU with 12GB RAM, what's the maximum number of WUs I can put on this GPU? In practice, 8 leads to errors and 4 does not. That said, the GPU is not crunching at 100% utilization with only 4 WUs scheduled at a time.

You have run into a BOINC software limitation, not a GPU limitation. BOINC itself can't see 12 GB of RAM on the GPU; it will in time, but not now. So running that many workunits, each taking that much memory, will be a problem.
Send message Joined: 13 Mar 18 Posts: 9 Credit: 66,232,294 RAC: 0 |
> You have run into a Boinc software limitation, not a gpu limitation, Boinc itself can't see 12gb of ram on the gpu, it will in time but not now, so running that many workunits that each take that much memory will be a problem.

How could this be a BOINC limitation? Do you have a citation for this, or a link to the bug in the source code? It seems to me that if I ask BOINC to schedule 8 tasks per GPU, BOINC will do that without trying to determine whether the GPU has enough RAM. Additionally, the errors I am seeing are coming from the Milkyway WUs, and they are intermittent. The computer can successfully handle 8 WUs per GPU most of the time, but even a 5% error rate is too high.
Send message Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0 |
> You have run into a Boinc software limitation, not a gpu limitation, Boinc itself can't see 12gb of ram on the gpu, it will in time but not now, so running that many workunits that each take that much memory will be a problem.

No, I don't have it in front of me, but the BOINC server-side software isn't capable of seeing all the memory the newer GPUs have. It's not a "bug in the source code" either; it's older code that hasn't caught up yet. BOINC is written by a bunch of volunteers right now, and even though they are very dedicated, they all have "real jobs" too, so they are mostly just fixing bugs in the BOINC software, both the client and server side. The money ran out a while ago and things aren't being done as quickly as they used to be.
Send message Joined: 13 Mar 18 Posts: 9 Credit: 66,232,294 RAC: 0 |
> No I don't have it in front of me but the Boinc Server side software isn't capable of seeing all the memory the newer gpu's have. It's not a "bug in the source code" either, it's older programming code that hasn't caught up yet.

Very interesting. I understand the challenge of maintaining an open-source project without the support of full-time staff. I will take a look at the source and see if I can identify where the issue may be. If you have any recommendations on where to begin, that would be much appreciated! :)
Send message Joined: 13 Oct 16 Posts: 112 Credit: 1,174,293,644 RAC: 0 |
Nice! I would do 3 WUs per GPU then. Try and stagger them. |
Send message Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0 |
> No I don't have it in front of me but the Boinc Server side software isn't capable of seeing all the memory the newer gpu's have.

You can try here: http://boinc.berkeley.edu/dl/ but I'm not sure that's what you want. Like you, I'm just a user; unlike you, I have no clue what to even look for. I do know there is an email list about the software, but I don't have the link to it right now; it's mostly about the alpha and beta versions of the client software, though.
Send message Joined: 13 Mar 18 Posts: 9 Credit: 66,232,294 RAC: 0 |
I think this is a probable lead: https://github.com/BOINC/boinc/issues/1773
Send message Joined: 26 Mar 18 Posts: 24 Credit: 102,912,937 RAC: 0 |
> You have run into a Boinc software limitation, not a gpu limitation, Boinc itself can't see 12gb of ram on the gpu, it will in time but not now, so running that many workunits that each take that much memory will be a problem.

So this info is incorrect. There is no issue with BOINC and 12 GB of RAM on a graphics card. The issue is that the application running the WU doesn't know to throttle back if it runs out of memory. So with 12 GB of GPU RAM and 8 WUs going, you can go past the 12 GB of available RAM and it will error out, and I think kill all running WUs (or at least the one that ran out of memory). This is not a BOINC limitation but a limitation of the application crunching the WU.

I recently tested a Tesla V100 with 16 GB of GPU RAM. I ran 10 WUs at a time and would peak at 14.5 GB of RAM used. It didn't error out; it worked fine. This was running BOINC 7.6.31 on Ubuntu 16.04. If I pushed 12 WUs, depending on how they ran (RAM usage ramps up as a WU processes), they would error out because I ran out of GPU RAM.

In general a Milkyway WU will peak at the end around 1800 MB. Doing the math:

6 WU @ 1800 MB = 10.8 GB
8 WU @ 1800 MB = 14.4 GB

That's why you are erroring out at 8 WUs: you are randomly running out of GPU RAM. I say randomly because your WUs are all starting and ending at random times, and it's rare for all of them to finish at once (and hit peak memory usage). You could probably get away with 7, but some will still fail randomly.

Here is a V100 running 7 WUs. Notice the GPU memory usage:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.30                 Driver Version: 390.30                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  Off  | 000094A8:00:00.0 Off |                    0 |
| N/A   60C    P0   199W / 250W |   8915MiB / 16160MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     92457      C   ..._x86_64-pc-linux-gnu__opencl_nvidia_101  1838MiB |
|    0     92476      C   ..._x86_64-pc-linux-gnu__opencl_nvidia_101  1480MiB |
|    0     92484      C   ..._x86_64-pc-linux-gnu__opencl_nvidia_101  1838MiB |
|    0     92500      C   ..._x86_64-pc-linux-gnu__opencl_nvidia_101  1444MiB |
|    0     92523      C   ..._x86_64-pc-linux-gnu__opencl_nvidia_101  1480MiB |
|    0     92685      C   ..._x86_64-pc-linux-gnu__opencl_nvidia_101   406MiB |
|    0     92693      C   ..._x86_64-pc-linux-gnu__opencl_nvidia_101   358MiB |
+-----------------------------------------------------------------------------+

The ones at the top of the list have been running and are about to finish up. The ones at the bottom (higher PID) have just started.

There is a command-line version of BOINC. If I were you, I'd open up a DOS prompt and go to C:\Program Files\BOINC, or wherever you have it installed. Run "boinccmd --get_project_status". Record the current time, the number of WUs you have, and the elapsed time. That's your baseline. Let it run for a couple of hours, then get the stats again. Calculate the differences (new time - old time, and new total - old total) to find out how long you were taking per WU, then divide that by 6. There is your approximate average per WU when running 6 at a time. Change it to 5 WUs, do it again. Change it to 4, do it again. Change it to 7, do it again (and watch for errors). One setting will now have a lower number than the others; stick with that many WUs.

Also, in case anyone is wondering: after lots of playing, a Tesla V100 seems optimal at 7 WUs, using 0.142 for the GPU setting and 0.5 for the CPU. It gave the best average WU time with quantity taken into account: 37 s per WU with 7 at a time, or an average of one WU per 5.3 seconds. I also tested a P100, which despite its price tag being 75% of a V100 is almost half the speed. The best I could get out of it was 54.9 s per WU with 6 at a time, or an average of one WU per 9.14 seconds. 4 or 5 WUs were just about the same (9.32); below or above were slower on average.
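The memory arithmetic above generalizes to a quick planning rule: multiply the per-WU peak by the concurrent WU count and leave some headroom under total VRAM, since tasks hit their peak at different times. A minimal sketch using the ~1800 MiB peak figure reported above; the helper name and the 1000 MiB headroom are my own illustrative choices, not from BOINC or the project:

```python
def max_concurrent_wus(vram_mib: int, peak_per_wu_mib: int,
                       headroom_mib: int = 1000) -> int:
    """Largest WU count whose combined peak memory fits under
    total VRAM minus a safety margin (all values in MiB)."""
    return (vram_mib - headroom_mib) // peak_per_wu_mib

# ~1800 MiB per-WU peak, as reported in the post above.
print(max_concurrent_wus(12288, 1800))   # 12 GiB card  -> 6
print(max_concurrent_wus(16160, 1800))   # V100, 16 GiB -> 8

# Effective completion rate at the reported V100 sweet spot:
# 7 concurrent WUs at ~37 s each means one WU finishes every ~5.3 s.
avg_interval = 37 / 7
print(round(avg_interval, 1))  # 5.3
```

This is deliberately conservative: because WUs ramp up and finish at different times, more may often fit (the post got 10 running on 16 GiB), but staying under this bound avoids the random OOM failures described above.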
Send message Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0 |
> You have run into a Boinc software limitation, not a gpu limitation, Boinc itself can't see 12gb of ram on the gpu, it will in time but not now, so running that many workunits that each take that much memory will be a problem.

Thank you very much, I learned something new today!!
Send message Joined: 7 May 14 Posts: 57 Credit: 206,540,646 RAC: 0 |
Update: Radeon VII, 3 instances. I have uploaded a video to YouTube, including the config for three instances: https://youtu.be/4xKy9wGKmz4
Send message Joined: 13 Oct 16 Posts: 112 Credit: 1,174,293,644 RAC: 0 |
> update Radeon VII _ 3 instances , I have uploaded to youtube including config for 3three instances

Nice! We could use someone like you on our Team :)
Send message Joined: 16 Nov 09 Posts: 1 Credit: 99,314,406 RAC: 0 |
Thanks for the optimizing tips for the Radeon VII. Now I have another problem..... is it possible to get more than 300 workunits at a time? :P Every time I have crunched 300 WUs and ask for more, it takes more than 5 minutes to get another batch of 300.
©2024 Astroinformatics Group