
Posts by vseven

21) Message boards : Number crunching : New Benchmark Thread - times wanted for any hardware, CPU or GPU, old or new! (Message 67433)
Posted 8 May 2018 by vseven
Post:
For Nvidia, the original Titan, Titan Black, and dual-GPU Titan Z all offer strong FP64 at a favorable GFLOPS per watt (6.0, 6.6, and 7.2 respectively)


The Wikipedia article on these is pretty interesting also:

https://en.wikipedia.org/wiki/List_of_Nvidia_graphics_processing_units
22) Message boards : Number crunching : New Benchmark Thread - times wanted for any hardware, CPU or GPU, old or new! (Message 67432)
Posted 8 May 2018 by vseven
Post:
NVIDIA Tesla V100 16 GB, SXM2 interface, Ubuntu 16.04, CUDA 9.1:

GPU time (s) - CPU time (s) - Credit
16.31 - 13.95 - 227.63
17.30 - 15.26 - 229.05
16.21 - 13.95 - 227.63
16.23 - 11.73 - 228.52
15.20 - 13.08 - 228.13

With the above said, it can run 6 at a time averaging about 31s each, so a bit above 5s per WU of effective throughput.


It's beautiful: https://imgur.com/Gsb3NiR
23) Message boards : Number crunching : Huge number of 'Validation inconclusive' WUs (Message 67431)
Posted 8 May 2018 by vseven
Post:
I'm confused. I mean, they will eventually be validated, so what does it matter, correct?
24) Message boards : Number crunching : Run Multiple WU's on Your GPU (Message 67378)
Posted 20 Apr 2018 by vseven
Post:
You have run into a BOINC software limitation, not a GPU limitation. BOINC itself can't see 12 GB of RAM on the GPU (it will in time, but not now), so running that many work units that each take that much memory will be a problem.


How could this be a BOINC limitation? Do you have a citation on this? Or a link to the bug in the source code? It seems to me that if I ask BOINC to schedule 8 tasks per GPU, that BOINC will do that without trying to determine if the GPU has enough RAM. Additionally, the errors I am seeing are coming from the Milkyway WUs. The errors are intermittent too. The computer can successfully handle 8 WUs per GPU most of the time, but even a 5% error rate is too high.



So this info is incorrect. There is no issue with BOINC and 12 GB of RAM on a graphics card. The issue is that the application running the WU doesn't know to throttle back if it runs out of memory. So with 12 GB of GPU RAM and 8 WUs going, you can go past the 12 GB of available RAM and it will error out and, I think, kill all running WUs (or at least the one that ran out of memory). This is not a BOINC limitation but a limitation of the application crunching the WU.

I recently tested a Tesla V100 with 16 GB of GPU RAM. I ran 10 WUs at a time and would peak at 14.5 GB of RAM used. It didn't error out; it worked fine. This was running BOINC 7.6.31 on Ubuntu 16.04. If I pushed 12 WUs, depending on how they ran (RAM usage ramps up as a WU processes), they would error out because I ran out of GPU RAM.

In general a MilkyWay WU will peak at the end around 1800 MB. Doing the math:


6 WU @ 1800 MB = 10.8 GB
8 WU @ 1800 MB = 14.4 GB


That's why you are erroring out at 8 WUs: you are randomly running out of GPU RAM. I say randomly because your WUs are all starting and ending at different times, and it's rare for all of them to finish at once (which is when they hit peak memory usage). You could probably get away with 7, but some will still fail randomly. Here is a V100 running 7 WUs. Notice the GPU memory usage:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.30                 Driver Version: 390.30                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-PCIE...  Off  | 000094A8:00:00.0 Off |                    0 |
| N/A   60C    P0   199W / 250W |   8915MiB / 16160MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     92457      C   ..._x86_64-pc-linux-gnu__opencl_nvidia_101  1838MiB |
|    0     92476      C   ..._x86_64-pc-linux-gnu__opencl_nvidia_101  1480MiB |
|    0     92484      C   ..._x86_64-pc-linux-gnu__opencl_nvidia_101  1838MiB |
|    0     92500      C   ..._x86_64-pc-linux-gnu__opencl_nvidia_101  1444MiB |
|    0     92523      C   ..._x86_64-pc-linux-gnu__opencl_nvidia_101  1480MiB |
|    0     92685      C   ..._x86_64-pc-linux-gnu__opencl_nvidia_101   406MiB |
|    0     92693      C   ..._x86_64-pc-linux-gnu__opencl_nvidia_101   358MiB |
+-----------------------------------------------------------------------------+


The ones at the top of the list have been running and are about to finish up. The ones at the bottom (higher PID) have just started.
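
To generalize that math, here's a rough sketch in Python (the function name and headroom value are my own assumptions). It conservatively treats every WU as if it hits the ~1800 MB peak observed above at the same moment, and reserves some VRAM for the driver and display:

# Conservative estimate of how many MilkyWay WUs fit in GPU RAM at once,
# assuming each one peaks around 1800 MB (as observed above).
def max_concurrent_wus(vram_mb: int, peak_per_wu_mb: int = 1800,
                       headroom_mb: int = 1000) -> int:
    return max(1, (vram_mb - headroom_mb) // peak_per_wu_mb)

for vram in (8192, 12288, 16384):  # 8 GB, 12 GB, and 16 GB cards
    print(f"{vram} MiB VRAM -> {max_concurrent_wus(vram)} WUs")

In practice you can sometimes go a little higher, as the 10-WU run on the 16 GB V100 showed, because the WUs rarely all peak at once; this just gives a safe starting point.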

There is a command line version of BOINC. If I were you I'd open up a command prompt and go to C:\Program Files\BOINC, or wherever you have it installed. Run "boinccmd --get_project_status". Record the current time, the number of WUs you have, and the elapsed time. That's your baseline. Let it run for a couple of hours, then get the stats again. Calculate the difference (new time - old time, and new total - old total). Do the math to find out how long you were taking per WU, then divide that by 6: there is your approximate average per WU when running 6 at a time. Change it to 5 WUs and do it again. Change it to 4 and do it again. Change it to 7 and do it again (and watch for errors).
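
If you'd rather script the arithmetic, the calculation is just a difference of two samples. A minimal sketch in Python (the function name and the sample numbers are mine, purely for illustration):

# Average per-WU times from two samples taken a few hours apart.
# elapsed_s = seconds between the samples; completed = WUs finished in
# that window; concurrent = how many WUs you run at a time.
def per_wu_times(elapsed_s: float, completed: int, concurrent: int):
    effective_s = elapsed_s / completed      # one WU finishes every N seconds
    individual_s = effective_s * concurrent  # average wall time of a single WU
    return individual_s, effective_s

# Illustrative numbers only: 1350 WUs finished over 2 hours, 6 at a time.
individual, effective = per_wu_times(7200, 1350, 6)
print(f"~{individual:.0f}s per WU, one completed every ~{effective:.1f}s")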


Whichever setting gives you the lowest number per WU, stick with that many WUs.


Also, in case anyone is wondering: after lots of playing around, a Tesla V100 seems optimal at 7 WUs, using 0.142 for the GPU setting (1/7) and 0.5 for the CPU. It gave the best average WU time once quantity is taken into account: 37s per WU with 7 at a time, or an average of one WU every 5.3 seconds. I also tested a P100; despite its price tag being 75% of a V100's, it's almost half the speed. The best I could get out of it was 54.9s per WU with 6 at a time, or an average of one WU every 9.14 seconds. 4 or 5 WUs were just about the same (9.32 s); going below or above that range was slower on average.
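
For anyone wanting to replicate those settings: the GPU and CPU numbers above go in an app_config.xml file inside the MilkyWay project folder under your BOINC data directory (e.g. projects/milkyway.cs.rpi.edu_milkyway). A minimal sketch; the <name> value is an assumption, so check client_state.xml for the app's exact short name on your install:

<!-- app_config.xml: run 7 MilkyWay WUs per GPU (1/7 is roughly 0.142) -->
<app_config>
  <app>
    <name>milkyway</name>            <!-- assumed short name; verify locally -->
    <gpu_versions>
      <gpu_usage>0.142</gpu_usage>   <!-- fraction of a GPU each WU claims -->
      <cpu_usage>0.5</cpu_usage>     <!-- CPU share budgeted per WU -->
    </gpu_versions>
  </app>
</app_config>

After saving it, use the BOINC Manager's "Read config files" option (or restart the client) so the change takes effect.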


