Message boards :
Number crunching :
opencl_nvidia_101 on RTX 3060 Ti
Joined: 11 Mar 22 | Posts: 42 | Credit: 21,902,543 | RAC: 0
I changed the graphics card in my rig from an old GTX 660 Ti to an RTX 3060 Ti. From the old set-up, I still have many WUs marked "opencl_nvidia_101", and to me it seems the new card isn't much faster than the old one. Are there other WU types around that make use of the advanced features of a modern graphics card?

My rig: AMD Ryzen 9 5950X, RTX 3060 Ti, Asus Prime B550M-WiFi, 32 GB RAM.

Controlled by an app_config.xml, I have 4 GPU tasks running with 1 CPU each and another 8 CPU-only tasks. According to HWiNFO64, this keeps the average CPU temperature (Tdie) at around 90 °C, which is still within the limits of the CPU. There seems to be no difference between running GPU tasks with 0.25 GPU and 1 CPU or with just 0.5 CPU. Running MW@home on all 16 cores results in temps up to 94 °C, but still no thermal throttling (HTC) kicks in. However, I don't like having the Ryzen loaded at max all the time.
Joined: 13 Dec 17 | Posts: 46 | Credit: 2,421,362,376 | RAC: 0
Nominally, the RTX 3060 Ti should be roughly 2.3 times faster than the GTX 660 Ti in terms of FP64 throughput (253 GFLOPS vs 110 GFLOPS). That should be a noticeable difference, something like 8 simultaneous WUs vs 4 simultaneous WUs completed in a similar time.
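As a sanity check on that ratio (using the theoretical FP64 peak figures quoted above, not measured numbers), a quick sketch:

    # Back-of-the-envelope FP64 comparison from published theoretical peaks.
    # Separation is FP64-heavy, so this is a rough upper bound on the
    # expected speedup, not a measured benchmark.
    fp64_gflops = {
        "GTX 660 Ti": 110.0,
        "RTX 3060 Ti": 253.0,
    }

    speedup = fp64_gflops["RTX 3060 Ti"] / fp64_gflops["GTX 660 Ti"]
    print(f"Theoretical FP64 speedup: {speedup:.2f}x")  # ~2.30x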
Joined: 11 Mar 22 | Posts: 42 | Credit: 21,902,543 | RAC: 0
Well, the RTX 3060 Ti completes a WU in about 9:55 minutes. That is almost the same as with the GTX 660 Ti, except that I can now run 4 GPU tasks in parallel; with the GTX 660 Ti it was only 2. The 660 Ti reports OpenCL 3.0 and CUDA compute capability 3.0, while the 3060 Ti still reports OpenCL 3.0 but compute capability 8.6. MW@home has been running Separation 1.46 WUs for 3-4 years now? And it is still using the old OpenCL 3.0 standard?
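For anyone who wants to check what their driver actually reports, here is a minimal sketch using the third-party pyopencl package (assuming it is installed; the exact strings vary by driver version):

    # List each OpenCL device and the versions its driver reports.
    # Requires the third-party pyopencl package: pip install pyopencl
    import pyopencl as cl

    for platform in cl.get_platforms():
        for device in platform.get_devices():
            print(device.name)
            print("  OpenCL version:", device.version)        # e.g. "OpenCL 3.0 CUDA"
            print("  Driver version:", device.driver_version)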
Joined: 24 Jan 11 | Posts: 715 | Credit: 555,629,227 | RAC: 42,188
Well, MW used to use the OpenCL 1.2 standard. And OpenCL 3.0 is in fact the latest standard, and it has only recently been supported by the card vendors' drivers. https://www.khronos.org/registry/OpenCL/specs/3.0-unified/pdf/OpenCL_API.pdf
Joined: 11 Mar 22 | Posts: 42 | Credit: 21,902,543 | RAC: 0
My bad. Depending on the source, the 660 Ti uses OpenCL 1.1 or 1.2, not 3.0.
Joined: 1 Feb 09 | Posts: 4 | Credit: 101,666,161 | RAC: 0
I have a 3900X/3060 Ti doing Separation (GPU) tasks under Linux with the NVIDIA 510 drivers; it currently does 1 work unit in about 2:40. Is your 9:55 work-unit time from running 4 at once? I had a lot of invalids to start out, but I updated my drivers from 470 to 510, disabled MilkyWay CPU work, and backed off on the CPU cores loaded with other BOINC projects, and one of these changes seems to have greatly lowered the invalid rate. I am pretty hesitant to push multiple tasks. What do you all think, would it be worth it for my card?
Joined: 13 Oct 21 | Posts: 44 | Credit: 226,974,200 | RAC: 3,714
I have an HP Omen (running Windows 10, with WSL2 for Linux tasks) that came with a 3060 Ti. I upgraded the CPU to a 5900X (12C/24T) and the RAM to 64 GB. MW Separation tasks take ~2:45 per task running one at a time. I tested running multiple at a time a while back and found that the time per task slowed down too much; you get more done per unit of time by running one task at a time. I also run Einstein GPU tasks and found that FGRP1G tasks are best run one at a time, but O3AS ones 3 at a time. I very rarely get errors or invalids, and I run my PC at full load (CPU & GPU) 24/7, even with demanding loads such as 24 Rosettas or 24 LHC ATLAS tasks at a time, which can use almost all of the 64 GB of RAM.

ace_quaker, besides drivers, check whether your firmware/VBIOS is up to date, and maybe also the motherboard drivers/firmware. I wouldn't think that the load you place on your CPU should affect the error rate of GPU tasks. From my experience, MW Separation on a 3060 Ti is best run one task at a time.

GolfSierra, I undervolted my CPU using Ryzen Master and my GPU using MSI Afterburner. This did wonders for cooling and noise with no noticeable performance hit. CPU temperatures rarely go over 70 °C (usually the 60s) and the GPU is usually in the 50s, even under the heavy loads mentioned above. Since I run my PC at full load 24/7, I eventually optimized some more and tuned the CPU down to 3.7 GHz, as it significantly decreased power usage and I didn't notice any performance hit compared to running it at 4+ GHz. You may consider doing something similar so you can use all, or at least more, of the 32 threads available on the 5950X, and at lower temperatures.
Joined: 13 Dec 12 | Posts: 101 | Credit: 1,782,758,310 | RAC: 0
I found the following with my 3070 Ti:

1 x WU = 148 sec
2 x WU = 120 sec
3 x WU = 114 sec
4 x WU = 112.5 sec

Any which way you cut it, these cards SUCK at MilkyWay. (I bought this card for PrimeGrid, where they do very well.) For comparison, my 7990 is roughly 46 sec running 3 WUs, and a Radeon VII is about 45 sec running 4 WUs.
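Reading those numbers as effective time per WU (wall-clock time divided by the number of concurrent tasks), which the decreasing values suggest, the throughput gain flattens out quickly; a minimal sketch of the arithmetic:

    # Convert effective seconds-per-WU into WUs per hour to see how little
    # each additional concurrent task buys. Figures are the 3070 Ti numbers
    # quoted above.
    effective_sec_per_wu = {1: 148.0, 2: 120.0, 3: 114.0, 4: 112.5}

    for n, sec in effective_sec_per_wu.items():
        print(f"{n} concurrent: {sec:6.1f} s/WU -> {3600.0 / sec:5.1f} WU/h")
    # 1x -> 2x gains ~23%; 3x and 4x add only a few percent more.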
Joined: 24 Jan 11 | Posts: 715 | Credit: 555,629,227 | RAC: 42,188
I find the following with my GTX 1080 Tis:

1 x WU = 101 seconds
2 x WU = 80 seconds

My 3080s do no better: 2 x WU = 75-82 seconds. This Separation app is not very well optimized.

When I was injecting an extra optimized OpenCL library into the OS environment for the benefit of Einstein Gamma-Ray tasks, I discovered an unintentional beneficial side effect on MilkyWay Separation tasks: it knocked 30 seconds off the runtimes compared to not injecting the optimized library. When we switched to a newer optimized GR application that brought the optimization code inside the application and got rid of the injected library, I lost that side-effect benefit on my MW tasks, and they reverted to the original runtimes of the stock application.
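For context, this kind of injection is typically done on Linux with the dynamic loader's preload mechanism; the post does not say exactly how it was done, so the sketch below is an assumption, and the library path and binary name are invented for illustration:

    # Hypothetical illustration of library injection via LD_PRELOAD on Linux:
    # the dynamic loader resolves symbols from the preloaded library first,
    # so an optimized math library can transparently replace the stock one.
    import os
    import subprocess

    env = dict(os.environ)
    env["LD_PRELOAD"] = "/usr/local/lib/liboptimized_math.so"  # invented path
    subprocess.run(["./milkyway_separation"], env=env)         # invented binary name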
Joined: 8 May 09 | Posts: 3339 | Credit: 524,010,781 | RAC: 0
> I find the following with my GTX 1080 Tis: ...

As I said somewhere else, they need Petri here too, if you guys can spare him!! :-)
Joined: 24 Jan 11 | Posts: 715 | Credit: 555,629,227 | RAC: 42,188
I agree. I hope I can turn his attention to the Separation app as soon as he is finished with a final revision of the GR app. |
Joined: 16 Mar 10 | Posts: 213 | Credit: 108,372,900 | RAC: 3,950
I have the feeling that improving the efficiency of the existing program, without also finding a way of putting a lot more sub-tasks within each work unit aimed at GPUs, will probably lead to more server overloading because of increased database activity :-(

There are several issues with packing more into an individual work unit. The biggest is likely that the current parameter mechanism uses the work unit's command line, so it may be constrained by that (modern BOINC clients on Linux can handle command lines of more than 1024 bytes; is that also true for Windows?). So it might be necessary to rewrite the wrapper to get the parameters from a file instead, as sketched below, and that would entail reworking the work-unit generator as well.

Also, packing more into a work unit will mean large increases in run time for users of the CPU version, or the need for separate versions for CPU and GPU, with separate validators. Of course, some of the people with powerful GPUs will say something along the lines of "But you don't need CPU tasks for this project!" -- witness some of the posts about OpenPandemics at WCG! That may or may not be true, but it is unfair to folks without a GPU :-)

Yes, making the application more efficient will have benefits, especially for the users with [lots of] powerful GPUs. However, the possible consequences should also be considered -- need I remind the ex-SETI@home folks of the issues there as more and more folks ran the "Special Sauce" application and the servers often got very bogged down...

Cheers - Al.
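To make the wrapper idea concrete (purely hypothetical; the argument name and file format below are invented and do not match the real Separation wrapper):

    # Hypothetical sketch: accept parameters either on the command line (the
    # current mechanism) or from a file, sidestepping any platform limit on
    # command-line length. Names are invented for illustration.
    import argparse
    import shlex
    import sys

    def get_parameters(argv):
        parser = argparse.ArgumentParser()
        parser.add_argument("--params-file",
                            help="read the remaining parameters from this file")
        known, rest = parser.parse_known_args(argv)
        if known.params_file:
            # One whitespace-separated parameter list; shell-style quoting allowed.
            with open(known.params_file) as f:
                rest = shlex.split(f.read())
        return rest

    if __name__ == "__main__":
        print(get_parameters(sys.argv[1:]))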
Joined: 24 Jan 11 | Posts: 715 | Credit: 555,629,227 | RAC: 42,188
You make a very valid point, Al. We can improve the application speed to the point the project's servers can't keep up or cope. In the end, IF we get Petri to improve the speed of the application and submit it to the project managers, it is still up to them to decide whether to release it to the general public. With the very low MW participation of GPUUG team members, I doubt the couple of us could impact the servers too much. The improvement in the Einstein GR app that Petri developed was incorporated into the Einstein stock application; again, that was a decision by the project managers that their servers could handle it. But now would not be the time to incorporate an improved application here, because things certainly have not returned to normal after the disk and database rebuild.
Joined: 11 Mar 22 | Posts: 42 | Credit: 21,902,543 | RAC: 0
> I have a 3900X/3060 Ti doing Separation (GPU) tasks under Linux with the NVIDIA 510 drivers; it currently does 1 work unit in about 2:40. Is your 9:55 work-unit time from running 4 at once?

Yes, I'm controlling MW@home through an app_config.xml: 4 GPU tasks in parallel, with the setting 0.5 CPU and 0.25 GPU per task. I've had no invalid results so far. You should give it a try. Besides the GPU tasks, my computer runs 8 CPU tasks and 1 CPU task (16 CPUs) in parallel. I didn't enable all CPUs for number crunching, to keep the average CPU temps lower.

This is my app_config.xml:

    <app_config>
      <app>
        <name>milkyway</name>
        <max_concurrent>12</max_concurrent>
        <gpu_versions>
          <gpu_usage>0.25</gpu_usage>
          <cpu_usage>0.5</cpu_usage>
        </gpu_versions>
      </app>
      <app>
        <name>milkyway_nbody</name>
        <max_concurrent>12</max_concurrent>
        <gpu_versions>
          <gpu_usage>0.25</gpu_usage>
          <cpu_usage>0.5</cpu_usage>
        </gpu_versions>
      </app>
    </app_config>
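For anyone copying this: app_config.xml belongs in the MilkyWay project folder under the BOINC data directory (on a default Windows install, something like C:\ProgramData\BOINC\projects\milkyway.cs.rpi.edu_milkyway; the exact folder name comes from the project URL), and the client picks up changes via Options -> Read config files in BOINC Manager, no restart needed. Note that with <gpu_usage>0.25</gpu_usage> the client will schedule up to four tasks per GPU (4 x 0.25 = 1.0), which is where the "4 in parallel" comes from.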
Joined: 11 Mar 22 | Posts: 42 | Credit: 21,902,543 | RAC: 0
> I find the following with my GTX 1080 Tis: ...

I fully agree. Video cards have made huge progress in capability; that's why GPUs, not CPUs, are used for mining. However, distributed computing does not yet make use of this potential.