MW stopped using my nvidia GPU
Joined: 26 May 20 Posts: 23 Credit: 673,618,527 RAC: 34,135
openSUSE Linux (Tumbleweed) x86_64, NVIDIA driver 440.59

This was going along fine, but at least a day ago all the NVIDIA tasks went to "Waiting to run" status and have not run since. The Activity menu options are all set to "Always" and nothing is suspended. There is no app_info.xml, app_config.xml, cc_config.xml, etc., just stock MW and BOINC. The computing preferences allow 1 CPU per GPU (I have 2 NVIDIA cards), so that should be OK. The NVIDIA driver hasn't changed, and I just double-checked that I am still in the video group, so that's not it. How can I determine what is preventing the GPUs from running?

Also, as an aside, how does one view a specific computer on the website? I can't seem to sort by computer ID, so finding a particular computer in my set of computers is like searching for a needle in a haystack.

Example task:
Application: Milkyway@home Separation 1.46 (opencl_nvidia_101)
Name: de_modfit_86_bundle4_4s_south4s_bgset_2_1588605902_20177642
State: Waiting to run
Received: Sun 31 May 2020 09:17:52 PM PDT
Report deadline: Fri 12 Jun 2020 09:17:51 PM PDT
Resources: 0.965 CPUs + 1 NVIDIA GPU
Estimated computation size: 42,135 GFLOPs
CPU time: ---
CPU time since checkpoint: ---
Elapsed time: ---
Estimated time remaining: 00:03:05
Fraction done: 0.000%
Virtual memory size: 9.97 GB
Working set size: 499.16 MB
Directory: slots/2
Executable: milkyway_1.46_x86_64-pc-linux-gnu__opencl_nvidia_101

Here is a BOINC restart I just did to see if that would help (it didn't):
[---] Starting BOINC client version 7.17.0 for x86_64-pc-linux-gnu
[---] This a development version of BOINC and may not function properly
[---] log flags: file_xfer, sched_ops, task
[---] Libraries: libcurl/7.70.0 OpenSSL/1.1.1g-fips zlib/1.2.11 libidn2/2.3.0 libpsl/0.21.0 (+libidn2/2.3.0) libssh/0.9.3/openssl/zlib nghttp2/1.40.0
[---] Data directory: /home/erbenton/boinc
[---] CUDA: NVIDIA GPU 0: GeForce RTX 2060 (driver version 440.59, CUDA version 10.2, compute capability 7.5, 4096MB, 3970MB available, 6739 GFLOPS peak)
[---] CUDA: NVIDIA GPU 1: GeForce GTX 1660 Ti (driver version 440.59, CUDA version 10.2, compute capability 7.5, 4096MB, 3972MB available, 5668 GFLOPS peak)
[---] OpenCL: NVIDIA GPU 0: GeForce RTX 2060 (driver version 440.59, device version OpenCL 1.2 CUDA, 5932MB, 3970MB available, 6739 GFLOPS peak)
[---] OpenCL: NVIDIA GPU 1: GeForce GTX 1660 Ti (driver version 440.59, device version OpenCL 1.2 CUDA, 5945MB, 3972MB available, 5668 GFLOPS peak)
[---] OpenCL CPU: pthread-Intel(R) Core(TM) i7-3960X CPU @ 3.30GHz (OpenCL driver vendor: The pocl project, driver version 1.4, device version OpenCL 1.2 pocl HSTR: pthread-x86_64-unknown-linux-gnu-sandybridge)
[SETI@home] Found app_info.xml; using anonymous platform
[---] libc: GNU libc version 2.31
[---] Host name: erb1
[---] Processor: 12 GenuineIntel Intel(R) Core(TM) i7-3960X CPU @ 3.30GHz [Family 6 Model 45 Stepping 7]
[---] Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm epb ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts md_clear flush_l1d
[---] OS: Linux openSUSE: openSUSE Tumbleweed [5.5.4-cstm|libc 2.31 (GNU libc)]
[---] Memory: 31.29 GB physical, 2.00 GB virtual
[---] Disk: 80.10 GB total, 6.77 GB free
[---] Local time is UTC -7 hours
[---] VirtualBox version: 6.1.2r135662
[---] Config: use all coprocessors
[Milkyway@Home] General prefs: from Milkyway@Home (last modified 02-Jun-2020 19:14:24)
[Milkyway@Home] Computer location: home
[Milkyway@Home] General prefs: no separate prefs for home; using your defaults
[---] Reading preferences override file
[---] Preferences:
[---] max memory usage when active: 32041.71 MB
[---] max memory usage when idle: 32041.71 MB
[---] max disk usage: 4.40 GB
[---] max CPUs used: 10
[---] (to change preferences, visit a project web site or select Preferences in the Manager)
[---] Setting up project and slot directories
[---] Checking active tasks
[Milkyway@Home] URL http://milkyway.cs.rpi.edu/milkyway/; Computer ID 852607; resource share 100
[SETI@home] URL http://setiathome.berkeley.edu/; Computer ID 8730567; resource share 100
[---] Setting up GUI RPC socket
[---] Checking presence of 31 project files
Initialization completed
[SETI@home] Sending scheduler request: To fetch work.
[SETI@home] Requesting new tasks for NVIDIA GPU
[SETI@home] Scheduler request completed: got 0 new tasks
[SETI@home] Project has no tasks available
[SETI@home] Project requested delay of 87264 seconds
[Milkyway@Home] Sending scheduler request: To fetch work.
[Milkyway@Home] Requesting new tasks for NVIDIA GPU
[Milkyway@Home] Scheduler request completed: got 0 new tasks
[Milkyway@Home] Not sending work - last request too recent: 77 sec
[Milkyway@Home] Project requested delay of 91 seconds
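A first step toward "what is preventing the GPU from running" is confirming that the client sees the cards at all. The following is a minimal Python sketch that pulls the detected GPUs out of a pasted startup log. It assumes the line wording shown in the log above (`[---] OpenCL: NVIDIA GPU 1: ...`); other BOINC client versions may phrase these lines differently, and the function names are hypothetical helpers, not part of BOINC.

```python
import re
from typing import NamedTuple

# Matches startup lines in the form shown in the log above, e.g.
# "[---] OpenCL: NVIDIA GPU 1: GeForce GTX 1660 Ti (driver version 440.59, ...)".
# Assumption: other BOINC client versions may word these lines differently.
GPU_LINE = re.compile(
    r"^\[---\]\s+(?P<api>CUDA|OpenCL):\s+(?P<vendor>\S+)\s+GPU\s+(?P<index>\d+):\s+"
    r"(?P<model>[^(]+?)\s*\("
)


class DetectedGpu(NamedTuple):
    api: str      # "CUDA" or "OpenCL"
    vendor: str   # e.g. "NVIDIA"
    index: int    # device number as BOINC counts it
    model: str    # e.g. "GeForce RTX 2060"


def detected_gpus(log_text: str) -> list[DetectedGpu]:
    """Every GPU the client reported at startup, one entry per API line."""
    found = []
    for line in log_text.splitlines():
        m = GPU_LINE.match(line.strip())
        if m:
            found.append(DetectedGpu(m["api"], m["vendor"], int(m["index"]), m["model"]))
    return found


def opencl_gpu_indices(log_text: str, vendor: str = "NVIDIA") -> set[int]:
    """Indices of GPUs visible through OpenCL, which the opencl_nvidia apps rely on."""
    return {g.index for g in detected_gpus(log_text) if g.api == "OpenCL" and g.vendor == vendor}
```

On the log above this finds both the RTX 2060 and the GTX 1660 Ti under CUDA and under OpenCL, so in this case detection was not the problem; the "OpenCL CPU" (pocl) line is deliberately not matched.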
Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0
The last part is the key: "[Milkyway@Home] Not sending work - last request too recent: 77 sec". MilkyWay REQUIRES 10 minutes without a work request before it will send you more GPU work. Set up a zero-resource-share project and run a couple of its workunits until MilkyWay refills the cache. CPU workunits do not have this problem, only GPU workunits.
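To make the 10-minute rule concrete, here is a small Python sketch that reads a "too recent" reply from the Event Log and works out how much longer to stay quiet. It rests on two labeled assumptions: that the number in the message is the time elapsed since the previous request, and that the threshold is the 600 s (10 minutes) described in the reply above rather than a documented server setting. The function names are hypothetical.

```python
import re
from typing import Optional

# Assumption: 600 s is the "10 minutes" rule described in the reply above,
# not a documented MilkyWay server constant.
ASSUMED_MIN_GAP_S = 600

# Matches e.g. "[Milkyway@Home] Not sending work - last request too recent: 77 sec"
TOO_RECENT = re.compile(r"last request too recent:\s*(\d+)\s*sec")


def seconds_since_last_request(line: str) -> Optional[int]:
    """Seconds reported in a 'too recent' reply, or None for any other line.

    Assumption: the reported number is the time since the previous request.
    """
    m = TOO_RECENT.search(line)
    return int(m.group(1)) if m else None


def seconds_still_to_wait(line: str, min_gap: int = ASSUMED_MIN_GAP_S) -> Optional[int]:
    """How long to avoid asking for GPU work again, under the assumed rule."""
    elapsed = seconds_since_last_request(line)
    if elapsed is None:
        return None
    return max(0, min_gap - elapsed)
```

For the line in the first post this gives 600 - 77 = 523 seconds. Lines that are not a "too recent" reply, such as "Project requested delay of 91 seconds", return None.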
Joined: 24 Jan 11 Posts: 715 Credit: 556,864,830 RAC: 43,467
Quote: "Also, as an aside, how does one view a specific computer on the website? I can't seem to sort by computer ID, so finding a particular computer in my set of computers is like searching for a needle in a haystack."

I don't understand this at all. Log in to MW, go to your account's main page, and click the Computers link: https://milkyway.cs.rpi.edu/milkyway/hosts_user.php. Voila! All your computers are listed, along with their assigned network names, so it is easy to figure out which computer is which.

If you are constantly running out of work and the 10-minute backoff bugs you too much, you can always run JStateson's modified client, which removes that aggravation.
Joined: 26 May 20 Posts: 23 Credit: 673,618,527 RAC: 34,135
Hi, thanks for the info. I checked, and the last NVIDIA task was reported on June 1; all the others are waiting. I have plenty of tasks, some NVIDIA and some CPU. But why is BOINC ignoring the NVIDIA tasks? All my NVIDIA tasks are in the "Waiting to run" state. I have one N-Body Simulation 1.76 task running (on 12 CPUs) and that's it. In fact, as I write this, it finished and started another similar N-Body task.
Joined: 26 May 20 Posts: 23 Credit: 673,618,527 RAC: 34,135
Well, I gave up and did a project reset, and lo and behold, the NVIDIA apps are running now :-) Yay!
Joined: 24 Jan 11 Posts: 715 Credit: 556,864,830 RAC: 43,467
You starved the GPUs by taking away all their CPU support: the N-Body tasks were running without any limit. A GPU task needs at least part of a CPU thread to feed it data, so if all your CPU threads are busy with N-Body, the GPU tasks are forced into "Waiting to run". No mystery here; BOINC did exactly what it was supposed to do. If you want to run both types of work, you need to stop the N-Body tasks from taking all the CPU threads. Read the documentation on configuring multithreaded (mt) N-Body tasks: https://boinc.berkeley.edu/wiki/Client_configuration#Application_configuration
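As a rough illustration of "limit the N-Body tasks", here is a Python sketch that reserves CPU for the GPU tasks and writes an app_config.xml using the `<app_version>`, `<avg_ncpus>` and `<cmdline>` elements from the documentation linked above. Several details are assumptions, not confirmed in this thread: the N-Body app's short name `milkyway_nbody`, its `mt` plan class, and its `--nthreads` option (check the `<app_name>` and `<plan_class>` entries in client_state.xml before relying on them). The thread budget is a rule of thumb, not a model of BOINC's scheduler.

```python
import math
import xml.etree.ElementTree as ET

# Assumptions (not confirmed in the thread): the N-Body app's short name is
# "milkyway_nbody", its plan class is "mt", and it accepts "--nthreads N".
# Check <app_name> and <plan_class> in client_state.xml first.
NBODY_APP_NAME = "milkyway_nbody"
NBODY_PLAN_CLASS = "mt"


def nbody_thread_budget(max_cpus: int, gpu_tasks: int, cpus_per_gpu_task: float) -> int:
    """Threads for one N-Body task so the GPU tasks keep CPU time to feed them.

    Rule of thumb only: reserve the GPU tasks' combined CPU claim, rounded up
    to whole threads, and give N-Body the rest (never fewer than one thread).
    """
    reserved = math.ceil(gpu_tasks * cpus_per_gpu_task)
    return max(1, max_cpus - reserved)


def nbody_app_config(nthreads: int) -> str:
    """Build app_config.xml text that caps the N-Body mt task at nthreads."""
    root = ET.Element("app_config")
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = NBODY_APP_NAME
    ET.SubElement(version, "plan_class").text = NBODY_PLAN_CLASS
    # avg_ncpus tells the client how many CPUs to budget for the task;
    # --nthreads tells the app how many threads to start (assumed option).
    ET.SubElement(version, "avg_ncpus").text = str(nthreads)
    ET.SubElement(version, "cmdline").text = f"--nthreads {nthreads}"
    ET.indent(root)  # Python 3.9+
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Figures from the first post: "max CPUs used: 10", two NVIDIA GPUs,
    # and "0.965 CPUs + 1 NVIDIA GPU" per Separation task.
    threads = nbody_thread_budget(max_cpus=10, gpu_tasks=2, cpus_per_gpu_task=0.965)
    print(nbody_app_config(threads))
```

With the first post's numbers, two GPU tasks claim 1.93 CPUs, rounded up to 2, leaving 8 threads for N-Body. The output would go in the MilkyWay project directory under the BOINC data directory (typically projects/milkyway.cs.rpi.edu_milkyway/app_config.xml), followed by Options > Read config files in BOINC Manager.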
Joined: 12 Nov 20 Posts: 4 Credit: 11,812,952 RAC: 1
Hi everyone. I set up an i7-3770K running Ubuntu 20.04 64-bit with a GTX 1050 (driver 450) and 16 GB of RAM. I don't receive any MilkyWay WUs for the GPU, although all settings should be correct, even after waiting 10 minutes with no GPU work pending, as suggested. It gets WUs from PrimeGrid, SRBase and Moo! Wrapper, but none from MilkyWay, Collatz, WCG or Einstein. That's why I switched to the older 450 driver instead of 460, but it didn't help. Does anyone have an idea what the problem may be? The 1050 is on the list of supported GPUs under Linux for MW.
Joined: 12 Nov 20 Posts: 4 Credit: 11,812,952 RAC: 1
Also, I now have a problem on my Ryzen 9 3900X (16 GB, GTX 1660, Windows 10 64-bit): the MilkyWay GPU WUs all error out after 2 seconds. MW and MLC seem to be the only affected projects. A while ago everything worked fine.
Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0
Quote: "I don't receive any Milkyway WU's for the GPU although all settings should be correct even after waiting 10 minutes as suggested with no work pending for the GPU. ... Does anyone have an idea what the problem may be?"

Nvidia's 450 driver series was problematic, I think; try upgrading a little. Another thought is to not fetch work from, or to suspend tasks from, any other GPU project while trying to get MilkyWay tasks.
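The "hold the other GPU projects" suggestion can be scripted with boinccmd's per-project operations (`boinccmd --project URL op`). This Python sketch only builds and optionally runs those commands; it defaults to a dry run that prints them. The MilkyWay URL comes from the event log earlier in the thread, while the other project URL in the usage line is a hypothetical placeholder. boinccmd may need to be run from the BOINC data directory (or given `--passwd`) to be allowed to control the client.

```python
import subprocess

# Project URL as it appears in the event log earlier in the thread.
MILKYWAY_URL = "http://milkyway.cs.rpi.edu/milkyway/"

# A subset of the per-project operations accepted by "boinccmd --project URL op".
PROJECT_OPS = {"suspend", "resume", "update", "nomorework", "allowmorework"}


def project_cmd(url: str, op: str) -> list[str]:
    """argv for `boinccmd --project URL op`."""
    if op not in PROJECT_OPS:
        raise ValueError(f"unsupported project operation: {op}")
    return ["boinccmd", "--project", url, op]


def focus_gpu_fetch(target_url: str, other_gpu_urls: list[str],
                    hold_op: str = "suspend") -> list[list[str]]:
    """Hold the other GPU projects, then ask the target project for work.

    hold_op="nomorework" only stops new downloads from the other projects;
    "suspend" also pauses the tasks they already have on this host.
    """
    cmds = [project_cmd(url, hold_op) for url in other_gpu_urls]
    cmds.append(project_cmd(target_url, "update"))
    return cmds


def run_all(cmds: list[list[str]], dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them one by one."""
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical placeholder URL for "another GPU project".
    run_all(focus_gpu_fetch(MILKYWAY_URL, ["https://example.org/other_gpu_project/"]))
```

Given the 10-minute rule mentioned earlier, the final `update` is only worth sending once MilkyWay has not been asked for work for a while; afterwards, `resume` or `allowmorework` restores the other projects.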
Joined: 12 Nov 20 Posts: 4 Credit: 11,812,952 RAC: 1
I don't have any suspended tasks, but my Linux PC still doesn't receive any MW GPU units. Before I reverted to the 450 driver I had 460 installed, and the system ran without problems, so I think it is unlikely that the problem is driver-related. But to make sure, I will update the driver tonight once the work cache is empty. My Windows PC has the latest drivers, but the MW GPU WUs still fail instantly. Has something changed in the GPU WUs recently?
Joined: 24 Jan 11 Posts: 715 Credit: 556,864,830 RAC: 43,467
On Windows, always reinstall the drivers using the package downloaded directly from Nvidia. Always check with clinfo that the OpenCL component of the driver is installed; clinfo is available for both Linux and Windows.
Joined: 12 Nov 20 Posts: 4 Credit: 11,812,952 RAC: 1
On the Linux system I installed the newest driver; still no change. According to clinfo, OpenCL 1.2 is installed:

Platform Name                                   NVIDIA CUDA
Number of devices                               1
  Device Name                                   GeForce GTX 1050
  Device Vendor                                 NVIDIA Corporation
  Device Vendor ID                              0x10de
  Device Version                                OpenCL 1.2 CUDA
  Driver Version                                460.73.01
  Device OpenCL C Version                       OpenCL C 1.2
  Device Type                                   GPU
  Device Topology (NV)                          PCI-E, 01:00.0
  Device Profile                                FULL_PROFILE
  Device Available                              Yes
  Compiler Available                            Yes
  Linker Available                              Yes
  Max compute units                             5
  Max clock frequency                           1468MHz
  Compute Capability (NV)                       6.1
  Device Partition                              (core)
    Max number of sub-devices                   1
    Supported partition types                   None
    Supported affinity domains                  (n/a)
  Max work item dimensions                      3
  Max work item sizes                           1024x1024x64
  Max work group size                           1024
  Preferred work group size multiple            32
  Warp size (NV)                                32
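The clinfo check suggested above can be automated. Below is a minimal Python sketch that splits clinfo's default text output into one dictionary per device and reports whether an available OpenCL GPU from a given vendor is present. It assumes the "key, two or more spaces, value" layout shown in the output above, which can vary slightly between clinfo versions; the helper names are hypothetical. A positive result only shows the OpenCL driver is in place; as this thread shows, BOINC can still decline to fetch or run GPU work for other reasons.

```python
import re
import shutil
import subprocess

# clinfo's default output is "key<2+ spaces>value", indented per section.
# Assumption: this layout, which can differ slightly between clinfo versions.
FIELD = re.compile(r"^\s*(\S.*?)\s{2,}(\S.*)$")


def parse_clinfo(text: str) -> list[dict[str, str]]:
    """Split clinfo output into one dict per device, starting at each 'Device Name'."""
    devices: list[dict[str, str]] = []
    current = None
    for line in text.splitlines():
        m = FIELD.match(line)
        if not m:
            continue
        key, value = m.group(1), m.group(2).strip()
        if key == "Device Name":
            current = {}
            devices.append(current)
        if current is not None:
            current.setdefault(key, value)  # keep the first value seen per key
    return devices


def has_opencl_gpu(devices: list[dict[str, str]], vendor: str = "NVIDIA") -> bool:
    """True if an available OpenCL GPU from the given vendor is listed."""
    return any(
        vendor.lower() in d.get("Device Vendor", "").lower()
        and "GPU" in d.get("Device Type", "")
        and d.get("Device Available") == "Yes"
        for d in devices
    )


def run_clinfo() -> str:
    """Run clinfo and return its text output (clinfo must be installed)."""
    exe = shutil.which("clinfo")
    if exe is None:
        raise FileNotFoundError("clinfo is not on PATH")
    return subprocess.run([exe], capture_output=True, text=True, check=True).stdout
```

On a machine with clinfo installed, `has_opencl_gpu(parse_clinfo(run_clinfo()))` should be True for output like the GTX 1050 listing above.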
Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0
Quote: "On the Linux system I installed the newest driver, still no change. According to CL-info OpenCL 1.2 is installed."

Move a couple of versions up or down from 460 and see if it helps.
Joined: 24 Jan 11 Posts: 715 Credit: 556,864,830 RAC: 43,467
Post the output from the Event Log when requesting work with the sched_op_debug flag set. How many seconds of GPU work are you requesting? Also post the start of the Event Log after launching BOINC, to be sure the GPU is detected. If those are posted and nothing is self-evident, go back and set work_fetch_debug to show why BOINC is not requesting MW work.
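Both debug flags live in the `<log_flags>` section of cc_config.xml in the BOINC data directory. This Python sketch merges them into an existing cc_config.xml (or creates a new one) without discarding other settings such as `<options>`; the function name is a hypothetical helper. Note that Python's XML parser drops comments, so any comments in an existing file are not preserved. After writing the file, reload it with Options > Read config files in BOINC Manager, or `boinccmd --read_cc_config`.

```python
import xml.etree.ElementTree as ET


def enable_log_flags(existing_xml: str, flags: list[str]) -> str:
    """Return cc_config.xml text with each flag set to 1 under <log_flags>.

    existing_xml may be "" when there is no cc_config.xml yet; other settings
    already in the file (e.g. <options>) are kept, but XML comments are not.
    """
    root = ET.fromstring(existing_xml) if existing_xml.strip() else ET.Element("cc_config")
    log_flags = root.find("log_flags")
    if log_flags is None:
        log_flags = ET.SubElement(root, "log_flags")
    for flag in flags:
        element = log_flags.find(flag)
        if element is None:
            element = ET.SubElement(log_flags, flag)
        element.text = "1"
    ET.indent(root)  # Python 3.9+
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Flags named in the reply above: scheduler requests first, then work fetch.
    print(enable_log_flags("", ["sched_op_debug", "work_fetch_debug"]))
```

Turning the flags off again is just a matter of setting them back to 0 (or deleting them) and re-reading the config files, since both produce a lot of Event Log output.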
©2024 Astroinformatics Group