Welcome to MilkyWay@home

MW stopped using my nvidia GPU



Questions and Answers : Unix/Linux : MW stopped using my nvidia GPU
Cat22
Joined: 26 May 20
Posts: 12
Credit: 74,139,392
RAC: 542,129
Message 69884 - Posted: 3 Jun 2020, 2:34:19 UTC

openSuse Linux (Tumbleweed) x86_64, nvidia driver 440.59

This was going along fine, but at least a day ago all the nvidia tasks went to "waiting to run" status and have not run since. The Activity menu options are all set to "Always" and nothing is suspended. There is no app_info.xml, no app_config.xml, no cc_config.xml, etc., just raw MW and BOINC. Computing preferences allow for 1 CPU per GPU (I have 2 nvidia cards), so that should be OK. The nvidia driver hasn't changed. I just double-checked that I am still part of the video group, so that's not it.
How can I determine what is preventing the GPU from running?
Also, as an aside, how does one view a specific computer on the website? I can't seem to sort by computer ID, so searching for a particular computer in my set of computers is like searching for a needle in a haystack.

Example:
Application Milkyway@home Separation 1.46 (opencl_nvidia_101)
Name de_modfit_86_bundle4_4s_south4s_bgset_2_1588605902_20177642
State Waiting to run
Received Sun 31 May 2020 09:17:52 PM PDT
Report deadline Fri 12 Jun 2020 09:17:51 PM PDT
Resources 0.965 CPUs + 1 NVIDIA GPU
Estimated computation size 42,135 GFLOPs
CPU time ---
CPU time since checkpoint ---
Elapsed time ---
Estimated time remaining 00:03:05
Fraction done 0.000%
Virtual memory size 9.97 GB
Working set size 499.16 MB
Directory slots/2
Executable milkyway_1.46_x86_64-pc-linux-gnu__opencl_nvidia_101

Here is a BOINC restart I just did to see if that would help (it didn't):
[---] Starting BOINC client version 7.17.0 for x86_64-pc-linux-gnu
[---] This a development version of BOINC and may not function properly
[---] log flags: file_xfer, sched_ops, task
[---] Libraries: libcurl/7.70.0 OpenSSL/1.1.1g-fips zlib/1.2.11 libidn2/2.3.0 libpsl/0.21.0 (+libidn2/2.3.0) libssh/0.9.3/openssl/zlib nghttp2/1.40.0
[---] Data directory: /home/erbenton/boinc
[---] CUDA: NVIDIA GPU 0: GeForce RTX 2060 (driver version 440.59, CUDA version 10.2, compute capability 7.5, 4096MB, 3970MB available, 6739 GFLOPS peak)
[---] CUDA: NVIDIA GPU 1: GeForce GTX 1660 Ti (driver version 440.59, CUDA version 10.2, compute capability 7.5, 4096MB, 3972MB available, 5668 GFLOPS peak)
[---] OpenCL: NVIDIA GPU 0: GeForce RTX 2060 (driver version 440.59, device version OpenCL 1.2 CUDA, 5932MB, 3970MB available, 6739 GFLOPS peak)
[---] OpenCL: NVIDIA GPU 1: GeForce GTX 1660 Ti (driver version 440.59, device version OpenCL 1.2 CUDA, 5945MB, 3972MB available, 5668 GFLOPS peak)
[---] OpenCL CPU: pthread-Intel(R) Core(TM) i7-3960X CPU @ 3.30GHz (OpenCL driver vendor: The pocl project, driver version 1.4, device version OpenCL 1.2 pocl HSTR: pthread-x86_64-unknown-linux-gnu-sandybridge)
[SETI@home] Found app_info.xml; using anonymous platform
[---] libc: GNU libc version 2.31
[---] Host name: erb1
[---] Processor: 12 GenuineIntel Intel(R) Core(TM) i7-3960X CPU @ 3.30GHz [Family 6 Model 45 Stepping 7]
[---] Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm epb ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm ida arat pln pts md_clear flush_l1d
[---] OS: Linux openSUSE: openSUSE Tumbleweed [5.5.4-cstm|libc 2.31 (GNU libc)]
[---] Memory: 31.29 GB physical, 2.00 GB virtual
[---] Disk: 80.10 GB total, 6.77 GB free
[---] Local time is UTC -7 hours
[---] VirtualBox version: 6.1.2r135662
[---] Config: use all coprocessors
[Milkyway@Home] General prefs: from Milkyway@Home (last modified 02-Jun-2020 19:14:24)
[Milkyway@Home] Computer location: home
[Milkyway@Home] General prefs: no separate prefs for home; using your defaults
[---] Reading preferences override file
[---] Preferences:
[---]    max memory usage when active: 32041.71 MB
[---]    max memory usage when idle: 32041.71 MB
[---]    max disk usage: 4.40 GB
[---]    max CPUs used: 10
[---]    (to change preferences, visit a project web site or select Preferences in the Manager)
[---] Setting up project and slot directories
[---] Checking active tasks
[Milkyway@Home] URL http://milkyway.cs.rpi.edu/milkyway/; Computer ID 852607; resource share 100
[SETI@home] URL http://setiathome.berkeley.edu/; Computer ID 8730567; resource share 100
[---] Setting up GUI RPC socket
[---] Checking presence of 31 project files
Initialization completed
[SETI@home] Sending scheduler request: To fetch work.
[SETI@home] Requesting new tasks for NVIDIA GPU
[SETI@home] Scheduler request completed: got 0 new tasks
[SETI@home] Project has no tasks available
[SETI@home] Project requested delay of 87264 seconds
[Milkyway@Home] Sending scheduler request: To fetch work.
[Milkyway@Home] Requesting new tasks for NVIDIA GPU
[Milkyway@Home] Scheduler request completed: got 0 new tasks
[Milkyway@Home] Not sending work - last request too recent: 77 sec
[Milkyway@Home] Project requested delay of 91 seconds
mikey
Joined: 8 May 09
Posts: 2408
Credit: 450,647,155
RAC: 27,373
Message 69886 - Posted: 3 Jun 2020, 10:25:59 UTC - in response to Message 69884.  

[Milkyway@Home] Requesting new tasks for NVIDIA GPU
[Milkyway@Home] Scheduler request completed: got 0 new tasks
[Milkyway@Home] Not sending work - last request too recent: 77 sec
[Milkyway@Home] Project requested delay of 91 seconds

The last part is the key: "[Milkyway@Home] Not sending work - last request too recent: 77 sec". MilkyWay REQUIRES 10 minutes of not asking for new work before it will send you more GPU work. Set up a zero resource share project and run a couple of its workunits until MilkyWay refills the cache. CPU workunits do not have this problem, just GPU workunits.
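
If you want to see how often this is happening, something like the following rough Python sketch could pull the relevant numbers out of the client log. It only matches the two message formats quoted in this thread; the function name `summarize_backoff` and the sample text are made up for illustration, and other client versions may word these lines differently.

```python
import re

# Patterns for the two client log messages quoted in this thread.
TOO_RECENT = re.compile(
    r"\[(?P<project>[^\]]+)\] Not sending work - last request too recent: (?P<sec>\d+) sec"
)
DELAY = re.compile(
    r"\[(?P<project>[^\]]+)\] Project requested delay of (?P<sec>\d+) seconds"
)


def summarize_backoff(log_text):
    """Collect, per project, the 'too recent' and 'requested delay' values in seconds."""
    summary = {}
    for line in log_text.splitlines():
        for key, pattern in (("too_recent", TOO_RECENT), ("delay", DELAY)):
            match = pattern.search(line)
            if match:
                entry = summary.setdefault(
                    match["project"], {"too_recent": [], "delay": []}
                )
                entry[key].append(int(match["sec"]))
    return summary


# Sample lines copied from the log posted above.
SAMPLE_LOG = """\
[SETI@home] Project requested delay of 87264 seconds
[Milkyway@Home] Requesting new tasks for NVIDIA GPU
[Milkyway@Home] Scheduler request completed: got 0 new tasks
[Milkyway@Home] Not sending work - last request too recent: 77 sec
[Milkyway@Home] Project requested delay of 91 seconds
"""

if __name__ == "__main__":
    for project, info in summarize_backoff(SAMPLE_LOG).items():
        print(project, info)
```

Feeding it a longer stretch of the event log would show whether the "too recent" refusals keep repeating or were a one-off after the restart.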
Keith Myers
Joined: 24 Jan 11
Posts: 335
Credit: 215,189,368
RAC: 320,541
Message 69887 - Posted: 3 Jun 2020, 18:54:22 UTC - in response to Message 69884.  

Also, as an aside, how does one view the specific computer on the website? I can't seem to sort by computer id so searching for a particular computer in my set of computers is like searching for a needle in a haystack

Don't understand this at all. Log in to MW, go to your account main page, and click the computers link on the page. https://milkyway.cs.rpi.edu/milkyway/hosts_user.php
Voila! All your computers are listed, even with their assigned network names. Easy to figure out which computer is which.
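
If you would rather jump straight to one machine, the client log already prints each project's URL and Computer ID at startup. Here is a small Python sketch that turns those lines into direct links. It assumes the project serves the usual BOINC `show_host_detail.php?hostid=...` page; the function name `host_detail_urls` is just made up for this example.

```python
import re

# Matches startup lines such as:
# [Milkyway@Home] URL http://milkyway.cs.rpi.edu/milkyway/; Computer ID 852607; resource share 100
COMPUTER_ID = re.compile(
    r"\[(?P<project>[^\]]+)\] URL (?P<url>\S+); Computer ID (?P<host_id>\d+);"
)


def host_detail_urls(log_text):
    """Map each project name to a host detail link built from its log line."""
    urls = {}
    for match in COMPUTER_ID.finditer(log_text):
        base = match["url"].rstrip("/")
        # show_host_detail.php is assumed to exist on the project's web site.
        urls[match["project"]] = (
            f"{base}/show_host_detail.php?hostid={match['host_id']}"
        )
    return urls


# Sample lines copied from the log posted earlier in the thread.
SAMPLE_LOG = """\
[Milkyway@Home] URL http://milkyway.cs.rpi.edu/milkyway/; Computer ID 852607; resource share 100
[SETI@home] URL http://setiathome.berkeley.edu/; Computer ID 8730567; resource share 100
"""

if __name__ == "__main__":
    for project, url in host_detail_urls(SAMPLE_LOG).items():
        print(project, url)
```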

If you are constantly running out of work and the 10 minute backoff bugs you too much, you can always run JStateson's modified client which removes that aggravation.
Cat22
Message 69888 - Posted: 4 Jun 2020, 6:03:43 UTC - in response to Message 69887.  

Hi,
thanks for the info. I checked and the last nvidia task was sent in on June 1; all the others are waiting.
I have plenty of tasks, some nvidia, some CPU. But why is BOINC ignoring the nvidia tasks?
All my nvidia tasks are in state "waiting to run". I have 1 nbody simulation 1.76 task running (12 CPUs)
and that's it. In fact, as I write this it finished and started another similar nbody task.
Cat22
Message 69889 - Posted: 4 Jun 2020, 6:20:43 UTC - in response to Message 69888.  

Well, I gave up and did a 'project reset' and lo and behold the nvidia apps are running now :-) yaaaa
Keith Myers
Message 69890 - Posted: 4 Jun 2020, 7:07:34 UTC

You starved the GPUs by running the nbody tasks without any limit, which took away all their CPU support. A GPU task needs at least some part of a CPU to feed it data. If all your CPU threads were busy with nbody, then the GPU tasks will be forced into waiting to run. No mystery here, BOINC did exactly what it was supposed to. If you want to run both types of work, you need to limit the nbody tasks from taking all the CPU threads. Read the documentation pertaining to nbody MT (multi-threaded) configuration.
https://boinc.berkeley.edu/wiki/Client_configuration#Application_configuration
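
As a rough illustration of what that page describes, here is a Python sketch that writes out an `app_config.xml` capping the multi-threaded app at a fixed number of threads. The element names (`app_version`, `app_name`, `plan_class`, `avg_ncpus`, `cmdline`) come from the linked documentation. The app name `milkyway_nbody`, the plan class `mt`, the `--nthreads` option, and the choice of 4 threads are assumptions, so check them against your own client's state before using anything like this.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, plan_class, ncpus):
    """Build an <app_config> tree that limits one app version to ncpus threads."""
    root = ET.Element("app_config")
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    ET.SubElement(version, "plan_class").text = plan_class
    ET.SubElement(version, "avg_ncpus").text = str(ncpus)
    # Assumed option: tells the multi-threaded app how many threads to start.
    ET.SubElement(version, "cmdline").text = f"--nthreads {ncpus}"
    return root


def to_xml(root):
    """Serialize the tree, pretty-printing where the Python version supports it."""
    if hasattr(ET, "indent"):  # ET.indent exists from Python 3.9 onward
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Assumed values: app name, plan class, and thread count are illustrative.
    print(to_xml(build_app_config("milkyway_nbody", "mt", 4)))
```

Per the linked documentation, the resulting file goes in the project's directory under the BOINC data directory, and the client has to re-read its config files (or be restarted) before the limit takes effect. Leaving a couple of threads free this way should give the GPU tasks the CPU share they need.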
