another scheduler update
Author | Message |
---|---|
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
Updated the scheduler yet again. I just want to double-check: which of the following are people having since the update? 1. non-GPU hosts are getting GPU workunits. 2. hosts with ATI GPUs that don't have the compute capability are getting GPU workunits. 3. hosts with ATI GPUs aren't getting workunits. Is anyone having problems with NVIDIA GPUs, or is this just an ATI thing? --Travis |
Send message Joined: 11 Oct 09 Posts: 19 Credit: 202,475,569 RAC: 0 |
#3. But it just started again. |
Send message Joined: 7 Jun 08 Posts: 464 Credit: 56,639,936 RAC: 0 |
4.) Winbox CPU host not getting anything. |
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
"#3. But it just started again." Made an update; let me know if this lets you get some ATI GPU workunits. |
Send message Joined: 16 Jun 08 Posts: 93 Credit: 366,882,323 RAC: 0 |
"3. hosts with ATI GPUs aren't getting workunits..." I'm in that boat. My 5870 host got its last WU at about 1950 UTC. |
Send message Joined: 14 Jul 12 Posts: 2 Credit: 393,940 RAC: 0 |
I fall under this category: 2. hosts with ATI GPUs that don't have the compute capability are getting GPU workunits. |
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
"4.) Winbox CPU host not getting anything." Looks like you just got some? From the scheduler (I XXXed out your IP and host id just in case you're hiding those):
2013-06-11 19:32:15.5706 [PID=22629] Request: [USER#5696] [HOST#XXXXX] [IP XXXXXXXX] client 6.12.34
2013-06-11 19:32:15.5955 [PID=22629] [send] [HOST#XXXX] app version 321 is reliable
2013-06-11 19:32:15.5955 [PID=22629] [send] set_trust: random choice for cons valid 1165: yes
2013-06-11 19:32:15.5955 [PID=22629] [send] [AV#385] not reliable; cons valid 0 < 10
2013-06-11 19:32:15.5955 [PID=22629] [send] set_trust: cons valid 0 < 10, don't use single replication
2013-06-11 19:32:15.5955 [PID=22629] [send] [HOST#XX] app version 398 is reliable
2013-06-11 19:32:15.5955 [PID=22629] [send] set_trust: random choice for cons valid 76: yes
2013-06-11 19:32:15.5955 [PID=22629] [send] [HOST#XX] app version 418 is reliable
2013-06-11 19:32:15.5955 [PID=22629] [send] set_trust: random choice for cons valid 17442: yes
2013-06-11 19:32:15.5955 [PID=22629] [send] [AV#430] not reliable; cons valid 0 < 10
2013-06-11 19:32:15.5955 [PID=22629] [send] set_trust: cons valid 0 < 10, don't use single replication
2013-06-11 19:32:15.5955 [PID=22629] [send] [AV#436] not reliable; cons valid 0 < 10
2013-06-11 19:32:15.5955 [PID=22629] [send] set_trust: cons valid 0 < 10, don't use single replication
2013-06-11 19:32:15.5955 [PID=22629] [send] [AV#438] not reliable; cons valid 1 < 10
2013-06-11 19:32:15.5955 [PID=22629] [send] set_trust: cons valid 1 < 10, don't use single replication
2013-06-11 19:32:15.5955 [PID=22629] [send] [HOST#XX] app version 445 is reliable
2013-06-11 19:32:15.5956 [PID=22629] [send] set_trust: random choice for cons valid 23: yes
2013-06-11 19:32:15.5956 [PID=22629] [send] [HOST#XX] app version 451 is reliable
2013-06-11 19:32:15.5956 [PID=22629] [send] set_trust: random choice for cons valid 148: yes
2013-06-11 19:32:15.5956 [PID=22629] [send] [HOST#XX] app version 485 is reliable
2013-06-11 19:32:15.5956 [PID=22629] [send] set_trust: random choice for cons valid 510: yes
2013-06-11 19:32:15.5956 [PID=22629] [send] [AV#3000002] not reliable; cons valid 0 < 10
2013-06-11 19:32:15.5956 [PID=22629] [send] set_trust: cons valid 0 < 10, don't use single replication
2013-06-11 19:32:15.5956 [PID=22629] [quota] effective ncpus 4 ngpus 1
2013-06-11 19:32:15.5956 [PID=22629] [quota] max jobs per RPC: 400
2013-06-11 19:32:15.5956 [PID=22629] [quota] Overall limits on jobs in progress:
2013-06-11 19:32:15.5956 [PID=22629] [quota] CPU: base 3 scaled 12 njobs 0
2013-06-11 19:32:15.5956 [PID=22629] [quota] GPU: base 40 scaled 40 njobs 38
2013-06-11 19:32:15.5956 [PID=22629] [send] Not using matchmaker scheduling; Not using EDF sim
2013-06-11 19:32:15.5956 [PID=22629] [send] CPU: req 0.00 sec, 0.00 instances; est delay 0.00
2013-06-11 19:32:15.5956 [PID=22629] [send] AMD/ATI GPU: req 81174.22 sec, 0.00 instances; est delay 0.00
2013-06-11 19:32:15.5956 [PID=22629] [send] work_req_seconds: 0.00 secs
2013-06-11 19:32:15.5956 [PID=22629] [send] available disk 2.82 GB, work_buf_min 86400
2013-06-11 19:32:15.5957 [PID=22629] [send] active_frac 0.945916 on_frac 0.996949
2013-06-11 19:32:15.5957 [PID=22629] [send] CPU features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 htt pni cx16 syscall nx lm svm sse4a osvw ibs skinit wdt page1gb rdtscp 3dnowext 3dnow
2013-06-11 19:32:15.5984 [PID=22629] [version] looking for version of milkyway
2013-06-11 19:32:15.5984 [PID=22629] [version] Checking plan class 'ati14'
2013-06-11 19:32:15.5984 [PID=22629] [version] Couldn't open plan class spec file '../plan_class_spec.xml'
2013-06-11 19:32:15.5984 [PID=22629] [version] ati14 ATI app projected 51.07G peak 5775.35G 0.963 CPUs
2013-06-11 19:32:15.5984 [PID=22629] [quota] [AV#485] scaled max jobs per day: 10510
2013-06-11 19:32:15.5984 [PID=22629] [version] [AV#485] (ati14) setting projected flops based on host elapsed time avg: 437.54G
2013-06-11 19:32:15.5984 [PID=22629] [version] [AV#485] (ati14) comparison pfc: 437.57G et: 437.54G
2013-06-11 19:32:15.5985 [PID=22629] [version] Best app version is now AV485 (437.61 GFLOP)
2013-06-11 19:32:15.5985 [PID=22629] [version] Checking plan class 'opencl_amd_ati'
2013-06-11 19:32:15.5985 [PID=22629] [version] plan_class opencl_amd_ati uses OpenCl version 0
2013-06-11 19:32:15.5985 [PID=22629] [version] [opencl] GPU/Driver/BOINC revision doesn not support OpenCL
2013-06-11 19:32:15.5985 [PID=22629] [quota] [AV#418] scaled max jobs per day: 27442
2013-06-11 19:32:15.5985 [PID=22629] [version] [AV#418] (opencl_amd_ati) setting projected flops based on host elapsed time avg: 363.93G
2013-06-11 19:32:15.5985 [PID=22629] [version] [AV#418] (opencl_amd_ati) comparison pfc: 364.08G et: 363.93G
2013-06-11 19:32:15.5985 [PID=22629] [version] Comparing AV#418 (363.92 GFLOP) against AV#485 (437.61 GFLOP)
2013-06-11 19:32:15.5986 [PID=22629] [version] Checking plan class 'opencl_nvidia'
2013-06-11 19:32:15.5986 [PID=22629] [version] plan_class opencl_nvidia uses OpenCl version 0
2013-06-11 19:32:15.5986 [PID=22629] [version] [AV#416] app_plan() returned false
2013-06-11 19:32:15.5986 [PID=22629] [version] [AV#485] (ati14) setting projected flops based on host elapsed time avg: 437.54G
2013-06-11 19:32:15.5986 [PID=22629] [version] [AV#485] (ati14) comparison pfc: 437.57G et: 437.54G
2013-06-11 19:32:15.5986 [PID=22629] [version] Best version of app milkyway is [AV#485] (437.54 GFLOPS)
2013-06-11 19:32:15.5986 [PID=22629] [send] est delay 0, skipping deadline check
2013-06-11 19:32:15.5987 [PID=22629] [version] returning cached version: [AV#485]
2013-06-11 19:32:15.5987 [PID=22629] [send] est delay 0, skipping deadline check
2013-06-11 19:32:15.6013 [PID=22629] [send] Sending app_version milkyway 2 102 ati14; projected 437.54 GFLOPS
2013-06-11 19:32:15.6014 [PID=22629] [send] est. duration for WU 380375116: unscaled 45.24 scaled 47.97
2013-06-11 19:32:15.6014 [PID=22629] [send] [HOST#XX] sending [RESULT#498050348 de_separation_79_DR8_rev_2_1370993394_149_0] (est. dur. 47.97 seconds)
2013-06-11 19:32:15.6017 [PID=22629] [version] looking for version of milkyway_nbody
2013-06-11 19:32:15.6017 [PID=22629] [version] [AV#475] Skipping CPU version - user prefs say no CPU
2013-06-11 19:32:15.6017 [PID=22629] [version] Checking plan class 'mt'
2013-06-11 19:32:15.6017 [PID=22629] [version] Multi-thread app projected 10.50GS
2013-06-11 19:32:15.6017 [PID=22629] [version] [AV#481] Skipping CPU version - user prefs say no CPU
2013-06-11 19:32:15.6017 [PID=22629] [version] returning NULL; platforms:
2013-06-11 19:32:15.6017 [PID=22629] [version] windows_x86_64
2013-06-11 19:32:15.6017 [PID=22629] [version] windows_intelx86
2013-06-11 19:32:15.6017 [PID=22629] [version] returning cached version: [AV#485]
2013-06-11 19:32:15.6017 [PID=22629] [send] est. duration for WU 380375117: unscaled 33.83 scaled 35.88
2013-06-11 19:32:15.6017 [PID=22629] [send] [WU#380375117] meets deadline: 47.97 + 35.88 < 1036800
2013-06-11 19:32:15.6017 [PID=22629] [version] returning cached version: [AV#485]
2013-06-11 19:32:15.6017 [PID=22629] [send] est. duration for WU 380375117: unscaled 33.83 scaled 35.88
2013-06-11 19:32:15.6017 [PID=22629] [send] [WU#380375117] meets deadline: 47.97 + 35.88 < 1036800
2013-06-11 19:32:15.6034 [PID=22629] [send] Sending app_version milkyway 2 102 ati14; projected 437.54 GFLOPS
2013-06-11 19:32:15.6036 [PID=22629] [send] est. duration for WU 380375117: unscaled 33.83 scaled 35.88
2013-06-11 19:32:15.6036 [PID=22629] [send] [HOST#XX] sending [RESULT#498050349 de_separation_20_2s_sscon_1_1370993394_150_0] (est. dur. 35.88 seconds)
2013-06-11 19:32:15.6039 [PID=22629] [quota] reached limit on GPU jobs in progress
2013-06-11 19:32:15.6039 [PID=22629] [quota] Overall limits on jobs in progress:
2013-06-11 19:32:15.6039 [PID=22629] [quota] CPU: base 3 scaled 12 njobs 0
2013-06-11 19:32:15.6039 [PID=22629] [quota] GPU: base 40 scaled 40 njobs 40
2013-06-11 19:32:15.6039 [PID=22629] [send] don't need more work
2013-06-11 19:32:15.6048 [PID=22629] Sending reply to [HOST#XX]: 2 results, delay req 61.00 |
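The quota and duration figures in that scheduler log can be checked by hand. What follows is a minimal sketch, not the scheduler's actual code. It assumes the "scaled" limit is the base limit multiplied by the number of CPUs or GPUs, and that the "scaled" duration is the unscaled estimate divided by active_frac times on_frac. Both assumptions are inferred from the logged numbers, and all function names are hypothetical.

```python
def scaled_limit(base: int, instances: int) -> int:
    """Per-host limit on jobs in progress (assumed: base * instance count)."""
    return base * max(instances, 1)


def jobs_sendable(njobs: int, limit: int) -> int:
    """How many more jobs fit under the in-progress limit."""
    return max(limit - njobs, 0)


def scaled_duration(unscaled: float, active_frac: float, on_frac: float) -> float:
    """Estimated wall-clock time (assumed: unscaled / host availability)."""
    return unscaled / (active_frac * on_frac)


def meets_deadline(queued: float, est: float, delay_bound: float) -> bool:
    """Mirrors the logged check 'queued + est < delay_bound' (seconds)."""
    return queued + est < delay_bound


if __name__ == "__main__":
    # Values copied from the log: ncpus 4, ngpus 1, CPU base 3, GPU base 40.
    print(scaled_limit(3, 4))    # 12, matches "CPU: base 3 scaled 12"
    print(scaled_limit(40, 1))   # 40, matches "GPU: base 40 scaled 40"
    print(jobs_sendable(38, 40)) # 2, matches "2 results" in the reply
    print(round(scaled_duration(45.24, 0.945916, 0.996949), 2))
    print(meets_deadline(47.97, 35.88, 1036800))
```

Under these assumptions the host had room for exactly two more GPU jobs (38 of 40 in progress), which is why the request ends with "reached limit on GPU jobs in progress". The first duration comes out at about 47.97 seconds, as logged. The second comes out slightly below the logged 35.88, presumably because the log prints a rounded unscaled value. The deadline bound of 1036800 seconds is 12 days.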
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
"I fall under this category." As of now? Also, what's the error message (if it's printing one out)? |
Send message Joined: 16 Jun 08 Posts: 93 Credit: 366,882,323 RAC: 0 |
"Made an update; let me know if this lets you get some ATI GPU workunits." D'oh! Okay, I'm back in business... for now! ;-) |
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
"3. hosts with ATI GPUs aren't getting workunits..." Have you tried to grab work recently? I just made a couple more updates. If it's still not working, what's the error message, if any? |
Send message Joined: 11 Oct 09 Posts: 19 Credit: 202,475,569 RAC: 0 |
#3 is working fine for me. Separation runs 79_DR8_rev_3 take almost 40% longer than separation_21_2s_sscon_1, for a return 5% lower over the same period. |
Send message Joined: 7 Jun 08 Posts: 464 Credit: 56,639,936 RAC: 0 |
"4.) Winbox CPU host not getting anything." Nope, that was this host, which had been suffering from case 2 and is now drawing nBodies, no GPU work, but no CPU MilkyWays since I set him up to test 1.18. The ones I was talking about are the single-core Intel and AMD hosts without any GPU capability, which had been running regular MW CPU tasks but have completed and reported all but one and haven't gotten any new work since early on the 10th. This one is currently hungry for CPU work but isn't getting any. I assume he should not be getting nBody but should still be getting regular MW CPU work. |
Send message Joined: 3 Dec 12 Posts: 6 Credit: 6,755,635 RAC: 0 |
I'm having the same issue as Alinator. No CPU tasks. I run other projects on my GPUs. It has worked fine in the past. |
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
"4.) Winbox CPU host not getting anything." I don't have any scheduler requests from a host with that #. Could you do a manual update? |
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
"I'm having the same issue as Alinator. No CPU tasks. I run other projects on my GPUs. It has worked fine in the past." Could you do a manual update so I can try to debug the issue? I don't have any entries in the scheduler's log for your userid. |
Send message Joined: 7 Jun 08 Posts: 464 Credit: 56,639,936 RAC: 0 |
Done |
Send message Joined: 19 Mar 13 Posts: 3 Credit: 736,353 RAC: 0 |
3. hosts with ATI GPUs aren't getting workunits. |
Send message Joined: 3 Dec 12 Posts: 6 Credit: 6,755,635 RAC: 0 |
I just ran two manual updates. Neither returned anything. Hopefully you can spot something in the logs. :) |
Send message Joined: 19 Mar 13 Posts: 3 Credit: 736,353 RAC: 0 |
GPU lacks the necessary double precision extension |
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
"I just ran two manual updates. Neither returned anything. Hopefully you can spot something in the logs. :)" Strange, it's saying it returned 2 results to you... |
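Checking what the scheduler says it returned to a given user amounts to pairing each "Request:" line with the "Sending reply" line from the same PID. The sketch below is one way to do that over log text in the format quoted earlier in the thread. The regular expressions, the function name `replies_for_user`, and the assumption that one PID handles one request at a time are illustrative only, not part of the project's tooling.

```python
import re

# Patterns modelled on the log lines quoted in this thread (assumed format).
REQUEST = re.compile(r"\[PID=(\d+)\]\s+Request: \[USER#(\d+)\]")
REPLY = re.compile(r"\[PID=(\d+)\]\s+Sending reply to \[HOST#\w+\]: (\d+) results")


def replies_for_user(lines, user_id):
    """Return the result counts the scheduler logged for one user's requests."""
    open_pids = set()   # PIDs currently serving a request from user_id
    counts = []
    for line in lines:
        m = REQUEST.search(line)
        if m:
            pid, uid = m.group(1), int(m.group(2))
            if uid == user_id:
                open_pids.add(pid)
            else:
                open_pids.discard(pid)  # PID reused for a different user
            continue
        m = REPLY.search(line)
        if m and m.group(1) in open_pids:
            counts.append(int(m.group(2)))
            open_pids.discard(m.group(1))
    return counts


SAMPLE = [
    "2013-06-11 19:32:15.5706 [PID=22629] Request: [USER#5696] [HOST#XXXXX] [IP XXXXXXXX] client 6.12.34",
    "2013-06-11 19:32:15.6039 [PID=22629] [send] don't need more work",
    "2013-06-11 19:32:15.6048 [PID=22629] Sending reply to [HOST#XX]: 2 results, delay req 61.00",
]

if __name__ == "__main__":
    print(replies_for_user(SAMPLE, 5696))  # [2]
```

An empty list for a userid would correspond to the "no entries in the scheduler's log" case mentioned above, while a non-zero count alongside a client that shows no new tasks points at a mismatch worth investigating on the client side.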
©2024 Astroinformatics Group