another scheduler update
Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0

Message 58735 - Posted: 11 Jun 2013, 22:45:59 UTC

Updated the scheduler yet again.

I just want to double check: which of the following problems are people having since the update?

1. non-GPU hosts are getting GPU workunits.
2. hosts with ATI GPUs that don't have the compute capability are getting GPU workunits.
3. hosts with ATI GPUs aren't getting workunits.

Is anyone having problems with NVIDIA GPUs? Or is this just an ATI thing?

--Travis
____________

Herge
Joined: 11 Oct 09
Posts: 19
Credit: 186,871,163
RAC: 0

Message 58737 - Posted: 11 Jun 2013, 22:56:10 UTC

#3. But it just started again.

Alinator
Joined: 7 Jun 08
Posts: 464
Credit: 56,639,936
RAC: 0

Message 58739 - Posted: 11 Jun 2013, 23:16:29 UTC

4.) Winbox CPU host not getting anything.

Travis
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0

Message 58740 - Posted: 11 Jun 2013, 23:30:17 UTC - in response to Message 58737.

#3. But it just started again.


Made an update; let me know if this lets you get some ATI GPU workunits.
____________

ritterm
Joined: 16 Jun 08
Posts: 73
Credit: 362,759,005
RAC: 0

Message 58741 - Posted: 11 Jun 2013, 23:34:03 UTC - in response to Message 58735.

3. hosts with ATI GPUs aren't getting workunits...

I'm in that boat. My 5870 host got its last WU at about 1950 UTC.
____________

bones cruncher
Joined: 14 Jul 12
Posts: 2
Credit: 393,940
RAC: 0

Message 58742 - Posted: 11 Jun 2013, 23:35:52 UTC

I fall under this category:
2. hosts with ATI GPUs that don't have the compute capability are getting GPU workunits.

Travis
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0

Message 58743 - Posted: 11 Jun 2013, 23:36:04 UTC - in response to Message 58739.
Last modified: 11 Jun 2013, 23:37:48 UTC

4.) Winbox CPU host not getting anything.


Looks like you just got some? From the scheduler (I XXXed out your IP and host id just in case you're hiding those):

2013-06-11 19:32:15.5706 [PID=22629] Request: [USER#5696] [HOST#XXXXX] [IP XXXXXXXX] client 6.12.34
2013-06-11 19:32:15.5955 [PID=22629] [send] [HOST#XXXX] app version 321 is reliable
2013-06-11 19:32:15.5955 [PID=22629] [send] set_trust: random choice for cons valid 1165: yes
2013-06-11 19:32:15.5955 [PID=22629] [send] [AV#385] not reliable; cons valid 0 < 10
2013-06-11 19:32:15.5955 [PID=22629] [send] set_trust: cons valid 0 < 10, don't use single replication
2013-06-11 19:32:15.5955 [PID=22629] [send] [HOST#XX] app version 398 is reliable
2013-06-11 19:32:15.5955 [PID=22629] [send] set_trust: random choice for cons valid 76: yes
2013-06-11 19:32:15.5955 [PID=22629] [send] [HOST#XX] app version 418 is reliable
2013-06-11 19:32:15.5955 [PID=22629] [send] set_trust: random choice for cons valid 17442: yes
2013-06-11 19:32:15.5955 [PID=22629] [send] [AV#430] not reliable; cons valid 0 < 10
2013-06-11 19:32:15.5955 [PID=22629] [send] set_trust: cons valid 0 < 10, don't use single replication
2013-06-11 19:32:15.5955 [PID=22629] [send] [AV#436] not reliable; cons valid 0 < 10
2013-06-11 19:32:15.5955 [PID=22629] [send] set_trust: cons valid 0 < 10, don't use single replication
2013-06-11 19:32:15.5955 [PID=22629] [send] [AV#438] not reliable; cons valid 1 < 10
2013-06-11 19:32:15.5955 [PID=22629] [send] set_trust: cons valid 1 < 10, don't use single replication
2013-06-11 19:32:15.5955 [PID=22629] [send] [HOST#XX] app version 445 is reliable
2013-06-11 19:32:15.5956 [PID=22629] [send] set_trust: random choice for cons valid 23: yes
2013-06-11 19:32:15.5956 [PID=22629] [send] [HOST#XX] app version 451 is reliable
2013-06-11 19:32:15.5956 [PID=22629] [send] set_trust: random choice for cons valid 148: yes
2013-06-11 19:32:15.5956 [PID=22629] [send] [HOST#XX] app version 485 is reliable
2013-06-11 19:32:15.5956 [PID=22629] [send] set_trust: random choice for cons valid 510: yes
2013-06-11 19:32:15.5956 [PID=22629] [send] [AV#3000002] not reliable; cons valid 0 < 10
2013-06-11 19:32:15.5956 [PID=22629] [send] set_trust: cons valid 0 < 10, don't use single replication
2013-06-11 19:32:15.5956 [PID=22629] [quota] effective ncpus 4 ngpus 1
2013-06-11 19:32:15.5956 [PID=22629] [quota] max jobs per RPC: 400
2013-06-11 19:32:15.5956 [PID=22629] [quota] Overall limits on jobs in progress:
2013-06-11 19:32:15.5956 [PID=22629] [quota] CPU: base 3 scaled 12 njobs 0
2013-06-11 19:32:15.5956 [PID=22629] [quota] GPU: base 40 scaled 40 njobs 38
2013-06-11 19:32:15.5956 [PID=22629] [send] Not using matchmaker scheduling; Not using EDF sim
2013-06-11 19:32:15.5956 [PID=22629] [send] CPU: req 0.00 sec, 0.00 instances; est delay 0.00
2013-06-11 19:32:15.5956 [PID=22629] [send] AMD/ATI GPU: req 81174.22 sec, 0.00 instances; est delay 0.00
2013-06-11 19:32:15.5956 [PID=22629] [send] work_req_seconds: 0.00 secs
2013-06-11 19:32:15.5956 [PID=22629] [send] available disk 2.82 GB, work_buf_min 86400
2013-06-11 19:32:15.5957 [PID=22629] [send] active_frac 0.945916 on_frac 0.996949
2013-06-11 19:32:15.5957 [PID=22629] [send] CPU features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 htt pni cx16 syscall nx lm svm sse4a osvw ibs skinit wdt page1gb rdtscp 3dnowext 3dnow
2013-06-11 19:32:15.5984 [PID=22629] [version] looking for version of milkyway
2013-06-11 19:32:15.5984 [PID=22629] [version] Checking plan class 'ati14'
2013-06-11 19:32:15.5984 [PID=22629] [version] Couldn't open plan class spec file '../plan_class_spec.xml'
2013-06-11 19:32:15.5984 [PID=22629] [version] ati14 ATI app projected 51.07G peak 5775.35G 0.963 CPUs
2013-06-11 19:32:15.5984 [PID=22629] [quota] [AV#485] scaled max jobs per day: 10510
2013-06-11 19:32:15.5984 [PID=22629] [version] [AV#485] (ati14) setting projected flops based on host elapsed time avg: 437.54G
2013-06-11 19:32:15.5984 [PID=22629] [version] [AV#485] (ati14) comparison pfc: 437.57G et: 437.54G
2013-06-11 19:32:15.5985 [PID=22629] [version] Best app version is now AV485 (437.61 GFLOP)
2013-06-11 19:32:15.5985 [PID=22629] [version] Checking plan class 'opencl_amd_ati'
2013-06-11 19:32:15.5985 [PID=22629] [version] plan_class opencl_amd_ati uses OpenCl version 0
2013-06-11 19:32:15.5985 [PID=22629] [version] [opencl] GPU/Driver/BOINC revision doesn not support OpenCL
2013-06-11 19:32:15.5985 [PID=22629] [quota] [AV#418] scaled max jobs per day: 27442
2013-06-11 19:32:15.5985 [PID=22629] [version] [AV#418] (opencl_amd_ati) setting projected flops based on host elapsed time avg: 363.93G
2013-06-11 19:32:15.5985 [PID=22629] [version] [AV#418] (opencl_amd_ati) comparison pfc: 364.08G et: 363.93G
2013-06-11 19:32:15.5985 [PID=22629] [version] Comparing AV#418 (363.92 GFLOP) against AV#485 (437.61 GFLOP)
2013-06-11 19:32:15.5986 [PID=22629] [version] Checking plan class 'opencl_nvidia'
2013-06-11 19:32:15.5986 [PID=22629] [version] plan_class opencl_nvidia uses OpenCl version 0
2013-06-11 19:32:15.5986 [PID=22629] [version] [AV#416] app_plan() returned false
2013-06-11 19:32:15.5986 [PID=22629] [version] [AV#485] (ati14) setting projected flops based on host elapsed time avg: 437.54G
2013-06-11 19:32:15.5986 [PID=22629] [version] [AV#485] (ati14) comparison pfc: 437.57G et: 437.54G
2013-06-11 19:32:15.5986 [PID=22629] [version] Best version of app milkyway is [AV#485] (437.54 GFLOPS)
2013-06-11 19:32:15.5986 [PID=22629] [send] est delay 0, skipping deadline check
2013-06-11 19:32:15.5987 [PID=22629] [version] returning cached version: [AV#485]
2013-06-11 19:32:15.5987 [PID=22629] [send] est delay 0, skipping deadline check
2013-06-11 19:32:15.6013 [PID=22629] [send] Sending app_version milkyway 2 102 ati14; projected 437.54 GFLOPS
2013-06-11 19:32:15.6014 [PID=22629] [send] est. duration for WU 380375116: unscaled 45.24 scaled 47.97
2013-06-11 19:32:15.6014 [PID=22629] [send] [HOST#XX] sending [RESULT#498050348 de_separation_79_DR8_rev_2_1370993394_149_0] (est. dur. 47.97 seconds)
2013-06-11 19:32:15.6017 [PID=22629] [version] looking for version of milkyway_nbody
2013-06-11 19:32:15.6017 [PID=22629] [version] [AV#475] Skipping CPU version - user prefs say no CPU
2013-06-11 19:32:15.6017 [PID=22629] [version] Checking plan class 'mt'
2013-06-11 19:32:15.6017 [PID=22629] [version] Multi-thread app projected 10.50GS
2013-06-11 19:32:15.6017 [PID=22629] [version] [AV#481] Skipping CPU version - user prefs say no CPU
2013-06-11 19:32:15.6017 [PID=22629] [version] returning NULL; platforms:
2013-06-11 19:32:15.6017 [PID=22629] [version] windows_x86_64
2013-06-11 19:32:15.6017 [PID=22629] [version] windows_intelx86
2013-06-11 19:32:15.6017 [PID=22629] [version] returning cached version: [AV#485]
2013-06-11 19:32:15.6017 [PID=22629] [send] est. duration for WU 380375117: unscaled 33.83 scaled 35.88
2013-06-11 19:32:15.6017 [PID=22629] [send] [WU#380375117] meets deadline: 47.97 + 35.88 < 1036800
2013-06-11 19:32:15.6017 [PID=22629] [version] returning cached version: [AV#485]
2013-06-11 19:32:15.6017 [PID=22629] [send] est. duration for WU 380375117: unscaled 33.83 scaled 35.88
2013-06-11 19:32:15.6017 [PID=22629] [send] [WU#380375117] meets deadline: 47.97 + 35.88 < 1036800
2013-06-11 19:32:15.6034 [PID=22629] [send] Sending app_version milkyway 2 102 ati14; projected 437.54 GFLOPS
2013-06-11 19:32:15.6036 [PID=22629] [send] est. duration for WU 380375117: unscaled 33.83 scaled 35.88
2013-06-11 19:32:15.6036 [PID=22629] [send] [HOST#XX] sending [RESULT#498050349 de_separation_20_2s_sscon_1_1370993394_150_0] (est. dur. 35.88 seconds)
2013-06-11 19:32:15.6039 [PID=22629] [quota] reached limit on GPU jobs in progress
2013-06-11 19:32:15.6039 [PID=22629] [quota] Overall limits on jobs in progress:
2013-06-11 19:32:15.6039 [PID=22629] [quota] CPU: base 3 scaled 12 njobs 0
2013-06-11 19:32:15.6039 [PID=22629] [quota] GPU: base 40 scaled 40 njobs 40
2013-06-11 19:32:15.6039 [PID=22629] [send] don't need more work
2013-06-11 19:32:15.6048 [PID=22629] Sending reply to [HOST#XX]: 2 results, delay req 61.00
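
For anyone trying to follow the numbers in the log above, here is a rough sketch of the arithmetic it seems to be doing. This is inferred from the printed values only, not taken from the scheduler source: the scaling rule (unscaled duration divided by active_frac * on_frac) is an assumption that happens to reproduce the 45.24 -> 47.97 line, and the function names are made up for illustration.

```python
def scaled_duration(unscaled_sec, active_frac, on_frac):
    """Assumed rule: stretch the estimate by the fraction of time the host is usable."""
    return unscaled_sec / (active_frac * on_frac)


def jobs_to_send(candidate_durations, njobs_in_progress, gpu_limit, deadline_sec):
    """Hypothetical send loop: stop at the in-progress limit, skip jobs that miss the deadline."""
    sent = []
    queued = 0.0  # seconds of work already handed out in this request
    for dur in candidate_durations:
        if njobs_in_progress + len(sent) >= gpu_limit:
            break  # corresponds to "reached limit on GPU jobs in progress"
        if queued + dur >= deadline_sec:
            continue  # would not meet the deadline check printed in the log
        sent.append(dur)
        queued += dur
    return sent


# Values copied from the log: active_frac, on_frac, the two unscaled estimates,
# 38 GPU jobs in progress, a limit of 40, and the 1036800 second deadline.
first = scaled_duration(45.24, 0.945916, 0.996949)
second = scaled_duration(33.83, 0.945916, 0.996949)
sent = jobs_to_send([first, second, second], njobs_in_progress=38,
                    gpu_limit=40, deadline_sec=1036800)
print(round(first, 2), len(sent))
```

With 38 jobs already in progress and a limit of 40, only two results fit, which matches the "2 results" in the reply line. The real scheduler applies more checks than this (disk, reliability, per-day quotas), so treat it as a reading aid rather than a description of the actual code.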
____________

Travis
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0

Message 58744 - Posted: 11 Jun 2013, 23:38:19 UTC - in response to Message 58742.
Last modified: 11 Jun 2013, 23:39:06 UTC

I fall under this category:
2. hosts with ATI GPUs that don't have the compute capability are getting GPU workunits.


As of now?

Also, what's the error message (if it's printing out one)?
____________

ritterm
Joined: 16 Jun 08
Posts: 73
Credit: 362,759,005
RAC: 0

Message 58745 - Posted: 11 Jun 2013, 23:38:28 UTC - in response to Message 58740.

Made an update; let me know if this lets you get some ATI GPU workunits.

D'oh! Okay, I'm back in business...for now! ;-)

____________

Travis
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0

Message 58746 - Posted: 11 Jun 2013, 23:38:41 UTC - in response to Message 58741.
Last modified: 11 Jun 2013, 23:39:16 UTC

3. hosts with ATI GPUs aren't getting workunits...

I'm in that boat. My 5870 host got its last WU at about 1950 UTC.


Have you tried to grab work recently? I just made a couple more updates.

If it's still not getting work, what's the error message, if any?
____________

Herge
Joined: 11 Oct 09
Posts: 19
Credit: 186,871,163
RAC: 0

Message 58748 - Posted: 11 Jun 2013, 23:52:34 UTC - in response to Message 58740.
Last modified: 12 Jun 2013, 0:01:23 UTC

#3 is working fine for me. The separation 79_DR8_rev_3 runs take almost 40% longer than separation_21_2s_sscon_1, for a return 5% lower over the same period.

Alinator
Joined: 7 Jun 08
Posts: 464
Credit: 56,639,936
RAC: 0

Message 58752 - Posted: 12 Jun 2013, 0:42:25 UTC - in response to Message 58743.
Last modified: 12 Jun 2013, 0:48:20 UTC

4.) Winbox CPU host not getting anything.


Looks like you just got some? From the scheduler (I XXXed out your IP and host id just in case you're hiding those):

2013-06-11 19:32:15.5706 [PID=22629] Request: [USER#5696] [HOST#XXXXX] [IP XXXXXXXX] client 6.12.34

<snip scheduler log>


Nope, that was this host, which had been suffering from case 2 and is now drawing nBodies, no GPU work, but no CPU MilkyWays since I set him up to test 1.18.

The ones I was talking about are the single-core Intel and AMD hosts without any GPU capability. They had been running regular MW CPU tasks, but have completed and reported all but one and haven't gotten any new work since early on the 10th.

This one is currently hungry for CPU work but isn't getting any. I assume he should not be getting nBody but should still be getting regular MW CPU work.

Gert
Joined: 3 Dec 12
Posts: 6
Credit: 6,755,635
RAC: 0

Message 58753 - Posted: 12 Jun 2013, 0:51:22 UTC - in response to Message 58739.
Last modified: 12 Jun 2013, 0:52:07 UTC

I'm having the same issue as Alinator. No CPU tasks. I run other projects on my GPUs. It has worked fine in the past.

Travis
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0

Message 58754 - Posted: 12 Jun 2013, 0:56:22 UTC - in response to Message 58752.

4.) Winbox CPU host not getting anything.


Looks like you just got some? From the scheduler (I XXXed out your IP and host id just in case you're hiding those):

2013-06-11 19:32:15.5706 [PID=22629] Request: [USER#5696] [HOST#XXXXX] [IP XXXXXXXX] client 6.12.34



Nope, that was this host, which had been suffering from case 2 and is now drawing nBodies, no GPU work, but no CPU MilkyWays since I set him up to test 1.18.

The ones I was talking about are the single-core Intel and AMD hosts without any GPU capability. They had been running regular MW CPU tasks, but have completed and reported all but one and haven't gotten any new work since early on the 10th.

This one is currently hungry for CPU work but isn't getting any. I assume he should not be getting nBody but should still be getting regular MW CPU work.


I don't have any scheduler requests from a host with that #. Could you do a manual update?
____________

Travis
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0

Message 58755 - Posted: 12 Jun 2013, 0:57:44 UTC - in response to Message 58753.

I'm having the same issue as Alinator. No CPU tasks. I run other projects on my GPUs. It has worked fine in the past.


Could you do a manual update so I can try to debug the issue? I don't have any entries in the scheduler's log for your userid yet.
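
In case it helps anyone digging through their own server logs, looking up a user's requests could be sketched roughly like this. It assumes the line format shown in the log excerpt earlier in this thread (a "Request: [USER#...]" line, follow-up lines sharing the same [PID=...], and a closing "Sending reply" line); the function name and the grouping by PID are illustrative choices, not part of BOINC itself.

```python
import re

# Patterns based on the log lines quoted in this thread.
REQUEST_RE = re.compile(r"\[PID=(\d+)\]\s+Request: \[USER#(\d+)\]")
PID_RE = re.compile(r"\[PID=(\d+)\]")


def requests_for_user(lines, user_id):
    """Return one list of log lines per completed request made by user_id."""
    active = {}  # pid -> lines of an in-flight request from this user
    done = []
    for line in lines:
        m = REQUEST_RE.search(line)
        if m:
            pid, uid = m.group(1), m.group(2)
            if uid == str(user_id):
                active[pid] = [line]
            else:
                active.pop(pid, None)  # the PID now serves someone else
            continue
        m = PID_RE.search(line)
        if not m or m.group(1) not in active:
            continue
        pid = m.group(1)
        active[pid].append(line)
        if "Sending reply" in line:
            done.append(active.pop(pid))
    return done


# Three lines taken from the excerpt posted above.
sample = [
    "2013-06-11 19:32:15.5706 [PID=22629] Request: [USER#5696] [HOST#XXXXX] [IP XXXXXXXX] client 6.12.34",
    "2013-06-11 19:32:15.6039 [PID=22629] [send] don't need more work",
    "2013-06-11 19:32:15.6048 [PID=22629] Sending reply to [HOST#XX]: 2 results, delay req 61.00",
]
found = requests_for_user(sample, 5696)
print(len(found))
```

An empty result for a user id would correspond to the "no entries in the log" situation described here, which is why a manual update is the first thing to try.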
____________

Alinator
Joined: 7 Jun 08
Posts: 464
Credit: 56,639,936
RAC: 0

Message 58756 - Posted: 12 Jun 2013, 1:05:28 UTC - in response to Message 58754.


I don't have any scheduler requests from a host with that #. Could you do a manual update?


Done

Dale Turner
Joined: 19 Mar 13
Posts: 3
Credit: 608,644
RAC: 0

Message 58757 - Posted: 12 Jun 2013, 1:05:50 UTC

3. hosts with ATI GPUs aren't getting workunits.

Gert
Joined: 3 Dec 12
Posts: 6
Credit: 6,755,635
RAC: 0

Message 58758 - Posted: 12 Jun 2013, 1:07:50 UTC - in response to Message 58755.

I just ran two manual updates. Neither returned anything. Hopefully you can spot something in the logs. :)

Dale Turner
Joined: 19 Mar 13
Posts: 3
Credit: 608,644
RAC: 0

Message 58759 - Posted: 12 Jun 2013, 1:10:32 UTC

GPU lacks the necessary double precision extension
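
For context on that message: on the OpenCL side, double precision support is advertised through device extensions, so a check for it might look something like the sketch below. It assumes you already have the device's space-separated extension string (what OpenCL reports as CL_DEVICE_EXTENSIONS); cl_khr_fp64 and cl_amd_fp64 are the standard and AMD-specific double precision extensions. How the MilkyWay application or the ati14 plan class actually detects this is not shown in this thread, so this is only an illustration, and the example strings are made up.

```python
# Extension names that indicate double precision support in OpenCL.
FP64_EXTENSIONS = {"cl_khr_fp64", "cl_amd_fp64"}


def supports_double_precision(extensions):
    """True if a space-separated OpenCL extension string lists an fp64 extension."""
    return any(ext in FP64_EXTENSIONS for ext in extensions.split())


# Hypothetical extension strings for two cards, for illustration only.
capable = "cl_khr_global_int32_base_atomics cl_amd_fp64 cl_khr_byte_addressable_store"
not_capable = "cl_khr_global_int32_base_atomics cl_khr_byte_addressable_store"
print(supports_double_precision(capable), supports_double_precision(not_capable))
```

A scheduler that ran a check like this before sending work would avoid case 2 from the first post, where GPUs without the capability still receive GPU workunits.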

Travis
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0

Message 58760 - Posted: 12 Jun 2013, 1:21:45 UTC - in response to Message 58758.

I just ran two manual updates. Neither returned anything. Hopefully you can spot something in the logs. :)


Strange, it's saying it returned 2 results to you...
____________



Copyright © 2017 AstroInformatics Group