41)
Message boards :
News :
another scheduler update
(Message 58746)
Posted 11 Jun 2013 by Travis Post: 3. hosts with ATI GPUs aren't getting workunits... Have you tried to grab work recently? I just made a couple more updates. If you're still not getting any, what's the error message, if any? |
42)
Message boards :
News :
another scheduler update
(Message 58744)
Posted 11 Jun 2013 by Travis Post: I fall under this category. As of now? Also, what's the error message (if it's printing out one)? |
43)
Message boards :
News :
another scheduler update
(Message 58743)
Posted 11 Jun 2013 by Travis Post: 4.) Winbox CPU host not getting anything. Looks like you just got some? From the scheduler (I XXXed out your IP and host id just in case you're hiding those):
2013-06-11 19:32:15.5706 [PID=22629] Request: [USER#5696] [HOST#XXXXX] [IP XXXXXXXX] client 6.12.34
2013-06-11 19:32:15.5955 [PID=22629] [send] [HOST#XXXX] app version 321 is reliable
2013-06-11 19:32:15.5955 [PID=22629] [send] set_trust: random choice for cons valid 1165: yes
2013-06-11 19:32:15.5955 [PID=22629] [send] [AV#385] not reliable; cons valid 0 < 10
2013-06-11 19:32:15.5955 [PID=22629] [send] set_trust: cons valid 0 < 10, don't use single replication
2013-06-11 19:32:15.5955 [PID=22629] [send] [HOST#XX] app version 398 is reliable
2013-06-11 19:32:15.5955 [PID=22629] [send] set_trust: random choice for cons valid 76: yes
2013-06-11 19:32:15.5955 [PID=22629] [send] [HOST#XX] app version 418 is reliable
2013-06-11 19:32:15.5955 [PID=22629] [send] set_trust: random choice for cons valid 17442: yes
2013-06-11 19:32:15.5955 [PID=22629] [send] [AV#430] not reliable; cons valid 0 < 10
2013-06-11 19:32:15.5955 [PID=22629] [send] set_trust: cons valid 0 < 10, don't use single replication
2013-06-11 19:32:15.5955 [PID=22629] [send] [AV#436] not reliable; cons valid 0 < 10
2013-06-11 19:32:15.5955 [PID=22629] [send] set_trust: cons valid 0 < 10, don't use single replication
2013-06-11 19:32:15.5955 [PID=22629] [send] [AV#438] not reliable; cons valid 1 < 10
2013-06-11 19:32:15.5955 [PID=22629] [send] set_trust: cons valid 1 < 10, don't use single replication
2013-06-11 19:32:15.5955 [PID=22629] [send] [HOST#XX] app version 445 is reliable
2013-06-11 19:32:15.5956 [PID=22629] [send] set_trust: random choice for cons valid 23: yes
2013-06-11 19:32:15.5956 [PID=22629] [send] [HOST#XX] app version 451 is reliable
2013-06-11 19:32:15.5956 [PID=22629] [send] set_trust: random choice for cons valid 148: yes
2013-06-11 19:32:15.5956 [PID=22629] [send] [HOST#XX] app version 485 is reliable
2013-06-11 19:32:15.5956 [PID=22629] [send] set_trust: random choice for cons valid 510: yes
2013-06-11 19:32:15.5956 [PID=22629] [send] [AV#3000002] not reliable; cons valid 0 < 10
2013-06-11 19:32:15.5956 [PID=22629] [send] set_trust: cons valid 0 < 10, don't use single replication
2013-06-11 19:32:15.5956 [PID=22629] [quota] effective ncpus 4 ngpus 1
2013-06-11 19:32:15.5956 [PID=22629] [quota] max jobs per RPC: 400
2013-06-11 19:32:15.5956 [PID=22629] [quota] Overall limits on jobs in progress:
2013-06-11 19:32:15.5956 [PID=22629] [quota] CPU: base 3 scaled 12 njobs 0
2013-06-11 19:32:15.5956 [PID=22629] [quota] GPU: base 40 scaled 40 njobs 38
2013-06-11 19:32:15.5956 [PID=22629] [send] Not using matchmaker scheduling; Not using EDF sim
2013-06-11 19:32:15.5956 [PID=22629] [send] CPU: req 0.00 sec, 0.00 instances; est delay 0.00
2013-06-11 19:32:15.5956 [PID=22629] [send] AMD/ATI GPU: req 81174.22 sec, 0.00 instances; est delay 0.00
2013-06-11 19:32:15.5956 [PID=22629] [send] work_req_seconds: 0.00 secs
2013-06-11 19:32:15.5956 [PID=22629] [send] available disk 2.82 GB, work_buf_min 86400
2013-06-11 19:32:15.5957 [PID=22629] [send] active_frac 0.945916 on_frac 0.996949
2013-06-11 19:32:15.5957 [PID=22629] [send] CPU features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 htt pni cx16 syscall nx lm svm sse4a osvw ibs skinit wdt page1gb rdtscp 3dnowext 3dnow
2013-06-11 19:32:15.5984 [PID=22629] [version] looking for version of milkyway
2013-06-11 19:32:15.5984 [PID=22629] [version] Checking plan class 'ati14'
2013-06-11 19:32:15.5984 [PID=22629] [version] Couldn't open plan class spec file '../plan_class_spec.xml'
2013-06-11 19:32:15.5984 [PID=22629] [version] ati14 ATI app projected 51.07G peak 5775.35G 0.963 CPUs
2013-06-11 19:32:15.5984 [PID=22629] [quota] [AV#485] scaled max jobs per day: 10510
2013-06-11 19:32:15.5984 [PID=22629] [version] [AV#485] (ati14) setting projected flops based on host elapsed time avg: 437.54G
2013-06-11 19:32:15.5984 [PID=22629] [version] [AV#485] (ati14) comparison pfc: 437.57G et: 437.54G
2013-06-11 19:32:15.5985 [PID=22629] [version] Best app version is now AV485 (437.61 GFLOP)
2013-06-11 19:32:15.5985 [PID=22629] [version] Checking plan class 'opencl_amd_ati'
2013-06-11 19:32:15.5985 [PID=22629] [version] plan_class opencl_amd_ati uses OpenCl version 0
2013-06-11 19:32:15.5985 [PID=22629] [version] [opencl] GPU/Driver/BOINC revision doesn not support OpenCL
2013-06-11 19:32:15.5985 [PID=22629] [quota] [AV#418] scaled max jobs per day: 27442
2013-06-11 19:32:15.5985 [PID=22629] [version] [AV#418] (opencl_amd_ati) setting projected flops based on host elapsed time avg: 363.93G
2013-06-11 19:32:15.5985 [PID=22629] [version] [AV#418] (opencl_amd_ati) comparison pfc: 364.08G et: 363.93G
2013-06-11 19:32:15.5985 [PID=22629] [version] Comparing AV#418 (363.92 GFLOP) against AV#485 (437.61 GFLOP)
2013-06-11 19:32:15.5986 [PID=22629] [version] Checking plan class 'opencl_nvidia'
2013-06-11 19:32:15.5986 [PID=22629] [version] plan_class opencl_nvidia uses OpenCl version 0
2013-06-11 19:32:15.5986 [PID=22629] [version] [AV#416] app_plan() returned false
2013-06-11 19:32:15.5986 [PID=22629] [version] [AV#485] (ati14) setting projected flops based on host elapsed time avg: 437.54G
2013-06-11 19:32:15.5986 [PID=22629] [version] [AV#485] (ati14) comparison pfc: 437.57G et: 437.54G
2013-06-11 19:32:15.5986 [PID=22629] [version] Best version of app milkyway is [AV#485] (437.54 GFLOPS)
2013-06-11 19:32:15.5986 [PID=22629] [send] est delay 0, skipping deadline check
2013-06-11 19:32:15.5987 [PID=22629] [version] returning cached version: [AV#485]
2013-06-11 19:32:15.5987 [PID=22629] [send] est delay 0, skipping deadline check
2013-06-11 19:32:15.6013 [PID=22629] [send] Sending app_version milkyway 2 102 ati14; projected 437.54 GFLOPS
2013-06-11 19:32:15.6014 [PID=22629] [send] est. duration for WU 380375116: unscaled 45.24 scaled 47.97
2013-06-11 19:32:15.6014 [PID=22629] [send] [HOST#XX] sending [RESULT#498050348 de_separation_79_DR8_rev_2_1370993394_149_0] (est. dur. 47.97 seconds)
2013-06-11 19:32:15.6017 [PID=22629] [version] looking for version of milkyway_nbody
2013-06-11 19:32:15.6017 [PID=22629] [version] [AV#475] Skipping CPU version - user prefs say no CPU
2013-06-11 19:32:15.6017 [PID=22629] [version] Checking plan class 'mt'
2013-06-11 19:32:15.6017 [PID=22629] [version] Multi-thread app projected 10.50GS
2013-06-11 19:32:15.6017 [PID=22629] [version] [AV#481] Skipping CPU version - user prefs say no CPU
2013-06-11 19:32:15.6017 [PID=22629] [version] returning NULL; platforms:
2013-06-11 19:32:15.6017 [PID=22629] [version] windows_x86_64
2013-06-11 19:32:15.6017 [PID=22629] [version] windows_intelx86
2013-06-11 19:32:15.6017 [PID=22629] [version] returning cached version: [AV#485]
2013-06-11 19:32:15.6017 [PID=22629] [send] est. duration for WU 380375117: unscaled 33.83 scaled 35.88
2013-06-11 19:32:15.6017 [PID=22629] [send] [WU#380375117] meets deadline: 47.97 + 35.88 < 1036800
2013-06-11 19:32:15.6017 [PID=22629] [version] returning cached version: [AV#485]
2013-06-11 19:32:15.6017 [PID=22629] [send] est. duration for WU 380375117: unscaled 33.83 scaled 35.88
2013-06-11 19:32:15.6017 [PID=22629] [send] [WU#380375117] meets deadline: 47.97 + 35.88 < 1036800
2013-06-11 19:32:15.6034 [PID=22629] [send] Sending app_version milkyway 2 102 ati14; projected 437.54 GFLOPS
2013-06-11 19:32:15.6036 [PID=22629] [send] est. duration for WU 380375117: unscaled 33.83 scaled 35.88
2013-06-11 19:32:15.6036 [PID=22629] [send] [HOST#XX] sending [RESULT#498050349 de_separation_20_2s_sscon_1_1370993394_150_0] (est. dur. 35.88 seconds)
2013-06-11 19:32:15.6039 [PID=22629] [quota] reached limit on GPU jobs in progress
2013-06-11 19:32:15.6039 [PID=22629] [quota] Overall limits on jobs in progress:
2013-06-11 19:32:15.6039 [PID=22629] [quota] CPU: base 3 scaled 12 njobs 0
2013-06-11 19:32:15.6039 [PID=22629] [quota] GPU: base 40 scaled 40 njobs 40
2013-06-11 19:32:15.6039 [PID=22629] [send] don't need more work
2013-06-11 19:32:15.6048 [PID=22629] Sending reply to [HOST#XX]: 2 results, delay req 61.00 |
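For anyone trying to read the log in message 58743, here is a rough Python sketch of the arithmetic that seems to sit behind the [quota], [version], and deadline lines. It is not the BOINC scheduler code. The function names are made up for illustration, and the rule "scaled = base x number of processors" is only inferred from the figures in the log (CPU base 3 with 4 CPUs gives 12, GPU base 40 with 1 GPU gives 40), so treat it as an assumption.

```python
# Illustrative only: names and rules below are assumptions inferred from
# the numbers in the log, not taken from the BOINC source.

def scaled_limit(base: int, instances: int) -> int:
    """Assumed rule: per-host cap = base limit times processor count."""
    return base * max(instances, 1)


def jobs_allowed(base: int, instances: int, in_progress: int) -> int:
    """How many more jobs fit under the in-progress cap (never negative)."""
    return max(scaled_limit(base, instances) - in_progress, 0)


def meets_deadline(queued_secs: float, est_secs: float, delay_bound: float) -> bool:
    """Mirrors the log line 'meets deadline: 47.97 + 35.88 < 1036800'."""
    return queued_secs + est_secs < delay_bound


def best_version(projected_gflops: dict) -> str:
    """Pick the app version with the highest projected GFLOPS."""
    return max(projected_gflops, key=projected_gflops.get)


# Figures copied from the log excerpt.
cpu_cap = scaled_limit(base=3, instances=4)                  # "CPU: base 3 scaled 12"
gpu_room = jobs_allowed(base=40, instances=1, in_progress=38)
fits = meets_deadline(47.97, 35.88, 1036800)
winner = best_version({"AV#418": 363.92, "AV#485": 437.61})

print(cpu_cap, gpu_room, fits, winner)
```

Under those assumptions the host had room for two more GPU jobs, which lines up with the "2 results" in the reply and the "reached limit on GPU jobs in progress" line that follows once njobs hits 40.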
44)
Message boards :
News :
another scheduler update
(Message 58740)
Posted 11 Jun 2013 by Travis Post: #3. But it just started again. Made an update, let me know if this lets you get some ATI GPU workunits. |
45)
Message boards :
News :
another scheduler update
(Message 58735)
Posted 11 Jun 2013 by Travis Post: Updated the scheduler yet again. I just want to double check, which of the following are people having (since the update): 1. non-GPU hosts are getting GPU workunits. 2. hosts with ATI GPUs that don't have the compute capability are getting GPU workunits. 3. hosts with ATI GPUs aren't getting workunits. Is anyone having problems with NVIDIA GPUs? Or is this just an ATI thing? --Travis |
46)
Message boards :
News :
yet another scheduler update
(Message 58715)
Posted 11 Jun 2013 by Travis Post: Made a few more tweaks, how's this going? Let me know if anything isn't working that was working before we did the server code upgrade. |
47)
Message boards :
Number crunching :
MilkyWay not up loading new workunits
(Message 58706)
Posted 11 Jun 2013 by Travis Post: Please note, while we are now receiving AMD GPU work units, my machines, all day, have ZERO CPU work units. All 88 of my CPU cores are doing absolutely jack-shit all. The student who had done most of our server-side modifications has basically been MIA, so in doing the server update I've had to figure out everything he did and move it all over into the new main BOINC code. Now that BOINC is using git as its version control software, keeping things up to date will be much easier, as we can have our own local repository, pull the main BOINC changes into it, and update things as needed. Before, when BOINC used SVN, this was much more difficult. So once we get everything working again, I think things should be pretty good from here on out, and it should be much easier to keep the software up to date, especially as I'll have figured out all the changes that were made here. |
48)
Message boards :
Number crunching :
Bunch of new computational errors
(Message 58705)
Posted 11 Jun 2013 by Travis Post: Now I have the same problem with WU 1.02. I have noticed that the warning that you don't have double precision processor is not being logged any more? Could this be the culprit? I think I just got this fixed. |
49)
Message boards :
Number crunching :
GPU apps delivered to single precision GPU
(Message 58704)
Posted 11 Jun 2013 by Travis Post: Appears to be fixed. I'm getting the message that my card isn't good enough and I'm not getting GPU WU's. Awesome! Hopefully this last fix did the trick. |
50)
Message boards :
News :
scheduler update
(Message 58702)
Posted 11 Jun 2013 by Travis Post: I've made some updates to the scheduler which I think should fix the problem with people getting GPU workunits that shouldn't be. Let me know if this change fixed things. --Travis |
51)
Message boards :
News :
added applications for ati only GPUs
(Message 58663)
Posted 10 Jun 2013 by Travis Post: I'd still like to know why on a computer that is CPU ONLY I'm getting ATI GPU units that can't run on this computer. I never had a problem only getting CPU WU's until whatever it was you all did last week. I'm getting the ATI units and they are refusing to start now. No surprise since this gfx card has never been able to run WU's. And it's a HD 6670 but reads as a HD5700.. I've even reset the project to no avail. We're looking into it and hopefully will have it figured out soon. Thanks for the patience with all this. |
52)
Message boards :
News :
What users aren't getting GPU workunits?
(Message 58662)
Posted 10 Jun 2013 by Travis Post: I'm getting ATI WU's on a computer that can't run them and has been set for CPU ONLY for months... They won't even start and so I'm aborting them. I've just sent out an email to the boinc projects mailing lists, so we're looking into it and hopefully can have a fix for you soon. |
53)
Message boards :
Number crunching :
MilkyWay not up loading new workunits
(Message 58661)
Posted 10 Jun 2013 by Travis Post: This weekend I downloaded the new BOINC Manager... I think I just put in a fix for this. I'm assuming you're running the GPU workunits? |
54)
Message boards :
Number crunching :
No more GPU work units?
(Message 58655)
Posted 10 Jun 2013 by Travis Post: Any idea what caused this catastrophic malfunction? What steps are you guys taking to prevent these issues from arising in the future? We needed to update the BOINC server code due to a security issue. The version of the BOINC server code we had been using was a few months out of date, so in making the change a bunch of things broke, as the BOINC server code is being constantly updated. To make matters worse, our old version of the BOINC server code had a few hacks in it to get things to work, so we had to port those over, or swap out our hacks for stuff in the main BOINC codebase now. It was a lot messier than expected, but now that we're back up to date with the server code things should be good for a while. You need to recognize that both MilkyWay@Home and, to an extent, BOINC are actively developed research projects, and most of the work is done by myself (and I am not even at RPI anymore) and students at RPI. We don't have full-time IT people getting paid to support these projects. I work on MilkyWay@Home mostly in my free time, which is extremely limited now that I'm an assistant professor at the University of North Dakota and have many other research projects to work on and classes to teach. I'd like to say things like this won't happen in the future, but I'd be lying. Given the nature of MilkyWay@Home and BOINC, problems are going to happen as we have new students learn things and code gets updated. |
55)
Message boards :
News :
What users aren't getting GPU workunits?
(Message 58646)
Posted 10 Jun 2013 by Travis Post: Win7x64, BOINC 7.0.64, CPU i7-920, 6GB RAM w/ 1 each 7770, 7850, 7950 Getting any work after the recent update? |
56)
Message boards :
News :
potential fix for bad applications being sent out
(Message 58641)
Posted 10 Jun 2013 by Travis Post: Like arkayn said ... Just tried to put in a fix for this, let me know if it worked. --Travis |
57)
Message boards :
News :
added applications for ati only GPUs
(Message 58639)
Posted 10 Jun 2013 by Travis Post: I added the 0.82 applications as 1.02 with the ati14 plan class, so hopefully those who were not getting GPU workunits because they had ATI GPUs which didn't support opencl should be getting them now. Let me know if this is working. --Travis |
58)
Message boards :
News :
What users aren't getting GPU workunits?
(Message 58636)
Posted 10 Jun 2013 by Travis Post: What applications do you guys have selected in your project preferences? I'm wondering if maybe you turned off the opencl ATI version? --Travis |
59)
Message boards :
Number crunching :
No more GPU work units?
(Message 58629)
Posted 10 Jun 2013 by Travis Post: 6/10/2013 12:14:12 PM | Milkyway@Home | update requested by user What version BOINC client are you using? What applications do you have selected for the project to send? |
60)
Message boards :
News :
What users aren't getting GPU workunits?
(Message 58628)
Posted 10 Jun 2013 by Travis Post: I'm trying to figure out what users aren't getting GPU workunits. Is it everyone? Or just some people with certain types of ATI cards? As far as I can tell, there should be both opencl AMD/ATI and opencl NVIDIA applications available. --Travis |