
Posts by jpmboy

1) Message boards : Number crunching : Server fails to download GPU tasks so long as there are tasks "ready to report" (Message 69397)
Posted 23 Dec 2019 by jpmboy
Post:
I don't run CPU tasks at projects where my GPUs can crunch, so that's never been a problem for me. I have the MW back-off stuff in my files in the right places and STILL get the 'not sending tasks' BS, so I run Einstein as a zero-resource-share project until MilkyWay refills my cache.

Just to follow up: I also have the mod running on this rig, a single Radeon VII. No GPU idle time. Sorry for the large image; I did scale it to 80% of the original.
2) Message boards : Number crunching : Server fails to download GPU tasks so long as there are tasks "ready to report" (Message 69395)
Posted 22 Dec 2019 by jpmboy
Post:
It's been running as intended for a day now. Zero GPU idle time on a rig with 3 Titan Vs. I had initially not disabled CPU tasks as described in the other thread. Works great! Give JStateson's solution a try.
3) Message boards : Number crunching : Server fails to download GPU tasks so long as there are tasks "ready to report" (Message 69393)
Posted 21 Dec 2019 by jpmboy
Post:
Issue is fixed. This thread should be closed.
4) Message boards : Number crunching : Server fails to download GPU tasks so long as there are tasks "ready to report" (Message 69392)
Posted 21 Dec 2019 by jpmboy
Post:
No worries, no cpu tasks are running or being fetched. Preferences set to not allow them on rigs called "home." I have separate preferences for other machines which do run cpu tasks.
Still holding "proper" gpu tasks on the 2 rigs running the mod. :thumb
5) Message boards : Number crunching : Finally getting new tasks only seconds after running out. May not be worth the hassle. (Message 69391)
Posted 21 Dec 2019 by jpmboy
Post:
I'm amazed that you can keep the S9100s from catching fire! Must have some serious air flow on them. Wow!
Yeah, the Koolance software works with their internal fan/pump controller (it's discontinued, unfortunately; it really works better than the new bay-mount controller). I have a couple of Aquaeros here, and even one of Aquacomputer's 720XTs and one of their GiGant 1680 rad "things". The Aquaeros are the best controller available, expensive though. If I used the stock air coolers on the gear here, my wife would not tolerate this vice, tho I would be able to hear her complain about it. :)
SIV64's author really did a deep dive on the SIO and other signal busses... to the point where the software scares off a lot of potential users. That said, once you learn it, I have found nothing comparable (AIDA64, HWiNFO, etc. are all good, just not as good as SIV64 IMO).
Here's a link to some rig pics on G-drive just for grins: https://drive.google.com/open?id=19nHZg1I-PAoCmL56VgnEFLsxYKlszldO

Thanks again for sorting this GPU idle time thing out. Outstanding work!
6) Message boards : Number crunching : Finally getting new tasks only seconds after running out. May not be worth the hassle. (Message 69389)
Posted 21 Dec 2019 by jpmboy
Post:
3 Titan Vs running in P2 state at 1466MHz 0.75V. 7 tasks per GPU.
I did not disable the P2 undervolt when CUDA is detected in the stack, which would allow for P1 CUDA calculations. Each card would then pull ~250W.

7) Message boards : Number crunching : Finally getting new tasks only seconds after running out. May not be worth the hassle. (Message 69388)
Posted 21 Dec 2019 by jpmboy
Post:
mikey, if you can deal with not running any CPU tasks, JStateson's fix to the delayed GPU task-fetching issue does work! I have implemented it on two machines now and both are fetching GPU tasks while completed tasks are "Ready to report". The GPUs have not sat idle at all since.
It's pretty easy: it just requires edits to the cc file, a simple regedit, and replacing boinc.exe with his compiled version (v7.15). I renamed the boinc.exe, boincmgr and boinctray exe's to ".origexe" and did not have to delete the reg strings for tray and mgr, only so I could switch back to boinc v7.14 easily if needed.
8) Message boards : Number crunching : Finally getting new tasks only seconds after running out. May not be worth the hassle. (Message 69381)
Posted 20 Dec 2019 by jpmboy
Post:
Yeah, it sure is a bug in the server-side software... maybe we can find a way to patch the issue from the client side. JStateson has been at this pretty creatively. Maybe his home-grown fix can work on this rig also.
Honestly, it's not an issue on my R290 or R295x2 rig. Only on the Titan V and Radeon VII rigs. If we can "fix" this Titan V rig (3 GPUs) it should also work on the VII rig. Well, I hope anyway. :)
9) Message boards : Number crunching : Finally getting new tasks only seconds after running out. May not be worth the hassle. (Message 69380)
Posted 20 Dec 2019 by jpmboy
Post:
I did disable the N-body sims and aborted any tasks I had "Ready to run". Same issue. Running separation and GPU tasks, the thread count never exceeds 32. There were a few instances where two 16C N-body tasks would start up if there were no GPU tasks running (which can be for 10-20 min before new tasks DL after the last is uploaded).
Check your PM. I'll link some more info on trying the boinctasks (no mgr regedit thing). :)
10) Message boards : Number crunching : Server fails to download GPU tasks so long as there are tasks "ready to report" (Message 69377)
Posted 20 Dec 2019 by jpmboy
Post:
Ugh! The fetch for GPU tasks reverted to the bugged behavior of not sending any new GPU tasks when there are any GPU tasks "ready to report", while CPU tasks continue to accumulate.
This is definitely a BUG with Milkyway GPU tasks.
11) Message boards : Number crunching : Finally getting new tasks only seconds after running out. May not be worth the hassle. (Message 69376)
Posted 20 Dec 2019 by jpmboy
Post:
Spoke too soon. After running overnight, the behavior reverted to "No new tasks" when there are completed tasks "ready to report". It worked for a couple of hours, it seems. UGH!
12) Message boards : Number crunching : Server fails to download GPU tasks so long as there are tasks "ready to report" (Message 69375)
Posted 20 Dec 2019 by jpmboy
Post:
Fixed.
13) Message boards : Number crunching : Finally getting new tasks only seconds after running out. May not be worth the hassle. (Message 69374)
Posted 20 Dec 2019 by jpmboy
Post:
Okay... so what I decided to do was to clean out the NVIDIA driver base and install the latest Studio driver. Using the cc file shown below and v7.15, it now seems to be fetching 20-30 tasks while at the same time sending ~180 completed tasks back to the MW server! Voilà. I'll let it run overnight and see if it holds up. It may (somehow) have been related to the 441.22 vs 441.66 drivers (both are the non-DCH Studio driver). The 441.66 clean install may have been the fix. Crazy!
<cc_config>
  <log_flags>
    <task>1</task>
    <file_xfer>1</file_xfer>
    <mw_debug>1</mw_debug>
  </log_flags>
  <options>
    <use_all_gpus>1</use_all_gpus>
    <allow_remote_gui_rpc>1</allow_remote_gui_rpc>
    <mw_low_water_pct>1</mw_low_water_pct>
    <mw_high_water_pct>16</mw_high_water_pct>
    <mw_wait_interval>512</mw_wait_interval>
  </options>
</cc_config>
The R6Ax299 rig was the "offending" party. :)
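For anyone comparing config revisions, here is a minimal Python sketch that parses the cc_config shown in this post and pulls out the `mw_*` options. The `mw_*` tags are presumably specific to the modified v7.15 client (that is an assumption, as is the hypothetical helper name `read_section`); the values are copied from the file above.

```python
import xml.etree.ElementTree as ET

# The cc_config from the post, pasted in as a string for illustration.
CC_CONFIG = """<cc_config>
  <log_flags>
    <task>1</task>
    <file_xfer>1</file_xfer>
    <mw_debug>1</mw_debug>
  </log_flags>
  <options>
    <use_all_gpus>1</use_all_gpus>
    <allow_remote_gui_rpc>1</allow_remote_gui_rpc>
    <mw_low_water_pct>1</mw_low_water_pct>
    <mw_high_water_pct>16</mw_high_water_pct>
    <mw_wait_interval>512</mw_wait_interval>
  </options>
</cc_config>"""


def read_section(root, section):
    """Hypothetical helper: return {tag: int value} for one config section."""
    node = root.find(section)
    if node is None:
        return {}
    return {child.tag: int(child.text) for child in node}


root = ET.fromstring(CC_CONFIG)
log_flags = read_section(root, "log_flags")
options = read_section(root, "options")

# Keep only the options that start with "mw_" (assumed to be mod-specific).
mw_options = {name: value for name, value in options.items() if name.startswith("mw_")}
print(mw_options)
```

Running the same helper on a second revision of the file makes it easy to spot a change such as the wait interval going from 512 to 256.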
14) Message boards : Number crunching : Finally getting new tasks only seconds after running out. May not be worth the hassle. (Message 69373)
Posted 19 Dec 2019 by jpmboy
Post:
BTW - is there any "formal" bug report system for Boinc or MW thru Boinc?
I'm not throwing compute power at this for 'coin, just for the community. My wife is an RPI grad and I gave many an organic chemistry seminar/lecture there decades ago.
15) Message boards : Number crunching : Finally getting new tasks only seconds after running out. May not be worth the hassle. (Message 69372)
Posted 19 Dec 2019 by jpmboy
Post:
In the one right below? Yeah, in the initial one, where I was just popping flags in at the request of... probably too much advice from several users. My fault.
16) Message boards : Number crunching : Finally getting new tasks only seconds after running out. May not be worth the hassle. (Message 69370)
Posted 19 Dec 2019 by jpmboy
Post:
C'mon man. This is the cc_config file currently in use (the only recent change is 512 to 256 to see if it does anything). Boincmgr reads it only once (unless you manually request additional reads). And of course it had many flags toggled to "1" while I was trying to ID the actual root cause of the problem.

Guys - nothing is slowing down the actual processing of tasks... except that MW will not provide GPU tasks so long as any are READY TO REPORT. I hope that is clear now. *although it is a known bug*
This has nothing to do with the length of a cc_config file.

<cc_config>
  <log_flags>
    <task>1</task>
    <file_xfer>1</file_xfer>
    <mw_debug>1</mw_debug>
  </log_flags>
  <options>
    <use_all_gpus>1</use_all_gpus>
    <allow_remote_gui_rpc>1</allow_remote_gui_rpc>
    <mw_low_water_pct>1</mw_low_water_pct>
    <mw_high_water_pct>16</mw_high_water_pct>
    <mw_wait_interval>256</mw_wait_interval>
  </options>
</cc_config>
17) Message boards : Number crunching : Finally getting new tasks only seconds after running out. May not be worth the hassle. (Message 69368)
Posted 19 Dec 2019 by jpmboy
Post:
Yeah, I'm not having any problem with 7.15; I hope that's not a misunderstanding. 7.15 works fine, but for this issue it seems to work the same as 7.14 on my install. I'll let CPU tasks run out (disable both N-body and separation in preferences) and see if that solves the issue. I did not suspend N-body; I set web preferences to exclude this task type.
I have experienced the same "no new tasks while any are returned" issue before with CPU tasks disabled, but using 7.14 at the time, hence the reason for trying 7.15. And this same behavior occurs on another rig here running a Radeon VII, which completes tasks in under 1 min... so, again, no 91 sec period has zero completed tasks "Waiting to Report" :)
The core count might be the cause, but I'll test that (again), and I never see more than 80% CPU package load (ever); the MT tasks actually lower CPU usage when compared to Separation on the same number of threads. I hope this is the issue and that it fixes this "no GPU tasks when completed GPU tasks are queued for uploading" problem. That would be a voilà!
Correction to last post/PM - MT 16C tasks complete in as little as 13 min and as long as 80 min, irrespective of whether any GPU tasks are running.

MAX_TASK_LIMIT would seem to affect the number of tasks a client can hold, but not the number it could receive until hitting that number. I'm less concerned with whether I can load up to 800 or 737 than with, as you point out, the known problem of ZERO GPU tasks being added to reach that, or any, limit while one or more completed GPU tasks are "Ready to report".
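To make the distinction concrete, here is a toy Python model of the two behaviors. It is only a sketch of what is described in these posts, not the actual BOINC or MilkyWay scheduler code; the function names, the batch size of 30 and the held count of 100 are illustrative assumptions, and the limit of 800 is simply one of the figures mentioned above.

```python
def tasks_sent_observed(held, ready_to_report, limit, batch):
    """Behavior as described in the posts: nothing is sent while results wait."""
    if ready_to_report > 0:
        return 0
    return max(0, min(batch, limit - held))


def tasks_sent_wanted(held, ready_to_report, limit, batch):
    """Desired behavior: keep topping up until the task limit is reached."""
    return max(0, min(batch, limit - held))


# Illustrative numbers only: 100 tasks on the client, 5 of them finished.
observed = tasks_sent_observed(held=100, ready_to_report=5, limit=800, batch=30)
wanted = tasks_sent_wanted(held=100, ready_to_report=5, limit=800, batch=30)
print(f"observed: {observed} new tasks, wanted: {wanted} new tasks")
```

In this model the limit only matters once the client is nearly full; the stall comes entirely from the "ready to report" check.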
18) Message boards : Number crunching : Server fails to download GPU tasks so long as there are tasks "ready to report" (Message 69366)
Posted 19 Dec 2019 by jpmboy
Post:
Old problem... "No New Tasks" returned whenever any tasks are "Ready to report".
It seems that this is something that could be resolved rather easily by setting a flag allowing tasks to be "Fetched" until the MAX_TASK_LIMIT is reached??
Running 3 Titan Vs on a single rig, single client, and I would not know how to, or look to, spoof the GPU count. Each Titan V runs 6 tasks at a time with 0.5 CPU per task, completing tasks in 52-55 sec. The problem is that the MW server will not send new GPU tasks to my client so long as any completed tasks are ready to report. Once GPU tasks download and start, there is never a 91 sec period where there are no completed tasks. This is not an issue for the CPU tasks since the CPU will not accumulate completed tasks every 91 sec.
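The arithmetic behind "never a 91 sec period with no completed tasks" can be sketched in a few lines of Python. The figures (3 GPUs, 6 tasks per GPU, 52-55 sec per task, 91 sec window) are taken from the post; the assumption that tasks run back-to-back with no gaps is mine.

```python
GPUS = 3
TASKS_PER_GPU = 6
TASK_SECONDS_FAST = 52
TASK_SECONDS_SLOW = 55
WINDOW_SECONDS = 91

# Number of GPU tasks running at the same time on the rig.
concurrent = GPUS * TASKS_PER_GPU


def completions_per_window(task_seconds):
    """Average tasks finished per window at steady state."""
    return concurrent * WINDOW_SECONDS / task_seconds


fast = completions_per_window(TASK_SECONDS_FAST)
slow = completions_per_window(TASK_SECONDS_SLOW)

# Each slot finishes at least this many tasks in ANY window of that length,
# because even the slowest task is shorter than the window.
guaranteed = concurrent * (WINDOW_SECONDS // TASK_SECONDS_SLOW)

print(f"{concurrent} concurrent tasks")
print(f"about {slow:.1f} to {fast:.1f} completions per {WINDOW_SECONDS} s window")
print(f"at least {guaranteed} completions in any window")
```

So under these assumptions there are always completed tasks "ready to report" whenever the 91 sec window comes around.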
19) Message boards : Number crunching : Finally getting new tasks only seconds after running out. May not be worth the hassle. (Message 69365)
Posted 19 Dec 2019 by jpmboy
Post:
Yeah, I can account for the MAX_TASK_LIMIT. (Again, it can only reach that LIMIT if I manually snooze the GPUs BEFORE any completed tasks are ready to report.) But the issue is not whether a 36-thread CPU can handle the load. It does. (Each Titan V runs 6 tasks at a time with 0.5 CPU per task, completing tasks in 52-55 sec; N-body sims on 16C each complete in ~60-90 min, with the total CPU load under 70% on average.) The problem is that the MW server will not send new GPU tasks to my client so long as any completed tasks are ready to report. Once GPU tasks download and start, there is never a 91 sec period where there are no completed tasks. This is not an issue for the CPU tasks since the CPU will not accumulate completed tasks every 91 sec.


20) Message boards : Number crunching : Finally getting new tasks only seconds after running out. May not be worth the hassle. (Message 69359)
Posted 18 Dec 2019 by jpmboy
Post:
For some reason, the MW server assigns a GPU count of "0" to a 3-GPU rig whenever there are tasks "Ready to Start" in the list (and still only sends ~100 tasks per GPU).


