Welcome to MilkyWay@home

Posts by mmonnin

1) Message boards : Number crunching : Delay in getting new work units untill all work units have cleared (Message 69143)
Posted 30 Sep 2019 by mmonnin
Post:
We have been monitoring the situation, and it seems like the community has found fixes to some of the problems you are experiencing.

Jake said that the problem appeared to be some obscure BOINC setting somewhere, and he has asked the BOINC forums about it. It looks like this issue disappears in the new beta of the BOINC client, so they must have patched whatever was causing the problem. When that is released, hopefully the problem will be resolved.

- Tom


What fixes are those? MW work runs out, waits a couple of minutes, and then the server finally gives us more work. The server should give us more work the entire time, not wait until our MW queues are empty to provide more.
2) Message boards : Number crunching : How many CPUs? (Message 69142)
Posted 30 Sep 2019 by mmonnin
Post:
The BOINC client manages this, not the project. Multi-threaded (mt) apps will sometimes use only 1 thread during the initial setup phase, before the science app starts to do work; BOINC then stops other tasks once the science app starts to use more CPU. The exception I have seen is when some tasks are close to deadline and running in high-priority mode.

The client needs a full free CPU thread before it reserves one, so 0.0497 of a CPU does not count against your 6 available threads.
3) Message boards : Number crunching : Rx570 vs. gtx 1080, 1080ti, 2080 (Message 69116)
Posted 24 Sep 2019 by mmonnin
Post:
PrimeGrid GFN are FP64 as well. The last NV consumer card with high FP64 was the Titan Black I believe. Now NV leaves that to the Pro cards like Tesla.
4) Message boards : Number crunching : Delay in getting new work units untill all work units have cleared (Message 69109)
Posted 23 Sep 2019 by mmonnin
Post:
Set up a 2nd GPU project with a zero resource share so that when MW runs out, it will get tasks and your GPU won't be idle while waiting for MW to refill the cache. PrimeGrid and Collatz are two projects that almost always have tasks. At PrimeGrid you can pick WUs that run very quickly, like the MW WUs do, and at Collatz the WUs will run much faster as well if you use the optimization codes.


This is just a workaround, not a solution. The developers must fix it.

This is surely a server-side misconfiguration; in 15 years of BOINC it never happened with any other project.


I do this for every client, and it's always a good idea no matter your main project.
5) Message boards : Number crunching : WUs not downloaded in time - rig is idling - doing no work ... (Message 69098)
Posted 21 Sep 2019 by mmonnin
Post:
When you run out of WUs, the first deferred communication is always more or less 1 minute 40 seconds. In this first stage it uploads the latest results (and doesn't download anything).

Then the second deferred communication of more or less 12 minutes (can't remember exactly) begins. The GPU is idling, and then, since there are no results to report, the download of 300 WUs begins.

This loop is always the same on all hosts. Next time I run out of WUs I will post the exact times.


There is nothing to actually upload. Set the client to no networking and the tasks go straight to "Waiting to report" rather than "Uploading". There are no per-task data files to download or upload for this project.
6) Message boards : Number crunching : Rx570 vs. gtx 1080, 1080ti, 2080 (Message 69066)
Posted 19 Sep 2019 by mmonnin
Post:
AMD RX cards perform MUCH better at E@H than MW@H. That is where I ran my RX580.

This project favors cards with high FP64 compute power. So AMD 78xx/R9, NV Titan Black, AMD Radeon VII, NV Titan V. Most of the top PCs run one of those 4 generations of GPUs.
https://milkyway.cs.rpi.edu/milkyway/top_hosts.php

I have most of your cards (the RX, 1080 and Ti) but will never run them at MW@H as they have low FP64 compute power. But if you want, run as many simultaneous tasks as needed to keep the utilization pegged. The <cpu_usage> part of <gpu_versions> just dedicates that much CPU to NOT running CPU tasks; it in no way affects the actual CPU usage of the GPU app. If you put <cpu_usage>4</cpu_usage>, BOINC will not run 4 CPU tasks and will leave those 4 CPUs for the GPU, even if the GPU task only uses 0.1 CPU threads in Task Manager.
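A minimal app_config.xml sketch of that setup, assuming the app name is "milkyway" (check client_state.xml for the exact name on your host):

```xml
<app_config>
    <app>
        <name>milkyway</name>              <!-- assumed app name; verify in client_state.xml -->
        <gpu_versions>
            <gpu_usage>0.5</gpu_usage>     <!-- run two tasks per GPU -->
            <cpu_usage>1.0</cpu_usage>     <!-- keep one CPU thread free per GPU task -->
        </gpu_versions>
    </app>
</app_config>
```

The file goes in the project's folder under the BOINC data directory; re-read config files or restart the client for it to take effect.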
7) Message boards : Number crunching : de_modfit_80_bundle4_4s_south4s - error messages (Message 69065)
Posted 19 Sep 2019 by mmonnin
Post:
At 01:30 Thursday (Central EU time), manually reinstalling the Studio driver seems to have fixed this issue. The task is running OK.


The very 1st suggestion was the fix...
8) Message boards : Number crunching : Delay in getting new work units untill all work units have cleared (Message 69064)
Posted 19 Sep 2019 by mmonnin
Post:
https://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=4424

Already discussed.
9) Message boards : Number crunching : WUs not downloaded in time - rig is idling - doing no work ... (Message 69050)
Posted 17 Sep 2019 by mmonnin
Post:
Yes, this was reported in May. After the last result, there needs to be a 10-minute period of no requests before clients can get more work.
https://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=4424&postid=68441#68441

I'm about to set up a script to turn off networking for about 11 minutes, resume and do a project update, allow 30 minutes or so of crunching, then repeat.
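A sketch of that script using the stock boinccmd tool (the timings are the ones mentioned above; the project URL is assumed to be the one this reply reads from):

```shell
#!/bin/sh
# Hypothetical workaround sketch: keep the client quiet for ~11 minutes so the
# server-side request timeout can expire, then update and crunch the refilled
# cache. Assumes boinccmd (the stock BOINC command-line tool) is on PATH.

PROJECT_URL="http://milkyway.cs.rpi.edu/milkyway/"

cycle() {
    boinccmd --set_network_mode never          # stop all scheduler requests
    sleep 660                                  # ~11 minutes of silence
    boinccmd --set_network_mode auto           # restore networking
    boinccmd --project "$PROJECT_URL" update   # request work immediately
    sleep 1800                                 # crunch the refilled cache ~30 min
}
# run from cron, or wrap in a loop: while true; do cycle; done
```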
10) Message boards : Number crunching : de_modfit_80_bundle4_4s_south4s - error messages (Message 69037)
Posted 15 Sep 2019 by mmonnin
Post:
Post the results of the command 'clinfo' from a CMD prompt. You may have installed NV drivers, but Win10 probably overwrote them.
11) Message boards : Number crunching : MilkyWay takes a backseat to Einstein ??? (Message 68992)
Posted 28 Aug 2019 by mmonnin
Post:
I always run 1 client for CPUs and a separate client for GPU work on a single PC. That'll fix caching issues between CPU and GPU tasks.
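A sketch of how a second instance can be started on Linux; --allow_multiple_clients, --dir and --gui_rpc_port are stock client options, but the data directory and port below are made-up examples:

```shell
# Hypothetical sketch: start a second BOINC client instance for GPU-only work,
# with its own data directory and GUI RPC port so the two caches stay separate.
start_gpu_client() {
    # $1 = data directory, $2 = GUI RPC port
    boinc --allow_multiple_clients --dir "$1" --gui_rpc_port "$2" &
}
# usage: start_gpu_client /var/lib/boinc-gpu 31417
```

Each instance then attaches to its own projects, so CPU and GPU work queues never compete.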
12) Message boards : News : 30 Workunit Limit Per Request - Fix Implemented (Message 68483)
Posted 6 Apr 2019 by mmonnin
Post:
200 computed tasks in less than 10 minutes? That is not possible even for the fastest machines. The very best computers, with a few modern powerful GPUs working in parallel and dedicated solely to MW, can do 200 tasks "only" in ~20-40 min.


Beg to differ. If I have a host with 8 RTX 2080 Ti cards or similar, I can easily crunch through 200 tasks in ten minutes. There are many hosts with mining-rig pedigrees that have multiple GPUs. I have a minimum of 3 cards in every host.


Then your task limit is higher with more cards. 200 is the limit for 1 GPU, and the statement was in regard to a single GPU. Only a Titan V or a Radeon VII crunches 200 tasks per GPU in that time.

It seems hard enough to get the admins to realize the issue wasn't how many tasks can be downloaded at once, but the timeout issue completely preventing tasks from downloading at all. Please stay on topic instead of bragging that your GPUs can do it in 10 minutes.
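A quick sanity check on the numbers in this exchange (the task counts and timings are the thread's; the arithmetic is only illustrative):

```shell
# Throughput check on the claims above.
tasks=200       # per-GPU server limit discussed in this thread
minutes=10
gpus=8          # the hypothetical 8-card host

# An 8-card host clearing 200 tasks in 10 minutes averages one task
# per GPU every 24 seconds.
echo $(( minutes * 60 * gpus / tasks ))   # 24

# A single GPU clearing 200 tasks in 10 minutes would need one task
# every 3 seconds, which is why only the fastest FP64 cards manage it.
echo $(( minutes * 60 / tasks ))          # 3
```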
13) Message boards : News : 30 Workunit Limit Per Request - Fix Implemented (Message 68455)
Posted 29 Mar 2019 by mmonnin
Post:
Well, for my 280X it takes over 2 hours, but there are quite a few requests within 10 min. I wish it were 30 min. :) I'd guess a PC would need to run 1 task for over 10 min to not run into the issue.

These 4 lines come every 2-4 task completions. Some complete, none are downloaded. The queue runs dry. Moo/Collatz take over 10 min for a task, and then a new set of MW tasks arrives.

362062 Milkyway@Home 3/28/2019 11:01:04 PM Sending scheduler request: To fetch work.
362063 Milkyway@Home 3/28/2019 11:01:04 PM Reporting 2 completed tasks
362064 Milkyway@Home 3/28/2019 11:01:04 PM Requesting new tasks for AMD/ATI GPU
362065 Milkyway@Home 3/28/2019 11:01:06 PM Scheduler request completed: got 0 new tasks

Can the server distinguish between automatic updates like those above and user-initiated updates, so that the former could have a lower limit than possible user spam? The log file mentions a user update, but does the server know?
14) Message boards : Number crunching : Download Stalled? (Message 68452)
Posted 28 Mar 2019 by mmonnin
Post:
Recently I've been having trouble getting new tasks because some N-Body download won't complete - and I haven't opted for N-Body tasks for a couple of years. Is there a simple fix for this?

3/28/2019 3:03:11 PM | Milkyway@Home | Reporting 4 completed tasks
3/28/2019 3:03:11 PM | Milkyway@Home | Not requesting tasks: some download is stalled
3/28/2019 3:03:13 PM | Milkyway@Home | Scheduler request completed
3/28/2019 3:13:16 PM | Milkyway@Home | Sending scheduler request: Requested by project.
3/28/2019 3:13:16 PM | Milkyway@Home | Not requesting tasks: some download is stalled
3/28/2019 3:13:17 PM | Milkyway@Home | Scheduler request completed
3/28/2019 3:23:18 PM | Milkyway@Home | Sending scheduler request: Requested by project.
3/28/2019 3:23:18 PM | Milkyway@Home | Not requesting tasks: some download is stalled
3/28/2019 3:23:20 PM | Milkyway@Home | Scheduler request completed
3/28/2019 3:33:23 PM | Milkyway@Home | Sending scheduler request: Requested by project.
3/28/2019 3:33:23 PM | Milkyway@Home | Not requesting tasks: some download is stalled
3/28/2019 3:33:25 PM | Milkyway@Home | Scheduler request completed


Have you tried resetting the project?
15) Message boards : News : 30 Workunit Limit Per Request - Fix Implemented (Message 68450)
Posted 28 Mar 2019 by mmonnin
Post:
Hey guys,

So the current setup allows users to have up to 200 workunits per GPU on their computer and another 40 workunits per CPU, with a maximum of 600 possible workunits.

On the server, we try to store a cache of 10,000 workunits. Sometimes when a lot of people request work all at the same time, this cache will run low.

So all of the numbers I have listed are tunable. What would you guys recommend for changes to these numbers?

Jake


It's not any of these settings. When the server allows work, we get work. But there is a timeout to prevent users from spamming projects with frequent requests. These tasks are so quick that tasks are constantly uploading - about every 30-35 seconds for me. So we are constantly requesting too frequently until all tasks are done, the delay passes, and then we can get more work.

I still have a PC that has not contacted the server since the upgrade. Its sched_reply_milkyway.cs.rpi.edu_milkyway.xml file from the old server version does not have this line at all:

<next_rpc_delay>600.000000</next_rpc_delay>

https://boinc.berkeley.edu/trac/wiki/ProjectOptions#client-control
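For project admins, the knobs involved would live in the scheduler's config.xml; the option names below are from that ProjectOptions page, but the values are only illustrative guesses at what the server might be using:

```xml
<!-- illustrative values only, not the project's actual settings -->
<min_sendwork_interval>91</min_sendwork_interval>   <!-- appears as <request_delay> in replies -->
<next_rpc_delay>600</next_rpc_delay>                <!-- appears as <next_rpc_delay> in replies -->
```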

For reference, the entire old version of the file minus some user info.

<scheduler_reply>
<scheduler_version>707</scheduler_version>
<dont_use_dcf/>
<master_url>http://milkyway.cs.rpi.edu/milkyway/</master_url>
<request_delay>91.000000</request_delay>
<project_name>Milkyway@Home</project_name>
<project_preferences>
<resource_share>10</resource_share>
<no_cpu>1</no_cpu>
<no_ati>0</no_ati>
<no_cuda>0</no_cuda>
<project_specific>
<max_gfx_cpu_pct>20</max_gfx_cpu_pct>
<gpu_target_frequency>60</gpu_target_frequency>
<nbody_graphics_poll_period>30</nbody_graphics_poll_period>
<nbody_graphics_float_speed>5</nbody_graphics_float_speed>
<nbody_graphics_textured_point_size>250</nbody_graphics_textured_point_size>
<nbody_graphics_point_point_size>40</nbody_graphics_point_point_size>
</project_specific>
<venue name="home">
<resource_share>50</resource_share>
<no_cpu>0</no_cpu>
<no_ati>1</no_ati>
<no_cuda>1</no_cuda>
<project_specific>
<max_gfx_cpu_pct>20</max_gfx_cpu_pct>
<gpu_target_frequency>60</gpu_target_frequency>
<nbody_graphics_poll_period>30</nbody_graphics_poll_period>
<nbody_graphics_float_speed>5</nbody_graphics_float_speed>
<nbody_graphics_textured_point_size>250</nbody_graphics_textured_point_size>
<nbody_graphics_point_point_size>40</nbody_graphics_point_point_size>
</project_specific>
</venue>
</project_preferences>

<result_ack>
    <name>de_modfit_sim19fixed_bundle4_4s_NoContraintsWithDisk260_3_1533467104_9447502_1</name>
</result_ack>
<result_ack>
    <name>de_modfit_sim19fixed_bundle4_4s_NoContraintsWithDisk260_1_1533467104_9241919_1</name>
</result_ack>
</scheduler_reply>
16) Message boards : News : 30 Workunit Limit Per Request - Fix Implemented (Message 68438)
Posted 28 Mar 2019 by mmonnin
Post:
Jake -

I just did a "user request for tasks" and received 51 new ones.

As I have written before, every 4 minutes or so I complete a task, report it, request new tasks, and get nothing. I have about 200 tasks now from my user requests, so I will have to wait until tomorrow and see if my stockpile bleeds down.


I've run out of tasks several times overnight. BOINCTasks history shows a lot of MW tasks, 1 task from Moo or Collatz (my 0% resource share backup projects), then a lot more MW tasks. There were several cycles of this.
17) Message boards : Number crunching : Collected list of badges available? (Message 68437)
Posted 28 Mar 2019 by mmonnin
Post:
Looked through the forum, and couldn't find one. Does one exist?


This site has all badges from all projects:
https://signature.statseb.fr/index.py?badge=24
18) Message boards : News : 30 Workunit Limit Per Request - Fix Implemented (Message 68434)
Posted 27 Mar 2019 by mmonnin
Post:
Jake -

Just tried a "user update" and got another 30 tasks.

And still, every time I complete a task and try to "replace it", no tasks are downloaded.


This is the issue.
19) Message boards : News : New Server Update (Message 68433)
Posted 27 Mar 2019 by mmonnin
Post:
Vortac and wb8ili,

Is this the case for every request? What is the most workunits you have received from a request? Our current configuration settings allow for 600 downloads per request, so I'm trying to pinpoint where this error is occurring.

Best,
Jake


I am seeing the same as wb8ili and I described it above:

It seems like I can't get any work until my queue runs completely dry, and then I'll download 200 more tasks. Those 200 will complete and the queue will continue to drop. Tasks are being reported immediately, but no tasks are downloaded to keep the queue topped off. If I try to manually update, I'm just told the last request was too recent. That continues until everything is gone; I run a couple of tasks from a backup project, and then I can update to get 200 more tasks. The older server would keep the queue at 80 tasks pretty much at all times without user intervention.
20) Questions and Answers : Web site : Website bugs since server transition (Message 68406)
Posted 26 Mar 2019 by mmonnin
Post:
Bill, which team was it? It seems fixed now or a stats refresh moved things around.



©2019 Astroinformatics Group