 
    
30 Workunit Limit Per Request - Fix Implemented

| Author | Message | 
|---|---|
| Joined: 2 Aug 11 Posts: 13 Credit: 44,453,057 RAC: 0 | 
 Yes, such a workaround (not really a fix) works fine. I used a slightly different cmd file after I identified and described the source of the problem here: https://boinc.berkeley.edu/forum_thread.php?id=12918&postid=91355#91355 The cmd file has four lines: ":start", "timeout 120", "boinccmd.exe --project milkyway.cs.rpi.edu/milkyway update", "goto start". For not very fast machines it completely eliminates idle time with no tasks. For very fast machines it reduces idle time to ~2 minutes maximum and ~1 minute on average (if there are no other delays from other causes such as the internet connection or server outages, of course). P.S. If project staff ever want to fix this problem for good, here is a hint where to start: the bug occurs only on combined requests (reporting completed tasks + requesting new ones), while pure work requests work fine. That is why it mostly affects fast computers (they almost always have some finished task in the queue) and is rarely seen on slow ones. It is also why forced manual updates usually work OK while automatic requests fail: if the BOINC client has nothing to report yet, the server will give it new work; if the client reports something completed, the server will not give it new work. | 
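
For readers not on Windows, the cmd loop above could be sketched in Python roughly as follows. This is only an illustration, not something posted in the thread: the function names are made up, and it assumes the `boinccmd` executable is on the PATH (on Windows it would be `boinccmd.exe`). The project URL and the 120 second delay are the ones from the cmd file.

```python
import subprocess
import time

# URL taken from the cmd file above; the executable name is an assumption.
PROJECT_URL = "milkyway.cs.rpi.edu/milkyway"


def build_update_command(boinccmd="boinccmd", project_url=PROJECT_URL):
    """Return the argument list for one forced project update."""
    return [boinccmd, "--project", project_url, "update"]


def update_loop(iterations=None, delay=120, run=subprocess.run, sleep=time.sleep):
    """Wait `delay` seconds, force an update, repeat.

    iterations=None loops forever, like `goto start` in the cmd file.
    `run` and `sleep` are injectable only so the loop can be tested.
    """
    count = 0
    while iterations is None or count < iterations:
        sleep(delay)
        # check=False: a failed update is ignored and retried next round.
        run(build_update_command(), check=False)
        count += 1
    return count
```

Calling `update_loop()` with no arguments runs until interrupted, which matches the behaviour of the cmd file.
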
| Joseph Stateson  Joined: 18 Nov 08 Posts: 291 Credit: 2,463,985,753 RAC: 25 | 
 Sorry, this got posted twice; please delete. | 
| Joseph Stateson  Joined: 18 Nov 08 Posts: 291 Credit: 2,463,985,753 RAC: 25 | 
 Some thoughts on fixing the problem with fast GPUs running out of data. I just switched to the new "seti special app" that uses CUDA90 and am getting elapsed times comparable to Milkyway times: slightly over 1 minute for a GTX1070 (Linux only). This is comparable to my S9000 boards running Milkyway. Currently I show almost 600 work units in the queue, where normally 100 is the max for non-CUDA90. Looking at the event messages I see the following: SETI completes a task and that task is added to the queue of tasks "ready to report". That queue grows while the queue of tasks "ready to start" decreases. When this happens the message "upload" is displayed in the manager (BoincTasks for me) and the event message dialog box shows "Finished upload of …." At no time are tasks asked for and no tasks are downloaded, i.e. there is no message about getting "0" tasks like what shows up constantly on Milkyway. Periodically I see the following: 905 SETI@home 7/15/2019 10:58:30 AM Reporting 64 completed tasks; 906 SETI@home 7/15/2019 10:58:30 AM Requesting new tasks for NVIDIA GPU; 907 SETI@home 7/15/2019 10:58:34 AM Scheduler request completed: got 77 new tasks. SETI is not asking for new data after every "upload", and it seems an "upload" is not the same as reporting. In any event, SETI asks for data much less frequently than Milkyway. I am guessing that Milkyway, on fast GPUs, is asking for more data BEFORE the "you asked for data too soon" timeout expires, and that is why no data is ever sent until 10 or more minutes after the very last reported tasks. Milkyway needs to STOP asking for data after each upload or, better, get help from the SETI folks on how they implemented their buffering. | 
| Joined: 26 Mar 18 Posts: 24 Credit: 102,912,937 RAC: 0 | 
 Thanks for the examples.  I'll implement the script to rerun the update command and see how it goes. | 
| Joined: 2 Aug 11 Posts: 13 Credit: 44,453,057 RAC: 0 | 
 some thoughts on fixing the problem with fast GPUs running out of data. No, that is not the case; I and other users already checked it long ago. The server-side timeout on the MW servers is only about 1.5 min (91 sec), and the BOINC client always waits at least this long before the next request. You can see the error "Not sending work - last request too recent" only after forced updates (manual or via cmd/script). About the 10 min: this is another, internal CLIENT-side (not server-side) timeout. If the BOINC client gets an error while requesting work from the server, it will wait an additional ~10 min before the next request, even if the server did not ask for any delay/timeout. But this is not the cause of the error; it is its consequence. It makes the problem a little worse (by increasing idle time) but has nothing to do with the reason/source of the error itself. And about SETI: what you described is standard BOINC client behavior on any project when it has enough work in the queue. It will contact the server to report completed tasks and get more work only about once per hour or even less frequently. The only reason the client sends requests so often at MW is that it cannot get enough work in the queue, due to the server error on all combined requests. After some failed requests the queue begins to run dry and the client begins asking for work more often, trying to fill it up. But again, that is not the cause/source of the error. It is a consequence too. The cycle: 1 - normal work; 2 - errors while getting new work, due to some error in processing combined requests (reporting completed WUs + requesting WUs in the same request to the server); 3 - low local work cache on the client, because the client successfully reports completed work but cannot get any new work; 4 - the client begins sending requests to the server often due to the low work cache, but they all still fail; 5 - work cache completely empty, all completed WUs already reported to the server; 6 - a work request finally succeeds (as work requests without reporting completed work work just fine) and brings a lot of new WUs; 7 - go to 1. | 
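
The seven-step cycle above can be illustrated with a small toy model. This is a sketch under stated assumptions, not server or client code: the rule "no work is granted when the request also reports completed tasks" is the behaviour described in this thread, and the function names, queue sizes and batch size are invented for the example.

```python
def server_reply(reported, requesting, batch):
    """Toy model of the described bug (an assumption, not server code):
    work is granted only when the request reports nothing."""
    if requesting and reported == 0:
        return batch
    return 0


def simulate(rounds, queue=3, low_water=2, batch=3):
    """One scheduler contact per round; at most one task finishes per round.

    Returns a list of (tasks_left_in_queue, tasks_received), one per round.
    """
    log = []
    for _ in range(rounds):
        finished = min(1, queue)        # nothing can finish if the queue is empty
        queue -= finished
        requesting = queue < low_water  # ask for work only when the cache runs low
        received = server_reply(finished, requesting, batch)
        queue += received               # the report itself always succeeds
        log.append((queue, received))
    return log
```

With the defaults, `simulate(5)` returns `[(2, 0), (1, 0), (0, 0), (3, 3), (2, 0)]`: the queue drains to zero while every combined request comes back empty, and work only arrives in the round where there is nothing left to report.
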
| Joined: 5 Jul 11 Posts: 993 Credit: 377,122,459 RAC: 1,002 | 
 Hey guys, I'd like the limit to be related only to the number of GPUs. E.g. if I had (and I will shortly) a host with 20 GPUs, I could get 20 x 300 units, not just 600 (which would be only 30 per GPU). | 
| Joined: 5 Jul 11 Posts: 993 Credit: 377,122,459 RAC: 1,002 | 
 Several ways to fix this. All rely on issuing an update after waiting about 2.5 minutes minimum. The delay is to allow the "your request is too soon" timeout to expire. I use Boinctasks but have never used its rules. Can you tell me how to set it up to do the above please? | 
| Keith Myers  Joined: 24 Jan 11 Posts: 733 Credit: 564,752,650 RAC: 11,890 | 
 I use Boinctasks but have never used its rules. Can you tell me how to set it up to do the above please? It's under the Extra Menu >> BoincTasks Settings >> Rules tab. Use the snippet of code posted as a guideline to build the rule with the constructor dialog and save the rule.   | 
| Joined: 5 Jul 11 Posts: 993 Credit: 377,122,459 RAC: 1,002 | 
 I use Boinctasks but have never used its rules. Can you tell me how to set it up to do the above please? Ok, I tried that, but the rule doesn't fire. The batch file works OK if I run it manually, though. However, every time I test it, BOINC seems to request new work automatically 1:31 after it ran out anyway (the server backoff time after it reported the last task). Don't tell me somebody fixed something? It used to wait 8 minutes. Here's the rule I made, which doesn't launch the batch file: https://www.dropbox.com/s/dqhjgykvq2sg2ey/MW.jpg?dl=0 Update - it just launched it at the wrong time. AFTER BOINC had already downloaded another 600 tasks, about 5 minutes later the rule was triggered, even though the rule is supposed to be checking for less than 1 second of work left. It's now 6 hours left! | 
| Joseph Stateson  Joined: 18 Nov 08 Posts: 291 Credit: 2,463,985,753 RAC: 25 | 
 I use Boinctasks but have never used its rules. Can you tell me how to set it up to do the above please? Just saw this. The image you posted shows one problem. "00d,00:02:30" is 150 seconds. It is not obvious how the rules work and I had to do a lot of testing. Unaccountably, GoDaddy has messed with the certificates on "my site" and I cannot easily put an image up. Edit AppData\Roaming\eFMer\BoincTasks\rules.xml and add the following, after changing the obvious fields:
<rule>
    <active>no</active>
    <name>MWempty1</name>
    <computer>s9x00</computer>
    <project>Milkyway@Home</project>
    <application>1.46%20Milkyway@home%20Separation%20(opencl_ati_101)</application>
    <type0>12</type0>
    <type1>0</type1>
    <type2>0</type2>
    <operator0>3</operator0>
    <operator1>0</operator1>
    <operator2>0</operator2>
    <ivalue0>1</ivalue0>
    <ivalue1>-1</ivalue1>
    <ivalue2>-1</ivalue2>
    <isnooze>0</isnooze>
    <dvalue0>-1.000000</dvalue0>
    <dvalue1>-1.000000</dvalue1>
    <dvalue2>-1.000000</dvalue2>
    <itime>150</itime>
    <color>6569215</color>
    <event_show>2</event_show>
    <event_internal>0</event_internal>
    <event_external>3</event_external>
    <event_parameters>d:\up-s9000.bat</event_parameters>
</rule>
There is something else you can do if you can compile the BOINC client. In the file "work_fetch.cpp", in the function void RSC_PROJECT_WORK_FETCH::resource_backoff(PROJECT* p, const char* name), change
    double x = (.5 + drand())*backoff_interval;
    backoff_time = gstate.now + x;
by replacing x with 160 or maybe 180 (seconds). I had to experiment, so I added a command-line option to the client to make that easy. This only works for Milkyway. I did not bother to check whether any other projects are being served, so it would be a bad idea to use 160 seconds on other projects. The modified code:
    double x = (.5 + drand())*backoff_interval;
    if(gstate.mwbuffer != -1)
    {
       x = gstate.mwbuffer; // jys
    }
    backoff_time = gstate.now + x;
 | 
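
To see what the change does to the waiting time, here is a minimal Python sketch of the same arithmetic. It is an illustration only: it assumes `drand()` returns a uniform value between 0 and 1, `override` stands in for the poster's own `gstate.mwbuffer` variable (which is not part of stock BOINC), and the function name is made up.

```python
import random


def backoff_delay(backoff_interval, override=-1.0, rng=random.random):
    """Seconds to wait before the next work request for a resource.

    Without an override the delay is randomized between 0.5x and 1.5x of
    backoff_interval, mirroring (.5 + drand())*backoff_interval.
    With an override (the role mwbuffer plays above) the delay is fixed.
    """
    if override != -1.0:
        return override
    return (0.5 + rng()) * backoff_interval
```

For a 600 second interval the stock formula gives anything from about 300 to 900 seconds, which fits the roughly 10 minute idle periods reported in this thread, while `backoff_delay(600, override=160)` always gives 160.
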
| Joined: 5 Jul 11 Posts: 993 Credit: 377,122,459 RAC: 1,002 | 
 You have 0 under "Time", it needs to be 160 seconds at least or else you get the "last request too soon" I've changed it to 1 minute 40 seconds, as the MW backoff appears to be 1 minute 31 seconds, judging by "Project requested delay of 91 seconds" in the messages. At "value" you need to have a positive number as BT has to see the project is empty for at least one second. I don't understand, I already have a positive value, don't I? Of 1 second. As in "if less than 1 second of work left, then run the program". | 
| Joseph Stateson  Joined: 18 Nov 08 Posts: 291 Credit: 2,463,985,753 RAC: 25 | 
 You have 0 under "Time", it needs to be 160 seconds at least else you get the "last request too soon" I posted too soon and was unable to correct it before you replied. My fingers get ahead of my thoughts. I have not figured out which key does it, but it is possible to post without clicking the OK button with the mouse. It should work with "00d,00:02:30" for the time field, and a value of 1 is OK. Make sure you run the "Check" button at least once and then make sure the rule is active. | 
| Joined: 5 Jul 11 Posts: 993 Credit: 377,122,459 RAC: 1,002 | 
 I posted too soon and was unable to correct before you replied. My fingers get ahead of my thoughts. I have not figured out which key does it, but it is possible to post without clicking the OK button with the mouse. I've never managed to post with a key on this forum, but I've had a similar problem with emails in Opera. By default it has HUGE numbers of keystrokes assigned to tasks. It's very badly designed, as single keypresses do stuff. I would prefer always something like CTRL-F to make something happen. With single keypresses, they can occur when I think I'm typing an email, and because the wrong thing was selected, I've now performed 10 unknown tasks instead of typing a sentence. Why 2:30? AFAIK the timer on MW is 1:31. The active tick annoyed me. So many things in Boinc and Boinctasks you can set something up then you have to activate it too. Nothing is sensible anymore. I remember when you set things up in a dialog box then pressed ok or cancel. Nowadays ok is assumed in windows, for example there's no ok or cancel in control panel anymore, you have to hope it saved it. And if you changed your mind, tough! Anyway, nothing you've suggested I think will cause it to run the program. I wasn't getting a "too soon" complaint from MW, it just wasn't starting the batch file for some reason. But it did start it a lot later after some minutes, not sure why. Something is causing it not to immediately detect a lack of WUs in MW. | 
| Joseph Stateson  Joined: 18 Nov 08 Posts: 291 Credit: 2,463,985,753 RAC: 25 | 
 I posted too soon and was unable to correct before you replied. My fingers get ahead of my thoughts. I have not figured out which key does it, but it is possible to post without clicking the OK button with the mouse. I am not privy to the inner workings of BT, but I believe it works on a transition from having tasks to not having any, so it won't bother calling your batch file if it never saw any work units in the first place. As a consequence, it does not send out subsequent commands to run the batch file if nothing shows up again (e.g. the project went offline). | 
| Joined: 5 Jul 11 Posts: 993 Credit: 377,122,459 RAC: 1,002 | 
 I am not privy to the inner workings of BT, but I believe it works on a transition from having tasks to not having any, so it won't bother calling your batch file if it never saw any work units in the first place. As a consequence, it does not send out subsequent commands to run the batch file if nothing shows up again (e.g. the project went offline). I tried to test it by aborting all tasks in progress; maybe I confused it. I'll just have to wait until it runs out in the normal way. This only happens every 6 hours, and not if I play a game, which pauses MW and makes it download more anyway, so it may be a while until I catch it running out. I'll certainly notice the DOS box appearing if I'm sat in front of the machine. | 
| Joseph Stateson  Joined: 18 Nov 08 Posts: 291 Credit: 2,463,985,753 RAC: 25 | 
 Hey guys, Not sure where those numbers came from, but I have never seen more than 900 work units on my 6 GPU system, yet I get 600 on my two GPU system. I tried to figure this out as follows (GPUs / cores / threads / number acquired / how calculated): 6 / 4 / 8 / 900 / have no idea; 2 / 12 / 24 / 600 / ditto. However, all the effort put into this project and others (especially SETI) to prevent overloading the server by throttling users is for naught when users can "spoof" the number of GPUs. One can even get around the maximum number of tasks per device by having multiple clients on the same system. | 
| Joined: 5 Jul 11 Posts: 993 Credit: 377,122,459 RAC: 1,002 | 
 Not sure where those number came from but I have never seen more than 900 work units on my 6 GPU system but get 600 on my two GPU system. The limits are 300 per GPU and 900 per host. The per-GPU limit ignores which projects each GPU runs, e.g. I have 2 GPUs, one only runs Einstein and one only MW, but MW thinks there are 2, so I get 600. I may have to use multiple clients when I get my 20 GPU rig running. 900 WUs for 20 GPUs would run out very quickly, although that would be fine if they fixed the problem of not getting new work until you've stopped reporting the completed ones; otherwise I'd have them sat idle quite often. | 
| Joseph Stateson  Joined: 18 Nov 08 Posts: 291 Credit: 2,463,985,753 RAC: 25 | 
 I tried to test it by aborting all tasks in progress, maybe I confused it, That should have run your batch file. You might be able to debug the problem by picking another project and setting its resource share to 0, so it only gets one task at a time. You won't have to wait several hours unless you pick GPUGRID. However, it won't work if the project sends a second task before the first one finishes. Plants-vs-Zombies can run alongside Milkyway with no problem. It can be made interesting if you try to pass all the rooftop levels with only 1 sun. If interested, I have a list of challenges for PVZ you can try. I cannot play any FPS due to motion sickness, so I stick to PVZ and the Win7 Spider Solitaire. | 
| Joined: 5 Jul 11 Posts: 993 Credit: 377,122,459 RAC: 1,002 | 
 I play Sims 3 and Civilisation 6 rising tide.  Sims 3 needs the GPUs off or it gets jerky.  Civilisation needs the GPUs and CPU off or it gets jerky.  But I'll shortly be moving the powerful card onto another machine in my garage on the other end of a 10Gbit link, just leaving the rubbish one on here which is fine for games, just no good at double precision Boinc.  That way I don't hear the fans from it, and also it can run 24/7 even when I play games. | 
| Joseph Stateson  Joined: 18 Nov 08 Posts: 291 Credit: 2,463,985,753 RAC: 25 | 
 I may have to use multiple clients when I get my 20 GPU rig running. 900 WUs for 20 GPUs would run out very quickly, although that would be fine if they fixed the problem of not getting new work until you've stopped reporting the completed ones, otherwise I'd have them sat idle quite often. I agree, it would be best if the project sent a few new tasks each time results are reported. From my own experience building the client under both Windows and Linux, I know it is difficult to figure out what is going on, plus the latest usable Windows compiler is VS-2013. It is really difficult to maintain source code for all the different systems that BOINC can run on, and I didn't appreciate the effort until I tried making a few changes. I brought up a feature I thought might be useful here, but the head-shed didn't seem to like it: https://github.com/BOINC/boinc/issues/3337 I was able to add the following options to the BOINC client:
    --force_hostname <name>        use this as hostname
    --set_password <password>      rpc gui password
    --mwBackoff N                  seconds to force project backoff
    --spoof_gpus N                 fake number of gpus
However, my "spoof_gpus" is NOT the same as the one the SETI GPU users group has in their secret BOINC app. It just allows a single client to claim ownership of all the actual GPUs instead of just the one it is assigned in cc_config after I excluded the others. I am not sure if I need this, but I noticed it got me a lot more than I expected for a new client. It will not get more than 900 from MW in any event. I have put together a script that will create any number of clients on a single system. It is designed for SETI only and I plan to use it at the next SETI CRUNCH EVENT, as I found that some users were archiving and processing work units months before the event started. I don't plan on using my script here, only next year at that SETI crunch-a-thon. | 
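
The task limits quoted in this thread (300 per GPU, capped at 900 per host) can be written down as a one-line rule. This is a minimal sketch that assumes those two numbers are the whole story; the function and constant names are invented for the example, and the real scheduler may apply other limits as well.

```python
PER_GPU_LIMIT = 300   # tasks in progress per GPU, as stated in the thread
PER_HOST_LIMIT = 900  # overall cap per host, as stated in the thread


def max_tasks_in_progress(gpu_count):
    """Largest number of tasks a host could hold under the two limits."""
    return min(PER_GPU_LIMIT * gpu_count, PER_HOST_LIMIT)
```

This reproduces the numbers posters reported: 600 for a 2 GPU host and 900 for a 6 GPU host. A 20 GPU host would also stop at 900, i.e. 45 tasks per GPU, which is why running multiple clients comes up in the discussion.
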
 
        