Server Downtime March 28, 2022 (12 hours starting 00:00 UTC)
Author | Message |
---|---|
Send message Joined: 31 Mar 12 Posts: 96 Credit: 152,502,177 RAC: 0 |
Probably a while more, I am getting tasks generated on the 16th of March :D Doesn't that mean the server has caught up, but we haven't? Server has caught up with work that was sent back. But tasks generated on the 16th of March haven't been sent out yet. Server has done the heavy lifting of validating work that we've completed; now we can compute knowing stuff is running normally. |
Send message Joined: 5 Jul 11 Posts: 990 Credit: 376,143,149 RAC: 0 |
Server has caught up with work that was sent back. But tasks generated on the 16th of March haven't been sent out yet. Not sure that means anything. Just that work was generated a while ago. The telescope data it's from is probably 6 months old. |
Send message Joined: 18 Feb 10 Posts: 57 Credit: 222,497,049 RAC: 3,985 |
project: http://milkyway.cs.rpi.edu/milkyway/ I can't find "report_delay" in the BOINC configuration files. Where do I put it? This is why I currently let it get the maximum 300 per GPU because there will be a 10 minute gap at the end. That option is only for those who use a custom-made BOINC; also, it's only for Linux. |
Send message Joined: 5 Jul 11 Posts: 990 Credit: 376,143,149 RAC: 0 |
Can I have a big description of this? I'm going to ask on GitHub for them to put it into normal BOINC, as it would be very useful. project: http://milkyway.cs.rpi.edu/milkyway/ I can't find "report_delay" in the BOINC configuration files. Where do I put it? This is why I currently let it get the maximum 300 per GPU because there will be a 10 minute gap at the end. |
Send message Joined: 18 Feb 10 Posts: 57 Credit: 222,497,049 RAC: 3,985 |
That option is only for those who use a custom-made BOINC; also, it's only for Linux. Can I have a big description of this? I'm going to ask on GitHub for them to put it into normal BOINC, as it would be very useful. Well, I don't know that much, but maybe Keith Myers can explain things; try sending him a PM. |
Send message Joined: 24 Jan 11 Posts: 715 Credit: 555,418,879 RAC: 38,206 |
project: http://milkyway.cs.rpi.edu/milkyway/ I can't find "report_delay" in the BOINC configuration files. Where do I put it? This is why I currently let it get the maximum 300 per GPU because there will be a 10 minute gap at the end. It's not in the standard BOINC client. I use our optimized GPUUG team client, which has the setting especially for Milkyway. I asked our dev to put it in just for me since I am the only team member that does MW. [Edit] Or initially was. A few newer team members also do MW now. Our client does a lot more to overcome the failures of the standard BOINC client. Setting specific task count sizes is the one everyone uses on all projects. The next most used is the request_min_cooldown setting, which lets us choose our own scheduler connection intervals. It's pretty ridiculous to ping a CPU-only project server every 11 seconds for a scheduler connection as the project default when the tasks take 1-12 hours to complete. With most of my projects I add 2-3 minutes to the stock project interval. And back in the old Seti project days, when it was difficult to get enough work to keep our GPUUG team "special sauce" application busy, we could also spoof however many CPUs or GPUs we wanted to build up large cache sizes. There is another MW optimized client by a different developer over at GitHub that achieves the same thing. https://github.com/JStateson/MilkywayNewWork |
Send message Joined: 21 Feb 22 Posts: 66 Credit: 817,008 RAC: 0 |
I get a date for the WU and another date for the task. I should have been more specific. I'm looking at the task date. For example, I get this info when looking at the workunit: created 15 Mar 2022, 13:16:34 UTC, but when I look at the specific task that I have I get: Created 24 Mar 2022, 20:57:13 UTC. I'm guessing that a workunit is created, but isn't used to generate a task for the queue until the queue size falls below some threshold. So I'm looking at the task creation (date, time), as I think it is more interesting to see where in the queue we are. A couple of days ago we were as much as 10+ days behind in the queue, and now we are 7-8 days behind. I'm not sure what is "normal", but it gives me another data point to watch as the system heals. |
Send message Joined: 31 Mar 12 Posts: 96 Credit: 152,502,177 RAC: 0 |
I get a date for the WU and another date for the task. I should have been more specific. I'm looking at the task date. I got this task about... 4 hours ago https://milkyway.cs.rpi.edu/milkyway/result.php?resultid=174582283 Created 16 Mar 2022, 3:10:33 UTC. And the associated workunit: created 15 Mar 2022, 3:29:37 UTC. The difference between workunit creation and task creation is due to the transitioner being backed up. Workunits are typically generated by the work generator, independent of queue size. Tasks are generated by the transitioner, also independent of queue size; it'll keep generating as long as there are workunits without tasks associated. You can see a flaw... |
Send message Joined: 21 Feb 22 Posts: 66 Credit: 817,008 RAC: 0 |
Thanks Kiska for the info. I'm not doing any nbody tasks, so it looks like the nbody tasks have a short window between generation and being sent out even though the queue is huge (17 million+). I just got some new separation tasks and it looks like I'm working on tasks created 24 Mar 2022, 21:04:17 UTC. I'm betting there are a ton of resends for March 30, 31 later in the queue as the system finally worked through the backlog of validations. Is there really no limit on queue size? I guess not if the nbody queue was over 18 million. |
Send message Joined: 8 Nov 11 Posts: 205 Credit: 2,900,464 RAC: 0 |
Another day without credits…… |
Send message Joined: 10 Jun 09 Posts: 6 Credit: 10,333,666 RAC: 0 |
Over 17 million WUs "Ready to Send" yet I still can't get any new tasks. What gives? 02/04/2022 11:17:54 | Milkyway@Home | work fetch resumed by user 02/04/2022 11:17:57 | Milkyway@Home | Sending scheduler request: To fetch work. 02/04/2022 11:17:57 | Milkyway@Home | Requesting new tasks for NVIDIA GPU 02/04/2022 11:18:00 | Milkyway@Home | Scheduler request completed: got 0 new tasks 02/04/2022 11:18:00 | Milkyway@Home | Project requested delay of 91 seconds |
Send message Joined: 31 Mar 12 Posts: 96 Credit: 152,502,177 RAC: 0 |
Over 17 million WUs "Ready to Send" yet I still can't get any new tasks. Let Tom have the weekend off to not focus on the project. It's running fairly well at this time and I just got 100 tasks for my CPU, so it may be the case that the buffer hasn't filled completely before everyone has drained it. |
Send message Joined: 8 Nov 11 Posts: 205 Credit: 2,900,464 RAC: 0 |
I have had a machine ready for an hour, even tried manually, nothing. Doing other work now. |
Send message Joined: 17 Jun 21 Posts: 5 Credit: 10,492,392 RAC: 14 |
Exactly. Thank you Tom for all your hard work in making the situation stable, have a great weekend and get some rest :) |
Send message Joined: 21 Feb 22 Posts: 66 Credit: 817,008 RAC: 0 |
It's been 2 to 3 hours since I've gotten a WU, just to confirm what others are seeing. I've got the machine doing other projects, so no stress; hopefully the server will start giving WUs again on its own at some point. I was working on some resends from March 26 (separation queue) when I last had some WUs to work on. |
Send message Joined: 5 Jul 11 Posts: 990 Credit: 376,143,149 RAC: 0 |
Another day without credits…… But think of the big Christmas present you'll soon get! |
Send message Joined: 13 Apr 17 Posts: 256 Credit: 604,411,638 RAC: 0 |
Another day without credits…… But think of the big Christmas present you'll soon get! SOON? I'd hate to have to wait till December .... |
Send message Joined: 5 Jul 11 Posts: 990 Credit: 376,143,149 RAC: 0 |
Does it really matter? Another day without credits…… But think of the big Christmas present you'll soon get! |
Send message Joined: 13 Apr 17 Posts: 256 Credit: 604,411,638 RAC: 0 |
Does it really matter? Another day without credits…… But think of the big Christmas present you'll soon get! +1 |
Send message Joined: 10 Apr 19 Posts: 408 Credit: 120,203,200 RAC: 0 |
I get the feeling that the server thinks there are 17M jobs ready to send out, so it doesn't make more jobs. However, I cancelled all of those jobs in order to try to clear the validation backlog. I'm not sure where the jobs are stuck, but I will turn things off and reset their transition times, and see if that clears them. The jobs should get removed from the DB once they are cancelled, but they just might not have been transitioned yet because they haven't gone out to volunteers (because they were cancelled). |
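
For anyone wanting to track the workunit-to-task creation gap discussed earlier in the thread, here is a rough Python sketch that computes it from the two "created" timestamps shown on the workunit and task pages. The function names (`parse_timestamp`, `creation_lag`) are made up for illustration and are not part of BOINC; the timestamp layout is assumed to match the examples quoted in the posts above.

```python
from datetime import datetime, timezone

# Month abbreviations as they appear in the quoted timestamps.
MONTHS = {name: number for number, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}


def parse_timestamp(text):
    """Parse a timestamp such as '15 Mar 2022, 13:16:34 UTC' (assumed layout)."""
    date_part, time_part = text.replace(" UTC", "").strip().split(", ")
    day, month, year = date_part.split()
    hour, minute, second = (int(piece) for piece in time_part.split(":"))
    return datetime(int(year), MONTHS[month], int(day),
                    hour, minute, second, tzinfo=timezone.utc)


def creation_lag(workunit_created, task_created):
    """Return how long after the workunit the task was created (a timedelta)."""
    return parse_timestamp(task_created) - parse_timestamp(workunit_created)


if __name__ == "__main__":
    # The two workunit/task pairs quoted in the thread.
    pairs = [
        ("15 Mar 2022, 13:16:34 UTC", "24 Mar 2022, 20:57:13 UTC"),
        ("15 Mar 2022, 3:29:37 UTC", "16 Mar 2022, 3:10:33 UTC"),
    ]
    for workunit_created, task_created in pairs:
        print(workunit_created, "->", task_created, ":",
              creation_lag(workunit_created, task_created))
```

Run on the two pairs quoted above, this gives a gap of a bit over 9 days for the first and just under 1 day for the second, which matches the "days behind" figures people were reading off by eye.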