Message boards :
Number crunching :
new workunit queue size (6)
Author | Message |
---|---|
Send message Joined: 22 Nov 07 Posts: 285 Credit: 1,076,786,368 RAC: 0 |
Travis, as long as you think your server can handle the requests... This is just 1 machine with a GPU - you will notice it requesting every few seconds, and getting 1 task - it is going to destroy your bandwidth - and may cause some issues on the RPC calls on your side. I am not sure about this, but something to consider.
2/27/2009 19:59:30|Milkyway@home|Sending scheduler request: Requested by user. Requesting 2488366 seconds of work, reporting 1 completed tasks
2/27/2009 19:59:35|Milkyway@home|Scheduler request completed: got 1 new tasks
2/27/2009 19:59:46|Milkyway@home|Sending scheduler request: To report completed tasks. Requesting 2488191 seconds of work, reporting 1 completed tasks
2/27/2009 19:59:51|Milkyway@home|Scheduler request completed: got 1 new tasks
2/27/2009 20:00:02|Milkyway@home|Sending scheduler request: To report completed tasks. Requesting 2488079 seconds of work, reporting 1 completed tasks
2/27/2009 20:00:07|Milkyway@home|Scheduler request completed: got 1 new tasks
2/27/2009 20:00:17|Milkyway@home|Sending scheduler request: To report completed tasks. Requesting 2487999 seconds of work, reporting 1 completed tasks
2/27/2009 20:00:22|Milkyway@home|Scheduler request completed: got 1 new tasks
Guess it is time to open up another project to run as well. |
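For scale, the log above shows one GPU host hitting the scheduler roughly every 15 seconds. A rough back-of-the-envelope sketch of what that means for server load (the 15-second cadence comes from the log; the fleet size of 500 hosts is purely a hypothetical number for illustration, not a measured figure):

```python
# Rough estimate of scheduler load from one-task-at-a-time fetching.
# Assumptions: ~15 s between scheduler RPCs per fast GPU host (from the
# log above), and a hypothetical fleet size chosen only for illustration.
SECONDS_PER_DAY = 24 * 60 * 60
request_interval_s = 15          # observed gap between scheduler requests
hosts = 500                      # hypothetical number of fast GPU hosts

requests_per_host_per_day = SECONDS_PER_DAY // request_interval_s
total_requests_per_day = requests_per_host_per_day * hosts

print(requests_per_host_per_day)  # 5760 RPCs per host per day
print(total_requests_per_day)     # 2880000 RPCs per day across the fleet
```

Even one such host makes thousands of scheduler calls a day, which is why handing out a single task per request scales so badly.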
Send message Joined: 26 Sep 08 Posts: 12 Credit: 1,228,382 RAC: 0 |
Do you have the option to limit DLs to perhaps 10 at once but keep the total limit at 20 per core? That should ease the hits on the scheduler yet leave us a little better comfort level if we do manage to get a full quota, and let us get a partial load to hold us over in the meantime. |
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
I don't know how many GPUs are already in use on this project, but with more likely to be used soon I would think it a good idea to increase the size of each WU, so there will not need to be so many requests. I'd rather have a real fix for the problem, because if we do this the same problem will just show up again when we get more users and start seeing the same number of workunit requests... |
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
Do you have the option to limit DLs to perhaps 10 at once but keep the total limit at 20 per core? That should ease the hits on the scheduler yet leave us a little better comfort level if we do manage to get a full quota, and let us get a partial load to hold us over in the meantime. I was thinking about doing something like this. Going to let the 6 WU queue go for a while and see if it helps anything first. |
Send message Joined: 19 Jul 08 Posts: 5 Credit: 2,547,855 RAC: 0 |
Up until yesterday my quad had very rarely run out of work. Now, in the last 24 hours, it has run completely dry probably 5 times or more and all I get is the same as others: "Scheduler request completed: got 0 new tasks" |
Send message Joined: 10 Aug 08 Posts: 218 Credit: 41,846,854 RAC: 0 |
I haven't been able to keep my caches full (set for .1 days) for a while now. Cutting back to 6 per core is just going to make it beg even more. Seems to me this would cause more problems for the server instead of making fewer requests at a longer interval. So I'll help out: setting all systems to pull from Einstein when the server here won't honor requests for work. Not a complaint, as I know you've got problems, but I would rather have something to do than have the computers sitting idle for long stretches. Maybe this will take some of the load off. Probably wouldn't hurt if others did the same thing over the weekend until you get this worked out. (But then again, if more people were doing that, you would probably think 6 is enough.) <smile> |
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
Up until yesterday my quad had very rarely run out of work. Now, in the last 24 hours, it has run completely dry probably 5 times or more and all I get is the same as others: I've been debugging the new code for the assimilator/validator which lets us do some validation of workunits to keep our searches from getting screwed up by invalid results. This has caused the server to crash quite a few times this evening, so that might be causing a lot of the lack of work. The assimilator/validator doesn't seem to be crashing anymore *fingers crossed* so work availability should be better from here on out. |
Send message Joined: 8 Nov 08 Posts: 178 Credit: 6,140,854 RAC: 0 |
The assimilator/validator doesn't seem to be crashing anymore *fingers crossed* so work availability should be better from here on out. My quad seems to be keeping filled so far. It's running the GPU app, 0.19. |
Send message Joined: 4 Jul 08 Posts: 165 Credit: 364,966 RAC: 0 |
Gday all.. Just saw this result in my task list, wondering why it is so:
Task ID 12690510
Name ps_s82_10_394_1235776518_0
Workunit 12383039
Created 27 Feb 2009 23:15:21 UTC
Sent 27 Feb 2009 23:15:54 UTC
Received 28 Feb 2009 4:51:43 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x0)
Computer ID 21898
Report deadline 2 Mar 2009 23:15:54 UTC
CPU time 1027.891
stderr out:
<core_client_version>6.4.6</core_client_version>
<![CDATA[
<stderr_txt>
Running Milkyway@home version 0.19 by Gipsel
CPU: Genuine Intel(R) CPU T2300 @ 1.66GHz (2 cores/threads) 1.66677 GHz (917ms)
WU completed. It took 1027.89 seconds CPU time and 1038.24 seconds wall clock time @ 1.66678 GHz.
</stderr_txt>
]]>
Validate state Invalid
Claimed credit 2.83954728750274
Granted credit 0
application version 0.19
Every other task has been validated correctly; don't know why this is the odd one out... |
Send message Joined: 1 Sep 08 Posts: 520 Credit: 302,524,931 RAC: 0 |
What lowering to 6 does is temporarily (say, for maybe a couple of hours) reduce the false (success / 0 new work) messages by replacing them with 'met your CPU limit of 6'. Then, when the completed work drops the queue back from 20 to 6, the same problem pops up again, but *more* frequently. Now one needs to hit the server for more work almost continuously: it may take about 15 minutes of server pounding to get 45 minutes of work, by which time two more workunits have completed. I'm still trying to figure out how one can script a 'pound the server continuously' script <rueful smile>.
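A sketch of that arithmetic (the per-task runtime and the grant rate here are illustrative assumptions chosen to reproduce the 15-minute / 45-minute / two-task figures above, not measured values):

```python
# Sketch of how a 6-task cap drains faster than it refills.
# Assumptions (illustrative only): each GPU task runs ~7.5 min,
# and repeated requests yield roughly one task granted per 2.5 min.
task_runtime_min = 7.5       # work consumed per task
grant_interval_min = 2.5     # "pounding" time per task granted
queue_cap = 6                # per-CPU limit

work_gained_min = queue_cap * task_runtime_min    # minutes of work per refill
refill_time_min = queue_cap * grant_interval_min  # minutes spent refilling
tasks_done_while_refilling = refill_time_min / task_runtime_min

print(work_gained_min)             # 45.0 minutes of work gained
print(refill_time_min)             # 15.0 minutes of server pounding
print(tasks_done_while_refilling)  # 2.0 tasks already finished meanwhile
```

So a third of the freshly filled queue is gone before the refill even completes, and the cycle repeats.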
|
Send message Joined: 1 Sep 08 Posts: 520 Credit: 302,524,931 RAC: 0 |
Yup -- I think you are spot on here.
|
Send message Joined: 4 Dec 07 Posts: 45 Credit: 1,257,904 RAC: 0 |
Feed the beast. Limiting the workunit queue size to 6 will not stop the beast from starving. Feed the beast. Holy Mackerel! Call headquarters. Get the lieutenant. |
Send message Joined: 7 Jun 08 Posts: 464 Credit: 56,639,936 RAC: 0 |
Gday all.. Just saw this result in my task list, wondering why it is so? Hmmm... Hard to say, but assuming it wasn't just a coincidence, the most likely reason is that the backend 'lost' the output file for some reason (due to the troubleshooting at MW). If there aren't any further repeats of it, then I wouldn't worry about it. Alinator |
Send message Joined: 16 Jan 08 Posts: 18 Credit: 4,111,257 RAC: 0 |
Sorry, but reducing the WU limit to 6 has only made things worse. m4rtyn |
Send message Joined: 10 Feb 09 Posts: 13 Credit: 1,704,492 RAC: 0 |
2009-02-28 09:22:33|Milkyway@home|Message from server: No work sent
2009-02-28 09:22:33|Milkyway@home|Message from server: (reached per-CPU limit of 6 tasks)
I see this message more often than the one about getting 0 new tasks, so it's getting better now, imo. Anyway, I see that more WUs are coming back with granted credit 0. As far as I can see, credit 0 is being granted on about 10% of my new WUs. |
Send message Joined: 6 Apr 08 Posts: 2018 Credit: 100,142,856 RAC: 0 |
I see that more WUs are coming back with granted credit 0. I'm seeing WUs claiming zero credit and being awarded the credit they jolly well deserve. (OK, it's some GPU tasks which show zero crunching time, but I've timed some of them on a stopwatch and they do actually take a few seconds. I mean, I wouldn't even see them if it were zero seconds, would I?) |
Send message Joined: 15 May 08 Posts: 7 Credit: 126,077,128 RAC: 0 |
At this moment I also see workunits that are getting 0 credits. I did not see this before tonight. Now it looks like it happens with about 10% of the WUs. |
Send message Joined: 15 May 08 Posts: 7 Credit: 126,077,128 RAC: 0 |
I forgot: for me it seems to work more smoothly. I get fewer WUs, but the system is running most of the time now. The 'reached CPU limit' message is occurring regularly, but that is the setting at this moment. It means the work available to my server is, at that moment, the maximum it is allowed, so I think this has made the situation more stable. |
Send message Joined: 6 Apr 08 Posts: 2018 Credit: 100,142,856 RAC: 0 |
I forgot: for me it seems to work more smoothly. I get fewer WUs, but the system is running most of the time now. The 'reached CPU limit' message is occurring regularly, but that is the setting at this moment. It means the work available to my server is, at that moment, the maximum it is allowed, so I think this has made the situation more stable. Hallelujah, someone's happy ;) |
Send message Joined: 4 Oct 08 Posts: 1734 Credit: 64,228,409 RAC: 0 |
Compared to yesterday my rigs (CPU only) seem to be running well, and fed. Yesterday, with the cache at 20 per CPU, I was in the same position as everyone else - zilch. |
©2024 Astroinformatics Group