new workunit queue size (6)

Message boards : Number crunching : new workunit queue size (6)

Kevint
Joined: 22 Nov 07
Posts: 285
Credit: 1,076,786,368
RAC: 0
Message 13211 - Posted: 28 Feb 2009, 3:09:08 UTC
Last modified: 28 Feb 2009, 3:13:57 UTC

Travis,

As long as you think your server can handle the requests..

This is just one machine with a GPU. You will notice it requesting work every few seconds and getting 1 task each time. That is going to destroy your bandwidth and may cause some issues with the RPC calls on your side. I am not sure about this, but it is something to consider.


2/27/2009 19:59:30|Milkyway@home|Sending scheduler request: Requested by user. Requesting 2488366 seconds of work, reporting 1 completed tasks
2/27/2009 19:59:35|Milkyway@home|Scheduler request completed: got 1 new tasks
2/27/2009 19:59:46|Milkyway@home|Sending scheduler request: To report completed tasks. Requesting 2488191 seconds of work, reporting 1 completed tasks
2/27/2009 19:59:51|Milkyway@home|Scheduler request completed: got 1 new tasks
2/27/2009 20:00:02|Milkyway@home|Sending scheduler request: To report completed tasks. Requesting 2488079 seconds of work, reporting 1 completed tasks
2/27/2009 20:00:07|Milkyway@home|Scheduler request completed: got 1 new tasks
2/27/2009 20:00:17|Milkyway@home|Sending scheduler request: To report completed tasks. Requesting 2487999 seconds of work, reporting 1 completed tasks
2/27/2009 20:00:22|Milkyway@home|Scheduler request completed: got 1 new tasks
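As a rough check on the request rate in the log above, here is a minimal Python sketch that pulls the timestamps out of the "Sending scheduler request" lines and averages the gaps between them. The function names are made up for illustration, and the date format is assumed to be month/day/year; only the four log lines themselves come from the post.

```python
from datetime import datetime

# The four "Sending scheduler request" lines from the log above.
LOG_LINES = [
    "2/27/2009 19:59:30|Milkyway@home|Sending scheduler request: Requested by user. Requesting 2488366 seconds of work, reporting 1 completed tasks",
    "2/27/2009 19:59:46|Milkyway@home|Sending scheduler request: To report completed tasks. Requesting 2488191 seconds of work, reporting 1 completed tasks",
    "2/27/2009 20:00:02|Milkyway@home|Sending scheduler request: To report completed tasks. Requesting 2488079 seconds of work, reporting 1 completed tasks",
    "2/27/2009 20:00:17|Milkyway@home|Sending scheduler request: To report completed tasks. Requesting 2487999 seconds of work, reporting 1 completed tasks",
]


def request_times(lines):
    """Return the timestamp of every 'Sending scheduler request' line."""
    times = []
    for line in lines:
        stamp, _project, message = line.split("|", 2)
        if message.startswith("Sending scheduler request"):
            # Assumption: the client logged dates as month/day/year.
            times.append(datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S"))
    return times


def mean_interval_seconds(times):
    """Average gap in seconds between consecutive timestamps."""
    gaps = [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]
    return sum(gaps) / len(gaps)


times = request_times(LOG_LINES)
interval = mean_interval_seconds(times)
requests_per_hour = 3600 / interval
print(f"mean interval: {interval:.1f} s, roughly {requests_per_hour:.0f} requests/hour")
```

On these four lines the gaps are 16, 16 and 15 seconds, so this one host would be sending on the order of 230 scheduler requests per hour if the pattern held.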




Guess it is time to open up another project to run as well.
.
ID: 13211 · Rating: 0

gomeyer
Joined: 26 Sep 08
Posts: 12
Credit: 1,228,382
RAC: 0
Message 13212 - Posted: 28 Feb 2009, 3:10:53 UTC

Do you have the option to limit DLs to perhaps 10 at once but keep the total limit at 20 per core? That should ease the hits on the scheduler, yet leave us with a little better comfort level if we do manage to get a full quota, and let us get a partial load to hold us over in the meantime.
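One way to picture the two limits in this suggestion is the small Python sketch below. It is not BOINC scheduler code; the function and parameter names are hypothetical, and it assumes "10 at once" means a cap per scheduler request while "20 per core" is a cap on what the host holds in total.

```python
def tasks_to_send(in_progress, n_cores, per_core_limit=20, per_request_cap=10):
    """Number of tasks to hand out for one scheduler request.

    in_progress      -- tasks the host already holds
    per_core_limit   -- total cap per core (the 20 from the post)
    per_request_cap  -- cap per request (the 10 from the post)
    """
    total_limit = per_core_limit * n_cores
    room = max(0, total_limit - in_progress)
    return min(room, per_request_cap)


# A quad-core host starting from an empty queue fills up over several requests.
queue = 0
grants = []
while True:
    n = tasks_to_send(queue, n_cores=4)
    if n == 0:
        break
    grants.append(n)
    queue += n

print(grants, queue)
```

Under these assumptions a host can still reach its full quota, but each individual request stays small, which is the trade-off the post is asking about.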
ID: 13212 · Rating: 0

Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 13215 - Posted: 28 Feb 2009, 3:13:41 UTC - in response to Message 13210.  

I don't know how many GPUs are already in use on this project, but with more likely to be used soon I would think it a good idea to increase the size of each wu, so there will not need to be so many requests.


I'd rather have a real fix for the problem, because if we do this the same problem will just show up again when we get more users and start seeing the same number of workunit requests...
ID: 13215 · Rating: 0

Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 13216 - Posted: 28 Feb 2009, 3:14:34 UTC - in response to Message 13212.  

Do you have the option to limit DLs to perhaps 10 at once but keep the total limit at 20 per core? That should ease the hits on the scheduler, yet leave us with a little better comfort level if we do manage to get a full quota, and let us get a partial load to hold us over in the meantime.


I was thinking about doing something like this. Going to let the 6 WU queue go for a while and see if it helps anything first.
ID: 13216 · Rating: 0

Bob in FL
Joined: 19 Jul 08
Posts: 5
Credit: 2,547,855
RAC: 0
Message 13218 - Posted: 28 Feb 2009, 3:35:50 UTC

Up until yesterday my quad had very rarely run out of work. Now, in the last 24 hours, it has run completely dry probably 5 times or more and all I get is the same as others:

"Scheduler request completed: got 0 new tasks"
ID: 13218 · Rating: 0

Arion
Joined: 10 Aug 08
Posts: 218
Credit: 41,846,854
RAC: 0
Message 13219 - Posted: 28 Feb 2009, 3:41:17 UTC - in response to Message 13216.  


I was thinking about doing something like this. Going to let the 6 WU queue go for a while and see if it helps anything first.


I haven't been able to keep my caches full (set for 0.1 days) for a while now. Cutting back to 6 per core is just going to make it beg even more. Seems to me this would cause more problems for the server, rather than resulting in fewer requests at longer intervals. So I'll help out: I'm setting all systems to pull from Einstein when the server here won't honor requests for work.

Not a complaint as I know you got problems, but would rather have something to do than the computers sitting at idle for long stretches. Maybe this will take some of the load off. Probably wouldn't hurt if others did the same thing over the weekend until you got this worked out. (but then again if more people were doing that you would probably think 6 is enough) <smile>


ID: 13219 · Rating: 0

Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 13221 - Posted: 28 Feb 2009, 3:44:28 UTC - in response to Message 13218.  

Up until yesterday my quad had very rarely run out of work. Now, in the last 24 hours, it has run completely dry probably 5 times or more and all I get is the same as others:

"Scheduler request completed: got 0 new tasks"


I've been debugging the new code for the assimilator/validator which lets us do some validation of workunits to keep our searches from getting screwed up by invalid results. This has caused the server to crash quite a few times this evening, so that might be causing a lot of the lack of work.

The assimilator/validator doesn't seem to be crashing anymore *fingers crossed* so work availability should be better from here on out.
ID: 13221 · Rating: 0

jedirock
Joined: 8 Nov 08
Posts: 178
Credit: 6,140,854
RAC: 0
Message 13223 - Posted: 28 Feb 2009, 4:48:30 UTC - in response to Message 13221.  

The assimilator/validator doesn't seem to be crashing anymore *fingers crossed* so work availability should be better from here on out.

My quad seems to be keeping filled so far. It's running the GPU app, 0.19.
ID: 13223 · Rating: 0

Glenn Rogers
Joined: 4 Jul 08
Posts: 165
Credit: 364,966
RAC: 0
Message 13224 - Posted: 28 Feb 2009, 4:59:13 UTC

Gday all.. Just saw this result in my task list wondering why it is so???

Task ID 12690510
Name ps_s82_10_394_1235776518_0
Workunit 12383039
Created 27 Feb 2009 23:15:21 UTC
Sent 27 Feb 2009 23:15:54 UTC
Received 28 Feb 2009 4:51:43 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x0)
Computer ID 21898
Report deadline 2 Mar 2009 23:15:54 UTC
CPU time 1027.891
stderr out <core_client_version>6.4.6</core_client_version>
<![CDATA[
<stderr_txt>
Running Milkyway@home version 0.19 by Gipsel
CPU: Genuine Intel(R) CPU T2300 @ 1.66GHz (2 cores/threads) 1.66677 GHz (917ms)

WU completed. It took 1027.89 seconds CPU time and 1038.24 seconds wall clock time @ 1.66678 GHz.

</stderr_txt>
]]>

Validate state Invalid
Claimed credit 2.83954728750274
Granted credit 0
application version 0.19

Every other task has been validated correctly; don't know why this is the odd one out...
ID: 13224 · Rating: 0

BarryAZ
Joined: 1 Sep 08
Posts: 520
Credit: 302,524,931
RAC: 2
Message 13225 - Posted: 28 Feb 2009, 6:09:49 UTC - in response to Message 13202.  

What lowering to 6 does is temporarily (say, for maybe a couple of hours) reduce the false (success / 0 new work) messages by replacing them with 'met your CPU limit of 6'. Then, when the completed work drops the queue back from 20 to 6, the same problem pops up again -- but *more* frequently. Now one needs to hit the server for more work almost continuously, since it may take about 15 minutes of server pounding to get 45 minutes of work, by which time two more work units have completed.

I'm still trying to figure out how one can write a 'pound the server continuously' script <rueful smile>.
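To put rough numbers on the queue-size argument above, here is a small Python sketch of how long a full queue lasts on one core when no new work arrives. The only measured input is the 1027.89 seconds of CPU time from the single CPU task reported in this thread; treating that as typical for every task and every host is an assumption, and the function name is made up for illustration.

```python
def queue_lifetime_seconds(tasks_per_core, seconds_per_task):
    """Time for one core to drain a full queue with no new work arriving."""
    return tasks_per_core * seconds_per_task


# CPU time of the one CPU task reported in this thread (application version 0.19).
# Assumption: used here as if it were typical, which it may not be.
CPU_TASK_SECONDS = 1027.89

for limit in (6, 20):
    hours = queue_lifetime_seconds(limit, CPU_TASK_SECONDS) / 3600
    print(f"limit {limit:2d} per core: a full queue lasts about {hours:.1f} h")
```

With that task length, a limit of 6 gives a core well under two hours of buffer, against more than five hours at 20, so a host that cannot get work for a while runs dry much sooner under the lower limit.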



I'm going to lower the number to 6 and see if that helps.


ID: 13225 · Rating: 0

BarryAZ
Joined: 1 Sep 08
Posts: 520
Credit: 302,524,931
RAC: 2
Message 13226 - Posted: 28 Feb 2009, 6:10:54 UTC - in response to Message 13219.  

Yup -- I think you are spot on here.


I haven't been able to keep my caches full (set for 0.1 days) for a while now. Cutting back to 6 per core is just going to make it beg even more. Seems to me this would cause more problems for the server, rather than resulting in fewer requests at longer intervals. So I'll help out: I'm setting all systems to pull from Einstein when the server here won't honor requests for work.

Not a complaint as I know you got problems, but would rather have something to do than the computers sitting at idle for long stretches. Maybe this will take some of the load off. Probably wouldn't hurt if others did the same thing over the weekend until you got this worked out. (but then again if more people were doing that you would probably think 6 is enough) <smile>



ID: 13226 · Rating: 0

mscharmack
Joined: 4 Dec 07
Posts: 45
Credit: 1,257,904
RAC: 0
Message 13227 - Posted: 28 Feb 2009, 6:24:52 UTC
Last modified: 28 Feb 2009, 6:42:29 UTC

Feed the beast. Limiting the workunit queue size to 6 will not keep the beast from starving. Feed the beast.


Holy Mackerel! Call headquarters. Get the lieutenant.
ID: 13227 · Rating: 0

Alinator
Joined: 7 Jun 08
Posts: 464
Credit: 56,639,936
RAC: 0
Message 13228 - Posted: 28 Feb 2009, 6:41:46 UTC - in response to Message 13224.  

Gday all.. Just saw this result in my task list wondering why it is so???

Task ID 12690510
Name ps_s82_10_394_1235776518_0
Workunit 12383039
Created 27 Feb 2009 23:15:21 UTC
Sent 27 Feb 2009 23:15:54 UTC
Received 28 Feb 2009 4:51:43 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x0)
Computer ID 21898
Report deadline 2 Mar 2009 23:15:54 UTC
CPU time 1027.891
stderr out <core_client_version>6.4.6</core_client_version>
<![CDATA[
<stderr_txt>
Running Milkyway@home version 0.19 by Gipsel
CPU: Genuine Intel(R) CPU T2300 @ 1.66GHz (2 cores/threads) 1.66677 GHz (917ms)

WU completed. It took 1027.89 seconds CPU time and 1038.24 seconds wall clock time @ 1.66678 GHz.

</stderr_txt>
]]>

Validate state Invalid
Claimed credit 2.83954728750274
Granted credit 0
application version 0.19

Every other task has been validated correctly; don't know why this is the odd one out...


Hmmm...

Hard to say, but assuming it wasn't just a coincidence, the most likely reason is that the backend 'lost' the output file (due to the troubleshooting at MW).

If there aren't any further repeats of it, then I wouldn't worry about it.

Alinator
ID: 13228 · Rating: 0

m4rtyn
Joined: 16 Jan 08
Posts: 18
Credit: 4,111,257
RAC: 0
Message 13229 - Posted: 28 Feb 2009, 7:18:57 UTC

Sorry, but reducing the wu limit to 6 has only made things worse.
m4rtyn

ID: 13229 · Rating: 0

Riil
Joined: 10 Feb 09
Posts: 13
Credit: 1,704,492
RAC: 0
Message 13234 - Posted: 28 Feb 2009, 8:26:32 UTC
Last modified: 28 Feb 2009, 8:42:00 UTC

2009-02-28 09:22:33|Milkyway@home|Message from server: No work sent
2009-02-28 09:22:33|Milkyway@home|Message from server: (reached per-CPU limit of 6 tasks)

I see this message more often than the one about getting 0 new tasks, so it's getting better now, IMO. However, I see that more WUs are being granted 0 credit.
As far as I can see, 0 credit is now granted for about 10% of my new WUs.
ID: 13234 · Rating: 0

GalaxyIce
Joined: 6 Apr 08
Posts: 2018
Credit: 100,142,856
RAC: 0
Message 13236 - Posted: 28 Feb 2009, 8:50:39 UTC - in response to Message 13234.  

I see that more WUs are being granted 0 credit.

I'm seeing WUs claiming zero credit and being awarded the credit they jolly well deserve.

(OK, it's some GPU WUs which show zero crunching time, but I've timed some of them on a stopwatch and they do actually take a few seconds. I mean, I wouldn't even see them if it was zero seconds, would I?)



ID: 13236 · Rating: 0

etrecords
Joined: 15 May 08
Posts: 7
Credit: 126,077,128
RAC: 0
Message 13238 - Posted: 28 Feb 2009, 9:18:07 UTC

At the moment I am also seeing workunits that are getting 0 credits. I did not see this before tonight. It now seems to happen with about 10% of the WUs.
ID: 13238 · Rating: 0

etrecords
Joined: 15 May 08
Posts: 7
Credit: 126,077,128
RAC: 0
Message 13239 - Posted: 28 Feb 2009, 9:21:17 UTC

I forgot: for me it seems to work more smoothly. I get fewer WUs, but the system is running most of the time now. The 'reached per-CPU limit' message is occurring regularly, but that is the setting at the moment. It means that the work available to my server at that moment is the maximum it is allowed, so I think this has made the situation more stable.
ID: 13239 · Rating: 0

GalaxyIce
Joined: 6 Apr 08
Posts: 2018
Credit: 100,142,856
RAC: 0
Message 13241 - Posted: 28 Feb 2009, 9:39:24 UTC - in response to Message 13239.  

I forgot: for me it seems to work more smoothly. I get fewer WUs, but the system is running most of the time now. The 'reached per-CPU limit' message is occurring regularly, but that is the setting at the moment. It means that the work available to my server at that moment is the maximum it is allowed, so I think this has made the situation more stable.

Hallelujah, someone's happy ;)


ID: 13241 · Rating: 0

John Clark
Joined: 4 Oct 08
Posts: 1734
Credit: 64,228,409
RAC: 0
Message 13243 - Posted: 28 Feb 2009, 9:44:51 UTC
Last modified: 28 Feb 2009, 9:45:19 UTC

Compared to yesterday my rigs (CPU only) seem to be running well, and fed. Yesterday, with the cache at 20 per CPU, I was in the same position as everyone else - zilch.
ID: 13243 · Rating: 0

©2024 Astroinformatics Group