Welcome to MilkyWay@home

30 Workunit Limit Per Request - Fix Implemented


Max_Pirx

Joined: 13 Dec 17
Posts: 9
Credit: 506,141,877
RAC: 1,142,117
Message 68497 - Posted: 10 Apr 2019, 12:07:31 UTC - in response to Message 68496.  

Looks like rpc_delay of 90 secs is a bit hard on the server? Since yesterday, my Event Log is showing a lot of these:
.......
........

Same here; it does seem quite hard on the server.
ID: 68497

Manfred Reiff
Joined: 27 Apr 18
Posts: 11
Credit: 66,322,103
RAC: 97,797
Message 68498 - Posted: 10 Apr 2019, 15:11:20 UTC - in response to Message 68495.  
Last modified: 10 Apr 2019, 15:15:04 UTC

Hi Jake,

I don't know which settings you changed on your server(s), but yesterday evening local time (CEST = UTC+2h) I received many, many more new workunits for both of the computers that currently crunch Milkyway@Home for me. Before the changes I received only 20 WUs for my newest computer (Intel Core i9-7900X with 20 processors) and approx. 60 for my upgraded computer (now an Intel Core i7-8700K with 12 processors). This time I received approx. 600 WUs for each computer - that's 1,200 in total! Just, wow! A lot of WUs to be calculated before April 22...
Whatever you changed - many thanks!

Note: At present I'm only working on CPU WUs. My GeForce 1080 Ti (in the Core i9 computer) is currently working on Collatz WUs with some Einstein@Home interruptions.
My second computer (Core i7) is equipped with a GeForce 1070 Ti; that GPU is used solely for Collatz WUs.

Greetings from Remscheid
Manfred
ID: 68498

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 580
Credit: 75,271,794
RAC: 0
Message 68499 - Posted: 10 Apr 2019, 15:25:21 UTC

I'll bump the RPC delay up to 3 minutes. If this is too long, let me know.

Best,

Jake
ID: 68499

bluestang
Joined: 13 Oct 16
Posts: 95
Credit: 591,128,859
RAC: 1,204,218
Message 68500 - Posted: 10 Apr 2019, 15:47:10 UTC
Last modified: 10 Apr 2019, 15:53:06 UTC

Much better at 3 minutes, since it won't be hammering the server with requests. As long as you let us have a nice big cache full of WUs, we're all good :)

Although I'm not sure the WU download issue is fixed for GPUs the way Manfred says it is for CPUs. I still can't keep a full cache: nothing new comes down until the tasks all run out, and then it downloads a full set again.

And I'm getting this a lot now:
4/10/2019 11:41:58 AM | Milkyway@Home | Sending scheduler request: Requested by project.
4/10/2019 11:41:58 AM | Milkyway@Home | Reporting 9 completed tasks
4/10/2019 11:41:58 AM | Milkyway@Home | Requesting new tasks for AMD/ATI GPU
4/10/2019 11:42:10 AM | | Project communication failed: attempting access to reference site
4/10/2019 11:42:10 AM | Milkyway@Home | Scheduler request failed: Failure when receiving data from the peer
4/10/2019 11:42:11 AM | | Internet access OK - project servers may be temporarily down.

Doesn't look like stats sites are making contact either, so not sure what's up.
ID: 68500

aad
Joined: 30 Mar 09
Posts: 63
Credit: 500,938,521
RAC: 239,016
Message 68501 - Posted: 10 Apr 2019, 16:11:23 UTC
Last modified: 10 Apr 2019, 16:17:24 UTC

Hi Jake,
I get this:
10-4-2019 18:08:24 | Milkyway@Home | Sending scheduler request: Requested by project.
10-4-2019 18:08:24 | Milkyway@Home | Reporting 2 completed tasks
10-4-2019 18:08:24 | Milkyway@Home | Requesting new tasks for AMD/ATI GPU
10-4-2019 18:08:24 | Milkyway@Home | [http] HTTP_OP::init_post(): http://milkyway.cs.rpi.edu/milkyway_cgi/cgi
10-4-2019 18:08:24 | Milkyway@Home | [http] HTTP_OP::libcurl_exec(): ca-bundle set
10-4-2019 18:08:25 | Milkyway@Home | [http] [ID#1] Info:  Connection 14796 seems to be dead!
10-4-2019 18:08:25 | Milkyway@Home | [http] [ID#1] Info:  Closing connection 14796
10-4-2019 18:08:25 | Milkyway@Home | [http] [ID#1] Info:  timeout on name lookup is not supported
10-4-2019 18:08:25 | Milkyway@Home | [http] [ID#1] Info:  Hostname was NOT found in DNS cache
10-4-2019 18:08:25 | Milkyway@Home | [http] [ID#1] Info:    Trying 128.113.126.23...
10-4-2019 18:08:46 | Milkyway@Home | [http] [ID#1] Info:  connect to 128.113.126.23 port 80 failed: Timed out
10-4-2019 18:08:46 | Milkyway@Home | [http] [ID#1] Info:  Failed to connect to milkyway.cs.rpi.edu port 80: Timed out
10-4-2019 18:08:46 | Milkyway@Home | [http] [ID#1] Info:  Closing connection 14797
10-4-2019 18:08:46 | Milkyway@Home | [http] HTTP error: Couldn't connect to server
10-4-2019 18:08:47 | Milkyway@Home | Message from task: 0
10-4-2019 18:08:47 |  | Project communication failed: attempting access to reference site
10-4-2019 18:08:47 |  | [http] HTTP_OP::init_get(): http://www.google.com/
10-4-2019 18:08:47 |  | [http] HTTP_OP::libcurl_exec(): ca-bundle set
10-4-2019 18:08:47 | Milkyway@Home | Computation for task de_modfit_82_bundle5_3s_NoContraintsWithDisk200_6_1554838388_259291_0 finished
10-4-2019 18:08:47 | Milkyway@Home | Starting task de_modfit_82_bundle5_3s_NoContraintsWithDisk200_6_1554838388_259043_0
10-4-2019 18:08:47 | Milkyway@Home | Scheduler request failed: Couldn't connect to server
10-4-2019 18:08:47 |  | [http] [ID#0] Info:  Found bundle for host www.google.com: 0x420e9d0
10-4-2019 18:08:47 |  | [http] [ID#0] Info:  Re-using existing connection! (#14792) with host www.google.com
10-4-2019 18:08:47 |  | [http] [ID#0] Info:  Connected to www.google.com (172.217.168.196) port 80 (#14792)
10-4-2019 18:08:47 |  | [http] [ID#0] Sent header to server: GET / HTTP/1.1
10-4-2019 18:08:47 |  | [http] [ID#0] Sent header to server: User-Agent: BOINC client (windows_x86_64 7.6.9)
10-4-2019 18:08:47 |  | [http] [ID#0] Sent header to server: Host: www.google.com
10-4-2019 18:08:47 |  | [http] [ID#0] Sent header to server: Accept: */*
10-4-2019 18:08:47 |  | [http] [ID#0] Sent header to server: Accept-Encoding: deflate, gzip
10-4-2019 18:08:47 |  | [http] [ID#0] Sent header to server: Content-Type: application/x-www-form-urlencoded
10-4-2019 18:08:47 |  | [http] [ID#0] Sent header to server: Accept-Language: nl_NL
10-4-2019 18:08:47 |  | [http] [ID#0] Sent header to server:
10-4-2019 18:08:47 |  | [http] [ID#0] Received header from server: HTTP/1.1 200 OK
10-4-2019 18:08:47 |  | [http] [ID#0] Received header from server: Date: Wed, 10 Apr 2019 16:08:44 GMT
10-4-2019 18:08:47 |  | [http] [ID#0] Received header from server: Expires: -1
10-4-2019 18:08:47 |  | [http] [ID#0] Received header from server: Cache-Control: private, max-age=0
10-4-2019 18:08:47 |  | [http] [ID#0] Received header from server: Content-Type: text/html; charset=ISO-8859-1
10-4-2019 18:08:47 |  | [http] [ID#0] Received header from server: P3P: CP="This is not a P3P policy! See g.co/p3phelp for more info."
10-4-2019 18:08:47 |  | [http] [ID#0] Received header from server: Content-Encoding: gzip
10-4-2019 18:08:47 |  | [http] [ID#0] Received header from server: Server: gws
10-4-2019 18:08:47 |  | [http] [ID#0] Received header from server: Content-Length: 5391
10-4-2019 18:08:47 |  | [http] [ID#0] Received header from server: X-XSS-Protection: 0
10-4-2019 18:08:47 |  | [http] [ID#0] Received header from server: X-Frame-Options: SAMEORIGIN
10-4-2019 18:08:47 |  | [http] [ID#0] Received header from server: Set-Cookie: 1P_JAR=2019-04-10-16; expires=Fri, 10-May-2019 16:08:44 GMT; path=/; domain=.google.com
10-4-2019 18:08:47 |  | [http] [ID#0] Received header from server: Set-Cookie: NID=181=bwE86MbCUawsBGBqs_fbVtR-dbTSa4bch72lxHSb7KkM9bxznXqB3nk65g-EKFS_M1mB9LlYUDrSGlqc_-pq6UbxzZAeQxp9LgGNmFXK2NFqR3FjtVseMt_SDO5_oV-GaCnjzLZSwHAWyNtXGIKj_Is-PwPRUgugsACmhglVlDA; expires=Thu, 10-Oct-2019 16:08:44 GMT; path=/; domain=.google.com; HttpOnly
10-4-2019 18:08:47 |  | [http] [ID#0] Received header from server:
10-4-2019 18:08:47 |  | [http] [ID#0] Info:  Connection #14792 to host www.google.com left intact
10-4-2019 18:08:48 |  | Internet access OK - project servers may be temporarily down.

I will enable some more debug logging to help you understand the problem.
ID: 68501

Vortac
Joined: 22 Apr 09
Posts: 94
Credit: 2,315,965,677
RAC: 2,208,606
Message 68502 - Posted: 10 Apr 2019, 16:31:54 UTC

According to the BOINC Wiki, next_rpc_delay means: "Make another RPC ASAP after this amount of time elapses".
So, with a next_rpc_delay of 180 secs, we are forcing ALL clients to contact the server every 3 mins, even if they have nothing to report. That looks like a huge burden on the server.

I have checked the settings of my other BOINC projects (in ProgramData\BOINC) and apparently none of them use the next_rpc_delay setting at all; they use only request_delay. Perhaps we would be better off with just a request_delay of 180 secs and next_rpc_delay left unspecified?
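[Editor's note: for reference, both knobs are elements in the scheduler reply the server sends back (the sched_reply_*.xml files mentioned above). A hedged sketch of the difference, with illustrative 180-second values; element names follow the BOINC scheduler protocol:]

```xml
<scheduler_reply>
  <!-- request_delay: the client must wait at least this long
       before its next contact with the scheduler (a minimum backoff) -->
  <request_delay>180</request_delay>

  <!-- next_rpc_delay: the client is told to contact the server again
       as soon as this interval elapses, even with nothing to report.
       Omitting it avoids forcing periodic RPCs from every client. -->
  <next_rpc_delay>180</next_rpc_delay>
</scheduler_reply>
```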
ID: 68502

aad
Joined: 30 Mar 09
Posts: 63
Credit: 500,938,521
RAC: 239,016
Message 68503 - Posted: 10 Apr 2019, 16:33:49 UTC

Another log:
10-4-2019 18:30:58 |  | [work_fetch] ------- start work fetch state -------
10-4-2019 18:30:58 |  | [work_fetch] target work buffer: 259200.00 + 0.00 sec
10-4-2019 18:30:58 |  | [work_fetch] --- project states ---
10-4-2019 18:30:58 | Milkyway@Home | [work_fetch] REC 119466.096 prio -38.414 can't request work: scheduler RPC backoff (8735.63 sec)
10-4-2019 18:30:58 |  | [work_fetch] --- state for CPU ---
10-4-2019 18:30:58 |  | [work_fetch] shortfall 1386459.51 nidle 0.00 saturated 28117.82 busy 0.00
10-4-2019 18:30:58 | Milkyway@Home | [work_fetch] share 0.000 blocked by project preferences
10-4-2019 18:30:58 |  | [work_fetch] --- state for AMD/ATI GPU ---
10-4-2019 18:30:58 |  | [work_fetch] shortfall 113602.96 nidle 0.00 saturated 144319.80 busy 0.00
10-4-2019 18:30:58 | Milkyway@Home | [work_fetch] share 0.000
10-4-2019 18:30:58 |  | [work_fetch] ------- end work fetch state -------
10-4-2019 18:30:58 |  | [work_fetch] No project chosen for work fetch
10-4-2019 18:31:01 |  | [work_fetch] Request work fetch: Backoff ended for Cosmology@Home
10-4-2019 18:31:04 |  | [work_fetch] ------- start work fetch state -------
10-4-2019 18:31:04 |  | [work_fetch] target work buffer: 259200.00 + 0.00 sec
10-4-2019 18:31:04 |  | [work_fetch] --- project states ---
10-4-2019 18:31:04 | Milkyway@Home | [work_fetch] REC 119466.096 prio -54.314 can't request work: scheduler RPC backoff (8730.30 sec)
10-4-2019 18:31:04 |  | [work_fetch] --- state for CPU ---
10-4-2019 18:31:04 |  | [work_fetch] shortfall 1386444.14 nidle 0.00 saturated 28125.51 busy 0.00
10-4-2019 18:31:04 | Milkyway@Home | [work_fetch] share 0.000 blocked by project preferences
10-4-2019 18:31:04 |  | [work_fetch] --- state for AMD/ATI GPU ---
10-4-2019 18:31:04 |  | [work_fetch] shortfall 113650.18 nidle 0.00 saturated 144237.46 busy 0.00
10-4-2019 18:31:04 | Milkyway@Home | [work_fetch] share 0.000
10-4-2019 18:31:04 |  | [work_fetch] ------- end work fetch state -------
10-4-2019 18:31:05 |  | [http_xfer] [ID#1] HTTP: wrote 2946 bytes
10-4-2019 18:31:05 |  | [work_fetch] Request work fetch: RPC complete
10-4-2019 18:31:10 |  | [work_fetch] ------- start work fetch state -------
10-4-2019 18:31:10 |  | [work_fetch] target work buffer: 259200.00 + 0.00 sec
10-4-2019 18:31:10 |  | [work_fetch] --- project states ---
10-4-2019 18:31:10 | Milkyway@Home | [work_fetch] REC 119466.096 prio -38.405 can't request work: scheduler RPC backoff (8723.90 sec)
10-4-2019 18:31:10 |  | [work_fetch] --- state for CPU ---
10-4-2019 18:31:10 |  | [work_fetch] shortfall 1386446.16 nidle 0.00 saturated 28124.50 busy 0.00
10-4-2019 18:31:10 | Milkyway@Home | [work_fetch] share 0.000 blocked by project preferences
10-4-2019 18:31:10 |  | [work_fetch] --- state for AMD/ATI GPU ---
10-4-2019 18:31:10 |  | [work_fetch] shortfall 113699.39 nidle 0.00 saturated 144155.11 busy 0.00
10-4-2019 18:31:10 | Milkyway@Home | [work_fetch] share 0.000
10-4-2019 18:31:10 |  | [work_fetch] ------- end work fetch state -------
10-4-2019 18:31:10 |  | [work_fetch] No project chosen for work fetch
10-4-2019 18:31:12 |  | [work_fetch] Request work fetch: Backoff ended for Cosmology@Home
10-4-2019 18:31:15 |  | [work_fetch] ------- start work fetch state -------
10-4-2019 18:31:15 |  | [work_fetch] target work buffer: 259200.00 + 0.00 sec
10-4-2019 18:31:15 |  | [work_fetch] --- project states ---
10-4-2019 18:31:15 | Milkyway@Home | [work_fetch] REC 119466.096 prio -54.303 can't request work: scheduler RPC backoff (8718.85 sec)
10-4-2019 18:31:15 |  | [work_fetch] --- state for CPU ---
10-4-2019 18:31:15 |  | [work_fetch] shortfall 1386446.17 nidle 0.00 saturated 28124.49 busy 0.00
10-4-2019 18:31:15 | Milkyway@Home | [work_fetch] share 0.000 blocked by project preferences
10-4-2019 18:31:15 |  | [work_fetch] --- state for AMD/ATI GPU ---
10-4-2019 18:31:15 |  | [work_fetch] shortfall 113741.79 nidle 0.00 saturated 144090.83 busy 0.00
10-4-2019 18:31:15 | Milkyway@Home | [work_fetch] share 0.000
10-4-2019 18:31:15 |  | [work_fetch] ------- end work fetch state -------
10-4-2019 18:31:15 |  | [work_fetch] No project chosen for work fetch
10-4-2019 18:32:00 | Milkyway@Home | Message from task: 0
10-4-2019 18:32:00 |  | [work_fetch] Request work fetch: application exited
10-4-2019 18:32:00 | Milkyway@Home | Computation for task de_modfit_82_bundle5_3s_NoContraintsWithDisk200_6_1554838388_240454_1 finished
10-4-2019 18:32:00 | Milkyway@Home | Starting task de_modfit_82_bundle5_3s_NoContraintsWithDisk200_6_1554838388_188776_1
10-4-2019 18:32:01 |  | [work_fetch] ------- start work fetch state -------
10-4-2019 18:32:01 |  | [work_fetch] target work buffer: 259200.00 + 0.00 sec
10-4-2019 18:32:01 |  | [work_fetch] --- project states ---
10-4-2019 18:32:01 | Milkyway@Home | [work_fetch] REC 119467.655 prio -54.178 can't request work: scheduler RPC backoff (8673.18 sec)
10-4-2019 18:32:01 |  | [work_fetch] --- state for CPU ---
10-4-2019 18:32:01 |  | [work_fetch] shortfall 1386531.14 nidle 0.00 saturated 28082.01 busy 0.00
10-4-2019 18:32:01 | Milkyway@Home | [work_fetch] share 0.000 blocked by project preferences
10-4-2019 18:32:01 |  | [work_fetch] --- state for AMD/ATI GPU ---
10-4-2019 18:32:01 |  | [work_fetch] shortfall 114107.80 nidle 0.00 saturated 143502.34 busy 0.00
10-4-2019 18:32:01 | Milkyway@Home | [work_fetch] share 0.000
10-4-2019 18:32:01 |  | [work_fetch] ------- end work fetch state -------
10-4-2019 18:32:01 |  | [work_fetch] No project chosen for work fetch
ID: 68503

Vortac
Joined: 22 Apr 09
Posts: 94
Credit: 2,315,965,677
RAC: 2,208,606
Message 68505 - Posted: 10 Apr 2019, 18:13:14 UTC

Alright, I can see in my sched_reply_milkyway.cs.rpi.edu_milkyway.xml that next_rpc_delay has now been removed. The server is working much more smoothly now.

But the old problem remains: it's possible to get 200 tasks now, but the client works through them without getting any new ones. It reports XX completed tasks about every 90 secs and requests new work every time, but it doesn't get any. I have no idea what is causing this, but it's probably unrelated to next_rpc_delay. Jake, perhaps you could try increasing request_delay from 91 secs to 180 or so; maybe that would help? It's just a wild guess - I can't think of anything else; perhaps someone more knowledgeable will be of more help.
ID: 68505

Manfred Reiff
Joined: 27 Apr 18
Posts: 11
Credit: 66,322,103
RAC: 97,797
Message 68507 - Posted: 10 Apr 2019, 18:50:16 UTC - in response to Message 68505.  

Yep, Vortac!

I agree with you. My computers are "running" through all the downloaded workunits, and uploading finished WUs has been going very smoothly after some upload problems during our afternoon hours.
Please, Jake and everyone else working on the project, preserve this excellent state of affairs till eternity... (if possible)

In the past I changed my settings for downloading new M@H WUs from 2 + 2 days (= 4 days) to 2 + 4 or 2 + 5 days or more, over and over again. But for months I never received more than 20 WUs for my i9-7900X computer. The i7-8700K computer always received 50+ new workunits. During server downtimes my i9 often ran out of workunits. Hopefully that doesn't matter anymore?!
I hope I can finish most workunits (or all of them) within the deadlines (April 21 and 22).

Greetings from Remscheid
Manfred "Yoda"
ID: 68507

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 580
Credit: 75,271,794
RAC: 0
Message 68508 - Posted: 10 Apr 2019, 18:50:55 UTC

Hey Vortac,

I'm equally stumped here. My only thought is that sometimes we get unlucky: you request work just after another group of people has also requested, before the feeder can refill the queue. I'm going to do a little thinking before I try to implement any solutions to this issue.
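[Editor's note: the "unlucky timing" theory can be illustrated with a toy model. The BOINC feeder keeps a fixed-size array of ready tasks that the scheduler hands out, refilling it from the database on a cycle; if several hosts ask between refills, later requests can find it empty. This is only a sketch with made-up numbers (the slot count is hypothetical), not the actual feeder code:]

```python
from collections import deque

SLOTS = 100  # hypothetical size of the feeder's ready-task array

def serve(queue, requests):
    """Give each request as many tasks as remain; no refill in between."""
    granted = []
    for wanted in requests:
        n = min(wanted, len(queue))
        for _ in range(n):
            queue.popleft()  # scheduler hands a ready task to this host
        granted.append(n)
    return granted

# Three hosts each ask for 200 tasks before the feeder refills:
feeder = deque(range(SLOTS))
print(serve(feeder, [200, 200, 200]))  # [100, 0, 0] - first host drains it, the rest get nothing
```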

Best,

Jake
ID: 68508

Vortac
Joined: 22 Apr 09
Posts: 94
Credit: 2,315,965,677
RAC: 2,208,606
Message 68510 - Posted: 10 Apr 2019, 19:18:09 UTC - in response to Message 68508.  

I'm equally stumped here. My only thought is that sometimes we get unlucky: you request work just after another group of people has also requested, before the feeder can refill the queue. I'm going to do a little thinking before I try to implement any solutions to this issue.

Nah, I don't think it's down to luck. After the queue is completely emptied (all tasks completed AND reported), it always gets refilled with 200 new tasks the first time the client contacts the server. But there are always a few minutes of idle time, from when the queue is completely emptied to when the client contacts the server and gets a new set of 200 tasks. A few minutes are not a big deal obviously, but it would be interesting to track down the exact issue; I haven't seen anything similar before.
ID: 68510

bluestang
Joined: 13 Oct 16
Posts: 95
Credit: 591,128,859
RAC: 1,204,218
Message 68512 - Posted: 11 Apr 2019, 0:12:10 UTC

Until you do find a solution, can you increase the max allowed WUs in progress at once per GPU from the current 200 (600 max) to maybe double, say 1200 max? That would at least decrease the amount of idle time per day.
ID: 68512

Bill
Joined: 8 Jan 18
Posts: 35
Credit: 7,434,844
RAC: 28,311
Message 68513 - Posted: 11 Apr 2019, 12:41:18 UTC - in response to Message 68508.  

I'm equally stumped here. My only thought is that sometimes we get unlucky: you request work just after another group of people has also requested, before the feeder can refill the queue. I'm going to do a little thinking before I try to implement any solutions to this issue.

Best,

Jake
I've only been casually lurking in this thread, but now after re-reading through everything I have a couple of questions.

Is everyone that is experiencing this problem running out of tasks for their CPU, GPU, or both? It may be helpful to know if you are restricting the number of CPUs or GPUs that work on tasks.

If you are experiencing this problem with just GPUs, are they AMD GPUs, or Nvidia GPUs?

The reason I ask is that I am having a similar problem with SETI@Home. I can maintain a healthy queue of CPU tasks, but my AMD GPU (Vega 8, part of the Ryzen 3 2200G) will only download one task at a time, if I'm lucky. I have not really played with any settings to attempt to trick it into downloading more tasks.

I did pose this question on the BOINC forum, and the bug is noted and is being worked on (eventually).

I don't know if the problem here at MW is the same one that I am experiencing at SETI. My Ryzen rig only crunches MW as a backup and I have not tested to see if the same problem occurs if I run MW exclusively. However, I thought that I would point out the problem that I have been experiencing in case it is the same one that you are experiencing.
ID: 68513

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 580
Credit: 75,271,794
RAC: 0
Message 68514 - Posted: 11 Apr 2019, 16:13:07 UTC

Hey Everyone,

I bumped up the number of workunits allowed at any given time a little bit. Let me know if that helps.

Jake
ID: 68514

Manfred Reiff
Joined: 27 Apr 18
Posts: 11
Credit: 66,322,103
RAC: 97,797
Message 68515 - Posted: 11 Apr 2019, 16:24:27 UTC - in response to Message 68513.  
Last modified: 11 Apr 2019, 16:25:29 UTC

Hi Bill,

to start with, I only calculate CPU-based Milkyway@Home tasks. My GPUs are doing Collatz workunits with some Einstein@Home workunits (approx. 10-20 per day) throughout the day.
Until Tuesday I had a lot of problems with Milkyway@Home. I received only 20 workunits at any given time for my computer equipped with an Intel Core i9-7900X with 20 processors. For my other computer (equipped with an Intel Core i7-8700K with 12 processors) I received dozens of workunits. At first I tried to change the amount of downloaded workunits from 2 + 2 days to 2 + 4, 2 + 5 and more, but... no luck!
Tuesday evening (local time = UTC+2 hours) I suddenly received hundreds of M@H workunits for both of my computers (1,200 in total). Thanks to fast CPUs I had worked that down to 600 by this morning. Unfortunately I didn't disable downloading of new tasks on my i9 machine before I went to work at 6:30am, so I got 280+ new tasks today, for a current total of 900 across both machines. Now I have disabled downloads so I can first finish this huge pile of tasks, by 23 April.
But thanks to Jake, the Milkyway@Home servers seem to work much, much better and much more stably than before!!!

Earlier I also worked on both M@H CPU and GPU tasks, but at the moment I'm concentrating on Collatz Conjecture GPU tasks, so I only crunch M@H CPU tasks. On the i9 I can work on 20 CPU tasks in parallel. That's enough...

Concerning your problem with SETI@Home: I had the same trouble with S@H and Einstein@Home. It took a lot of patience and weeks of experimenting to find a solution that works FOR ME. Maybe my advice will also help you.

My standard setting is "receive tasks for 2 days plus another 2 days", 4 days in total. I increased the maximum number of downloaded tasks several times, up to 2 + 4 days, but it didn't work. Manual updating wasn't successful either, so I let the manager do the downloading automatically.
To do this I stopped and closed the manager and RESTARTED (!) Windows 10 (although this probably isn't necessary).
After restarting Windows I started the BOINC manager again and, voilĂ , new tasks were downloaded. On another occasion this did not work, so I waited until the next morning; when I restarted BOINC, it worked.

I also noticed that you have to resume ALL tasks of the respective project in your manager before it will download new tasks. I don't know why, but in my experience downloading new tasks fails if some or all of a project's tasks are suspended.

Under normal conditions a high number of tasks for a single project is not necessary. But in my experience the S@H, E@H and M@H servers go down about once every week for maintenance or other reasons (sometimes for only a few hours, sometimes longer).
From late autumn 2018 until, let's say, February or the beginning of March this year, Milkyway@Home had a lot of trouble with server stability. The servers were down for many days, so with the standard settings you ran out of new tasks (that's why I increased the maximum number of tasks to be downloaded) and your computer just consumed electric power without doing anything useful.

I haven't gone deeper into the available project options. I only created special "app_config.xml" files, with the help of more experienced users, to adjust project behaviour.

I hope this, and the changes Jake has made since you posted, will help.

Best wishes and good luck!
Manfred

BTW... my graphics cards are an nVidia GeForce GTX 1080 Ti and a GTX 1070 Ti.
ID: 68515

Vortac
Joined: 22 Apr 09
Posts: 94
Credit: 2,315,965,677
RAC: 2,208,606
Message 68516 - Posted: 11 Apr 2019, 16:39:07 UTC

Bill, I looked at your logs and I think this problem is completely different. For some reason, your client requests 0 secs of GPU work for SETI@home - and receives exactly that. But in this case (Milkyway@home), the client requests lots of GPU work, yet none is assigned by the server:

11/04/2019 18:18:06 | Milkyway@Home | [work_fetch] set_request() for NVIDIA GPU: ninst 1 nused_total 32.87 nidle_now 0.00 fetch share 1.00 req_inst 0.00 req_secs 127884.51
11/04/2019 18:18:06 | Milkyway@Home | [sched_op] Starting scheduler request
11/04/2019 18:18:06 | Milkyway@Home | [work_fetch] request: CPU (0.00 sec, 0.00 inst) NVIDIA GPU (127884.51 sec, 0.00 inst)
11/04/2019 18:18:06 | Milkyway@Home | Sending scheduler request: To fetch work.
11/04/2019 18:18:06 | Milkyway@Home | Reporting 6 completed tasks
11/04/2019 18:18:06 | Milkyway@Home | Requesting new tasks for NVIDIA GPU
11/04/2019 18:18:06 | Milkyway@Home | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
11/04/2019 18:18:06 | Milkyway@Home | [sched_op] NVIDIA GPU work request: 127884.51 seconds; 0.00 devices
11/04/2019 18:18:09 | Milkyway@Home | Scheduler request completed: got 0 new tasks
11/04/2019 18:18:09 | Milkyway@Home | [sched_op] Server version 713
11/04/2019 18:18:09 | Milkyway@Home | Project requested delay of 91 seconds
11/04/2019 18:18:09 | Milkyway@Home | [sched_op] handle_scheduler_reply(): got ack for task de_modfit_82_bundle5_3s_NoContraintsWithDisk200_6_1554915636_381123_0
11/04/2019 18:18:09 | Milkyway@Home | [sched_op] handle_scheduler_reply(): got ack for task de_modfit_82_bundle5_3s_NoContraintsWithDisk200_6_1554915636_381183_0
11/04/2019 18:18:09 | Milkyway@Home | [sched_op] handle_scheduler_reply(): got ack for task de_modfit_82_bundle5_3s_NoContraintsWithDisk200_6_1554915636_381167_0
11/04/2019 18:18:09 | Milkyway@Home | [sched_op] handle_scheduler_reply(): got ack for task de_modfit_82_bundle5_3s_NoContraintsWithDisk200_6_1554915636_381153_0
11/04/2019 18:18:09 | Milkyway@Home | [sched_op] handle_scheduler_reply(): got ack for task de_modfit_82_bundle5_3s_NoContraintsWithDisk200_6_1554915636_381131_0
11/04/2019 18:18:09 | Milkyway@Home | [sched_op] handle_scheduler_reply(): got ack for task de_modfit_82_bundle5_3s_NoContraintsWithDisk200_6_1554915636_381217_0
11/04/2019 18:18:09 | Milkyway@Home | [work_fetch] backing off NVIDIA GPU 699 sec
11/04/2019 18:18:09 | Milkyway@Home | [sched_op] Deferring communication for 00:01:31
11/04/2019 18:18:09 | Milkyway@Home | [sched_op] Reason: requested by project
11/04/2019 18:18:09 | | [work_fetch] Request work fetch: RPC complete
ID: 68516

Bill
Joined: 8 Jan 18
Posts: 35
Credit: 7,434,844
RAC: 28,311
Message 68517 - Posted: 11 Apr 2019, 16:57:56 UTC

Thanks for the input, Manfred and Vortac. From what I understand from the post on BOINC, the problem I'm experiencing is an artificially high REC (not RAC) due to an incorrect GFLOPS calculation. Perhaps I should have mentioned that before. Regardless, if my problem isn't related to yours then I suppose it is moot.

Manfred, I did adjust the storage requirements in the past, and it did not help at all. My problem is an actual bug in the code that needs to be corrected. Other people will fix the bug once time allows, so I will just have to wait it out until then.
ID: 68517

Vortac
Joined: 22 Apr 09
Posts: 94
Credit: 2,315,965,677
RAC: 2,208,606
Message 68518 - Posted: 11 Apr 2019, 17:08:46 UTC

This is how a successful RPC to the Milkyway server looks in the Event Log. No tasks are reported (because the queue is completely empty by now), but 200 new tasks are assigned:

11/04/2019 18:56:52 | Milkyway@Home | [work_fetch] set_request() for NVIDIA GPU: ninst 1 nused_total 0.00 nidle_now 0.20 fetch share 1.00 req_inst 1.00 req_secs 129416.79
11/04/2019 18:56:52 | Milkyway@Home | [sched_op] Starting scheduler request
11/04/2019 18:56:52 | Milkyway@Home | [work_fetch] request: CPU (0.00 sec, 0.00 inst) NVIDIA GPU (129416.79 sec, 1.00 inst)
11/04/2019 18:56:52 | Milkyway@Home | Sending scheduler request: Requested by user.
11/04/2019 18:56:52 | Milkyway@Home | Requesting new tasks for NVIDIA GPU
11/04/2019 18:56:52 | Milkyway@Home | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
11/04/2019 18:56:52 | Milkyway@Home | [sched_op] NVIDIA GPU work request: 129416.79 seconds; 1.00 devices
11/04/2019 18:56:56 | | [work_fetch] Request work fetch: project finished uploading
11/04/2019 18:56:56 | Milkyway@Home | Scheduler request completed: got 200 new tasks
11/04/2019 18:56:56 | Milkyway@Home | [sched_op] Server version 713
11/04/2019 18:56:56 | Milkyway@Home | Project requested delay of 91 seconds
11/04/2019 18:56:56 | Milkyway@Home | [sched_op] estimated total CPU task duration: 0 seconds
11/04/2019 18:56:56 | Milkyway@Home | [sched_op] estimated total NVIDIA GPU task duration: 12201 seconds
11/04/2019 18:56:56 | Milkyway@Home | [sched_op] Deferring communication for 00:01:31
11/04/2019 18:56:56 | Milkyway@Home | [sched_op] Reason: requested by project
11/04/2019 18:56:56 | | [work_fetch] Request work fetch: RPC complete
ID: 68518

JHMarshall
Joined: 24 Jul 12
Posts: 39
Credit: 2,864,563,851
RAC: 6,248,508
Message 68519 - Posted: 11 Apr 2019, 21:08:19 UTC - in response to Message 68518.  

Vortac

Very interesting.

From your log this one worked:

11/04/2019 18:56:52 | Milkyway@Home | [sched_op] NVIDIA GPU work request: 129416.79 seconds; 1.00 devices
11/04/2019 18:56:56 | Milkyway@Home | Scheduler request completed: got 200 new tasks

and this one failed:

11/04/2019 18:18:06 | Milkyway@Home | [sched_op] NVIDIA GPU work request: 127884.51 seconds; 0.00 devices
11/04/2019 18:18:09 | Milkyway@Home | Scheduler request completed: got 0 new tasks

Notice the failed request had 0.00 devices. The question here is why 0 devices?
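[Editor's note: a possible reading, as a simplified sketch of the client's work-fetch bookkeeping (not the actual BOINC source): req_secs comes from the cache shortfall, while req_inst counts device instances that are idle right now. So "0.00 devices" just means the GPU was still busy when the request went out, which makes an empty feeder queue the likelier cause of the 0-task reply than the devices field itself:]

```python
def gpu_work_request(shortfall_secs, idle_instances):
    """Fill the two numbers seen in the [sched_op] log lines (sketch only)."""
    req_secs = max(shortfall_secs, 0.0)   # seconds of work needed to fill the cache
    req_inst = float(idle_instances)      # GPUs with nothing to run right now
    return req_secs, req_inst

# The request that failed: cache low, but the GPU still had queued work.
print(gpu_work_request(127884.51, 0))  # (127884.51, 0.0)
# The request that succeeded: queue empty, one idle GPU.
print(gpu_work_request(129416.79, 1))  # (129416.79, 1.0)
```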


Joe
ID: 68519

Hurr1cane78
Joined: 7 May 14
Posts: 9
Credit: 43,935,664
RAC: 17
Message 68562 - Posted: 19 Apr 2019, 8:25:05 UTC
Last modified: 19 Apr 2019, 8:25:50 UTC

Hi, I'm still not getting new WUs after completion. I have to update manually, if I catch it in time.
ID: 68562

©2020 Astroinformatics Group