Welcome to MilkyWay@home

Posts by Vortac

1) Message boards : Number crunching : Large surge of Invalid results and Validate errors on ALL machines (Message 68754)
Posted 2 days ago by Vortac
Post:
I've been checking the top hosts and noticed thousands of Invalid results on ALL machines, including mine. They started appearing in large numbers four days ago and have only been increasing since then. Since it's impossible that all machines went bad at the same time, I guess it's a validation problem of some sort which needs an urgent fix - we are wasting a lot of computing power now.
2) Message boards : Number crunching : WUs not downloaded in time - rig is idling - doing no work ... (Message 68556)
Posted 17 Apr 2019 by Vortac
Post:
Yes, for some reason new work is assigned ONLY when the current queue is completely empty - all tasks must be completed AND reported. After that, the server will assign new work upon the next contact, but for a few minutes the client sits idle. I have a backup BOINC project which kicks in during that period, but I would prefer to crunch only Milkyway (if the queue were maintained).

Perhaps it's necessary from a scientific standpoint, i.e. new tasks are created from the results of previous tasks? Obviously, if that's the case, all previous tasks must be sorted out first.
3) Message boards : News : 30 Workunit Limit Per Request - Fix Implemented (Message 68518)
Posted 11 Apr 2019 by Vortac
Post:
This is how a successful RPC with the Milkyway server looks in the Event Log. No tasks are reported (because the queue is completely empty by now) but 200 new tasks are assigned:

11/04/2019 18:56:52 | Milkyway@Home | [work_fetch] set_request() for NVIDIA GPU: ninst 1 nused_total 0.00 nidle_now 0.20 fetch share 1.00 req_inst 1.00 req_secs 129416.79
11/04/2019 18:56:52 | Milkyway@Home | [sched_op] Starting scheduler request
11/04/2019 18:56:52 | Milkyway@Home | [work_fetch] request: CPU (0.00 sec, 0.00 inst) NVIDIA GPU (129416.79 sec, 1.00 inst)
11/04/2019 18:56:52 | Milkyway@Home | Sending scheduler request: Requested by user.
11/04/2019 18:56:52 | Milkyway@Home | Requesting new tasks for NVIDIA GPU
11/04/2019 18:56:52 | Milkyway@Home | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
11/04/2019 18:56:52 | Milkyway@Home | [sched_op] NVIDIA GPU work request: 129416.79 seconds; 1.00 devices
11/04/2019 18:56:56 | | [work_fetch] Request work fetch: project finished uploading
11/04/2019 18:56:56 | Milkyway@Home | Scheduler request completed: got 200 new tasks
11/04/2019 18:56:56 | Milkyway@Home | [sched_op] Server version 713
11/04/2019 18:56:56 | Milkyway@Home | Project requested delay of 91 seconds
11/04/2019 18:56:56 | Milkyway@Home | [sched_op] estimated total CPU task duration: 0 seconds
11/04/2019 18:56:56 | Milkyway@Home | [sched_op] estimated total NVIDIA GPU task duration: 12201 seconds
11/04/2019 18:56:56 | Milkyway@Home | [sched_op] Deferring communication for 00:01:31
11/04/2019 18:56:56 | Milkyway@Home | [sched_op] Reason: requested by project
11/04/2019 18:56:56 | | [work_fetch] Request work fetch: RPC complete
4) Message boards : News : 30 Workunit Limit Per Request - Fix Implemented (Message 68516)
Posted 11 Apr 2019 by Vortac
Post:
Bill, I looked at your logs and I think this is a completely different problem. For some reason, your client requests 0 secs of GPU work for SETI@home - and receives exactly that. But in this case (Milkyway@home), the client requests lots of GPU work, yet the server assigns none:

11/04/2019 18:18:06 | Milkyway@Home | [work_fetch] set_request() for NVIDIA GPU: ninst 1 nused_total 32.87 nidle_now 0.00 fetch share 1.00 req_inst 0.00 req_secs 127884.51
11/04/2019 18:18:06 | Milkyway@Home | [sched_op] Starting scheduler request
11/04/2019 18:18:06 | Milkyway@Home | [work_fetch] request: CPU (0.00 sec, 0.00 inst) NVIDIA GPU (127884.51 sec, 0.00 inst)
11/04/2019 18:18:06 | Milkyway@Home | Sending scheduler request: To fetch work.
11/04/2019 18:18:06 | Milkyway@Home | Reporting 6 completed tasks
11/04/2019 18:18:06 | Milkyway@Home | Requesting new tasks for NVIDIA GPU
11/04/2019 18:18:06 | Milkyway@Home | [sched_op] CPU work request: 0.00 seconds; 0.00 devices
11/04/2019 18:18:06 | Milkyway@Home | [sched_op] NVIDIA GPU work request: 127884.51 seconds; 0.00 devices
11/04/2019 18:18:09 | Milkyway@Home | Scheduler request completed: got 0 new tasks
11/04/2019 18:18:09 | Milkyway@Home | [sched_op] Server version 713
11/04/2019 18:18:09 | Milkyway@Home | Project requested delay of 91 seconds
11/04/2019 18:18:09 | Milkyway@Home | [sched_op] handle_scheduler_reply(): got ack for task de_modfit_82_bundle5_3s_NoContraintsWithDisk200_6_1554915636_381123_0
11/04/2019 18:18:09 | Milkyway@Home | [sched_op] handle_scheduler_reply(): got ack for task de_modfit_82_bundle5_3s_NoContraintsWithDisk200_6_1554915636_381183_0
11/04/2019 18:18:09 | Milkyway@Home | [sched_op] handle_scheduler_reply(): got ack for task de_modfit_82_bundle5_3s_NoContraintsWithDisk200_6_1554915636_381167_0
11/04/2019 18:18:09 | Milkyway@Home | [sched_op] handle_scheduler_reply(): got ack for task de_modfit_82_bundle5_3s_NoContraintsWithDisk200_6_1554915636_381153_0
11/04/2019 18:18:09 | Milkyway@Home | [sched_op] handle_scheduler_reply(): got ack for task de_modfit_82_bundle5_3s_NoContraintsWithDisk200_6_1554915636_381131_0
11/04/2019 18:18:09 | Milkyway@Home | [sched_op] handle_scheduler_reply(): got ack for task de_modfit_82_bundle5_3s_NoContraintsWithDisk200_6_1554915636_381217_0
11/04/2019 18:18:09 | Milkyway@Home | [work_fetch] backing off NVIDIA GPU 699 sec
11/04/2019 18:18:09 | Milkyway@Home | [sched_op] Deferring communication for 00:01:31
11/04/2019 18:18:09 | Milkyway@Home | [sched_op] Reason: requested by project
11/04/2019 18:18:09 | | [work_fetch] Request work fetch: RPC complete
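The difference between the two logs is easier to see with the numbers pulled out side by side. A small hypothetical helper (my own sketch, not part of BOINC) that parses a `[work_fetch] set_request()` line from the Event Log:

```python
import re

# Hypothetical helper (NOT part of BOINC): extract the work-fetch numbers
# from a BOINC Event Log line so requests can be compared directly.
PATTERN = re.compile(
    r"set_request\(\) for (?P<dev>.+?): "
    r"ninst (?P<ninst>[\d.]+) nused_total (?P<nused>[\d.]+) "
    r"nidle_now (?P<nidle>[\d.]+) fetch share (?P<share>[\d.]+) "
    r"req_inst (?P<req_inst>[\d.]+) req_secs (?P<req_secs>[\d.]+)"
)

def parse_work_fetch(line):
    """Return the numeric fields of a set_request() log line, or None."""
    m = PATTERN.search(line)
    if not m:
        return None
    return {k: float(v) for k, v in m.groupdict().items() if k != "dev"}

line = ("11/04/2019 18:18:06 | Milkyway@Home | [work_fetch] set_request() for "
        "NVIDIA GPU: ninst 1 nused_total 32.87 nidle_now 0.00 fetch share 1.00 "
        "req_inst 0.00 req_secs 127884.51")
print(parse_work_fetch(line))
```

Run on both logs, it shows the request that got 0 tasks had `req_inst 0.00` and `nidle_now 0.00` (no idle device), while the one that got 200 tasks had `req_inst 1.00` and `nidle_now 0.20`.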
5) Message boards : News : 30 Workunit Limit Per Request - Fix Implemented (Message 68510)
Posted 10 Apr 2019 by Vortac
Post:
I'm equally stumped here. My only thought is that sometimes we get unlucky and you request work when another group of people have also requested it, before the feeder can refill the queue. I'm going to do a little thinking before I try to implement any solutions to this issue for now.

Nah, I don't think it's down to luck. After the queue is completely emptied (all tasks completed AND reported), it always gets refilled with 200 new tasks the first time the client contacts the server. But there's always a few minutes of idle time, from the moment the queue is completely emptied to the moment the client contacts the server and gets a new set of 200 tasks. A few minutes are not a big deal, obviously, but it would be interesting to track down the exact issue - I haven't seen anything similar before.
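The behavior described above can be sketched as a toy simulation (my own model, NOT BOINC code; task length and batch size are illustrative). The server grants a fresh batch only when the queue is empty AND the last results have already been reported, so an idle gap opens between the final report and the refill:

```python
# Toy model of the refill behavior described above (NOT BOINC code).
RPC_INTERVAL = 91   # project-requested delay between scheduler contacts
BATCH_SIZE = 200    # tasks granted per refill
TASK_TIME = 60      # assumed seconds per task (illustrative)

def simulate(total_seconds):
    """Return total idle seconds accumulated over the simulated period."""
    queue = 0        # tasks on hand
    unreported = 0   # completed but not yet reported
    idle = 0
    t = 0
    next_rpc = 0     # earliest time the next scheduler contact may happen
    while t < total_seconds:
        if t >= next_rpc:
            # One scheduler contact: report results, maybe receive work.
            if queue == 0 and unreported == 0:
                queue = BATCH_SIZE   # refill only an empty, fully reported queue
            unreported = 0
            next_rpc = t + RPC_INTERVAL
        if queue > 0:
            queue -= 1
            unreported += 1
            t += TASK_TIME
        else:
            # Out of work: wait for the next allowed contact.
            idle += next_rpc - t
            t = next_rpc
    return idle

print(simulate(24 * 3600))  # idle seconds over a simulated day
```

In this model the RPC that reports the last results gets nothing, and the refill only arrives on the contact after that, which matches the gap seen in practice.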
6) Message boards : News : 30 Workunit Limit Per Request - Fix Implemented (Message 68505)
Posted 10 Apr 2019 by Vortac
Post:
Alright, I can see in my sched_reply_milkyway.cs.rpi.edu_milkyway.xml that next_rpc_delay is now deleted. The server is working much more smoothly now.

But that old problem remains - it's possible to get 200 tasks now, but the client goes through them without getting any new ones. It reports XX completed tasks about every 90 secs and requests new work every time, but it doesn't get any. I have no idea what is causing this problem, but it's probably unrelated to next_rpc_delay. Jake, perhaps you can try increasing request_delay from 91 secs to 180 or so, maybe that would help? It's just a wild guess, I can't think of anything else - perhaps someone more knowledgeable will be of more help.
7) Message boards : News : 30 Workunit Limit Per Request - Fix Implemented (Message 68502)
Posted 10 Apr 2019 by Vortac
Post:
According to BOINC Wiki, next_rpc_delay means: "Make another RPC ASAP after this amount of time elapses".
So, with a next_rpc_delay of 180 secs, we are forcing ALL clients to contact the server every 3 mins, even if they have nothing to report. That looks like a huge burden on the server.
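As a rough back-of-the-envelope check (the host count below is purely hypothetical), the RPC rate forced by next_rpc_delay alone scales as hosts divided by the delay:

```python
def rpc_rate(active_hosts, next_rpc_delay_secs):
    """Average forced scheduler contacts per second from next_rpc_delay alone."""
    return active_hosts / next_rpc_delay_secs

# Hypothetical 10,000 active hosts, each forced to call home every 180 s:
print(round(rpc_rate(10_000, 180), 1))  # about 55.6 RPCs per second
```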

I have checked the settings of my other BOINC projects (in ProgramData\BOINC) and apparently none of them use the next_rpc_delay setting at all; they use only request_delay. Perhaps we would be better off with just a request_delay of 180 secs and next_rpc_delay left unspecified?
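For reference, both fields live in the scheduler reply the client stores on disk. A sketch of the relevant fragment (values illustrative; element names as used by the BOINC scheduler):

```xml
<scheduler_reply>
    <!-- Minimum wait before the client may contact the server again. -->
    <request_delay>180</request_delay>
    <!-- If present, the client MUST contact the server again after this
         many seconds, even with nothing to report or request.
         Leaving it out removes the forced periodic RPC. -->
    <!-- <next_rpc_delay>180</next_rpc_delay> -->
</scheduler_reply>
```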
8) Message boards : News : 30 Workunit Limit Per Request - Fix Implemented (Message 68496)
Posted 10 Apr 2019 by Vortac
Post:
Looks like the rpc_delay of 90 secs is a bit hard on the server? Since yesterday, my Event Log has been showing a lot of these:

10/04/2019 12:09:37 | Project communication failed: attempting access to reference site
10/04/2019 12:09:37 | Milkyway@Home | Scheduler request failed: Failure when receiving data from the peer
10/04/2019 12:09:38 | Internet access OK - project servers may be temporarily down.

and these

10/04/2019 12:17:09 | Milkyway@Home | Scheduler request failed: Couldn't connect to server
10/04/2019 12:17:10 | Project communication failed: attempting access to reference site
10/04/2019 12:17:12 | Internet access OK - project servers may be temporarily down
9) Message boards : News : 30 Workunit Limit Per Request - Fix Implemented (Message 68443)
Posted 28 Mar 2019 by Vortac
Post:
I have been closely monitoring my BOINC machines as well and I can confirm that, despite downloading 200 tasks per request, the app is still running out of work. As wb8ili said, the queue is not maintained and the client goes through those 200 tasks without downloading new ones. The Event Log repeatedly shows something like this:

28/03/2019 20:06:26 | Milkyway@Home | Sending scheduler request: To fetch work.
28/03/2019 20:06:26 | Milkyway@Home | Reporting 8 completed tasks
28/03/2019 20:06:26 | Milkyway@Home | Requesting new tasks for NVIDIA GPU
28/03/2019 20:06:29 | Milkyway@Home | Scheduler request completed: got 0 new tasks

Eventually, after the queue is completely cleared and there are no more tasks to crunch, 200 new tasks are downloaded - but there is always a period during which the client is out of work before the queue is refilled.
10) Message boards : News : 30 Workunit Limit Per Request - Fix Implemented (Message 68432)
Posted 27 Mar 2019 by Vortac
Post:
Getting 200 tasks per request now - well done, Jake. I was only getting 40-50 before this fix.
11) Message boards : News : New Server Update (Message 68424)
Posted 27 Mar 2019 by Vortac
Post:
Indeed, my machine with Titan V is running out of work regularly. My other machine with two 7970s (which are much slower) is also running out of work occasionally, but not nearly so often.
12) Message boards : News : New Server Update (Message 68415)
Posted 27 Mar 2019 by Vortac
Post:
It's still hard to obtain enough work. I often get "Scheduler request completed: got 0 new tasks" and "Project has no tasks available".
13) Message boards : Number crunching : Invalids Exit status 0 (0x0) after server came back (Message 68206)
Posted 6 Mar 2019 by Vortac
Post:
Same here - lots of invalids, on all machines. I had zero invalids before the last outage.
14) Message boards : Number crunching : Server can't open database? (Message 68200)
Posted 4 Mar 2019 by Vortac
Post:
I noticed that both SETI and Milkyway are no longer on the Gridcoin whitelist. Both went on the "Excluded Projects" list in March.

Those are just occasional glitches. Milkyway and SETI are back already. Collatz will be back soon too, I guess.
15) Message boards : News : Database Maintenance 12-18-2018 - Ended (Message 67956)
Posted 20 Dec 2018 by Vortac
Post:
The database was down again today, for even longer than usual. I had hoped this maintenance would fix those problems.
16) Message boards : Number crunching : Titan V (Message 67299)
Posted 3 Apr 2018 by Vortac
Post:
There's some info here:
https://steemit.com/gridcoin/@cautilus/my-quest-to-use-an-nvidia-titan-v-for-boinc-titan-v-and-1080-ti-boinc-benchmarks
17) Message boards : Number crunching : Errors (Message 66647)
Posted 18 Sep 2017 by Vortac
Post:
Hey Everyone,

Sorry, I put up a bad set of runs on Friday. They're down now and I'm going to check out the errors.

Sorry again,

Jake

The proverbial Friday afternoon - nothing implemented on a Friday afternoon works correctly. The problem usually becomes self-evident right after the last employee has left the building :)
18) Message boards : Number crunching : Errors (Message 66615)
Posted 16 Sep 2017 by Vortac
Post:
Another bad run, most likely.
19) Message boards : Number crunching : Trouble with a Titan Black (Message 66603)
Posted 11 Sep 2017 by Vortac
Post:
My guess is that Titan Black shows its full strength only with CUDA applications.
20) Message boards : Number crunching : AMD VEGA FE (Message 66525)
Posted 6 Jul 2017 by Vortac
Post:
Hello.
What about the AMD VEGA FE? Do you think it would have the right power to crunch on Collatz, Milkyway or Einstein?
Thanks for the info. I am thinking of purchasing one just for BOINC.

HD7970 (AMD Tahiti) still trumps it in FP64. I guess it would be better suited for Einstein or Collatz.



©2019 Astroinformatics Group