Welcome to MilkyWay@home

Posts by ritterm

1) Message boards : News : Binary Download Issue - Fix Implemented (Message 68402)
Posted 26 Mar 2019 by ritterm
Post:
I have implemented a fix for the binary download issues some people were having...

Which seems to have fixed the problems I was having on a new host -- I now have a couple of WUs running. Thanks, Jake!
2) Message boards : News : Nbody 1.68 release (Message 67093)
Posted 16 Feb 2018 by ritterm
Post:
It looked good for a few days, but I've picked up some v166 tasks recently (see examples).
3) Message boards : Number crunching : MW@H DBase problems (Message 67076)
Posted 13 Feb 2018 by ritterm
Post:
This seems to be happening more and more recently, with multiple outages on some days.
4) Message boards : News : Reducing Workunits to Unreliable Hosts (Message 67074)
Posted 12 Feb 2018 by ritterm
Post:
I'd say we're good to go if 643627 doesn't get new work after it reports its latest batch sometime after 1800 UTC.

Well, it's still being sent new work in spite of a 100% error rate. However, it's getting fewer and fewer new tasks so maybe it will be completely shut down in another couple of days.

Good work, Jake. Thanks for chasing that down for us.
5) Message boards : News : Reducing Workunits to Unreliable Hosts (Message 67073)
Posted 12 Feb 2018 by ritterm
Post:
I am going to consider this issue resolved for now unless anyone objects...

It definitely looks like we're making progress. I'd say we're good to go if 643627 doesn't get new work after it reports its latest batch sometime after 1800 UTC.
6) Message boards : News : Nbody 1.68 release (Message 67062)
Posted 11 Feb 2018 by ritterm
Post:
Sidd wrote:
It seems for some reason, that workunit did exactly that, the binary being used is the nbody v168 but the workunit is from the v166 runs, using the v166 parameter files. Before releasing, I took down the older runs...

Maybe I misunderstand, but v166 tasks are still being sent out.
7) Message boards : News : Nbody 1.68 release (Message 67060)
Posted 11 Feb 2018 by ritterm
Post:
How does this happen?

Stderr output
<core_client_version>7.8.6</core_client_version>
<![CDATA[
<message>
process exited with code 13 (0xd, -243)</message>
<stderr_txt>
<search_application> milkyway_nbody 1.66 Darwin x86_64 double OpenMP, Crlibm </search_application>
Using OpenMP 8 max threads on a system with 8 processors
Application version too old. Workunit requires version 1.68, but this is 1.66
Failed to read input parameters file
04:21:59 (82996): called boinc_finish(13)

</stderr_txt>
]]>
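
The check behind the "Application version too old" line in the log above can be sketched as follows. This is a minimal illustration in Python, not the actual milkyway_nbody code: the function names `parse_version` and `check_min_version` are hypothetical, and only the message wording and exit code 13 come from the log.

```python
def parse_version(text):
    """Turn a dotted version string such as "1.68" into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def check_min_version(app_version, required_version):
    """Return (ok, message); ok is False when the app is older than required."""
    if parse_version(app_version) < parse_version(required_version):
        return False, (
            "Application version too old. Workunit requires version "
            f"{required_version}, but this is {app_version}"
        )
    return True, ""


# Hypothetical usage: a 1.66 binary handed a workunit that needs 1.68.
ok, message = check_min_version("1.66", "1.68")
if not ok:
    print(message)
    exit_code = 13  # the exit code reported in the log above
```

Comparing tuples of integers rather than raw strings keeps a version such as 1.10 ordered after 1.9.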
8) Message boards : News : Reducing Workunits to Unreliable Hosts (Message 67033)
Posted 6 Feb 2018 by ritterm
Post:
I will do a little more research into configuring the server better throughout the week and run some more tests...

Any update on this, Jake? I've had another result invalidated due to unreliable wingmen. I guess it doesn't make any difference, but all the hosts I've looked at appear to be using GPUs that aren't double precision.
9) Message boards : News : Reducing Workunits to Unreliable Hosts (Message 67032)
Posted 5 Feb 2018 by ritterm
Post:
One of my PCs is not getting any credit for HOST AVERAGE work done, another seems to be working OK. The OK one also posts USER AVERAGE totals which seem to include activity of BOTH of my Milkyway machines as well as what looks like correct HOST AVERAGE totals.

Just so there's no confusion, what does this have to do with unreliable hosts? For your hosts, I see some user aborts and errors in N-body tasks. However, no massive and continuing computation errors like those pointed out earlier in this thread.
10) Message boards : News : Reducing Workunits to Unreliable Hosts (Message 67011)
Posted 29 Jan 2018 by ritterm
Post:
Another otherwise likely valid result of mine invalidated due to unreliable wingmen (628802 and 761112).
11) Message boards : News : Reducing Workunits to Unreliable Hosts (Message 67009)
Posted 27 Jan 2018 by ritterm
Post:
Unfortunately, host 643627 went through another cycle this morning of returning 80 errored tasks and getting 80 new tasks...

And it continues...
12) Message boards : News : Reducing Workunits to Unreliable Hosts (Message 66999)
Posted 23 Jan 2018 by ritterm
Post:
I just tried turning on some options to reduce workunits sent to hosts that return a significant number of errors. If you see any issues, please let me know...

Unfortunately, host 643627 went through another cycle this morning of returning 80 errored tasks and getting 80 new tasks. Even though the host has a history of returning a lot of errors, does it take time for the server to "learn" that it's unreliable?
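
One way a server could "learn" this is a per-host daily quota that shrinks with each errored result and recovers with valid ones. The sketch below is a toy model under that assumption, not the project's actual server code: the class `HostQuota`, the function `simulate_failing_host`, and the adjustment rules are hypothetical, and the starting value of 80 is simply the batch size seen for this host.

```python
class HostQuota:
    """Toy per-host daily task quota (hypothetical, for illustration only)."""

    def __init__(self, max_per_day=80):
        self.max_per_day = max_per_day
        self.quota = max_per_day

    def report(self, valid):
        """Adjust the quota after one result is reported."""
        if valid:
            # Assumed rule: a valid result doubles the quota, up to the cap.
            self.quota = min(self.max_per_day, self.quota * 2)
        else:
            # Assumed rule: an errored result costs one slot, never below 1.
            self.quota = max(1, self.quota - 1)


def simulate_failing_host(days):
    """Return the tasks sent each day to a host that errors on every task."""
    host = HostQuota()
    sent_per_day = []
    for _ in range(days):
        sent = host.quota
        sent_per_day.append(sent)
        for _ in range(sent):
            host.report(valid=False)
    return sent_per_day


print(simulate_failing_host(3))
```

Under these assumed rules the host still gets a full batch on the first day, because the quota only drops as the errors are reported. That would match a host that keeps receiving 80 tasks if its quota were being reset or never reduced.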
13) Message boards : Number crunching : Hosts with only invalid results (Message 66984)
Posted 21 Jan 2018 by ritterm
Post:
I've sent a PM to admins Sidd and Jake Weiss asking them to review this thread.
14) Message boards : Number crunching : Hosts with only invalid results (Message 66983)
Posted 21 Jan 2018 by ritterm
Post:
On Workunit 1563422500, all three of my wingmen returned computation errors, which caused my result to be marked invalid. Each of their respective hosts (629667, 551062, and 740662) is returning almost nothing but errors.
15) Message boards : Number crunching : Hosts with only invalid results (Message 66979)
Posted 19 Jan 2018 by ritterm
Post:
80 workunits is the max any one GPU can get at a time; they do restrict all of us that way. As we return one we can get another, so if that PC returns an invalid workunit, gets another, and trashes it, it could be going through at least 80 per day, and a lot more if it connects again once it's out of trashed workunits.

Right. But, there seems to be a 24-hour backoff imposed. It's returning all 80 crashed tasks at once, getting 80 new ones, and repeating that cycle 24 hours later.
16) Message boards : Number crunching : Hosts with only invalid results (Message 66974)
Posted 18 Jan 2018 by ritterm
Post:
Has anyone ever PM'd the owner to inform them their host is producing nothing but invalids?

In my case, I did but got no response.

The host continues to generate nothing but errors. At least it's being limited to 80 tasks/day and is contacting the project only once every 24 hours. So, it seems like some kind of restriction is being imposed. The user has another host and it's returning valid results.
17) Message boards : News : Validation Inconclusive Errors (Message 66951)
Posted 10 Jan 2018 by ritterm
Post:
For me, the N-Body inconclusives are clearing out nicely. It seems now that it's the MilkyWay@Home app that's lagging...

Looks like progress is being made. All of my inconclusives have wingmen now.
18) Message boards : News : Validation Inconclusive Errors (Message 66949)
Posted 10 Jan 2018 by ritterm
Post:
Can you please list some of the workunits that are marked as inconclusive and unsent?

For me, the N-Body inconclusives are clearing out nicely. It seems now that it's the MilkyWay@Home app that's lagging. Recent examples:

1560210075
1560204482
1560197795
1560195886
1560188186
19) Message boards : News : Validation Inconclusive Errors (Message 66944)
Posted 9 Jan 2018 by ritterm
Post:
And I just noticed I'm building up inconclusives for the MilkyWay@Home app, not just the N-Body, as was the case in the last couple of days.
20) Message boards : Number crunching : Hosts with only invalid results (Message 66942)
Posted 8 Jan 2018 by ritterm
Post:
I've returned to the project after being away for a while, so I'm not sure how much of a problem this has been recently. Regardless, I thought I'd give this thread a bump...

I've only returned a few results so far, but all of my inconclusives for the MilkyWay@Home app (not the N-body) include a wingman whose host (ID 643627) has been returning nothing but errors for at least the last 4 days.

Can't something be done to halt the sending of new work to unreliable hosts?


