Welcome to MilkyWay@home

Posts by Keith Myers

1) Message boards : Number crunching : MilkyWay takes a backseat to Einstein ??? (Message 68965)
Posted 9 days ago by Keith Myers
Post:
You might try setting a 1,000,000:1 resource share for MilkyWay:Einstein. It probably still won't work: Einstein will keep overloading the computer with work and swamp any work request for MilkyWay.

The only way I can control Einstein is to fetch 120 tasks or so and then set NNT (No New Tasks). I let MilkyWay grab work continuously with no issues. When Einstein is down to 10 tasks or fewer, I unset NNT, get another slug of work, and set NNT again.
2) Message boards : Number crunching : Long crunch time on new N-Body simulations? (Message 68946)
Posted 17 days ago by Keith Myers
Post:
When the run_time greatly exceeds the cpu_time, that indicates a cpu that is overcommitted. Try running fewer tasks. Or reduce the number of background processes that are stealing cpu cycles from the crunching.
3) Message boards : Number crunching : Monitoring of Invalid results on separation run de_modfit_84_bundle4_4s_south4s_1 (Message 68902)
Posted 21 Jul 2019 by Keith Myers
Post:
Since I have such a large cache and a resource share that is relatively small in relation to Seti, I only process work as it reaches EDF. So still lots of these "bad" de_modfit_84_bundle4_4s_south4s_1 tasks processing through. It will be nice to clear the cache of them and only have good data to crunch.
4) Message boards : Number crunching : Computer details...wrong GPU description (Message 68890)
Posted 10 Jul 2019 by Keith Myers
Post:
BOINC always identifies a host's co-processors by the most capable card installed, which in your case is the R9 390. So nobody who is familiar with BOINC is confused. To see which cards are actually installed in a host, look at the stderr.txt of a reported WU result, which lists all detected GPUs and identifies them correctly.
5) Message boards : Number crunching : Monitoring of Invalid results on separation run de_modfit_84_bundle4_4s_south4s_1 (Message 68856)
Posted 13 Jun 2019 by Keith Myers
Post:
I'll throw my data point out here. Only gpu tasks processed for MilkyWay. Same for all my other projects except for Seti which does cpu work also. Oh, forgot the Raspberry Pi3B+ and the Jetson Nano do Seti cpu work also.
6) Message boards : Number crunching : Monitoring of Invalid results on separation run de_modfit_84_bundle4_4s_south4s_1 (Message 68854)
Posted 13 Jun 2019 by Keith Myers
Post:
From my ever-growing list of invalids, it seems that Linux-based clients are failing, including Mac clients. Only Windows-based clients seem to be succeeding.
7) Message boards : Number crunching : Monitoring of Invalid results on separation run de_modfit_84_bundle4_4s_south4s_1 (Message 68852)
Posted 11 Jun 2019 by Keith Myers
Post:
I am picking up validate errors on these tasks. I have never had any issues with any of my cards being the problem. Only the client software or the tasks themselves being the problem.
8) Message boards : News : New Separation Runs (Message 68833)
Posted 3 Jun 2019 by Keith Myers
Post:
If you look at the previous BOINC client major release transitions, the highest .x release we had was for the 6 series at 6.12. We are already past that point release for the 7 series.
DA might be in the same camp as Linus Torvalds of Linux fame, who says point releases only go up as high as the number of fingers and toes he can count on. :->
9) Message boards : News : New Separation Runs (Message 68828)
Posted 2 Jun 2019 by Keith Myers
Post:
My best guess is that there won't be a 7.16.0 release. The BOINC 8.0 Manager/Client milestone looks just as developed, with more commits going into it than into the 7.16.0 milestone. There is no expected date on it either. My hunch is that 7.16.0 will actually be released as 8.0. But those are just my musings and guesses.
10) Message boards : News : New Separation Runs (Message 68825)
Posted 1 Jun 2019 by Keith Myers
Post:

I believe the true solution would be to upgrade to the latest 7.15.0 master which has an improved work fetch to prevent idle cores, fetching work when work is returned and not requesting work over excessive scheduler backoff intervals.


How long before you release 7.15 for testing to the test group? I haven't seen it there yet so am guessing you are still working on it, or I missed it.

I don't know. I'm not privy to the information. All I can say is to watch the Milestone tab for 7.16.0. They don't set any calendar date for release. They just make a group decision whether a client is ready for release. I think DA is the final arbiter of when to release but he gets input from all the project administrators and developers.
The 7.16.0 milestone progress is here. https://github.com/BOINC/boinc/projects/14
Says 10 done, 3 in progress.
11) Message boards : News : New Separation Runs (Message 68822)
Posted 1 Jun 2019 by Keith Myers
Post:


https://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=4424



Is downgrading to BOINC 7.12 a successful solution then?

Guess I'll try it just in case.

I believe the true solution would be to upgrade to the latest 7.15.0 master which has an improved work fetch to prevent idle cores, fetching work when work is returned and not requesting work over excessive scheduler backoff intervals.

But you need to compile the master yourself or get the latest appveyor client artifact and try running that. You can download the Windows client appveyor artifact here.
https://ci.appveyor.com/api/buildjobs/4qe96bvaqh4cklou/artifacts/deploy%2Fwin-client%2Fwin-client_PR3169_2019-05-29_4928cc96.7z

I run Linux so can't make any valid statement about how Windows behaves, but the work fetch has been significantly improved since the release of the official 7.14.2 client.

I don't have any issues getting MilkyWay work, or work from any project for that matter, and maintaining my cache levels. As long as work is to be found, I get it when I request it. Setting the sched_ops and/or work_fetch_debug logging flags for the Event Log will show how much work is requested at each scheduler connection, the shortfall on each project, and how busy BOINC thinks your client is among all attached projects. Those logging options show very clearly why a client does not request work.
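
As a rough illustration of the logging options mentioned above, a cc_config.xml along these lines would turn both flags on. This is a minimal sketch, not a drop-in file: it assumes the flags go in the <log_flags> section and that any settings you already have in cc_config.xml are merged in by hand.

```xml
<!-- cc_config.xml (sketch): lives in the BOINC data directory -->
<cc_config>
    <log_flags>
        <!-- log the start and completion of each scheduler operation -->
        <sched_ops>1</sched_ops>
        <!-- log work-fetch decisions: per-project shortfall and how busy the client thinks it is -->
        <work_fetch_debug>1</work_fetch_debug>
    </log_flags>
</cc_config>
```

After saving, have the client re-read its config files (or restart it) and watch the Event Log at the next scheduler contact. The work_fetch_debug output is verbose, so it is probably worth switching off again once the question is answered.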
12) Message boards : Number crunching : Large surge of Invalid results and Validate errors on ALL machines (Message 68793)
Posted 28 May 2019 by Keith Myers
Post:
I compile the current BOINC master from source. The master is staging the development 7.15.0 platform. BOINC does not release odd-numbered versions. The developers are testing all the fixes in 7.15.0 for an eventual freeze, to be released to the public as the next 7.16.0 release.

I needed the latest master because it contains the fix for my documented problem of max_concurrent being incompatible with gpu_exclude. I discovered that when I needed to block my Turing cards from GPUGrid.net since they don't have an app that works with Turing. But that fix broke work fetch so then work fetch had to be fixed. The latest master also contains the much needed fix for the "finish file present too long" error which has been around forever. Everything working fine now.

Finally I also modify the code to spoof more gpus than physical to get a larger cache for Seti.
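
For anyone wondering what blocking a card from one project looks like, here is a hedged sketch of the cc_config.xml entry. In the client configuration the option is spelled <exclude_gpu>. The project URL, device number and vendor type below are placeholders rather than values from the post: the URL has to match what the client shows for the project, and the device number comes from the GPU detection lines at the top of the Event Log.

```xml
<!-- cc_config.xml (sketch): keep one GPU away from one project -->
<cc_config>
    <options>
        <exclude_gpu>
            <!-- assumed project URL; use the one your client lists -->
            <url>https://www.gpugrid.net/</url>
            <!-- assumed device number of the card to exclude -->
            <device_num>0</device_num>
            <!-- optional: restrict the exclusion to one GPU vendor -->
            <type>NVIDIA</type>
        </exclude_gpu>
    </options>
</cc_config>
```

One <exclude_gpu> block is needed per excluded card, and the client may need a restart before the exclusion takes effect.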
13) Message boards : Number crunching : Large surge of Invalid results and Validate errors on ALL machines (Message 68783)
Posted 26 May 2019 by Keith Myers
Post:
My validate errors seem to be of one specific work unit type -- the de_modfit_84_xxxxx series (crunching GPU only for MilkyWay). The vast majority of these work units result in validate errors on both of my hosts. The only other consistency I see (I am not a coder) is that both my hosts are Linux (Mint) and when looking at the wingmen who end up validating the WU, they are all Windows. I have seen several cases where another Linux host has failed validation on the same work unit. All the other MilkyWay GPU WUs seem to be doing fine.

Hopefully this helps, or someone can point me towards an individual solution.

For the moment, I am attempting to abort the xxx_84_xxx WUs when I see them; lots of computing time not useful otherwise.

Have we found another corner case where the parameter string is too long for the BOINC client? This was the case for BOINC versions earlier than 7.6.31. I had to abort all 4 bundle tasks and only run 6 bundle tasks when I was running BOINC version 7.4.44 or all the 4 bundle tasks would fail. It got to be too much work managing aborting work so I just acquiesced and updated to BOINC 7.8.3.
It was explained to me that the problem was discovered a long time ago: the parameter string was too long for the older clients. I first posted about this error in the thread "New Linux system trashes all tasks", and the reason the tasks fail was provided by AlanB (https://milkyway.cs.rpi.edu/milkyway/show_user.php?userid=94054)
in message https://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=4288&postid=67510
14) Questions and Answers : Unix/Linux : MilyWay OpenCL applications fails to build wisdom file and all tasks error in Ubuntu 18.04 LTS (Message 68764)
Posted 20 May 2019 by Keith Myers
Post:
Just to bring closure to this thread. The problem is that MW@home doesn't correctly support BOINC versions earlier than 7.6.31. So to prevent these types of errors you need to use a modern client that can handle the very long parameter strings of certain tasks.
15) Questions and Answers : Unix/Linux : Running 3 tasks only (Message 68763)
Posted 20 May 2019 by Keith Myers
Post:
Not sure if those are the correct names. You need to use the names the apps are given in your client_state.xml, under the <app> section. For example, Separation is labelled simply as milkyway.
<app>
    <name>milkyway</name>
    <user_friendly_name>MilkyWay@Home</user_friendly_name>
    <!-- remaining fields omitted -->
</app>


Since I've never done N-body, I am not sure what the app name for it is. Likely just nbody.

Other than that correction, that is the correct way to limit max_concurrent to each project.
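
To make the naming point concrete, here is a minimal app_config.xml sketch for the Separation app. It assumes the goal is three tasks at a time, as in the thread title; the limit is only an example value, and the file is assumed to sit in the MilkyWay project folder under the BOINC data directory.

```xml
<!-- app_config.xml (sketch): one <app> block per application to limit -->
<app_config>
    <app>
        <!-- must match <name> in client_state.xml, not the user-friendly name -->
        <name>milkyway</name>
        <!-- run at most this many tasks of this app at once; 3 is only an example -->
        <max_concurrent>3</max_concurrent>
    </app>
</app_config>
```

If the name does not match an app the client knows about, the limit will not apply, which is why it is worth checking client_state.xml first.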
16) Message boards : Number crunching : Long crunch time on new N-Body simulations? (Message 68756)
Posted 19 May 2019 by Keith Myers
Post:
BOINC client keeps a running average of completion times to estimate completion and the old runtimes outweigh the new runtimes in the average.

I think if you set, in cc_config.xml, <rec_half_life_days>0</rec_half_life_days> then restart BOINC and run it for an hour then set it back to default 10 days <rec_half_life_days>10</rec_half_life_days>, you'll reset the running averages (of all WU's) and it should be close to the right number in 24 hours.

I don't usually worry about the estimate (it's usually always wrong) and so haven't tested this.


This is the correct way to update the estimated times. Or if you want an estimate that changes more rapidly based on a quickly changing data mix, set <rec_half_life_days>10</rec_half_life_days> to <rec_half_life_days>1</rec_half_life_days> and your estimates will only average over the last day.

That is what I run on my clients since Seti has a fairly diverse data mix that changes daily.
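
For reference, the one-day setting described above would look roughly like this in cc_config.xml. This is a sketch that assumes the option behaves as described in this post and goes in the <options> section; merge it with any options already in the file rather than replacing them.

```xml
<!-- cc_config.xml (sketch): shorten the averaging window -->
<cc_config>
    <options>
        <!-- half-life in days; 10 is the default mentioned above, 1 follows the suggestion in this post -->
        <rec_half_life_days>1</rec_half_life_days>
    </options>
</cc_config>
```

Removing the line again returns the client to its default.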
17) Message boards : Number crunching : So no way to select project campaigns anymore on the new server code (Message 68714)
Posted 9 May 2019 by Keith Myers
Post:
System administrator Eric has fixed the project preferences to restore the ability to select campaigns again. Kudos.
18) Message boards : News : New runs of MilkyWay Nbody out (Message 68713)
Posted 9 May 2019 by Keith Myers
Post:
Thanks Eric for the effort to keep digging into the code to fix the preferences the way they were before the upgrade.
19) Message boards : News : New runs of MilkyWay Nbody out (Message 68704)
Posted 7 May 2019 by Keith Myers
Post:
Other than asking for help from other administrators like I suggested in https://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=4439&postid=68651
I can't offer much further help as I have never compiled the server code before, just the client and manager.
20) Message boards : Number crunching : So no way to select project campaigns anymore on the new server code (Message 68696)
Posted 5 May 2019 by Keith Myers
Post:
The QC Chemistry app is only working well for Linux hosts. They are still debugging a QC Chemistry app for Windows hosts.


