Welcome to MilkyWay@home

Posts by mikey

1) Message boards : Number crunching : scheduler request timeout when reporting more than 1 result at once (Message 77093)
Posted 2 days ago by Profile mikey
Post:
mikey wrote:
Your location is hidden, but do you think it could be the internet itself that's not cooperating between you and the MilkyWay servers in Wisconsin?
I am based in Germany, and my internet service via coaxial cable is of terrible quality, which is why I prefer work buffers more than 0.5 days deep. However:

First: during these situations with scheduler request timeouts between this host and MilkyWay, the MW web site keeps responding very promptly, including dynamic web pages with database access such as the result tables. Second: if I reduce max_tasks_reported as far as necessary, MW's scheduler responds very quickly to the client again too (to most but not all requests). And third: when I manage to gradually bring the number of tasks in progress down again while recovering from these situations, I can gradually increase max_tasks_reported again too.

The correlation between likelihood of scheduler request timeout, max_tasks_reported, and number of tasks in progress is really evident in my observations. (The three occasions reported here were with ~1,800, ~1,600, and ~2,000 tasks in progress when — or a while after — the described condition started.)

Obviously, only large hosts with large work buffer settings will ever receive so many tasks from the server. I, like almost everyone else, am running a stock client, which does not request more work once there are 1,000 runnable tasks buffered.


I wonder if the server has a setting that moves on to the next client after one has been connected for more than x seconds? That way most people can get and return their tasks within those x seconds, but you, because of your large cache of tasks both requested and being returned, take longer and get disconnected. Whatever the problem is, it sounds like it's on the server side, as my 17 desktop PCs can get and return tasks just fine.
2) Message boards : Number crunching : scheduler request timeout when reporting more than 1 result at once (Message 77085)
Posted 3 days ago by Profile mikey
Post:
I didn't run MilkyWay@Home for a week but restarted it half a day ago.

And the problem reappeared today at 00:00 UTC.
This happened:
– At this point, the host had 1003 tasks in progress, reported 7 results, and requested more work.
– The server assigned 1009 (!) tasks to the host, which now had 2005 tasks in progress.
– From then on, almost all scheduler requests to report results (with max_tasks_reported=20) failed with timeout.
– I discovered this situation at 04:30 UTC, at which time the host still had 1752 tasks in progress, even though a background script had been forcing a scheduler request every 97 seconds whenever more than 20 results ready to report had piled up.

To recover, I switched temporarily to max_tasks_reported=1, set the project to no new work, and engaged a script to report a result every ten seconds. Most of these requests succeed, but occasionally even these single-result reports time out.
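The reporting workaround described above (force a scheduler request once more than 20 results are ready, or report one result every ten seconds during recovery) boils down to a small polling loop. This is a hypothetical sketch, not the poster's actual script: `count_ready` and `request_update` are placeholder hooks for whatever the real script calls (for example, parsing the client's task list and running `boinccmd --project <URL> update`).

```python
import time
from typing import Callable


def report_loop(
    count_ready: Callable[[], int],
    request_update: Callable[[], None],
    threshold: int = 20,
    interval_s: float = 97.0,
    max_polls: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll the client and force a scheduler request when results pile up.

    count_ready and request_update are hypothetical hooks. max_polls bounds
    the loop so it can be tested; a real script would loop indefinitely.
    Returns the number of scheduler requests that were forced.
    """
    forced = 0
    for _ in range(max_polls):
        # Only contact the scheduler once more than `threshold` results wait.
        if count_ready() > threshold:
            request_update()
            forced += 1
        sleep(interval_s)
    return forced
```

For the recovery mode, the same loop with `threshold=0` and `interval_s=10`, combined with `<max_tasks_reported>1</max_tasks_reported>` in cc_config.xml, sends one small report at a time.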

I will let the computer run another project while I am away from home today. After that I will perhaps set up multiple client instances on this computer in order to maintain a work buffer about half a day deep on it without having so many tasks in progress per client.


Your location is hidden, but do you think it could be the internet itself that's not cooperating between you and the MilkyWay servers in Wisconsin? There are a lot of reports on the various news sites about internet cables being physically down, which means routing options are limited at times. My thought is: can you do an IP scan of the MilkyWay server when it's not cooperating and see whether it's up and running normally or whether it's slow too?
3) Message boards : Number crunching : No Milkyway@home N-Body Simulation Tasks (Message 77076)
Posted 10 days ago by Profile mikey
Post:
Currently there are only "Milkyway@home N-Body Simulation with Orbit Fitting" tasks available, which is basically a newer version of the N-Body application. Simply enable both; they use about the same amount of memory and the same number of threads, so you shouldn't run into any issues. You might, however, need to update your app_config.xml if that's what you are using to limit the number of cores per task.


I hope they bring back the regular N-Body Simulation tasks, I'm almost to a milestone and would hate to miss it.
4) Message boards : News : Admin Updates Discussion (Message 77069)
Posted 11 days ago by Profile mikey
Post:
Link wrote:
Now there are 4 ST tasks waiting to run while 8 2-thread tasks are running. Doesn't make sense at all.
In my experience, the BOINC client often resorts to putting various tasks into a waiting state (or starts tasks from the queue of downloaded tasks in an order the user doesn't expect) whenever it is faced with a mixture of tasks with various thread counts in its buffer. Obviously the client has to have a scheduling algorithm which aims, among other things, for good host utilization. The outcome of this algorithm may not always be what the user expects or considers optimal.

--------

On the new settings: this is what I *think* they do:

"Max # jobs" — means "limit of tasks in progress" (or more precisely said, limit of MilkyWay@Home tasks in progress)

"Max # CPUs" — means "thread count per task" (or more precisely said, thread count per MilkyWay@Home task)

"Max # CPUs = No limit" — The server assigns a mixture of tasks for the single-threaded app_version and for the multithreaded app_version to the host. The choice of thread count of MT tasks is left to the application. (The MT app_version behaves as it did before the introduction of the Max # CPUs preference setting.) Standard(?) BOINC server behaviour is to watch the computing performance of each app_version on the host and, from some point on, assign only tasks of the better-performing app_version to the host.

"Max # CPUs = 1" — The server assigns only tasks for the single-threaded app_version to the host.

"Max # CPUs = 2…256" — The server assigns only tasks for the multithreaded app_version to the host. Furthermore, the application's built-in upper limit of thread count per task is overridden by this setting.

(Again, that's only how I *think* it works.)
On top of that, the user can still override the thread count of the multithreaded app_version by means of app_config.xml, and/or enforce the use of desired application binaries by means of app_info.xml.
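The poster's reading of the "Max # CPUs" preference can be written as a small decision function. It only encodes the guesses stated above, not MilkyWay's actual server code; the labels `ST`/`MT` and the function name are hypothetical.

```python
from typing import Optional


def planned_assignment(max_cpus: Optional[int]) -> dict:
    """Map the "Max # CPUs" preference to the app_versions the server would send.

    max_cpus=None stands for "No limit". This encodes the forum guess only.
    """
    if max_cpus is None:
        # Mixture of ST and MT; the MT thread count is left to the application.
        return {"versions": {"ST", "MT"}, "mt_threads": None}
    if max_cpus == 1:
        # Single-threaded app_version only.
        return {"versions": {"ST"}, "mt_threads": None}
    if 2 <= max_cpus <= 256:
        # MT only, with the application's built-in thread limit overridden.
        return {"versions": {"MT"}, "mt_threads": max_cpus}
    raise ValueError("Max # CPUs must be 'No limit' or between 1 and 256")
```

On the client side, a `<cmdline>--nthreads N</cmdline>` entry in app_config.xml would still override `mt_threads`, as the post notes.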


Wouldn't it be far easier for MilkyWay to just add a check box each for the single-threaded and MT tasks, so we can check one or both of them and let BOINC figure it out? That way everyone with an app_config file wouldn't have to do anything at all, except uncheck the ST tasks box if they choose to, and those only wanting the ST tasks can get them.
5) Questions and Answers : Preferences : how to adjust resource share? (Message 77054)
Posted 16 days ago by Profile mikey
Post:
More specifically, how do I adjust resource share for MWAH?
In your MilkyWay@home preferences.

But I wouldn't expect much from WCG right now, since their so-called tech team has been working for over a month on getting the work generators running at full speed again, without having a clue why they suddenly got so slow. That's likely why MilkyWay has been getting so many resources recently: WCG simply can't send you enough work.


I added this to the Options section of my cc_config.xml file, and it is supposed to help:

<rec_half_life_days>1.000000</rec_half_life_days>

But if WCG doesn't have any tasks, it doesn't really matter what you do; it will never balance out.
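The `rec_half_life_days` option sets how fast the client forgets each project's recent estimated credit (REC), which it uses to enforce resource shares; the default is 10 days. A minimal sketch, assuming a plain exponential decay with that half-life (the client's real bookkeeping differs in detail), shows why a value of 1 reacts much faster to a project that has stopped sending work.

```python
def decay_weight(age_days: float, half_life_days: float) -> float:
    """Weight of work done `age_days` ago under exponential decay."""
    return 0.5 ** (age_days / half_life_days)


def recent_credit(history, half_life_days: float) -> float:
    """Decayed sum over (age_days, credit) pairs; a stand-in for REC."""
    return sum(credit * decay_weight(age, half_life_days) for age, credit in history)
```

With a 1-day half-life, work done three days ago keeps 12.5% of its weight (0.5 cubed); with the 10-day default it keeps about 81%. Neither setting helps if one project has no tasks to send.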
6) Message boards : Number crunching : Unable to get MW tasks using TSC computers (Message 77022)
Posted 20 days ago by Profile mikey
Post:
I am unable to get any Milkyway project tasks when using single core CPU computers from The Science Cloud.

Please fix the blockage.


If you are using the account manager Science United, then you need to talk to them, as you are not signed up to any BOINC projects except through them.

On the other hand, you can always remove MilkyWay on Science United and sign up manually using the BOINC Manager itself: open the BOINC Manager, select Tools, Add Project, and pick MilkyWay from the list.
7) Message boards : Number crunching : Constant Validation Inconclusive Results (Message 76966)
Posted 9 Mar 2024 by Profile mikey
Post:
Please SEND task 936827717 so task 935840769 can finally receive credit for the more than 10 (TEN) hours of CPU time run.
thanks.

936827717 task Name de_nbody_11_02_2023_v183_pal5__data__2_1705435140_605857_1 Created 12 Feb 2024, 7:03:08 UTC


They go to the end of the queue, so if you look at the number of tasks waiting to be sent out, you can guess how long it will take. It's A LOT less than it was a month ago, though!!
8) Message boards : Number crunching : Why is this project using all 8 cores when another project is trying to run (Message 76964)
Posted 9 Mar 2024 by Profile mikey
Post:
This project is using all eight available cores while another project is trying to run. Why is that?


Because it's an MT app, meaning 'multi-threaded'. It has an upper limit of 16 CPU cores, but with an app_config.xml file you can change how many CPU cores each task uses. Of course, the trade-off is that each task will take longer to run, but if you are okay with that, then follow the link below or above, depending on how you have this thread sorted.
9) Message boards : Number crunching : HIgh thread count applications (Message 76961)
Posted 7 Mar 2024 by Profile mikey
Post:
hello


Welcome!! Nice bunch of computers you have there and they seem to be doing great!!
10) Questions and Answers : Windows : No new tasks (Message 76958)
Posted 6 Mar 2024 by Profile mikey
Post:
I am no longer getting tasks from Milkyway. I also run Asteroids, and those are working fine.

3/5/2024 12:02:15 PM | Milkyway@home | Sending scheduler request: To fetch work.
3/5/2024 12:02:15 PM | Milkyway@home | Requesting new tasks for CPU
3/5/2024 12:02:18 PM | Milkyway@home | Scheduler request completed: got 0 new tasks
3/5/2024 12:02:18 PM | Milkyway@home | No tasks sent
3/5/2024 12:02:18 PM | Milkyway@home | Project requested delay of 91 seconds

Any idea? It's only been a day or two, so maybe just wait?


Increase your cache size. Your other project has probably already filled up your cache, so BOINC won't get more work, as it thinks more work would not stay within your cache limits.
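The cache advice rests on how the client sizes work requests from the "Store at least X days of work" and "Store up to an additional X days of work" preferences. A simplified sketch, assuming the buffer is measured in CPU-seconds across all projects and ignoring resource share, backoffs and per-project limits that the real client also applies; the function names are hypothetical.

```python
SECONDS_PER_DAY = 86400


def should_fetch(queued_cpu_s: float, ncpus: int, min_days: float) -> bool:
    """Ask for work only when the buffer drops below the 'at least' level."""
    return queued_cpu_s < min_days * SECONDS_PER_DAY * ncpus


def cpu_shortfall_s(queued_cpu_s: float, ncpus: int,
                    min_days: float, extra_days: float) -> float:
    """CPU-seconds needed to top the buffer up to min + additional days."""
    target = (min_days + extra_days) * SECONDS_PER_DAY * ncpus
    return max(0.0, target - queued_cpu_s)
```

In this model, work already queued from another project counts toward `queued_cpu_s`, so raising the buffer sizes is what re-opens a shortfall that MilkyWay could fill.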
11) Message boards : Number crunching : Windows Downloading issues (Message 76953)
Posted 5 Mar 2024 by Profile mikey
Post:
I am having a problem on one system with failed downloads.

Here is a portion of my event log. I have reset the project, which had no effect. If someone could give me a pointer to possible solutions, it would be wonderful.

3/4/2024 9:02:46 PM | Milkyway@home | Fetching scheduler list
3/4/2024 9:02:47 PM | Milkyway@home | Master file download succeeded
3/4/2024 9:02:52 PM | Milkyway@home | Sending scheduler request: Requested by user.
3/4/2024 9:02:52 PM | Milkyway@home | Reporting 67 completed tasks
3/4/2024 9:02:52 PM | Milkyway@home | Requesting new tasks for CPU
3/4/2024 9:02:53 PM | Milkyway@home | Scheduler request completed: got 33 new tasks
3/4/2024 9:02:53 PM | Milkyway@home | Project requested delay of 91 seconds
3/4/2024 9:02:55 PM | Milkyway@home | Started download of milkyway_nbody_orbit_fitting_1.86_windows_x86_64__mt.exe
3/4/2024 9:02:57 PM | Milkyway@home | Finished download of milkyway_nbody_orbit_fitting_1.86_windows_x86_64__mt.exe (6388224 bytes)
3/4/2024 9:02:57 PM | Milkyway@home | md5_file failed for projects/milkyway.cs.rpi.edu_milkyway/milkyway_nbody_orbit_fitting_1.86_windows_x86_64__mt.exe: fopen() failed
3/4/2024 9:02:57 PM | Milkyway@home | [error] Checksum or signature error for milkyway_nbody_orbit_fitting_1.86_windows_x86_64__mt.exe


Two thoughts: first, can you copy it over from another PC? Second, try turning off your antivirus and try the transfers again; after it's done, turn the antivirus back on.
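The log line `md5_file failed ...: fopen() failed` says the client could not even open the freshly downloaded file, which is a different failure from a checksum mismatch; a file locked or quarantined right after download is one plausible cause, hence the antivirus suggestion. A minimal sketch of the two cases using Python's standard `hashlib`; the function name and return values are illustrative, not BOINC's.

```python
import hashlib


def verify_md5(path: str, expected_hex: str, chunk_size: int = 1 << 16) -> str:
    """Return 'ok', 'mismatch', or 'open_failed' for a downloaded file."""
    h = hashlib.md5()
    try:
        with open(path, "rb") as f:
            # Hash in chunks so large executables need not fit in memory.
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
    except OSError:
        # Analogous to "fopen() failed": the file could not be read at all.
        return "open_failed"
    return "ok" if h.hexdigest() == expected_hex.lower() else "mismatch"
```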
12) Message boards : News : Admin Updates Discussion (Message 76949)
Posted 1 Mar 2024 by Profile mikey
Post:
I got the first of the de_nbody orbit_fitting tasks today. It seems like they will not follow the app_config.xml. I have configured one of my "48 CPU" computers to run 4 tasks at a time using 12 CPUs each. All of the "old" nbody tasks obey this config file. But the 10 orbit_fitting tasks I got today are all listed as "Ready to start (16 CPUs)" (none have run yet). Background: I have two identical computers. One has no app_config.xml file (it runs three tasks at a time using 16 CPUs). The other has an app_config.xml file to run 4 tasks at a time using 12 CPUs. This has always worked. Even the plain ole nbody tasks I got AFTER the orbit_fitting tasks show "Ready to start (12 CPUs)". Is this by design?


Mine looks like this now and works for me:

<app_config>


<app_version>
<app_name>milkyway_nbody</app_name>
<plan_class>mt</plan_class>
<avg_ncpus>2</avg_ncpus>
<cmdline>--nthreads 2</cmdline>
</app_version>

<app_version>
<app_name>milkyway_nbody_orbit_fitting</app_name>
<plan_class>mt</plan_class>
<avg_ncpus>2</avg_ncpus>
<cmdline>--nthreads 2</cmdline>
</app_version>

<project_max_concurrent>1</project_max_concurrent>

</app_config>

You can see from mine that you have to add a new section with the new app name in it.
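Since every application name needs its own `<app_version>` section, generating the file is one way to avoid missing one. A minimal sketch using Python's standard `xml.etree.ElementTree` (`ET.indent` needs Python 3.9+); it reproduces the structure of the file above for any thread count, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_names, nthreads, max_concurrent=None):
    """Build app_config.xml text with one <app_version> block per app name."""
    root = ET.Element("app_config")
    for name in app_names:
        av = ET.SubElement(root, "app_version")
        ET.SubElement(av, "app_name").text = name
        ET.SubElement(av, "plan_class").text = "mt"
        # Tell the client how many CPUs to budget, and the app how many to use.
        ET.SubElement(av, "avg_ncpus").text = str(nthreads)
        ET.SubElement(av, "cmdline").text = f"--nthreads {nthreads}"
    if max_concurrent is not None:
        ET.SubElement(root, "project_max_concurrent").text = str(max_concurrent)
    ET.indent(root)  # pretty-print in place (Python 3.9+)
    return ET.tostring(root, encoding="unicode")
```

For example, `build_app_config(["milkyway_nbody", "milkyway_nbody_orbit_fitting"], 12, max_concurrent=4)` gives the 4 x 12 CPU layout from the question for both applications.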

I run mine with 2 CPU cores each; they just take longer to run, but they run just fine so far. I'm waiting for my wingmen to know for sure, of course. I am also only running 1 task at a time on my laptop; my desktops will have different settings based on the capability of each one.
13) Questions and Answers : Windows : BOINC NOT DOWNLOADING WU'S even over night for months now ??? (Message 76932)
Posted 18 Feb 2024 by Profile mikey
Post:
BOINC NOT DOWNLOADING WU'S even over night for months now ??? ive tried reinstalling same .....and yes have latest install


What kind of tasks are you trying to get, CPU or GPU tasks? The GPU tasks were removed from MilkyWay, and we can only get CPU tasks now. Are you using an account manager like BAM, Science United, etc.?
14) Message boards : News : Admin Updates Discussion (Message 76921)
Posted 13 Feb 2024 by Profile mikey
Post:
One valid here and I got now one _1 on my computer. The ready to send buffer also dropped by about 3k. That means we made it through that huge pile of _0s and now we need to make it through the same huge pile of _1s. :D


I have some _2 and _3 tasks on my PC, so we ARE getting closer to normal day-to-day stuff again.
15) Message boards : Number crunching : HIgh thread count applications (Message 76912)
Posted 11 Feb 2024 by Profile mikey
Post:
In another forum I read an article about applications with high thread counts not being as efficient. For example, a 16-thread application will not finish twice as fast as an 8-thread application. This got me thinking about how I might improve my throughput by running multiple applications at a lower thread count, AND whether I can increase CPU utilization on some systems. One of my computers is a XEON E5 2678 with 24 "CPUs". Milkyway uses 16 by default, but with a config file I can run two applications at a time with 12 CPUs apiece. Seems like a "no brainer". But how much did I gain? To "prove" that, I would need 2 test files that had equal run times, first run sequentially, then run concurrently. Has anyone here ever seen any data like this? Pointers to other articles?


Why not run some regular tasks one way and then some the other way, say for about 24 hours each, then take the average times of each and see which is better? The problem with choosing just one task is that it could be a one-off and not representative of real-life tasks. If you do want to run just one task, you can run it outside of BOINC, but I don't know how to do that.
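Comparing the two setups comes down to throughput rather than per-task runtime: on the 24-CPU host, two concurrent 12-thread tasks only win if their average runtime is less than twice the average 16-thread runtime. A minimal sketch of that arithmetic; the runtime lists would come from each roughly 24-hour trial.

```python
from statistics import mean


def tasks_per_day(runtimes_hours, concurrent: int) -> float:
    """Average daily throughput when `concurrent` tasks run side by side."""
    return concurrent * 24.0 / mean(runtimes_hours)
```

Call it once with the 16-thread runtimes and `concurrent=1`, once with the 12-thread runtimes and `concurrent=2`, and keep whichever setup completes more tasks per day.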
16) Message boards : News : Admin Updates Discussion (Message 76904)
Posted 10 Feb 2024 by Profile mikey
Post:
Because no one is leading the project. There is no IT department that deals with the project's server. It is not clear which hardware is used in the project's server. The computers you currently support for the project have much more advanced and high-tech hardware than the servers of this project. And now this project has started to lose its seriousness. Look, they haven't been able to solve a database problem for 2 weeks. Personally, if this problem is not solved within 1 week, I will withdraw my support from the project and turn to the universe@home project.


Universe's main scientist died recently, and while they have a new one, they are taking a break from sending out tasks for up to 3 months while they do things the way the new guy wants them done. But there's always Cosmology, as long as you already have an account there, and Asteroids.
17) Message boards : News : Admin Updates Discussion (Message 76900)
Posted 9 Feb 2024 by Profile mikey
Post:
Are completed N-Body tasks ever going to be validated? I now have over 100 completed tasks in my queue marked validation inconclusive. Should I quit crunching for this project?


Yes, they will be validated, and no, you shouldn't quit, because that would make them take even longer to validate. The problem is that the server made a whole bunch of extra main tasks, and when it makes a wingman task, that task goes to the end of the queue, so we are plowing through all the main tasks before we start on the wingman tasks.

BTW I have 702 tasks waiting for a wingman.
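The ordering described above is plain first-in, first-out: a wingman (_1) copy created after a big backlog of _0 tasks waits behind all of them. A toy sketch with Python's `collections.deque`, assuming a single FIFO queue and assuming, for simplicity, that each _0 task's _1 copy is queued as soon as the _0 goes out (the real BOINC feeder and scheduler are more involved). The task names and the wait estimate are illustrative.

```python
from collections import deque


def dispatch_order(n_workunits: int) -> list:
    """Order in which tasks leave a FIFO queue seeded with a backlog of _0s."""
    queue = deque(f"wu{i}_0" for i in range(n_workunits))
    sent = []
    while queue:
        task = queue.popleft()
        sent.append(task)
        if task.endswith("_0"):
            # The wingman copy joins the back, behind every remaining _0 task.
            queue.append(task[:-2] + "_1")
    return sent


def estimated_wait_hours(tasks_ahead: int, sent_per_hour: float) -> float:
    """Rough wait before a queued task is sent, at a steady dispatch rate."""
    return tasks_ahead / sent_per_hour
```

With a backlog of N _0 tasks, the first _1 copy goes out only after all N have been sent, which is why validation lags until the backlog is cleared.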
18) Questions and Answers : Web site : Server Status Page (Message 76888)
Posted 7 Feb 2024 by Profile mikey
Post:
The Server Status page does not reflect correct numbers in the Work Status portion at the upper right when compared to the Tasks by Application at the lower left. Tasks Unsent vs Tasks Ready to Send

Bill F

The standard Server Status page caches the information used to display the Tasks by Application section to reduce the amount of database activity needed; it won't refresh for about an hour, after which it refreshes the next time someone accesses the page!

I think it used to refresh the Work Status part of the page separately, but the PHP I found on GitHub seems to cache that as well, so I'm [now] at a loss to explain the discrepancy...

The Work Status part of the page only does simple counts against the results table with the various result status codes, which is a lot easier so not cached!

Cheers - Al.

[Edited after a re-check on the recent PHP sources...]


One kink in that is that you have to look at the server version MW is using; I think they are on an older version due to all the tweaks they have to make every time a new version comes out.
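The refresh behaviour described in the quoted post (cached for about an hour, then rebuilt lazily on the next page view) is a time-to-live cache. A minimal sketch, assuming a one-hour TTL; the class name is hypothetical and this is not the BOINC PHP code.

```python
import time
from typing import Any, Callable, Optional


class LazyTTLCache:
    """Caches one expensive value and recomputes it only when read after expiry."""

    def __init__(self, compute: Callable[[], Any], ttl_s: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic):
        self._compute = compute
        self._ttl_s = ttl_s
        self._clock = clock
        self._value: Any = None
        self._stamp: Optional[float] = None

    def get(self) -> Any:
        now = self._clock()
        # Nothing runs on a timer: a stale value is rebuilt only when someone reads it.
        if self._stamp is None or now - self._stamp >= self._ttl_s:
            self._value = self._compute()
            self._stamp = now
        return self._value
```

If one section of a page goes through such a cache and another is queried live (or cached separately), the two can disagree for up to one TTL, which would fit the discrepancy in the original question.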
19) Message boards : Number crunching : Project communication failed: attempting access to reference site (Message 76886)
Posted 7 Feb 2024 by Profile mikey
Post:
Since 1/28/2024 I keep getting the above message and "Scheduler request to url failed: Couldn't resolve host name". My average work units keep falling.


I used to get that at a lot of projects, but if I kept trying, it would finally say 'ah, I know who you are' and let things happen.

What do you mean by your 'average workunits keep failing'? Are they going as 'inconclusive', are they going as 'invalid' or what?
20) Message boards : News : Admin Updates Discussion (Message 76879)
Posted 6 Feb 2024 by Profile mikey
Post:
On February 5, Kevin Roux wrote (message 76876 in thread "Admin Updates"):
Working on
- giving tasks needed for validation priority so credit can be given out faster
Just a word of caution [although I do not have detailed knowledge of BOINC server features and how you plan to use them]:
In a few(?) projects, the BOINC server is configured such that "resends" (additional replica after aborts, invalids etc.) are assigned to hosts which recently returned valid results within a certain turnaround time. I have once witnessed this feature creating a deadlock of work distribution at QuChemPedIA: First there was a wave of troublesome workunits which gave a lot of invalid results. (Their input parameters didn't lead to physically sensible model configs.) That way, eventually all of the active hosts dropped out of the aforementioned category of prioritized hosts. The server got to a point at which it didn't assign any new work any more at all. This deadlock was resolved when the admin figured out the cause and where in the server configuration to remove or relax the host discrimination for replica task assignments.

In other words, now that there are practically no hosts with recent valid results any more, watch out that the server nevertheless will assign _1 tasks to such seemingly untrustworthy hosts. (Though I guess we are still perhaps two weeks or so away from the point when we are through with the current stash of _0 tasks.)


That was initially designed at SETI so that units waiting for a 3rd or 4th valid result would get it back more quickly than by waiting through the queue; IOW, it got the tasks off the server and into storage quicker because they would no longer be waiting for a matching valid result. In the end they too turned it off, because the 'faster' hosts (they initially tried to pick hosts that were returning tasks within 24 hours) were just PCs, and like all PCs they had the occasional problem, so tasks weren't really coming back any sooner.
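The deadlock in the quoted post follows directly from filtering replica assignments by recent host reliability. A toy sketch, assuming a simple rule of "send resends only to hosts with a recent valid result and a turnaround within 24 hours"; the field names, the threshold and the `fallback` flag are hypothetical, not actual BOINC server configuration options.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Host:
    name: str
    recent_valid: int        # valid results in the recent window
    avg_turnaround_h: float  # average turnaround in hours


def pick_host_for_resend(hosts: List[Host], max_turnaround_h: float = 24.0,
                         fallback: bool = False) -> Optional[Host]:
    """Return a host for a resend, or None if no host qualifies."""
    reliable = [h for h in hosts
                if h.recent_valid > 0 and h.avg_turnaround_h <= max_turnaround_h]
    if reliable:
        # Prefer the fastest of the "reliable" hosts.
        return min(reliable, key=lambda h: h.avg_turnaround_h)
    if fallback:
        # Relaxed rule: any host may take the resend.
        return min(hosts, key=lambda h: h.avg_turnaround_h, default=None)
    # No host qualifies and no fallback: resends are never assigned.
    return None
```

Once a wave of bad workunits leaves every host with `recent_valid == 0`, the strict rule returns `None` for every resend; relaxing the rule (here, `fallback=True`) is what breaks the deadlock.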



©2024 Astroinformatics Group