Present burn rate, and future possibilities...

HRFMguy

Send message
Joined: 12 Nov 21
Posts: 236
Credit: 575,038,236
RAC: 5,987
Message 72500 - Posted: 5 Apr 2022, 1:51:34 UTC

Recently I've been thinking (no, really!) about the server issues, the lack of work units, etc., and was wondering why there is such a huge backlog of unsent work. The numbers seem high, until you consider the burn rate of over 5k clients. This chart might explain, in part, why we were running out of Separation work. Could it be that the generator just couldn't keep up with demand? My original thought, like that of some others here, was to turn the generators off and let the system flush itself. The chart seems to indicate that the flush would not have taken long at all.

[Chart: unsent-work backlog vs. client burn rate, with users-in-24-hours and run-time columns]
Also, there is a flaw in the math here: the users-in-24-hours numbers probably don't reflect 100% utilization for all users, so the time to run out would be stretched accordingly. Does anyone know if these are average user numbers, as the run-time column is? A rough worked version of the arithmetic is below.
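
To make the arithmetic concrete, here it is as a tiny Python snippet. Every input is a placeholder, not a real server figure; plug in the actual numbers from the chart or the status page:

```python
# Back-of-envelope: how long would the unsent queue last at full burn?
# All inputs below are placeholders, not real MilkyWay@home figures.
unsent_wus = 18_000_000        # e.g. "results ready to send"
active_hosts = 5_000           # clients currently pulling work
wus_per_host_per_day = 400     # varies wildly with hardware and app

burn_rate = active_hosts * wus_per_host_per_day   # WUs consumed per day
days_to_flush = unsent_wus / burn_rate
print(f"~{days_to_flush:.1f} days to drain the queue at 100% utilization")
# With these placeholders: ~9.0 days; lower utilization stretches it out.
```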

In view of Prof. Heidi's very cool vision of 7 to 10 galaxies in one mambo-king-sized simulation, burn rate becomes an issue if you want to finish in a reasonable time frame! We need more crunchers! We will also need to dramatically scale up the MW@H server fleet with all-new faster, leaner, meaner hardware. Heck, it may even be a good idea to drop BOINC and roll our own platform. After all, BOINC is 20 years old or so and kinda long in the tooth, needing a refresh. In the Find-A-Drug era, we did not use BOINC, and that project ran just fine.

Also #2: it would be nice if someone could check/peer-review my math and concepts.

All this, sez a guy with no $$$, and just a slug, a potato, and a nice Ryzen 9 3900X.
ID: 72500
Kiska

Send message
Joined: 31 Mar 12
Posts: 94
Credit: 151,956,434
RAC: 2,768
Message 72503 - Posted: 5 Apr 2022, 2:45:07 UTC

Seems about right. If you draw a few lines:
[Chart: Grafana dashboard graphs with trend lines drawn]
You can access the dashboard here: https://grafana.kiska.pw/d/boinc/boinc?orgId=1&var-project=milkyway%40home&from=now-168h&to=now&chunkNotFound=&refresh=1m

I have added a few temp panels for my own use. You have viewer + temp-edit permission, so you can't save your changes, but anyhow, have fun with the data.
ID: 72503
HRFMguy

Send message
Joined: 12 Nov 21
Posts: 236
Credit: 575,038,236
RAC: 5,987
Message 72504 - Posted: 5 Apr 2022, 2:54:57 UTC - in response to Message 72503.  

Kiska, this is excellent. I have seen your charts for the last week or so and really enjoy them. Is this your own work? Can I set something like this up locally for my 3 computers?
ID: 72504
Kiska

Send message
Joined: 31 Mar 12
Posts: 94
Credit: 151,956,434
RAC: 2,768
Message 72505 - Posted: 5 Apr 2022, 3:42:34 UTC - in response to Message 72504.  

Kiska, this is excellent. I have seen your charts for the last week or so and really enjoy them. Is this your own work? Can I set something like this up locally for my 3 computers?


What do you mean by your computers?

I get the server_status page from each project and put it into a time-series database, which Grafana queries to generate the graphs for you to play with.

You can replicate this by running InfluxDB, Grafana, and some solution to grab the pages (I have written some Python to do this).
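
Roughly, the poller looks like this. This is a sketch rather than my exact script: it assumes InfluxDB 2.x with the influxdb-client Python package, BOINC's XML status page (server_status.php?xml=1), and placeholder credentials:

```python
# Poll a BOINC project's server_status page and push selected counters
# into InfluxDB 2.x. Field names follow BOINC's XML status page; the
# URL, token, org, and bucket are placeholders to adjust.
import time
import requests
import xml.etree.ElementTree as ET
from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

STATUS_URL = "https://milkyway.cs.rpi.edu/milkyway/server_status.php?xml=1"
COUNTERS = ["results_ready_to_send", "results_in_progress"]

client = InfluxDBClient(url="http://localhost:8086",
                        token="my-token", org="my-org")
write_api = client.write_api(write_options=SYNCHRONOUS)

while True:
    root = ET.fromstring(requests.get(STATUS_URL, timeout=30).text)
    point = Point("server_status").tag("project", "milkyway@home")
    for name in COUNTERS:
        node = root.find(f".//{name}")  # counters sit under database_file_states
        if node is not None and node.text:
            point = point.field(name, int(node.text))
    write_api.write(bucket="boinc", record=point)
    time.sleep(300)  # one sample every few minutes is plenty
```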
ID: 72505
HRFMguy

Send message
Joined: 12 Nov 21
Posts: 236
Credit: 575,038,236
RAC: 5,987
Message 72506 - Posted: 5 Apr 2022, 3:47:04 UTC - in response to Message 72505.  
Last modified: 5 Apr 2022, 3:47:54 UTC

My computers at home. Sounds like it's not a local thing.
ID: 72506
Kiska

Send message
Joined: 31 Mar 12
Posts: 94
Credit: 151,956,434
RAC: 2,768
Message 72507 - Posted: 5 Apr 2022, 4:01:24 UTC - in response to Message 72506.  

My computers at home. Sounds like it's not a local thing.


You can definitely set this up locally if you wish. All you need is InfluxDB, Grafana, and something to poll the server_status page once in a while.
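
Once data is flowing, a quick sanity check from Python (names and credentials are the same placeholders as in the poller sketch above):

```python
# Verify that points are landing in InfluxDB by reading the last hour.
from influxdb_client import InfluxDBClient

client = InfluxDBClient(url="http://localhost:8086",
                        token="my-token", org="my-org")
flux = '''
from(bucket: "boinc")
  |> range(start: -1h)
  |> filter(fn: (r) => r._measurement == "server_status")
  |> filter(fn: (r) => r._field == "results_ready_to_send")
'''
for table in client.query_api().query(flux):
    for record in table.records:
        print(record.get_time(), record.get_value())
```

Point Grafana at the same bucket and the panels build themselves from there.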
ID: 72507
Keith Myers

Send message
Joined: 24 Jan 11
Posts: 708
Credit: 543,108,982
RAC: 143,828
Message 72520 - Posted: 5 Apr 2022, 18:01:40 UTC - in response to Message 72500.  

The chart is interesting. Also, maybe a comment on why people get no work sent on request even when the available work numbers in the millions.
As we learned at Seti, when you have thousands of hosts pinging a project every second looking for work, you aren't really pulling from that pool of 18M work units.
You are in fact pulling from a much smaller 100-WU buffer pool for scheduler connections. If a single host contacts the buffer and empties it, the next host contacting the project will get no work until the buffer refills.
You can't pull directly from the entire cache of work units, because that would need a database access across all 18M tasks and would be much too slow to service each work request from every host.
All that can be done is either to increase the fill rate on the buffer or to increase the buffer size a bit more so that it doesn't get emptied as fast.
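
A toy simulation makes the effect visible. Every number below is invented and the real feeder logic is more involved, but the shape of the result is the point:

```python
# Toy model of a small scheduler buffer refilled at a fixed rate while
# many hosts request work. All numbers are invented for illustration.
import random

BUFFER_SIZE = 100       # shared-memory buffer the scheduler draws from
REFILL_PER_SEC = 50     # WUs the feeder moves from the database per second
REQUESTS_PER_SEC = 30   # scheduler contacts arriving per second
MAX_BATCH = 80          # largest request a single host might make

buffer_level, empty_hits, total = BUFFER_SIZE, 0, 0
for _ in range(3600):   # one simulated hour in one-second steps
    buffer_level = min(BUFFER_SIZE, buffer_level + REFILL_PER_SEC)
    for _ in range(REQUESTS_PER_SEC):
        total += 1
        want = random.randint(1, MAX_BATCH)
        if buffer_level == 0:
            empty_hits += 1  # the "no work sent" reply
        else:
            buffer_level -= min(want, buffer_level)

print(f"{empty_hits}/{total} requests ({100 * empty_hits / total:.0f}%) "
      f"got no work, despite millions of WUs sitting in the database")
```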
ID: 72520
HRFMguy

Send message
Joined: 12 Nov 21
Posts: 236
Credit: 575,038,236
RAC: 5,987
Message 72527 - Posted: 6 Apr 2022, 0:09:51 UTC - in response to Message 72520.  

Thanks, Keith.
ID: 72527
Tom Donlon
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Send message
Joined: 10 Apr 19
Posts: 408
Credit: 120,203,200
RAC: 0
Message 72789 - Posted: 14 Apr 2022, 17:46:04 UTC

The Nbody WU generators have been off for some time now. There is just an enormous backlog of Nbody WUs to send out, and with few users running Nbody compared to Separation, we end up with the current issue.

Ideally their WU pools wouldn't be shared, but I think the big backlog of Nbody WUs slows down Separation generation and buffer refills significantly.
ID: 72789


©2024 Astroinformatics Group