Nbody WU Flush
Joined: 24 Jan 11 · Posts: 712 · Credit: 553,573,147 · RAC: 58,100
Thanks for explaining. I'm assuming it's a project admin configuration menu or application, not something a volunteer can see or access.
Joined: 10 Apr 19 · Posts: 408 · Credit: 120,203,200 · RAC: 0
It's an admin-only project menu provided by BOINC. And yes, they should be flagged as "not needed", not as an error or as "validation inconclusive". I only cancelled WUs, I did not delete any. Additionally, I only cancelled WUs that had not yet been sent out to any users, so ideally you should never see a cancelled WU in your client.
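For the curious, here's roughly what that cancel boils down to at the database level. This is a sketch only, using the standard BOINC schema and its documented codes (result server_state 2 = unsent, 5 = over; outcome 5 = "didn't need"; workunit error_mask bit 16 = cancelled); the ID range is a made-up example:

    # Sketch: cancel unsent results for a range of workunits (IDs hypothetical).
    mysql milkyway <<'SQL'
    UPDATE result SET server_state = 5, outcome = 5
      WHERE server_state = 2 AND workunitid BETWEEN 1000000 AND 1005000;
    UPDATE workunit SET error_mask = error_mask | 16
      WHERE id BETWEEN 1000000 AND 1005000;
    SQL

Because only unsent results are touched, anything already in progress on a client keeps running as normal, and volunteers would at worst see an outcome of "Didn't need".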
Joined: 12 Nov 21 · Posts: 236 · Credit: 575,038,236 · RAC: 0
> It's an admin-only project menu provided by BOINC. And yes, they should be flagged as "not needed", not as an error or as "validation inconclusive". I only cancelled WUs, I did not delete any. Additionally, I only cancelled WUs that had not yet been sent out to any users, so ideally you should never see a cancelled WU in your client.

We as clients have the option to suspend a task ("Task suspended by user"). Do you have an admin option for "work unit suspended by admin"? That way they won't be deleted or cancelled, just held in limbo until an admin changes it back. No need to regenerate them or anything. Easy peasy.
Joined: 8 May 09 · Posts: 3339 · Credit: 524,010,781 · RAC: 1
> It's an admin-only project menu provided by BOINC. And yes, they should be flagged as "not needed", not as an error or as "validation inconclusive". I only cancelled WUs, I did not delete any. Additionally, I only cancelled WUs that had not yet been sent out to any users, so ideally you should never see a cancelled WU in your client.

On the client side, if you suspend a workunit or workunits, you won't get any new tasks from that project until you resume or abort them. I have no idea whether that works the same way on the server side, but it would need a close eye on things to make sure 5k workunits don't get suspended and the server then stops making tasks altogether because it thinks you already have tasks.
Joined: 28 Feb 22 · Posts: 16 · Credit: 2,400,538 · RAC: 0
Tom, I assume you selected the "cancel only jobs with no instance in progress" option? Workunits would be marked as error or invalid if you hadn't selected this (depending on whether the user had completed a task or not). I assume you have a way to get the work generator to recreate these cancelled workunits, or does it not matter if some N-Body workunits are skipped?

The flush only reduced my 30-day backlog of validation-inconclusive N-Body workunits for about 12 hours. For over a day since, all my in-progress tasks have been brand-new workunits. Even if suspending new workunits isn't possible, maybe there's a way to increase the priority of older workunits? Or just turn off the N-Body work generator until the 30-day backlog is cleared? During the 12 hours the backlog was shrinking, I put a couple more computers on N-Body, but I have since removed them because they just seemed to be adding to the backlog. I still have one computer doing N-Body because a firewall blocks it from doing most other projects (and it lets me keep tabs on N-Body), and putting CPU time towards Separation seems to be a waste.

By the way, setting up a *local* BOINC server on Linux (Ubuntu) is relatively easy and lets me play around with BOINC. Via your home router, give your computer (I used an old laptop set to dual-boot Ubuntu and Windows) a static local IP address such as 192.168.0.201, then follow the instructions at https://boinc.berkeley.edu/trac/wiki/ServerIntro (I ignored the Docker and boinc-server-maker parts). The trick is, when running make_project, to add the option --url_base http://192.168.0.201 (or whatever your static local IP is). If you follow the instructions, a test application is installed, which you can modify and/or play around with!
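Condensed, the steps look something like this. It's a sketch only, assuming you build from the BOINC git sources as that wiki page describes; the project name "testproject" is just an example, and the MySQL/Apache/PHP prerequisites are covered on the page itself:

    # Build the BOINC server code, then create a project served from the
    # laptop's static LAN address instead of a public hostname.
    git clone https://github.com/BOINC/boinc.git
    cd boinc
    ./_autosetup
    ./configure --disable-client --disable-manager
    make
    cd tools
    ./make_project --url_base http://192.168.0.201 --test_app testproject

The --url_base override is the whole trick: every URL the project hands out then points at the LAN address, so any machine on your home network can attach to it.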
Joined: 15 Jun 13 · Posts: 15 · Credit: 2,070,897,222 · RAC: 0
I added a spare PC to nbody for the first time, to help out. It's only two Sandy Bridge cores, but it seems to get through a work unit every 16 minutes :)
Joined: 15 Jun 13 · Posts: 15 · Credit: 2,070,897,222 · RAC: 0
Update: I hadn't been getting any Separation tasks all day, so I opted to also try nbody on my main machine. A few people have had issues getting Separation CPU tasks they didn't want, but I was able to fetch 100 nbody tasks by themselves, for 16 CPUs. So far so good... but wouldn't you know it? I got more Separation tasks several minutes later, including a bunch of CPU ones. >.< I aborted those, and it's doing GPU work at the moment, but if it keeps trying to get Separation CPU work I'll just have to turn that off. Oh well... (That particular CPU has been doing distributed.net OGR for a while, but either way, it's going back to WCG once they come back online.)

EDIT: Looks like the Separation tasks were a one-time thing, for CPU and GPU. It's steadily getting nbody only, at least for now. I'll keep an eye on it... I get the feeling we'll just have to keep chipping away at the nbody backlog, slowly but surely, until it's under control and the server can handle things again.
Joined: 5 Jul 11 · Posts: 990 · Credit: 376,143,149 · RAC: 0
The status page still shows 13 million Nbodys. I shoved 90 CPU cores and six 280X GPUs (1 teraflop DP each) onto MW, and everything filled up. I could stare at the BoincTasks screen all day, MW stuff moves so quickly. I'm trying to do some Folding@Home too, but it seems those cards can do both at once and share nicely at 50% each. When I get the R9 Nanos they can do the Folding and leave the 280Xs for this project. (Nanos don't do DP well, but they're much faster at SP.)
Joined: 5 Jul 11 · Posts: 990 · Credit: 376,143,149 · RAC: 0
TOM: I don't know if you failed to clear the Nbodys or the server status is wrong, but they're still there. We'll work through them eventually, and things seem to be flowing OK. It should sort itself out if you just leave it. But... your suggestion of increasing how many Separations I can get at once would be nice!
Joined: 5 Jul 11 · Posts: 990 · Credit: 376,143,149 · RAC: 0
Now I get "no Nbody available". Never seen that message before.
Joined: 8 Nov 11 · Posts: 205 · Credit: 2,900,464 · RAC: 0
In amongst other projects I have managed to do nearly 100 Nbody WUs. They are unique in that every one so far has been VALIDATED. In trying to establish why, I note the following: I am the "wingman" task in every instance (i.e. the 2nd), and in every case they were sent out today, which is exactly 30 days after the first task was returned. I might be wrong, or maybe everyone knows this already, but it looks like an automatic 30-day delay in sending out the second task to me. Apologies if I am talking rubbish.
Joined: 5 Jul 11 · Posts: 990 · Credit: 376,143,149 · RAC: 0
> In amongst other projects I have managed to do nearly 100 Nbody WUs.

I confirm: mine are also 30 days after the primary task was done. Both validated OK. You do get 12 days to do them; perhaps they wait to see if they self-validate first, and then the wingman one gets shoved in a queue? They do say quorum minimum 1, as though they don't always have to have a wingman.
Joined: 5 Jul 11 · Posts: 990 · Credit: 376,143,149 · RAC: 0
I am also confused by this. My Separation GPU WUs have: minimum quorum 2, initial replication 3. Why send out 3 if only 2 are needed? This seems like a waste of processing time. If two of us have agreed on the result, why get a third GPU to run it?
Joined: 8 Nov 11 · Posts: 205 · Credit: 2,900,464 · RAC: 0
Surely they could make the delay 14 days rather than 30? That would still allow the 12 days to get it done.
Joined: 12 Nov 21 · Posts: 236 · Credit: 575,038,236 · RAC: 0
OK, the Nbody backlog is out of control. I think I will abort all Separation _0 CPU work for a week or so. This will free up 36 CPU threads for the Nbody WU flush. Any wingman Separation tasks that are _1 and above will be processed normally. All GPU Separations will also be processed normally. Separation retests will also go through normally. I'm starting to get worked up into campaign mode here. Thoughts?
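If anyone wants to do that _0 abort in bulk rather than by hand, something along these lines should work. It's a rough sketch, assuming boinccmd is on the path and that its --get_tasks output lists task names on "name:" lines; note it doesn't distinguish CPU from GPU Separation tasks, so eyeball the list before piping it into abort:

    # Sketch: abort every attached Separation task whose name ends in _0.
    URL=http://milkyway.cs.rpi.edu/milkyway/
    boinccmd --get_tasks | awk '/^ *name: /{print $2}' \
      | grep -i separation | grep '_0$' \
      | while read -r task; do
          boinccmd --task "$URL" "$task" abort
        done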
Joined: 5 Jul 11 · Posts: 990 · Credit: 376,143,149 · RAC: 0
> Surely they could make the delay 14 days rather than 30? That would still allow the 12 days to get it done.

Maybe they do, but after the 12 days it gets stuck on the end of the queue, which is huge? Normally it's 1000 tasks, so it wouldn't be much longer than the time for the first task to complete.
Joined: 5 Jul 11 · Posts: 990 · Credit: 376,143,149 · RAC: 0
> OK, the Nbody backlog is out of control. I think I will abort all Separation _0 CPU work for a week or so. This will free up 36 CPU threads for the Nbody WU flush. Any wingman Separation tasks that are _1 and above will be processed normally. All GPU Separations will also be processed normally. Separation retests will also go through normally.

I just got told by the server there were no Nbodys left, so perhaps Tom cleared it? I'm also getting loads of GPU Separation every time I ask, so I think the server is OK now and you can run what you like. I try not to do Separation on the CPUs anyway, as the GPUs are many, many times faster and it seems a waste. It's a pity the server options don't let me choose that completely.
Joined: 8 Nov 11 · Posts: 205 · Credit: 2,900,464 · RAC: 0
I had to hit the enter button loads of times to get Nbody work; my PC sat for some while getting nothing. Obviously a significant contributor to the backlog is the 30-day limit. In theory we could get no WUs because of the date, regardless of the number waiting (is that already happening?). My backlog is minuscule compared to others, I guess, but it will be at least 30 days before the processing done in the last few days gets resent. I think the 30-day parameter has to be looked at, otherwise the backlog is never going to be shifted, as people will get fed up, particularly since the backlog is not going down very much even though there has been a big increase in users.
Joined: 5 Jul 11 · Posts: 990 · Credit: 376,143,149 · RAC: 0
> I had to hit the enter button loads of times to get Nbody work; my PC sat for some while getting nothing. Obviously a significant contributor to the backlog is the 30-day limit.

I can't believe all those 13 million on the server status page are wingman jobs. They all appeared at once when the disk failure occurred, so I assume most are fresh jobs. But for some reason I can't access them now.
Joined: 10 Apr 19 · Posts: 408 · Credit: 120,203,200 · RAC: 0
I am going to try to manually flush the Nbody WUs without using the BOINC ops page. The server will be down for a little bit while this runs.
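For the curious, a manual flush from the command line would look roughly like this. It's a sketch only (not necessarily what is actually being run), assuming the stock cancel_jobs utility that a BOINC project carries in its bin/ directory, with an illustrative ID range; the SQL equivalent is the sketch earlier in the thread:

    # Hypothetical sketch: cancel a range of queued workunits without the ops page.
    cd ~/projects/milkyway
    bin/cancel_jobs --min_id 1000000 --max_id 2000000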