Nbody WU Flush
Joined: 24 Jan 11 · Posts: 712 · Credit: 553,573,147 · RAC: 58,100
Thanks for explaining. I'm assuming it's a project admin configuration menu or application, not something a volunteer can see or access.
Joined: 10 Apr 19 · Posts: 408 · Credit: 120,203,200 · RAC: 0
It's an admin-only project menu provided by BOINC. And yes, they should be flagged as "not needed", not as an error or as "validation inconclusive". I only cancelled WUs, I did not delete any. Additionally, I only cancelled WUs that had not yet been sent out to any users, so ideally you should never see a cancelled WU in your client.
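For the curious, here's roughly what that cancel boils down to at the database level. This is a sketch only, using the standard BOINC schema and its documented codes (result server_state 2 = unsent, 5 = over; outcome 5 = "didn't need"; workunit error_mask bit 16 = cancelled); the ID range is a made-up example:

    # Sketch: cancel unsent results for a range of workunits (IDs hypothetical).
    mysql milkyway <<'SQL'
    UPDATE result SET server_state = 5, outcome = 5
      WHERE server_state = 2 AND workunitid BETWEEN 1000000 AND 1005000;
    UPDATE workunit SET error_mask = error_mask | 16
      WHERE id BETWEEN 1000000 AND 1005000;
    SQL

Because only unsent results are touched, anything already in progress on a client keeps running as normal, and volunteers would at worst see an outcome of "Didn't need".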
Joined: 12 Nov 21 · Posts: 236 · Credit: 575,038,236 · RAC: 0
> It's an admin-only project menu provided by BOINC. And yes, they should be flagged as "not needed", not as an error or as "validation inconclusive". I only cancelled WUs, I did not delete any. Additionally, I only cancelled WUs that had not yet been sent out to any users, so ideally you should never see a cancelled WU in your client.

We as clients have the option to suspend a task ("Task suspended by user"). Do you have an admin option for "work unit suspended by admin"? That way they won't be deleted or cancelled, just held in limbo until an admin changes it back. No need to regenerate them or anything. Easy peasy.
Joined: 8 May 09 · Posts: 3339 · Credit: 524,010,781 · RAC: 1
> It's an admin-only project menu provided by BOINC. And yes, they should be flagged as "not needed", not as an error or as "validation inconclusive". I only cancelled WUs, I did not delete any. Additionally, I only cancelled WUs that had not yet been sent out to any users, so ideally you should never see a cancelled WU in your client.

On the client side, if you suspend a workunit or workunits, you won't get any new tasks from that project until you resume or abort them. I have no idea whether that works the same way on the server side, but it would need a close eye on things to make sure 5k workunits don't get suspended and the server then stops making tasks altogether because it thinks you already have tasks.
Joined: 28 Feb 22 · Posts: 16 · Credit: 2,400,538 · RAC: 0
Tom, I assume you selected the "cancel only jobs with no instance in progress" option? Workunits would be marked as error or invalid if you hadn't selected this (depending on whether the user had completed a task or not). I assume you have a way to get the work generator to recreate these cancelled workunits, or does it not matter if some N-Body workunits are skipped?

The flush only reduced my 30-day backlog of validation-inconclusive N-Body workunits for about 12 hours. For over a day since, all my in-progress tasks have been brand-new workunits. Even if suspending new workunits isn't possible, maybe there's a way to increase the priority of older workunits? Or just turn off the N-Body work generator until the 30-day backlog is cleared? During the 12 hours the backlog was shrinking, I put a couple more computers on N-Body, but I have since removed them because they just seemed to be adding to the backlog. I still have one computer doing N-Body because a firewall blocks it from doing most other projects (and it lets me keep tabs on N-Body), and putting CPU time towards Separation seems to be a waste.

By the way, setting up a *local* BOINC server on Linux (Ubuntu) is relatively easy and lets me play around with BOINC. Via your home router, give your computer (I used an old laptop set to dual-boot Ubuntu and Windows) a static local IP address such as 192.168.0.201, then follow the instructions at https://boinc.berkeley.edu/trac/wiki/ServerIntro (I ignored the Docker and boinc-server-maker parts). The trick is, when running make_project, to add the option --url_base http://192.168.0.201 (or whatever your static local IP is). If you follow the instructions, a test application is installed, which you can modify and/or play around with!
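Condensed, the steps look something like this. It's a sketch only, assuming you build from the BOINC git sources as that wiki page describes; the project name "testproject" is just an example, and the MySQL/Apache/PHP prerequisites are covered on the page itself:

    # Build the BOINC server code, then create a project served from the
    # laptop's static LAN address instead of a public hostname.
    git clone https://github.com/BOINC/boinc.git
    cd boinc
    ./_autosetup
    ./configure --disable-client --disable-manager
    make
    cd tools
    ./make_project --url_base http://192.168.0.201 --test_app testproject

The --url_base override is the whole trick: every URL the project hands out then points at the LAN address, so any machine on your home network can attach to it.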
Joined: 15 Jun 13 · Posts: 15 · Credit: 2,070,897,222 · RAC: 0
I added a spare PC to nbody for the first time, to help out. It's only two Sandy Bridge cores, but it seems to get through a work unit every 16 minutes :)
Joined: 15 Jun 13 · Posts: 15 · Credit: 2,070,897,222 · RAC: 0
Update: I hadn't been getting any Separation tasks all day, so I opted to also try nbody on my main machine. A few people have had issues getting Separation CPU tasks they didn't want, but I was able to fetch 100 nbody tasks by themselves, for 16 CPUs. So far so good... but wouldn't you know it? I got more Separation tasks several minutes later, including a bunch of CPU ones. >.< I aborted those, and it's doing GPU work at the moment, but if it keeps trying to get Separation CPU work I'll just have to turn that off. Oh well... (That particular CPU has been doing distributed.net OGR for a while, but either way, it's going back to WCG once they come back online.)

EDIT: Looks like the Separation tasks were a one-time thing, for CPU and GPU. It's steadily getting nbody only, at least for now. I'll keep an eye on it... I get the feeling we'll just have to keep chipping away at the nbody backlog, slowly but surely, until it's under control and the server can handle things again.
Joined: 5 Jul 11 · Posts: 990 · Credit: 376,143,149 · RAC: 0
The status page still shows 13 million Nbodys. I shoved 90 CPU cores and six 280X GPUs (1 teraflop DP each) onto MW, and everything filled up. I could stare at the BoincTasks screen all day, MW stuff moves so quickly. I'm trying to do some Folding@Home too, but it seems those cards can do both at once and share nicely at 50% each. When I get the R9 Nanos they can do the Folding and leave the 280Xs for this project. (Nanos don't do DP well, but they're much faster at SP.)
Joined: 5 Jul 11 · Posts: 990 · Credit: 376,143,149 · RAC: 0
TOM: I don't know if you failed to clear the Nbodys or the server status is wrong, but they're still there. We'll work through them eventually, and things seem to be flowing OK. It should sort itself out if you just leave it. But... your suggestion of increasing how many Separations I can get at once would be nice!
Joined: 5 Jul 11 · Posts: 990 · Credit: 376,143,149 · RAC: 0
Now I get "no Nbody available". Never seen that message before.
Joined: 8 Nov 11 · Posts: 205 · Credit: 2,900,464 · RAC: 0
In amongst other projects I have managed to do nearly 100 Nbody WUs. They are unique in that every one so far has been VALIDATED. In trying to establish why, I note the following: I am the "wingman" task in every instance (i.e. the 2nd), and in every case they were sent out today, which is exactly 30 days after the first task was returned. I might be wrong, or maybe everyone knows this already, but it looks like an automatic 30-day delay in sending out the second task to me. Apologies if I am talking rubbish.
Joined: 5 Jul 11 · Posts: 990 · Credit: 376,143,149 · RAC: 0
> In amongst other projects I have managed to do nearly 100 Nbody WUs.

I confirm: mine are also 30 days after the primary task was done. Both validated OK. You do get 12 days to do them; perhaps they wait to see if they self-validate first, and then the wingman one gets shoved in a queue? They do say quorum minimum 1, as though they don't always have to have a wingman.
Joined: 5 Jul 11 · Posts: 990 · Credit: 376,143,149 · RAC: 0
I am also confused by this. My Separation GPU WUs have: minimum quorum 2, initial replication 3. Why send out 3 if only 2 are needed? This seems like a waste of processing time. If two of us have agreed on the result, why get a third GPU to run it?
Joined: 8 Nov 11 · Posts: 205 · Credit: 2,900,464 · RAC: 0
Surely they could make the delay 14 days rather than 30? That would still allow the 12 days to get it done.
Joined: 12 Nov 21 · Posts: 236 · Credit: 575,038,236 · RAC: 0
OK, the Nbody backlog is out of control. I think I will abort all Separation _0 CPU work for a week or so. This will free up 36 CPU threads for the Nbody WU flush. Any wingman Separation tasks that are _1 and above will be processed normally. All GPU Separations will also be processed normally. Separation retests will also go through normally. I'm starting to get worked up into campaign mode here. Thoughts?
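If anyone wants to do that _0 abort in bulk rather than by hand, something along these lines should work. It's a rough sketch, assuming boinccmd is on the path and that its --get_tasks output lists task names on "name:" lines; note it doesn't distinguish CPU from GPU Separation tasks, so eyeball the list before piping it into abort:

    # Sketch: abort every attached Separation task whose name ends in _0.
    URL=http://milkyway.cs.rpi.edu/milkyway/
    boinccmd --get_tasks | awk '/^ *name: /{print $2}' \
      | grep -i separation | grep '_0$' \
      | while read -r task; do
          boinccmd --task "$URL" "$task" abort
        done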
Joined: 5 Jul 11 · Posts: 990 · Credit: 376,143,149 · RAC: 0
> Surely they could make the delay 14 days rather than 30? That would still allow the 12 days to get it done.

Maybe they do, but after the 12 days it gets stuck on the end of the queue, which is huge? Normally it's 1000 tasks, so it wouldn't be much longer than the time for the first task to complete.
Joined: 5 Jul 11 · Posts: 990 · Credit: 376,143,149 · RAC: 0
> OK, the Nbody backlog is out of control. I think I will abort all Separation _0 CPU work for a week or so. This will free up 36 CPU threads for the Nbody WU flush. Any wingman Separation tasks that are _1 and above will be processed normally. All GPU Separations will also be processed normally. Separation retests will also go through normally.

I just got told by the server there were no Nbodys left, so perhaps Tom cleared it? I'm also getting loads of GPU Separation every time I ask, so I think the server is OK now and you can run what you like. I try not to do Separation on the CPUs anyway, as the GPUs are many, many times faster and it seems a waste. It's a pity the server options don't let me choose that completely.
Joined: 8 Nov 11 · Posts: 205 · Credit: 2,900,464 · RAC: 0
I had to hit the enter button loads of times to get Nbody work; my PC sat for some while getting nothing. Obviously a significant contributor to the backlog is the 30-day limit. In theory we could get no WUs because of the date, regardless of the number waiting (is that already happening?). My backlog is minuscule compared to others, I guess, but it will be at least 30 days before the processing done in the last few days gets resent. I think the 30-day parameter has to be looked at, otherwise the backlog is never going to be shifted, as people will get fed up, particularly since the backlog is not going down very much even though there has been a big increase in users.
Joined: 5 Jul 11 · Posts: 990 · Credit: 376,143,149 · RAC: 0
> I had to hit the enter button loads of times to get Nbody work; my PC sat for some while getting nothing. Obviously a significant contributor to the backlog is the 30-day limit.

I can't believe all those 13 million on the server status page are wingman jobs. They all appeared at once when the disk failure occurred, so I assume most are fresh jobs. But for some reason I can't access them now.
Joined: 10 Apr 19 · Posts: 408 · Credit: 120,203,200 · RAC: 0
I am going to try to manually flush the Nbody WUs without using the BOINC ops page. The server will be down for a little bit while this runs.
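For the curious, a manual flush from the command line would look roughly like this. It's a sketch only (not necessarily what is actually being run), assuming the stock cancel_jobs utility that a BOINC project carries in its bin/ directory, with an illustrative ID range; the SQL equivalent is the sketch earlier in the thread:

    # Hypothetical sketch: cancel a range of queued workunits without the ops page.
    cd ~/projects/milkyway
    bin/cancel_jobs --min_id 1000000 --max_id 2000000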