Welcome to MilkyWay@home

Posts by Thunder

1) Message boards : News : Running Modfit on MilkyWay@home (Message 64934)
Posted 23 Jul 2016 by Thunder
Post:
Unfortunately this weekend has brought a fresh starvation of work. About 2/3 of requests are getting 0 tasks and those that do are dribbling out a few at a time.

Perhaps once an hour I'm getting a full 60 tasks when requested.
2) Message boards : Number crunching : threads per gpu? (Message 64917)
Posted 20 Jul 2016 by Thunder
Post:
I'm running 4 per GPU on a 280X.

I've heard of the issue of the tasks all getting "synchronized" and wanting to start/finish at the same time (stalling the GPU during loading), but I've never actually witnessed it enough to be concerned.
3) Message boards : News : Running Modfit on MilkyWay@home (Message 64910)
Posted 19 Jul 2016 by Thunder
Post:
So far no troubles running out! (If I could dance, I'd be doing a little jig.)

Considering the number of tasks in progress for the project has increased by over 30k since that change, I'd say a lot more people are happy as well. :-)

As I've said before, if you are willing to increase the number allowed on-hand a bit, that would be ideal (I think about 90 instead of 60 would keep mine fully fed even switching between projects), but *NOT* if that's going to cause you to get so many errors/aborts/etc. as to compromise the science.

Seriously good work, Jake! This was an absolute sea change in the function of the project. :-)
4) Message boards : News : Running Modfit on MilkyWay@home (Message 64908)
Posted 19 Jul 2016 by Thunder
Post:
I don't know what you changed recently, but all morning the server has been INCREDIBLY reliable about always sending the full amount requested!

It's been great! :-)
5) Message boards : News : Running Modfit on MilkyWay@home (Message 64901)
Posted 16 Jul 2016 by Thunder
Post:
I'm not sure if something in particular changed on Friday or if it's just a matter of more computers being switched back to the project, but the server is back to not having nearly enough work available.

For about the last 24 hours, I've been seeing about 80-90% of scheduler requests for MW work (not N-body) giving the "got 0 new tasks" response.

I think it's a matter of how much work is being generated, based on the fact that the server status page continually shows the unsent tasks somewhere in the 'teens and the in progress numbers dropping continually. :-(
6) Message boards : Number crunching : Computation Error on (Message 64877)
Posted 13 Jul 2016 by Thunder
Post:
Jake,

Are you sure you've cleared them out (or perhaps more were started)?

I'm still getting quite a few (sometimes 2/3rds of the tasks sent) that are the 'fixed angle' jobs (over 24 hours later). All still crashing.
7) Message boards : News : Running Modfit on MilkyWay@home (Message 64868)
Posted 12 Jul 2016 by Thunder
Post:
I noticed I just downloaded a new set of parameters and looks like all of the tasks for them are failing with computation errors.

http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=691866&offset=0&show_names=0&state=6&appid=

Edit: It also just deferred communication on the last scheduler request for uh... a full 24 hours. O.o
8) Message boards : News : Running Modfit on MilkyWay@home (Message 64866)
Posted 12 Jul 2016 by Thunder
Post:
Only because I'm updating it manually any time I'm near it. (For the two machines that have reasonably fast GPUs, 50 tasks is about 4 minutes for 1, 55 minutes for the other)

It's kind of become "feast or famine" now. Most scheduler requests either return 0 tasks or nearly 60. There's very little in-between.

Judging from the increase in speed of the whole project, I'm guessing you've increased the available work and the users have responded by getting it (and getting it done).

I know you're concerned about computers returning a lot of error tasks and how that affects the science. One possible idea would be to increase the maximum error tasks on each WU from 2 to 3 (as a start) and see if that reduces the number that are completely thrown out as possibly "bug" WUs. I know of very few projects handling as many tasks as this one that have that threshold set so low.

Beyond that, there are mechanisms in BOINC to distinguish "trusted" computers from untrusted ones and assign more or less work accordingly.
9) Message boards : News : Running Modfit on MilkyWay@home (Message 64864)
Posted 11 Jul 2016 by Thunder
Post:
Here's a machine that I noticed had run completely out. Tried updating and got this (all times are central):

36000 Milkyway@Home 7/11/2016 5:27:05 PM Reporting 12 completed tasks
36001 Milkyway@Home 7/11/2016 5:27:05 PM Requesting new tasks for AMD/ATI GPU
36002 Milkyway@Home 7/11/2016 5:27:07 PM Scheduler request completed: got 0 new tasks
36003 Milkyway@Home 7/11/2016 5:28:39 PM update requested by user
36004 Milkyway@Home 7/11/2016 5:28:42 PM Sending scheduler request: Requested by user.
36005 Milkyway@Home 7/11/2016 5:28:42 PM Requesting new tasks for AMD/ATI GPU
36006 Milkyway@Home 7/11/2016 5:28:45 PM Scheduler request completed: got 0 new tasks
36007 Milkyway@Home 7/11/2016 5:30:11 PM update requested by user
36008 Milkyway@Home 7/11/2016 5:30:15 PM Sending scheduler request: Requested by user.
36009 Milkyway@Home 7/11/2016 5:30:15 PM Requesting new tasks for AMD/ATI GPU
36010 Milkyway@Home 7/11/2016 5:30:17 PM Scheduler request completed: got 0 new tasks
36011 Milkyway@Home 7/11/2016 5:31:21 PM update requested by user
36012 Milkyway@Home 7/11/2016 5:31:23 PM Sending scheduler request: Requested by user.
36013 Milkyway@Home 7/11/2016 5:31:23 PM Requesting new tasks for AMD/ATI GPU
36014 Milkyway@Home 7/11/2016 5:31:26 PM Scheduler request completed: got 5 new tasks
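
For anyone who wants to tally this from their own event log rather than eyeballing it, here is a minimal sketch. It assumes the log keeps the exact "Scheduler request completed: got N new tasks" wording shown above; the function name `tasks_per_request` is hypothetical and not part of BOINC.

```python
import re

# The four scheduler replies from the log excerpt above.
LOG = """\
36002 Milkyway@Home 7/11/2016 5:27:07 PM Scheduler request completed: got 0 new tasks
36006 Milkyway@Home 7/11/2016 5:28:45 PM Scheduler request completed: got 0 new tasks
36010 Milkyway@Home 7/11/2016 5:30:17 PM Scheduler request completed: got 0 new tasks
36014 Milkyway@Home 7/11/2016 5:31:26 PM Scheduler request completed: got 5 new tasks
"""

# Assumption: the client always words the reply exactly like this.
GOT_TASKS = re.compile(r"Scheduler request completed: got (\d+) new tasks")


def tasks_per_request(log_text):
    """Return the number of tasks granted by each scheduler reply, in order."""
    return [int(m.group(1)) for m in GOT_TASKS.finditer(log_text)]


counts = tasks_per_request(LOG)
empty = sum(1 for c in counts if c == 0)
print(f"{empty} of {len(counts)} requests returned 0 tasks; {sum(counts)} tasks total")
```

Pasting a longer stretch of the event log into `LOG` gives the share of empty replies over whatever period was copied.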

The server status page (which I know only shows a snapshot) is saying only 12 tasks are available to send on its most recent update.

Edit: Have tried several times since on 2 different machines and most updates are returning 0 tasks. :-(
10) Message boards : News : Running Modfit on MilkyWay@home (Message 64859)
Posted 11 Jul 2016 by Thunder
Post:
Well, there's still some sort of issue, because for the last 4 minutes, I've gotten back to back "Scheduler request completed: got 0 tasks" messages and the fast machine ran out of work.

Like I said, the change made a considerable difference, but it's still not enough to actually keep a fast GPU "fed".

Edit: The next request after waiting a couple of minutes only sent 9 tasks, so there are still evidently times when the server is running dry.
11) Message boards : News : Running Modfit on MilkyWay@home (Message 64857)
Posted 11 Jul 2016 by Thunder
Post:
Jake,

That made a substantial difference! I'm still seeing rare scheduler requests that will only return a handful of tasks, but overall it looks like 40-45 tasks are sent on most of them. (Previously, it was typical to get 10-20.)

If you're not seeing any negative impacts to the server (and I'd suspect you're seeing a reduced workload on it since it's probably having to handle fewer scheduler requests) and can increase it more, I would be in favor.

If you go ahead and dial the number of tasks available up, it would also be beneficial to increase the limit on maximum tasks on hand from the current 60. By all means, start small on that. Perhaps 90?

Thanks. :-)
12) Message boards : Application Code Discussion : Questions on Nbody (Message 64814)
Posted 5 Jul 2016 by Thunder
Post:
It's worth noting that you do get more credit for the longer N-Body tasks.

How credit is computed is way too complicated to get into here, but suffice to say that you're getting roughly the same amount of credit per amount of time, regardless of how long the tasks take.

For me, in this last round of N-Body, I've seen tasks as short as about 2 minutes and as long as 170 hours (on the same machine, I might add) so as long as yours are falling inside that, it's probably all good. :-)
13) Message boards : News : Nbody Release 1.62 (Message 64768)
Posted 28 Jun 2016 by Thunder
Post:
Is that the reason no N-Body work has been available for quite some time?

If so, is that going to be remedied soon?
14) Message boards : News : Running Modfit on MilkyWay@home (Message 64764)
Posted 27 Jun 2016 by Thunder
Post:
Glenn, are those 32-bit installations? If not, which Linux distro?

I'm not having any trouble on 64-bit Ubuntu 15.10 or 16.04.
15) Message boards : News : Running Modfit on MilkyWay@home (Message 64762)
Posted 27 Jun 2016 by Thunder
Post:
Looks like the total tasks allowed to have on-hand was increased from 40 to 60.

Unfortunately it's still only allowing 25 tasks per request, so for machines that are running multiple projects, they'll still only likely get the same (25 if they're available) per hour.

I fully appreciate making small changes and waiting to see their real-world effect, so could you next try increasing the number of tasks per request a bit to, say, 30? (Ultimately I'm hoping for much more, but let's not get crazy just yet.)

Edit: Also, most of all, you still need to figure out why the server so often has no tasks available to send. Here was an automatic (not prompted by me hitting update) request that just happened:

3860 Milkyway@Home 6/27/2016 9:35:08 AM Reporting 45 completed tasks
3861 Milkyway@Home 6/27/2016 9:35:08 AM Requesting new tasks for AMD/ATI GPU
3862 Milkyway@Home 6/27/2016 9:35:09 AM Scheduler request completed: got 0 new tasks

Same machine, trying a few minutes later:

3870 Milkyway@Home 6/27/2016 9:39:05 AM Reporting 3 completed tasks
3871 Milkyway@Home 6/27/2016 9:39:05 AM Requesting new tasks for AMD/ATI GPU
3872 Milkyway@Home 6/27/2016 9:39:07 AM Scheduler request completed: got 12 new tasks

(12 tasks were available, but I'm sure it requested many more.)
16) Message boards : News : Running Modfit on MilkyWay@home (Message 64748)
Posted 24 Jun 2016 by Thunder
Post:
Sure enough, exactly what I predicted happened a few minutes later.

After repeated requests for more work getting 0 tasks sent, the GPU ran out completely and I had to go on and allow work from other projects so it would at least be doing something productive. :-/
17) Message boards : News : Running Modfit on MilkyWay@home (Message 64747)
Posted 24 Jun 2016 by Thunder
Post:
Thunder,

If it makes you feel any better, I am pretty sure the version in the official Ubuntu 16.04 package system has a memory leak that some users were complaining about. Hopefully updating the client will let you at least ask for work a little more often than once an hour. This is the next thing on my list to fix after I figure out why new compilations of MW@home take 7 times longer to run on Mac than old compilations.

Jake


So before going to the extreme of installing a new client (a royal pain in the behind on Linux), I tried a little experiment today and figured out what's going on.

Since I run 3 projects on this machine, the simple thing to try was to set the other two projects to "no new work" and see what happens. :-) As soon as the others were (nearly) out of work, the client started hitting up MW@H about once every minute or two for new work. (I got this idea after seeing the same behavior on a Windows machine with the latest version that also has a reasonably fast GPU)

So the client is basically getting work (but only a very little), finishing it, then seeing it has work for other projects and essentially making the decision that since it ran out of work for MW@H, it will switch over. Then, after a sufficient time has passed, it goes to see if MW@H has more work available. (And repeating this cycle ad infinitum)

I'm guessing this is why MW@H has seen a precipitous drop in credit since the switch was made (and also a pretty severe drop in active participants). It doesn't really affect those that run only MW@H, but if, as the majority of BOINC users do, you run multiple projects, it's only going to do as many tasks for MW@H as it can get in one scheduler request, then rotate around to another project. Since the default for BOINC installs is to rotate every 60 minutes, there you go. :-/

Obviously the science dictates the tasks, so you can't just make longer tasks to solve this. However, it would help if you could increase the number of tasks available on the server and allow a much larger number to be issued per scheduler request. (If I could wave my magic wand, I'd ask for 250 instead of the current limit of 25.)
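
To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes a multi-project client makes one successful MW@H request per 60-minute rotation and uses the 17 WU per minute figure I quoted in Message 64727 for a fast GPU; the function name `gpu_utilization` is made up for illustration.

```python
def gpu_utilization(tasks_per_request, requests_per_hour, tasks_per_minute):
    """Fraction of an hour the GPU could stay busy on MW@H work.

    Assumes every request is filled to the per-request limit, which is
    optimistic given how often the server returns 0 tasks.
    """
    supplied = tasks_per_request * requests_per_hour  # tasks received per hour
    capacity = tasks_per_minute * 60                  # tasks the GPU could finish per hour
    return min(1.0, supplied / capacity)


# Current limit of 25 tasks per request, one request per hourly rotation.
current = gpu_utilization(25, 1, 17)
# The "magic wand" limit of 250 tasks per request.
wished = gpu_utilization(250, 1, 17)

print(f"limit 25: {current:.1%} of the hour, limit 250: {wished:.1%} of the hour")
```

Under these assumptions even the larger limit would not keep a fast GPU fed for a full hour, but it would be roughly a tenfold improvement over the current one.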

Either way, you need to figure out how to get more work available from the server because even if I set MW@H as the only project, it's going to run out of work once in a while due to:

WRF1

10172 Milkyway@Home 6/24/2016 11:56:38 AM Scheduler request completed: got 0 new tasks
WRF1

10200 Milkyway@Home 6/24/2016 11:57:42 AM Scheduler request completed: got 0 new tasks

(Two back-to-back scheduler requests in which the server had zero tasks available to send for GPU work.)

As I've said in a few posts before... IF you're doing the science as fast as you can handle already (as in, the users are solving the problems faster than you can posit new ones), then don't sweat it. We'll just keep on keepin' on as the work is available. :-)
18) Message boards : News : Running Modfit on MilkyWay@home (Message 64732)
Posted 22 Jun 2016 by Thunder
Post:
I can try. I've not run BOINC on Linux from anything but a package installation in as long as I can remember, and unfortunately 7.6.6 is the package for Ubuntu 15.10.

Considering how difficult it was to get what I have working, I have a feeling there will be a lot of colorful language in my future. This will definitely be a weekend project. :-/
19) Message boards : News : Running Modfit on MilkyWay@home (Message 64730)
Posted 22 Jun 2016 by Thunder
Post:
The same one I referenced earlier in this thread, 691866.

The only way it comes close to keeping the GPU "fed" is if I sit at it and hit the update button every 1-2 minutes. (Even then it's likely to run out and switch to another project after a dozen updates or so)

If you have MW@H set to use "Locality Scheduling" (which is probably a really good thing for both the project and volunteers, depending on the size of input files), then that might explain the disparity between what you're seeing as available tasks vs what's available for any given host.

Depending on how many variations of input files are active at any time, there might be 1,000 total tasks available, but (and this is somewhat random of course) there might only be 10 or 20 available for any given host (depending on what files it already has downloaded).

This is *just* a guess of course, because I've never really dug into whether or not MW@H even needs to use locality scheduling.
20) Message boards : News : Running Modfit on MilkyWay@home (Message 64727)
Posted 22 Jun 2016 by Thunder
Post:
Are you crunching more than 40 work units per minute?


no, but
the server contact client boinc 1 per 60 second and sends 9-10 WU
then deducts another 60 seconds -> ATI sleeps
my ATI crunching WU 10-11 per 60 sec


I'm having exactly the same problem and it was only exacerbated by the change to all modfit units.

I can do 17-18 WU per minute, but since the scheduler typically only has 8-11 WUs available to send at any given moment, it takes me 2-3 minutes of updates to get 1 minute of work. It's irrelevant to ask if a machine is crunching more than 40 WU per minute (and the limit is actually 25 since the scheduler will not send more tasks than that even if it has them available), when the server rarely has that much even waiting to send.
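
The arithmetic behind that "2-3 minutes" figure can be sketched as follows, assuming roughly one scheduler update per minute; the function name `requests_needed` is hypothetical.

```python
import math


def requests_needed(consumed_per_min, sent_per_request):
    """Scheduler requests needed to collect one minute of GPU work."""
    return math.ceil(consumed_per_min / sent_per_request)


# Best case: slowest crunch rate (17 WU/min), most generous reply (11 WU).
best = requests_needed(17, 11)
# Worst case: fastest crunch rate (18 WU/min), stingiest non-empty reply (8 WU).
worst = requests_needed(18, 8)

print(f"{best} to {worst} requests per minute of work")
```

At about one update per minute, that works out to the 2-3 minutes of updating per minute of crunching described above, and any request that returns 0 tasks makes it worse.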

My problem is made worse by the fact that despite making every configuration change I can think of, my clients refuse to update (on their own) more often than every 60 minutes. I've asked for help everywhere I can think, but since MW@H is the only (major) project that dribbles out teeny, weeny amounts of work at a time, it's not a problem that anyone else I can find has had need to try to solve.



©2020 Astroinformatics Group