Welcome to MilkyWay@home

Posts by Alinator

21) Message boards : Number crunching : Cant select or deselect an (mt) app, and runtime estimate is 1 second (Message 59258)
Posted 6 Jul 2013 by Alinator
Post:
Well, I think the point here is the project never intended nBody to run as a single threaded application. IIRC, the rationale for it is it's a simulation which is better suited to the capabilities of SMT capable processors and/or not easily or efficiently 'ported' to the type of parallelism you get with GPU's.

However, the real problem is that the default plan_class explicitly tells BOINC the application is single threaded, so BOINC assumes every task running under it will use one core only and schedules work at both ends accordingly. Therefore, strictly speaking, a multi-threaded application should never be assigned to it, period.

Regarding the project side scheduler, yeah every project needs to tailor it to their specific requirements. Given the moving target BOINC is, I would imagine it can be just about as aggravating to get right as getting your science apps to do just what you want them to (and a lot less fun). :-D

As far as your RT estimates go, I'm not sure what the issue is on your host. I don't have many onboard my hosts at the moment, but the one which does have some is showing estimates of 4 to 7 minutes for the ones which are ready to start. This is in the ballpark based on the completed ones it has showing. So it looks like there is a bit of a mystery going on for your host. ;-)
22) Message boards : Number crunching : Cant select or deselect an (mt) app, and runtime estimate is 1 second (Message 59254)
Posted 6 Jul 2013 by Alinator
Post:
The app is exactly the same in both cases (ie compiled against openMP), and IMHO it is a configuration error on the project side to have it assigned to the default plan_class (single threaded) for the reasons listed here.

If they wanted to run it on single core CPUs, they could do that by just spec'ing out the minimum number of CPUs as 1 in the MT plan_class for the app (the BOINC default is 2).

To do what you want requires you to run MW under the anonymous platform due to limitations in BOINC at this point.

As far as getting the runtime estimates to update, you're just going to have to wait until you run enough tasks for the APR to take effect on late model CC's.
23) Message boards : News : Separation Modified Fit v1.24 (Message 59250)
Posted 5 Jul 2013 by Alinator
Post:
OK, I got a chance to update my anonymous hosts and the 1.24 app seems to be working fine on all my Winboxes.
24) Message boards : News : N-Body 1.18 (Message 59249)
Posted 5 Jul 2013 by Alinator
Post:
OK, now that the dust is settling fairly well from the roll outs of the new stuff, I have a couple of observations on some server side refinements you could make in order to get MW back to being more multi project friendly.

1.) Eliminate the 'default' plan_class application for nBody. I assume you aren't intending for it to run on a single core CPU, so to have it there is redundant and has an unintended consequence on Winboxes running 2k or XP.

It turns out anything less than Vista (Longhorn code base) has very limited native support in the OS for multi-threaded applications. However, since you're using OpenMP and apparently have XP assigned to the default plan_class, the net result is you always end up having nBody running alongside other CPU tasks or, worse, running multiple nBody tasks.

This isn't a great situation for a couple of reasons.

The better of the two is when running with other CPU tasks. Since nBody can only grab 'slack' time from the other CPU tasks running, the MT speedup is small to none.

The worst case is when you get two or more nBodies running at the same time. Not only is each app limited to any 'slack' time from the other CPU tasks running, they are also 'fighting' with each other since they want to use all the cores as much as they can (up to all the time if they could).

Either way, the end result is the elapsed time to run the task increases, so MW ends up 'looking' like it's hogging the machine at times when it gets to the point where BOINC has to get rid of the nBody tasks quickly due to scheduling/deadline issues.

2.) The other problem is the default MT configuration for nBody is to use all the CPU cores available. The problem with that is once one starts, GPU's are cut off for the duration. Now if all a user runs is MW that's not such a big deal, but most people aren't going to be happy about their GPU's going completely idle if they run other projects. I'm speculating here that this is the reason you reconfigured the MWS(MF) apps to use 0.9 CPU's, but the drawback to that is every instance will grab a CPU core, and thus that's one less which could be running a CPU task from another project.

Unfortunately, the only solution for both these issues at the moment is to use the anonymous platform (read that as difficult for most folks) to limit the CPU usage for nBody, and thanks to a 'brilliant' design decision on BOINC's part, there is no way to do it the easy way (read that as with app_config). :-(

Another possible solution would be to customize the scheduler at your end so the task gets sent to the host with the max CPU's and nthreads command switch set to the (<p_ncpus> - 1), or something along those lines. Although that would obviously add complexity at your end and would be something which would need pretty thorough testing before getting rolled out.
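
To make the idea concrete, here is a rough Python sketch of the arithmetic only. It is not BOINC scheduler code: the function names are made up for illustration, and the exact spelling of the switch ("--nthreads") is an assumption based on the "nthreads command switch" mentioned above.

```python
def nbody_thread_count(p_ncpus: int, reserve: int = 1) -> int:
    """Threads to hand an MT task, leaving `reserve` cores free.

    Never returns less than 1, so a single core host still gets a task.
    """
    if p_ncpus < 1:
        raise ValueError("p_ncpus must be at least 1")
    return max(1, p_ncpus - reserve)


def nbody_cmdline(p_ncpus: int) -> str:
    # "--nthreads" is an assumed spelling of the switch, not confirmed.
    return f"--nthreads {nbody_thread_count(p_ncpus)}"
```

With this, a host reporting 4 CPUs would get a 3 thread task, which leaves one core free to feed a GPU or run something from another project.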
25) Message boards : Number crunching : Please assist - can't get Radeon HD 4850 recognized in Linux (Message 59247)
Posted 5 Jul 2013 by Alinator
Post:
Well one other problem is I don't think you can go as high as a 13.x version driver for a 4000 series Radeon and get OCL support, regardless of OS.

Try something based on a Cat 12 build. I have a 4850 on Win7 and am using 12.8, IIRC.
26) Message boards : Number crunching : Cant select or deselect an (mt) app, and runtime estimate is 1 second (Message 59242)
Posted 5 Jul 2013 by Alinator
Post:
I'm not quite sure what you're driving at here.

From looking at your hosts it would appear you have them set to do CPU tasks only. So if all you want to run is MT tasks, then just deselect all the other apps except nBody for the venue you run your hosts on from the MilkyWay Preferences page. It's the only one which is CPU multi-threaded.

27) Message boards : News : nbody fpops calculation updated (Message 59238)
Posted 5 Jul 2013 by Alinator
Post:
Hmmm...

I've been looking over Kurt's host, and his is looking more like a driver kernel mode GPF to me.
28) Message boards : Number crunching : AMD GPU Computation errors (Message 59237)
Posted 5 Jul 2013 by Alinator
Post:
OK, I've been following the conversation about problems folks have been having with HD 6000 series AMD GPU's.

I have a 2GB Sapphire 6950 running on this host.

One difficulty I'm going to have in trying to help get this problem worked out is the host is running XP 64, so that limits how far up the driver chain I can go on it and still have OCL support. I'm currently running Cat 11.8 since that's the latest version which works properly on this particular machine.

That being said, early on in this round of project updates here at MW I was running a stock configuration and the project switched the host from the ati14 to the opencl_amd_ati plan_class and it ran both types fine (as you can see from the app details page for the host).

Currently I have the host running on the anonymous platform (as part of working around issues with the way MT applications are handled by BOINC currently). So I need to run the MW cache out on it to change the project config to see if it's having problems with the current OCL app or not. I'm actually running the CAL app on it right now because it's faster than the OCL app according to the APR's shown.

So the only thing I can suggest at the moment would be to try rolling back to an earlier Cat version and see if that helps. I'd suggest 12.6 based on my experience.

29) Message boards : Number crunching : Computation errors (Message 59234)
Posted 5 Jul 2013 by Alinator
Post:
Still getting computation errors on my computer w/2 6950's. The other with a 7850 is running ok. Any reasons why? Kurt


Are you leaving a cpu free for each gpu to use all on its own? If not, try suspending your cpu project and see if you can crunch a gpu unit okay.


That shouldn't be an issue with the current default project configuration; they changed it to use 0.9 CPU's, so every time one starts BOINC will suspend a core. In any event, even if BOINC doesn't free the core for some reason, it should just hurt GPU utilization, not cause a general protection fault.

Kurt, another user is having problems with an HD 6950 on Linux. So I suggest we take your problem over to this thread to consolidate troubleshooting this.
30) Message boards : Number crunching : Computation errors (Message 59232)
Posted 5 Jul 2013 by Alinator
Post:
Ok... I'll try the project reset... thanks.

...Darryl

P.S.... I fluid-cooled the processor and it's running around 50degC.


OK, it looks like the reset took care of the problem with the memory access error for the MWS and MWSMF tasks, but nBody still looks like it's having a missing and/or bad dll problem.

The first thing to do is to check in the MW project directory and make sure these two files are there:

1.) libgomp_64-1_nbody_1.18.dll

2.) pthreadGC2_64_nbody_1.18.dll
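
If you'd rather script that check, a minimal Python sketch follows. The two file names are the ones listed above; the function name is made up for illustration, and the directory you pass in is up to you, since where the MW project directory lives depends on how BOINC was installed.

```python
from pathlib import Path

# The two support DLLs named above for the nBody 1.18 app.
REQUIRED_DLLS = (
    "libgomp_64-1_nbody_1.18.dll",
    "pthreadGC2_64_nbody_1.18.dll",
)


def missing_dlls(project_dir):
    """Return the required DLL names that are not present in project_dir."""
    project_dir = Path(project_dir)
    return [name for name in REQUIRED_DLLS
            if not (project_dir / name).is_file()]
```

Calling missing_dlls() with the path to your MW project directory returns an empty list when both files are in place, and otherwise the names that still need to be fetched.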

If they aren't there, then the next thing to try is to download them from here and put them in the project directory. Note that you should shut down BOINC before doing that, then restart afterwards.

If they are there or the previous step didn't take care of the problem, then the next step is for you to cut and paste the part of the BOINC Event log from when an nBody task tries to run.
31) Message boards : Number crunching : Computation errors (Message 59212)
Posted 3 Jul 2013 by Alinator
Post:
I was looking over the stderr outputs on Darryl's host, and I'm not convinced the problem is with the apps per se.

The nBodies are faulting with the old '135' (0xffffffffc0000135) exit code which typically is some kind of dll problem. The standard MW Separation tasks are faulting with a memory access error (0xffffffffc0000005).
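
For reference, those hex values are Windows NTSTATUS codes sign-extended to 64 bits, so the low 32 bits are what you look up. Here's a small Python sketch of that lookup. The table only covers the two codes mentioned above, and the function name is just something made up for illustration.

```python
# Only the two NTSTATUS values discussed above; not a complete table.
NTSTATUS_NAMES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",  # memory access error
    0xC0000135: "STATUS_DLL_NOT_FOUND",     # the old '135' dll problem
}


def decode_exit_code(code: int) -> str:
    """Reduce an exit code to 32 bits and name it if it is in the table."""
    status = code & 0xFFFFFFFF  # drop the sign extension
    name = NTSTATUS_NAMES.get(status, "unknown")
    return f"0x{status:08X} {name}"
```

The masking also handles the negative decimal form of the same code (for example -1073741515), which is simply the signed reading of 0xC0000135.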

The really strange part here is, there were only 6 failures out of 45 for MWS tasks, and no failures for the 29 MWSMF tasks currently listed.

Since the host hasn't thrown errors on any of the other projects (except for the cancelled ones on Asteroids), I don't think there's a hardware or overheating issue here (those frequently show up as general protection faults).

So what I would suggest is to do at least a project reset for MW, and if that doesn't help then do a complete detach/reattach for the host.
32) Message boards : News : Separation Modified Fit v1.24 (Message 59193)
Posted 2 Jul 2013 by Alinator
Post:
OK, I'll have to make some updates to my app_info files for the hosts I have on the anonymous platform.

<edit>BTW, is there a reason why there isn't a 32 bit version of MWSMF for Winboxes?
33) Message boards : News : Separation Modified Fit v1.24 (Message 59191)
Posted 2 Jul 2013 by Alinator
Post:
Were there any changes to the Windows apps, other than a version number change?
34) Message boards : News : nbody fpops calculation updated (Message 59184)
Posted 1 Jul 2013 by Alinator
Post:
Were you talking about problems like this?
35) Message boards : Number crunching : MT and single-CPU point differences (Message 59167)
Posted 29 Jun 2013 by Alinator
Post:
Hmmmm...

Something strange is going on. I've recently started having some that went through with more or less the same amount of net CPU time, but paid less than one credit!!??
36) Message boards : Number crunching : MT and single-CPU point differences (Message 59064)
Posted 23 Jun 2013 by Alinator
Post:
Mostly because the current nBody runs have been as much a beta test for the new app as anything else.

Thus there have been bigger fish to fry lately than getting the credit rate firmed up.
37) Message boards : Number crunching : GPU not at 100% when all CPU cores crunch (Message 59054)
Posted 23 Jun 2013 by Alinator
Post:
It's not a problem per se, it's just the nature of the beast.

There's a lot of memory IO involved with running GPU apps, so if you have all the cores busy doing other tasks (including using it yourself to do your work) something has to give. Therefore the graphics card sometimes has to wait around for its turn to get to main memory, as your from-the-hip experiment demonstrated. ;-)
38) Message boards : Number crunching : MW Separation Modified Fit (Message 59046)
Posted 22 Jun 2013 by Alinator
Post:
Yes, problem solved.
39) Message boards : Number crunching : new app_config.xml for the new applications? (Message 59020)
Posted 21 Jun 2013 by Alinator
Post:
LOL...

Don't worry about it. I'm totally used to trying to do the next to impossible with almost no funding and/or staffing. So I feel your pain there! ;-)

In any event, if it was easy, that would take all the sport out of it and everybody would be doing it. Right? :-D
40) Message boards : Number crunching : Get 'Waiting to Run' when running a 'MilkyWay@Home N-Body Simulation 1.18 (mt)' WU (Message 59014)
Posted 21 Jun 2013 by Alinator
Post:
The issue here is due to the way the CC handles the scheduling of MT tasks on the CPU.

The problem is that since you have your General prefs set to use all the cores, BOINC won't start the MT task until it has all cores available to use on the task. If ANYTHING needs/must use the CPU, it will suspend the MT task until it can have all of it again.

Unfortunately, you cannot use app_config to modify this behavior at this time.

So your options are:

1.) Change your general prefs to not use all the cores.

2.) Override the general prefs locally to not use all the cores.

3.) Limit nBody to use fewer cores with an app_info file. Of course you'll have to set up all your other MW apps as well.


©2024 Astroinformatics Group