Welcome to MilkyWay@home

Posts by Alinator

1) Message boards : Number crunching : MW Separation Modified Fit (Message 59452)
Posted 24 Jul 2013 by Alinator
Post:
Hmmm...

Are you using all the CPU cores to run tasks? I found I have to leave one free to drive MW, Collatz, and SAH to max GPU utilization. That gives me 98-99% for MW and Collatz, and generally better than 95% for SAH (IOW, BOINC is restricted to 7 of 8 cores on the FX-8350).
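For anyone wanting to try the "leave one core free" setup, here is a minimal sketch of the arithmetic, assuming the limit gets entered as a percentage of processors in the computing preferences. The function name is made up for illustration only.

```python
def cpu_percent_leaving_free(total_cores: int, free_cores: int = 1) -> float:
    """Percentage of cores to allow so that `free_cores` stay idle."""
    if not 0 <= free_cores < total_cores:
        raise ValueError("free_cores must be between 0 and total_cores - 1")
    return 100.0 * (total_cores - free_cores) / total_cores


# 7 of 8 cores, as on the FX-8350 mentioned above
print(cpu_percent_leaving_free(8))  # 87.5
```

On a quad core the same calculation gives 75%.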

EAH is a different story. Apparently, it needs far more CPU support for the GPU, so I have it spec'ed so that when more than 4 want to run it will grab another CPU to help out, and even then I can only get to about 85% by running 4 each on my dual 7970's.

<edit> Oh yeah, the MW, Collatz, and SAH data was for running 4 each on the duals, as well.
2) Message boards : Number crunching : Computation Error (de_modfit_09_3s_testwrap_2_1372784655_3079055) (Message 59428)
Posted 23 Jul 2013 by Alinator
Post:
Well...

All the errors except for one were on nBody tasks. There have been some issues with the new app for these, most of which have been taken care of at this point. However, there appears to be at least one which is resisting being eradicated. I'm starting to wonder if it might be a bug in OpenMP, since it gets both Windows and Linux hosts (maybe Mac too, but you don't see as many of them). In any event, yours wasn't the only one to crap out on the tasks in question, so I wouldn't worry about those.

The regular MW Sep task which failed looked to be a GPU memory allocation failure. Since there was only one of them, I wouldn't worry too much about it either. Unfortunately, my experience is that nVidia drivers can be a little cranky at times, and I've had failures like that from time to time for no apparent reason I could determine. I suspect it's some kind of timing issue going on when one task ends and another is beginning, at the same time the OS says "Oh gee, I can get some time on the GPU at last!". Needless to say, something like that is going to be pretty tough to nail down, unless you have set up a pretty specialized test to trap on and catch it.

As far as the Inconclusives go, it just means the output of the task was outside the range the validator was expecting, so it sends the task out again to verify the result. As long as you aren't the odd man out when the WU validates, it's normal and nothing to be concerned about.

HTH
3) Message boards : Number crunching : Please assist - can't get Radeon HD 4850 recognized in Linux (Message 59381)
Posted 16 Jul 2013 by Alinator
Post:
Hmmm...

That's interesting. I didn't think the 2.7 SDK would work on a 4000 series, or at least not 'officially'. ;-)
4) Message boards : News : N-Body 1.32 (Message 59336)
Posted 12 Jul 2013 by Alinator
Post:
This one doesn't look good. :-(

<edit> Although after looking further into it, this host doesn't have any successes showing for any version of nBody. So most likely the problem here is specific to this host, possibly the version and/or configuration of Linux it's running.
5) Message boards : Number crunching : Updated GPU Requirements (Currently not supporting GPU tasks) (Message 59335)
Posted 12 Jul 2013 by Alinator
Post:
Answer
6) Message boards : News : N-Body 1.32 (Message 59333)
Posted 12 Jul 2013 by Alinator
Post:
Bad task here and here.

Also, there was a Win 7 (B 7.0.28) host assigned to the default plan_class on the former.
7) Message boards : Number crunching : N-Body MT issues (Message 59330)
Posted 12 Jul 2013 by Alinator
Post:
Here's a case of Server 2008 R2 being assigned to the default plan_class.

Note that BOINC (7.0.64) directed it to run ST and OpenMP said "OK, if you insist!". ;-)
8) Message boards : Number crunching : Computation errors (Message 59327)
Posted 12 Jul 2013 by Alinator
Post:
Hmmm...

Interesting idea. I hadn't considered that with an HT Intel, at least not with the way it got implemented in the current generation.

Like you said though, this is a low-risk experiment, and nothing ventured, nothing gained. ;-)

One other pretty easy thing to try if that doesn't work would be to just disable HT in the BIOS and see what happens. If it stops failing at startup, then it could indicate an MB BIOS flash might be in order.
9) Message boards : News : N-Body 1.18 (Message 59316)
Posted 11 Jul 2013 by Alinator
Post:
Cool...

Overall a lot of progress has been made on all fronts the last month and a half or so.

Not bad, especially considering it's summertime and you're most likely even more shorthanded than usual! ;-)
10) Message boards : News : N-Body 1.18 (Message 59310)
Posted 11 Jul 2013 by Alinator
Post:
Hmmm...

Yep, intermittent faults like that can be tough little nuts to crack. :-/

I must have been having a senior moment earlier. :-D

All of a sudden it dawned on me why running an MT capable app under the default plan_class is not desirable. The default plan_class explicitly tells BOINC the app is single threaded, so it treats it that way when it comes to task and work fetch scheduling. I remembered in our earlier experiments that on Win7/BOINC 7 setups you would get multiple instances running but they would run in ST mode, whereas on XP they would run as I described above. Most likely something to do with the mysterious ktmw32.dll file! ;-)
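To put that in concrete terms, here is a toy model of how many tasks a client would start depending on how many CPUs it believes each task needs. This is not BOINC's actual scheduler; the function name and the "MT task is budgeted every usable CPU" figure are assumptions for illustration.

```python
def tasks_started(usable_cpus: int, cpus_per_task: int, queued: int) -> int:
    """Toy model: start tasks until the CPUs the client thinks are in use run out."""
    if usable_cpus < 1 or cpus_per_task < 1 or queued < 0:
        raise ValueError("need at least one CPU, one CPU per task, and queued >= 0")
    return min(queued, usable_cpus // cpus_per_task)


# Default plan_class: the client is told each task is single threaded (1 CPU each).
print(tasks_started(usable_cpus=8, cpus_per_task=1, queued=10))  # 8

# MT plan_class: assume each task is budgeted all 8 usable CPUs.
print(tasks_started(usable_cpus=8, cpus_per_task=8, queued=10))  # 1
```

So under the default plan_class the client happily starts one copy per core, even though each copy of an MT app might try to use every core by itself.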
11) Message boards : News : N-Body 1.18 (Message 59304)
Posted 10 Jul 2013 by Alinator
Post:
Regarding recompiling without OpenMP, no, I don't think that's necessary. The BOINC default for the MT plan_class is to have a minimum of two cores (CPU's), and from experiments Richard and I ran earlier in this nBody beta, OpenMP doesn't have a problem running in ST mode if that's all that's available. So all you'd have to do to open nBody to single cores is change the default value to 1. Also, considering you still may have that intermittent stack overflow bug kicking around, recompiling and then having to debug a new app sounds like extra work you really don't need at the moment.

As far as dealing with older OS'es, that's probably better off handled at the individual user level at this point. For example, a really easy way to work around part of the problem on XP is to just limit the number of cores that BOINC can use to something less than the maximum available. Nowadays a lot of people have high performance GPU's onboard and have already figured out leaving one core free to feed it gives the best overall performance and that takes care of the GPU problem regardless of OS. This is probably the main reason there hasn't been more complaining about nBody in NC than there has been considering there's still a lot of XP hosts in the field.

The issue with running more than one nBody task at a time is a matter of what version of BOINC you're running as much as OS version. On Vista and higher, when run under the MT plan_class and BOINC 7x this doesn't happen. On XP (on BOINC 6.12.34), once I went to the anonymous platform and specified nBody to use 1 less than the max number of CPU's it stopped trying to run more than one at a time, even when it had more than one in the queue. I'm not quite sure why it did that, but that's been my observation so far. I haven't tried to see if BOINC 7x helps with MT issues on XP yet.

So basically on these matters, IMO, this is something which really needs to be addressed at the BOINC level. Unfortunately, you are the only project currently using a multi-threaded app I'm aware of, so there hasn't been much pressure on the BOINC dev team to look into or do anything to improve matters when it comes to issues with MT CPU apps.
12) Message boards : Number crunching : Computation errors (Message 59299)
Posted 10 Jul 2013 by Alinator
Post:
OK, it looks like your basic BOINC and project configuration is correct.

What I would suggest now is to add a specific exclusion in Bitdefender for both the main BOINC directory and the BOINC Data directory. This will rule out the AV scanner slowing down the loading and initializing of the task to the point where the DLL's aren't available in time.

If that doesn't do the trick, the only other thing which comes to mind at the moment would be to try uninstalling and reinstalling BOINC. I know that sounds unlikely in this case, but I've had times where I gave it a try in desperation and it worked! ;-)
13) Message boards : Number crunching : Computation errors (Message 59294)
Posted 9 Jul 2013 by Alinator
Post:
Ok.. I'm new to this whole computing model using BOINC, so help me out with what .dll s I'm supposed to find and where would they be located, please?

Here are the messages from the BOINC monitor Event Log:

<snip BM Event Log>



All right. The nbody dll's you need to have are listed in this post, and they will be located in the MW project directory, which in your case is located here:

C:\ProgramData\BOINC\projects\milkyway.cs.rpi.edu_milkyway

Note for future reference, the 'relative' root directory for BOINC is shown in the startup messages in the Event Log (7/6/2013 12:38:03 PM | | Data directory: C:\ProgramData\BOINC).
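If you'd rather not hunt for it by eye, here is a small sketch that pulls the data directory out of that startup line and builds the project directory from it. The regular expression and function name are my own for illustration, and it assumes the line format shown above.

```python
import re
from pathlib import PureWindowsPath

# Matches the "Data directory: <path>" part of a startup line in the Event Log.
DATA_DIR_RE = re.compile(r"\|\s*Data directory:\s*(.+?)\s*$")


def project_dir_from_log(line: str,
                         project: str = "milkyway.cs.rpi.edu_milkyway") -> PureWindowsPath:
    """Return <data directory>\\projects\\<project> for a startup log line."""
    match = DATA_DIR_RE.search(line)
    if match is None:
        raise ValueError("no 'Data directory:' entry in this line")
    return PureWindowsPath(match.group(1)) / "projects" / project


line = r"7/6/2013 12:38:03 PM |  | Data directory: C:\ProgramData\BOINC"
print(project_dir_from_log(line))
# C:\ProgramData\BOINC\projects\milkyway.cs.rpi.edu_milkyway
```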

Note these two output lines:

7/7/2013 7:08:31 AM | Milkyway@Home | Starting task de_nbody_06_20_dark_1372784655_214185_0 using milkyway_nbody version 118 (mt) in slot 9
7/7/2013 7:08:33 AM | Milkyway@Home | Computation for task de_nbody_06_20_dark_1372784655_214185_0 finished

This explains why you aren't seeing any runtime. BOINC is loading the task into its slot, but as soon as it commands the application to run the task, the application exits with the (0xffffffffc0000135) error code.
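As a quick sanity check on those two observations, the sketch below masks the reported code down to its low 32 bits and measures the gap between the two log timestamps. The value 0xC0000135 is the Windows NTSTATUS code for a DLL that could not be found; the helper names and the timestamp format string are assumptions based on the lines quoted above.

```python
from datetime import datetime

STATUS_DLL_NOT_FOUND = 0xC0000135  # Windows NTSTATUS: a required DLL was not found


def low_32_bits(code: int) -> int:
    """Drop the sign-extension bits from a 64-bit rendering of the exit code."""
    return code & 0xFFFFFFFF


def seconds_between(start: str, end: str) -> float:
    """Elapsed seconds between two Event Log timestamps (assumed US-style format)."""
    fmt = "%m/%d/%Y %I:%M:%S %p"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()


code = low_32_bits(0xFFFFFFFFC0000135)
print(hex(code), code == STATUS_DLL_NOT_FOUND)  # 0xc0000135 True
print(seconds_between("7/7/2013 7:08:31 AM", "7/7/2013 7:08:33 AM"))  # 2.0
```

Two seconds between "Starting task" and "finished" is consistent with the app never getting off the ground.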

So the next step is to verify the dll's are there, and if not follow the instructions in the linked to post. You can skip the part about reposting the event log in case this step doesn't work.
14) Message boards : News : Separation Modified Fit v1.24 (Message 59280)
Posted 8 Jul 2013 by Alinator
Post:
Roger that. Thanks.
15) Message boards : News : Separation Modified Fit v1.24 (Message 59277)
Posted 8 Jul 2013 by Alinator
Post:
LOL...

DOHHHH.... I stand corrected! That's what I get for not staying fully up to date over on the BOINC boards. :-)

I guess it would be too much to ask if Dr. Anderson could add the <max_concurrent> element to the anonymous platform, or at least add the ability to reread all app_info files on the fly, like you can do for the local prefs. ;-)

<edit> Is the rereading of app_config automatic, or do you have to use the Read Config File command or something like that?
16) Message boards : News : Separation Modified Fit v1.24 (Message 59275)
Posted 8 Jul 2013 by Alinator
Post:
Agreed. So in that case (which is what I'm doing) you may as well get rid of app_config since there isn't anything in it you can't do with app_info, and it's one less thing to worry about getting and keeping right. ;-)

<edit> IMHO, they got the precedence backwards simply because app_info is the more 'powerful' tool. But then what do I know? :-D
17) Message boards : News : Separation Modified Fit v1.24 (Message 59272)
Posted 8 Jul 2013 by Alinator
Post:
Unfortunately, to work around BOINC limitations in the way it handles multi-threaded CPU applications (nBody), you currently have to use the anonymous platform. Once you do that, the app_config file is ignored.

So it's all or nothing in the app_info file.

18) Message boards : Number crunching : "Validation inconclusive" what does that mean and how to prevent this ? (Message 59269)
Posted 8 Jul 2013 by Alinator
Post:
Basically all it means is that the results from the task were outside the range the validator was expecting, so it sends the task out again to a different host to make sure there wasn't a problem with the first one. Sometimes it even takes a couple of tries before the validator is convinced the results were accurate.
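For a rough picture of the resend loop, here is a toy sketch. It is not the project's actual validator (I don't know its internals); it simply assumes a result is accepted once enough returned results agree with each other within a tolerance, and the names and the tolerance are made up.

```python
import math
from typing import Sequence


def agree(a: float, b: float, rel_tol: float = 1e-9) -> bool:
    """Two results 'agree' if they match within a relative tolerance."""
    return math.isclose(a, b, rel_tol=rel_tol)


def validate(results: Sequence[float], quorum: int = 2) -> str:
    """Return 'valid' once `quorum` results agree, otherwise 'inconclusive'."""
    for candidate in results:
        # Count how many returned results (including this one) match the candidate.
        matches = sum(agree(candidate, other) for other in results)
        if matches >= quorum:
            return "valid"
    return "inconclusive"


print(validate([1.0]))            # inconclusive: nothing to compare against yet
print(validate([1.0, 1.5]))       # inconclusive: the two hosts disagree
print(validate([1.0, 1.5, 1.0]))  # valid: a third host settles it
```

In this model the host that returned 1.5 is the "odd man out" once the third result arrives.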
19) Message boards : Number crunching : Computation errors (Message 59267)
Posted 7 Jul 2013 by Alinator
Post:
OK, a couple of points here.

First, given the error code for the failing nBody tasks, I'm pretty sure the zero runtime is accurate. The reason, as I mentioned before, is that the error code typically refers to a missing dll. So did you check to make sure both of them are in the project directory?

Second, apparently BoincTasks does some kind of interpretation/filtering of the stdoutdae output before displaying it in its GUI. What I wanted was the verbatim output of the core client, which will be displayed in BOINC Manager's Event Log viewer.
20) Message boards : Number crunching : AMD GPU Computation errors (Message 59260)
Posted 6 Jul 2013 by Alinator
Post:
Agreed. But OS differences aside, illegal memory access fault outs have been a problem for both when using 13.x AMD drivers. :-(

In fact, none of my hosts have liked any Cat 13.x driver particularly, or 12.10 for that matter. :-/


