Welcome to MilkyWay@home

Request help updating app_info.xml for Linux

Message boards : Number crunching : Request help updating app_info.xml for Linux

swiftmallard
Joined: 18 Jul 09
Posts: 300
Credit: 303,562,776
RAC: 0
Message 58283 - Posted: 15 May 2013, 22:33:30 UTC

I had a similar problem. Eventually I got tired of trying to figure it out and just let it run, and the second WU started some time after the first. I suggest using the app_config file again and letting it run for a while to see whether the second WU eventually starts.
ID: 58283 · Rating: 0
europa
Joined: 29 Oct 10
Posts: 89
Credit: 39,246,947
RAC: 0
Message 58286 - Posted: 16 May 2013, 11:57:12 UTC

I wanted to thank everyone for their suggestions.

What I've decided to do is to work off the existing WUs on all of the machines and upgrade the OS to the latest version (Linux Mint 14 from 12) since I see the latest BOINC (7.0.65) in that repository. After upgrading the BOINC installation and running for a few days, I'll see how things look.

Regards,
Steve
ID: 58286 · Rating: 0
mikey
Joined: 8 May 09
Posts: 3315
Credit: 519,943,543
RAC: 22,328
Message 58289 - Posted: 16 May 2013, 12:06:10 UTC - in response to Message 58286.  

I wanted to thank everyone for their suggestions.

What I've decided to do is to work off the existing WUs on all of the machines and upgrade the OS to the latest version (Linux Mint 14 from 12) since I see the latest BOINC (7.0.65) in that repository. After upgrading the BOINC installation and running for a few days, I'll see how things look.

Regards,
Steve


Can't you just upgrade Boinc without having to upgrade the whole OS?
ID: 58289 · Rating: 0
mmstick
Joined: 23 Nov 09
Posts: 29
Credit: 17,119,258
RAC: 0
Message 58292 - Posted: 16 May 2013, 13:24:34 UTC

You should be using gedit to make the file, and you want to make sure it is properly aligned/formatted. You also need to make sure it has proper permissions. I'm not sure why, but when I copied an identical set of parameters from the forum which removes all the formatting, it didn't work. However, after formatting/aligning it properly it worked. The moral of the story is to write it yourself rather than copy it from here and paste it.
ID: 58292 · Rating: 0
europa
Joined: 29 Oct 10
Posts: 89
Credit: 39,246,947
RAC: 0
Message 58309 - Posted: 17 May 2013, 11:08:07 UTC - in response to Message 58289.  

Actually, it's not a big deal since these are Boinc-only machines, so it's not a case of having to do any backups or re-copying of databases etc. Once the system is re-installed, that will also eliminate any compatibility questions among versions that may have crept in over the years. Also, as one person noted, with Linux you have to watch the file permissions, so this will let me start with a clean slate.

It looks like the final Cosmology wu's will be done tomorrow so we'll see what happens then.

Regards,
Steve
ID: 58309 · Rating: 0
europa
Joined: 29 Oct 10
Posts: 89
Credit: 39,246,947
RAC: 0
Message 58310 - Posted: 17 May 2013, 11:10:27 UTC - in response to Message 58292.  

I do use gedit/Pluma (same thing). Good point on the file permissions issue. These installations are years old, and something may have gotten changed through the updates. Doing a fresh re-install will let me start off clean.

Regards,
Steve
ID: 58310 · Rating: 0
mikey
Joined: 8 May 09
Posts: 3315
Credit: 519,943,543
RAC: 22,328
Message 58319 - Posted: 18 May 2013, 12:21:47 UTC - in response to Message 58309.  

Actually, it's not a big deal since these are Boinc-only machines, so it's not a case of having to do any backups or re-copying of databases etc.
Regards,
Steve


I have a batch of Windows-based BOINC-only machines too, and I still back them up once a month by making an image of the system. I back up all of my machines, most onto one machine, but two internally as they are more important to me. I use a Windows program called Macrium Reflect; it has few options and can be a bit ornery at times, but usually works without a hitch. It does NOT do different versions as updates are done; it is a full image every time. On most of my machines it takes less than an hour once a month. Then, with a generic Linux CD, I can put a new drive in a machine and have it restored and up and running in less than an hour, just where it was when it died. I reuse old hard drives, so this lets me get back up and running quickly when they crash. I was thinking there should be a small FREE Linux program to do the same for you. I back up everything to a 2TB drive and then go through and delete the older ones regularly. I only keep the latest backup since, as you said, they are BOINC-only machines.
ID: 58319 · Rating: 0
europa
Joined: 29 Oct 10
Posts: 89
Credit: 39,246,947
RAC: 0
Message 58332 - Posted: 19 May 2013, 14:57:47 UTC

I've gotten my two BOINC machines changed over to Linux Mint 15 and BOINC 7.1 and wanted to provide an update as well as ask one or two final questions. I have had some success.

To recap, all graphics cards are Nvidia Fermis and each machine has at least 8GB of RAM.

Machine 1 has the following 2 Fermi cards:
Sun 19 May 2013 05:24:58 AM EDT | | CUDA: NVIDIA GPU 0: GeForce GTX 460 (driver version unknown, CUDA version 5.0, compute capability 2.1, 767MB, 691MB available, 1009 GFLOPS peak)

Sun 19 May 2013 05:24:58 AM EDT | | CUDA: NVIDIA GPU 1: GeForce GT 430 (driver version unknown, CUDA version 5.0, compute capability 2.1, 1024MB, 1009MB available, 269 GFLOPS peak)

Sun 19 May 2013 05:24:58 AM EDT | | OpenCL: NVIDIA GPU 0: GeForce GTX 460 (driver version 304.88, device version OpenCL 1.1 CUDA, 767MB, 691MB available, 1009 GFLOPS peak)

Sun 19 May 2013 05:24:58 AM EDT | | OpenCL: NVIDIA GPU 1: GeForce GT 430 (driver version 304.88, device version OpenCL 1.1 CUDA, 1024MB, 1009MB available, 269 GFLOPS peak)

The system is processing two MW 1.02 cuda WUs concurrently, so success there. However, no matter what I do with the cc_config.xml or the app_config.xml, I can't process more than one N-Body Simulation 1.09 cuda WU at a time. In fact, it looks like it only sends them to GPU 1, which has more video RAM, as if that might be required. It makes me wonder whether the cuda N-Body Simulation GPU WUs are so computationally intensive that you can't run multiple 1.09 WUs simultaneously.

I made sure that all file ownerships were the same (BOINC) so that's not an issue.
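
For anyone who wants to double-check ownership and permissions without eyeballing `ls -l`, here is a minimal Python sketch. The function names `owner_and_mode` and `report` are made up for illustration, and the example path in the note below is an assumption about where a packaged BOINC keeps its data; adjust it to your own installation.

```python
import os
import pwd
import stat


def owner_and_mode(path):
    """Return (owner name, octal permission string) for one file."""
    info = os.stat(path)
    try:
        owner = pwd.getpwuid(info.st_uid).pw_name
    except KeyError:
        # No passwd entry for this uid; fall back to the numeric id.
        owner = str(info.st_uid)
    mode = format(stat.S_IMODE(info.st_mode), "03o")
    return owner, mode


def report(paths):
    """Describe ownership and permissions for each path that exists."""
    lines = []
    for path in paths:
        if os.path.exists(path):
            owner, mode = owner_and_mode(path)
            lines.append(f"{path}: owner={owner} mode={mode}")
        else:
            lines.append(f"{path}: not present")
    return lines
```

Calling `report([...])` with the full paths of cc_config.xml, app_config.xml and app_info.xml (for example under `/var/lib/boinc-client`, which is only a guess for a Debian-style package) returns one line per file, so a file owned by the wrong user stands out immediately.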

My first app_config.xml was:
<app_config>
    <app>
        <name>milkyway</name>
        <gpu_versions>
            <gpu_usage>.5</gpu_usage>
            <cpu_usage>.05</cpu_usage>
        </gpu_versions>
    </app>
</app_config>

When that didn't make a difference with the 1.09's, I remembered from the app_info.xml's that the N-Body's had their own section and file name so I changed the app_config.xml to:
<app_config>
    <app>
        <name>milkyway</name>
        <gpu_versions>
            <gpu_usage>.5</gpu_usage>
            <cpu_usage>.05</cpu_usage>
        </gpu_versions>
    </app>
        <name>milkyway_nbody</name>
        <gpu_versions>
            <gpu_usage>.5</gpu_usage>
            <cpu_usage>.05</cpu_usage>
        </gpu_versions>
    </app>
</app_config>

No errors noted by BOINC, but also no change to the processing of 1.09's, unless it takes some time to cycle through the system. If so, that makes it hard to troubleshoot problems, since changes don't show up fairly quickly.

I'm going to run down the cache and then let it re-load one more time. If it's still not running multiple concurrent 1.09 wu's I may stop doing the N-Body's since it really ties up the GPU and not only denies it to the MW 1.02 cuda wu's but also to E@H.

Situation is the same with machine #2 which is a Fermi 550TI with 2GB of VRAM.

Thanks again to everyone for their suggestions.

Regards,
Steve
ID: 58332 · Rating: 0
Richard Haselgrove
Joined: 4 Sep 12
Posts: 219
Credit: 456,474
RAC: 0
Message 58333 - Posted: 19 May 2013, 15:27:15 UTC - in response to Message 58332.  

You're missing an opening <app> to match the second closing </app>, in the second file.

N-Body apps don't use GPUs at all (despite what it says), so you're missing nothing by not running two at once.
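
Since BOINC reported no error for that file, a stricter check outside the client can help catch this kind of slip. Below is a minimal sketch using Python's standard `xml.etree.ElementTree`; the helper name `check` is made up for illustration, and the "fixed" text simply adds the missing opening `<app>`. Whether BOINC's own parser is as strict as this is not something the sketch assumes.

```python
import xml.etree.ElementTree as ET

# The second app_config.xml exactly as posted (missing one opening <app>).
BROKEN = """<app_config>
<app>
<name>milkyway</name>
<gpu_versions>
<gpu_usage>.5</gpu_usage>
<cpu_usage>.05</cpu_usage>
</gpu_versions>
</app>
<name>milkyway_nbody</name>
<gpu_versions>
<gpu_usage>.5</gpu_usage>
<cpu_usage>.05</cpu_usage>
</gpu_versions>
</app>
</app_config>"""

# Same text with the missing opening tag restored.
FIXED = BROKEN.replace(
    "<name>milkyway_nbody</name>",
    "<app>\n<name>milkyway_nbody</name>",
)


def check(xml_text):
    """Return the <name> of every <app>; raises ET.ParseError if malformed."""
    root = ET.fromstring(xml_text)
    return [app.findtext("name") for app in root.findall("app")]


if __name__ == "__main__":
    for label, text in (("posted", BROKEN), ("fixed", FIXED)):
        try:
            print(label, "->", check(text))
        except ET.ParseError as err:
            print(label, "-> not well-formed:", err)
```

The posted version fails to parse at the stray `</app>`, while the fixed version lists both app names. Given that N-Body does not use the GPU, the `milkyway_nbody` section could arguably be dropped altogether.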
ID: 58333 · Rating: 0
europa
Joined: 29 Oct 10
Posts: 89
Credit: 39,246,947
RAC: 0
Message 58336 - Posted: 19 May 2013, 20:26:15 UTC - in response to Message 58333.  

Richard,

Thanks for the correction, it's been incorporated.

I have to ask, are you really sure about no GPU WUs for the N-Body Simulation 1.09? I've not only seen a notation of (opencl_nvidia) after some of the WU's, but the one that's waiting also says (0.05 CPU + 1 NVIDIA GPU). It's waiting right now on some E@H GPU WUs to run, but I could have sworn that when it was running none of the available E@H were.

Also, I was wondering why the N-Body WU's 1.09 that are not marked as cuda all seem to have very short expected durations (most under an hour) while the cuda-marked ones are all well over 20 and even 30 hrs! I almost felt like I was having a Cosmology@Home flashback!

Thanks for any further info you can provide. Enquiring minds want to know! :)

Regards,
Steve
ID: 58336 · Rating: 0
Len LE/GE
Joined: 8 Feb 08
Posts: 261
Credit: 104,050,322
RAC: 0
Message 58338 - Posted: 19 May 2013, 22:32:05 UTC

nbody exes are multithreaded but CPU only. Why there are still additional exes linked for Linux that indicate they would make use of the GPU has been the question here for some time.

The app_config defines how many WU's of one app can run on the gpu at the same time. Trying to use it for a cpu only app can only mess things up.
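
As a rough illustration of what those numbers mean, here is a small Python sketch. It assumes a simplified model in which BOINC keeps starting tasks on a GPU until their `gpu_usage` values add up to 1, and it ignores video memory and everything else the scheduler considers. The function names are made up for this example.

```python
import math


def tasks_per_gpu(gpu_usage):
    """How many tasks fit on one GPU if each claims `gpu_usage` of it."""
    if not 0 < gpu_usage <= 1:
        raise ValueError("gpu_usage must be in (0, 1]")
    # Small epsilon guards against float rounding, e.g. 1 / 0.1.
    return math.floor(1.0 / gpu_usage + 1e-9)


def cpu_reserved(gpu_usage, cpu_usage, n_gpus=1):
    """CPU fraction budgeted for all GPU tasks running at once."""
    return tasks_per_gpu(gpu_usage) * n_gpus * cpu_usage


if __name__ == "__main__":
    # Values from the app_config.xml posted earlier in the thread.
    print(tasks_per_gpu(0.5))            # tasks per GPU
    print(cpu_reserved(0.5, 0.05, 2))    # CPU budgeted across two GPUs
```

Under this model the posted `<gpu_usage>.5</gpu_usage>` allows two separation tasks per card, which matches what was observed for the MW 1.02 tasks.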

Since I think you can't use app_config and app_info at the same time (someone correct me if I am wrong), I fear you have to use the app_info again to make settings for separation and nbody.
The nbody part in your app_info needs some cleanup and correction:

<app>
    <name>milkyway_nbody</name>
</app>
<file_info>
    <name>milkyway_nbody_1.09_x86_64-pc-linux-gnu__mt</name>
    <executable/>
</file_info>
<app_version>
    <app_name>milkyway_nbody</app_name>
    <version_num>1.09</version_num>
    <plan_class>mt</plan_class>
    <avg_ncpus>4</avg_ncpus>
    <max_ncpus>4</max_ncpus>
    <cmdline>--nthreads=4</cmdline>
    <file_ref>
        <file_name>milkyway_nbody_1.09_x86_64-pc-linux-gnu__mt</file_name>
        <main_program/>
    </file_ref>
</app_version>

The <avg_ncpus> and <max_ncpus> lines are for BOINC to know how many CPUs to reserve, and the <cmdline> line tells nbody how many CPUs to use. So the setting above would use 4 CPUs for nbody.
If you reduce those three lines to <cmdline></cmdline> only, it will run single threaded (1 WU per CPU).
With separation and nbody properly defined in the app_info, you should be able to run 2 separation tasks on your GPU and, at the same time, nbody tasks single or multithreaded on your CPUs.

Note 1: I am running Win and haven't crunched a nbody WU for some time.
Note 2: Maybe you need to change the commandline above
to <cmdline>--nthreads=4 --disable-opencl</cmdline>
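
To avoid hand-editing three numbers that must agree, the <app_version> part can be generated. This is only a sketch in Python: the function name `nbody_app_version` is made up, the tag values are copied from the fragment above, and the `--disable-opencl` flag is the unconfirmed suggestion from Note 2, so it is off by default.

```python
import xml.etree.ElementTree as ET

# Executable name as given in the app_info fragment above.
EXE = "milkyway_nbody_1.09_x86_64-pc-linux-gnu__mt"


def nbody_app_version(threads, disable_opencl=False):
    """Build the <app_version> text for `threads` CPU threads."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    lines = [
        "<app_version>",
        "    <app_name>milkyway_nbody</app_name>",
        "    <version_num>1.09</version_num>",
        "    <plan_class>mt</plan_class>",
    ]
    flags = []
    if threads > 1:
        # Reservation for BOINC and thread count for nbody must match.
        lines.append(f"    <avg_ncpus>{threads}</avg_ncpus>")
        lines.append(f"    <max_ncpus>{threads}</max_ncpus>")
        flags.append(f"--nthreads={threads}")
    if disable_opencl:
        flags.append("--disable-opencl")  # unconfirmed flag from Note 2
    lines.append(f"    <cmdline>{' '.join(flags)}</cmdline>")
    lines += [
        "    <file_ref>",
        f"        <file_name>{EXE}</file_name>",
        "        <main_program/>",
        "    </file_ref>",
        "</app_version>",
    ]
    fragment = "\n".join(lines)
    ET.fromstring(fragment)  # raises ParseError if the text is malformed
    return fragment


if __name__ == "__main__":
    print(nbody_app_version(4))
```

With `threads=1` the sketch emits only an empty `<cmdline></cmdline>`, which is the single-threaded case described above. The result still has to be pasted into a complete app_info.xml alongside the `<app>` and `<file_info>` sections.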
ID: 58338 · Rating: 0
Richard Haselgrove
Joined: 4 Sep 12
Posts: 219
Credit: 456,474
RAC: 0
Message 58339 - Posted: 19 May 2013, 23:13:51 UTC - in response to Message 58336.  

Thanks for any further info you can provide. Enquiring minds want to know! :)

I'm afraid the only further information I can provide is a private message exchange which took place on 2 April 2013, the day the most recent Linux applications were deployed on the server.

I received an acknowledgement and thanks (which I'm not making public) for my reply, but I have seen no sign of any change to the N-Body deployment in the seven weeks since this exchange. I'll leave it to speak for itself, without comment.

When I started releasing the format was to use gpu classes and I kept the convention. Since the idea is to get a gpu version of nbody out I thought it was a place holder for turning on that option. I have been asking the people who originated this convention for their ideas on the best way to proceed. I am hoping to deprecate the gpu versions until we move past testing on the gpu versions. I will update the list when we change the format.

Jeff Thompson

Jeff, I'm sorry, but there seems to be a very deep misunderstanding there.

What you are calling a 'placeholder' is actually a sophisticated command-and-control structure which directs which parts of a volunteer's computer are made available for your research (and hence are not available for other scientists, with other research projects, to use): and conversely, which parts of that volunteer's computer are likely to be fully occupied doing other things.

I'm exclusively a Windows person myself, but it appears that the Linux situation is the clearest example of where this is going wrong.

Reading jay_e's posts on the message boards, your 'placeholder' is telling his computer that your application is going to use 0.05 CPUs + 1 ATI GPU: so that's what resources are being held in reserve for you to use, and - by implication - the only resources jay_e is making available (freely and gratis) for your research - at that instant, for the N-Body study.

Instead, the N-Body application is unilaterally using multiple CPU cores - irrespective of any other work they might have been assigned by other researchers - and is completely ignoring the ATI GPU which has been reserved - at your request - for it to use. That's a waste.

BOINC is a collaborative platform. You get, for free, access to thousands of personal computers (some of them with the newest and most powerful computing resources available): you don't even have to fund the electricity supply to run them. In return, you are expected to behave like a polite and responsible 'good neighbor': only use the resources that you have, in advance, declared that you would like access to, and don't hoard resources that you have no current need for.

That's the whole point of the 'plan_class' mechanism (the thing you're treating as a placeholder). You are, in effect, 'programming' BOINC: giving it a set of instructions for how it is going to control the volunteer's computer. The idea is that you describe, as exactly as possible, how your application is going to behave in the volunteer environment: BOINC then uses your formal description of that behaviour to schedule your task alongside (either spatially or temporally) the other research that the volunteer and his/her computer are participating in.

We make our computers available to you on trust: on trust that you will use them in ways and for purposes as described on Milkyway's website. Milkyway has described N-Body as a multi-threaded, pure CPU application (at least for the time being). So long as that remains the case, please respect our trust, and repay it by giving BOINC an accurate description, version by version, of how each application is intended and expected to run on our computers.

Richard Haselgrove

ID: 58339 · Rating: 0
europa
Joined: 29 Oct 10
Posts: 89
Credit: 39,246,947
RAC: 0
Message 58343 - Posted: 20 May 2013, 14:14:04 UTC - in response to Message 58338.  

Thanks for the feedback. I've modified my app_info.xml and will give it a try. Right now the two machines are using the app_config.xml from the recent re-installation.

Thanks again for the help.

Regards,
Steve
ID: 58343 · Rating: 0
europa
Joined: 29 Oct 10
Posts: 89
Credit: 39,246,947
RAC: 0
Message 58345 - Posted: 20 May 2013, 14:33:34 UTC - in response to Message 58343.  

Thank you for providing the backstory. I certainly share your sentiments on a number of levels. As an interested supporter, but not a programmer, who has dedicated some decent boxes to BOINC, I want my spending to be used in the most efficient manner that supports all of the projects that I participate in. It may not amount to much from a NASA grant perspective, but it's something to me and I'm happy to provide it as long as the projects hold up their end.

I guess that's why I'm so interested in being able to process multiple simultaneous WU's, whether it's through app_config or app_info files. If a project app won't support that and then wants to tie up my GPU with its 2GB of VRAM but is only using 300MB of it, and in the process denies the GPU's concurrent use to other projects, that hurts everybody in terms of wasted resources.

Thanks again for the feedback.

Regards,
Steve

ID: 58345 · Rating: 0

©2024 Astroinformatics Group