Message boards :
Number crunching :
Need help with linux and app_info
Message board moderation
Send message Joined: 26 May 20 Posts: 23 Credit: 669,967,409 RAC: 17,237 |
Hi, I have 2 GPUs installed on an i9-9900k system and BOINC 7.16.3:
CUDA: NVIDIA GPU 1: GeForce GTX 1660 Ti (driver version 440.31, CUDA version 10.2, compute capability 7.5, 4096MB, 3972MB available, 5668 GFLOPS peak)
|
Send message Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 6 |
Hi, It looks like both GPUs are being used, and BOINC is using all the CPU cores it needs to do it. Since both GPUs are the same and are found by BOINC, you shouldn't need anything else to make them crunch. Now, if you want to crunch more than one task at a time, that's a different story and not what you asked about. |
Send message Joined: 26 May 20 Posts: 23 Credit: 669,967,409 RAC: 17,237 |
sorry, I would assume the benefit of having multiple GPUs was to crunch in parallel - otherwise, what's the point? I want both GPUs fully occupied all the time. As it is right now, only 1 GPU is in use at any given time. Thanks. |
Send message Joined: 27 Jul 14 Posts: 23 Credit: 921,261,826 RAC: 0 |
It sounds like you need to enable all GPUs in your cc_config.xml:

<use_all_gpus>0|1</use_all_gpus>

If 1, use all GPUs (otherwise only the most capable ones are used). Requires a client restart. The file should be in /var/lib/boinc. Edit it with a standard text editor, set use_all_gpus to 1, and make sure it's saved with a .xml extension, then restart BOINC. If cc_config.xml doesn't exist, create it via the manager: Options -> Event Log Options, then Save.

You don't need app_info for this case; that would normally be used if you compiled your own app. An app_config will work. Create it with a text editor and save it to the project data folder. This should be in /var/lib/boinc/projects/milkyway.cs.rpi.edu_milkyway.

<app_config>
  <app>
    <name>milkyway</name>
    <gpu_versions>
      <gpu_usage>0.49</gpu_usage>
      <cpu_usage>0.50</cpu_usage>
    </gpu_versions>
  </app>
</app_config>

This will run two tasks at a time. Adjust gpu_usage and cpu_usage depending on how many tasks you want to run. Make sure it's saved as app_config.xml. Just re-reading the config via the manager (Options -> Read Config Files) will start it working.

Finally, just to be clear about running in parallel: if you meant Crossfire or SLI, that doesn't work for any BOINC project. You can use all your GPUs, but they will run individually on separate tasks, not together on the same task.

Team USA forum | Team USA page Always crunching / Always recruiting |
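For readers adapting the app_config above to a different task count: gpu_usage is just the fraction of a GPU each task claims, so for n tasks per GPU you want roughly 1/n. A minimal sketch (the helper name and the generated values are my own illustration, not part of BOINC; cpu_usage here is only a scheduling hint, as discussed later in the thread):

```python
# Sketch: generate a MilkyWay app_config.xml for n concurrent tasks per GPU.
# gpu_usage = 1/n tells BOINC each task occupies 1/n of a GPU.
# cpu_usage is a scheduling hint only; it does not cap real CPU use.

def make_app_config(tasks_per_gpu: int, cpu_usage: float = 0.5) -> str:
    gpu_usage = 1.0 / tasks_per_gpu
    return f"""<app_config>
  <app>
    <name>milkyway</name>
    <gpu_versions>
      <gpu_usage>{gpu_usage:.2f}</gpu_usage>
      <cpu_usage>{cpu_usage:.2f}</cpu_usage>
    </gpu_versions>
  </app>
</app_config>"""

print(make_app_config(2))  # two tasks per GPU, as in the post above
```

The post uses 0.49 rather than an exact 0.50; either works, since BOINC only needs the value to be at most 1/n to schedule n tasks per device.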
Send message Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 6 |
"sorry, I would assume the benefit of having multiple GPUs was to crunch in parallel - otherwise what's the point?"

In BOINC you can use both GPUs at the same time, just not on the same workunit. In fact, if you have the SLI connector attached and do NOT game, it's best to take it off. To get both GPUs crunching at the same time, use a text editor and make a cc_config.xml file like this:

<cc_config>
  <options>
    <use_all_gpus>1</use_all_gpus>
  </options>
</cc_config>

Put that into the BOINC directory using your admin password, then stop and restart BOINC. If you don't know how to fully stop BOINC from the command line in Linux, just restart the PC and it should start using both GPUs. |
Send message Joined: 26 May 20 Posts: 23 Credit: 669,967,409 RAC: 17,237 |
Apparently there is still some confusion about GPU usage. I think people are overthinking the issue. Just consider the GPU as if it were another CPU core: it takes a special app, but it consumes one work unit at a time, processes it, spits out the result, and then gets another work unit. If you have two GPUs, e.g. two GTX 1660 Tis like me, then each GPU gets a work unit, and neither GPU knows or cares about the other, so you get two work units being processed at the exact same time. If you had six GPUs you'd be able to process six GPU-type work units in parallel.

Now, yes, you can run more than 1 WU on a GPU simultaneously, but you generally take a performance hit when you do - you need to actually test it to be sure. SLI makes 2 GPUs look like 1 and, as far as I have heard, has no performance benefit for the kind of computational work we do.

I tried the app_config you posted, but what I get then is 2 work units assigned to GPU 0 and none assigned to GPU 1. I think it needs some kind of device line added to it? |
Send message Joined: 26 May 20 Posts: 23 Credit: 669,967,409 RAC: 17,237 |
On startup BOINC reports: "[---] Config: use all coprocessors", so we are set there. |
Send message Joined: 27 Jul 14 Posts: 23 Credit: 921,261,826 RAC: 0 |
After reading the thread a bit more closely, the question seems to be why the 2nd GPU is detected but not being used. app_config and app_info are irrelevant in that context. Judging by this:

CUDA: NVIDIA GPU 0: GeForce GTX 1660 Ti
CUDA: NVIDIA GPU 1: GeForce GTX 1660 Ti

both cards are detected and both should work. This snippet from a job log, "Found 2 CL devices", shows that the MilkyWay app is seeing both cards, so I think we can rule out a driver problem, a weird OpenCL problem, or an exclusion in cc_config.

This is just a guess, but your CPU may not have an available thread to support a task on the second GPU. BOINC will typically overcommit the CPU when running GPU work. If you've set BOINC to use all 16 threads, it will run 16 CPU tasks and at least one more GPU task. I don't know how much CPU the Nvidia app schedules, but generally Nvidia OpenCL tasks take a full thread. I suggest reducing the number of threads BOINC can use and seeing if that solves the problem. Your GPU task run times are much longer than the corresponding CPU time; that's an indication the CPU is overtaxed, so reducing the load is a good idea just to help with that.

Another possibility is that a CPU project has gone into high-priority mode. If that's the case, it's likely keeping the 2nd GPU from running because BOINC is trying to get that work done before the deadline. If this is what's happening, usually the best thing to do is lower your work cache, i.e. "Store at least N days of work" / "Store additional N days", then give it some time to clear out.

I'd also remove the app_config until you get things working; you can delete or rename it and re-read the config files. |
Send message Joined: 26 May 20 Posts: 23 Credit: 669,967,409 RAC: 17,237 |
How do I set the number of CPUs that BOINC will use, as you mentioned? I want to try cutting it back and see if that gets both cards in use. I think we can tell right away if this is the issue: if I set BOINC to use say only 8 threads out of the 16 (temporarily, as an experiment), that leaves 4 threads per GPU, so if this is the issue I should see both GPUs in use right away. Sound right? |
Send message Joined: 24 Jan 11 Posts: 712 Credit: 552,201,415 RAC: 48,261 |
In the Computing Preferences in the Options menu in the Manager. In the Computing tab, select "Use at most 50% of cpus" and you will only use 8 threads out of your 16. Question - are you running 4 concurrent tasks per GPU? Seems unlikely for Nvidia cards. |
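The percentage-to-threads arithmetic behind the "Use at most X% of cpus" preference can be sketched as follows (a rough model with my own helper names, not BOINC's actual code; the client truncates the percentage down to a whole number of threads):

```python
import math

# Rough model of BOINC's "Use at most X% of CPUs" preference:
# the client uses floor(ncpus * pct / 100) threads. To leave `reserve`
# threads free for GPU support, pick the smallest whole percentage
# that still caps usage at ncpus - reserve.

def cpus_used(ncpus: int, pct: float) -> int:
    return math.floor(ncpus * pct / 100)

def pct_to_reserve(ncpus: int, reserve: int) -> int:
    return math.ceil((ncpus - reserve) * 100 / ncpus)

print(cpus_used(16, 50))      # 8 threads, as in the post above
print(pct_to_reserve(16, 2))  # 88% leaves 2 of 16 threads free
```

This matches the later advice in the thread: 14/16 = 87.5%, rounded up to 88%.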
Send message Joined: 26 May 20 Posts: 23 Credit: 669,967,409 RAC: 17,237 |
Hi Keith, I want to run 1 task per GPU. Right now I am running 1 task on 1 GPU and nothing on the 2nd GPU. |
Send message Joined: 26 May 20 Posts: 23 Credit: 669,967,409 RAC: 17,237 |
Hi again, I just set both systems to a percentage of CPU so that I would have 2 cores free, and after a while it did start using both GPUs. When I was running SETI we did this via an app_info (see below); the SETI app_info.xml allocated however much of a CPU you wanted (.45 in this case) to tend to a single GPU's needs. So far I have not found a Linux-based working app_info.xml for Milkyway that handles all the apps:

milkyway_1.46_x86_64-pc-linux-gnu__opencl_nvidia_101
milkyway_1.46_x86_64-pc-linux-gnu
<app_info>
  <app>
    <name>setiathome_v8</name>
  </app>
  <file_info>
    <name>setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda101</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>setiathome_v8</app_name>
    <platform>x86_64-pc-linux-gnu</platform>
    <version_num>801</version_num>
    <plan_class>cuda10.1</plan_class>
    <cmdline>-nobs</cmdline>
    <coproc>
      <type>NVIDIA</type>
      <count>1</count>
    </coproc>
    <avg_ncpus>.45</avg_ncpus>
    <ngpus>1</ngpus>
    <file_ref>
      <file_name>setiathome_x41p_V0.98b1_x86_64-pc-linux-gnu_cuda101</file_name>
      <main_program/>
    </file_ref>
  </app_version>
  <app>
    <name>setiathome_v8</name>
  </app>
  <file_info>
    <name>MBv8_8.05r3345_avx_linux64</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>setiathome_v8</app_name>
    <version_num>805</version_num>
    <platform>x86_64-pc-linux-gnu</platform>
    <plan_class>avx</plan_class>
    <cmdline></cmdline>
    <file_ref>
      <file_name>MBv8_8.05r3345_avx_linux64</file_name>
      <main_program/>
    </file_ref>
  </app_version>
</app_info> |
Send message Joined: 24 Jan 11 Posts: 712 Credit: 552,201,415 RAC: 48,261 |
Most people don't understand the use of <cpu_usage> values in the app_config or app_info files. That value can't limit the actual usage of the CPU thread in support of the GPU task; only the science GPU application itself determines how much CPU support the GPU task needs. Some applications need very little CPU support, the MW ATI application for example, but the MW Nvidia application uses almost a full CPU thread on the exact same tasks. The difference in the applications is what determines the actual CPU usage. The cpu_usage setting is only for BOINC scheduling purposes: it helps BOINC determine how much of its resources to allocate to each project and how much work can run simultaneously. In the case of your Seti special-sauce application, those tasks actually used almost a full CPU core to support the GPU task; all the 0.45 value did was free up a CPU thread to do something else, such as run another CPU task or another project's GPU task.

You would need to run an app_config with max_concurrent statements to control the Separation tasks, and most definitely you would need an nthreads statement to control and limit the mt tasks, which would otherwise commandeer all CPU threads and prevent the other applications from running. Read the BOINC documentation for examples of setting up the proper controls: https://boinc.berkeley.edu/wiki/Client_configuration#Application_configuration |
Send message Joined: 24 Jan 11 Posts: 712 Credit: 552,201,415 RAC: 48,261 |
First question to answer: how many total CPU threads on the host do you want to commit to BOINC? Second question: how many concurrent mt tasks do you want to run? Third question: how many CPU threads do you want to commit per mt task? Fourth question: how many GPU tasks per card do you want to run? I advise sticking to a single task per Nvidia card unless they are very high end, like a 2080 or 2080 Ti. All of those configurations need to go into an app_config.xml file for the project to control how you want to run the project on your hardware. |
Send message Joined: 26 May 20 Posts: 23 Credit: 669,967,409 RAC: 17,237 |
hi, The whole system is dedicated to BOINC 24/7/365, so for the i9-9900k the answer is 16 CPU threads. I only want to commit whatever number of CPU threads is required by each GPU application. My goal is to run 2 concurrent GPU tasks - 1 task per card (which it seems to be doing now that I set the global CPU % down to 90%) - and have the remaining CPU resources crunching CPU tasks. So if each Nvidia app actually requires a full CPU thread to keep it fed, then the remaining 14 threads should be crunching CPU tasks.

What bothers me is that by using the global "Use at most xx CPU percentage" option I am affecting other projects, whereas a decent app_info.xml or app_config.xml (whatever I need) would only apply to Milkyway and leave the other (presently idle) projects alone. I would only be running a single project, not more than that concurrently - e.g. I switched to MW only because SETI isn't handing out work while they manage an overwhelming amount of returned results. TIA |
Send message Joined: 27 Jul 14 Posts: 23 Credit: 921,261,826 RAC: 0 |
hi, I would expect setting 90% to use all 16 threads - 14 for CPU and 2 for GPU - if all you're running is MilkyWay. The % might need tweaking if it's not working as expected: 14/16 = 87.5, so set your % to 88; generally it's best to round up rather than use a fraction.

You can also set CPU % to 100 and tweak the app_config. The following says to run one task per GPU with CPU use set to one tenth of a thread. This should get both GPUs working if you have CPU % set to 100, for a total of 18 tasks. As stated above, this doesn't limit what the task will actually use, but you can set it to manipulate BOINC scheduling.

<app_config>
  <app>
    <name>milkyway</name>
    <gpu_versions>
      <gpu_usage>1.0</gpu_usage>
      <cpu_usage>0.10</cpu_usage>
    </gpu_versions>
  </app>
</app_config>

Alternatively, set cpu_usage to 1 to keep BOINC from running more than 16 tasks total, and to make sure each GPU has a full thread available for support. You'd have to do some testing on your own to see what works best for you. |
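The task counts quoted above (18 total with cpu_usage 0.10, 16 total with cpu_usage 1.0) follow from how cpu_usage feeds the scheduler. A rough sketch of that bookkeeping (my own simplification, not the actual BOINC client code): the client keeps starting one-thread CPU tasks while the committed thread count is below ncpus, with the GPU tasks' cpu_usage subtracted from the budget first.

```python
import math

# Rough model (an assumption, not BOINC source): the client starts
# 1-thread CPU tasks while committed threads < ncpus, so a fractional
# GPU cpu_usage lets it overcommit by starting one extra CPU task.

def cpu_tasks_started(ncpus: int, gpu_tasks: int, cpu_usage: float) -> int:
    budget = ncpus - gpu_tasks * cpu_usage
    return math.ceil(budget)  # the fractional remainder admits one more task

print(cpu_tasks_started(16, 2, 0.10))  # 16 CPU tasks + 2 GPU tasks = 18 total
print(cpu_tasks_started(16, 2, 1.0))   # 14 CPU tasks + 2 GPU tasks = 16 total
```

Under this model, any cpu_usage below 1.0 leaves the 16 CPU tasks competing with the GPU support threads, which is exactly the starvation scenario described earlier in the thread.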
Send message Joined: 24 Jan 11 Posts: 712 Credit: 552,201,415 RAC: 48,261 |
Well, if you want to run 14 total CPU threads out of the 16 and assign two threads to the two GPUs, that leaves you with 12 threads to run the CPU MilkyWay nbody tasks. So you should run this app_config.xml file:

<app_config>
  <app>
    <name>milkyway_nbody</name>
    <max_concurrent>3</max_concurrent>
  </app>
  <app_version>
    <app_name>milkyway_nbody</app_name>
    <plan_class>mt</plan_class>
    <avg_ncpus>4</avg_ncpus>
    <cmdline>--nthreads 4</cmdline>
  </app_version>
  <app>
    <name>milkyway</name>
    <gpu_versions>
      <gpu_usage>1.0</gpu_usage>
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
</app_config>

This would run 3 concurrent nbody CPU tasks using 4 threads each, plus the two GPU tasks. |
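The thread budget in that config checks out arithmetically (a quick sanity check, using the numbers from the post, with one full support thread assumed per GPU task):

```python
# Verify the app_config thread budget from the post above:
# 3 nbody tasks x 4 threads each, plus 1 support thread per GPU task.
nbody_tasks, threads_per_nbody, gpu_tasks = 3, 4, 2

committed = nbody_tasks * threads_per_nbody + gpu_tasks * 1
print(committed)       # 14 threads committed
print(16 - committed)  # 2 of 16 threads left idle
```

If you later change max_concurrent or --nthreads, redo this sum so the total stays at or below the threads you've given BOINC.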
©2024 Astroinformatics Group