Welcome to MilkyWay@home

GPU error

blyons123
Joined: 18 May 10
Posts: 2
Credit: 440,085
RAC: 0
Message 69882 - Posted: 2 Jun 2020, 12:29:37 UTC

It doesn't matter whether I have GPU checked or not; I continue to get this notice!?

Milkyway@Home: Notice from BOINC
Your settings do not allow fetching tasks for NVIDIA GPU. To fix this, you can change Project Preferences on the project's web site.
6/2/2020 8:22:50 PM

mikey
Joined: 8 May 09
Posts: 3315
Credit: 519,943,223
RAC: 22,516
Message 69935 - Posted: 18 Jun 2020, 9:29:49 UTC - in response to Message 69882.  

None of your computers has received any tasks since March. Try reinstalling the GPU drivers; Windows 10 updates often mess them up.

TouchuvGrey
Joined: 25 Dec 19
Posts: 1
Credit: 306,411,689
RAC: 0
Message 70166 - Posted: 11 Nov 2020, 6:21:45 UTC

I have two video cards, a Radeon RX580 and an NVIDIA GTX680.
Device Manager shows both of them functioning properly, yet MW@Home
is only using the Radeon card even though I have both ticked in preferences.
Can someone tell me how to get MW@Home to use both?

Mike

mikey
Joined: 8 May 09
Posts: 3315
Credit: 519,943,223
RAC: 22,516
Message 70167 - Posted: 11 Nov 2020, 12:49:00 UTC - in response to Message 70166.  

You need a cc_config.xml file with these lines in it:

<cc_config>
<options>
<use_all_gpus>1</use_all_gpus>
</options>
</cc_config>

If you don't already have a cc_config.xml file you can make one yourself, but first check the BOINC data folder (C:\ProgramData\BOINC by default) to see whether one is already there.

To make one, use Notepad in Windows, NOT a word processor, as a word processor will add formatting to the file that makes it unreadable by BOINC. Copy and paste the lines above into it.
When you save the file, make sure it ends in .xml rather than .txt, and save it in the BOINC folder above.

After that, stop and restart BOINC. Once BOINC is up, open the BOINC Manager (down by the clock), click Tools, then Event Log, and scroll up to the top to see whether BOINC lists both of your GPUs. If so, you are good to go and both should start crunching right away.
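
For anyone who would rather script the steps above than use Notepad, here is a minimal Python sketch. It only builds the same five lines and saves them as cc_config.xml. The function names are made up for illustration, and the demo writes to a temporary folder; pointing it at a real BOINC data folder is left to the reader.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def build_cc_config(use_all_gpus: bool = True) -> str:
    """Return the text of a minimal cc_config.xml, one tag per line."""
    flag = "1" if use_all_gpus else "0"
    lines = [
        "<cc_config>",
        "  <options>",
        f"    <use_all_gpus>{flag}</use_all_gpus>",
        "  </options>",
        "</cc_config>",
    ]
    return "\n".join(lines) + "\n"


def write_cc_config(data_dir: Path, overwrite: bool = False) -> Path:
    """Write cc_config.xml into data_dir without clobbering an existing file."""
    target = Path(data_dir) / "cc_config.xml"
    if target.exists() and not overwrite:
        raise FileExistsError(f"{target} already exists; edit it by hand instead")
    text = build_cc_config()
    ET.fromstring(text)  # sanity check: the text must parse as XML
    target.write_text(text, encoding="utf-8")
    return target


if __name__ == "__main__":
    # Demo in a temporary folder so no real BOINC installation is touched.
    with tempfile.TemporaryDirectory() as tmp:
        written = write_cc_config(Path(tmp))
        print(written.read_text(encoding="utf-8"))
```

The refusal to overwrite mirrors the advice to check for an existing file first. BOINC still has to be restarted afterwards, as described above.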

EPFrybar
Joined: 28 Jan 21
Posts: 8
Credit: 25,225,698
RAC: 0
Message 70460 - Posted: 28 Jan 2021, 2:46:03 UTC
Last modified: 28 Jan 2021, 2:50:43 UTC

I just joined the project, running Windows 10 on an AMD Ryzen Threadripper 1920X (12 cores, 3.50 GHz).

I have a cc_config.xml in place under BoincData as follows:
<cc_config>
<options>
<ncpus>0</ncpus>
<use_all_gpus>1</use_all_gpus>
<ignore_cuda_dev>11</ignore_cuda_dev>
<ignore_cuda_dev>12</ignore_cuda_dev>
<max_file_xfers_per_project>8</max_file_xfers_per_project>
<save_stats_days>360</save_stats_days>
<http_transfer_timeout>120</http_transfer_timeout>
<max_tasks_reported>50</max_tasks_reported>
<report_results_immediately>1</report_results_immediately>
</options>
</cc_config>


and I have 2 GPUs as follows:
NVIDIA GeForce GTX 1660 Super (device 0) and
NVIDIA GeForce GTX 1070 (device 1)

Only the 1660 Super runs the project.

Any easy fixes??

Ed F

P.S. I am also running WCG Covid WUs on the CPUs.

Keith Myers
Joined: 24 Jan 11
Posts: 696
Credit: 540,013,542
RAC: 86,805
Message 70461 - Posted: 28 Jan 2021, 4:04:52 UTC

Why do you have the ignore cuda device in your cc_config? And the device numbers are WAY outside your actual count. They should do nothing, but maybe that is what is confusing BOINC.

You only have two cards, the 1660 Super and the 1070.

BOINC picks up both cards in your host. BOINC only reports the most capable card, which is the 1660 Super. That is why your host is shown with (2) GTX 1660 Super cards.
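
The out-of-range device numbers mentioned above can be spotted mechanically. This is a rough Python sketch, not a BOINC tool: it reads the config text, looks for any ignore_*_dev tag, and flags indices that are not below a GPU count you supply by hand (it does not detect hardware). The function name is hypothetical, and the sample is a shortened copy of the posted file.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def out_of_range_ignores(cc_config_text: str, gpu_count: int) -> List[Tuple[str, int]]:
    """List ignore_*_dev entries whose device index is not below gpu_count."""
    root = ET.fromstring(cc_config_text)
    flagged = []
    for elem in root.iter():
        if elem.tag.startswith("ignore_") and elem.tag.endswith("_dev"):
            index = int((elem.text or "").strip())
            if index >= gpu_count:  # valid indices run from 0 to gpu_count - 1
                flagged.append((elem.tag, index))
    return flagged


# Shortened copy of the cc_config.xml posted earlier in the thread.
POSTED_CONFIG = """<cc_config>
<options>
<ncpus>0</ncpus>
<use_all_gpus>1</use_all_gpus>
<ignore_cuda_dev>11</ignore_cuda_dev>
<ignore_cuda_dev>12</ignore_cuda_dev>
</options>
</cc_config>"""

if __name__ == "__main__":
    for tag, index in out_of_range_ignores(POSTED_CONFIG, gpu_count=2):
        print(f"{tag} {index} does not match any of the 2 detected devices")
```

With two cards (devices 0 and 1), both ignore entries are flagged, which matches the observation that they refer to devices that do not exist.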

EPFrybar
Joined: 28 Jan 21
Posts: 8
Credit: 25,225,698
RAC: 0
Message 70462 - Posted: 28 Jan 2021, 5:19:55 UTC - in response to Message 70461.  

Why do you have the ignore cuda device in your cc_config? And the device numbers are WAY outside your actual count.


That is a leftover from the SETI days when I was doing funky configurations :-)

BOINC picks up both cards in your host. BOINC only reports the most capable card which is the 1660


That doesn't answer the question of why MW@H only uses 1 GPU. SETI used both and GPU@H uses both (when it's up).

Ed F

Keith Myers
Joined: 24 Jan 11
Posts: 696
Credit: 540,013,542
RAC: 86,805
Message 70463 - Posted: 28 Jan 2021, 6:48:48 UTC - in response to Message 70462.  

Then the likely scenario is that for some reason BOINC doesn't think you have enough CPU resources to support another GPU task running on the 1070.

Have you overcommitted the CPU to other projects?

And just to cover the bases, you DO see both cards reported in the startup section of the Event Log, correct? Are both cards getting polled for the CUDA drivers and then for the OpenCL drivers?
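
If you copy the startup section of the Event Log into a text file, a short script can do the check above for you. This Python sketch assumes the detection lines look roughly like "CUDA: NVIDIA GPU 0: model name (details)" and "OpenCL: NVIDIA GPU 0: model name (details)". That layout, the sample lines, and the driver version in them are illustrative assumptions, so adjust the pattern to whatever your own log shows.

```python
import re
from collections import defaultdict
from typing import Dict, Set

# Assumed layout: "<API>: <vendor> GPU <index>: <model> (<details>)"
GPU_LINE = re.compile(r"(CUDA|OpenCL): (.+?) GPU (\d+): (.+?) \(")


def gpus_by_api(log_text: str) -> Dict[str, Set[int]]:
    """Map each API name to the set of GPU indices reported for it."""
    found = defaultdict(set)
    for match in GPU_LINE.finditer(log_text):
        api, _vendor, index, _model = match.groups()
        found[api].add(int(index))
    return dict(found)


# Made-up sample lines; the driver version is a placeholder.
SAMPLE_LOG = """\
[---] CUDA: NVIDIA GPU 0: GeForce GTX 1660 SUPER (driver version 000.00)
[---] CUDA: NVIDIA GPU 1: GeForce GTX 1070 (driver version 000.00)
[---] OpenCL: NVIDIA GPU 0: GeForce GTX 1660 SUPER (driver version 000.00)
[---] OpenCL: NVIDIA GPU 1: GeForce GTX 1070 (driver version 000.00)
"""

if __name__ == "__main__":
    for api, indices in sorted(gpus_by_api(SAMPLE_LOG).items()):
        print(api, sorted(indices))
```

If a card shows up under one API but not the other, that is worth mentioning when you report back.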

EPFrybar
Joined: 28 Jan 21
Posts: 8
Credit: 25,225,698
RAC: 0
Message 70464 - Posted: 28 Jan 2021, 17:50:31 UTC - in response to Message 70463.  

Yes, BOINC sees both cards.

I have 11 CPUs busy running WCG@H, 1 CPU running MW@H, and 12 "CPUs" idle. Windows 10 is doing a good job of keeping CPU "pairs" (0:1, 2:3, 4:5, etc.) at 100% and not mis-scheduling or overusing any one CPU.

Ed F

Keith Myers
Joined: 24 Jan 11
Posts: 696
Credit: 540,013,542
RAC: 86,805
Message 70467 - Posted: 29 Jan 2021, 7:42:50 UTC

Are you doing some fancy cpu scheduling and assigning specific affinities?

AFAIK, Ryzen/Threadripper CPUs do not have physically paired cores as in your example, as far as the OS and the CPPC mask are concerned.

You may be actually hurting the OS thread scheduler by trying to impose your own affinities.

EPFrybar
Joined: 28 Jan 21
Posts: 8
Credit: 25,225,698
RAC: 0
Message 70470 - Posted: 29 Jan 2021, 22:20:11 UTC - in response to Message 70467.  

Windows 10 schedules a task in CPU pairs like I indicated (0:1, 2:3, 4:5, etc.) and Ubuntu schedules in pairs of 0:12, 1:13, 2:14, etc. if it can ... I ASSUME this reduces context switching and thus reduces overhead.
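
To make the two numbering schemes concrete, here is a tiny Python sketch that just reproduces the pairs listed above for a 12-core, 24-thread chip. It only illustrates the numbering as described in this post; it does not query either operating system, and the layout names "adjacent" and "offset" are labels invented for the example.

```python
from typing import List, Tuple


def logical_cpu_pairs(cores: int, layout: str) -> List[Tuple[int, int]]:
    """Return (first, second) logical CPU numbers for each core.

    "adjacent" gives 0:1, 2:3, ... and "offset" gives 0:N, 1:N+1, ...
    where N is the number of cores.
    """
    if layout == "adjacent":
        return [(2 * core, 2 * core + 1) for core in range(cores)]
    if layout == "offset":
        return [(core, core + cores) for core in range(cores)]
    raise ValueError(f"unknown layout: {layout!r}")


if __name__ == "__main__":
    print(logical_cpu_pairs(12, "adjacent")[:3])
    print(logical_cpu_pairs(12, "offset")[:3])
```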

I am not doing anything funky at this point ... running BOINC "out-of-the-box" ... I do have the old cc_config from SETI.

I continue to be puzzled by BOINC only scheduling 1 GPU for MW@H but scheduling both for other projects I've tried. I think it's interesting that it runs 1 GPU task but downloads 2 pending ones even though I have the resource share set to "0".

Still open to ideas!!

Ed F

mikey
Joined: 8 May 09
Posts: 3315
Credit: 519,943,223
RAC: 22,516
Message 70473 - Posted: 30 Jan 2021, 0:28:49 UTC - in response to Message 70470.  

Can you post your cc_config file here and confirm that it's in the BOINC folder and not the Boinc/programs/seti folder?

As for scheduling, in newer versions of BOINC (since 7.14.2, I think) the next task is pre-downloaded so you aren't left out of work for a bit while waiting on a busy project. There is a percent-complete threshold at which it starts trying to get the next task, but I don't know what it is; it doesn't download the next one at 3% completed, for example. As for MilkyWay, they are probably sending out 2 at a time since the tasks are so short.

EPFrybar
Joined: 28 Jan 21
Posts: 8
Credit: 25,225,698
RAC: 0
Message 70474 - Posted: 30 Jan 2021, 2:22:35 UTC - in response to Message 70473.  

I have a cc_config.xml in place under BoincData as follows:
<cc_config>
<options>
<ncpus>0</ncpus>
<use_all_gpus>1</use_all_gpus>
<ignore_cuda_dev>11</ignore_cuda_dev>
<ignore_cuda_dev>12</ignore_cuda_dev>
<max_file_xfers_per_project>8</max_file_xfers_per_project>
<save_stats_days>360</save_stats_days>
<http_transfer_timeout>120</http_transfer_timeout>
<max_tasks_reported>50</max_tasks_reported>
<report_results_immediately>1</report_results_immediately>
</options>
</cc_config>


Just for fun, I think I'll turn MW@H on for a 1070-only computer and see if it is recognized and runs...

(if it gives me problems I'll trim cc_config.xml down a bit and see if that helps)

Ideas are welcome :-)

Ed F

mikey
Joined: 8 May 09
Posts: 3315
Credit: 519,943,223
RAC: 22,516
Message 70475 - Posted: 30 Jan 2021, 11:42:48 UTC - in response to Message 70474.  
Last modified: 30 Jan 2021, 11:43:59 UTC

Personally I would get rid of all but a few lines, but I would trim the <ignore_cuda_dev> and <max_file_xfers_per_project> lines first, save the file, and see what happens. If you cut them out and paste them below the closing </cc_config> line, they won't affect the configuration but may show up as unknown instructions, which you can safely ignore. MW loves to send hundreds of tasks at once and then force a 10-minute timeout with no communication requests before you can get new GPU tasks; CPU tasks are not affected by that.
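
The trimming suggested above can also be done with a few lines of Python rather than by hand. This is only a sketch: the function name is made up, it drops the named tags outright instead of parking them at the bottom of the file, and you would still need to save the result over cc_config.xml yourself (after making a backup).

```python
import xml.etree.ElementTree as ET
from typing import Iterable


def strip_options(cc_config_text: str, tags_to_drop: Iterable[str]) -> str:
    """Return the config text with the named option tags removed."""
    drop = set(tags_to_drop)
    root = ET.fromstring(cc_config_text)
    options = root.find("options")
    if options is not None:
        for child in list(options):  # iterate over a copy while removing
            if child.tag in drop:
                options.remove(child)
    return ET.tostring(root, encoding="unicode")


# Copy of the cc_config.xml posted earlier in the thread.
POSTED_CONFIG = """<cc_config>
<options>
<ncpus>0</ncpus>
<use_all_gpus>1</use_all_gpus>
<ignore_cuda_dev>11</ignore_cuda_dev>
<ignore_cuda_dev>12</ignore_cuda_dev>
<max_file_xfers_per_project>8</max_file_xfers_per_project>
<save_stats_days>360</save_stats_days>
<http_transfer_timeout>120</http_transfer_timeout>
<max_tasks_reported>50</max_tasks_reported>
<report_results_immediately>1</report_results_immediately>
</options>
</cc_config>"""

if __name__ == "__main__":
    print(strip_options(POSTED_CONFIG, ["ignore_cuda_dev", "max_file_xfers_per_project"]))
```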

I know Keith Myers is far more experienced in this kind of stuff than I am so hopefully he will give his thoughts too.

Keith Myers
Joined: 24 Jan 11
Posts: 696
Credit: 540,013,542
RAC: 86,805
Message 70476 - Posted: 30 Jan 2021, 17:32:48 UTC

I already posted about the bogus ignore_cuda_dev statements.

I think report_results_immediately is the next likely culprit.

That setting, along with the foible of MW not allowing work to be requested while a task is being reported, is a definite no-no.

I run a special GPUUG client to get around that project problem. There is also an open-source MW client developed by JStateson, available on GitHub, that gets around the issue.

EPFrybar
Joined: 28 Jan 21
Posts: 8
Credit: 25,225,698
RAC: 0
Message 70523 - Posted: 3 Feb 2021, 17:04:49 UTC - in response to Message 70476.  
Last modified: 3 Feb 2021, 17:06:53 UTC

Great, we now have work again!!

Well, I had time to look through my archives from playing with BOINC and came across an old friend that worked here.

This is the ONLY change I made:

I added BoincData\projects\milkyway.cs.rpi.edu_milkyway\app_config.xml:

<app_config>
<app>
<name>milkyway</name>
<gpu_versions>
<gpu_usage>1</gpu_usage>
<cpu_usage>0.49</cpu_usage>
</gpu_versions>
</app>
</app_config>

I then used Read config files, and presto-changeo, I now have both GPUs running!!
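
For anyone adapting these values, here is a small Python sketch that does the arithmetic on an app_config.xml like the one above. It assumes gpu_usage is the fraction of a GPU each task claims and cpu_usage is the CPU share budgeted per task; how the BOINC scheduler actually weighs those numbers is not modeled here. The function name and the GPU count argument are illustrative.

```python
import math
import xml.etree.ElementTree as ET
from typing import Dict


def summarize_app_config(text: str, gpu_count: int) -> Dict[str, dict]:
    """Per app: tasks per GPU, concurrent GPU tasks, and CPU budgeted for them."""
    root = ET.fromstring(text)
    summary = {}
    for app in root.findall("app"):
        name = app.findtext("name")
        gpu_usage = float(app.findtext("gpu_versions/gpu_usage"))
        cpu_usage = float(app.findtext("gpu_versions/cpu_usage"))
        # Small epsilon guards against float noise such as 1 / 0.2 = 4.999...
        tasks_per_gpu = math.floor(1 / gpu_usage + 1e-9)
        concurrent = tasks_per_gpu * gpu_count
        summary[name] = {
            "tasks_per_gpu": tasks_per_gpu,
            "concurrent_gpu_tasks": concurrent,
            "cpu_budgeted": round(concurrent * cpu_usage, 2),
        }
    return summary


# Copy of the app_config.xml posted above.
POSTED_APP_CONFIG = """<app_config>
<app>
<name>milkyway</name>
<gpu_versions>
<gpu_usage>1</gpu_usage>
<cpu_usage>0.49</cpu_usage>
</gpu_versions>
</app>
</app_config>"""

if __name__ == "__main__":
    print(summarize_app_config(POSTED_APP_CONFIG, gpu_count=2))
```

With two GPUs, these settings work out to two concurrent GPU tasks budgeting 0.98 of a CPU between them, which is just under one full core.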


Thanks for the ideas!!

Ed F

mikey
Joined: 8 May 09
Posts: 3315
Credit: 519,943,223
RAC: 22,516
Message 70538 - Posted: 4 Feb 2021, 2:11:46 UTC - in response to Message 70523.  

You are confusing app_config.xml files and cc_config.xml files.

THIS file is an app_config.xml file, while the OTHER file was a cc_config.xml file.

EPFrybar
Joined: 28 Jan 21
Posts: 8
Credit: 25,225,698
RAC: 0
Message 70539 - Posted: 4 Feb 2021, 3:37:44 UTC - in response to Message 70538.  
Last modified: 4 Feb 2021, 3:43:17 UTC

Yes,
this file (the one I just added, which fixed the "problem")

<app_config>
<app>
<name>milkyway</name>
<gpu_versions>
<gpu_usage>1</gpu_usage>
<cpu_usage>0.49</cpu_usage>
</gpu_versions>
</app>
</app_config>

is :
BoincData\projects\milkyway.cs.rpi.edu_milkyway\app_config.xml:

and this file

<cc_config>
<options>
<ncpus>0</ncpus>
<use_all_gpus>1</use_all_gpus>
<ignore_cuda_dev>11</ignore_cuda_dev>
<ignore_cuda_dev>12</ignore_cuda_dev>
<max_file_xfers_per_project>8</max_file_xfers_per_project>
<save_stats_days>360</save_stats_days>
<http_transfer_timeout>120</http_transfer_timeout>
<max_tasks_reported>50</max_tasks_reported>
</options>
</cc_config>
is

BoincData\cc_config.xml
(I no longer have the

"<report_results_immediately>1</report_results_immediately>"

line in it; every 90 seconds seemed a bit too often.)

Ed F

Keith Myers
Joined: 24 Jan 11
Posts: 696
Credit: 540,013,542
RAC: 86,805
Message 70544 - Posted: 4 Feb 2021, 15:35:23 UTC

Yes, getting rid of report_results_immediately certainly helped. It still does not solve the problem of MW preventing a work request on the same connection as one where you report a result.
And the fast-running tasks mean you almost always have a new result to report in under 90 seconds, so you don't get any work.

This usually means you run through all of your allotted 900 tasks, then sit dormant for ten minutes on a work request backoff before your next request gets you your allotted 900 tasks again.

This has been reported ad nauseam in the NC forum. The work request delay should be longer than the shortest-running task, but even that wouldn't help on hosts with multiple GPUs; you would inevitably still have a finished result to report within the delay period. Some projects have ridiculously short work request delays. Universe has an 11-second delay.

As I mentioned, there are solutions to get around this problem. One is to run the JStateson Milkyway BOINC client, which gets around the work request/report result problem.
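
To put a rough number on the cycle described above (work through a batch, then sit idle for the backoff), here is a back-of-the-envelope Python sketch. The 900-task batch and ten-minute backoff come from this post; the 60-second task time and the two-GPU count are hypothetical inputs, and the model ignores everything else (download time, multiple tasks per GPU, and so on).

```python
def duty_cycle(tasks_per_batch: int, task_seconds: float,
               gpu_count: int, backoff_seconds: float) -> float:
    """Fraction of wall-clock time the GPUs are busy in a run-then-idle cycle."""
    busy_seconds = tasks_per_batch * task_seconds / gpu_count
    return busy_seconds / (busy_seconds + backoff_seconds)


if __name__ == "__main__":
    # 900 tasks and a 600 s backoff are from the thread; 60 s per task is a guess.
    busy = duty_cycle(tasks_per_batch=900, task_seconds=60.0,
                      gpu_count=2, backoff_seconds=600.0)
    print(f"GPUs busy {busy:.1%} of the time, idle {1 - busy:.1%}")
```

Shorter tasks or more GPUs shrink the busy part of the cycle while the backoff stays fixed, so the idle share grows.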