Welcome to MilkyWay@home

de_modfit_80_bundle4_4s_south4s - error messages

Message boards : Number crunching : de_modfit_80_bundle4_4s_south4s - error messages

greg_be

Joined: 18 Aug 09
Posts: 122
Credit: 20,694,647
RAC: 4,385
Message 69027 - Posted: 13 Sep 2019, 12:17:11 UTC

What do these messages mean?


Failed to get information about device
Error getting device and context (1): MW_CL_ERROR
Failed to calculate likelihood


CL_PLATFORM_NOT_FOUND_KHR
Keith Myers

Joined: 24 Jan 11
Posts: 696
Credit: 540,095,996
RAC: 86,835
Message 69028 - Posted: 13 Sep 2019, 17:07:55 UTC - in response to Message 69027.  

You don't have the OpenCL component of your video drivers installed. If you are getting your drivers from Microsoft, that is the reason. Get your drivers directly from your card vendor.
greg_be

Joined: 18 Aug 09
Posts: 122
Credit: 20,694,647
RAC: 4,385
Message 69029 - Posted: 13 Sep 2019, 19:33:45 UTC - in response to Message 69028.  
Last modified: 13 Sep 2019, 19:39:29 UTC

I'm using the NVIDIA Studio drivers; they are supposed to be better than the standard drivers.

Plus, I am running OpenCL tasks from other projects without a problem.
Amicable Numbers uses OpenCL and I don't get errors from it.

So I find it a bit odd that just one project crashes...
Joseph Stateson

Joined: 18 Nov 08
Posts: 291
Credit: 2,461,693,501
RAC: 0
Message 69030 - Posted: 13 Sep 2019, 21:34:49 UTC - in response to Message 69029.  
Last modified: 13 Sep 2019, 21:43:34 UTC

All those errors happened recently; I see valid GTX 1050 Ti results on the 12th.

Try the following app_config.xml:
<app_config>
  <app_version>
    <app_name>milkyway</app_name>
    <avg_ncpus>1</avg_ncpus>
    <ngpus>1</ngpus>
    <cmdline>--non-responsive --verbose --gpu-target-frequency 1 --gpu-polling-mode -1 --gpu-wait-factor 0 --process-priority 4 --gpu-disable-checkpointing</cmdline>
  </app_version>
</app_config>


This helped me debug a problem with an ATI board. Probably the only option needed for debugging is --verbose. I removed the ATI plan class from the app_config; I'm not sure whether a plan class is needed, as I am not an expert on app_config stuff.

Also, could it be a memory allocation problem? Are you running more than one task per GPU?

After the first task reports, look at its stderr output for more information about the error. It looks to me like MilkyWay does not see OpenCL (1.2 at a minimum). Just a guess. Post a link to the results here.

[edit]
If the event log shows a problem with the above, put
<plan_class>opencl_nvidia_101</plan_class>
after the <app_name> line.

I rarely run MilkyWay on any NVIDIA products, as even old ATI boards handle FP64 far better than a GTX 1080 Ti.
greg_be

Joined: 18 Aug 09
Posts: 122
Credit: 20,694,647
RAC: 4,385
Message 69031 - Posted: 14 Sep 2019, 7:31:24 UTC - in response to Message 69030.  
Last modified: 14 Sep 2019, 7:36:51 UTC

You bring up memory. I think that may be the problem, since I am running LHC tasks and their ATLAS app sucks memory like a black hole, although the error does not say so (other projects mention memory in their error messages).

But my memory compression/free-up program says I am only using 65% with ATLAS running. Weird.

Where do I put that code? I don't play around with BOINC enough to know these things.

The complete error sequence is:
Using SSE4.1 path
Error getting number of platform (-1001): CL_PLATFORM_NOT_FOUND_KHR
Failed to get information about device
Error getting device and context (1): MW_CL_ERROR
Failed to calculate likelihood
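The key line is the first error: OpenCL code -1001 is CL_PLATFORM_NOT_FOUND_KHR, which the OpenCL ICD loader returns when no OpenCL platform (driver) is installed; the MW_CL_ERROR and likelihood failures after it appear to follow from that. Below is a minimal Python sketch for spotting this in a task's stderr, assuming the log line format shown above. The regex, the `diagnose` function and the hint text are hypothetical helpers, not part of BOINC or MilkyWay@home.

```python
import re
from typing import List

# Matches the platform-query failure line printed in the MilkyWay@home stderr
# quoted in this thread, capturing the numeric code and the error name.
PLATFORM_ERROR = re.compile(r"Error getting number of platform \((-?\d+)\): (\w+)")

# Hypothetical hint table: summarizes the diagnosis reached in this thread,
# not an official list from the project or from Khronos.
HINTS = {
    # -1001 is CL_PLATFORM_NOT_FOUND_KHR (cl_khr_icd): the ICD loader found
    # no installed OpenCL platform, i.e. no usable OpenCL driver.
    "CL_PLATFORM_NOT_FOUND_KHR": "no OpenCL platform installed; reinstall the GPU driver "
                                 "from the card vendor",
}


def diagnose(stderr_text: str) -> List[str]:
    """Return one hint per OpenCL platform error found in a task's stderr."""
    hints = []
    for match in PLATFORM_ERROR.finditer(stderr_text):
        code, name = int(match.group(1)), match.group(2)
        hints.append(f"{name} ({code}): {HINTS.get(name, 'unrecognized OpenCL error')}")
    return hints


log = """Using SSE4.1 path
Error getting number of platform (-1001): CL_PLATFORM_NOT_FOUND_KHR
Failed to get information about device
Error getting device and context (1): MW_CL_ERROR
Failed to calculate likelihood"""

for hint in diagnose(log):
    print(hint)
```

Only the first error is matched on purpose: the later lines are consequences, so fixing the driver is what clears them.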
Joseph Stateson

Joined: 18 Nov 08
Posts: 291
Credit: 2,461,693,501
RAC: 0
Message 69033 - Posted: 14 Sep 2019, 13:07:03 UTC - in response to Message 69031.  
Last modified: 14 Sep 2019, 13:40:41 UTC

OK, I ran a couple of MilkyWay tasks on an NVIDIA board. Put the following in a file named app_config.xml:
<app_config>
  <app_version>
    <app_name>milkyway</app_name>
    <plan_class>opencl_nvidia_101</plan_class>
    <cmdline>--verbose</cmdline>
    <avg_ncpus>1</avg_ncpus>
    <ngpus>1</ngpus>
  </app_version>
</app_config>

in the folder
\ProgramData\Boinc\projects\milkyway.cs.rpi.edu_milkyway

This is what the stderr file shows:
https://milkyway.cs.rpi.edu/milkyway/result.php?resultid=314018306
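If you would rather script the app_config.xml step than edit the file by hand, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It assumes the default BOINC data directory under %ProgramData% and the same values as the app_config.xml above; `build_app_config`, `default_project_dir` and `write_app_config` are hypothetical helpers, not part of BOINC.

```python
import os
import xml.etree.ElementTree as ET
from pathlib import Path


def build_app_config(plan_class: str = "opencl_nvidia_101", cmdline: str = "--verbose",
                     avg_ncpus: float = 1, ngpus: float = 1) -> ET.ElementTree:
    """Build the app_config.xml from this post for the 'milkyway' application."""
    root = ET.Element("app_config")
    version = ET.SubElement(root, "app_version")
    for tag, value in [("app_name", "milkyway"), ("plan_class", plan_class),
                       ("cmdline", cmdline), ("avg_ncpus", avg_ncpus), ("ngpus", ngpus)]:
        ET.SubElement(version, tag).text = str(value)
    ET.indent(root)  # pretty-print the nesting (Python 3.9+)
    return ET.ElementTree(root)


def default_project_dir() -> Path:
    # Assumption: BOINC's data directory is in the default Windows location.
    base = os.environ.get("ProgramData", r"C:\ProgramData")
    return Path(base) / "BOINC" / "projects" / "milkyway.cs.rpi.edu_milkyway"


def write_app_config(project_dir: Path) -> Path:
    # BOINC only reads a file named exactly app_config.xml in the project folder.
    path = project_dir / "app_config.xml"
    build_app_config().write(path, encoding="utf-8")
    return path
```

Usage on the BOINC machine would be `write_app_config(default_project_dir())`, followed by Options > Read config files in the BOINC Manager (or a BOINC restart) so the client picks up the change.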

Last night my system (the ATI one) installed an unwanted driver update for my HD 7950. That worked fine for the HD 7950, but the other video boards (S9000 type) failed, and over 800 MilkyWay tasks errored out before the reboot occurred. BOINC does not start automatically on this system, which fortunately gives me enough time to fix problems. In this case it was an easy fix: I rolled back the driver on one of the S9000s, all the rest rolled back with it, and I am back to normal operation (until the next Microsoft update).

I compared my 840 errored tasks to your 37 and the errors are similar: cannot find an OpenCL device.

The last time I ran LHC, I recall having to go to Oracle and get their "latest and greatest" VirtualBox, as the one from Berkeley was too old. Also, as you mention, it is a memory hog, and things got too slow on the system I was running it on. I don't remember exactly what made me get a later version of VirtualBox, but I recall a warning message from LHC about VirtualBox having a problem. It's been a while; maybe Berkeley has the newer VirtualBox now and that problem no longer exists.

You have a driver problem, as Keith mentioned. I suspect the GTX 1050 Ti is in virtual mode under the VM and MilkyWay has a problem with that while other projects do not, but I could be wrong.

The --verbose option can show more information. Please do not copy the stderr output into this thread; just leave a link to the result.

I suggest removing LHC and VirtualBox and running WCG apps on your CPU. Helping LHC find black holes might end our universe as we know it.

My version of VirtualBox is 6.0.8, but I have not run any apps in it for about a year.

[edit] Fixed a number of things in this post that were wrong.

Use the file name "app_config.xml" for the MilkyWay configuration above.
mmonnin

Joined: 2 Oct 16
Posts: 162
Credit: 1,004,444,628
RAC: 22,344
Message 69037 - Posted: 15 Sep 2019, 17:11:51 UTC

Post the output of the 'clinfo' command run from a CMD prompt. You may have installed the NVIDIA drivers, but Win10 probably overwrote them.
greg_be

Joined: 18 Aug 09
Posts: 122
Credit: 20,694,647
RAC: 4,385
Message 69055 - Posted: 18 Sep 2019, 20:05:04 UTC - in response to Message 69037.  

Can't find clinfo.
The various batch files I tried to use for this do not work.
I'll just keep hunting for a solution.
Maybe I'll just reinstall the drivers.

I just had to mess with Windows, because when I upgraded my system (motherboard and the works) Windows didn't recognize the system. So I had to use a Win 8.1 key and then "upgrade" to Win10, which I already had installed. Who knows what files that wiped out.
greg_be

Joined: 18 Aug 09
Posts: 122
Credit: 20,694,647
RAC: 4,385
Message 69056 - Posted: 18 Sep 2019, 20:20:35 UTC

My system was down for a while. I switched power supplies (a 550 W Corsair analog to a 650 W NZXT digital), added memory, and moved to a new case. I committed the most stupid of Duh/DOH errors when trying to power up the system: I was pressing the reset switch and could not find the power switch on the front panel. I hauled it to a computer shop; they had the same issue and then found the button. Then I had the Windows license key issue and had to go back to validating with Win 8.1 and then "upgrading" to Win10, which was already installed.

So I think a bunch of stuff got erased during that upgrade.

I just reinstalled NVIDIA Studio Driver version 431.86.
Maybe I should go back to the Game Ready DCH driver instead? Maybe the Studio driver has some setting that the application here does not understand?
I will try that in my morning (Central EU time).
greg_be

Joined: 18 Aug 09
Posts: 122
Credit: 20,694,647
RAC: 4,385
Message 69058 - Posted: 18 Sep 2019, 23:29:21 UTC
Last modified: 18 Sep 2019, 23:32:19 UTC

01:30 Thursday (Central EU time): manually reinstalling the Studio driver seems to have fixed this issue.
The task is running OK.
Keith Myers

Joined: 24 Jan 11
Posts: 696
Credit: 540,095,996
RAC: 86,835
Message 69059 - Posted: 19 Sep 2019, 1:29:48 UTC - in response to Message 69055.  

https://boinc.berkeley.edu/dl/clinfo.zip
greg_be

Joined: 18 Aug 09
Posts: 122
Credit: 20,694,647
RAC: 4,385
Message 69061 - Posted: 19 Sep 2019, 5:37:09 UTC

Platform ID: 010021C8
Name: GeForce GTX 1050 Ti
Vendor: NVIDIA Corporation
Driver version: 431.86
Profile: FULL_PROFILE
Version: OpenCL 1.2 CUDA
Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d11_sharing cl_nv_copy_opts cl_nv_create_buffer


This is after the reinstall.
Tasks are completing normally now.
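The clinfo output above is what answers the earlier question of whether MilkyWay can see OpenCL. Here is a minimal Python sketch that reads it, assuming the "Key: value" layout shown in that output. The thresholds are assumptions taken from this thread (OpenCL 1.2 as a minimum was a guess earlier on, and double precision because MilkyWay leans on FP64), not official project requirements; `parse_clinfo` and `check_device` are hypothetical helpers.

```python
import re
from typing import Dict, List, Tuple


def parse_clinfo(text: str) -> Dict[str, str]:
    """Collect 'Key: value' lines from one device's clinfo output."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def check_device(info: Dict[str, str], min_version: Tuple[int, int] = (1, 2)) -> List[str]:
    """List reasons a device may be unusable; an empty list means it looks OK.

    Assumption: the minimum OpenCL version and the cl_khr_fp64 requirement
    come from this thread, not from official MilkyWay@home requirements.
    """
    problems = []
    match = re.match(r"OpenCL (\d+)\.(\d+)", info.get("Version", ""))
    if match is None:
        problems.append("no OpenCL version reported")
    elif (int(match.group(1)), int(match.group(2))) < min_version:
        problems.append(f"OpenCL {match.group(1)}.{match.group(2)} is older than "
                        f"{min_version[0]}.{min_version[1]}")
    if "cl_khr_fp64" not in info.get("Extensions", "").split():
        problems.append("no double-precision support (cl_khr_fp64)")
    return problems


# Excerpt of the output posted above (extension list shortened).
sample = """Name: GeForce GTX 1050 Ti
Vendor: NVIDIA Corporation
Driver version: 431.86
Version: OpenCL 1.2 CUDA
Extensions: cl_khr_fp64 cl_khr_icd cl_khr_gl_sharing"""

print(check_device(parse_clinfo(sample)))  # → []
```

Splitting the Extensions field into words avoids false matches on longer extension names that merely contain "cl_khr_fp64" as a substring.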
mmonnin

Joined: 2 Oct 16
Posts: 162
Credit: 1,004,444,628
RAC: 22,344
Message 69065 - Posted: 19 Sep 2019, 10:56:53 UTC - in response to Message 69058.  

The very first suggestion was the fix...
greg_be

Joined: 18 Aug 09
Posts: 122
Credit: 20,694,647
RAC: 4,385
Message 69078 - Posted: 19 Sep 2019, 22:35:33 UTC - in response to Message 69065.  

Computer was just being difficult.
Had some other issues today.
Finally got everything working the way it should.
Thanks guys.


©2024 Astroinformatics Group