GPU Issues Mega Thread
Author | Message |
---|---|
Joined: 15 Jul 12 Posts: 5 Credit: 119,063,045 RAC: 1,575 |
I'm seeing a BSOD that I think is caused by the MilkyWay GPU software. I have several Win10 minidumps of the issue (I hope), or whatever Windows takes when it goes blue. The reason I think it's the GPU is that I have GPU work set to run only when I am away from the computer and the screensaver is running, and that's when it always happens. CPU = AMD FX-8370 Eight-Core Processor, 4013 MHz, 4 cores, 8 logical processors; Mem = 16GB; Graphics = GeForce GTX 1080; Driver = Nvidia 376.33; OS = Microsoft Windows 10 Pro, 10.0.14393 Build 14393; BOINC = 7.5.33 (x64), wxWidgets = 3.0.1. I'm not sure whether Microsoft informs you of issues related to your product. Please advise what you need next. |
Joined: 22 Jun 13 Posts: 44 Credit: 64,258,609 RAC: 0 |
wb8ili, Glad you got it fixed. Just in case you are still wondering, most of the client files are in /var/lib/boinc-client (projects folder and slots folder included). The procedure that starts/restarts the boinc client is in /etc/init.d and is called boinc-client. Happy crunching. |
Joined: 15 Jul 12 Posts: 5 Credit: 119,063,045 RAC: 1,575 |
I think this is actually a BOINC issue. I stopped MilkyWay and loaded a different project that uses the GPU, and hit the same issue even faster. I will post on their forum. |
Joined: 14 May 11 Posts: 7 Credit: 87,559,035 RAC: 2,841 |
Speaking about the "#define Q_INV_SQR inf" Milkyway error: I have been running BOINC/SETI for over 20 years. I have projects in Einstein, SETI, GPUGrid, Rosetta and Milkyway. Within the Milkyway project, I have noticed 3 different types of jobs. ONLY ONE OF THOSE IS FAILING. I do not subscribe to the idea that there is a configuration problem when all my other projects are screaming along (including the other Milkyway jobs). The most reasonable explanation is a TYPO in the project files. Someone fat-fingered a "t" into an "f", which is not too hard to do since they are next to each other on the keyboard. What needs to happen is for a Milkyway developer to explain why this happens, and hopefully fix his own code. |
Joined: 22 Jun 13 Posts: 44 Credit: 64,258,609 RAC: 0 |
Chris Rampson, if you believe that there is a problem with the code, you might ask yourself this question: why are other Milkyway users able to successfully complete those tasks using Linux and NVIDIA GPUs? Edit: Re: this thread http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=4087&postid=66127 |
Joined: 8 Apr 09 Posts: 70 Credit: 11,027,167,827 RAC: 0 |
Hello again, and a happy new year to everybody. I still have issues when running several WUs in parallel on the Hawaii-based GPUs. One WU at a time still runs fine. Could someone look at my invalid WUs with the Validate errors? I can't make anything out of the text. https://milkyway.cs.rpi.edu/milkyway/results.php?hostid=705276&offset=0&show_names=0&state=5&appid= Maybe someone can help me figure out what is broken. AMD drivers have improved and the WUs no longer hang, but there are still Validate errors. |
Joined: 15 Aug 10 Posts: 1 Credit: 59,828,702 RAC: 0 |
Good day, I've been having issues lately specifically with "MilkyWay@home 1.43 (opencl_ati_101)" units, which fail 100% of the time about 2 secs after starting up. "MilkyWay@home 1.43 (opencl_nvidia_101)" works great on a separate machine, which leads me to believe that this problem is specific to the AMD Radeon GPU (6800 series, an HD6870 in my case). The adapter has never been overclocked and is working on a machine crunching for 16 distinct BOINC projects without issues on any other project. I'm running the latest AMD Catalyst Software Suite available on Windows 10 (non-beta drivers). Driver Packaging ver.: 15.201.1151.1008-151104a-296217E; Provider: Advanced Micro Devices, Inc.; 2D Driver Version: 8.01.01.1500; Direct3D Version: 9.14.10.01128; OpenGL Version: 6.14.10.13399; Mantle Driver Version: 9.1.10.0083; Mantle API Version: Not Available; AMD Catalyst CCV: 2015.1104.1643.30033. The breakdown of end results for completed units is as follows: - MilkyWay@home 1.43 (opencl_ati_101) => Computational error after 2 secs. - MilkyWay@home 1.43 (opencl_nvidia_101) => 100% successful completion. - MilkyWay@Home N-Body Simulation 1.62(mt) => 100% successful completion. - MilkyWay@Home 1.42 => 100% successful completion. regards, Peter |
Joined: 13 Oct 16 Posts: 112 Credit: 1,174,293,644 RAC: 0 |
Did Win 10 automatically update your drivers? I think it screws things up if it did. You need to uninstall, clean out the drivers, and then reinstall from the AMD site. |
Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 1 |
Good day, The 2 or 3 second errors can also be caused by not installing this: For Windows, the most recent Visual Studio 2012 C++ runtime |
Joined: 4 Oct 11 Posts: 38 Credit: 309,729,457 RAC: 0 |
According to https://en.wikipedia.org/wiki/Radeon_HD_6000_Series, the HD6870 does not support double precision. |
Joined: 27 Apr 15 Posts: 4 Credit: 427,409,763 RAC: 0 |
I have been successfully running GPU tasks on both AMD (amdgpu-pro) and Nvidia devices, using their provided OpenCL libraries. Yesterday I upgraded one of the AMD workstations to use ROCm and its OpenCL (amdgpu-pro doesn't work on Ubuntu 17.10). The GPU is an RX 480 (Ellesmere/Polaris, or 'gfx803' in ROCm). Since then, MilkyWay@home GPU tasks have been failing. Below is what I see in the task log, followed by part of the clinfo output. Let me know if there is already a solution to this or if more info is needed. <core_client_version>7.8.3</core_client_version> <![CDATA[ <message> process exited with code 1 (0x1, -255)</message> <stderr_txt> <search_application> milkyway_separation 1.46 Linux x86_64 double OpenCL </search_application> BOINC GPU type suggests using OpenCL vendor 'Advanced Micro Devices, Inc.' Setting process priority to 0 (13): Permission denied Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4' Switching to Parameter File 'astronomy_parameters.txt' <number_WUs> 5 </number_WUs> <number_params_per_WU> 20 </number_params_per_WU> Using AVX path Error getting number of platform (-1001): CL_PLATFORM_NOT_FOUND_KHR Failed to get information about device Error getting device and context (1): MW_CL_ERROR Failed to calculate likelihood Using AVX path Error getting number of platform (-1001): CL_PLATFORM_NOT_FOUND_KHR Failed to get information about device Error getting device and context (1): MW_CL_ERROR Failed to calculate likelihood Using AVX path Error getting number of platform (-1001): CL_PLATFORM_NOT_FOUND_KHR Failed to get information about device Error getting device and context (1): MW_CL_ERROR Failed to calculate likelihood Using AVX path Error getting number of platform (-1001): CL_PLATFORM_NOT_FOUND_KHR Failed to get information about device Error getting device and context (1): MW_CL_ERROR Failed to calculate likelihood Using AVX path Error getting number of platform (-1001): CL_PLATFORM_NOT_FOUND_KHR Failed to get information about device Error getting device and context (1): MW_CL_ERROR Failed to calculate likelihood 09:15:21 (13641): called boinc_finish(1) </stderr_txt> ]]> clinfo|head -20 Number of platforms 1 Platform Name AMD Accelerated Parallel Processing Platform Vendor Advanced Micro Devices, Inc. Platform Version OpenCL 2.0 AMD-APP (2508.0) Platform Profile FULL_PROFILE Platform Extensions cl_khr_icd cl_amd_event_callback Platform Extensions function suffix AMD Platform Name AMD Accelerated Parallel Processing Number of devices 1 Device Name gfx803 Device Vendor Advanced Micro Devices, Inc. Device Vendor ID 0x1002 Device Version OpenCL 1.2 Driver Version 1.1 (HSA,LC) Device OpenCL C Version OpenCL C 2.0 Device Type GPU Device Profile FULL_PROFILE Max compute units 36 Max clock frequency 1288MHz ... |
Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 1 |
Sorry wrong thread! |
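
Regarding the ROCm report above, where the task log shows CL_PLATFORM_NOT_FOUND_KHR while clinfo run from a shell finds one platform: one possible explanation, not confirmed anywhere in this thread, is that the account running the BOINC client cannot see what the login account sees. On Linux the Khronos ICD loader discovers OpenCL platforms through *.icd files, conventionally in /etc/OpenCL/vendors, and ROCm reaches the GPU through the /dev/kfd device node. The Python sketch below only lists the ICD files and reports whether the current account can read them and open the node. The function names are hypothetical, and it assumes you run it as the same account the client runs under.

```python
import os
from pathlib import Path


def list_icd_files(vendors_dir="/etc/OpenCL/vendors"):
    """Return (file name, library named inside, readable) for each *.icd file.

    The default directory is the conventional ICD loader location on Linux;
    pass another path if your distribution differs.
    """
    directory = Path(vendors_dir)
    if not directory.is_dir():
        return []
    results = []
    for icd in sorted(directory.glob("*.icd")):
        readable = os.access(icd, os.R_OK)
        # An ICD file normally holds the name or path of the vendor library.
        library = icd.read_text().strip() if readable else None
        results.append((icd.name, library, readable))
    return results


def device_node_accessible(node="/dev/kfd"):
    """True if the current account can open the device node for read/write."""
    return os.access(node, os.R_OK | os.W_OK)


if __name__ == "__main__":
    entries = list_icd_files()
    if not entries:
        print("No ICD files found; the loader would report no platforms.")
    for name, library, readable in entries:
        print(f"{name}: library={library!r} readable={readable}")
    # Assumption: /dev/kfd is the node ROCm uses; other stacks use other nodes.
    print("/dev/kfd accessible:", device_node_accessible())
```

If the output differs between your login account and the account the client runs under, permissions or group membership would be the first thing to look at. If both look the same, the cause is probably elsewhere.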
©2024 Astroinformatics Group