Welcome to MilkyWay@home

CL_OUT_OF_HOST_MEMORY with AMD RX 6600 XT on Xubuntu 20.04

neofob
Joined: 4 Mar 18
Posts: 23
Credit: 265,230,226
RAC: 14,976
Message 73052 - Posted: 20 Apr 2022, 23:16:24 UTC
Last modified: 20 Apr 2022, 23:17:49 UTC

Is there anyone running Milkyway@Home with AMD RX 6600 XT in Ubuntu 20.04?

I am getting a lot of computation errors. It turns out that the error is "CL_OUT_OF_HOST_MEMORY".

Computer info:
* Xubuntu 20.04, kernel 5.13.19
* AMD OpenCL (ROCm 5.1.1) installed with amdgpu-install from: https://repo.radeon.com/amdgpu-install/22.10.1/ubuntu/focal/
* 32GB of RAM
* GPU: AMD Radeon 6600 XT 8 GB

* I do not have the APU (AMD integrated GPU) enabled in BIOS. Only the discrete Graphics card 6600 XT is used.

Interestingly, the Einstein@Home project can run its GPU apps without errors.

mikey
Joined: 8 May 09
Posts: 3339
Credit: 524,010,781
RAC: 0
Message 73063 - Posted: 21 Apr 2022, 10:43:36 UTC - in response to Message 73052.  

I believe that's a programming error with the tasks themselves, not your GPU.

darknightcl
Joined: 23 May 09
Posts: 4
Credit: 16,387
RAC: 0
Message 74758 - Posted: 9 Dec 2022, 20:35:28 UTC - in response to Message 73063.  

I realize that this is an old thread, but is there any word on what we should do about this error? I have the same problem on my Radeon RX 6750 XT. Every task attempted fails with the CL_OUT_OF_HOST_MEMORY error. Should I let my computer burn through the bugged tasks? Should I disable the GPU for this project? Do the researchers know about this problem?

Thanks!

mikey
Joined: 8 May 09
Posts: 3339
Credit: 524,010,781
RAC: 0
Message 74765 - Posted: 10 Dec 2022, 13:49:38 UTC - in response to Message 74758.  

In hindsight, both of you could be running into GPUs that are just too old to crunch here, unless you have one of the newer ones with 12GB of onboard RAM.

darknightcl
Joined: 23 May 09
Posts: 4
Credit: 16,387
RAC: 0
Message 74770 - Posted: 11 Dec 2022, 2:49:45 UTC - in response to Message 74765.  

I just bought my GPU last week, and it has 12GB of onboard RAM. Einstein@Home jobs run just fine. Any other thoughts?

Thanks

mikey
Joined: 8 May 09
Posts: 3339
Credit: 524,010,781
RAC: 0
Message 74772 - Posted: 11 Dec 2022, 13:25:13 UTC - in response to Message 74770.  

No, I don't, sorry.

darknightcl
Joined: 23 May 09
Posts: 4
Credit: 16,387
RAC: 0
Message 74774 - Posted: 11 Dec 2022, 21:59:43 UTC - in response to Message 74772.  

Ok, thanks for your help.

darknightcl
Joined: 23 May 09
Posts: 4
Credit: 16,387
RAC: 0
Message 74792 - Posted: 13 Dec 2022, 18:39:30 UTC

In case anyone else runs into this problem and happens to find this page, it took extensive research, but I think I know what's happened.

In ~2017, AMD came out with a new OpenCL stack called Radeon Open Compute runtime (ROCr), and started building it into new GPUs, specifically anything newer than a Vega 10. In ~2019, the AMD GPU Linux driver was updated to deprecate the "legacy" OpenCL stack in favour of ROCr. According to the AMD GPU driver page, the legacy OpenCL stack doesn't support anything newer than the Vega 10; newer GPUs must use ROCr. Since the apps for MilkyWay@Home haven't been updated since 2019, I assume that they haven't been updated to use ROCr, and therefore won't run on any of the newer AMD GPUs under Linux. This would also explain why, if you look at the GPU Models page under the Computing menu above, all the AMD GPUs listed as running under Linux are older than the Vega 10.

Long story short, MilkyWay@Home needs to update its apps.

Martin
Joined: 28 May 22
Posts: 17
Credit: 402,111,833
RAC: 0
Message 74797 - Posted: 14 Dec 2022, 2:08:57 UTC - in response to Message 74792.  

Maybe a dumb idea, but could you run Windows in a virtual machine and put BOINC and AMD's OpenCL-compatible drivers on it?

Martin

k
Joined: 28 Jun 16
Posts: 1
Credit: 145,448
RAC: 0
Message 74899 - Posted: 11 Jan 2023, 12:30:11 UTC

I got the same errors on my RX 5600XT with ROCm.

alex
Joined: 1 Oct 14
Posts: 3
Credit: 20,121,618
RAC: 2,129
Message 74926 - Posted: 19 Jan 2023, 22:05:53 UTC - in response to Message 74792.  
Last modified: 19 Jan 2023, 22:07:16 UTC

Yeah, I see. Thanks, man! I have the exact same problem with my RX 6600 on Ubuntu 22.10. I installed the latest drivers using amdgpu-install. It runs handsomely on Einstein and PrimeGrid, but no luck with Milkyway. So sad.

<core_client_version>7.20.2</core_client_version>
<![CDATA[
<message>
process exited with code 250 (0xfa, -6)</message>
<stderr_txt>
<search_application> milkyway_separation 1.46 Linux x86_64 double OpenCL </search_application>
Reading preferences ended prematurely
BOINC GPU type suggests using OpenCL vendor 'Advanced Micro Devices, Inc.'
Setting process priority to 0 (13): Permission denied
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Switching to Parameter File 'astronomy_parameters.txt'
<number_WUs> 5 </number_WUs>
<number_params_per_WU> 20 </number_params_per_WU>
Using AVX path
Found 1 platform
Platform 0 information:
Name: AMD Accelerated Parallel Processing
Version: OpenCL 2.1 AMD-APP (3513.0)
Vendor: Advanced Micro Devices, Inc.
Extensions: cl_khr_icd cl_amd_event_callback
Profile: FULL_PROFILE
Using device 0 on platform 0
Found 1 CL device
Device 'gfx1032' (Advanced Micro Devices, Inc.:0x1002) (CL_DEVICE_TYPE_GPU)
Board: AMD Radeon RX 6600
Driver version: 3513.0 (HSA1.1,LC)
Version: OpenCL 2.0
Compute capability: 0.0
Max compute units: 14
Clock frequency: 2750 Mhz
Global mem size: 8573157376
Local mem size: 65536
Max const buf size: 7287183768
Double extension: cl_khr_fp64
Error creating command queue (-6): CL_OUT_OF_HOST_MEMORY
Error getting device and context (-6): CL_OUT_OF_HOST_MEMORY

Keith Myers
Joined: 24 Jan 11
Posts: 715
Credit: 554,885,540
RAC: 36,448
Message 74928 - Posted: 20 Jan 2023, 4:25:03 UTC

I believe it is just a permissions issue with the ROCr drivers, which have the OpenCL component in a different location from the legacy AMD OpenCL drivers.

You would have to get some AMD compute experts to chime in and verify that. I remember reading about the issue somewhere, on some project, but I don't know where to point you.

Ian&Steve C.
Joined: 18 Nov 22
Posts: 84
Credit: 640,530,847
RAC: 0
Message 74979 - Posted: 31 Jan 2023, 15:47:36 UTC - in response to Message 74792.  

I think you're very confused about what ROCm and ROCr actually are. Your post implies it's something to do with hardware, with your comment "started building it into new GPUs". This is not true. ROCm and ROCr are just the software/drivers; they have nothing to do with hardware.

People generally don't run newer AMD GPUs because AMD started nerfing the FP64 capabilities of their new cards, and it's just not worth it to run them here. The older cards just perform better. If you go out past the top 100, you'll see some Navi and Big Navi cards working on the project.

The problem is drivers, not the application. I'm betting that a full, true ROCm install (NOT ROCr from the amdgpu installer) would work. Unfortunately, AMD Linux drivers are a bit of a mess in this regard, with OpenCL support coming from multiple drivers (amdgpu, ROCm, Mesa), each with its own drawbacks and limitations.
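One way to see which OpenCL implementations are actually registered on a Linux box is to list the ICD files that the OpenCL loader reads. The sketch below assumes the conventional loader directory `/etc/OpenCL/vendors` (some installs use a different location), and the function name `list_opencl_icds` is made up for illustration. It only reports what is registered; it does not tell you which implementation is broken.

```python
from pathlib import Path


def list_opencl_icds(vendors_dir="/etc/OpenCL/vendors"):
    """Return {icd_file_name: library_named_inside} for each registered ICD.

    Assumption: the system uses the conventional ICD loader directory.
    """
    icds = {}
    directory = Path(vendors_dir)
    if not directory.is_dir():
        return icds
    for icd_file in sorted(directory.glob("*.icd")):
        # Each .icd file is a small text file naming the vendor's OpenCL library.
        icds[icd_file.name] = icd_file.read_text().strip()
    return icds


if __name__ == "__main__":
    found = list_opencl_icds()
    if not found:
        print("No OpenCL ICDs registered in the default directory")
    for name, library in found.items():
        print(f"{name}: {library}")
```

If more than one entry shows up, several OpenCL stacks are installed side by side, which is the situation described above.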

But the application itself does have a kind of flaw: not one that stops it from working outright, but one in memory management, which is likely the reason for the memory errors reported by the OP. These tasks don't use much VRAM per context, but each task is a prepackaged group of 5 tasks, and as the subsequent internal "jobs" run, the contexts (and hence the VRAM they use) are not released until the task has fully completed all 5. This is undoubtedly not necessary for the task to function; it's holding old data in VRAM for no reason. By the time the task completes, the 5 jobs are taking up ~1500MB of VRAM. If you're running multiples (as most people do) for best performance, you can easily run out of VRAM and get this error. An 8GB card could only run 5 tasks at a time safely, MAYBE 6 if the tasks remain staggered.
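The "5 tasks on an 8GB card" figure is just integer division of the card's VRAM by the peak footprint of one task. A minimal sketch of that arithmetic, assuming the ~1500MB peak quoted above and treating 8GB as 8192MB; the function name and the optional `reserve_mb` parameter (VRAM held back for the desktop or other apps) are illustrative only:

```python
def max_concurrent_tasks(vram_mb, per_task_peak_mb=1500, reserve_mb=0):
    """Estimate how many tasks fit in VRAM if each one reaches its peak at once."""
    usable_mb = vram_mb - reserve_mb
    if usable_mb <= 0 or per_task_peak_mb <= 0:
        return 0
    return usable_mb // per_task_peak_mb


if __name__ == "__main__":
    for label, vram_mb in (("8GB", 8 * 1024), ("12GB", 12 * 1024)):
        print(label, max_concurrent_tasks(vram_mb))
```

A sixth task only fits on an 8GB card when the tasks are staggered, so that not all of them are at their ~1500MB peak at the same moment.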


magic_sam
Joined: 8 Nov 22
Posts: 3
Credit: 8,617,255
RAC: 1,373
Message 75074 - Posted: 21 Feb 2023, 16:21:29 UTC

Dear all,

I believe I ran into a similar issue with an AMD Radeon RX 7900 XTX:

<core_client_version>7.20.5</core_client_version>
<![CDATA[
<message>
process exited with code 250 (0xfa, -6)</message>
<stderr_txt>
<search_application> milkyway_separation 1.46 Linux x86_64 double OpenCL </search_application>
Reading preferences ended prematurely
BOINC GPU type suggests using OpenCL vendor 'Advanced Micro Devices, Inc.'
Setting process priority to 0 (13): Permission denied
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4' 
Switching to Parameter File 'astronomy_parameters.txt'
<number_WUs> 5 </number_WUs>
<number_params_per_WU> 20 </number_params_per_WU>
Using AVX path
Found 1 platform
Platform 0 information:
  Name:       AMD Accelerated Parallel Processing
  Version:    OpenCL 2.1 AMD-APP (3513.0)
  Vendor:     Advanced Micro Devices, Inc.
  Extensions: cl_khr_icd cl_amd_event_callback 
  Profile:    FULL_PROFILE
Using device 0 on platform 0
Found 2 CL devices
Device 'gfx1100' (Advanced Micro Devices, Inc.:0x1002) (CL_DEVICE_TYPE_GPU)
Board: Radeon RX 7900 XTX
Driver version:      3513.0 (HSA1.1,LC)
Version:             OpenCL 2.0 
Compute capability:  0.0
Max compute units:   48
Clock frequency:     3220 Mhz
Global mem size:     25753026560
Local mem size:      65536
Max const buf size:  21890072576
Double extension:    cl_khr_fp64
Error creating command queue (-6): CL_OUT_OF_HOST_MEMORY
Error getting device and context (-6): CL_OUT_OF_HOST_MEMORY
Failed to calculate likelihood
Using AVX path
Found 1 platform
Platform 0 information:
  Name:       AMD Accelerated Parallel Processing
  Version:    OpenCL 2.1 AMD-APP (3513.0)
  Vendor:     Advanced Micro Devices, Inc.
  Extensions: cl_khr_icd cl_amd_event_callback 
  Profile:    FULL_PROFILE
Using device 0 on platform 0
Found 2 CL devices
Device 'gfx1100' (Advanced Micro Devices, Inc.:0x1002) (CL_DEVICE_TYPE_GPU)
Board: Radeon RX 7900 XTX
Driver version:      3513.0 (HSA1.1,LC)
Version:             OpenCL 2.0 
Compute capability:  0.0
Max compute units:   48
Clock frequency:     3220 Mhz
Global mem size:     25753026560
Local mem size:      65536
Max const buf size:  21890072576
Double extension:    cl_khr_fp64
Error creating command queue (-6): CL_OUT_OF_HOST_MEMORY
Error getting device and context (-6): CL_OUT_OF_HOST_MEMORY
Failed to calculate likelihood
Using AVX path
Found 1 platform
Platform 0 information:
  Name:       AMD Accelerated Parallel Processing
  Version:    OpenCL 2.1 AMD-APP (3513.0)
  Vendor:     Advanced Micro Devices, Inc.
  Extensions: cl_khr_icd cl_amd_event_callback 
  Profile:    FULL_PROFILE
Using device 0 on platform 0
Found 2 CL devices
Device 'gfx1100' (Advanced Micro Devices, Inc.:0x1002) (CL_DEVICE_TYPE_GPU)
Board: Radeon RX 7900 XTX
Driver version:      3513.0 (HSA1.1,LC)
Version:             OpenCL 2.0 
Compute capability:  0.0
Max compute units:   48
Clock frequency:     3220 Mhz
Global mem size:     25753026560
Local mem size:      65536
Max const buf size:  21890072576
Double extension:    cl_khr_fp64
Error creating command queue (-6): CL_OUT_OF_HOST_MEMORY
Error getting device and context (-6): CL_OUT_OF_HOST_MEMORY
Failed to calculate likelihood
Using AVX path
Found 1 platform
Platform 0 information:
  Name:       AMD Accelerated Parallel Processing
  Version:    OpenCL 2.1 AMD-APP (3513.0)
  Vendor:     Advanced Micro Devices, Inc.
  Extensions: cl_khr_icd cl_amd_event_callback 
  Profile:    FULL_PROFILE
Using device 0 on platform 0
Found 2 CL devices
Device 'gfx1100' (Advanced Micro Devices, Inc.:0x1002) (CL_DEVICE_TYPE_GPU)
Board: Radeon RX 7900 XTX
Driver version:      3513.0 (HSA1.1,LC)
Version:             OpenCL 2.0 
Compute capability:  0.0
Max compute units:   48
Clock frequency:     3220 Mhz
Global mem size:     25753026560
Local mem size:      65536
Max const buf size:  21890072576
Double extension:    cl_khr_fp64
Error creating command queue (-6): CL_OUT_OF_HOST_MEMORY
Error getting device and context (-6): CL_OUT_OF_HOST_MEMORY
Failed to calculate likelihood
Using AVX path
Found 1 platform
Platform 0 information:
  Name:       AMD Accelerated Parallel Processing
  Version:    OpenCL 2.1 AMD-APP (3513.0)
  Vendor:     Advanced Micro Devices, Inc.
  Extensions: cl_khr_icd cl_amd_event_callback 
  Profile:    FULL_PROFILE
Using device 0 on platform 0
Found 2 CL devices
Device 'gfx1100' (Advanced Micro Devices, Inc.:0x1002) (CL_DEVICE_TYPE_GPU)
Board: Radeon RX 7900 XTX
Driver version:      3513.0 (HSA1.1,LC)
Version:             OpenCL 2.0 
Compute capability:  0.0
Max compute units:   48
Clock frequency:     3220 Mhz
Global mem size:     25753026560
Local mem size:      65536
Max const buf size:  21890072576
Double extension:    cl_khr_fp64
Error creating command queue (-6): CL_OUT_OF_HOST_MEMORY
Error getting device and context (-6): CL_OUT_OF_HOST_MEMORY
Failed to calculate likelihood
18:10:55 (9902): called boinc_finish(-6)

</stderr_txt>
]]>


BOINC is version 7.20.5 installed on Ubuntu 22.04, with latest ROCm drivers:

https://docs.amd.com/bundle/ROCm-Installation-Guide-v5.4.3/page/How_to_Install_ROCm.html

As reported by others, Einstein@Home GPU tasks work fine.
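For anyone comparing several failed tasks, a small script can pull the key facts out of a stderr dump like the one above. This is only a sketch that assumes the line formats shown in this log ("Board:", "Global mem size:", and "Error ... (code): CL_..."); the function name `summarize_stderr` is made up for illustration.

```python
import re

BOARD_RE = re.compile(r"^Board:[ \t]*(.+?)[ \t]*$", re.MULTILINE)
MEM_RE = re.compile(r"^Global mem size:[ \t]*(\d+)[ \t]*$", re.MULTILINE)
ERROR_RE = re.compile(r"^(Error .+?) \((-?\d+)\): (CL_\w+)[ \t]*$", re.MULTILINE)


def summarize_stderr(text):
    """Extract the board name, global memory size, and OpenCL errors from a task log."""
    board = BOARD_RE.search(text)
    mem = MEM_RE.search(text)
    errors = [
        (m.group(1), int(m.group(2)), m.group(3)) for m in ERROR_RE.finditer(text)
    ]
    return {
        "board": board.group(1) if board else None,
        # The log reports bytes; convert to GiB for readability.
        "global_mem_gib": round(int(mem.group(1)) / 2**30, 2) if mem else None,
        "errors": errors,
    }


if __name__ == "__main__":
    sample = (
        "Board: Radeon RX 7900 XTX\n"
        "Global mem size:     25753026560\n"
        "Error creating command queue (-6): CL_OUT_OF_HOST_MEMORY\n"
        "Error getting device and context (-6): CL_OUT_OF_HOST_MEMORY\n"
    )
    print(summarize_stderr(sample))
```

In this log the first error is raised while creating the command queue, right after the device is detected.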

Best regards,

Samuel

Keith Myers
Joined: 24 Jan 11
Posts: 715
Credit: 554,885,540
RAC: 36,448
Message 75076 - Posted: 22 Feb 2023, 4:32:02 UTC

Don't feel like you just lack the knowledge to figure out OpenCL on the Radeon RX 7900 XTX cards. Even Michael Larabel at Phoronix, who is an actual Linux wiz, couldn't get the ROCm drivers to run OpenCL tests without these "out of memory" errors.

https://www.phoronix.com/review/nvidia-rtx4080-rtx4090-compute

Besides many of the binary-only (CUDA) benchmarks being incompatible with the AMD ROCm compute stack, even for the common OpenCL benchmarks there were problems testing the latest driver build; the Radeon RX 7900 XTX was hitting OpenCL "out of host memory" errors when initializing the OpenCL driver with the RDNA3 GPUs. So with those issues plus the AMD ROCm compute stack still being hit or miss depending upon the particular consumer GPU, this article ended up just being a generational look at the NVIDIA compute performance on Ubuntu Linux.


I really feel that anyone who is still trying to tough it out getting the newer AMD cards to do BOINC OpenCL projects is just a glutton for punishment.

Much simpler to use Nvidia cards which 'just work' and get on with crunching. There really is no difference in FP64 capabilities anymore in the latest generation of consumer cards from either camp.

alk44
Joined: 2 Mar 20
Posts: 131
Credit: 320,183,524
RAC: 13,434
Message 75077 - Posted: 22 Feb 2023, 5:36:49 UTC - in response to Message 75074.  

I'm certainly no authority on this problem, but have you tried using an older driver version?
Just maybe, the newer ones don't agree with the Milkyway app or Boinc.

Sorry I can't be of any real help. Good luck!

Allen

Link
Joined: 19 Jul 10
Posts: 623
Credit: 19,260,717
RAC: 522
Message 75078 - Posted: 22 Feb 2023, 10:51:01 UTC - in response to Message 75076.  
Last modified: 22 Feb 2023, 11:00:46 UTC

There really is no difference in FP64 capabilities anymore in the latest generation of consumer cards from either camp.
I don't agree with that: AMD Radeon RX 7950 XTX 2.534 TFLOPS (1:32), NVIDIA GeForce RTX 4090 1.290 TFLOPS (1:64), so just half of the AMD card and nearly 100W higher TDP. The AMD Radeon RX 7900 XTX still has 1.919 TFLOPS FP64, i.e. ~1.5x the RTX 4090 at 62% of the price. Those are huge differences.
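The ratios follow directly from the quoted figures. A quick sketch of the arithmetic, using only the theoretical FP64 numbers given in this post (taken as stated here, not independently verified, with 1,290 GFLOPS written as 1.290 TFLOPS); the names `FP64_TFLOPS` and `relative_fp64` are illustrative:

```python
# Theoretical FP64 throughput in TFLOPS, as quoted in this post.
FP64_TFLOPS = {
    "RX 7950 XTX": 2.534,
    "RX 7900 XTX": 1.919,
    "RTX 4090": 1.290,
}


def relative_fp64(table, baseline):
    """Express every card's FP64 throughput as a multiple of the baseline card."""
    base = table[baseline]
    return {name: round(value / base, 2) for name, value in table.items()}


if __name__ == "__main__":
    for name, ratio in relative_fp64(FP64_TFLOPS, "RTX 4090").items():
        print(f"{name}: {ratio}x")
```

These are paper specifications only; actual run times on the project can differ.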


Much simpler to use Nvidia cards which 'just work' and get on with crunching.
Perhaps even simpler: use Windows. ;-)
(sorry, could not resist)

Ian&Steve C.
Joined: 18 Nov 22
Posts: 84
Credit: 640,530,847
RAC: 0
Message 75079 - Posted: 22 Feb 2023, 14:14:53 UTC - in response to Message 75078.  

You’d have to look at actual power use. Very likely that full TDP is not being pulled to run on the 4090.

But the spirit of the comment is still valid. Both AMD and Nvidia are slashing the FP64 capabilities of their consumer based cards. AMD not as much as Nvidia, but they are still doing it to a large extent. Older Nvidia cards still reign supreme here though. P100s for the budget option, or Titan V for higher density.


Link
Joined: 19 Jul 10
Posts: 623
Credit: 19,260,717
RAC: 522
Message 75080 - Posted: 22 Feb 2023, 15:53:48 UTC - in response to Message 75079.  

You’d have to look at actual power use. Very likely that full TDP is not being pulled to run on the 4090.
I'm actually pretty sure that the full TDP isn't pulled while crunching, particularly here with an FP64 load (my GTX 275 runs quite a bit warmer when crunching Moo!, for example), but that's the same for AMD cards.

Keith Myers
Joined: 24 Jan 11
Posts: 715
Credit: 554,885,540
RAC: 36,448
Message 75081 - Posted: 22 Feb 2023, 20:21:08 UTC - in response to Message 75078.  

I couldn't care less about theoretical FP64 specifications. I would just examine the actual 1X computation times for both cards. You won't see the 4090 card turning in 2X the computation time of the 7950 XTX.