Welcome to MilkyWay@home

CL_OUT_OF_HOST_MEMORY with AMD RX 6600 XT on Xubuntu 20.04

Link
Joined: 19 Jul 10
Posts: 578
Credit: 18,845,239
RAC: 856
Message 75082 - Posted: 23 Feb 2023, 10:50:53 UTC - in response to Message 75081.  

You won't see the 4090 card turning in 2X the computation time of the 7950 XTX.
Perhaps not, but would it be at least as fast as the 7950 XTX? Considering its price, it would have to be quite a bit faster (or use less power); otherwise it's not a great deal for those using it mainly or only for Milkyway (it might be on other projects, though).
Keith Myers
Joined: 24 Jan 11
Posts: 696
Credit: 540,034,563
RAC: 86,710
Message 75083 - Posted: 23 Feb 2023, 21:57:25 UTC

I would say it is more than likely that most BOINC crunchers crunch on more than one project. I would guess that very few crunchers, a small minority, crunch only Milkyway.

So not having the most efficient gpu for just MW is not a consideration. There are a few projects that only offer CUDA applications so that leaves any AMD card out of consideration.

Based on your supposition of a MW-only cruncher, I would say it is more efficient to use the much older AMD GPU generations that have high FP64 rankings. They would be the most cost-effective, but they wouldn't be useful for anything other than crunching.
Joseph Stateson
Joined: 18 Nov 08
Posts: 291
Credit: 2,461,693,501
RAC: 0
Message 75168 - Posted: 20 Mar 2023, 15:57:06 UTC - in response to Message 75074.  
Last modified: 20 Mar 2023, 15:57:33 UTC


https://docs.amd.com/bundle/ROCm-Installation-Guide-v5.4.3/page/How_to_Install_ROCm.html

As reported by others, Einstein@Home GPU tasks work fine.
Best regards,

Samuel



I have just run into this problem testing out an AMD Mi25 that jpmboy sent me.

OpenCL: AMD/ATI GPU 0: Radeon Instinct MI25 (driver version 3513.0 (HSA1.1,LC), device version OpenCL 2.0, 16368MB, 16368MB available, 12288 GFLOPS peak)		
OS: Linux Ubuntu: Ubuntu 22.04.2 LTS [5.19.0-35-generic|libc 2.35]	
Task Ter5_4_cfbf00057_segment_13_dms_400_13200_243_250000_1 postponed for 900 seconds: Not enough free CPU/GPU memory available! Delaying next attempt for at least 15 minutes...


The above error is from an Einstein BREP7-opencl-ati task so it is not just Milkyway

Milkyway@home Separation v1.46 (opencl_ati_101)x86_64-pc-linux-gnu
Error getting device and context (-6): CL_OUT_OF_HOST_MEMORY
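
For reference, the -6 in the line above is the numeric value of CL_OUT_OF_HOST_MEMORY in the OpenCL headers. A minimal Python sketch of that lookup follows. The table covers only the first few codes from CL/cl.h, and the helper name `cl_error_name` is made up for illustration; it is not part of any BOINC or OpenCL tool.

```python
# Subset of the error codes defined in the Khronos CL/cl.h header.
CL_ERRORS = {
    0: "CL_SUCCESS",
    -1: "CL_DEVICE_NOT_FOUND",
    -2: "CL_DEVICE_NOT_AVAILABLE",
    -3: "CL_COMPILER_NOT_AVAILABLE",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
}


def cl_error_name(code: int) -> str:
    """Translate a numeric OpenCL status into its symbolic name (hypothetical helper)."""
    return CL_ERRORS.get(code, f"unknown OpenCL error ({code})")


print(cl_error_name(-6))
```

The OpenCL specification describes this code as a failure to allocate resources required by the OpenCL implementation on the host, so on its own it does not show that system RAM is exhausted.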


I am guessing the problem has to do with some memory segment that my H110-btc motherboard has enabled. I have little experience with UEFI BIOS and there are a lot of settings. I was only able to get the Mi25 board to work by forcing GEN3 on the PCIe x16 slot it is in. I also had to disable CSM.
Ian&Steve C.
Joined: 18 Nov 22
Posts: 81
Credit: 636,355,130
RAC: 0
Message 75172 - Posted: 20 Mar 2023, 20:38:22 UTC - in response to Message 75168.  

i don't necessarily think it will solve the problem, but you could look for and try setting "Above 4G decoding" to enabled.

this setting does have something to do with memory of pcie devices, and usually having it disabled will cause issues with GPU detection when you're running several GPUs. but it could be worth a shot to see if it has any effect
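
One way to see whether a card's memory regions really are mapped above the 4 GiB boundary is to read the `resource` file that Linux exposes for each PCI device under /sys/bus/pci/devices/. Each line holds a start address, an end address and flags in hex. The sketch below parses that format; the sample text is invented for illustration and is not taken from the Mi25 system, and `bars_above_4g` is a hypothetical helper name.

```python
FOUR_GIB = 1 << 32


def bars_above_4g(resource_text: str):
    """Return (line_index, start, size_in_bytes) for each populated region at or above 4 GiB."""
    found = []
    for index, line in enumerate(resource_text.splitlines()):
        fields = line.split()
        if len(fields) != 3:
            continue  # not a "start end flags" line
        start, end, _flags = (int(field, 16) for field in fields)
        if start == 0 and end == 0:
            continue  # unused entry
        if start >= FOUR_GIB:
            found.append((index, start, end - start + 1))
    return found


# Invented sample in the sysfs "start end flags" layout (assumption, not real output).
sample = """0x0000006000000000 0x00000063ffffffff 0x000000000014220c
0x0000006400000000 0x00000064001fffff 0x000000000014220c
0x000000000000e000 0x000000000000e0ff 0x0000000000040101
0x00000000df000000 0x00000000df07ffff 0x0000000000040200
0x0000000000000000 0x0000000000000000 0x0000000000000000
"""

for index, start, size in bars_above_4g(sample):
    print(f"region {index}: start=0x{start:x} size={size // (1 << 20)} MiB")
```

On a real system the text would come from something like `open("/sys/bus/pci/devices/<address>/resource").read()`, with the device address taken from `lspci`.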

what drivers have you installed? ROCm would be the best for an Instinct card on Linux I think. but it seems that ROCm 4.5 is the last to support Vega. newer versions might not work. AMD Linux driver compatibility is unnecessarily complicated and cumbersome. might have to run an older OS to run ROCm 4.5. check their install docs to be sure.

i do not think your motherboard has anything to do with the out of memory error you're getting. the error is about the OpenCL context, nothing to do with the motherboard or system memory since it's not sharing that.

Joseph Stateson
Joined: 18 Nov 08
Posts: 291
Credit: 2,461,693,501
RAC: 0
Message 75173 - Posted: 20 Mar 2023, 23:05:34 UTC - in response to Message 75172.  
Last modified: 20 Mar 2023, 23:18:04 UTC

i don't necessarily think it will solve the problem, but you could look for and try setting "Above 4G decoding" to enabled.


It is enabled. When I disabled that (mmio segment?) I got a BIOS warning "not enough PCIe resources to go around, remove some PCIe boards***" (I didn't take a picture of the exact wording).

I pulled all the x9150 and s9100 boards and the only difference was a lot less fan noise. Got the same error message.

The h110 can set gen 3 for "pcie #2" (of 0..7), all the rest of the slots can be gen1, gen2 or auto
The slots are numbered 0,1,2,3,4,....11
the x16 slot is 3 so probably 2 and 3 are gen3. The Mi25 started working (clinfo only) when gen3 was enabled. All the other slots are x1 with risers.
I tried disabling the h110 internal video thinking that might pull that mmio segment back in but it was worse. 22.04 would not boot. I tried a cheap AMD video board in slot 2 and the Mi25 in slot 3 and vice-versa but system hung.

Tried the following kernels. the one named "jyskern" took me over 7 hours to build.
The AMD driver I used was for 22.04.2
1   Advanced options for Ubuntu
1>0 Ubuntu, with Linux 5.19.0-35-generic
1>1 Ubuntu, with Linux 5.19.0-35-generic (recovery mode)
1>2 Ubuntu, with Linux 5.19.0-32-generic
1>3 Ubuntu, with Linux 5.19.0-32-generic (recovery mode)
1>4 Ubuntu, with Linux 5.13.0jyskern
1>5 Ubuntu, with Linux 5.13.0jyskern (recovery mode)


I read the following article where the user burned the wx9100 bios onto the Mi25 and got it working in kernel 5.13. He got the video to work also. Normally the Mi25 has no video.
The guide I used to downgrade the kernel is
https://youtu.be/IDYZ9Hm-p44
It was 7 hours to build on a dual xeon 24 thread.
However, when I looked at the mi25 and wx9100 bios sizes there were huge differences, so I do not plan to flash the wx9100 bios unless I get advice from the author.
Plus it does not seem to work anyway.

-rw-rw-r--  1 jstateson jstateson   39201 Mar 20 16:08 218718.rom
-rw-rw-r--  1 jstateson jstateson  262144 Mar  8  2021 230670.rom
-rw-rw-r--  1 jstateson jstateson  262144 Apr 25  2022 245174.rom
-rw-rw-r--  1 jstateson jstateson 1048576 Feb  8  2019 AMD.RadeonVII.16384.190116.rom
-rwxrwxrwx  1 jstateson jstateson 1660176 May 20  2021 amdvbflash*
-rw-rw-r--  1 jstateson jstateson  620696 Mar 20 10:37 amdvbflash_linux_4.71.zip
-rw-r--r--  1 root      root      1048576 Mar 20 16:01 original_mi25
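
The size comparison in the listing above can also be done with a few lines of Python. This is only a sketch: `size_mismatches` is a hypothetical helper, and the demo recreates the file names and sizes from the listing as zero-filled placeholder files in a temporary directory rather than reading real ROM images.

```python
import tempfile
from pathlib import Path


def size_mismatches(directory, reference_name, pattern="*.rom"):
    """Return the reference file's size and a dict of matching files whose size differs from it."""
    folder = Path(directory)
    ref_size = (folder / reference_name).stat().st_size
    differing = {}
    for path in sorted(folder.glob(pattern)):
        size = path.stat().st_size
        if size != ref_size:
            differing[path.name] = size
    return ref_size, differing


# Names and sizes copied from the directory listing; contents are placeholders.
listing = {
    "218718.rom": 39201,
    "230670.rom": 262144,
    "245174.rom": 262144,
    "AMD.RadeonVII.16384.190116.rom": 1048576,
    "original_mi25": 1048576,
}

with tempfile.TemporaryDirectory() as tmp:
    for name, size in listing.items():
        (Path(tmp) / name).write_bytes(bytes(size))
    ref_size, differing = size_mismatches(tmp, "original_mi25")

print(f"reference size: {ref_size} bytes")
for name, size in differing.items():
    print(f"{name}: {size} bytes")
```

A size mismatch only flags images that differ from the dump of the card's own ROM; it says nothing about whether a same-sized image is compatible.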


Another problem is I do not know if the rom is locked. Have not figured out exactly what the following denotes.

jstateson@mi25-john:~/flash$ sudo ./amdvbflash -checklock 0
AMDVBFLASH version 4.71, Copyright (c) 2020 Advanced Micro Devices, Inc.

SW protection fail (0x9C)


The following error shows up in einstein slot 0

[15:30:32][2285][INFO ] Application startup - thank you for supporting Einstein@Home!
[15:30:32][2285][INFO ] Starting data processing...
[15:30:32][2285][INFO ] Using OpenCL platform provided by: Advanced Micro Devices, Inc.
[15:30:32][2285][INFO ] Using OpenCL device "gfx900:xnack-" by: Advanced Micro Devices, Inc.
[15:30:32][2285][ERROR] Couldn't create OpenCL command queue (error: -6)!
[15:30:32][2285][INFO ] OpenCL shutdown complete!
[15:30:32][2285][ERROR] Demodulation failed (error: 2013)!
[15:30:32][2285][WARN ] Sorry, at the moment your system doesn't have enough free CPU/GPU memory to run this task!
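
To pull just the failures and their numeric codes out of a log like this, something along these lines should work. The line layout `[time][pid][LEVEL] message` is inferred only from the excerpt above, and `error_entries` is a hypothetical helper name.

```python
import re

# Layout inferred from the excerpt: [HH:MM:SS][pid][LEVEL] message
LINE_RE = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\]\[(\d+)\]\[(\w+)\s*\]\s*(.*)$")
CODE_RE = re.compile(r"\(error:\s*(-?\d+)\)")


def error_entries(log_text: str):
    """Return (message, code) for each ERROR line; code is None when no '(error: N)' is present."""
    entries = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match or match.group(3) != "ERROR":
            continue
        message = match.group(4)
        code = CODE_RE.search(message)
        entries.append((message, int(code.group(1)) if code else None))
    return entries


log = """[15:30:32][2285][INFO ] Using OpenCL device "gfx900:xnack-" by: Advanced Micro Devices, Inc.
[15:30:32][2285][ERROR] Couldn't create OpenCL command queue (error: -6)!
[15:30:32][2285][INFO ] OpenCL shutdown complete!
[15:30:32][2285][ERROR] Demodulation failed (error: 2013)!
"""

for message, code in error_entries(log):
    print(code, message)
```

The first code, -6, matches the CL_OUT_OF_HOST_MEMORY value that Milkyway reports; 2013 looks like an application-level code and is left uninterpreted here.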


System has 8gb which is not a problem for AMD VII and 4 of the s91x0 types so I suspect memory to not be a problem.

****edit - That warning included the phrase that If I continue to boot there could be problems but there was no option to continue. There was no "press f1 to continue" or anything like that so not sure what the bios author meant to be done.