1)
Questions and Answers :
Unix/Linux :
RX6600/Ubuntu 22.10 > Error creating command queue (-6): CL_OUT_OF_HOST_MEMORY
(Message 75415)
Posted 31 May 2023 by tictoc Post: MilkyWay@Home needs what is probably a fairly minor update in order to run on the ROCm OpenCL drivers. While quite a few older OpenCL applications run without modifications on the newer ROCm stack, that is not the case for everything. For example, over on Einstein@Home, the FGRPB1G app ran with no modifications, but O2MDF (Gravitational Wave tasks from 2020) needed a few small modifications in order to run on both the ROCm open-source drivers and the now-deprecated AMDGPU-Pro drivers. https://einsteinathome.org/goto/comment/175741
There was probably a bit of AMD special sauce in the closed-source Linux drivers, which most likely still exists in the Windows driver and allows MilkyWay to run on the latest Windows driver. I did a simple rebuild of MilkyWay on a modern ROCm Linux stack, but the rebuilt app still failed with the same errors. Going through the MilkyWay app to see if it is a simple fix has been on my much-too-long to-do list, but I haven't taken the time to do that.
Since I recently started crunching MilkyWay again, I am just going the easy route and passing a few Radeon VIIs through to a VM that uses the OpenCL bits from the old AMDGPU-Pro driver, from before it transitioned to ROCm for the compute backend (AMDGPU-Pro 20.30.1109583-1). This allows me to run MilkyWay on a fully up-to-date system (kernel 6.3.5, glibc 2.37) until the MilkyWay app is updated.
2)
Message boards :
Number crunching :
Benchmark thread 1-2019 on - GPU & CPU times wanted for new WUs, old & new hardware!
(Message 68939)
Posted 31 Jul 2019 by tictoc Post: Here are some results from my 5700XT. The 5700XT is running in an Ubuntu VM. The current drivers for the 5700XT are a mess, and it was much easier to just pass the GPU through to a VM and test with the AMD release drivers.
Host OS: Arch Linux, kernel 5.2.3
CPU: AMD Ryzen Threadripper 2970WX @ 3775MHz (SMT on)
Guest OS: Ubuntu 18.04.2 LTS, kernel 4.18.0-25
GPU Driver: AMDGPU-Pro 19.30-838629
BOINC Version: 7.9.3
227.12
GPU: AMD Radeon RX 5700XT @ 1980/1750
10 WU avg run-time - 98.79s
227.51/52/53
GPU: AMD Radeon RX 5700XT @ 1980/1750
30 WU avg run-time (10 per point value) - 96.39s
244.01
GPU: AMD Radeon RX 5700XT @ 1980/1750
10 WU avg run-time - 103.59s
All 300 tasks that I ran completed without errors or invalids. If I tuned this VM by pinning CPUs and NUMA nodes, I could probably improve the performance a bit, but this should be within 5-8% of native performance.
Also, for anyone planning to run this GPU on Linux at this early stage: there are a whole lot of bugs and issues (fan stuck at 40%, no GPU temperature monitoring on the 4.18 kernel, no underclocking or overclocking, etc.).
3)
Invalids Exit status 0 (0x0) after server came back
(Message 68216)
Posted 7 Mar 2019 by tictoc Post: I'm sitting at about 8-9% invalids, which is 1000+ tasks on one machine and 160 tasks on another. That seems to be a much higher rate than anyone else's. I had no issues before the last outage, and only one invalid task prior to 00:00 on March 5th.
4)
Benchmark thread 1-2019 on - GPU & CPU times wanted for new WUs, old & new hardware!
(Message 68188)
Posted 24 Feb 2019 by tictoc Post: It looks like the only tasks I am getting now are 243.61 and 227.15/6/7 point tasks.
Here are some results from my Radeon VII. Run-times are an average of 20 tasks run singly.
OS: Arch Linux, kernel 4.20
BOINC Version: 7.12.1
GPU Driver: AMDGPU kernel driver with OpenCL from AMDGPU-Pro 18.50
CPU: AMD Ryzen Threadripper 2970WX @ 3550MHz (SMT on)
227.15/6/7
GPU: AMD Radeon VII @ 1800/1000
20 WU avg run-time - 18.16s
243.61
GPU: AMD Radeon VII @ 1800/1000
20 WU avg run-time - 19.15s
After pretty extensive testing, running 8 tasks concurrently seems to give the maximum throughput on the Radeon VII. There are no issues running as many as 12 tasks concurrently, but there is zero improvement in overall throughput beyond 8 concurrent tasks. Running at 8x results in roughly a 75% bump in PPD (1 million to 1.75+ million).
5)
AMD FirePro S9150
(Message 68180)
Posted 22 Feb 2019 by tictoc Post:
Quote: "My guess (from an actual experience I had years ago) is that those 5 tasks are not completely "unzipped" before the coprocessor starts on them, and the situation gets worse as more tasks are added. When I was running 20 concurrent (total of 100 on an s9100) I got a huge amount of valid tasks, but the number of invalids was so high that the total throughput was worse than when 4 were running. I could have left it running like that, but it would have caused delays in validation for other users I was a wingman to."
Looking at GPU usage while a single task is running, I don't think the bundled tasks run in parallel; rather, they are crunched in series. Each bundled task spikes CPU usage at what appears to be its conclusion.
With multiple 7970s in the same system, I noticed a rather large drop in throughput (along with some crashes) when running more than three tasks concurrently, unless I gave each GPU adequate CPU resources. The CPU in the system at the time was an AMD R7 1700, so I was able to free up CPU resources to feed the GPUs.
With 20 tasks running concurrently, the spike in CPU and IO overhead as each of the bundled tasks completes is going to be pretty high. This is especially true as the tasks sync up and start finishing/starting at the same time. It was that behavior that would choke my system and occasionally crash the driver. Ultimately, rather than babysit the system, I just ran two tasks per GPU and accepted the slightly worse overall throughput for the sake of system stability.
6)
New Benchmark Thread - times wanted for any hardware, CPU or GPU, old or new!
(Message 68010)
Posted 14 Jan 2019 by tictoc Post:
Quote: "Here are some of the new 227.62 tasks for your database. All times are with one task running per GPU."
Those GPUs were in the same machine when I posted those times. The CPU is more or less an E5-2680v2, with an all-core turbo of 2700MHz.
7)
New Benchmark Thread - times wanted for any hardware, CPU or GPU, old or new!
(Message 67942)
Posted 12 Dec 2018 by tictoc Post: Here are some of the new 227.62 tasks for your database. All times are with one task running per GPU.
OS: Arch Linux, kernel 4.19
BOINC Version: 7.12.1
GPU Driver: Catalyst 15.12
CPU: Intel Xeon E5 ES 10-core @ 2700MHz (HT off)
GPU: AMD HD7970 @ 1200/1400
10 WU avg run-time - 38.23s
GPU: AMD R9 290 @ 1000/1300
10 WU avg run-time - 70.94s
8)
New Benchmark Thread - times wanted for any hardware, CPU or GPU, old or new!
(Message 67824)
Posted 26 Sep 2018 by tictoc Post:
Quote: "Will test with my RTX 2080 in a couple of days, but I'm assuming that with its lack of double-precision power it will do worse than the very old Titan and horribly worse than the Tesla."
On the 20xx series GPUs, FP64 throughput is 1/32 of FP32, so the 2080 is only marginally better than a 1080 Ti, which puts it in the same class as an RX 480 or an HD 5850.
9)
New Benchmark Thread - times wanted for any hardware, CPU or GPU, old or new!
(Message 66909)
Posted 31 Dec 2017 by tictoc Post: Here are some results from a 7970 running at stock clocks in Linux.
OS: Manjaro Linux, kernel 4.15
BOINC Version: 7.8.4
GPU Driver: Catalyst 15.9
CPU: AMD R7 1700 @ 3800MHz
GPU: AMD HD7970 @ 925/1325
10 WU avg run-time - 53.9s
10)
New Benchmark Thread - times wanted for any hardware, CPU or GPU, old or new!
(Message 66667)
Posted 26 Sep 2017 by tictoc Post:
Quote: "Windows 7 Ultimate 64 bit - BOINC 7.6.33 - GPU Driver 14.1"
dunx was running a pair of 7970s when those times were posted.
My GPUs are water-cooled. I generally run them at 1200 MHz, because they hit the wall there and it takes quite a bit of additional voltage to get them stable at 1250 MHz. Once that machine is back on Windows, I will have a new result to add. At least one of my 7970s will crunch MilkyWay at around 1315 MHz.
I'm sorting out a few things, but I should have some Linux results on an R9 290, with the new AMD driver, to add over the weekend. I'm making a push on Collatz at the moment, but I'll be back to MilkyWay in October. :)
11)
New Benchmark Thread - times wanted for any hardware, CPU or GPU, old or new!
(Message 66304)
Posted 21 Apr 2017 by tictoc Post:
Quote: "Looks like I haven't posted to this with the new app."
I am currently running 4 tasks concurrently per card, because efficiency is much better that way given all of the CPU time in the new tasks. Here are the pages with the units I ran singly to get a baseline:
https://milkyway.cs.rpi.edu/milkyway/results.php?userid=193178&offset=15360&show_names=0&state=4&appid=
https://milkyway.cs.rpi.edu/milkyway/results.php?userid=193178&offset=15380&show_names=0&state=4&appid=
No need to run 100s of units; I have run 1000s of units at 4x, and the only invalids are due to other machines failing units. :)
12)
New Benchmark Thread - times wanted for any hardware, CPU or GPU, old or new!
(Message 66302)
Posted 18 Apr 2017 by tictoc Post: Looks like I haven't posted to this with the new app.
OS: Windows 8.1 Pro x64
BOINC Version: 7.6.33
GPU Driver: Radeon Crimson 17.3.3
CPU: AMD R7 1700 @ 3800MHz
GPU: AMD HD7970 @ 1250/1550
10 WU avg run-time - 32.08s
13)
Underused 970 ?
(Message 65571)
Posted 1 Nov 2016 by tictoc Post: MilkyWay uses double-precision calculations. The 460 was capped at (I believe) 1/12 of its FP32 rate for FP64, while the 970 is capped at 1/32. Even though the 970 is a much faster GPU at single precision (FP32), the FP64 cap has made all consumer NVIDIA cards after Fermi, with the exception of the original Titan and the Titan Black, very inefficient at double-precision workloads.
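As a rough worked example using only the two caps quoted in this post (and taking the 1/12 figure for the 460 as given), write F32 and F64 for each card's peak single- and double-precision throughput. The 970's FP64 advantage over the 460 is then only 12/32 of its FP32 advantage:
```latex
\frac{F_{64}^{970}}{F_{64}^{460}}
  = \frac{F_{32}^{970}/32}{F_{32}^{460}/12}
  = \frac{12}{32}\cdot\frac{F_{32}^{970}}{F_{32}^{460}}
  = 0.375\,\frac{F_{32}^{970}}{F_{32}^{460}}
```
So a card with, say, twice the FP32 throughput (an illustrative ratio, not a measured one) would end up with only about 0.75x the FP64 throughput.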
14)
New Benchmark Thread - times wanted for any hardware, CPU or GPU, old or new!
(Message 64525)
Posted 1 May 2016 by tictoc Post: After a hiatus from MilkyWay I am back, and I have dedicated two 7970s to crunch on MilkyWay until they die. ;)
OS: Windows 7 x64
BOINC Version: 7.6.22
GPU Driver: 14.9 WHQL
CPU: AMD FX8320e @ 4.8 GHz
GPU: AMD HD7970 @ 1250/1550
20 WU avg run-time - 21.02s
Running two tasks concurrently gives slightly better throughput, with each task taking an average of 35.5 seconds to complete.
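The throughput effect of running two tasks at once can be checked with a little arithmetic: tasks completed per hour equals the number of concurrent tasks times 3600, divided by the per-task run-time. Below is a minimal Python sketch using the two averages from this post (21.02 s singly, 35.5 s each when paired). The helper name `tasks_per_hour` is hypothetical, and the sketch assumes run-times hold steady at those averages.
```python
def tasks_per_hour(concurrent: int, avg_runtime_s: float) -> float:
    """Completed tasks per hour when `concurrent` tasks each take avg_runtime_s seconds."""
    return concurrent * 3600.0 / avg_runtime_s

# Averages reported in the post above.
single = tasks_per_hour(1, 21.02)  # one task at a time
paired = tasks_per_hour(2, 35.5)   # two tasks sharing the GPU

gain = paired / single - 1.0
print(f"1x: {single:.1f} tasks/h, 2x: {paired:.1f} tasks/h, gain: {gain:.1%}")
```
Note that each individual task gets slower when paired; the gain comes from the GPU finishing two tasks per 35.5 s window instead of one per 21.02 s.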
15)
New Benchmark Thread - times wanted for any hardware, CPU or GPU, old or new!
(Message 63451)
Posted 23 Apr 2015 by tictoc Post:
Some of the lower-end 7970s and 280Xs had pretty weak power delivery systems and did not have VRM temperature monitoring. This is pretty important for MilkyWay, since it pushes the VRMs harder than just about any other task you can throw at it. I am also not a fan of the Boost BIOSes or any type of throttling in the BIOS, since it can make it tough to keep the card running at full speed.
@usao the 7990 is just two full Tahiti (7970/280X) dies on one PCB. If temps and throttling can be kept in check, it should theoretically perform the same as two 7970s or 280Xs. In practice this is not always true, since many BOINC projects can be tough to run on a dual-GPU card. I never ran my 7990 on MilkyWay, but with a no-throttle BIOS it was comparable to a pair of stock-clocked 7970s in other tasks.
16)
New Benchmark Thread - times wanted for any hardware, CPU or GPU, old or new!
(Message 63392)
Posted 17 Apr 2015 by tictoc Post:
Quote: "Sorry for the late reply tictoc, I think the email sub got buried :o, or there isn't one. Anyway, thx for the times, you've posted a new fastest time :)."
My 7970s are original reference models. The 7970 in my Windows 7 rig has been crunching and folding at 1200 MHz since release day. It has spent most of its life under water, so core and VRM temps have always been kept in check. It is actually stable at 1250 MHz for a few other projects. My other three 7970s are also standard reference models. The PCB on the original reference 7970s is a very robust design, with none of the shortcomings that many of the later-model boards had.
17)
New Benchmark Thread - times wanted for any hardware, CPU or GPU, old or new!
(Message 63236)
Posted 15 Mar 2015 by tictoc Post: Here are some more WU times.
OS: Windows 7 x64
BOINC Version: 7.4.36
GPU Driver: 14.9 WHQL
CPU: Intel i7-4790K @ 4.6 GHz
GPU: AMD HD7970 @ 1200/1550
20 WU avg run-time - 21.29s
OS: Windows 10 x64
BOINC Version: 7.4.36
GPU Driver: 14.9 WHQL
CPU: AMD X6 1055T @ 3.2 GHz
GPU: AMD HD7970 @ 1200/1550
20 WU avg run-time - 23.74s
For the Windows 10 rig, I was initially running at 1175 MHz on the packaged driver, which is 14.8 WHQL. I thought the older driver and slightly lower clock speed might be the reason for the longer run-times, so I updated to 14.9 WHQL (the best driver for the majority of the projects I run). After updating to 14.9 and pushing my core clock up to 1200 MHz, the run-times are still about 2-2.5 seconds longer on Windows 10. It looks like Windows 10, build 9926, is not quite as efficient as Windows 7 x64.
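The size of that gap can be expressed in relative terms using the two 20-WU averages from this post. A minimal Python sketch follows; the variable names are illustrative, and the calculation only quantifies the difference between the two rigs as posted, without isolating any single cause.
```python
# 20-WU averages reported in the post above (seconds per WU).
win7_avg_s = 21.29   # i7-4790K rig, Windows 7 x64
win10_avg_s = 23.74  # X6 1055T rig, Windows 10 build 9926

delta_s = win10_avg_s - win7_avg_s  # absolute gap per WU
slowdown = delta_s / win7_avg_s     # gap relative to the Windows 7 time
print(f"Windows 10 rig is {delta_s:.2f} s ({slowdown:.1%}) slower per WU")
```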
©2024 Astroinformatics Group