Welcome to MilkyWay@home

Posts by tictoc

1) Questions and Answers : Unix/Linux : RX6600/Ubuntu 22.10 > Error creating command queue (-6): CL_OUT_OF_HOST_MEMORY (Message 75415)
Posted 31 May 2023 by tictoc
Post:
MilkyWay@Home needs what is probably a fairly minor update in order to run on the ROCm OpenCL drivers. While quite a few older OpenCL applications will run without modifications on the newer ROCm stack, that is not the case for everything. For example, over on Einstein@Home, the FGRPB1G app ran with no modifications, but O2MDF (Gravitational Wave tasks from 2020) needed a few small modifications in order to run on both the ROCm open source drivers and the now-deprecated AMDGPU-Pro drivers. https://einsteinathome.org/goto/comment/175741

There was probably a bit of AMD special sauce in the closed source Linux drivers that most likely still exists in the Windows driver, which allows MilkyWay to run on the latest Windows driver. I did do a simple rebuild of MilkyWay on a modern ROCm Linux stack, but the rebuilt app still failed with the same errors. It has been on my much-too-long todo list to go through the MilkyWay app and see if it is a simple fix, but I haven't taken the time to do that.

Since I recently started crunching MilkyWay again, I am just going the easy route and passing a few Radeon VIIs through to a VM that is using the OpenCL bits from the old AMDGPU-Pro driver, from before it transitioned to ROCm for the compute backend (AMDGPU-Pro 20.30.1109583-1). This allows me to run MilkyWay on a fully up-to-date system (kernel 6.3.5|libc 2.37) until the MilkyWay app is updated.
2) Message boards : Number crunching : Benchmark thread 1-2019 on - GPU & CPU times wanted for new WUs, old & new hardware! (Message 68939)
Posted 31 Jul 2019 by tictoc
Post:
Here are some results from my 5700XT.

The 5700XT is running in an Ubuntu VM. The current drivers for the 5700XT are a mess, and it was much easier to just pass the GPU through to a VM and test with the AMD release drivers.

Host OS: Arch Linux kernel 5.2.3
CPU: AMD Ryzen Threadripper 2970WX @ 3775MHz (SMT on)

Guest OS: Ubuntu 18.04.2 LTS kernel 4.18.0-25
GPU Driver: AMDGPU-Pro 19.30-838629
BOINC Version: 7.9.3

227.12
GPU: AMD Radeon RX 5700XT @ 1980/1750
10 WU avg run-time - 98.79s

227.51/52/53
GPU: AMD Radeon RX 5700XT @ 1980/1750
30 WU avg run-time (10 per point value) - 96.39s

244.01
GPU: AMD Radeon RX 5700XT @ 1980/1750
10 WU avg run-time - 103.59s

All 300 tasks that I ran completed without errors or invalids.

If I tuned this VM by pinning CPUs and NUMA nodes, I could probably improve the performance a bit, but this should be within 5-8% of native performance.

Also, if anyone is going to try to run this GPU in Linux at this early stage, there are a whole lot of bugs and issues (fan stuck at 40%, no GPU temp monitoring on the 4.18 LTS kernel, no underclocking or overclocking, etc.).
3) Message boards : Number crunching : Invalids Exit status 0 (0x0) after server came back (Message 68216)
Posted 7 Mar 2019 by tictoc
Post:
I'm sitting at about 8-9% invalid, which is 1000+ tasks on one machine and 160 tasks on another machine. That seems to be a much higher rate than anyone else. No issues before the last outage, and I only had one invalid task prior to 00:00 on March 5th.
4) Message boards : Number crunching : Benchmark thread 1-2019 on - GPU & CPU times wanted for new WUs, old & new hardware! (Message 68188)
Posted 24 Feb 2019 by tictoc
Post:
It looks like the only tasks I am getting now are 243.61 and 227.15/6/7 point tasks.

Here are some results from my Radeon VII.
Runtimes are an average of 20 tasks run singly.

OS: Arch Linux kernel 4.20
BOINC Version: 7.12.1
GPU Driver: AMDGPU kernel driver with OpenCL from AMDGPU-Pro 18.50
CPU: AMD Ryzen Threadripper 2970WX @ 3550MHz (SMT on)

227.15/6/7
GPU: AMD Radeon VII @ 1800/1000
20 WU avg run-time - 18.16s

243.61
GPU: AMD Radeon VII @ 1800/1000
20 WU avg run-time - 19.15s

After pretty extensive testing, running 8 tasks concurrently seems to give the maximum throughput on the Radeon VII. There are no issues running as many as 12 tasks concurrently, but there is no improvement in overall throughput beyond 8 concurrent tasks. Running at 8x results in roughly a 75% bump in PPD (1 million to 1.75+ million).
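
As a rough check on the figures above, the gain from running tasks concurrently can be worked out directly. This is only a sketch: the function name `ppd_gain_percent` is made up for illustration, and the two PPD values are the approximate ones quoted in the post, not measured data.

```python
def ppd_gain_percent(baseline_ppd: float, concurrent_ppd: float) -> float:
    """Percentage change in points per day (PPD) relative to the baseline."""
    if baseline_ppd <= 0:
        raise ValueError("baseline_ppd must be positive")
    return (concurrent_ppd - baseline_ppd) / baseline_ppd * 100.0


# Approximate figures from the post: ~1 million PPD at 1x, ~1.75 million at 8x.
gain = ppd_gain_percent(1_000_000, 1_750_000)
print(f"{gain:.0f}% more PPD at 8x")
```

Since the post gives the 8x figure as "1.75+ million", the result is best read as a lower bound.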
5) Message boards : Number crunching : AMD FirePro S9150 (Message 68180)
Posted 22 Feb 2019 by tictoc
Post:
My guess (from an actual experience I had years ago) is that those 5 tasks are not completely "unzipped" before the coprocessor starts on them and the situation gets worse as more tasks are added. When I was running 20 concurrent (total of 100 on an s9100) I got a huge amount of valid tasks, but the number of invalids was so high the total throughput was worse than when 4 were running. However, I could have left it running like that but it would have caused delays in validation for other users I was a wingman to.


Looking at GPU usage while a single task is running, I don't think the tasks run in parallel, but are rather crunched in series. Each of the bundled tasks spikes CPU usage at what appears to be the conclusion of the task.

With multiple 7970s in the same system, I noticed a rather large drop in throughput (along with some crashes) when running more than three tasks concurrently, unless I gave each GPU adequate CPU resources. The CPU in the system at the time was an AMD R7 1700, so I was able to free up CPU resources to feed the GPUs. With 20 tasks running concurrently, the spike in CPU and IO overhead, as each of the bundled tasks completes, is going to be pretty high. This is especially true as the tasks sync up and start finishing/starting at the same time. It was that behavior which would choke my system, and occasionally crash the driver. Ultimately, rather than babysit the system, I just ran two tasks per GPU and suffered the slightly worse overall throughput for the sake of system stability.
6) Message boards : Number crunching : New Benchmark Thread - times wanted for any hardware, CPU or GPU, old or new! (Message 68010)
Posted 14 Jan 2019 by tictoc
Post:
Here's some of the new 227.62 tasks for your database. All times are with one task running per GPU.

OS: Arch Linux kernel 4.19
BOINC Version: 7.12.1
GPU Driver: Catalyst 15.12
CPU: Intel Xeon E5 ES 10 core @ 2700MHz (ht off)

GPU: AMD HD7970 @ 1200/1400
10 WU avg run-time - 38.23s

GPU: AMD R9 290 @ 1000/1300
10 WU avg run-time - 70.94s


Hi, added times to the AnandTech benchmark thread, but are both those GPUs in the same machine? If not what CPUs are with which GPUs?


Those GPUs were in the same machine when I posted those times. The CPU is more or less an E5-2680v2, with an all-core turbo of 2700MHz.
7) Message boards : Number crunching : New Benchmark Thread - times wanted for any hardware, CPU or GPU, old or new! (Message 67942)
Posted 12 Dec 2018 by tictoc
Post:
Here's some of the new 227.62 tasks for your database. All times are with one task running per GPU.

OS: Arch Linux kernel 4.19
BOINC Version: 7.12.1
GPU Driver: Catalyst 15.12
CPU: Intel Xeon E5 ES 10 core @ 2700MHz (ht off)

GPU: AMD HD7970 @ 1200/1400
10 WU avg run-time - 38.23s

GPU: AMD R9 290 @ 1000/1300
10 WU avg run-time - 70.94s
8) Message boards : Number crunching : New Benchmark Thread - times wanted for any hardware, CPU or GPU, old or new! (Message 67824)
Posted 26 Sep 2018 by tictoc
Post:
Will test with my RTX 2080 in a couple days but I'm assuming with a lack of double precision power it will do worse than the very old Titan and horribly worse than the Tesla.


The fp64 rate on the 20xx series GPUs is 1/32 of fp32, so they are only marginally better than a 1080 Ti, which puts them in the same class as an RX 480 or an HD 5850.
9) Message boards : Number crunching : New Benchmark Thread - times wanted for any hardware, CPU or GPU, old or new! (Message 66909)
Posted 31 Dec 2017 by tictoc
Post:
Here are some results from a 7970 running at stock clocks in Linux.

OS: Manjaro Linux kernel 4.15
BOINC Version: 7.8.4
GPU Driver: Catalyst 15.9
CPU: AMD R7 1700 @ 3800MHz
GPU: AMD HD7970 @ 925/1325

10 WU avg run-time - 53.9s
10) Message boards : Number crunching : New Benchmark Thread - times wanted for any hardware, CPU or GPU, old or new! (Message 66667)
Posted 26 Sep 2017 by tictoc
Post:
Windows 7 Ultimate 64 bit - BOINC 7.6.33 - GPU Driver 14.1
Xeon E3 1270 V3 @ 3.7 GHz HT on
MW 1 task only - GPU @ 1065MHz - Memory @ 1555MHz 40 seconds - NO CPU tasks !
Same with CPU tasks limited to 88% - 43 seconds

dunx

P.S. Tried two tasks per GPU, and it messed up all four GPUs.... liking the sensible VRM temps though !


Err thanks for your time :), but what GPU is that?? lol ;)
And I assume you meant running 1 task at a time & not the time from just a single task?


dunx was running a pair of 7970s when those times were posted.


Tictoc
Thanks for your time :), will add it in.... awesome time btw! :D No1 by a long shot! I did a double take when I placed it, but when I saw your GPU clock I can see why, 1250 MHz is some o/c! :), water cooled?


GPUs are watercooled. I generally run them at 1200 MHz, because they hit the wall at 1200 MHz and it takes quite a bit of additional voltage to get them stable at 1250. Once that machine is back on Windows, I will have a new result to add. At least one of my 7970s will crunch MilkyWay at 1315ish.

Sorting out a few things, but I should have some Linux results, with the new AMD driver, on an R9 290 to add over the weekend. Making a push on Collatz at the moment, but I'll be back to MilkyWay in October. :)
11) Message boards : Number crunching : New Benchmark Thread - times wanted for any hardware, CPU or GPU, old or new! (Message 66304)
Posted 21 Apr 2017 by tictoc
Post:
Looks like I haven't posted to this with the new app.

OS: Windows 8.1 Pro x64
BOINC Version: 7.6.33
GPU Driver: Radeon Crimson 17.3.3
CPU: AMD R7 1700 @ 3800MHz
GPU: AMD HD7970 @ 1250/1550

10 WU avg run-time - 32.08s


You have only one 32.08s and i can not find it... )))
Your last 60 units are over 70-80 seconds...
PLEASE do not "cherry pick" one short unit to benchmark.
Several hundred or more would give a stable result.


I am currently running 4 concurrently per card, because the efficiency is much better with all of the CPU time in the new tasks. Here are the pages with the units I ran singly to get a baseline:

https://milkyway.cs.rpi.edu/milkyway/results.php?userid=193178&offset=15360&show_names=0&state=4&appid=

https://milkyway.cs.rpi.edu/milkyway/results.php?userid=193178&offset=15380&show_names=0&state=4&appid=

No need to run 100s of units; I have run 1000s of units at 4x, and the only invalids are due to other machines failing units. :)
12) Message boards : Number crunching : New Benchmark Thread - times wanted for any hardware, CPU or GPU, old or new! (Message 66302)
Posted 18 Apr 2017 by tictoc
Post:
Looks like I haven't posted to this with the new app.

OS: Windows 8.1 Pro x64
BOINC Version: 7.6.33
GPU Driver: Radeon Crimson 17.3.3
CPU: AMD R7 1700 @ 3800MHz
GPU: AMD HD7970 @ 1250/1550

10 WU avg run-time - 32.08s
13) Message boards : Number crunching : Underused 970 ? (Message 65571)
Posted 1 Nov 2016 by tictoc
Post:
MilkyWay uses double precision calculations. The 460 was capped at, I believe, 1/12 fp64, and the 970 is capped at 1/32 fp64. Even though the 970 is a much faster GPU at single precision (aka fp32), the cap on fp64 has made all consumer NVIDIA cards after Fermi, with the exception of the OG Titan and Titan Black, very inefficient at double precision workloads.
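
To illustrate the point about fp64 caps, here is a minimal sketch of how a large fp32 advantage shrinks once each card's fp64 ratio is applied. The fp32 figures below are placeholders chosen for round numbers, NOT real specifications of any card. The 1/12 and 1/32 ratios are the ones given in the post, and the function name `fp64_tflops` is hypothetical.

```python
from fractions import Fraction


def fp64_tflops(fp32_tflops: float, fp64_ratio: Fraction) -> float:
    """Theoretical fp64 throughput given fp32 throughput and the fp64:fp32 ratio."""
    return fp32_tflops * float(fp64_ratio)


# Placeholder fp32 numbers (NOT real specs): the newer card is 3x faster at fp32.
older_fp32, newer_fp32 = 1.0, 3.0

older_fp64 = fp64_tflops(older_fp32, Fraction(1, 12))  # ratio quoted for the 460
newer_fp64 = fp64_tflops(newer_fp32, Fraction(1, 32))  # ratio quoted for the 970

fp32_advantage = newer_fp32 / older_fp32
fp64_advantage = newer_fp64 / older_fp64
print(f"fp32 advantage: {fp32_advantage:.1f}x, fp64 advantage: {fp64_advantage:.3f}x")
```

With these placeholder inputs, the 3x lead at single precision shrinks to a lead of about one eighth at double precision, which is the effect the post describes.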
14) Message boards : Number crunching : New Benchmark Thread - times wanted for any hardware, CPU or GPU, old or new! (Message 64525)
Posted 1 May 2016 by tictoc
Post:
After a hiatus from MilkyWay I am back, and I have dedicated two 7970s to crunch on MilkyWay until they die. ;)

OS: Windows 7 x64
BOINC Version: 7.6.22
GPU Driver: 14.9 WHQL
CPU: AMD FX8320e @ 4.8 GHz
GPU: AMD HD7970 @ 1250/1550

20 WU avg run-time - 21.02s

Running two tasks concurrently has slightly better throughput, with each task taking an average of 35.5 seconds to complete.
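
The comparison above can be sketched as effective wall time per completed task. This assumes the 21.02s average was measured with one task running at a time, and the helper name `effective_seconds_per_task` is made up for illustration.

```python
def effective_seconds_per_task(wall_seconds: float, concurrent_tasks: int) -> float:
    """Average wall time consumed per completed task when N tasks run at once."""
    if concurrent_tasks < 1:
        raise ValueError("concurrent_tasks must be at least 1")
    return wall_seconds / concurrent_tasks


single = effective_seconds_per_task(21.02, 1)  # one task at a time (assumed)
double = effective_seconds_per_task(35.5, 2)   # two tasks at a time
speedup = single / double
print(f"1x: {single:.2f}s/task, 2x: {double:.2f}s/task, speedup: {speedup:.2f}x")
```

Each task takes longer on its own at 2x, but two finish in that window, so the time per completed task drops.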
15) Message boards : Number crunching : New Benchmark Thread - times wanted for any hardware, CPU or GPU, old or new! (Message 63451)
Posted 23 Apr 2015 by tictoc
Post:

tictoc
Thx for the info :)
What are the shortcomings on later 7970s? (I'm looking at buying a 2nd hand 7970 soon)

Some of the lower end 7970s and 280x's had a pretty weak power delivery system, and did not have VRM temp monitoring. This is pretty important for MilkyWay, since it pushes the VRMs harder than just about any other task you can throw at it.
I am also not a fan of the Boost BIOSes or any type of throttling in the BIOS, since it can make it tough to keep the card running at full speed.

@usao the 7990 is just two full Tahiti (7970/280x) dies on one PCB. If temps and throttling can be kept in check, then it theoretically will perform the same as two 7970s or 280x's. In practice this is not always true, since many BOINC projects can be tough to run on a dual GPU card. I never ran my 7990 on MilkyWay, but with a no-throttle BIOS it was comparable to a pair of stock clocked 7970s in other tasks.
16) Message boards : Number crunching : New Benchmark Thread - times wanted for any hardware, CPU or GPU, old or new! (Message 63392)
Posted 17 Apr 2015 by tictoc
Post:
Sorry for the late reply tictoc, I think the email sub got buried :o, or there isn't 1. Anyway, thx for the times, you've posted a new fastest time :).

Btw are they standard 7970 or GHz edition versions? Either way, nice o/cs :)


My 7970s are original reference models. The 7970 in my Windows 7 rig has been crunching and folding at 1200 MHz since release day. It has spent most of its life under water, so core and VRM temps have always been kept in check. It is actually stable at 1250 MHz for a few other projects.

My other three 7970s are also standard reference models. The PCB on the original reference 7970s is a very robust design, with none of the shortcomings that many of the later model boards had.
17) Message boards : Number crunching : New Benchmark Thread - times wanted for any hardware, CPU or GPU, old or new! (Message 63236)
Posted 15 Mar 2015 by tictoc
Post:
Here are some more WU times.

OS: Windows 7 x64
BOINC Version: 7.4.36
GPU Driver: 14.9 WHQL
CPU: Intel i7-4790k @ 4.6 GHz
GPU: AMD HD7970 @ 1200/1550

20 WU avg run-time - 21.29s

OS: Windows 10 x64
BOINC Version: 7.4.36
GPU Driver: 14.9 WHQL
CPU: AMD x6 1055t @ 3.2 GHz
GPU: AMD HD7970 @ 1200/1550

20 WU avg run-time - 23.74s

For the Windows 10 rig, I was initially running at 1175 MHz on the packaged driver, which is 14.8 WHQL. I thought the older driver and slightly lower clock speed might be the reason for the longer run-times, so I updated to 14.9 WHQL (best driver for the majority of the projects I run).

After updating to 14.9, and pushing my core clock up to 1200 MHz, the run-times are still about 2-2.5 seconds longer on Windows 10. It looks like Windows 10, build 9926, is not quite as efficient as Windows 7 x64.
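
For reference, the gap between the two averages posted above can be expressed both in seconds and as a percentage. This is only a sketch using the two 20 WU averages from this post, and the helper name `runtime_gap` is hypothetical. Note that the two rigs also use different CPUs, so the gap is not necessarily down to the OS alone.

```python
def runtime_gap(reference_seconds: float, other_seconds: float) -> tuple[float, float]:
    """Return (absolute gap in seconds, gap as a percentage of the reference)."""
    gap = other_seconds - reference_seconds
    return gap, gap / reference_seconds * 100.0


# 20 WU averages from the post: Windows 7 rig vs. Windows 10 rig.
gap_seconds, gap_percent = runtime_gap(21.29, 23.74)
print(f"Windows 10 is {gap_seconds:.2f}s ({gap_percent:.1f}%) slower per task")
```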




©2024 Astroinformatics Group