Welcome to MilkyWay@home

Posts by Joseph Stateson

21) Message boards : Number crunching : Future of Milkyway@Home (Message 75059)
Posted 16 Feb 2023 by Profile Joseph Stateson
Post:
It is. *ahem* will be.



I assume Linux + CUDA which means only Nvidia cards can use the "mod".

Currently, I cannot run any of my (older) high-performance AMD cards under Ubuntu 20.04.5. Every month or two I take another stab at getting them to work.
Under 18.04 my S9xx0 and HD-79xx cards worked fine, but they have not since a disk crash and an upgrade to 20.04.5.

Advice from AMD forum was not helpful: use the exact release listed and no other.

The release, dated 2021 Q2, is for Ubuntu 20.04.2
https://www.amd.com/en/support/professional-graphics/firepro/firepro-s-series/firepro-s9000

The instructions come with a warning that if one upgrades to kernel 4.15 then the driver from the year 2018 needs to be used instead of the 2021 drivers. This does not make a lot of sense considering that the 20.04.5 kernel is way past 4.15.


OS: Linux Ubuntu: Ubuntu 20.04.5 LTS [5.4.0-139-generic|libc 2.31]	


If I follow AMD's exact installation instructions for the Firepro s9000 card, only my RX-570 card works. I discovered this by plugging in the RX-570, powering the system back on, and doing nothing else. The 570 is significantly slower than the s9000 or s9050 cards.

OpenCL: AMD/ATI GPU 0: Radeon RX 570 Series (driver version 3224.4, device version OpenCL 1.2 AMD-APP (3224.4), 4082MB, 4082MB available, 5095 GFLOPS peak)	
22) Message boards : Number crunching : Future of Milkyway@Home (Message 75048)
Posted 10 Feb 2023 by Profile Joseph Stateson
Post:
Something that has always bothered me was the lack of support for CUDA in this project. If you look back at the 2008 "Application Code Discussions" you will find that a developer, Travis, was working on a CUDA implementation. I assume that OpenCL worked better for this project than CUDA and/or Travis left. Unless I am mistaken, I remember that "Travis" was also developing CUDA code for, or contributing code to, other projects.
23) Message boards : Number crunching : Run Multiple WU's on Your GPU (Message 75019)
Posted 5 Feb 2023 by Profile Joseph Stateson
Post:
i use afterburner for overclocking and gpu z and hwinfo64 for temperature monitoring


Can you take the side off the pc? If so try that and see if it helps or not, if not you can always put a small floor fan blowing into the open side or as you said get more fans and leave the side on. If your current case doesn't have top fans in some cases it's pretty easy to swap to a new case, Dell and HP's being the notable exceptions.


Small fans are like small dogs. They make a lot of noise but do not do much. Larger fans push more air with less noise, but it is possible to get too big a fan.

24) Questions and Answers : Unix/Linux : Compute errors (Message 74999)
Posted 3 Feb 2023 by Profile Joseph Stateson
Post:
Log in and run that nvidia diagnostic. I suspect the board got hung up. Possibly overheated.


jstateson@dual-linux:~$ nvidia-smi
Fri Feb  3 09:28:27 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.86.01    Driver Version: 515.86.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA P102-100     Off  | 00000000:01:00.0 Off |                  N/A |
|  0%   52C    P0   142W / 250W |   1804MiB /  5120MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:03:00.0 Off |                  N/A |
| 62%   61C    P2    94W / 120W |    857MiB /  6144MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA GeForce ...  Off  | 00000000:04:00.0 Off |                  N/A |
| 17%   57C    P2    63W / 151W |   1056MiB /  8192MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     16919      C   ...-linux-gnu__cuda118_linux     1802MiB |
|    1   N/A  N/A     16934      C   ...-linux-gnu__cuda118_linux      854MiB |
|    2   N/A  N/A     16963      C   ...-linux-gnu__cuda118_linux     1054MiB |
+-----------------------------------------------------------------------------+


[edit] if it is overheating there is a "coolbits" setting that you can use to speed up the fan.
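
For a quick scripted check of the same readings, here is a minimal sketch. It assumes the CSV query mode of nvidia-smi (`--query-gpu=... --format=csv,noheader,nounits`). The 80 C limit and the function names are illustrative choices only, and the sample text is partly made up, as noted in the comments.

```python
import csv
import io
import subprocess

QUERY_FIELDS = "index,name,temperature.gpu,fan.speed,power.draw"


def read_gpu_csv():
    """Run nvidia-smi in CSV query mode and return its raw text output."""
    return subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout


def hot_gpus(csv_text, limit_c=80):
    """Return (index, name, temp_c) for every GPU at or above limit_c."""
    hot = []
    for row in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        try:
            temp_c = int(row[2])
        except ValueError:
            continue  # some boards report "[N/A]" instead of a number
        if temp_c >= limit_c:
            hot.append((int(row[0]), row[1], temp_c))
    return hot


# Sample text in the shape the query returns. The first row mirrors GPU 0 in
# the table above; the second row is invented to show a hot board.
sample = "0, NVIDIA P102-100, 52, 0, 142.00\n1, Example GPU, 85, 62, 94.00\n"
print(hot_gpus(sample))
```

On a machine with the driver installed, `hot_gpus(read_gpu_csv())` would run the same check against live readings.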
25) Message boards : Number crunching : AMD VII: Occasional a task never finishes and is "hot spot" too high? (Message 74924)
Posted 19 Jan 2023 by Profile Joseph Stateson
Post:
I finally got the AMD performance software to work. The problem was that after the driver install I had 5 "ghost" GPU cards. I had to edit the coproc_info.xml file, remove the extra 5 GPUs, and then mark the file read-only so that BOINC would not be able to add the "extra" GPUs back in.

The driver I used was "win10-radeon-pro-software-enterprise-21.Q2.1"
Possibly using a device driver cleaner and a re-install might fix the coproc_info.xml problem. I got duplicate GPUs because BOINC (or clinfo) saw two drivers instead of one, so it marked my system as having two OpenCL platforms, and I had almost 500 errored-out tasks before I could suspend the project and fix the problem.
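
The "mark the file read-only" step can also be scripted. This is only a sketch: the helper names are made up for illustration, and the path in the usage note is an assumed default Windows data directory, so check where your own coproc_info.xml lives.

```python
import os
import stat

WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def make_read_only(path):
    """Clear every write bit on path so it cannot be overwritten in place."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode & ~WRITE_BITS)


def is_read_only(path):
    """Return True if no write bit is set on path."""
    return stat.S_IMODE(os.stat(path).st_mode) & WRITE_BITS == 0
```

Usage would be something like `make_read_only(r"C:\ProgramData\BOINC\coproc_info.xml")` after editing the file with BOINC stopped. On Windows, `os.chmod` only toggles the read-only attribute, which is what is wanted here.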

What driver(s) are you using? Does one card have one version and a different card have another?

Anyway, this is the display of 4 of the 5 GPUs. The 5th would not fit in the screen capture. Note the junction temp, the so-called "hot spot".
Tuning is only available for the VII. I have not tried any tuning yet.
Is it the same app you are using?


26) Message boards : Number crunching : AMD VII: Occasional a task never finishes and is "hot spot" too high? (Message 74921)
Posted 18 Jan 2023 by Profile Joseph Stateson
Post:
When one of my AMD 7970s got to 104c one day, it was dead the next week.
80c is still too hot for my liking, even short term.
I do everything I can to keep every bit of a hard-working GPU/card below 70c,
as far below as possible,
if you want them to last.


Your HD-7970 is the "Tahiti" chip, same as the S9000 and S9050 cards
None of the S9xxx cards have the so called "hot spot" sensor capability
The temps shown by GPUz or MSI's afterburner are all under 60c

GPUz shows two temps for the VII "GPU".
One is low, like 60-70, and is in the same location as on the S9xxx series cards.
The other is the "Hot Spot", which jumps around wildly, from 70 to 107.
The S9xxx do not have that sensor and are missing other features that GPUz can show for the VII.
27) Message boards : Number crunching : AMD VII: Occasional a task never finishes and is "hot spot" too high? (Message 74919)
Posted 18 Jan 2023 by Profile Joseph Stateson
Post:
I am running 4 work units per GPU: one AMD VII and several AMD S9xxx boards.

About once every 4-5 days a task hangs up on the VII. It is easily fixed by suspending and then resuming the task. The VII is the fastest board, 2x as fast as S9150 and the problem is only on the VII. I used a boinctask "rule" to automatically suspend any MW task taking over 5 minutes which allows the card to continue processing 4 tasks as is normal instead of 3 and a hung task.

1 - Has anyone seen a problem like this before?

2 - GPUz has a feature that shows the "hot spot". My RTX-2080Ti and the VII card are the only ones that report a "hot spot". My other Nvidia and AMD cards lack that feature, probably because they are too old. The VII has 3 fans and is in an open frame rack, and there is a box fan cooling the rack. Its "hot spot" runs 102-107c. The RTX-2080Ti is in a cramped "Area51" case. It shows 80c for its hot spot. Do these values seem ok?
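
The "over 5 minutes" rule described above could be approximated outside of the task manager roughly as follows. This is a sketch under stated assumptions: the task records are hypothetical dicts rather than output parsed from any real tool, `PROJECT_URL` is a placeholder for the project's URL as BOINC knows it, and the only external command assumed is `boinccmd --task <url> <name> suspend|resume`.

```python
def overdue_tasks(tasks, limit_secs=300):
    """Pick tasks whose elapsed time exceeds limit_secs (5 minutes by default)."""
    return [t for t in tasks if t["elapsed_secs"] > limit_secs]


def bounce_commands(task, project_url):
    """Build the suspend and resume command lines for one task."""
    base = ["boinccmd", "--task", project_url, task["name"]]
    return [base + ["suspend"], base + ["resume"]]


# Hypothetical snapshot of the four tasks sharing one GPU.
tasks = [
    {"name": "wu_a", "elapsed_secs": 48},
    {"name": "wu_b", "elapsed_secs": 51},
    {"name": "wu_c", "elapsed_secs": 1900},  # the hung one
    {"name": "wu_d", "elapsed_secs": 47},
]

for task in overdue_tasks(tasks):
    for cmd in bounce_commands(task, "PROJECT_URL"):
        print(" ".join(cmd))
```

Each command list could be handed to `subprocess.run` to actually suspend and then resume the hung task, which is the manual fix described above.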
28) Message boards : Number crunching : seems AMD VII cannot be used on a riser (Message 74910)
Posted 14 Jan 2023 by Profile Joseph Stateson
Post:
For what it's worth, I tried one of those ribbon cable risers that have all the x1 wiring, not just the wiring in the USB3 type cable.
Same problem: VII slowed down terribly. It needs to be in an x16 slot yet crypto miners run the VII on USB3 risers so I am unsure what is happening.
The eBayer I bought the card from was a crypto miner and had no problem, as all the cards were in x1 USB3 risers. The seller suggested a bad riser, but I tried several.
29) Questions and Answers : Windows : GPU in non-continuous operation. (Message 74909)
Posted 14 Jan 2023 by Profile Joseph Stateson
Post:
For what it is worth, I updated the Milkyway fix "mod" using BOINC version 7.20.2
https://github.com/JStateson/Milkyway-7-21
If you are running 7.15.0 there is no Milkyway advantage other than having 7.20.2 functionality instead of 7.14
Any problems post as an issue over at github.


Per chance does your 'fix' include the Lunatics 'fix' for Einstein as well, or is yours solely focused on Milkyway?

Not sure what is broken at Einstein that needs a "fix" Mikey.

Can you elucidate the problem?


Poor choice of words, the speed-up would have been a better choice I think

I will try and compile Joseph's Linux BOINC client and drop it into the Lunatics AIO version of BOINC on a spare PC for beta testing his MW "fix"

WOO HOO!!

Only problem is all I have are Nvidia cards and he mentions they aren't fast enough to trigger the problem at MW.

But I used to run into the MW Scheduler problem all the time back when I was running MW at 100% resource share. Why I asked our dev to come up with the report_delay parameter for our Pandora client which he kindly put into the cs_scheduler.cpp for me even though I was the only team member running MW at 100% at that time.


Hmmmm that does present a problem



I had resources set to 0 for the Milkyway project as I run Einstein at 100% on my single Linux system. As Keith mentioned, the share needs to be set to 100% to trigger the problem and be able to see if the mod works properly. AFAICT the Linux version works. The executable can be downloaded or can be built. It does not do anything special over what the 7.15 mod does, but it does work with the 7.20 Berkeley manager AFAICT.
30) Questions and Answers : Windows : GPU in non-continuous operation. (Message 74904)
Posted 12 Jan 2023 by Profile Joseph Stateson
Post:
For what it is worth, I updated the Milkyway fix "mod" using BOINC version 7.20.2
https://github.com/JStateson/Milkyway-7-21
If you are running 7.15.0 there is no Milkyway advantage other than having 7.20.2 functionality instead of 7.14
Any problems post as an issue over at github.


Per chance does your 'fix' include the Lunatics 'fix' for Einstein as well, or is yours solely focused on Milkyway?


That Einstein mod is in Linux and AFAICT makes an improvement in performance of Linux applications. There was also a mod to BOINC: 7.17 and 7.19

It was done by Petri and possibly others using Linux
This Einstein@home App (v1.0 by petri33) was built at: Apr 28 2022 18:47:15


All I have done is modify the scheduling algorithm in BOINC to bypass that 91 second delay that the Milkyway server wants.
31) Questions and Answers : Windows : GPU in non-continuous operation. (Message 74901)
Posted 12 Jan 2023 by Profile Joseph Stateson
Post:
For what it is worth, I updated the Milkyway fix "mod" using BOINC version 7.20.2
https://github.com/JStateson/Milkyway-7-21
If you are running 7.15.0 there is no Milkyway advantage other than having 7.20.2 functionality instead of 7.14
Any problems post as an issue over at github.
32) Message boards : Number crunching : New Benchmark Thread - times wanted for any hardware, CPU or GPU, old or new! (Message 74894)
Posted 6 Jan 2023 by Profile Joseph Stateson
Post:
Here are some more WU times.


BOINC Version: 7.4.36

BOINC Version: 7.4.36
.


You have some OLD versions of Boinc running there, I thought all the Projects stopped accepting tasks from any version below 7.10.? but obviously that was wrong. The latest release version of Boinc is 7.20.2.


Thanks for providing that info and especially the message url
Just above that referenced message is a nice table of GPU elapsed times.

Some observations ---
The 21.29 to 23.7 seconds is based on overclocking the HD7970 to 1200 gpu clock and 1550 memory clock. I suspect that board is pulling the full 250 watts (or more) that it is spec'ed at. It normally runs at 925 / 1375

The 7970 is an excellent board and cost a lot more than the 7950 that I could afford years ago.

The top boards listed in that table message 63317 (dated 2015) are the 28nm fab sizes and based on "Southern Isles". The 79xx , s9000 and s9050 are Tahiti and the s9100, s9150 are Hawaii. The prices have dropped for s9150 and they are $80 USD and lower on eBay. Those "S" boards have no moving parts (fans) and except for the s90x0 they have no video out. They cannot be overclocked that I know of. The "s" boards do not have UEFI bios so the motherboard, if newer, needs on-board video or a temp video card to set the system up for remote access.

Some statistics on the S9150: temp average 71c; power draw (estimated using a wattmeter) 175 watts. Anything less or greater than 4 concurrent work units seems to run slower. Throughput seems to be 24 seconds, the best I can do using 1x risers. [edit] 24 was the average for 4 different video boards. I am seeing under 20 seconds, but if I raise the concurrent WU count then the s9000 boards do not have enough memory. Going to re-structure the board assignment to get the better boards on one system.

GPUz shows 133-222 with average of 162 watts drawn through AMD VII but the s9xxx series do not report power.

Basically, the s9150 is a really cheap HD7970 that runs cooler, and one can easily DIY a blower fan onto. MY 0.02c opinion
33) Message boards : Number crunching : New Benchmark Thread - times wanted for any hardware, CPU or GPU, old or new! (Message 74886)
Posted 5 Jan 2023 by Profile Joseph Stateson
Post:
You might want to read this discussion

https://www.techpowerup.com/forums/threads/s9150-flash-problem-bios-size-inconsistency.303055/

One of the $80 boards I bought is not reporting sensor data like the newer firmware does and I was unable to upgrade that board's firmware. It otherwise works fine.
I do not have a fan yet for the other board I bought.
34) Message boards : Number crunching : New Benchmark Thread - times wanted for any hardware, CPU or GPU, old or new! (Message 74880)
Posted 5 Jan 2023 by Profile Joseph Stateson
Post:
Yeah, external cooling fan required. Disadvantage when compared to the non-server type GPUs. But the price was good. I'll be using this fan. NEW Dell Optiplex 3050 5050 SFF AVC CPU Fan Heatsink BAZB0925R2U TKR4X 7D86K. I plan on using nylon screws to hold it to the case. It's a pretty good fit, except for the power cables, but nothing a box cutter and some foil tape can't fix. Good find on the voltage step up. I just may need that in the summer. Our garage gets up to 100 some days in the summer. I do need to find the mounting pci mounting bracket, or repurpose the ones that came out since the r9 280x had a built in one.

Do you think I can get 2x performance over the existing r9 280x?

As a side note, I looked at the top computers listing and did not see any radeon pro VII (6,528 Gflops), only the radeon VII (3,360 Gflops). Is the pro not usable here?


Sorry, should have replied earlier.

I have not taken the 9150 or 9100 apart but I suspect the processor chip is mounted like the s9050, s9000 as they are all "hawaii"
The chip is recessed, and a standard heat sink will need a copper shim

The 30 seconds you mention for the r9 is extremely good. You might get 24 for the 9150 running 4 concurrent work units.

If you get a blower fan, make sure it takes at least 2 amps. There is a gap between the bottom of the case and the cooling fins. I had to block the gap to force more air through the fins.
35) Message boards : Number crunching : seems AMD VII cannot be used on a riser (Message 74859)
Posted 26 Dec 2022 by Profile Joseph Stateson
Post:
Are you sure it's not just a bad riser? They aren't very reliable. Milkyway doesn't really use many PCI-e lanes, but you will see improvements on fast cards running at least 8x; anything less does hinder performance, but marginally.


I do not know what is happening. I read this article that indicates AMD is not trying to stop mining

Earlier today I removed the VII from the X16 slot and put it on a new riser with a new cable. I have a lot left over from when SETI went out of business.

I put the X1 connector into the X16 slot and the elapsed time went from just under a minute to 30 minutes. I actually did not let it run that long as it was obviously not working. I then moved the X1 connector to an adjacent "black" slot that I knew was good, as I had taken a riser out of it.

Same problem, same about 30 minutes to do a set of 4 work units.

All the other cards, S9150, S9100 and S9000, are working fine at 4 units each, all completing in 2-3 minutes, except the VII.

I checked GPUZ and system was running at same temperature and % load as when the work units took under 1 minute which seems totally wrong.
All slots on this H110 are set for Gen1. The mombo has 12 slots, 11 of which are X1. This system used to have about 9 NVidia boards of all types and ran SETI and Einstein.

I also checked the bios with GPUz and the Version & Date matches the bios in the techpowerup database of BIOS which indicates it is not a special "one-off" mining bios. If someone else is using risers with VII I would like to know the firmware version and how they did it.

I have no idea what is happening and have never seen a performance difference between X1 and X16 other than on the GPUGRID project. I looked through the credit performance leader and was unable to find any VII systems that had more than 2 VII.

I also noticed that my "mod" of boinc, 7.15.0, is used by the majority of credit leaders. I am working on a 7.21 version and am testing the Windows version now.
36) Message boards : Number crunching : New Benchmark Thread - times wanted for any hardware, CPU or GPU, old or new! (Message 74857)
Posted 26 Dec 2022 by Profile Joseph Stateson
Post:
Interesting. I just purchased an S9150 on ebay for $68 plus shipping, and was wondering what the power draw for that particular unit was. I expect to put it in play in a few weeks here. The hope is to 2x my sapphire toxic r9 280x, which chunks out 1 every 30 seconds, with an average power of 175 watts total. It looks like yours gets 1 every 28 seconds. Have I got that right? If so, can it be pushed harder? Say to 1 every 15 seconds?


Prices have really dropped. I thought $80 (USD) was good, now I see $68 !

This fan can be made to fit with Aluminum tape and using a rear bracket guide or blade. It is an exact fit to the plastic case and with enough metal tape the guide is not needed. If you use a guide there are 3 screws onto the back (front?) but (depending on guide shape) only 2 of the 3 holes can be used due to the pair of power connectors. I had to turn my dell guide upside down and use two screws.

I used the following to step the voltage from 12 to 16 so I could run during the summer in my garage. You may not need it.

All devices take about 200 watts each.
37) Message boards : Number crunching : New Benchmark Thread - times wanted for any hardware, CPU or GPU, old or new! (Message 74854)
Posted 26 Dec 2022 by Profile Joseph Stateson
Post:
Avg of 1045 watts on a 220v circuit for
Below stats are for 4 concurrent work units per device.
I plan on removing the S9000s. Right now, using the system to help keep warm.

GPU4: VII
GPU1: S9150
GPU3: S9100
2&0: pair of s9000 (same as HD-7950 but have a single DisplayPort). They have fans but take 2.5 slots. They also work fine on non-UEFI motherboards.
The S91xx do not have video out so are not too useful except for boinc.
Unaccountably, I cannot use AMD VII on a riser running BOINC but no such problem when mining. I would rather add a 2nd VII than obtain a pair of S9150

There are 5 GPUs, units are minutes
Dev#   WU count  Avg and Std of avg
GPU0 WUs:2,740 -Stats- Avg:2.9(0.59)
GPU1 WUs:4,134 -Stats- Avg:1.9(0.43)
GPU2 WUs:3,155 -Stats- Avg:2.5(0.57)
GPU3 WUs:3,613 -Stats- Avg:2.2(0.77)
GPU4 WUs:8,496 -Stats- Avg:0.8(0.11)

Work Units Attempted: 22360
Work Units Completed: 22138


All previously owned, bought on eBay.
A pair of S9150s has the same work unit production as a single VII and costs the same on eBay: 2*150 = $300

Start time 12/25/2022 11:18:29 AM
Stop  time 12/25/2022 6:18:53 PM
Elapsed secs(includes down time): 25,338
Most minutes between tasks: 1.53
Number Work Units: 4781
Work units per second(system): 0.1887
Calc secs per work unit per devices: 26
Secs per work unit this system: 5
Credits/sec (one device): 8.64
Credits/sec (system): 43.21
System Daily Avg: 3,733,323.6
Avg work units per day:16,302.7
Credits per watt:3,572.6
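
For anyone checking the numbers, the derived lines above can be reproduced from the raw ones. This is a sketch of the arithmetic only, not the performance app itself. It assumes "Credits per watt" means daily credits divided by the 1045 watt average, which is what the posted figure works out to.

```python
# Raw figures copied from the post above.
elapsed_secs = 25_338      # "Elapsed secs(includes down time)"
work_units = 4_781         # "Number Work Units"
devices = 5                # "There are 5 GPUs"
credits_per_sec = 43.21    # "Credits/sec (system)"
avg_watts = 1_045          # "Avg of 1045 watts"
secs_per_day = 86_400

wu_per_sec = work_units / elapsed_secs
secs_per_wu_system = elapsed_secs / work_units
secs_per_wu_device = secs_per_wu_system * devices
wu_per_day = wu_per_sec * secs_per_day
credits_per_day = credits_per_sec * secs_per_day
# Assumption: the posted "credits per watt" is a daily figure.
credits_per_day_per_watt = credits_per_day / avg_watts

print(f"Work units per second (system): {wu_per_sec:.4f}")
print(f"Secs per work unit (system):    {secs_per_wu_system:.1f}")
print(f"Secs per work unit (device):    {secs_per_wu_device:.1f}")
print(f"Work units per day:             {wu_per_day:,.1f}")
print(f"Credits per day:                {credits_per_day:,.1f}")
print(f"Credits per day per watt:       {credits_per_day_per_watt:,.1f}")
```

The credits-per-day result differs slightly from the posted daily average because the posted credits/sec is already rounded to two decimals.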
38) Message boards : Number crunching : New Benchmark Thread - times wanted for any hardware, CPU or GPU, old or new! (Message 74852)
Posted 23 Dec 2022 by Profile Joseph Stateson
Post:
From my performance app
There are 4 GPUs, units are minutes
Dev#   WU count  Avg and Std of avg
GPU0 WUs:327 -Stats- Avg:2.9(0.40)
GPU1 WUs:415 -Stats- Avg:2.3(0.31)
GPU2 WUs:329 -Stats- Avg:2.9(0.41)
GPU3 WUs:864 -Stats- Avg:0.8(0.10)


GPU0 & GPU2: AMD S9000
GPU1: AMD S9100
GPU3: AMD VII

All times are for 4 concurrent work units so 0.8 minutes is 0.8 * 60 /4 = 12 seconds for AMD VII
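
The same conversion applied to every board in the table, as a small sketch (the function name is just for illustration):

```python
CONCURRENT = 4  # work units running at the same time on each GPU

# Average minutes per batch of 4, taken from the table above.
avg_minutes = {
    "GPU0 (S9000)": 2.9,
    "GPU1 (S9100)": 2.3,
    "GPU2 (S9000)": 2.9,
    "GPU3 (VII)": 0.8,
}


def secs_per_wu(batch_minutes, concurrent=CONCURRENT):
    """Convert minutes per concurrent batch into seconds per single work unit."""
    return batch_minutes * 60 / concurrent


for gpu, minutes in avg_minutes.items():
    print(f"{gpu}: {secs_per_wu(minutes):.1f} s per work unit")
```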
39) Message boards : Number crunching : seems AMD VII cannot be used on a riser (Message 74851)
Posted 23 Dec 2022 by Profile Joseph Stateson
Post:


I think it shows the slowness of the pci-e connection thru the riser as opposed to being plugged into the MB. What happens if you only run 1, 2 or 3 tasks at a time instead of the full 4 tasks on the riser?

The Amazon warehouse has more than a couple AMD VII's priced in the $350 to $400 range, all used.


The S9100 and the S9000 have no problems with the x1, all running 4 WU's concurrently.
There probably is a performance loss for projects that need more memory.
The PCIe generation is set to "Gen 1" for all slots else the VII would hog all the lanes and the other boards would not work at all.
My VII was $300 + tax USD and I have a single used S9150 on order ($100). All use the same driver. Windows 10 thinks they are the "W" versions.
12/23/2022 3:41:33 AM	OpenCL: AMD/ATI GPU 0: AMD FirePro W8000 (driver version 3075.13, device version OpenCL 1.2 AMD-APP (3075.13), 6144MB, 6144MB available, 3226 GFLOPS peak)
12/23/2022 3:41:33 AM	OpenCL: AMD/ATI GPU 1: AMD FirePro W8100 (driver version 3075.13, device version OpenCL 2.0 AMD-APP (3075.13), 12288MB, 12288MB available, 4219 GFLOPS peak)
12/23/2022 3:41:33 AM	OpenCL: AMD/ATI GPU 2: AMD FirePro W8000 (driver version 3075.13, device version OpenCL 1.2 AMD-APP (3075.13), 6144MB, 6144MB available, 3226 GFLOPS peak)
12/23/2022 3:41:33 AM	OpenCL: AMD/ATI GPU 3: AMD Radeon VII (driver version 3075.13 (PAL,HSAIL), device version OpenCL 2.0 AMD-APP (3075.13), 16368MB, 16368MB available, 13832 GFLOPS peak)


Some double precision float performance figures from TechPowerUp for these AMD cards.
MilkyWay is the only project that uses GPU double precision FP extensively that I know of.
The S90x0 are the same form factor (heat sink usually fits) and use the same chip as the HD-7950
VII      3.360 TFLOPS
S9150    2.534 TFLOPS
S9100    2.109 TFLOPS
S90x0    0.804 TFLOPS

From my performance app
Time shown is for 4 concurrent so divide by 4 for each work unit.
There are 4 GPUs, units are minutes
Dev#   WU count  Avg and Std of avg
GPU0 WUs:327 -Stats- Avg:2.9(0.40)
GPU1 WUs:415 -Stats- Avg:2.3(0.31)
GPU2 WUs:329 -Stats- Avg:2.9(0.41)
GPU3 WUs:864 -Stats- Avg:0.8(0.10)


I am running 7.16.3 with my "fix" for the Milkyway timeout
40) Message boards : Number crunching : seems AMD VII cannot be used on a riser (Message 74848)
Posted 23 Dec 2022 by Profile Joseph Stateson
Post:
Bought a used AMD VII, my first VII. It ran just fine in x16 slot (H110 btc motherboard).
Four WUs at a time averaged 15 seconds per single work unit.
I put the VII, an S9100 and a pair of S9000 all on risers.
The VII slowed to 30 minutes per work unit. I assume this is AMD not letting me mine.
Anyway, it is back in the x16 slot and all boards are working fine. I have several more of those S9xxx boards to add to my H110 BTC. Eventually my "stats" will indicate I have 6 of those VII but there is actually only one.

Is there a VII firmware upgrade to allow running at full speed on an x1 slot?



©2024 Astroinformatics Group