Welcome to MilkyWay@home

Posts by BeemerBiker

1) Message boards : Number crunching : Large surge of Invalid results and Validate errors on ALL machines (Message 68786)
Posted 15 hours ago by BeemerBiker
Post:


I am using BOINC 7.9.3, which is the 'current' one in the Mint Software Manager repository. Is there another later version to install?


https://boinc.berkeley.edu/forum_thread.php?id=12973

but watch for problems with AMD driver install if using recent boards
2) Message boards : Number crunching : Large surge of Invalid results and Validate errors on ALL machines (Message 68772)
Posted 4 days ago by BeemerBiker
Post:

Not quite, I'm afraid... If you look a little earlier in the two invalid results, you'll discover significant discrepancies in the third stream_only_likelihood values as indicated below (I've only cited one of the validated results...)


Thanks, I missed that; I did not purposely omit it. However, the bottom two are identical: my S9000 (invalid) and then the valid RX470.

Your comment about the chunk size and different convergences is correct, especially since the algorithm uses random numbers as part of its attempts to predict a likelihood.
3) Message boards : Number crunching : Large surge of Invalid results and Validate errors on ALL machines (Message 68769)
Posted 5 days ago by BeemerBiker
Post:
I have noticed for some time that the number of invalids fluctuates. I had thought it was the driver I was using.
I tried to find where the "results" are calculated so I could do my own validation,
using the sources that I found here.
Could one of the developers comment on my last item at the bottom?

My Analysis---

the program "separation_main" calls a worker that iterates through 4 "work units".
---right there is an indication there should be 4 results, not just a single item to validate.

the worker calls "evaluate" before cleaning up and exiting.

evaluate, toward its end, calculates a likelihood and then does a "printSeparationResults"
before cleaning up and exiting.

Out of curiosity I looked at my wingmen's results (the work unit).
There were 2 invalid (one was mine, the 3rd one) and 2 valid work tasks for the work unit
This will be gone from database eventually but the workunit is here

I was surprised to see that the output of "printSeparationResults" for all 4 systems either differed only after the 12th decimal place or was identical in every digit.


task 224802410, nvidia 1080TI VALID ===================

Running likelihood with 31815 stars
Likelihood time = 0.856660 s
Non-finite result: setting likelihood to -999
<background_integral3> 0.000008466303075 </background_integral3>
<stream_integral3> 136.269058837911870 43.219889454989044 -0.000000000000001 2.115804393012327 </stream_integral3>
<background_likelihood3> -3.680025259872701 </background_likelihood3>
<stream_only_likelihood3> -3.541128263071147 -3.118475664249611 -1.#IND00000000000 -109.139579663895260 </stream_only_likelihood3>


task 224932122 ATI RX560 INVALID ====================

Running likelihood with 31815 stars
Likelihood time = 1.268224 s
Non-finite result: setting likelihood to -999
<background_integral3> 0.000008466303075 </background_integral3>
<stream_integral3> 136.269058837911757 43.219889454989008 -0.000000000000001 2.115804393012326 </stream_integral3>
<background_likelihood3> -3.680025259872701 </background_likelihood3>
<stream_only_likelihood3> -3.541128263071147 -3.118475664249611 nan -109.139579663895262 </stream_only_likelihood3>


task 224990755 ATI S9000 INVALID======================

Running likelihood with 31815 stars
Likelihood time = 1.024323 s
Non-finite result: setting likelihood to -999
<background_integral3> 0.000008466303075 </background_integral3>
<stream_integral3> 136.269058837911870 43.219889454989044 -0.000000000000001 2.115804393012327 </stream_integral3>
<background_likelihood3> -3.680025259872701 </background_likelihood3>
<stream_only_likelihood3> -3.541128263071147 -3.118475664249611 -1.#IND00000000000 -109.139579663895260 </stream_only_likelihood3>


task 225048095 ATI RX470 VALID=======================

Running likelihood with 31815 stars
Likelihood time = 0.988171 s
Non-finite result: setting likelihood to -999
<background_integral3> 0.000008466303075 </background_integral3>
<stream_integral3> 136.269058837911870 43.219889454989044 -0.000000000000001 2.115804393012327 </stream_integral3>
<background_likelihood3> -3.680025259872701 </background_likelihood3>
<stream_only_likelihood3> -3.541128263071147 -3.118475664249611 -1.#IND00000000000 -109.139579663895260 </stream_only_likelihood3>


=================IN CONCLUSION FOR WHAT IT IS WORTH===========
Results 3 & 4 above are identical exactly to all decimal digits but only the last one is valid
Results 1 & 2 differ at only the 12 or 13th decimal digit but only the first one is valid.

Since there seem to be 4 "work units" in each "work unit" maybe there is additional testing at the server end when the result arrives
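
For anyone who wants to repeat the comparison, here is a rough Python sketch that parses the stream_only_likelihood values quoted above and compares them within a tolerance. It is not the project's validator: the 1e-12 tolerance and the choice to treat "nan" and "-1.#IND" as the same non-finite value are my assumptions, and the function names are made up for illustration.

```python
import math

def parse_values(text):
    """Parse whitespace-separated numbers from a result line.

    Assumption: 'nan' and the Windows-style '-1.#IND...' both mean NaN,
    and '1.#INF...' means infinity.
    """
    values = []
    for token in text.split():
        lowered = token.lower()
        if "#ind" in lowered or "nan" in lowered:
            values.append(math.nan)
        elif "#inf" in lowered:
            values.append(-math.inf if lowered.startswith("-") else math.inf)
        else:
            values.append(float(token))
    return values

def results_agree(a, b, tol=1e-12):
    """True if both lists match element by element within tol (assumed value)."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if math.isnan(x) or math.isnan(y):
            # A NaN only "agrees" with another NaN in this sketch.
            if not (math.isnan(x) and math.isnan(y)):
                return False
        elif not math.isclose(x, y, rel_tol=tol, abs_tol=tol):
            return False
    return True

# stream_only_likelihood3 values copied from the four tasks above
gtx_1080ti = parse_values("-3.541128263071147 -3.118475664249611 -1.#IND00000000000 -109.139579663895260")
rx_560 = parse_values("-3.541128263071147 -3.118475664249611 nan -109.139579663895262")
s9000 = parse_values("-3.541128263071147 -3.118475664249611 -1.#IND00000000000 -109.139579663895260")
rx_470 = parse_values("-3.541128263071147 -3.118475664249611 -1.#IND00000000000 -109.139579663895260")

print("1080TI vs RX560:", results_agree(gtx_1080ti, rx_560))
print("S9000 vs RX470:", results_agree(s9000, rx_470))
```

Under these assumptions the four printed lines all agree with each other, so the printed output by itself does not show why two of the tasks were marked invalid.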


====================SOME FOOD FOR THOUGHT===============

There is a "WTF" moment in program "prob_ok" file "separation_utils"
/* FIXME: WTF? */
/* FIXME: lack of else leads to possibility of returned garbage */

According to github this file has not been changed in 7 years.
...one can make the following conclusions...
(1) Not looked at since the comment was made 7 years ago
(2) Looked at and analyzed but didn't make a difference in outcome and not worth trouble to change the comment.
(3) Looked at but unable to figure out WTF was going on so left it for another grad student to fix
4) Message boards : Number crunching : AMD FirePro S9150 (Message 68758)
Posted 7 days ago by BeemerBiker
Post:
Had to go ask the big boys over at the AMD forum about a problem I ran into.
Thread is here.
Hope I get a response. I had a H**L of a time registering. Get a laugh about it here
Even the Microsoft MVP'er who helped me fell for the same trap.

First post I made was a complaint about their registration protocol.
5) Message boards : Number crunching : AMD FirePro S9150 (Message 68741)
Posted 9 days ago by BeemerBiker
Post:
follow-up on my previous post. Be nice if one could add comments to existing instead of a new message

I have my S9x00 boards working fine with the 2015 driver and have been able to set the clock speed, whereas I could not do that with AMD's latest "Pro Series". However, the OpenCL is old (2015), as shown by clinfo.exe.

I then downloaded and extracted AMD_OpenCL64.dll from both the 2018Q4 and the 2019Q2 and put those at \windows\system32 and also at \windows\SysWOW64

clinfo.exe showed I had the latest opencl but all milkyway tasks errored out on either of those two. I had suspended all but a few MW tasks as I did not want 800+ tasks to error out in a couple of seconds like they did a week ago. Looks like I am stuck with the correct driver but a 4 year old opencl library.
6) Message boards : Number crunching : AMD FirePro S9150 (Message 68738)
Posted 9 days ago by BeemerBiker
Post:
I tried the latest MSI, even the beta version, and when I clicked on the APPLY checkmark the changes I made went back to the default.


I had to go to an older non-WHQL driver to get the Power control functionality back on two of my cards. (Installing a driver more than 2 years newer than a card's manufacture date increases the risk of planned obsolescence problems.)
Tried about 9 drivers, 17.11.4 was the winner.

Do you see the custom fan profile tab under MSI configurations? I get the feeling you're not concerned with fan sounds. Forcing my cards' fans to 100% by 61C, with custom MSI fan profiles, has made all of their BIOS algorithms decide to delay actions to reduce heat, leading to higher clock speeds.


SOLVED!!!


I tried another enterprise driver dated May 10 but that didn't work, so I used the Device Manager and let Windows 10 x64 find the best one. It got a 2015 driver.

After a few minutes of downloading all the boards showed the following (see below). I then brought up Afterburner and was able to change parameters, as also shown below.
The BOINC client is working fine. I looked at coproc_info.xml and it has the correct S9000 and S9100 definitions but shows "FireGL V", which I had never seen before.

Hopefully my number of invalids will decrease. I had no invalids for several weeks up until I messed with that May 6, 2019 driver.




7) Message boards : Number crunching : AMD FirePro S9150 (Message 68735)
Posted 10 days ago by BeemerBiker
Post:
I thought I found the culprit, DirectGMA was enabled; disabled and rebooting seemed to clear up the invalids. Updated to the latest 19.Q2 drivers. Invalids appear to be back. I may give it a bit more time, then consider reverting back to an older driver.

And, yes, my card is underclocking considerably. I don't have the room for the same cooling solution as you, but the card is getting considerable airflow. Scratching my noggin about how to improve cooling further.


If you go back to an older driver you might want to first save the \windows\system32\opencl.dll as it is much more recent than the 2018 library.

A (somewhat one sided) discussion here
https://boinc.berkeley.edu/forum_thread.php?id=12948

I also have a lot of invalids (just now looked!!) and will try what I just suggested.
8) Message boards : Number crunching : AMD FirePro S9150 (Message 68730)
Posted 12 days ago by BeemerBiker
Post:
I tried the latest MSI, even the beta version, and when I clicked on the APPLY checkmark the changes I made went back to the default. I even downloaded the 19Q2 driver for the W9100 card as it is much more recent, but there were no tools to configure that 20% which, according to GPU-z, I should be able to change. GPU-z says I have a 129 watt board; it should be closer to 225 watts. It shows a higher wattage for the S9000: 175 watts.

The W9100 driver seems to be working OK with BOINC, crunching 5 concurrent tasks on each board, and it completes work units slightly faster than the S9000. I did lose over 300 milkyway work units between uninstalling the old driver and putting in the new one. I will have to run some more tests to verify this.

9) Message boards : Number crunching : AMD FirePro S9150 (Message 68727)
Posted 13 days ago by BeemerBiker
Post:
I have to stop BOINC before bringing up GPU-z or system hangs. There is nothing obvious wrong looking at event logs, not sure what causes this.

My S9x00 temps are all much lower than 180-195F.
Your temps seem high compared to mine, which are around 65C (149F) for three S9000 (same core as HD7950) and one S9100.
I found some max temps here but our FirePros are not listed.
The W9100 reviewed by Tom's Hardware shows 92-93C, which is what you see, so I guess it is OK: https://www.tomshardware.com/reviews/firepro-w9100-performance,3810-16.html
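
Since the temps in this post are a mix of Fahrenheit and Celsius, a tiny conversion helper makes them easier to compare. This is just the standard formula; the function names are my own.

```python
def f_to_c(deg_f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (deg_f - 32) * 5 / 9

def c_to_f(deg_c):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return deg_c * 9 / 5 + 32

# Temperatures mentioned in the post
print(f"65C  = {c_to_f(65):.1f}F")
print(f"180F = {f_to_c(180):.1f}C, 195F = {f_to_c(195):.1f}C")
print(f"92C  = {c_to_f(92):.1f}F, 93C  = {c_to_f(93):.1f}F")
```

By this conversion 180-195F is roughly 82-91C, so the 92-93C in the review is a little above that range.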

Does the clock on your S9100 vary like mine? I show 550-650 and rarely hit the design clock of 825, as you can see in the graph.

10) Message boards : Number crunching : AMD FirePro S9150 (Message 68702)
Posted 19 days ago by BeemerBiker
Post:
Not sure what is going on, sorry. Guessing: did you enable ECC on the board?
My 9100 has a high power blower and it barely cools the board. I had to tape it on
as shown cuz it fell off once and the system shut down in seconds. Your 9150 runs
a lot hotter I suspect.

One of the fans on that adjacent S9000 quit spinning and I temporarily put a 120mm butted up against it.

11) Message boards : Number crunching : Errors, invalid, and validation inconclusive. Anything to worry about? (Message 68664)
Posted 23 days ago by BeemerBiker
Post:
Well, for whatever it is worth, the ATI "invalid" error also occurred on another system.

However, the rest of the invalids (cpu ones) as well as all of your "errors" are just yours alone.

What are temps (cpu & gpu)?
12) Message boards : Number crunching : AMD FirePro S9150 (Message 68611)
Posted 26 Apr 2019 by BeemerBiker
Post:
Also interesting, I was just checking the forums as I've recently noticed I'm now getting a lot of invalids. I'm only running 3 WU per card and haven't changed anything recently.


State: All (11325) · In progress (162) · Validation pending (0) · Validation inconclusive (1060) · Valid (10094) · Invalid (0) · Error (9) 
Application: All (11325) · Milkyway@home N-Body Simulation (0) · Milkyway@home Separation (11325) 


Your computers are hidden so just guessing: Do you have the same drivers as I do as shown in my event file image below?

[edit] additional info on my drivers was added. if image not working add www to url

13) Message boards : Number crunching : AMD FirePro S9150 (Message 68605)
Posted 26 Apr 2019 by BeemerBiker
Post:
This is an old thread but I would like to mention that I am no longer getting invalid work like I used to. Something has changed for the better. Running 5 concurrent work units on any S9x00 GPU used to generate 1 invalid for every 5-6 valid ones, but they are all valid now (keeping my fingers crossed, of course). I measured 730 watts used by this 4 GPU Z400 system. Above 5 concurrent tasks per GPU things slow down.



14) Message boards : News : 30 Workunit Limit Per Request - Fix Implemented (Message 68602)
Posted 26 Apr 2019 by BeemerBiker
Post:
The S9000 has almost the same form factor as the HD-7950, so I bought a few "parts only" HD-7950s and use their coolers. The cooler fits fine but cannot be used in a case as the molding extends too far to the rear. If you look at the photo you will see I had to offset the mounting bracket.

This system is running 5 concurrent work units using three S9000.

The S9100 does not have the same form as the S9000 and a copper shim would be needed, which would be impracticable. Blowers are available from 3rd parties for the S9100 that fit on the back of the S9100. I may try it in this system but I suspect the Z400 will have problems with a 4th board.

Power runs 540 watts at full load and about 140 watts at no load. The S9000 and S9100 take only a single 8 pin power connector, unlike the 7950 that has 8 + 6 connectors.







I have had problems posting images; some sites show fine in preview but not after posting. If there is a problem, add 'www' to the url.
15) Message boards : News : 30 Workunit Limit Per Request - Fix Implemented (Message 68600)
Posted 25 Apr 2019 by BeemerBiker
Post:
200 WUs per GPU up to 600 WUs total per system could be doubled until this bug gets worked out for less idle time between fill-ups :)


Actually, it can be worse. During the 10 or so minutes between fill-ups another project can sneak in and play havoc. My priority projects are science related, but if milkyway (100%) goes out, then Einstein (50%) gets a boatload. If both go down my fallback is seti, also at 50%. One of these days asteroids@home will get an ATI app but I am not holding my breath.

I was testing this system, playing with risers, and only allowed milkyway tasks, so my GPUs sat fully idle during those 10 minutes.
16) Message boards : News : Leaving MilkyWay@home (Message 68597)
Posted 25 Apr 2019 by BeemerBiker
Post:
Thanks for the work you have done! I wish you the best!
17) Message boards : News : 30 Workunit Limit Per Request - Fix Implemented (Message 68595)
Posted 25 Apr 2019 by BeemerBiker
Post:
I reviewed the history log in BoincTasks looking for the largest gap between milkyway completions and found a 10.25 minute gap, as shown HERE. I have a C# program that does this.
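
The gap search itself is simple to sketch. The Python below is not the C# program mentioned above, only an illustration of the idea; it assumes the completion times have already been pulled out of the BoincTasks history as datetime values, and the sample timestamps are made up.

```python
from datetime import datetime, timedelta

def largest_gap(completions):
    """Return (gap, start, end) for the longest pause between consecutive
    completion times, or None if there are fewer than two entries."""
    ordered = sorted(completions)
    if len(ordered) < 2:
        return None
    gaps = [(later - earlier, earlier, later)
            for earlier, later in zip(ordered, ordered[1:])]
    return max(gaps, key=lambda item: item[0])

# Made-up completion times: one task every 15 seconds, with one long pause.
sample = [
    datetime(2019, 4, 25, 10, 0, 0),
    datetime(2019, 4, 25, 10, 0, 15),
    datetime(2019, 4, 25, 10, 0, 30),
    datetime(2019, 4, 25, 10, 10, 45),
    datetime(2019, 4, 25, 10, 11, 0),
]

gap, start, end = largest_gap(sample)
print(f"Largest gap: {gap.total_seconds() / 60:.2f} minutes, from {start} to {end}")
```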

This system completes a work unit every 15 seconds on average, so typically I see close to 600 WUs downloaded, then the queue goes to 0. Usually it is just a minute or 2 till the next batch, but occasionally the delay is as long as 10 minutes, it seems.

I can live with this.
18) Message boards : News : 30 Workunit Limit Per Request - Fix Implemented (Message 68590)
Posted 25 Apr 2019 by BeemerBiker
Post:
I have the same problem on both S9000 (2 boards, 4 concurrent tasks) and RX-570 (3 boards, 1 task) systems. I can have 100s of WUs queued up, and anywhere from 4-5 complete at a time and are reported, but "got 0 new tasks" shows up. Eventually the system is idle, I notice the problem, and a manual update fixes it. Then anywhere from 200 - 400 get downloaded instantly.

My log is not as detailed as the ones I read here. There must be some diagnostic setting I am not using.

I looked at BoincTasks to see if its "rules" support making an auto update but there is no "Work Units Remaining" type in the rule selection.

An expedient workaround would be to use boinccmd.exe to do an update every x minutes, but it would have to go on each of my systems and would be a PITA.
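
A minimal sketch of that workaround in Python, using boinccmd's "--project URL update" operation. The project URL below, the 5 minute interval, and the assumption that boinccmd is on the PATH are all mine; check the URL your own client shows for MilkyWay before using anything like this.

```python
import subprocess
import time

# Assumptions: adjust both of these for your own install.
BOINCCMD = "boinccmd"                                 # assumed to be on the PATH
PROJECT_URL = "http://milkyway.cs.rpi.edu/milkyway/"  # assumed project URL

def update_command(project_url, boinccmd=BOINCCMD):
    """Build the boinccmd argument list that asks the client to contact the project."""
    return [boinccmd, "--project", project_url, "update"]

def update_forever(project_url, interval_minutes):
    """Request a project update, wait, and repeat until interrupted."""
    while True:
        # check=False: a failed update is ignored and retried on the next pass
        subprocess.run(update_command(project_url), check=False)
        time.sleep(interval_minutes * 60)

# Example (not run here, since it loops until interrupted):
# update_forever(PROJECT_URL, interval_minutes=5)
```

boinccmd also accepts --host and --passwd, so one machine might be able to drive the others if remote RPC is enabled on them; that is not shown here.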

Question: Is this a problem that fixes itself after some time passes? If so, about how many minutes maximum of idle time?

I have been playing with a wattmeter and the S9000 outperforms the RX570 5 to 2 (credits per sec) with only slightly more wattage. I will probably add risers for additional S9000s and switch the RX570 to Einstein, which consumes far fewer watts than milkyway.
19) Message boards : Number crunching : Benchmark thread 1-2019 on - GPU & CPU times wanted for new WUs, old & new hardware! (Message 68456)
Posted 29 Mar 2019 by BeemerBiker
Post:
x5690 3.47ghz 1@RX-580 0.44 second per credit STATS
20) Message boards : Number crunching : Benchmark thread 1-2019 on - GPU & CPU times wanted for new WUs, old & new hardware! (Message 68266)
Posted 16 Mar 2019 by BeemerBiker
Post:
all stock gpu & cpu

x5650 2.933ghz 3@RX-560 1.0 second per credit STATS

x5690 3.47ghz 1@RX-570 0.5 second per credit STATS

x5675 3.07ghz 2@RX-570 0.5 second per credit STATS

the following run 4 wu concurrently giving about 5-6 credits per second

E5620 2.527ghz 1@S9000 0.7 seconds per (4) credits STATS

x5650 2.933ghz S9000 & S9100 0.6 secs per (4) credits STATS
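
To put the figures above on one scale, here is a small Python sketch that converts each "seconds per N credits" entry into credits per second. The numbers are copied from the list above; the helper name and row labels are my own.

```python
def credits_per_second(seconds, credits=1):
    """Convert 'seconds per N credits' into credits per second."""
    return credits / seconds

# (label, seconds, credits) taken from the benchmark lines above
rows = [
    ("x5650, 3@RX-560", 1.0, 1),
    ("x5690, 1@RX-570", 0.5, 1),
    ("x5675, 2@RX-570", 0.5, 1),
    ("E5620, 1@S9000", 0.7, 4),
    ("x5650, S9000 & S9100", 0.6, 4),
]

for label, seconds, credits in rows:
    rate = credits_per_second(seconds, credits)
    print(f"{label}: {rate:.2f} credits per second")
```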



©2019 Astroinformatics Group