Welcome to MilkyWay@home

Posts by JStateson

1) Message boards : Number crunching : Benchmark thread 1-2019 on - GPU & CPU times wanted for new WUs, old & new hardware! (Message 68952)
Posted 15 days ago by JStateson
Post:

jstateson
x5690 3.47ghz 1@RX-580 0.44 second per credit STATS

Interesting stats site you use there. Is the credit/s consistent over different credited WUs on the same machine? I have a feeling it isn't, which would make those stats just a rough estimate, unfortunately.
Btw, that site isn't going to the 2nd decimal point.

Btw, as you guys have probably noticed, the WU credit/type has changed again, and it seems the 227.51 & .53 are the most common.
This frequent change in WU type makes it impossible to collect stats over the longer term, so I am no longer posting stats tables.

But feel free to carry on posting valid task WU times along with its exact credit to compare amongst yourselves :).
And thanks for the useful stats posted here so far :).


Have made some improvements to that aspx app

https://stateson.net/HostProjectStats

A much more accurate program (but not web based) is available and described here

https://forum.efmer.com/index.php?topic=1355.0

Both program sources are at GitHub. Feel free to PM me any suggestions, or you could also post over at the BoincTasks forum.
2) Message boards : Number crunching : Computer details...wrong GPU description (Message 68920)
Posted 26 days ago by JStateson
Post:
BOINC always identifies a host's co-processors by the most capable card installed, which in your case is the R9 390. So nobody who is familiar with BOINC is confused. To see which cards are actually installed in any host, you have to look at the stderr.txt file of a reported WU result, which lists all detected GPUs and identifies them correctly.


Milkyway does a good job of identifying the platform and GPU, and debugging can be enabled for more info. GPUGrid shows the temperatures of the GPUs as the work unit is being processed, in addition to other interesting info. OTOH, Einstein shows errors like "out of paper", "NetBIOS limit exceeded" and "network name not found", which have nothing to do with the real problem.
3) Message boards : Number crunching : Computation errors (Message 68919)
Posted 26 days ago by JStateson
Post:
Ah, at least they are only taking 2 seconds to error.


Agree, unlike the new batch of Einstein tasks that run for 6-7 hours and then show 35 - 45 days to complete.
4) Message boards : Number crunching : Computation errors (Message 68917)
Posted 26 days ago by JStateson
Post:
I thought this was fixed after reading the post under news quote: "Thanks for bringing that to my attention, none of the runs are returning data. I'll try to fix that as soon as I can."

I did get about 100-200 w/o error but this morning I see another 253 errored out and 212 more waiting to error out.

Compounding the problem is that Einstein coincidentally has a series of bad runs that affect older GPUs like my S9000 boards.

[EDIT] Only a few more errored out. Looks like there are just a few of the bad ones left. Out of those 253 "waiting", only about 5 were bad, and I now have over 500 downloaded and do not see any more errors.

Thanks for the fix.
5) Message boards : News : New Separation Runs [UPDATE] (Message 68907)
Posted 26 days ago by JStateson
Post:
Same problem, got over 500 errored-out tasks with the message "number of streams do not match".

Please post when tasks are ready as I have to stop processing.
6) Questions and Answers : Web site : create or edit personal profile wont work (Message 68897)
Posted 15 Jul 2019 by JStateson
Post:
I cannot edit my profile. Tried Edge and Chrome and two different computers.

Verified that ActiveX (virus transfer protocol) is enabled.
Verified that popups are enabled.
Verified that uBlock was disabled.

Please fix this problem.

thanks!
7) Message boards : News : 30 Workunit Limit Per Request - Fix Implemented (Message 68896)
Posted 15 Jul 2019 by JStateson
Post:
some thoughts on fixing the problem with fast GPUs running out of data.

I just switched to the new "seti special app" that uses CUDA90 and am getting elapsed times comparable to milkyway times.

Slightly over 1 minute for a GTX1070 (Linux only). This is comparable to my S9000 boards running Milkyway. Currently I show almost 600 work units in the queue, where normally 100 is the max for non-CUDA90. Looking at the event messages I see the following:

Seti completes a task and that task is added to the queue of tasks "ready to report". That queue grows while the queue of tasks "ready to start" decreases. When this happens, the message "upload" is displayed in the manager (BoincTasks for me) and the event message dialog box shows "Finished upload of …." At no time are tasks asked for, and no tasks are downloaded, i.e., there is no message about getting "0" tasks like what shows up constantly on Milkyway.

Periodically I see the following:
905	SETI@home	7/15/2019 10:58:30 AM	Reporting 64 completed tasks	
906	SETI@home	7/15/2019 10:58:30 AM	Requesting new tasks for NVIDIA GPU	
907	SETI@home	7/15/2019 10:58:34 AM	Scheduler request completed: got 77 new tasks	


Seti is not asking for new data after every "upload", and it seems an "upload" is not the same as reporting. In any event, Seti asks for data much less frequently than Milkyway. I am guessing that Milkyway, on fast GPUs, is asking for more data BEFORE the "you asked for data too soon" timeout has expired, and that is why no data is ever sent until 10 or more minutes after the very last reported tasks.

Milkyway needs to STOP asking for data after each upload or better, get help from the SETI folks on how they implemented their buffering.
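To compare how often each project actually asks for work, the event messages can be tallied. The following is only a rough sketch in Python: it assumes the log was copied out as tab-separated columns (id, project, timestamp, message) like the three lines quoted above, and the function name summarize is made up for illustration.

```python
import re
from collections import Counter

# Sample lines copied from the post above (tab-separated: id, project, time, message).
SAMPLE_LOG = (
    "905\tSETI@home\t7/15/2019 10:58:30 AM\tReporting 64 completed tasks\t\n"
    "906\tSETI@home\t7/15/2019 10:58:30 AM\tRequesting new tasks for NVIDIA GPU\t\n"
    "907\tSETI@home\t7/15/2019 10:58:34 AM\tScheduler request completed: got 77 new tasks\t\n"
)

REPORTED = re.compile(r"Reporting (\d+) completed tasks")
RECEIVED = re.compile(r"Scheduler request completed: got (\d+) new tasks")


def summarize(log_text):
    """Tally work requests, reported tasks and received tasks per project."""
    totals = {}
    for line in log_text.splitlines():
        fields = line.rstrip("\t ").split("\t")
        if len(fields) < 4:
            continue  # not a four-column event line
        project, message = fields[1], fields[3]
        stats = totals.setdefault(project, Counter())
        if message.startswith("Requesting new tasks"):
            stats["requests"] += 1
        reported = REPORTED.search(message)
        if reported:
            stats["reported"] += int(reported.group(1))
        received = RECEIVED.search(message)
        if received:
            stats["received"] += int(received.group(1))
    return totals


if __name__ == "__main__":
    for project, stats in summarize(SAMPLE_LOG).items():
        print(project, dict(stats))
```

Run over a longer log that contains both projects, the "requests" count per project would show whether Milkyway really asks far more often than Seti, and a "received" total of 0 against many requests would match the "got 0 tasks" pattern described above.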
8) Message boards : News : 30 Workunit Limit Per Request - Fix Implemented (Message 68895)
Posted 15 Jul 2019 by JStateson
Post:
sorry, got posted twice, please delete
9) Message boards : News : 30 Workunit Limit Per Request - Fix Implemented (Message 68892)
Posted 12 Jul 2019 by JStateson
Post:
That 12 minutes of average idle time every 26 minutes is the equivalent of 220 WU not being done an hour, or over 5,000 a day. Granted, I only run the machine for a couple of days a month, but if there is a solution for this I'd appreciate it.


This has been on a wish list for a long time. Affects only the higher performance GPUs.

Several ways to fix this. All rely on issuing an update after waiting the minimum of about 2.5 minutes. The delay is to allow the "your request is too soon" timeout to expire.

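rem Loop forever: wait about 2.5 minutes, then ask the client to contact the project.
:start
rem ping -n 150 sends roughly one ping per second, so this is about a 150 second delay
ping -n 150 127.0.0.1>nul
boinccmd --project http://milkyway.cs.rpi.edu/milkyway/ update
goto start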

Alternatively, you can create a task using Task Scheduler to run boinccmd every 3 or so minutes.
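For mixed Ubuntu and Windows hosts, the same wait-then-update loop can be written once in Python. This is a minimal, untested-in-production sketch that assumes boinccmd is on the PATH and can talk to the local client; the function names and the max_cycles parameter are hypothetical and exist only to keep the example easy to try out.

```python
import subprocess
import time

PROJECT_URL = "http://milkyway.cs.rpi.edu/milkyway/"  # URL from the batch file above
DELAY_SECONDS = 150  # about 2.5 minutes, matching "ping -n 150"


def build_update_command(project_url, boinccmd="boinccmd"):
    """Return the argument list for a project update request."""
    return [boinccmd, "--project", project_url, "update"]


def update_loop(project_url, delay=DELAY_SECONDS, max_cycles=None,
                run=subprocess.run, sleep=time.sleep):
    """Wait, then request an update; repeat max_cycles times (forever if None)."""
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        sleep(delay)
        run(build_update_command(project_url), check=False)
        cycles += 1
    return cycles


# To run until interrupted:
# update_loop(PROJECT_URL)
```

The run and sleep arguments are there only so the loop can be exercised without a BOINC client; in normal use the defaults are what you want.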

I have not actually tried the above, as I have remote systems that use both Ubuntu and Windows, and for me it is easier to use a BoincTasks "rule" and issue a remote procedure call to do an update when the number of work units drops to zero. This project seems to be the only one with the problem, but that is probably because the throughput is very high for the S9000, S9100, etc., like your Tesla.
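Without BoincTasks, a similar "update when the queue is empty" check could be approximated with boinccmd alone. The sketch below is an assumption-heavy illustration, not how the BoincTasks rule works: it assumes that each task block printed by "boinccmd --get_tasks" contains a "project URL: ..." line, and the helper names are invented for the example.

```python
import subprocess

PROJECT_URL = "http://milkyway.cs.rpi.edu/milkyway/"


def count_project_tasks(get_tasks_output, project_url):
    """Count tasks for project_url in `boinccmd --get_tasks` text.

    Assumes each task block carries a "project URL: <url>" line.
    """
    wanted = project_url.rstrip("/")
    count = 0
    for line in get_tasks_output.splitlines():
        key, sep, value = line.strip().partition(": ")
        if sep and key == "project URL" and value.strip().rstrip("/") == wanted:
            count += 1
    return count


def update_if_empty(project_url, run=subprocess.run):
    """Request a project update only when no tasks for it are listed."""
    listing = run(["boinccmd", "--get_tasks"],
                  capture_output=True, text=True, check=False)
    if count_project_tasks(listing.stdout, project_url) == 0:
        run(["boinccmd", "--project", project_url, "update"], check=False)
        return True
    return False
```

Note that this counts every listed task, including finished ones still waiting to be reported, so it fires later than a rule that looks only at tasks ready to start.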

Here is another view of the "lost" idle time (the white space between completion times). My 4 GPUs run out of data after about 2.25 hours and average 8-minute gaps (tall blue bar) before getting more data.

10) Message boards : Number crunching : Monitoring of Invalid results on separation run de_modfit_84_bundle4_4s_south4s_1 (Message 68859)
Posted 15 Jun 2019 by JStateson
Post:
I have been following this and the other invalid thread and found it difficult to look up the various systems, OSes and apps. Having nothing else to do, I put together a program that can obtain information about the invalids. This program runs only under Windows, and I hope it is of some use in figuring out what is going on. The program executables and sources are at
https://github.com/JStateson/Gridcoin/tree/master/InvalidAnalysis The executables are in a 7z file "IVexecutables.7z" which has to be unpacked.

Browsing: This program does not work if the phrase "userid" is in the URL line. You must browse to a computer and then select the "invalid" tasks. Once there (at the project page of interest), copy the URL from the browser and paste it into the URL field of this C# app. This may not work on projects that have blocked anonymous access.

[EDIT] Forgot to mention I do minimal error checking. If the project is offline, there is no telling what will happen. Same if you try another project to see what happens, as this was coded for Milkyway. If you mess up and put in the wrong URL, the program remembers it and restores the wrong one when you run it again.

Here are some pictures of what it can do. Let me know if you have any suggestions or see any bugs.
The program will compile under VS2017. I have not implemented the CPU or INTEL filters yet.

This shows all invalid datasets and gives a count of how many. One full page (20 work units) was fetched at the initial read and all 20 were invalid.



The following shows the valid datasets.


The following shows only those valid datasets that were from apple or linux.
11) Message boards : Number crunching : Exporting of stats. (Message 68858)
Posted 15 Jun 2019 by JStateson
Post:
AFAICT the BOINC Manager (BM) displays statistics such as total credit and recent average credit (RAC), but only for the host that is running BM. Not sure how to get that data other than by screen grab. The client (BC) writes information to a log that can be quite detailed if the appropriate logging parameters are set in the cc_config.xml file. The program BoincTasks can interrogate the client and save information in a history file that is quite useful.
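As an illustration of the logging parameters, here is a small Python sketch that generates a cc_config.xml with a few log flags switched on. The flags picked here (task, sched_ops, sched_op_debug, work_fetch_debug) are just an example selection, and the function name build_cc_config is made up; check the BOINC client configuration documentation for the full list before relying on it.

```python
import xml.etree.ElementTree as ET


def build_cc_config(flags):
    """Return cc_config.xml text that switches on the given log flags."""
    root = ET.Element("cc_config")
    log_flags = ET.SubElement(root, "log_flags")
    for name in flags:
        ET.SubElement(log_flags, name).text = "1"
    return ET.tostring(root, encoding="unicode")


# Flags chosen for illustration; pick the ones relevant to what you are chasing.
EXAMPLE_FLAGS = ["task", "sched_ops", "sched_op_debug", "work_fetch_debug"]

if __name__ == "__main__":
    print(build_cc_config(EXAMPLE_FLAGS))
```

The resulting text would go into cc_config.xml in the BOINC data directory, after which the client has to re-read its config files (for example with boinccmd --read_cc_config) or be restarted.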

I wrote two programs in c# (source at GitHub) that can be used to obtain performance information. Both programs are described here
https://forum.efmer.com/index.php?board=47.0

If you want to get statistics from the project web sites then HostProjectStats might be useful.

IMHO the BOINC Manager does not have a mechanism to export information, nor is the program source code easy to work with (maintain), as it needs to function on a lot of different platforms with various hardware, both really old and recent.

What stats are you looking for?

If you are referring to the somewhat recent EU laws that were passed, only the following sites require an opt-in to look at existing statistics:
Albert@home
Einstein@Home
LHC@Home
LHCathome-dev
NumberFields@home 
VGTU
World Community Grid 
WUProp@Home 


The site managers would know how to export or arrange for exporting.
I know that recently Gridcoin had to write their own program to obtain statistics from Einstein@home. I am guessing that was because of security concerns of some type.
12) Questions and Answers : Windows : Milkyway@Home | Not requesting tasks: don't need (CPU: ; AMD/ATI GPU: GPUs not usable) (Message 68845)
Posted 6 Jun 2019 by JStateson
Post:
There is a message down near the bottom about disabling GPU tasks because of remote desktop. I am not sure of the significance of that but if true then that is the problem

I use the free splashtop and vnc and have never had a problem with windows 10.

The W7100 has low double precision performance

https://www.tomshardware.com/reviews/amd-radeon-pro-wx-7100,4896-6.html but it does support double precision so that is not the problem

Is it working on other projects? What about when not using remote desktop?

[EDIT] was in tflops, not gflops so dp performance is ok
13) Message boards : Number crunching : AMD FirePro S9150 (Message 68826)
Posted 1 Jun 2019 by JStateson
Post:

I use water cooling with a Kraken G12 adapter for my S9150. The temperature is around the 40s Celsius. It's a good drop from


I thought I might expand on this, seeing as a lot of users are into water cooling.

You can't go wrong with a closed kit. I used a number of those kits: hybrid adapters for NVidia, water blocks for ATI and kits for CPU. OTOH I built several custom systems w/o a problem, then I got cheap, and on my last build I had run out of 1/2- 3/8 soft tubing so I got Ez-Flow from (it seems) Lowes. I failed to notice it was rated at 70F, and the output of my dual Xeons was closer to 112F and frequently higher. It ran fine for several weeks, but I noticed the pump flowmeter was turning slower and slower. The system is disassembled and I am waiting for new parts. Fortunately, the electronics are OK, but the pumps and tubing and probably the radiator are shot. Very difficult to clean up the melted vinyl. Details here
https://www.reddit.com/r/watercooling/comments/bugb19/need_fix_to_pump_eventually_stopping_corrosion/
14) Message boards : Number crunching : AMD FirePro S9150 (Message 68814)
Posted 31 May 2019 by JStateson
Post:

I use water cooling with a Kraken G12 adapter for my S9150. The temperature is around the 40s Celsius. It's a good drop from
over 70s with fan cooling.
Is anyone else using the Kraken G12 for this card?


I picked up two of those. One went on an HD7950 and I transferred the cooling fans to an S9000 which is normally passive cooled. S9000 (the chip) used less power than the HD7950 so it didn't need liquid cooling.
Both my S9000 and S9100 have 2x to 4x as many memory chips as the HD7950, and I was concerned that the liquid cooling might not cool the memory chips, so that contributed to me using fans on the S9000 and not the liquid cooling.

I remember buying two sets of copper shims (S9100 chip was bigger) as it looked like the Kraken was not touching either chip. I do not remember if I used the shim or not. Don't want to take the Kraken off to see if it has a shim.

Since the S9100 had a lot more memory chips and the chip die was bigger, I was concerned about using "parts only" HD7950 fans or liquid cooling, so I elected to buy that terribly expensive POS from a 3D printer guy in Australia for cooling the S9100. It only worked because I am using an open frame for the motherboard.

Did you need a shim on your S9150?
15) Message boards : Number crunching : Performance Tool for Boinctasks users (Message 68804)
Posted 30 May 2019 by JStateson
Post:
I put together a program that uses the history files produced by BoincTasks to do some performance analysis. It only works with BoincTasks. There is a description of the program here (Fred created a 3rd party forum for add-ins) and the sources and executables are at github

Here is a sample plot of Elapsed Time for the Separation program.

16) Message boards : Number crunching : Large surge of Invalid results and Validate errors on ALL machines (Message 68792)
Posted 28 May 2019 by JStateson
Post:
Thank you BeemerBiker for pointing me at the ppa for the BOINC package.

I updated to BOINC 7.14.2 (Linux) and let everything run for about a day now.

Bottom line: not every de_modfit_84_xxxx WU is invalidated, but still a high percentage are. It helped a little, but hasn't 'fixed it'.

Keith Meyers, it looks like you are running BOINC 7.15, and with no problems. I do not see 7.15 at the ppa. Can you tell me where you got it from?



I used a binary editor on "boinc.exe", looked for and changed 7.14.2 into 7.15.2. The program ran correctly but it still showed 7.14 so Keith must know someone special.
17) Message boards : Number crunching : Large surge of Invalid results and Validate errors on ALL machines (Message 68786)
Posted 26 May 2019 by JStateson
Post:


I am using BOINC 7.9.3, which is the 'current' one in the Mint Software Manager repository. Is there another later version to install?


https://boinc.berkeley.edu/forum_thread.php?id=12973

but watch for problems with AMD driver install if using recent boards
18) Message boards : Number crunching : Large surge of Invalid results and Validate errors on ALL machines (Message 68772)
Posted 22 May 2019 by JStateson
Post:

Not quite, I'm afraid... If you look a little earlier in the two invalid results, you'll discover significant discrepancies in the third stream_only_likelihood values as indicated below (I've only cited one of the validated results...)


Thanks, I missed that, I did not purposely omit it. However the bottom two are identical, my S9000 (invalid) and then valid RX470

Your comment about the chunk size and different convergences is correct, especially since the algorithm uses random numbers as part of its attempts to predict a likelihood.
19) Message boards : Number crunching : Large surge of Invalid results and Validate errors on ALL machines (Message 68769)
Posted 21 May 2019 by JStateson
Post:
Have noticed for some time that the number of invalids fluctuates. I had thought it was the driver I was using.
Tried finding where the "results" are calculated so as to do my own validation
used the sources that I found here
Could one of the developers comment on my last item at the bottom?

My Analysis---

the program "separation_main" calls a worker that iterates through 4 "work units".
---right there is an indication there should be 4 results, not just a single item to validate.

the worker calls "evaluate" before cleaning up and exiting.

evaluate, toward its end, calculates a likelihood and then does a "printSeparationResults"
before cleaning up and exiting.

Out of curiosity I looked at my wingmen's results (the work unit).
There were 2 invalid (one was mine, the 3rd one) and 2 valid work tasks for the work unit
This will be gone from database eventually but the workunit is here

I was surprised to see that the output of "printSeparationResults" for all 4 systems either differed only after the 12th decimal place or was identical to all digits.


task 224802410, nvidia 1080TI VALID ===================

Running likelihood with 31815 stars
Likelihood time = 0.856660 s
Non-finite result: setting likelihood to -999
<background_integral3> 0.000008466303075 </background_integral3>
<stream_integral3> 136.269058837911870 43.219889454989044 -0.000000000000001 2.115804393012327 </stream_integral3>
<background_likelihood3> -3.680025259872701 </background_likelihood3>
<stream_only_likelihood3> -3.541128263071147 -3.118475664249611 -1.#IND00000000000 -109.139579663895260 </stream_only_likelihood3>


task 224932122 ATI RX560 INVALID ====================

Running likelihood with 31815 stars
Likelihood time = 1.268224 s
Non-finite result: setting likelihood to -999
<background_integral3> 0.000008466303075 </background_integral3>
<stream_integral3> 136.269058837911757 43.219889454989008 -0.000000000000001 2.115804393012326 </stream_integral3>
<background_likelihood3> -3.680025259872701 </background_likelihood3>
<stream_only_likelihood3> -3.541128263071147 -3.118475664249611 nan -109.139579663895262 </stream_only_likelihood3>


task 224990755 ATI S9000 INVALID======================

Running likelihood with 31815 stars
Likelihood time = 1.024323 s
Non-finite result: setting likelihood to -999
<background_integral3> 0.000008466303075 </background_integral3>
<stream_integral3> 136.269058837911870 43.219889454989044 -0.000000000000001 2.115804393012327 </stream_integral3>
<background_likelihood3> -3.680025259872701 </background_likelihood3>
<stream_only_likelihood3> -3.541128263071147 -3.118475664249611 -1.#IND00000000000 -109.139579663895260 </stream_only_likelihood3>


task 225048095 ATI RX470 VALID=======================

Running likelihood with 31815 stars
Likelihood time = 0.988171 s
Non-finite result: setting likelihood to -999
<background_integral3> 0.000008466303075 </background_integral3>
<stream_integral3> 136.269058837911870 43.219889454989044 -0.000000000000001 2.115804393012327 </stream_integral3>
<background_likelihood3> -3.680025259872701 </background_likelihood3>
<stream_only_likelihood3> -3.541128263071147 -3.118475664249611 -1.#IND00000000000 -109.139579663895260 </stream_only_likelihood3>


=================IN CONCLUSION FOR WHAT IT IS WORTH===========
Results 3 & 4 above are identical exactly to all decimal digits but only the last one is valid
Results 1 & 2 differ at only the 12th or 13th decimal digit but only the first one is valid.

Since there seem to be 4 "work units" in each "work unit" maybe there is additional testing at the server end when the result arrives
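To make the comparison above repeatable, here is a minimal Python sketch that checks two of the listed results against each other with a relative tolerance. The values are copied from tasks 224802410 and 224932122 above. The tolerance, the function names, and the choice to treat "-1.#IND" and "nan" as the same thing are my assumptions; this is not the project's validator, whose actual rule I do not know.

```python
import math

# stream_integral3 and stream_only_likelihood3 values copied from the listings above.
GTX1080TI = {
    "stream_integral3": "136.269058837911870 43.219889454989044 -0.000000000000001 2.115804393012327",
    "stream_only_likelihood3": "-3.541128263071147 -3.118475664249611 -1.#IND00000000000 -109.139579663895260",
}
RX560 = {
    "stream_integral3": "136.269058837911757 43.219889454989008 -0.000000000000001 2.115804393012326",
    "stream_only_likelihood3": "-3.541128263071147 -3.118475664249611 nan -109.139579663895262",
}


def parse_values(text):
    """Turn a whitespace-separated result line into floats; NaN spellings become math.nan."""
    values = []
    for token in text.split():
        if "#IND" in token or "nan" in token.lower():
            values.append(math.nan)  # "-1.#IND..." and "nan" both mean not-a-number
        else:
            values.append(float(token))
    return values


def results_agree(a, b, rel_tol):
    """True if every value in a matches b within rel_tol (NaN only matches NaN)."""
    for key in a:
        va, vb = parse_values(a[key]), parse_values(b[key])
        if len(va) != len(vb):
            return False
        for x, y in zip(va, vb):
            if math.isnan(x) or math.isnan(y):
                if math.isnan(x) != math.isnan(y):
                    return False
            elif not math.isclose(x, y, rel_tol=rel_tol, abs_tol=0.0):
                return False
    return True


if __name__ == "__main__":
    print("agree within 1e-12:", results_agree(GTX1080TI, RX560, rel_tol=1e-12))
    print("agree exactly:     ", results_agree(GTX1080TI, RX560, rel_tol=0.0))
```

With a tolerance of 1e-12 the two results agree, and with an exact comparison they do not, which is consistent with the differences sitting at the 12th or 13th decimal digit. It says nothing about why the identical S9000 and RX470 results were judged differently.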


====================SOME FOOD FOR THOUGHT===============

There is a "WTF" moment in program "prob_ok" file "separation_utils"
/* FIXME: WTF? */
/* FIXME: lack of else leads to possibility of returned garbage */

According to github this file has not been changed in 7 years.
...one can make the following conclusions...
(1) Not looked at since the comment was made 7 years ago
(2) Looked at and analyzed but didn't make a difference in outcome and not worth trouble to change the comment.
(3) Looked at but unable to figure out WTF was going on so left it for another grad student to fix
20) Message boards : Number crunching : AMD FirePro S9150 (Message 68758)
Posted 19 May 2019 by JStateson
Post:
Had to go ask the big boys over at the AMD forum about a problem I ran into.
Thread is here.
Hope I get a response. I had a H**L of a time registering. Get a laugh about it here
Even the Microsoft MVP'er who helped me fell for the same trap.

First post I made was a complaint about their registration protocol.



©2019 Astroinformatics Group