Welcome to MilkyWay@home

Monitoring of Invalid results on separation run de_modfit_84_bundle4_4s_south4s_1

Message boards : Number crunching : Monitoring of Invalid results on separation run de_modfit_84_bundle4_4s_south4s_1

pututu
Joined: 24 Aug 17
Posts: 8
Credit: 223,957,930
RAC: 0
Message 68832 - Posted: 3 Jun 2019, 0:58:41 UTC
Last modified: 3 Jun 2019, 1:09:13 UTC

I am still seeing invalid results on my PC, mostly from the new group of separation runs in de_modfit_84_bundle4_4s_south4s_1. I'm not sure yet whether the rate is the same as before or much lower, so it is probably best to wait before passing judgement, especially as other separation runs also produce invalid results. Previously I was seeing 2%-3% invalid results on de_modfit_84_bundle4_4s_south4s_0 on my PC. I will keep monitoring to see whether the rate stays the same or drops.
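
Whether a new invalid rate really differs from the old 2%-3% depends heavily on how many tasks it is based on. One way to make "wait before passing judgement" concrete is to put a confidence interval around each batch's rate. The sketch below uses the Wilson score interval; the function name and the default z = 1.96 (roughly 95%) are illustrative choices, not anything the project uses.

```python
import math


def wilson_interval(invalid: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for an invalid rate (z=1.96 gives roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= invalid <= total:
        raise ValueError("invalid must lie between 0 and total")
    p = invalid / total  # observed invalid fraction
    z2 = z * z
    denom = 1 + z2 / total
    center = (p + z2 / (2 * total)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total))
    # Clamp to [0, 1] to absorb floating-point round-off at the edges.
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical usage: compare the old batch against the new one.
# old_lo, old_hi = wilson_interval(invalid_old, total_old)
# new_lo, new_hi = wilson_interval(invalid_new, total_new)
```

As a rough rule, if the two batches' intervals overlap heavily, it is too early to say the rate has changed; the interval narrows as more tasks are counted.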

Previous relevant thread: https://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=4458

Some invalid results from different PCs:
https://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=1764048173
https://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=1764579823
https://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=1763917885
Tom Donlon
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 10 Apr 19
Posts: 408
Credit: 120,203,200
RAC: 0
Message 68851 - Posted: 10 Jun 2019, 16:02:08 UTC - in response to Message 68832.  

Thanks for letting me know, I'll be keeping an eye on this problem. I'm still not sure exactly what the cause is. I was hoping the new runs' settings wouldn't cause invalid results, but I guess not.

- Tom
Keith Myers
Joined: 24 Jan 11
Posts: 696
Credit: 540,027,223
RAC: 86,732
Message 68852 - Posted: 11 Jun 2019, 16:18:54 UTC

I am also getting validate errors on these tasks. None of my cards has ever been the cause of problems like this; it has only ever been the client software or the tasks themselves.
VietOZ
Joined: 28 Mar 18
Posts: 14
Credit: 761,475,797
RAC: 0
Message 68853 - Posted: 12 Jun 2019, 15:14:11 UTC

I got 4 of these out of 15k valid WUs.
max # of error/total/success tasks: 2, 9, 6
errors: Too many errors (may have bug)
Keith Myers
Joined: 24 Jan 11
Posts: 696
Credit: 540,027,223
RAC: 86,732
Message 68854 - Posted: 13 Jun 2019, 0:00:53 UTC

From my ever-growing list of invalids, it seems that Linux-based clients are failing, and so are Mac clients. Only Windows-based clients seem to be succeeding.
alanb1951
Joined: 16 Mar 10
Posts: 208
Credit: 105,445,238
RAC: 36,686
Message 68855 - Posted: 13 Jun 2019, 2:42:16 UTC - in response to Message 68854.  

From my ever-growing list of invalids, it seems that Linux-based clients are failing, and so are Mac clients. Only Windows-based clients seem to be succeeding.

This matches what I saw with the south4s_0 tasks that were failing towards the end of that batch, though I also observed some NVIDIA GPU jobs being flagged invalid on Windows (and as I don't run a huge number of tasks per day, I had a fairly small sample to look at...)

(And on that note, I've not seen any invalid south4s_1 jobs yet - as I said, small sample...)

As I don't run any MilkyWay CPU jobs, I don't know whether it applies across both CPU and GPU tasks! It would be interesting to know if Linux/Mac CPU jobs get flagged invalid too - it would constitute another diagnostic point!
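
Getting that extra diagnostic point mostly means splitting the invalid counts by platform and device type rather than looking at one overall rate. A minimal sketch, assuming the results have already been collected as (os, device, outcome) tuples; that record layout is a hypothetical choice, not something the BOINC task pages export directly.

```python
from collections import defaultdict


def invalid_counts(records):
    """Group results by (os, device) and count invalid vs. total.

    records: iterable of (os_name, device, outcome) tuples, where outcome is
    "valid" or "invalid". This record layout is hypothetical.
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [invalid, total]
    for os_name, device, outcome in records:
        entry = counts[(os_name, device)]
        entry[1] += 1
        if outcome == "invalid":
            entry[0] += 1
    return {key: (inv, tot) for key, (inv, tot) in counts.items()}
```

A ("Linux", "CPU") bucket with zero invalids next to a ("Linux", "GPU") bucket full of them would point at the GPU path rather than the OS.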

If Tom is going to suss this out, he will need to know things like "CPU versus GPU", operating system, and GPU type (and driver version). It could be a compilation issue (different compilers for different platforms, code generated with different rounding options, different execution sequences causing rounding differences, et cetera). In the case of GPUs, it could be a matter of whether the GPUs support particular rounding options and, if so, whether all GPU kernels use the same control parameters... Also, of course, if any random numbers are used in the processing, it could simply be the butterfly effect!
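
The point about execution order is easy to demonstrate: in IEEE-754 double precision, addition is not associative, so two correct builds that merely group the same sums differently can disagree in the last bits. This is a generic illustration, not MilkyWay@home's code or its validator.

```python
import math

# Floating-point addition is not associative: the same three numbers
# summed in a different grouping give results that differ in the last bit.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c
right = a + (b + c)

print(left, right, left == right)  # prints: 0.6000000000000001 0.6 False

# A tolerance-based comparison treats them as the same answer.
print(math.isclose(left, right, rel_tol=1e-12))  # prints: True
```

Whether such tiny differences can grow large enough to fail validation depends on how the science code accumulates results and how strict the comparison is, which is exactly what the platform details above would help narrow down.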
Keith Myers
Joined: 24 Jan 11
Posts: 696
Credit: 540,027,223
RAC: 86,732
Message 68856 - Posted: 13 Jun 2019, 8:05:18 UTC

I'll throw my data point out here: I process only GPU tasks for MilkyWay. The same goes for all my other projects except Seti, which does CPU work as well. Oh, I forgot: the Raspberry Pi 3B+ and the Jetson Nano also do Seti CPU work.
Joseph Stateson
Joined: 18 Nov 08
Posts: 291
Credit: 2,461,693,501
RAC: 0
Message 68859 - Posted: 15 Jun 2019, 22:12:26 UTC
Last modified: 15 Jun 2019, 22:52:37 UTC

I have been following this thread and the other invalid-results thread, and found it difficult to look up the various systems, OSes and apps. Having nothing else to do, I put together a program that gathers information about the invalids. It runs only under Windows, and I hope it is of some use in figuring out what is going on. The executables and sources are at
https://github.com/JStateson/Gridcoin/tree/master/InvalidAnalysis The executables are in a 7z file, "IVexecutables.7z", which has to be unpacked.

Browsing: the program does not work if the phrase "userid" is in the URL. You must browse to a computer and then select its "invalid" tasks. Once you are there (at the project page of interest), copy the URL from the browser and paste it into the URL field of this C# app. This may not work on projects that have blocked anonymous access.
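
The URL rule above can be expressed as a small pre-check. A minimal sketch in Python rather than the tool's own C#; the function name is hypothetical, and the assumption that a per-computer task list carries a "hostid" query parameter is mine, not something stated by the tool.

```python
from urllib.parse import parse_qs, urlparse


def looks_like_host_task_page(url: str) -> bool:
    """Reject per-user URLs and accept per-computer ones.

    Assumption: a per-computer task list carries a "hostid" query parameter.
    """
    if "userid" in url:  # the tool does not handle per-user pages
        return False
    query = parse_qs(urlparse(url).query)
    return "hostid" in query
```

Rejecting the URL up front gives a clear message instead of the "no telling what will happen" behaviour of a failed fetch.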

[EDIT] Forgot to mention: I do minimal error checking. If the project is offline, there is no telling what will happen. The same goes if you try another project to see what happens, as this was coded for MilkyWay. If you mess up and enter the wrong URL, the program will remember it and restore the wrong one when you run it again.

Here are some pictures of what it can do. Let me know if you have any suggestions or see any bugs.
The program will compile under VS2017. I have not implemented the CPU or INTEL filters yet.

This shows all invalid datasets and gives a count of each. One full page (20 work units) was fetched on the initial read, and all 20 were invalid.



The following shows the valid datasets.


The following shows only those valid datasets that were from apple or linux.
Keith Myers
Joined: 24 Jan 11
Posts: 696
Credit: 540,027,223
RAC: 86,732
Message 68902 - Posted: 21 Jul 2019, 5:01:21 UTC

Since I have such a large cache and a MilkyWay resource share that is small relative to Seti's, I only process work as it reaches EDF (earliest deadline first). So lots of these "bad" de_modfit_84_bundle4_4s_south4s_1 tasks are still working their way through. It will be nice to clear them from the cache and have only good data to crunch.
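
Earliest deadline first simply means running the tasks whose deadlines come soonest before the others. The ordering itself can be sketched as below; the BOINC client's real scheduler also weighs resource shares, runtime estimates and more, so this only illustrates the ordering, and the task names are hypothetical.

```python
from datetime import datetime


def edf_order(tasks):
    """Return task names earliest-deadline-first.

    tasks: iterable of (name, deadline) pairs; deadlines only need to be
    mutually comparable (e.g. datetime objects). Ties keep input order.
    """
    return [name for name, _deadline in sorted(tasks, key=lambda t: t[1])]


# Hypothetical cache of two tasks with different deadlines.
cache = [("task_a", datetime(2019, 7, 25)), ("task_b", datetime(2019, 7, 22))]
print(edf_order(cache))  # prints: ['task_b', 'task_a']
```

With a large cache and a small resource share, tasks tend to run only once they move to the front of this ordering, which is why old batches keep surfacing long after they were downloaded.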

©2024 Astroinformatics Group