Welcome to MilkyWay@home

Consistent "Validate error" status



Questions and Answers : Unix/Linux : Consistent "Validate error" status

Darrell Tangman

Joined: 30 Jun 10
Posts: 5
Credit: 12,110,997
RAC: 7,060
Message 66149 - Posted: 30 Jan 2017, 21:46:01 UTC

Since at least January 1, 2017, and for an unknown time prior to that date, all work units from MilkyWay@Home have resulted in a "Validate error" status. Time on this system is split 50:50 between MilkyWay@Home and SETI@home; during the same period SETI@home has yielded no invalid results. Any thoughts on what might be going wrong?

AMD FX-6300 / Linux 4.4.0-59
BOINC 7.2.42
MilkyWay@Home v1.40

alanb1951

Joined: 16 Mar 10
Posts: 45
Credit: 34,569,751
RAC: 35,255
Message 66152 - Posted: 1 Feb 2017, 5:59:49 UTC - in response to Message 66149.  

I notice that your work-units seem to fail the first task of the five in a batch, then complete the other four -- unfortunately, it still counts as a failed job...

Now, in each of your results that I looked at (a random sample), I see the following lines:

q is 0.0
Integral 0 time = 0.000004 s
Failed to calculate integral 0
Failed to calculate likelihood



Something similar has shown up in some GPU jobs (though there it manifests as a compile error):

<kernel>:227:26: error: use of undeclared identifier 'inf'
        tmp = mad((real) Q_INV_SQR, z * z, tmp);   /* (q_invsqr * z^2) + (x^2 + y^2) */
                         ^
<built-in>:33:19: note: expanded from here
#define Q_INV_SQR inf
                  ^


Delving into the source code suggests that inf is there because q was found to be zero while preparing to build a GPU kernel for the first task. Q_INV_SQR seems to be initialized from 1/q (the name suggests the inverse of q squared), which comes out as infinity when q is zero (oops!). It's the same issue, just manifested in a different way.
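To illustrate how a zero q could end up as the bare word inf inside a kernel, here is a minimal C sketch. It is not the project's actual code: the helper names q_inv_sqr and format_q_define are hypothetical, and it assumes that the value is computed as 1/q^2 and pasted into a -D style macro definition with a printf-style float conversion. On an IEEE 754 platform, dividing 1.0 by 0.0 gives infinity, and printf formats infinity as text beginning with "inf", which a C-like compiler then reads as an identifier.

```c
#include <math.h>
#include <stdio.h>

/* Hypothetical helper (not from the MilkyWay@Home source): the inverse
 * square of q, as the macro name Q_INV_SQR suggests. */
static double q_inv_sqr(double q)
{
    /* On IEEE 754 platforms 1.0 / 0.0 evaluates to +infinity. */
    return 1.0 / (q * q);
}

/* Hypothetical helper: format the value as a "-D" macro definition, the
 * way a host program might pass constants to a kernel compiler.
 * Returns 0 if the value is finite, -1 if it is infinite or NaN. */
static int format_q_define(double q, char *buf, size_t len)
{
    double v = q_inv_sqr(q);

    /* printf-style conversions render infinity as text starting with
     * "inf", which is not a valid numeric literal in C or OpenCL C. */
    snprintf(buf, len, "-DQ_INV_SQR=%.15f", v);

    return isfinite(v) ? 0 : -1;
}
```

Calling format_q_define(0.0, buf, sizeof buf) fills buf with a definition whose value starts with "inf" and returns -1, so a caller could reject the bad parameter up front instead of handing the compiler an undeclared identifier.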

Unfortunately, I don't know for sure what's causing this problem (and I've never been hit by it myself...) but I do note that you're running a very old version of the BOINC client, so I wonder if there's a glitch in there somewhere (the most recent user with the GPU job error shown above was also using 7.2.42...)

If you're on an Ubuntu/Debian-based Linux, you should be able to find a much newer client than that in the repositories; might be worth a try. (I'm on 7.6.31, which is the standard offering with Ubuntu 16.04.)

Good luck - Al.

Darrell Tangman

Message 66157 - Posted: 4 Feb 2017, 0:39:33 UTC - in response to Message 66152.  

Since updating to BOINC client version 7.6.31 this computer has completed one work unit, which shows a status of "Completed, validation inconclusive", definitely an improvement on "Validate error". Now I have to wait for someone else to complete that work unit, but the "q is 0.0" message has disappeared, which is certainly a hopeful sign.

It took a little hacking to get this done, because I've been running BOINC for quite a while. When I installed Ubuntu 16.04 I just linked to the old BOINC directory, which had been initialized from the BOINC download page, which is still handing out 7.2.42. I start the client manually using scripts written some time in the distant past; this PC is rebooted infrequently enough that it's never been worthwhile to switch over to letting init manage BOINC. The scripts don't work if there's a boinc running as user "boinc", and the obvious way to prevent init from starting boinc (setting ENABLED="0" in /etc/default/boinc-client) didn't prevent init from starting boinc. It remains to be seen if I've fixed that, the next time this PC gets a reboot.

Thank you for your assistance!

Darrell Tangman

Message 66176 - Posted: 10 Feb 2017, 23:58:48 UTC - in response to Message 66157.  

The update to BOINC client version 7.6.31 successfully corrected my problem; MilkyWay@Home work units are now completing correctly and validating.

©2019 Astroinformatics Group