Update on This Week's Errors

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 438
Credit: 9,894,655
RAC: 175,326

Message 66726 - Posted: 21 Oct 2017, 18:11:25 UTC

Hey Everyone,

I just wanted to give you all a quick and more technical update on what we have been dealing with over the last week.

Our project has been running for upwards of 10 years now, and we have crunched literally billions of workunits over those years. Thanks to all of your hard work and dedication, we have calculated so many results that we ran out of room to store their IDs in a normal unsigned integer value (the default data type used for storing IDs in BOINC databases). On Tuesday night, I updated our database to store IDs in a much larger data type so that this cannot happen again during the remaining life of the project. I then had to quickly patch the BOINC code we run on the server so that it could use the new data type in the database.
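
To illustrate the limit described above, here is a minimal Python sketch. It assumes the "normal unsigned integer" is 32 bits wide and the larger type is 64 bits. Those widths and the helper name `fits` are assumptions for illustration, not details taken from the server's schema.

```python
# Assumed widths: 32-bit unsigned for the old ID column, 64-bit for the new one.
UINT32_MAX = 2**32 - 1
UINT64_MAX = 2**64 - 1


def fits(value: int, max_value: int) -> bool:
    """Return True if a non-negative ID can be stored without overflow."""
    return 0 <= value <= max_value


# Hypothetical ID just past the assumed 32-bit limit.
next_id = UINT32_MAX + 1

print(f"32-bit unsigned max: {UINT32_MAX:,}")
print(f"fits in 32 bits: {fits(next_id, UINT32_MAX)}")
print(f"fits in 64 bits: {fits(next_id, UINT64_MAX)}")
```

Under that assumption the smaller type holds about 4.29 billion distinct IDs, which is in line with "billions" of results being enough to exhaust it.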

During this process, I missed one of the foreign keys that refers to the results, specifically in the validation process. This led to an issue in validation of workunits over the last couple of days. I had trouble diagnosing this issue because everything was running fine before the workunit queue was cleared. After the queue was cleared, however, new work with large IDs started being returned, and the validator silently failed to validate those workunits.
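
One way such a failure can stay hidden is sketched below in Python. This is a hypothetical model, not the actual BOINC validator code: it assumes the missed reference still held IDs in 32 bits and wrapped larger values, so lookups kept working for old, small IDs and quietly found nothing for new, large ones. The names `results`, `to_uint32`, and `find_result` are made up for the example.

```python
from typing import Optional

UINT32_MASK = 0xFFFFFFFF  # assumed width of the reference that was missed

# Hypothetical results table keyed by full-width IDs.
results = {
    1_000: "old result (small ID)",
    2**32 + 7: "new result (large ID)",
}


def to_uint32(value: int) -> int:
    """Keep only the low 32 bits, as a C-style narrowing conversion would."""
    return value & UINT32_MASK


def find_result(result_id: int) -> Optional[str]:
    """Look up a result through the narrow reference; a miss raises no error."""
    return results.get(to_uint32(result_id))


print(find_result(1_000))      # small ID survives the narrowing
print(find_result(2**32 + 7))  # large ID wraps to 7 and is not found
```

In this model nothing crashes and nothing is logged. The lookup simply comes back empty, which matches the pattern of old work validating normally while new work did not.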

I have implemented a fix for this last issue, and my hope is that things will be smooth sailing from here on with regard to this issue. Of course, there is always the possibility that I missed another bug in this pipeline, or I might realize I need to force validation on workunits received over the last couple of days. In any case, I will be watching the server pretty closely over the next couple of days to make sure it's running correctly.

Sorry for all of the trouble over the last couple days and I hope you all continue to support the important research we are doing into the future.

Jake

rbrahn
Joined: 16 Jul 17
Posts: 5
Credit: 23,735,606
RAC: 325,254

Message 66727 - Posted: 21 Oct 2017, 18:19:33 UTC - in response to Message 66726.

Thanks for the update Jake. Ask me sometime about when I ruined a university license server with a bit of misjudged scripting.

Just to be clear: how can we help you get things up and running smoothly again? Should we clear out all currently downloaded tasks and start fresh, keep crunching what we've got at the moment even if they're not validating, or do something else I've not considered yet?

Ulrich Metzner
Joined: 11 Apr 15
Posts: 41
Credit: 16,121,772
RAC: 32,640

Message 66728 - Posted: 21 Oct 2017, 18:26:06 UTC

Well, I returned a few results now and they are still NOT validated:

https://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=1529117916
https://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=1529139409

:?
____________
Aloha, Uli

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 438
Credit: 9,894,655
RAC: 175,326

Message 66729 - Posted: 21 Oct 2017, 21:06:21 UTC

Hey Ulrich,

I'll keep an eye on those two work units. It might take me a while to figure out why they aren't validating.

Sorry,
Jake

Ulrich Metzner
Joined: 11 Apr 15
Posts: 41
Credit: 16,121,772
RAC: 32,640

Message 66732 - Posted: 22 Oct 2017, 0:58:14 UTC

Oh well, it doesn't seem to work, so I suspended again - sorry... :/
____________
Aloha, Uli

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 438
Credit: 9,894,655
RAC: 175,326

Message 66733 - Posted: 22 Oct 2017, 1:29:23 UTC

It's okay, Ulrich. I appreciate your help.

Jake

TimeRanger
Joined: 31 Oct 10
Posts: 74
Credit: 22,937,531
RAC: 28,503

Message 66734 - Posted: 22 Oct 2017, 4:03:20 UTC - in response to Message 66733.

I took a break for a day, then reloaded all new units ... most go to the Inconclusive file ... before going to the Invalid bin. Still nothing since the 16th is going Valid or even Pending.

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 438
Credit: 9,894,655
RAC: 175,326

Message 66735 - Posted: 22 Oct 2017, 4:29:14 UTC

Hey TimeRanger,

I just got the validator fixed. I've tested it on a few workunits and I watched it validate them. I'm going to go ahead and try to come up with a query to tell the validator to recheck all validation-inconclusive workunits, as well as workunits that were marked as errors for having too many runs. You should see a massive validation run in a few minutes, and you should see retroactive credits even on dead workunits.

As a note, some units which we cancelled over the last couple of days still will not validate, as they were cancelled before a consensus could be reached on the correct answer.

Jake

shu
Joined: 4 Aug 17
Posts: 7
Credit: 107,718,155
RAC: 1,059,205

Message 66736 - Posted: 22 Oct 2017, 6:20:11 UTC - in response to Message 66735.
Last modified: 22 Oct 2017, 6:20:36 UTC

Good luck and I hope it's gonna work, fingers crossed for you guys and I hope you have a good Sunday regardless!

TimeRanger
Joined: 31 Oct 10
Posts: 74
Credit: 22,937,531
RAC: 28,503

Message 66737 - Posted: 22 Oct 2017, 6:41:26 UTC - in response to Message 66736.

UPDATE - I now have VALID results and a number of units PENDING. My number of Inconclusive hasn't changed ... most of them were completed within the last 24 hours. THANKS!

aad
Joined: 30 Mar 09
Posts: 51
Credit: 248,784,446
RAC: 337,743

Message 66740 - Posted: 22 Oct 2017, 11:43:46 UTC
Last modified: 22 Oct 2017, 11:48:44 UTC

I will try a single load of WUs to help get things sorted out.

{edit}
Numbers are looking better now.
There are WUs validated now.

State: All (828) · In progress (318) · Validation pending (1) · Validation inconclusive (59) · Valid (346) · Invalid (56) · Error (48)

paris
Joined: 26 Apr 08
Posts: 75
Credit: 19,593,665
RAC: 14,934

Message 66741 - Posted: 22 Oct 2017, 12:06:01 UTC

Looking good on my end. Thank you for all of your effort; it appears to have paid off.
____________

Plus SETI Classic = 21,082 WUs

aad
Joined: 30 Mar 09
Posts: 51
Credit: 248,784,446
RAC: 337,743

Message 66742 - Posted: 22 Oct 2017, 13:49:07 UTC
Last modified: 22 Oct 2017, 13:51:46 UTC

State: All (825) · In progress (158) · Validation pending (1) · Validation inconclusive (62) · Valid (502) · Invalid (54) · Error (48)


Much better.
It will probably take some more time to clear the empty, erroneous 'Error' state... ;-)

shu
Joined: 4 Aug 17
Posts: 7
Credit: 107,718,155
RAC: 1,059,205

Message 66743 - Posted: 22 Oct 2017, 16:07:08 UTC

Well done guys :)

Ulrich Metzner
Joined: 11 Apr 15
Posts: 41
Credit: 16,121,772
RAC: 32,640

Message 66744 - Posted: 22 Oct 2017, 18:53:01 UTC

It works again - thanks a lot! :)
____________
Aloha, Uli

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 438
Credit: 9,894,655
RAC: 175,326

Message 66745 - Posted: 22 Oct 2017, 19:04:43 UTC

Glad to hear everything is working for you all again. Thank you so much for sticking with me through the last week.

Hope you all enjoy the rest of your weekend.

Jake

HassanShebli
Joined: 2 Oct 10
Posts: 72
Credit: 17,079,518
RAC: 2

Message 66746 - Posted: 23 Oct 2017, 2:42:23 UTC

Completed, validation inconclusive

de_modfit_fast_18_3s_146_bundle5_ModfitConstraintsWithDisk-fixed_Bouncy_4_1508362762_134069_0

de_modfit_fast_19_3s_146_bundle5_ModfitConstraintsWithDiskandUpdateStreams_1_Bouncy_1_1508646134_281756_0

de_modfit_fast_19_3s_146_bundle5_ModfitConstraintsWithDiskandUpdateStreams_1_Bouncy_3_1508646134_281796_0

de_modfit_fast_19_3s_146_bundle5_ModfitConstraintsWithDiskandUpdateStreams_1_Bouncy_3_1508646134_281798_0

de_modfit_fast_19_3s_146_bundle5_ModfitConstraintsWithDiskandUpdateStreams_1_Bouncy_3_1508646134_281802_0

de_modfit_fast_20_3s_146_bundle5_ModfitConstraintsWithDisk-fixed_Bouncy_1_1508362762_134490_0

JHMarshall
Joined: 24 Jul 12
Posts: 31
Credit: 870,315,789
RAC: 2,117,731

Message 66747 - Posted: 23 Oct 2017, 4:58:58 UTC - in response to Message 66745.

I've had a couple of systems running MW on autopilot for the last few days while I address other issues. I just got back to all the commotion. WOW!! I don't know about the others, but I crunch for the science, not for the worthless credits. So, I just brought 4 more machines back onto MW because so many were whining they were dropping out.

MW and Einstein were my first BOINC projects and they are still my favorite GPU projects.

Thanks for the hard work, Jake, and keep your head up, but duck when you have to!!!

Joe

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 438
Credit: 9,894,655
RAC: 175,326

Message 66748 - Posted: 23 Oct 2017, 17:04:19 UTC

Hey HassanShebli,

I just checked all of those listed workunits. It seems the workunit that is still waiting to validate is de_modfit_fast_19_3s_146_bundle5_ModfitConstraintsWithDiskandUpdateStreams_1_Bouncy_3_1508646134_281802_0.

I can confirm this one is just waiting for its sister result to be returned, so that your answer can be checked against it.

Jake

HassanShebli
Joined: 2 Oct 10
Posts: 72
Credit: 17,079,518
RAC: 2

Message 66749 - Posted: 23 Oct 2017, 19:09:05 UTC - in response to Message 66748.
Last modified: 23 Oct 2017, 19:17:36 UTC

http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=749153&offset=0&show_names=0&state=3&appid=

So does this mean that it is just a matter of time?


Copyright © 2017 AstroInformatics Group