Update on This Week's Errors

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 492
Credit: 34,647,251
RAC: 9,184

Message 66726 - Posted: 21 Oct 2017, 18:11:25 UTC

Hey Everyone,

I just wanted to give you all a quick and more technical update on what we have been dealing with over the last week.

Our project has been running for upwards of 10 years, and over those years we have crunched literally billions of workunits. Thanks to all of your hard work and dedication, we have calculated so many results that we ran out of room to store their IDs in a normal unsigned integer value (the default data type used for storing IDs in BOINC databases). To fix this, on Tuesday night I updated our database to store IDs in a much larger data type, which should prevent the issue from happening again during the remaining life of the project. I also had to quickly patch the BOINC code we run on the server so that it could use this newly available data type.
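
A minimal sketch of the overflow described above, assuming the old ID columns were 32-bit unsigned (MySQL `INT UNSIGNED`) and that the "much larger data type" is 64-bit; the post does not name the new type. `fits_uint32` and `wrap_to_uint32` are hypothetical helpers, and `ctypes.c_uint32` is used only to mimic how a C `unsigned int` in server code wraps around.

```python
import ctypes

# Assumption: the original ID columns were 32-bit unsigned (MySQL INT UNSIGNED).
UINT32_MAX = 2**32 - 1   # 4,294,967,295
# Assumption: the "much larger data type" is 64-bit unsigned (e.g. BIGINT UNSIGNED).
UINT64_MAX = 2**64 - 1


def fits_uint32(result_id: int) -> bool:
    """Hypothetical helper: can this ID be stored in a 32-bit unsigned field?"""
    return 0 <= result_id <= UINT32_MAX


def wrap_to_uint32(result_id: int) -> int:
    """Mimic a C `unsigned int`: out-of-range values wrap modulo 2**32."""
    return ctypes.c_uint32(result_id).value


next_id = UINT32_MAX + 1          # the first ID past the 32-bit limit
print(fits_uint32(next_id))       # False
print(wrap_to_uint32(next_id))    # 0
print(next_id <= UINT64_MAX)      # True: plenty of headroom after widening
```

Widening the column alone is not enough: every variable and field in the server code that carries a result ID has to be widened too, or it will still wrap once IDs pass the 32-bit limit.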

During this process, I missed one of the foreign keys that refers to the results, specifically one used in the validation process. This caused workunits to fail validation over the last couple of days. The issue was hard to diagnose because everything ran fine before the workunit queue clear; only afterwards, once new work with large IDs started being returned, did workunits silently fail to validate.
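
A toy model of that silent failure, assuming the missed foreign key behaved like a 32-bit field that wrapped large IDs around; the post does not say exactly which field it was or how the value got truncated. `results`, `validate` and `validate_fixed` are hypothetical names, not BOINC's validator API. It shows why older work validated fine while new work with IDs above 2^32 - 1 failed without raising any error.

```python
import ctypes

UINT32_MAX = 2**32 - 1


def store_in_uint32_field(value: int) -> int:
    """Model a reference that was left at 32 bits: wraps modulo 2**32."""
    return ctypes.c_uint32(value).value


# Hypothetical results table whose primary key was already widened to 64 bits.
results = {
    UINT32_MAX - 1: "result issued before the limit",
    UINT32_MAX + 5: "result issued after the migration",
}


def validate(result_id: int) -> bool:
    """Hypothetical validator lookup through the missed 32-bit reference."""
    ref = store_in_uint32_field(result_id)
    # No error is raised: the wrapped ID simply finds no row here, so the
    # result is quietly skipped. (In a real table it could even hit an
    # unrelated old row that happens to have that low ID.)
    return results.get(ref) is not None


def validate_fixed(result_id: int) -> bool:
    """Same lookup once the reference is widened to hold the full ID."""
    return results.get(result_id) is not None


print(validate(UINT32_MAX - 1))        # True: small IDs are unaffected
print(validate(UINT32_MAX + 5))        # False: wraps to 4, lookup misses
print(validate_fixed(UINT32_MAX + 5))  # True
```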

I have implemented a fix for this last issue, and I hope things will be smooth sailing from here on. Of course, it is always possible that I missed another bug in this pipeline, or I may find that I need to force validation on workunits received over the last couple of days. In any case, I will be watching the server closely over the next couple of days to make sure it's running correctly.

Sorry for all of the trouble over the last couple of days, and I hope you all continue to support the important research we are doing into the future.

Jake

rbrahn
Joined: 16 Jul 17
Posts: 5
Credit: 139,744,696
RAC: 701,142

Message 66727 - Posted: 21 Oct 2017, 18:19:33 UTC - in response to Message 66726.

Thanks for the update, Jake. Ask me sometime about when I ruined a university license server with a bit of misjudged scripting.

Just to be clear: how can we help you get things running smoothly again? Should we clear out all currently downloaded tasks and start fresh, keep crunching what we have at the moment even if it isn't validating, or is there some other option I haven't considered yet?

Ulrich Metzner
Joined: 11 Apr 15
Posts: 48
Credit: 23,610,139
RAC: 32,074

Message 66728 - Posted: 21 Oct 2017, 18:26:06 UTC

Well, I have returned a few results now, and they are still NOT validated:

https://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=1529117916
https://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=1529139409

:?
____________
Aloha, Uli

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 492
Credit: 34,647,251
RAC: 9,184