Welcome to MilkyWay@home

Update on This Week's Errors

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist

Joined: 25 Feb 13
Posts: 580
Credit: 94,200,158
RAC: 0
Message 66726 - Posted: 21 Oct 2017, 18:11:25 UTC

Hey Everyone,

I just wanted to give you all a quick and more technical update on what we have been dealing with over the last week.

Our project has been running for upwards of 10 years now, and we have crunched literally billions of workunits over those years. Thanks to all of your hard work and dedication, we have calculated so many results that we ran out of room to store their IDs in a normal unsigned integer value (the default data type used for storing IDs in BOINC databases). On Tuesday night, I therefore updated our database to store IDs in a much larger data type, which should prevent this issue from happening again during the remaining life of the project. I also had to quickly patch the BOINC code we run on the server so that it can use this newly available data type in the database.
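
To put rough numbers on the overflow described above, here is a minimal Python sketch. It assumes that "normal unsigned integer" means a 32-bit unsigned field and that the larger type is 64 bits wide; the post does not state the exact widths, and the helper names are hypothetical.

```python
import struct

# Assumed widths: the post only says "normal unsigned integer" and "much larger".
UINT32_MAX = 2**32 - 1  # 4,294,967,295
UINT64_MAX = 2**64 - 1


def fits_uint32(result_id: int) -> bool:
    """Return True if result_id can be packed into 4 unsigned bytes."""
    try:
        struct.pack("<I", result_id)
        return True
    except struct.error:
        return False


def fits_uint64(result_id: int) -> bool:
    """Return True if result_id can be packed into 8 unsigned bytes."""
    try:
        struct.pack("<Q", result_id)
        return True
    except struct.error:
        return False


# The first ID past the 32-bit limit no longer fits in the narrow type,
# but it fits comfortably in the wide one.
first_overflowing_id = UINT32_MAX + 1
narrow_ok = fits_uint32(first_overflowing_id)
wide_ok = fits_uint64(first_overflowing_id)
```

Under these assumptions, a project that has issued a few billion result IDs is right at the edge of the narrow type, while the wide type leaves many orders of magnitude of headroom.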

During this process, I missed one of the foreign keys that refers to the results, specifically in the validation process. This led to an issue in the validation of workunits over the last couple days. I had trouble diagnosing this issue because everything was running fine before the workunit queue clear. After the queue clear, however, when new work with large IDs was being returned, the server silently failed to validate workunits.
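
One plausible way a missed key can fail silently is sketched below in Python. This is an illustration only: the post does not say exactly how the validator failed, so the truncation mechanism, the dictionary standing in for the result table, and all names here are assumptions.

```python
UINT32_MASK = 0xFFFFFFFF


def narrow_to_uint32(value: int) -> int:
    """Mimic storing a wide ID in a 32-bit unsigned field (high bits are lost)."""
    return value & UINT32_MASK


def find_result(results: dict, referenced_id: int):
    """Look a result up by the ID the validator holds; None means 'not found'."""
    return results.get(referenced_id)


# Hypothetical result table keyed by full-width IDs.
results = {1_000: "old result", 2**32 + 5: "new result"}

# Small IDs survive the narrow field, so everything looks fine at first.
old_hit = find_result(results, narrow_to_uint32(1_000))

# A large ID comes back changed, so the lookup finds nothing and no error is raised.
new_hit = find_result(results, narrow_to_uint32(2**32 + 5))
```

In this sketch the old, small IDs keep working while only the new, large IDs go missing, which matches the symptom of things running fine until new work with large IDs was returned.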

I have implemented a fix for this last issue and my hope is that things will be smooth sailing from here on with regard to this issue. Of course, there is always the possibility I missed another bug in this pipeline, or I might realize I need to force validation on workunits received over the last couple days. In any case, I will be watching the server pretty closely over the next couple days to make sure it's running correctly.

Sorry for all of the trouble over the last couple days and I hope you all continue to support the important research we are doing into the future.

Jake
ID: 66726 · Rating: 0
rbrahn

Joined: 16 Jul 17
Posts: 6
Credit: 222,105,326
RAC: 0
Message 66727 - Posted: 21 Oct 2017, 18:19:33 UTC - in response to Message 66726.  

Thanks for the update Jake. Ask me sometime about when I ruined a university license server with a bit of misjudged scripting.

Just to be clear: how can we help you get things up and running smoothly again? Should we clear out all currently downloaded tasks and start fresh, keep crunching what we've got at the moment even if they're not validating, or some other option I've not considered yet?
ID: 66727 · Rating: 0
Ulrich Metzner
Joined: 11 Apr 15
Posts: 58
Credit: 63,291,127
RAC: 0
Message 66728 - Posted: 21 Oct 2017, 18:26:06 UTC

ID: 66728 · Rating: 0
Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist

Joined: 25 Feb 13
Posts: 580
Credit: 94,200,158
RAC: 0
Message 66729 - Posted: 21 Oct 2017, 21:06:21 UTC

Hey Ulrich,

I'll keep an eye on those two work units. It might take me a while to figure out why they aren't validating.

Sorry,
Jake
ID: 66729 · Rating: 0
Ulrich Metzner
Joined: 11 Apr 15
Posts: 58
Credit: 63,291,127
RAC: 0
Message 66732 - Posted: 22 Oct 2017, 0:58:14 UTC

Oh well, it doesn't seem to work, so I suspended again - sorry... :/
Aloha, Uli

ID: 66732 · Rating: 0
Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist

Joined: 25 Feb 13
Posts: 580
Credit: 94,200,158
RAC: 0
Message 66733 - Posted: 22 Oct 2017, 1:29:23 UTC

It's okay, Ulrich. I appreciate your help.

Jake
ID: 66733 · Rating: 0
TimeRanger
Joined: 31 Oct 10
Posts: 83
Credit: 38,632,375
RAC: 0
Message 66734 - Posted: 22 Oct 2017, 4:03:20 UTC - in response to Message 66733.  

I took a break for a day, then reloaded all new units ... most go to the Inconclusive file .. before going to the Invalid bin. Still nothing since the 16th going Valid or even Pending.
ID: 66734 · Rating: 0
Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist

Joined: 25 Feb 13
Posts: 580
Credit: 94,200,158
RAC: 0
Message 66735 - Posted: 22 Oct 2017, 4:29:14 UTC

Hey TimeRanger,

I just got the validator fixed. I've tested it on a few workunits and I watched it validate them. I'm going to go ahead and try to come up with a query to tell the validator to recheck all validation inconclusive workunits as well as workunits that were errored for running too many runs. You should see a massive validation run here in a few minutes and you should see retroactive credits even on dead workunits.

As a note, some units which we cancelled over the last couple days still will not validate, as they were cancelled before a consensus could have been reached on the correct answer.
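
For readers curious what a recheck query of the kind mentioned above might look like, here is a minimal Python sketch using an in-memory SQLite database. The two tables, their column names, and the state codes are a simplified stand-in chosen for illustration; they are assumptions, not the project's actual schema or the query that was actually run. The sketch covers only the validation inconclusive case, not workunits that errored out.

```python
import sqlite3

# Assumed state codes for this sketch only.
STATE_INIT = 0
STATE_VALID = 1
STATE_INCONCLUSIVE = 4


def flag_inconclusive_for_recheck(conn: sqlite3.Connection) -> int:
    """Reset inconclusive results and mark their workunits for another
    validation pass. Returns the number of workunits flagged."""
    cur = conn.cursor()
    cur.execute(
        "SELECT DISTINCT workunitid FROM result WHERE validate_state = ?",
        (STATE_INCONCLUSIVE,),
    )
    workunit_ids = [row[0] for row in cur.fetchall()]
    # Put the results back into the "not yet checked" state.
    cur.execute(
        "UPDATE result SET validate_state = ? WHERE validate_state = ?",
        (STATE_INIT, STATE_INCONCLUSIVE),
    )
    # Tell the (hypothetical) validator to look at these workunits again.
    cur.executemany(
        "UPDATE workunit SET need_validate = 1 WHERE id = ?",
        [(wu_id,) for wu_id in workunit_ids],
    )
    conn.commit()
    return len(workunit_ids)


# Tiny demo data: workunit 1 has two inconclusive results, workunit 2 is valid.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE workunit (id INTEGER PRIMARY KEY, need_validate INTEGER DEFAULT 0);
    CREATE TABLE result (id INTEGER PRIMARY KEY, workunitid INTEGER, validate_state INTEGER);
    INSERT INTO workunit (id) VALUES (1), (2);
    INSERT INTO result VALUES (10, 1, 4), (11, 1, 4), (20, 2, 1);
""")
flagged = flag_inconclusive_for_recheck(conn)
```

Only the workunit with inconclusive results is flagged in this demo; the already valid one is left untouched.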

Jake
ID: 66735 · Rating: 0
shu
Joined: 4 Aug 17
Posts: 8
Credit: 199,494,186
RAC: 0
Message 66736 - Posted: 22 Oct 2017, 6:20:11 UTC - in response to Message 66735.  
Last modified: 22 Oct 2017, 6:20:36 UTC

Good luck and I hope it's gonna work, fingers crossed for you guys and I hope you have a good Sunday regardless!
ID: 66736 · Rating: 0
TimeRanger
Joined: 31 Oct 10
Posts: 83
Credit: 38,632,375
RAC: 0
Message 66737 - Posted: 22 Oct 2017, 6:41:26 UTC - in response to Message 66736.  

UPDATE - I now have VALID results and a number of units PENDING. My number of Inconclusive hasn't changed ... most of them were completed within the last 24 hours. THANKS!
ID: 66737 · Rating: 0
aad

Joined: 30 Mar 09
Posts: 63
Credit: 621,582,726
RAC: 11
Message 66740 - Posted: 22 Oct 2017, 11:43:46 UTC
Last modified: 22 Oct 2017, 11:48:44 UTC

I will try a single load of WUs to help get things sorted out.

{edit}
Numbers are looking better now.
There are wu's validated now
State: All (828) · In progress (318) · Validation pending (1) · Validation inconclusive (59) · Valid (346) · Invalid (56) · Error (48) 
ID: 66740 · Rating: 0
paris
Joined: 26 Apr 08
Posts: 87
Credit: 64,801,496
RAC: 0
Message 66741 - Posted: 22 Oct 2017, 12:06:01 UTC

Looking good on my end. Thank you for all of your effort, it appears to have paid off.


Plus SETI Classic = 21,082 WUs
ID: 66741 · Rating: 0
aad

Joined: 30 Mar 09
Posts: 63
Credit: 621,582,726
RAC: 11
Message 66742 - Posted: 22 Oct 2017, 13:49:07 UTC
Last modified: 22 Oct 2017, 13:51:46 UTC

State: All (825) · In progress (158) · Validation pending (1) · Validation inconclusive (62) · Valid (502) · Invalid (54) · Error (48) 


Much better.
It will probably take some more time to clear the empty, erroneous 'Error' state.....;-)
ID: 66742 · Rating: 0
shu
Joined: 4 Aug 17
Posts: 8
Credit: 199,494,186
RAC: 0
Message 66743 - Posted: 22 Oct 2017, 16:07:08 UTC

Well done guys :)
ID: 66743 · Rating: 0
Ulrich Metzner
Joined: 11 Apr 15
Posts: 58
Credit: 63,291,127
RAC: 0
Message 66744 - Posted: 22 Oct 2017, 18:53:01 UTC

It works again - thanks a lot! :)
Aloha, Uli

ID: 66744 · Rating: 0
Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist

Joined: 25 Feb 13
Posts: 580
Credit: 94,200,158
RAC: 0
Message 66745 - Posted: 22 Oct 2017, 19:04:43 UTC

Glad to hear everything is working for you all again. Thank you so much for sticking with me through the last week.

Hope you all enjoy the rest of your weekend.

Jake
ID: 66745 · Rating: 0
HassanShebli

Joined: 2 Oct 10
Posts: 74
Credit: 18,362,557
RAC: 0
Message 66746 - Posted: 23 Oct 2017, 2:42:23 UTC

Completed, validation inconclusive

de_modfit_fast_18_3s_146_bundle5_ModfitConstraintsWithDisk-fixed_Bouncy_4_1508362762_134069_0

de_modfit_fast_19_3s_146_bundle5_ModfitConstraintsWithDiskandUpdateStreams_1_Bouncy_1_1508646134_281756_0

de_modfit_fast_19_3s_146_bundle5_ModfitConstraintsWithDiskandUpdateStreams_1_Bouncy_3_1508646134_281796_0

de_modfit_fast_19_3s_146_bundle5_ModfitConstraintsWithDiskandUpdateStreams_1_Bouncy_3_1508646134_281798_0

de_modfit_fast_19_3s_146_bundle5_ModfitConstraintsWithDiskandUpdateStreams_1_Bouncy_3_1508646134_281802_0

de_modfit_fast_20_3s_146_bundle5_ModfitConstraintsWithDisk-fixed_Bouncy_1_1508362762_134490_0
ID: 66746 · Rating: 0
JHMarshall

Joined: 24 Jul 12
Posts: 40
Credit: 7,123,301,054
RAC: 0
Message 66747 - Posted: 23 Oct 2017, 4:58:58 UTC - in response to Message 66745.  

I've had a couple of systems running MW on autopilot for the last few days while I address other issues. I just got back to all the commotion. WOW!! I don't know about the others, but I crunch for the science, not for the worthless credits. So, I just brought 4 more machines back onto MW because so many were whining they were dropping out.

MW and Einstein were my first BOINC projects and they are still my favorite GPU projects.

Thanks for the hard work Jake, and keep your head up, but duck when you have to!!!

Joe
ID: 66747 · Rating: 0
Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist

Joined: 25 Feb 13
Posts: 580
Credit: 94,200,158
RAC: 0
Message 66748 - Posted: 23 Oct 2017, 17:04:19 UTC

Hey HassanShebli,

I just checked all of those listed workunits. It seems the workunit that is still waiting to validate is de_modfit_fast_19_3s_146_bundle5_ModfitConstraintsWithDiskandUpdateStreams_1_Bouncy_3_1508646134_281802_0.

I can confirm this one is just waiting for its sister result to return to check your answer against.
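
The idea of waiting for a sister result can be sketched in Python as below. This is a simplified illustration, not the project's validator: the quorum of two, the numeric tolerance, the function name, and the status labels are all assumptions.

```python
import math


def validation_status(returned_values, min_quorum=2, rel_tol=1e-9):
    """Classify a workunit from the answers returned so far.

    'pending'      -> not enough results back to compare yet
    'valid'        -> at least min_quorum results agree within rel_tol
    'inconclusive' -> enough results came back, but no agreeing group
    """
    if len(returned_values) < min_quorum:
        return "pending"
    for candidate in returned_values:
        # Count how many results (including the candidate itself) agree with it.
        agreeing = sum(
            1 for other in returned_values
            if math.isclose(candidate, other, rel_tol=rel_tol)
        )
        if agreeing >= min_quorum:
            return "valid"
    return "inconclusive"


# One result back: nothing to check it against yet.
waiting = validation_status([-2.985])
# The sister result returns and matches: consensus reached.
agreed = validation_status([-2.985, -2.985])
# The sister result returns but disagrees: another result would be needed.
disagreed = validation_status([-2.985, -3.101])
```

In this toy model a single returned answer can never be confirmed on its own, which is why a result has to wait until its sister result comes back.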

Jake
ID: 66748 · Rating: 0
HassanShebli

Joined: 2 Oct 10
Posts: 74
Credit: 18,362,557
RAC: 0
Message 66749 - Posted: 23 Oct 2017, 19:09:05 UTC - in response to Message 66748.  
Last modified: 23 Oct 2017, 19:17:36 UTC

http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=749153&offset=0&show_names=0&state=3&appid=

so does this mean that it is a matter of time?
ID: 66749 · Rating: 0

©2024 Astroinformatics Group