Update on This Week's Errors
Author | Message |
---|---|
Joined: 25 Feb 13 Posts: 580 Credit: 94,200,158 RAC: 0 |
Hey Everyone, I just wanted to give you all a quick and more technical update on what we have been dealing with over the last week. Our project has been running for upwards of 10 years now, and we have crunched literally billions of workunits over those years. Thanks to all of your hard work and dedication, we have calculated so many results that we ran out of room to store their IDs in a normal unsigned integer value (the default data type used for storing IDs in BOINC databases). So, on Tuesday night, I updated our database to store IDs in a much larger data type, which should prevent this issue from happening again during the remaining life of the project. I also had to quickly patch the BOINC code we run on the server so it could use this newly available data type in the database. During this process, I missed one of the foreign keys that refers to the results, specifically in the validation process, and that caused problems validating workunits over the last couple of days. The issue was hard to diagnose because everything ran fine before the workunit queue clear; only afterwards, once new work with large IDs was being returned, did the validator silently fail to validate workunits. I have implemented a fix for this last issue, and my hope is that things will be smooth sailing from here on with regard to this issue. Of course, there is always the possibility that I missed another bug in this pipeline, or I might realize I need to force validation on workunits received over the last couple of days. In any case, I will be watching the server pretty closely over the next couple of days to make sure it's running correctly. Sorry for all of the trouble over the last couple of days, and I hope you all continue to support the important research we are doing into the future. Jake |
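Jake's explanation can be illustrated with a small sketch. A 32-bit unsigned ID tops out at 4,294,967,295, while a 64-bit one (for example MySQL's `BIGINT UNSIGNED`) goes far beyond that. If one field in the pipeline is still 32 bits wide, a large ID is not necessarily rejected: in C-style code, converting a wider unsigned value to a 32-bit unsigned type keeps only the low 32 bits, so the stored key quietly points at a different (usually nonexistent) row. The sketch below simulates this with Python's `ctypes`, which does no overflow checking. The `results` table and the helper names are hypothetical, and this is only one plausible way a missed foreign key can fail silently, not the actual MilkyWay@home server code.

```python
import ctypes

# Capacity of a 32-bit unsigned ID field (e.g. MySQL INT UNSIGNED).
UINT32_MAX = 2**32 - 1  # 4,294,967,295


def as_uint32(value: int) -> int:
    """Convert to a 32-bit unsigned integer the way C converts to unsigned types.

    ctypes performs no overflow checking, so only the low 32 bits survive.
    """
    return ctypes.c_uint32(value).value


def as_uint64(value: int) -> int:
    """Convert to a 64-bit unsigned integer (unchanged for IDs below 2**64)."""
    return ctypes.c_uint64(value).value


# Hypothetical result table, already keyed by widened 64-bit IDs.
new_result_id = UINT32_MAX + 10
results = {new_result_id: "result awaiting validation"}


def find_result(foreign_key: int):
    """Return the row the foreign key refers to, or None if nothing matches."""
    return results.get(foreign_key)


# A foreign key that was widened along with the table still finds the row.
print(find_result(as_uint64(new_result_id)))  # result awaiting validation

# A foreign key left at 32 bits wraps around to 9 and matches nothing,
# without raising any error: the "silent" failure mode.
print(as_uint32(new_result_id), find_result(as_uint32(new_result_id)))  # 9 None
```

Everything below 4,294,967,295 behaves identically in both widths, which is consistent with the symptom Jake describes: the bug only showed up once new work with large IDs started coming back.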
Joined: 16 Jul 17 Posts: 6 Credit: 222,105,326 RAC: 0 |
Thanks for the update, Jake. Ask me sometime about when I ruined a university license server with a bit of misjudged scripting. Just to be clear: how can we help you get things up and running smoothly again? Should we clear out all currently downloaded tasks and start fresh, keep crunching what we've got at the moment even if they're not validating, or is there some other option I've not considered yet? |
Joined: 11 Apr 15 Posts: 58 Credit: 63,291,127 RAC: 0 |
Well, I've returned a few results now, and they are still NOT validated: https://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=1529117916 https://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=1529139409 :? Aloha, Uli |
Joined: 25 Feb 13 Posts: 580 Credit: 94,200,158 RAC: 0 |
Hey Ulrich, I'll keep an eye on those two work units. It might take me a while to figure out why they aren't validating. Sorry, Jake |
Joined: 11 Apr 15 Posts: 58 Credit: 63,291,127 RAC: 0 |
Oh well, it doesn't seem to work, so I've suspended again. Sorry... :/ Aloha, Uli |
Joined: 25 Feb 13 Posts: 580 Credit: 94,200,158 RAC: 0 |
It's okay, Ulrich. I appreciate your help. Jake |
Joined: 31 Oct 10 Posts: 83 Credit: 38,632,375 RAC: 0 |
I took a break for a day, then loaded all new units. Most go to the Inconclusive file before going to the Invalid bin. Still nothing since the 16th has gone Valid or even Pending. |
Joined: 25 Feb 13 Posts: 580 Credit: 94,200,158 RAC: 0 |
Hey TimeRanger, I just got the validator fixed. I've tested it on a few workunits and watched it validate them. I'm going to go ahead and try to come up with a query to tell the validator to recheck all validation-inconclusive workunits, as well as workunits that errored out after too many runs. You should see a massive validation run here in a few minutes, and you should see retroactive credit even on dead workunits. As a note, some units that we cancelled over the last couple of days still will not validate, as they were cancelled before a consensus could be reached on the correct answer. Jake |
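Jake does not post the query he used. In a standard BOINC server, the validator picks up workunits whose `need_validate` flag is set, so one plausible way to request a recheck is to set that flag on the affected workunits. The sketch below shows the idea against an in-memory SQLite database with a heavily simplified schema loosely modeled on BOINC's `workunit` and `result` tables. The numeric constants are assumed values mirroring BOINC's C headers and should be verified against the server's own source; clearing the "too many results" error bits, and whether that alone is enough on a live server, are likewise assumptions.

```python
import sqlite3

# ASSUMED constant values, modeled on BOINC's C headers; verify before use.
VALIDATE_STATE_INCONCLUSIVE = 4
WU_ERROR_TOO_MANY_SUCCESS_RESULTS = 4
WU_ERROR_TOO_MANY_TOTAL_RESULTS = 8
TOO_MANY_RUNS = WU_ERROR_TOO_MANY_SUCCESS_RESULTS | WU_ERROR_TOO_MANY_TOTAL_RESULTS


def flag_for_revalidation(conn: sqlite3.Connection) -> int:
    """Mark inconclusive or 'too many runs' workunits for another validator pass.

    Sets need_validate = 1 and clears the assumed 'too many results' error bits.
    Returns the number of workunit rows updated.
    """
    cur = conn.execute(
        """
        UPDATE workunit
        SET need_validate = 1,
            error_mask = error_mask & ~?
        WHERE id IN (SELECT workunitid FROM result WHERE validate_state = ?)
           OR (error_mask & ?) != 0
        """,
        (TOO_MANY_RUNS, VALIDATE_STATE_INCONCLUSIVE, TOO_MANY_RUNS),
    )
    conn.commit()
    return cur.rowcount


# Tiny hypothetical data set: WU 1 has an inconclusive result,
# WU 2 errored with too many results, WU 3 is unaffected.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE workunit (id INTEGER PRIMARY KEY, need_validate INTEGER, error_mask INTEGER);
    CREATE TABLE result (id INTEGER PRIMARY KEY, workunitid INTEGER, validate_state INTEGER);
    INSERT INTO workunit VALUES (1, 0, 0), (2, 0, 4), (3, 0, 0);
    INSERT INTO result VALUES (10, 1, 4), (11, 3, 1);
    """
)
print(flag_for_revalidation(conn))  # 2
print(conn.execute("SELECT id, need_validate, error_mask FROM workunit ORDER BY id").fetchall())
# [(1, 1, 0), (2, 1, 0), (3, 0, 0)]
```

Doing the selection and the flag in a single `UPDATE` keeps the recheck atomic, so the validator never sees a half-flagged batch.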
Joined: 4 Aug 17 Posts: 8 Credit: 199,494,186 RAC: 0 |
Good luck, and I hope it's gonna work! Fingers crossed for you guys, and I hope you have a good Sunday regardless! |
Joined: 31 Oct 10 Posts: 83 Credit: 38,632,375 RAC: 0 |
UPDATE: I now have VALID results and a number of units PENDING. My number of Inconclusive results hasn't changed; most of them were completed within the last 24 hours. THANKS! |
Joined: 30 Mar 09 Posts: 63 Credit: 621,582,726 RAC: 0 |
I will try a single load of WUs to help get things sorted out. {edit} Numbers are looking better now; there are validated WUs now. State: All (828) · In progress (318) · Validation pending (1) · Validation inconclusive (59) · Valid (346) · Invalid (56) · Error (48) |
Joined: 26 Apr 08 Posts: 87 Credit: 64,801,496 RAC: 0 |
Looking good on my end. Thank you for all of your effort; it appears to have paid off. Plus SETI Classic = 21,082 WUs |
Joined: 30 Mar 09 Posts: 63 Credit: 621,582,726 RAC: 0 |
State: All (825) · In progress (158) · Validation pending (1) · Validation inconclusive (62) · Valid (502) · Invalid (54) · Error (48). Much better. It will probably take some more time to clear the empty, erroneous 'Error' state... ;-) |
Joined: 4 Aug 17 Posts: 8 Credit: 199,494,186 RAC: 0 |
Well done, guys :) |
Joined: 11 Apr 15 Posts: 58 Credit: 63,291,127 RAC: 0 |
It works again - thanks a lot! :) Aloha, Uli |
Joined: 25 Feb 13 Posts: 580 Credit: 94,200,158 RAC: 0 |
Glad to hear everything is working for you all again. Thank you so much for sticking with me through the last week. Hope you all enjoy the rest of your weekend. Jake |
Joined: 2 Oct 10 Posts: 74 Credit: 18,362,557 RAC: 0 |
Completed, validation inconclusive:
de_modfit_fast_18_3s_146_bundle5_ModfitConstraintsWithDisk-fixed_Bouncy_4_1508362762_134069_0
de_modfit_fast_19_3s_146_bundle5_ModfitConstraintsWithDiskandUpdateStreams_1_Bouncy_1_1508646134_281756_0
de_modfit_fast_19_3s_146_bundle5_ModfitConstraintsWithDiskandUpdateStreams_1_Bouncy_3_1508646134_281796_0
de_modfit_fast_19_3s_146_bundle5_ModfitConstraintsWithDiskandUpdateStreams_1_Bouncy_3_1508646134_281798_0
de_modfit_fast_19_3s_146_bundle5_ModfitConstraintsWithDiskandUpdateStreams_1_Bouncy_3_1508646134_281802_0
de_modfit_fast_20_3s_146_bundle5_ModfitConstraintsWithDisk-fixed_Bouncy_1_1508362762_134490_0 |
Joined: 24 Jul 12 Posts: 40 Credit: 7,123,301,054 RAC: 0 |
I've had a couple of systems running MW on autopilot for the last few days while I address other issues. I just got back to all the commotion. WOW!! I don't know about the others, but I crunch for the science, not for the worthless credits. So, I just brought 4 more machines back onto MW because so many were whining that they were dropping out. MW and Einstein were my first BOINC projects, and they are still my favorite GPU projects. Thanks for the hard work, Jake, and keep your head up, but duck when you have to!!! Joe |
Joined: 25 Feb 13 Posts: 580 Credit: 94,200,158 RAC: 0 |
Hey HassanShebli, I just checked all of those listed workunits. It seems the workunit that is still waiting to validate is de_modfit_fast_19_3s_146_bundle5_ModfitConstraintsWithDiskandUpdateStreams_1_Bouncy_3_1508646134_281802_0. I can confirm this one is just waiting for its sister result to return to check your answer against. Jake |
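Jake's reply reflects how BOINC-style replicated validation works: a returned result cannot be judged on its own, so it stays pending until another host's ("sister" or wingman) result for the same workunit arrives and the two can be compared. The following is a minimal sketch under assumed rules, namely a quorum of two and a simple per-value tolerance; the `Result` class, `agree`, `quorum_state`, and the tolerance are hypothetical and not MilkyWay@home's actual validator.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """A returned result: the host that sent it and its output values."""
    host: str
    values: tuple[float, ...]


def agree(a: Result, b: Result, tol: float = 1e-6) -> bool:
    """Hypothetical comparison: outputs match if every value is within tol."""
    return len(a.values) == len(b.values) and all(
        abs(x - y) <= tol for x, y in zip(a.values, b.values)
    )


def quorum_state(returned: list[Result], min_quorum: int = 2) -> str:
    """Classify a workunit from the results returned so far."""
    if len(returned) < min_quorum:
        return "pending"  # still waiting for a sister result
    for candidate in returned:
        # Count results (including the candidate itself) that agree with it.
        matches = [r for r in returned if agree(candidate, r)]
        if len(matches) >= min_quorum:
            return "valid"  # enough results agree on the answer
    return "inconclusive"  # results disagree; more results are needed


mine = Result("my_host", (1.0, 2.0))
print(quorum_state([mine]))                                 # pending
print(quorum_state([mine, Result("wingman", (1.0, 2.0))]))  # valid
print(quorum_state([mine, Result("wingman", (1.5, 2.0))]))  # inconclusive
```

Under this simplified model, a pending result really is a matter of time: its state only changes once a wingman's result comes back, becoming valid if the answers agree or inconclusive if they do not.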
Joined: 2 Oct 10 Posts: 74 Credit: 18,362,557 RAC: 0 |
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=749153&offset=0&show_names=0&state=3&appid= So does this mean that it is a matter of time? |
©2024 Astroinformatics Group