Scheduled Server Maintenance Concluded

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 430
Credit: 7,615,331
RAC: 8,186

Message 66327 - Posted: 3 May 2017, 15:51:46 UTC

Hello Everyone,

I just finished up server maintenance. Expect errors from any runs whose names do not begin with one of the following:

de_modfit_fast_Sim19_3s_146_bundle5_ModfitConstraintsWithDisk_1
de_modfit_fast_Sim19_3s_146_bundle5_ModfitConstraintsWithDisk_2
de_modfit_fast_Sim19_3s_146_bundle5_ModfitConstraintsWithDisk_3
de_modfit_fast_Sim19_3s_146_bundle5_ModfitConstraintsWithDisk_4

If you experience errors from these specific runs, please post below so I can help troubleshoot them.

Thanks for your patience and support,
Jake W.

John G
Joined: 1 Apr 10
Posts: 49
Credit: 171,863,025
RAC: 0

Message 66338 - Posted: 3 May 2017, 17:50:55 UTC

You said that there would be a credit update as well. Not seeing that!

Regards

John

GIPICS
Joined: 24 Apr 17
Posts: 8
Credit: 47,456,536
RAC: 7,498

Message 66339 - Posted: 3 May 2017, 18:02:12 UTC

It's a massacre. Hundreds of WUs end with an error after 1 second of crunching on an R9 290.

Everything worked fine until this update. Do we need some special app config?

Regards

GIPICS
Joined: 24 Apr 17
Posts: 8
Credit: 47,456,536
RAC: 7,498

Message 66341 - Posted: 3 May 2017, 18:48:12 UTC

WUs like this work well:

de_modfit_fast_19_3s_140_bundle5_ModfitConstraints3_4_1491420002_7966440_3


The ones you mentioned, like

de_modfit_fast_Sim19_3s_146_bundle5_ModfitConstraintsWithDisk_1_1493824438_35417_1

all end with an error or a BSOD.

Why don't you delete this batch of WUs?

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 430
Credit: 7,615,331
RAC: 8,186

Message 66342 - Posted: 3 May 2017, 19:13:17 UTC

GIPICS,

Can you double-check and make sure you have the names right? The Sim19 runs should be running fine, and the others should be erroring. Nothing should need to be changed on your end unless you are using a custom-built binary. If you are running a custom binary, you will need to rebuild it from master on our GitHub page.

I see you have several hosts attached to your account. Is this a common problem you see across all of your hosts?

Jake
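
(For anyone on a custom binary, the rebuild looks roughly like the steps below. This is only a sketch, assuming the client source lives in the Milkyway-at-home/milkywayathome_client repository on GitHub and uses a standard CMake out-of-source build; treat the repository README as the authoritative instructions.)

    git clone https://github.com/Milkyway-at-home/milkywayathome_client.git
    cd milkywayathome_client
    mkdir build && cd build
    cmake ..     # configure; see the README for project-specific build options
    make         # builds the binaries to drop in place of the old custom build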

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 430
Credit: 7,615,331
RAC: 8,186

Message 66343 - Posted: 3 May 2017, 19:15:41 UTC

John,

Looking into the credit issue now. It might be a couple of hours before you see it updated on your end, as workunits run through the queue.

Jake

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 430
Credit: 7,615,331
RAC: 8,186

Message 66344 - Posted: 3 May 2017, 19:29:03 UTC

Giving the server a quick reboot while I update the server binaries to recognize the new credit calculation. Don't be alarmed.

Jake

GIPICS
Joined: 24 Apr 17
Posts: 8
Credit: 47,456,536
RAC: 7,498

Message 66345 - Posted: 3 May 2017, 19:29:06 UTC

Here I am.

So yes, I had the opposite situation: what shouldn't have worked worked very well, and the WUs with the good labels ended in a massacre.

Anyway, on the R9 290 all the WUs like this

de_modfit_fast_Sim19_3s_146_bundle5_ModfitConstraintsWithDisk_x

ended after one second with a computation error.

I would like to remind you that until today, to keep the 290 crunching, I (and I think most crunchers) needed an app_info.xml.

I stopped BOINC, deleted that app_info, started the BOINC client again and, magic!

All the WUs like

de_modfit_fast_Sim19_3s_146_bundle5_ModfitConstraintsWithDisk_x

work very well.

yeahhhhhhhhhhhhhhhhhh
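
(For context: an app_info.xml switches the BOINC client to its "anonymous platform" mechanism, telling it to run a locally supplied binary instead of whatever the project sends, so a stale file keeps an old, now-incompatible application pinned. A minimal sketch of the file's shape; the binary name and version number here are illustrative, not the exact MilkyWay@home ones:)

    <app_info>
      <app>
        <name>milkyway</name>
      </app>
      <file_info>
        <name>milkyway_separation_1.44_opencl_amd.exe</name>  <!-- hypothetical binary -->
        <executable/>
      </file_info>
      <app_version>
        <app_name>milkyway</app_name>
        <version_num>144</version_num>  <!-- pins this old version -->
        <file_ref>
          <file_name>milkyway_separation_1.44_opencl_amd.exe</file_name>
          <main_program/>
        </file_ref>
      </app_version>
    </app_info>

(Deleting the file, as described above, lets the client fall back to the project-supplied application.)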

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 430
Credit: 7,615,331
RAC: 8,186

Message 66346 - Posted: 3 May 2017, 19:33:36 UTC

GIPICS,

Looks like you were running a custom application, or your old config file was preventing the client application from updating. Glad it all works now.

I just put up updated runs with the corrected credits. A 5-workunit bundle should now give 221 credits, I believe. This should reflect the increased crunching time required.

Jake
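
(For scale, a quick check of that figure:

    221 credits per bundle / 5 workunits = 44.2 credits per workunit

so bundles reporting slightly above 221, as below, are in the expected range.)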

GIPICS
Joined: 24 Apr 17
Posts: 8
Credit: 47,456,536
RAC: 7,498

Message 66348 - Posted: 3 May 2017, 19:46:52 UTC

But this is not all. On the 280X and 280, all the WUs like

de_modfit_fast_19_3s_140_bundle5_ModfitConstraints3_4_1491420002_7966440_3

end with an error.

There is no app_info.xml here.

Is there any way to avoid getting those WUs? You were talking about a way to solve this issue...

aad
Joined: 30 Mar 09
Posts: 51
Credit: 242,155,069
RAC: 431,967

Message 66352 - Posted: 3 May 2017, 20:58:14 UTC - in response to Message 66346.

GIPICS,

I just put up updated runs with the corrected credits. A 5-workunit bundle should now give 221 credits, I believe. This should reflect the increased crunching time required.

Jake


Just had the first 'new credit': 227.23 credits.

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 430
Credit: 7,615,331
RAC: 8,186

Message 66353 - Posted: 3 May 2017, 21:09:56 UTC
Last modified: 3 May 2017, 21:10:03 UTC

GIPICS,

I have already stopped generating more of those workunits; I'm just letting them clear out of the queue.

Jake

Peter Hucker
Joined: 5 Jul 11
Posts: 103
Credit: 15,265,491
RAC: 61

Message 66354 - Posted: 3 May 2017, 21:18:31 UTC

I see the remaining time has been fixed on the 5 bundles. Thanks for that.

iwajabitw
Joined: 16 Nov 14
Posts: 11
Credit: 117,007,966
RAC: 116,049

Message 66355 - Posted: 3 May 2017, 23:59:39 UTC
Last modified: 4 May 2017, 0:00:05 UTC

I don't know if this means anything, but many of my compute errors, like the ones listed in this thread, say Nvidia in the work unit details. This is a dual R9 280X system.

https://milkyway.cs.rpi.edu/milkyway/results.php?hostid=723824&offset=0&show_names=0&state=6&appid=

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 430
Credit: 7,615,331
RAC: 8,186

Message 66356 - Posted: 4 May 2017, 0:21:20 UTC

Hey iwajabitw,

The Nvidia and AMD binaries are actually the same; we use OpenCL for our GPU code, which is cross-platform. Regardless, I see it saying it's trying to run on the AMD card. The errors are simply due to updating the parameter files: runs started before 1.46 use parameter files that are incompatible with the new application. These errors are completely normal and expected. All of the new runs with the new parameter files seem to be running just fine.

Thanks for the update,
Jake
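
(To illustrate the cross-platform point: an OpenCL host program discovers whatever platforms and GPUs exist at run time instead of being compiled for one vendor, which is why a single binary covers both AMD and Nvidia cards. A minimal standalone sketch using standard OpenCL 1.x calls; build with -lOpenCL:)

    #include <stdio.h>
    #include <CL/cl.h>

    int main(void) {
        cl_platform_id platforms[8];
        cl_uint nplat = 0;
        if (clGetPlatformIDs(8, platforms, &nplat) != CL_SUCCESS)
            return 1;

        for (cl_uint i = 0; i < nplat; i++) {
            char pname[256];
            clGetPlatformInfo(platforms[i], CL_PLATFORM_NAME,
                              sizeof pname, pname, NULL);
            /* e.g. "AMD Accelerated Parallel Processing" or "NVIDIA CUDA" */
            printf("Platform: %s\n", pname);

            cl_device_id devs[8];
            cl_uint ndev = 0;
            if (clGetDeviceIDs(platforms[i], CL_DEVICE_TYPE_GPU,
                               8, devs, &ndev) != CL_SUCCESS)
                continue;  /* no GPUs on this platform */

            for (cl_uint j = 0; j < ndev; j++) {
                char dname[256];
                clGetDeviceInfo(devs[j], CL_DEVICE_NAME,
                                sizeof dname, dname, NULL);
                /* same code path whether this is an R9 280X or a GeForce */
                printf("  GPU: %s\n", dname);
            }
        }
        return 0;
    }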

iwajabitw
Joined: 16 Nov 14
Posts: 11
Credit: 117,007,966
RAC: 116,049

Message 66357 - Posted: 4 May 2017, 0:49:05 UTC - in response to Message 66356.

Hey iwajabitw,

The Nvidia and AMD binaries are actually the same; we use OpenCL for our GPU code, which is cross-platform. Regardless, I see it saying it's trying to run on the AMD card. The errors are simply due to updating the parameter files: runs started before 1.46 use parameter files that are incompatible with the new application. These errors are completely normal and expected. All of the new runs with the new parameter files seem to be running just fine.

Thanks for the update,
Jake

Thanks Jake

Vortac
Joined: 22 Apr 09
Posts: 77
Credit: 1,051,203,004
RAC: 99,193

Message 66358 - Posted: 4 May 2017, 6:33:21 UTC

I am still getting dozens of de_modfit_fast_19_3s_140_bundle5_ModfitConstraints3 tasks. They all error out immediately, deferring communication with the server. When there are plenty of such errored tasks, communication gets deferred for hours and the queue runs empty. The only solution is to force a manual update, or to abort such tasks before they error out, but that's possible only on attended machines. Left alone, the communication just gets deferred further and further.
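
(On unattended machines, the forced update can at least be scripted with BOINC's stock command-line tool. A sketch, assuming the project URL matches the one the client is attached under; boinccmd talks to the local client, so this can go in a cron job until the bad batch drains:)

    # clear the backoff by forcing an immediate scheduler contact
    boinccmd --project http://milkyway.cs.rpi.edu/milkyway/ update

    # or abort a stuck task by name before it errors out
    boinccmd --task http://milkyway.cs.rpi.edu/milkyway/ <task_name> abort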

Peter Hucker
Joined: 5 Jul 11
Posts: 103
Credit: 15,265,491
RAC: 61

Message 66361 - Posted: 4 May 2017, 10:36:18 UTC - in response to Message 66358.

I'm getting the same; they stop after 1 or 2 seconds. I don't know how long I've had them, though, as I run 4 projects.

GIPICS
Joined: 24 Apr 17
Posts: 8
Credit: 47,456,536
RAC: 7,498

Message 66362 - Posted: 4 May 2017, 11:29:31 UTC - in response to Message 66358.

I am still getting dozens of de_modfit_fast_19_3s_140_bundle5_ModfitConstraints3 tasks... the communication just gets deferred further and further.



It's the same over here.

Why don't they delete that useless batch of WUs that brings nothing but total mess?

Peter Hucker
Joined: 5 Jul 11
Posts: 103
Credit: 15,265,491
RAC: 61

Message 66363 - Posted: 4 May 2017, 11:35:43 UTC - in response to Message 66362.

Half of my 140 units are working fine. It's no big deal if they fail after 2 seconds; that only wastes 2 seconds of my computer's time.



Copyright © 2017 AstroInformatics Group