Scheduled Server Maintenance Concluded

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 430
Credit: 7,615,331
RAC: 8,186

Message 66327 - Posted: 3 May 2017, 15:51:46 UTC

Hello Everyone,

I just finished up server maintenance. Expect errors from any runs whose names do not begin with one of the following:

de_modfit_fast_Sim19_3s_146_bundle5_ModfitConstraintsWithDisk_1
de_modfit_fast_Sim19_3s_146_bundle5_ModfitConstraintsWithDisk_2
de_modfit_fast_Sim19_3s_146_bundle5_ModfitConstraintsWithDisk_3
de_modfit_fast_Sim19_3s_146_bundle5_ModfitConstraintsWithDisk_4

If you experience errors from these specific runs, please post below so I can help troubleshoot them.

Thanks for your patience and support,
Jake W.

John G
Joined: 1 Apr 10
Posts: 49
Credit: 171,863,025
RAC: 0

Message 66338 - Posted: 3 May 2017, 17:50:55 UTC

You said that there would be a credit update as well. Not seeing that!

Regards

John

GIPICS
Joined: 24 Apr 17
Posts: 8
Credit: 47,456,536
RAC: 7,498

Message 66339 - Posted: 3 May 2017, 18:02:12 UTC

It's a massacre. Hundreds of WUs end with an error after 1 second of crunching on an R9 290.

Everything worked fine until this update. Do we need some special app config?

Regards

GIPICS
Joined: 24 Apr 17
Posts: 8
Credit: 47,456,536
RAC: 7,498

Message 66341 - Posted: 3 May 2017, 18:48:12 UTC

WUs like this work well:

de_modfit_fast_19_3s_140_bundle5_ModfitConstraints3_4_1491420002_7966440_3


The ones you mentioned, like

de_modfit_fast_Sim19_3s_146_bundle5_ModfitConstraintsWithDisk_1_1493824438_35417_1

all end with an error or a BSOD.

Why don't you delete this batch of WUs?

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 430
Credit: 7,615,331
RAC: 8,186

Message 66342 - Posted: 3 May 2017, 19:13:17 UTC

GIPICS,

Can you double-check and make sure you have the names right? The Sim19 runs should be running fine, and the others should be erroring. Nothing should need to be changed on your end unless you are using a custom-built binary. If you are running a custom binary, you will need to rebuild it from master on our GitHub page.

I see you have several hosts attached to your account. Is this a common problem you see across all of your hosts?

Jake
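
(For anyone on a custom binary, the rebuild looks roughly like the steps below. This is only a sketch, assuming the client source lives in the Milkyway-at-home/milkywayathome_client repository on GitHub and uses a standard CMake out-of-source build; treat the repository README as the authoritative instructions.)

    git clone https://github.com/Milkyway-at-home/milkywayathome_client.git
    cd milkywayathome_client
    mkdir build && cd build
    cmake ..     # configure; see the README for project-specific build options
    make         # builds the binaries to drop in place of the old custom build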

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 430
Credit: 7,615,331
RAC: 8,186

Message 66343 - Posted: 3 May 2017, 19:15:41 UTC

John,

Looking into the credit issue now. It might be a couple of hours before you see it updated on your end, as workunits run through the queue.

Jake

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 430
Credit: 7,615,331
RAC: 8,186

Message 66344 - Posted: 3 May 2017, 19:29:03 UTC

Giving the server a quick reboot while I update the server binaries to recognize the new credit calculation. Don't be alarmed.

Jake

GIPICS
Joined: 24 Apr 17
Posts: 8
Credit: 47,456,536
RAC: 7,498

Message 66345 - Posted: 3 May 2017, 19:29:06 UTC

Here I am.

So yes, I had the opposite situation: what shouldn't have worked worked very well, and the WUs with the good labels ended in a massacre.

Anyway, on the R9 290 all the WUs like this

de_modfit_fast_Sim19_3s_146_bundle5_ModfitConstraintsWithDisk_x

ended after one second with a computation error.

I would like to remind you that until today, to keep the 290 crunching, I (and I think most crunchers) needed an app_info.xml.

I stopped BOINC, deleted that app_info, started the BOINC client again and, magic!

All the WUs like

de_modfit_fast_Sim19_3s_146_bundle5_ModfitConstraintsWithDisk_x

work very well.

yeahhhhhhhhhhhhhhhhhh
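
(For context: an app_info.xml switches the BOINC client to its "anonymous platform" mechanism, telling it to run a locally supplied binary instead of whatever the project sends, so a stale file keeps an old, now-incompatible application pinned. A minimal sketch of the file's shape; the binary name and version number here are illustrative, not the exact MilkyWay@home ones:)

    <app_info>
      <app>
        <name>milkyway</name>
      </app>
      <file_info>
        <name>milkyway_separation_1.44_opencl_amd.exe</name>  <!-- hypothetical binary -->
        <executable/>
      </file_info>
      <app_version>
        <app_name>milkyway</app_name>
        <version_num>144</version_num>  <!-- pins this old version -->
        <file_ref>
          <file_name>milkyway_separation_1.44_opencl_amd.exe</file_name>
          <main_program/>
        </file_ref>
      </app_version>
    </app_info>

(Deleting the file, as described above, lets the client fall back to the project-supplied application.)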

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 430
Credit: 7,615,331
RAC: 8,186

Message 66346 - Posted: 3 May 2017, 19:33:36 UTC

GIPICS,

Looks like you were running a custom application, or your old config file was preventing the client application from updating. Glad it all works now.

I just put up updated runs with the corrected credits. A 5-workunit bundle should now give 221 credits, I believe. This should reflect the increased crunching time required.

Jake
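
(For scale, a quick check of that figure:

    221 credits per bundle / 5 workunits = 44.2 credits per workunit

so bundles reporting slightly above 221, as below, are in the expected range.)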

GIPICS
Joined: 24 Apr 17
Posts: 8
Credit: 47,456,536
RAC: 7,498

Message 66348 - Posted: 3 May 2017, 19:46:52 UTC

But this is not all. On the 280X and 280, all the WUs like

de_modfit_fast_19_3s_140_bundle5_ModfitConstraints3_4_1491420002_7966440_3

end with an error.

There is no app_info.xml here.

Is there any way to avoid getting those WUs? You were talking about a way to solve this issue...

aad
Joined: 30 Mar 09
Posts: 51
Credit: 242,155,069
RAC: 431,967

Message 66352 - Posted: 3 May 2017, 20:58:14 UTC - in response to Message 66346.

GIPICS,

I just put up updated runs with the corrected credits. A 5-workunit bundle should now give 221 credits, I believe. This should reflect the increased crunching time required.

Jake


Just had the first 'new credit': 227.23 credits.

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 430
Credit: 7,615,331
RAC: 8,186

Message 66353 - Posted: 3 May 2017, 21:09:56 UTC
Last modified: 3 May 2017, 21:10:03 UTC

GIPICS,

I have already stopped generating more of those workunits; I'm just letting them clear out of the queue.

Jake

Peter Hucker
Joined: 5 Jul 11
Posts: 103
Credit: 15,265,491
RAC: 61

Message 66354 - Posted: 3 May 2017, 21:18:31 UTC

I see the remaining time has been fixed on the 5 bundles. Thanks for that.

iwajabitw
Joined: 16 Nov 14
Posts: 11
Credit: 117,007,966
RAC: 116,049

Message 66355 - Posted: 3 May 2017, 23:59:39 UTC
Last modified: 4 May 2017, 0:00:05 UTC

I don't know if this means anything, but many of my compute errors, like the ones listed in this thread, say Nvidia in the work unit details. This is a dual R9 280X system.

https://milkyway.cs.rpi.edu/milkyway/results.php?hostid=723824&offset=0&show_names=0&state=6&appid=

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 430
Credit: 7,615,331
RAC: 8,186

Message 66356 - Posted: 4 May 2017, 0:21:20 UTC

Hey iwajabitw,

The Nvidia and AMD binaries are actually the same; we use OpenCL for our GPU code, which is cross-platform. Regardless, I see it saying it's trying to run on the AMD card. The errors are simply due to updating the parameter files: runs started before 1.46 use parameter files that are incompatible with the new application. These errors are completely normal and expected. All of the new runs with the new parameter files seem to be running just fine.

Thanks for the update,
Jake
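
(To illustrate the cross-platform point: an OpenCL host program discovers whatever platforms and GPUs exist at run time instead of being compiled for one vendor, which is why a single binary covers both AMD and Nvidia cards. A minimal standalone sketch using standard OpenCL 1.x calls; build with -lOpenCL:)

    #include <stdio.h>
    #include <CL/cl.h>

    int main(void) {
        cl_platform_id platforms[8];
        cl_uint nplat = 0;
        if (clGetPlatformIDs(8, platforms, &nplat) != CL_SUCCESS)
            return 1;

        for (cl_uint i = 0; i < nplat; i++) {
            char pname[256];
            clGetPlatformInfo(platforms[i], CL_PLATFORM_NAME,
                              sizeof pname, pname, NULL);
            /* e.g. "AMD Accelerated Parallel Processing" or "NVIDIA CUDA" */
            printf("Platform: %s\n", pname);

            cl_device_id devs[8];
            cl_uint ndev = 0;
            if (clGetDeviceIDs(platforms[i], CL_DEVICE_TYPE_GPU,
                               8, devs, &ndev) != CL_SUCCESS)
                continue;  /* no GPUs on this platform */

            for (cl_uint j = 0; j < ndev; j++) {
                char dname[256];
                clGetDeviceInfo(devs[j], CL_DEVICE_NAME,
                                sizeof dname, dname, NULL);
                /* same code path whether this is an R9 280X or a GeForce */
                printf("  GPU: %s\n", dname);
            }
        }
        return 0;
    }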

iwajabitw
Joined: 16 Nov 14
Posts: 11
Credit: 117,007,966
RAC: 116,049

Message 66357 - Posted: 4 May 2017, 0:49:05 UTC - in response to Message 66356.

Hey iwajabitw,

The Nvidia and AMD binaries are actually the same; we use OpenCL for our GPU code, which is cross-platform. Regardless, I see it saying it's trying to run on the AMD card. The errors are simply due to updating the parameter files: runs started before 1.46 use parameter files that are incompatible with the new application. These errors are completely normal and expected. All of the new runs with the new parameter files seem to be running just fine.

Thanks for the update,
Jake

Thanks Jake

Vortac
Joined: 22 Apr 09
Posts: 77
Credit: 1,051,203,004
RAC: 99,193

Message 66358 - Posted: 4 May 2017, 6:33:21 UTC

I am still getting dozens of de_modfit_fast_19_3s_140_bundle5_ModfitConstraints3 tasks. They all error out immediately, deferring communication with the server. When there are plenty of such errored tasks, communication gets deferred for hours and the queue runs empty. The only solution is to force a manual update, or to abort such tasks before they error out, but that's possible only on attended machines. Left alone, the communication just gets deferred further and further.
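
(On unattended machines, the forced update can at least be scripted with BOINC's stock command-line tool. A sketch, assuming the project URL matches the one the client is attached under; boinccmd talks to the local client, so this can go in a cron job until the bad batch drains:)

    # clear the backoff by forcing an immediate scheduler contact
    boinccmd --project http://milkyway.cs.rpi.edu/milkyway/ update

    # or abort a stuck task by name before it errors out
    boinccmd --task http://milkyway.cs.rpi.edu/milkyway/ <task_name> abort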

Peter Hucker
Joined: 5 Jul 11
Posts: 103
Credit: 15,265,491
RAC: 61

Message 66361 - Posted: 4 May 2017, 10:36:18 UTC - in response to Message 66358.

I'm getting the same; they stop after 1 or 2 seconds. I don't know how long I've had them, though, as I run 4 projects.

GIPICS
Joined: 24 Apr 17
Posts: 8
Credit: 47,456,536
RAC: 7,498

Message 66362 - Posted: 4 May 2017, 11:29:31 UTC - in response to Message 66358.

I am still getting dozens of de_modfit_fast_19_3s_140_bundle5_ModfitConstraints3 tasks... the communication just gets deferred further and further.



It's the same over here.

Why don't they delete that useless batch of WUs that brings nothing but total mess?

Peter Hucker
Joined: 5 Jul 11
Posts: 103
Credit: 15,265,491
RAC: 61

Message 66363 - Posted: 4 May 2017, 11:35:43 UTC - in response to Message 66362.

Half of my 140 units are working fine. It's no big deal if they fail after 2 seconds; that only wastes 2 seconds of my computer's time.



Copyright © 2017 AstroInformatics Group