Scheduled Maintenance Friday November 11th

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 438
Credit: 9,794,220
RAC: 169,487

Message 65629 - Posted: 9 Nov 2016, 19:42:17 UTC

Hi Everyone,

The server will be down for scheduled maintenance on Friday while we update the server and client to allow for work unit bundling. The code for this update is written and is currently undergoing testing. Bundling should allow for longer work unit run times while maintaining fast crunching for single likelihood values. The result should be lower traffic on the database, which should make the server more stable while providing more work units for everyone.

I apologize for the issues we have been having recently with work unit availability and server stability. Hopefully this will be a permanent solution. If you have any questions please let me know.

Jake

bluestang
Joined: 13 Oct 16
Posts: 36
Credit: 67,046,505
RAC: 131

Message 65630 - Posted: 9 Nov 2016, 20:15:20 UTC - in response to Message 65629.

Sounds good. Improvements appreciated.

Will this affect the way points are allocated?

Rymorea
Joined: 6 Oct 14
Posts: 45
Credit: 10,019,397
RAC: 963

Message 65631 - Posted: 9 Nov 2016, 20:23:04 UTC

I hope this will finally be solved. As you said, "Hopefully this will be a permanent solution".

From what you have written, I understand that you will merge several tasks into one and send it to us. What about credit calculations for these big tasks?

TimeRanger
Joined: 31 Oct 10
Posts: 74
Credit: 22,930,259
RAC: 28,478

Message 65632 - Posted: 9 Nov 2016, 20:48:33 UTC - in response to Message 65631.

Jake, any idea on how long the maintenance will take? Maybe you can increase the number of tasks we can have in our stash to keep us busy during the down time?

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 438
Credit: 9,794,220
RAC: 169,487

Message 65633 - Posted: 9 Nov 2016, 22:16:23 UTC

Hey everyone,

This will not affect the way points are allocated by the project. I am using a simple formula for calculating credits: (Number of credits per work unit) * (Number of work units crunched in a bundle) = (awarded credit). That seems like a pretty reasonable way of doing things to me, and it might actually improve your credit generation since the overhead should be reduced slightly.
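As a rough illustration of that formula, here is a minimal Python sketch. It is not the project's server code; the function name and the sample credit value are made up for the example.

```python
def bundle_credit(credit_per_work_unit: float, work_units_in_bundle: int) -> float:
    """Credit awarded for one bundle: per-work-unit credit times bundle size."""
    if work_units_in_bundle < 0:
        raise ValueError("a bundle cannot contain a negative number of work units")
    return credit_per_work_unit * work_units_in_bundle


# Hypothetical per-work-unit credit; the thread does not give real values.
print(bundle_credit(100.0, 10))  # 1000.0
```

Under this rule a bundle earns exactly what its work units would have earned separately, so only the overhead changes.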

As for increasing work unit stashes in the meantime while the server is down, this is not possible. The whole reason for the maintenance and server/client update is to improve our throughput on work units while decreasing load on the server/database. If I could give you a stash of work units before the server went down, I wouldn't need to do the maintenance. My goal is for the maintenance to be quick (<1 hour); however, as we all know, big updates never go as planned. I am allocating myself the entire day (8 hours) to debug any issues that might arise due to this update, and of course I will check in over the weekend as well.

Thank you all for your continued support.

Jake

bluestang
Joined: 13 Oct 16
Posts: 36
Credit: 67,046,505
RAC: 131

Message 65634 - Posted: 10 Nov 2016, 0:20:25 UTC - in response to Message 65633.

I like it! Good luck and have a cold one ready for when you finish ;)

TimeRanger
Joined: 31 Oct 10
Posts: 74
Credit: 22,930,259
RAC: 28,478

Message 65635 - Posted: 10 Nov 2016, 4:07:55 UTC - in response to Message 65634.

bluestang wrote: "I like it! Good luck and have a cold one ready for when you finish ;)"


If everything goes as planned (+/- 1 hour), I hope he has a LOT of cold ones ready for the balance of the allotted time! :)

Arivald Ha'gel
Joined: 30 Apr 14
Posts: 67
Credit: 160,074,149
RAC: 0

Message 65636 - Posted: 10 Nov 2016, 8:00:18 UTC

Good Luck!

How many WUs will be bundled in a single task? (I hope for at least 10) :)
I hope that my R280X will finally have enough work to do... it's getting cold in my place :)

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 438
Credit: 9,794,220
RAC: 169,487

Message 65639 - Posted: 10 Nov 2016, 14:56:03 UTC

Hey Arivald,

The plan for the beginning is to start conservatively with 10 work units per bundle. Depending on how the CPU-only crunchers handle this (about 5-10 hours of work per bundle), we can then decide to increase it. The current code is written to allow for an arbitrary number of work units to be bundled together, so I can change it rather easily.

It is also possible I may put up different runs using different numbers of bundled work units, so one bundle may have 10 and another may have 100. There are a lot of cool things I can do once I get this running.
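To picture what "an arbitrary number of work units per bundle" could look like, here is a small Python sketch that groups a list of work units into bundles of a configurable size. This is only an assumption about the general idea; the function name and the `wu_N` labels are hypothetical and do not come from the project's code.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def make_bundles(work_units: Sequence[T], bundle_size: int) -> List[List[T]]:
    """Split work units into consecutive bundles of at most bundle_size items."""
    if bundle_size < 1:
        raise ValueError("bundle_size must be at least 1")
    return [
        list(work_units[start:start + bundle_size])
        for start in range(0, len(work_units), bundle_size)
    ]


# Hypothetical work unit labels, used only for illustration.
work_units = [f"wu_{i}" for i in range(25)]
bundles = make_bundles(work_units, 10)
print([len(bundle) for bundle in bundles])  # [10, 10, 5]
```

Changing the bundle size from 10 to 100 is then a one-parameter change, which matches the "rather easily" above.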

Jake

Nick Name
Joined: 27 Jul 14
Posts: 8
Credit: 22,896,017
RAC: 53,503

Message 65640 - Posted: 10 Nov 2016, 16:11:55 UTC - in response to Message 65639.

Jake,

Thanks for your efforts.

Jake Weiss wrote: "The plan for the beginning is to start conservatively with 10 work units per bundle."


Does this mean we will be crunching 10 work units together (or whatever is in the bundle) at once? Right now I use an app_config to run six at a time; I intend to remove that before the new app is released, but I would not like to try running 60 at a time.

mmonnin
Joined: 2 Oct 16
Posts: 63
Credit: 57,153,101
RAC: 3

Message 65642 - Posted: 10 Nov 2016, 18:40:23 UTC - in response to Message 65640.

My GPU spends nearly half the day doing other work besides MW due to lack of units so 1 hour is no biggy.

Nick Name wrote:
"Jake,

Thanks for your efforts.

(Quoting Jake Weiss: The plan for the beginning is to start conservatively with 10 work units per bundle.)

Does this mean we will be crunching 10 work units together (or whatever is in the bundle) at once? Right now I use an app_config to run six at a time; I intend to remove that before the new app is released, but I would not like to try running 60 at a time."


Haha I think you'll still do 6, but the 6 will take 10x as long. So you probably won't need to have it set to 6 to keep a GPU busy.

[VENETO] boboviz
Joined: 10 Feb 09
Posts: 29
Credit: 4,417,135
RAC: 27,456

Message 65644 - Posted: 10 Nov 2016, 19:21:17 UTC - in response to Message 65639.

Jake Weiss wrote: "The plan for the beginning is to start conservatively with 10 work units per bundle. Depending on how the CPU-only crunchers handle this (about 5-10 hours of work per bundle), we can then decide to increase it. The current code is written to allow for an arbitrary number of work units to be bundled together, so I can change it rather easily."


How much longer will the GPU WUs be? 2x? 5x?

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 438
Credit: 9,794,220
RAC: 169,487

Message 65645 - Posted: 10 Nov 2016, 19:29:53 UTC

Hey Everyone,

Just to clear this up: essentially, the client will run X work units in series (not in parallel) from a single WU request (what I now refer to as a bundle). This means that if you need to run more than 1 current WU at a time to fully utilize your GPU, you will still need to do this after the update. However, each of these runs should take X times longer to complete.

At the beginning, I will set X to 10, so expect a 10 times longer run time for a single WU bundle. After I see how that seems to be affecting things, I may increase that number to 50 or even 100.
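The run-time arithmetic is simple enough to sketch in Python. The 60-second figure below is a made-up example of how long one current work unit might take on some GPU; it is not a measurement from the project, and the function name is hypothetical.

```python
def bundle_runtime_seconds(seconds_per_work_unit: float, bundle_size: int) -> float:
    """Estimated run time of one bundle when its work units run in series."""
    if bundle_size < 1:
        raise ValueError("bundle_size must be at least 1")
    return seconds_per_work_unit * bundle_size


# Hypothetical timing: assume one current work unit takes 60 seconds.
for x in (1, 10, 50, 100):
    print(f"X = {x:3d}: about {bundle_runtime_seconds(60.0, x):.0f} seconds per bundle")
```

Because the work units inside a bundle run one after another, the number of bundles run at the same time stays whatever it was for single work units.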

A quick update on how testing is going: I currently have all of the server-side code running on a test server I set up. There don't seem to be any obvious errors. I am still trying to get it to serve test work units to a test computer. Hopefully that will be working by the end of the day, so I can make sure the validator is working as expected.

Looks like we are on schedule to release early tomorrow morning.

Jake

mmonnin
Joined: 2 Oct 16
Posts: 63
Credit: 57,153,101
RAC: 3

Message 65647 - Posted: 10 Nov 2016, 21:17:40 UTC

So there will still be the data crunch on the CPU in the middle of each bundle? Will the part that was at the end now be spread out across the bundle, or will it all occur at the end for all 10 tasks? This is really why I run multiple tasks at once, as there is some time where the GPU is idle between tasks.

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 438
Credit: 9,794,220
RAC: 169,487

Message 65648 - Posted: 11 Nov 2016, 14:11:38 UTC

Hey Everyone,

Just want to update everyone on the plan for today.

I am fixing one small issue with the progress bars in the client (basically, the progress bar would reset to 0% complete at every work unit in a bundle). After that, I have to recompile all of the clients. Then, I have to move all of the code from the test server to the production server. I am going to do a recompile on the production server for the server binaries, run an update, and do a restart. After that, I will monitor everything for any issues that might arise.
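One common way to avoid that kind of reset is to report progress over the whole bundle rather than over the current work unit. The Python sketch below shows the idea only; the thread does not show the actual client fix, and the function and parameter names are hypothetical.

```python
def bundle_progress(completed_units: int, current_fraction: float, bundle_size: int) -> float:
    """Overall progress in [0, 1] for a bundle whose work units run in series.

    completed_units  -- work units in the bundle that have already finished
    current_fraction -- progress of the work unit now running, from 0.0 to 1.0
    bundle_size      -- total number of work units in the bundle
    """
    if bundle_size < 1:
        raise ValueError("bundle_size must be at least 1")
    if not 0 <= completed_units <= bundle_size:
        raise ValueError("completed_units must be between 0 and bundle_size")
    if not 0.0 <= current_fraction <= 1.0:
        raise ValueError("current_fraction must be between 0.0 and 1.0")
    return min(1.0, (completed_units + current_fraction) / bundle_size)


# Third work unit of a 5-unit bundle, half done: 2.5 of 5 units are complete.
print(bundle_progress(2, 0.5, 5))  # 0.5
```

With this approach the bar moves from 0% to 100% once per bundle instead of once per work unit.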

I have also run into some technical issues when trying to bundle 10 WUs. I am going to limit to 5 (which I know works through testing), until I can figure out another way to pass parameters to the client.

Jake

Rymorea
Joined: 6 Oct 14
Posts: 45
Credit: 10,019,397
RAC: 963

Message 65649 - Posted: 11 Nov 2016, 14:52:11 UTC

Waiting for good news and bundled tasks :)

Vortac
Joined: 22 Apr 09
Posts: 77
Credit: 1,052,831,093
RAC: 49,694

Message 65655 - Posted: 11 Nov 2016, 19:31:30 UTC

Received first 1.40 workunits, both CPU and GPU. Unfortunately, all ended with computation errors within a few seconds.

bluestang
Joined: 13 Oct 16
Posts: 36
Credit: 67,046,505
RAC: 131

Message 65658 - Posted: 11 Nov 2016, 19:36:07 UTC - in response to Message 65655.
Last modified: 11 Nov 2016, 19:40:30 UTC

Vortac wrote: "Received first 1.40 workunits, both CPU and GPU. Unfortunately, all ended with computation errors within a few seconds."


Yep, same deal here (GPU only).

EDIT: Task Details...

<core_client_version>7.6.22</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -1073741515 (0xc0000135)
</message>
]]>
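
A note on reading that exit code: the negative decimal number and the hex value are the same 32-bit quantity, shown signed and unsigned. The Python sketch below does the conversion. 0xC0000135 is the Windows NTSTATUS value STATUS_DLL_NOT_FOUND, which would point to a missing DLL, but nothing on this page confirms the actual cause. The helper name and the one-entry lookup table are just for illustration.

```python
def to_unsigned32(code: int) -> int:
    """Reinterpret a signed 32-bit exit code as an unsigned 32-bit value."""
    return code & 0xFFFFFFFF


# Tiny lookup table for this example only; Windows defines many more NTSTATUS values.
KNOWN_NTSTATUS = {
    0xC0000135: "STATUS_DLL_NOT_FOUND",
}

exit_code = -1073741515
unsigned = to_unsigned32(exit_code)
print(hex(unsigned))  # 0xc0000135
print(KNOWN_NTSTATUS.get(unsigned, "unknown status"))  # STATUS_DLL_NOT_FOUND
```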

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 438
Credit: 9,794,220
RAC: 169,487

Message 65660 - Posted: 11 Nov 2016, 19:40:38 UTC

Vortac,

Can you please double check that it's both CPU and GPU giving you errors and not just GPU?

Jake

Vortac
Joined: 22 Apr 09
Posts: 77
Credit: 1,052,831,093
RAC: 49,694

Message 65663 - Posted: 11 Nov 2016, 19:47:43 UTC - in response to Message 65660.
Last modified: 11 Nov 2016, 19:49:51 UTC

I do confirm it - both CPU and GPU tasks errored out quickly.
I also got the following entries in stderr output:
(unknown error) - exit code -1073741515 (0xc0000135)

And here are my error tasks:
http://milkyway.cs.rpi.edu/milkyway/results.php?userid=36817&offset=0&show_names=0&state=6&appid=



Copyright © 2017 AstroInformatics Group