Welcome to MilkyWay@home

Scheduled Maintenance Friday November 11th

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 580
Credit: 94,200,158
RAC: 0
Message 65629 - Posted: 9 Nov 2016, 19:42:17 UTC

Hi Everyone,

The server will be down for scheduled maintenance on Friday while we update the server and client to allow for work unit bundling. The code for this update is written and currently undergoing testing. This should allow for longer work unit times while maintaining fast crunching for single likelihood values. The result should be less traffic on the database, which should make the server more stable while making more work units available for everyone.

I apologize for the issues we have been having recently with work unit availability and server stability. Hopefully this will be a permanent solution. If you have any questions please let me know.

Jake
ID: 65629 · Rating: 0
bluestang
Joined: 13 Oct 16
Posts: 112
Credit: 1,174,293,644
RAC: 0
Message 65630 - Posted: 9 Nov 2016, 20:15:20 UTC - in response to Message 65629.  

Sounds good. Improvements appreciated.

Will this affect the way points are allocated?
ID: 65630 · Rating: 0
Rymorea
Joined: 6 Oct 14
Posts: 46
Credit: 20,017,425
RAC: 0
Message 65631 - Posted: 9 Nov 2016, 20:23:04 UTC

I hope this will finally be solved. As you said, "Hopefully this will be a permanent solution".

From what you have written, I understand that you will merge several tasks into one and send it to us. What about credit calculations for these big tasks?
ID: 65631 · Rating: 0
TimeRanger
Joined: 31 Oct 10
Posts: 83
Credit: 38,632,375
RAC: 0
Message 65632 - Posted: 9 Nov 2016, 20:48:33 UTC - in response to Message 65631.  

Jake, any idea on how long the maintenance will take? Maybe you can increase the number of tasks we can have in our stash to keep us busy during the down time?
ID: 65632 · Rating: 0
Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 580
Credit: 94,200,158
RAC: 0
Message 65633 - Posted: 9 Nov 2016, 22:16:23 UTC

Hey everyone,

This will not affect the way points are allocated by the project. I am using a simple formula for calculating credits: (Number of credits per work unit) * (Number of work units crunched in a bundle) = (awarded credit). That seems like a pretty reasonable way of doing things to me, and it might actually improve your credit generation, since the overhead should be reduced slightly.
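The credit formula above can be written out as a short Python sketch. This is only an illustration of the arithmetic, not the project's actual server code; the function name and the example numbers are hypothetical.

```python
def bundle_credit(credit_per_work_unit: float, work_units_in_bundle: int) -> float:
    """Awarded credit = credits per work unit * work units crunched in the bundle."""
    if credit_per_work_unit < 0:
        raise ValueError("credit_per_work_unit must not be negative")
    if work_units_in_bundle < 0:
        raise ValueError("work_units_in_bundle must not be negative")
    return credit_per_work_unit * work_units_in_bundle


# Hypothetical numbers: 20 credits per work unit, 10 work units per bundle.
example_credit = bundle_credit(20.0, 10)
print(example_credit)
```

Under this formula the credit per crunched work unit stays the same whatever the bundle size is.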

As for increasing work unit stashes in the meantime while the server is down, this is not possible. The whole reason for the maintenance and server/client update is to improve our throughput on work units while decreasing load on the server/database. If I could give you a stash of work units before the server went down, I wouldn't need to do the maintenance. My goal is for the maintenance to be quick (<1 hour); however, as we all know, big updates never go as planned. I am allocating myself the entire day (8 hours) to debug any issues that might arise due to this update, and of course I will check in over the weekend as well.

Thank you all for your continued support.

Jake
ID: 65633 · Rating: 0
bluestang
Joined: 13 Oct 16
Posts: 112
Credit: 1,174,293,644
RAC: 0
Message 65634 - Posted: 10 Nov 2016, 0:20:25 UTC - in response to Message 65633.  

I like it! Good luck and have a cold one ready for when you finish ;)
ID: 65634 · Rating: 0
TimeRanger
Joined: 31 Oct 10
Posts: 83
Credit: 38,632,375
RAC: 0
Message 65635 - Posted: 10 Nov 2016, 4:07:55 UTC - in response to Message 65634.  

I like it! Good luck and have a cold one ready for when you finish ;)


If everything goes as planned (about 1 hour), I hope he has a LOT of cold ones ready for the balance of the allotted time! :)
ID: 65635 · Rating: 0
Arivald Ha'gel
Joined: 30 Apr 14
Posts: 67
Credit: 160,674,488
RAC: 0
Message 65636 - Posted: 10 Nov 2016, 8:00:18 UTC

Good Luck!

How many WU will be bundled in a single task? (I hope for at least 10) :)
I hope that my R280X will finally have enough work to do... it's getting cold in my place :)
ID: 65636 · Rating: 0
Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 580
Credit: 94,200,158
RAC: 0
Message 65639 - Posted: 10 Nov 2016, 14:56:03 UTC

Hey Arivald,

The plan for the beginning is to start conservatively with 10 work units per bundle. Depending on how the CPU only crunchers handle this (about 5-10 hours of work per bundle) we can then decide to increase it. The current code is written to allow for an arbitrary number of work units to be bundled together so I can change it rather easily.
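As a rough illustration of bundling with an arbitrary bundle size, here is a minimal Python sketch. It is not the project's server code; `make_bundles` and the integer work unit IDs are hypothetical stand-ins.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def make_bundles(work_units: Sequence[T], bundle_size: int) -> List[List[T]]:
    """Group work units into consecutive bundles of at most bundle_size items."""
    if bundle_size < 1:
        raise ValueError("bundle_size must be at least 1")
    return [
        list(work_units[start:start + bundle_size])
        for start in range(0, len(work_units), bundle_size)
    ]


# 25 hypothetical work unit IDs, bundled 10 at a time; the last bundle is short.
bundles = make_bundles(list(range(25)), 10)
print([len(bundle) for bundle in bundles])
```

Because the bundle size is just a parameter, moving from 10 to 100 would only change the argument in a sketch like this.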

It is also possible I may put up different runs using different numbers of bundled work units so one bundle may have 10 and the other may have 100. There are a lot of cool things I can do once I get this running.

Jake
ID: 65639 · Rating: 0
Nick Name
Joined: 27 Jul 14
Posts: 23
Credit: 921,261,826
RAC: 0
Message 65640 - Posted: 10 Nov 2016, 16:11:55 UTC - in response to Message 65639.  

Jake,

Thanks for your efforts.

The plan for the beginning is to start conservatively with 10 work units per bundle.


Does this mean we will be crunching 10 work units together (or whatever is in the bundle) at once? Right now I use an app_config to run six at a time. I intend to remove that before the new app is released, but I would not like to try running 60 at a time.
Team USA forum | Team USA page
Always crunching / Always recruiting
ID: 65640 · Rating: 0
mmonnin
Joined: 2 Oct 16
Posts: 162
Credit: 1,004,163,109
RAC: 887
Message 65642 - Posted: 10 Nov 2016, 18:40:23 UTC - in response to Message 65640.  

My GPU spends nearly half the day doing other work besides MW due to lack of units so 1 hour is no biggy.

Jake,

Thanks for your efforts.

The plan for the beginning is to start conservatively with 10 work units per bundle.


Does this mean we will be crunching 10 work units together (or whatever is in the bundle) at once? Right now I use an app_config to run six at a time. I intend to remove that before the new app is released, but I would not like to try running 60 at a time.


Haha I think you'll still do 6, but the 6 will take 10x as long. So you probably won't need to have it set to 6 to keep a GPU busy.
ID: 65642 · Rating: 0
[VENETO] boboviz
Joined: 10 Feb 09
Posts: 52
Credit: 16,286,597
RAC: 0
Message 65644 - Posted: 10 Nov 2016, 19:21:17 UTC - in response to Message 65639.  

The plan for the beginning is to start conservatively with 10 work units per bundle. Depending on how the CPU only crunchers handle this (about 5-10 hours of work per bundle) we can then decide to increase it. The current code is written to allow for an arbitrary number of work units to be bundled together so I can change it rather easily.


How much longer will the GPU WUs be? 2x? 5x?
ID: 65644 · Rating: 0
Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 580
Credit: 94,200,158
RAC: 0
Message 65645 - Posted: 10 Nov 2016, 19:29:53 UTC

Hey Everyone,

Just to clear this up, essentially what will happen here is it will run X number of work units in series (not parallel) from a single WU request (what I now refer to as a bundle). This means if you need to run more than 1 current WU to fully utilize your GPU, you will still need to do this after the update. However, it should take X times longer to complete these runs.

At the beginning, I will set X to 10, so expect a 10 times longer run time for a single WU bundle. After I see how that seems to be affecting things, I may increase that number to 50 or even 100.
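To make the scaling concrete, here is a small Python sketch of the estimate described above. It assumes every work unit in a bundle takes about the same time and ignores overhead between work units; the function name and the 60-second figure are hypothetical.

```python
def estimated_bundle_runtime(single_wu_seconds: float, bundle_size: int) -> float:
    """Work units in a bundle run in series, so run time grows linearly with X."""
    if single_wu_seconds < 0:
        raise ValueError("single_wu_seconds must not be negative")
    if bundle_size < 1:
        raise ValueError("bundle_size must be at least 1")
    return single_wu_seconds * bundle_size


# Hypothetical: a work unit that takes 60 seconds today.
for x in (10, 50, 100):
    print(x, estimated_bundle_runtime(60.0, x))
```

The number of bundles run at the same time is a separate setting and is not changed by X in this estimate.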

Just a quick update on how testing is going: I currently have all of the server-side stuff running on a test server I set up. It seems like there aren't any obvious errors. I am still trying to get it to serve test work units to a test computer. Hopefully, that will be working by the end of the day so I can make sure the validator is working as expected.

Looks like we are on schedule to release early tomorrow morning.

Jake
ID: 65645 · Rating: 0
mmonnin
Joined: 2 Oct 16
Posts: 162
Credit: 1,004,163,109
RAC: 887
Message 65647 - Posted: 10 Nov 2016, 21:17:40 UTC

So there will still be the data crunch on the CPU in the middle of each bundle? The part that was at the end will now be spread out across the unit or will that all occur at the end for all 10 tasks? This is really why I run multiple tasks at once as there is some time where the GPU is idle between tasks.
ID: 65647 · Rating: 0
Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 580
Credit: 94,200,158
RAC: 0
Message 65648 - Posted: 11 Nov 2016, 14:11:38 UTC

Hey Everyone,

Just want to update everyone on the plan for today.

I am fixing one small issue with the progress bars in the client (basically, the progress bar would reset to 0% complete for every work unit in a bundle). After that, I have to recompile all of the clients. Then, I have to move all of the code from the test server to the production server. I am going to do a recompile on the production server for the server binaries, run an update, and do a restart. After that, I will monitor everything for any issues that might arise.
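One common way to keep a progress bar from resetting between work units is to report the fraction of the whole bundle instead of the fraction of the current work unit. The Python sketch below shows that idea only; it is an assumption about the general approach, not the actual client fix, and the function name is hypothetical.

```python
def bundle_progress(completed_work_units: int, current_fraction: float, bundle_size: int) -> float:
    """Overall fraction done for a bundle whose work units run one after another.

    completed_work_units: work units in the bundle that are already finished
    current_fraction: progress of the work unit running now, from 0.0 to 1.0
    """
    if bundle_size < 1:
        raise ValueError("bundle_size must be at least 1")
    if not 0 <= completed_work_units <= bundle_size:
        raise ValueError("completed_work_units must be between 0 and bundle_size")
    if not 0.0 <= current_fraction <= 1.0:
        raise ValueError("current_fraction must be between 0.0 and 1.0")
    # Never report more than 100%, even if the last work unit is already counted.
    return min(1.0, (completed_work_units + current_fraction) / bundle_size)


# Halfway through the third work unit of a 5-unit bundle.
print(bundle_progress(2, 0.5, 5))
```

With this kind of calculation, starting the next work unit moves the bar forward rather than back to 0%.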

I have also run into some technical issues when trying to bundle 10 WUs. I am going to limit to 5 (which I know works through testing), until I can figure out another way to pass parameters to the client.

Jake
ID: 65648 · Rating: 0
Rymorea
Joined: 6 Oct 14
Posts: 46
Credit: 20,017,425
RAC: 0
Message 65649 - Posted: 11 Nov 2016, 14:52:11 UTC

Waiting for good news and bundled tasks :)
ID: 65649 · Rating: 0
Vortac
Joined: 22 Apr 09
Posts: 95
Credit: 4,808,181,963
RAC: 0
Message 65655 - Posted: 11 Nov 2016, 19:31:30 UTC

Received first 1.40 workunits, both CPU and GPU. Unfortunately, all ended with computation errors within a few seconds.
ID: 65655 · Rating: 0
bluestang
Joined: 13 Oct 16
Posts: 112
Credit: 1,174,293,644
RAC: 0
Message 65658 - Posted: 11 Nov 2016, 19:36:07 UTC - in response to Message 65655.  
Last modified: 11 Nov 2016, 19:40:30 UTC

Received first 1.40 workunits, both CPU and GPU. Unfortunately, all ended with computation errors within a few seconds.


Yep, same deal here (GPU only).

EDIT: Task Details...

<core_client_version>7.6.22</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -1073741515 (0xc0000135)
</message>
]]>
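For anyone reading the exit code above: the negative number and the hex value are the same 32-bit value, and 0xC0000135 is the Windows status code STATUS_DLL_NOT_FOUND. The Python sketch below shows the conversion. The helper names and the one-entry lookup table are illustrative only, and the thread so far does not confirm which file, if any, was missing.

```python
# Minimal lookup table; only the code seen in this thread is listed.
KNOWN_NTSTATUS = {
    0xC0000135: "STATUS_DLL_NOT_FOUND",
}


def to_unsigned32(exit_code: int) -> int:
    """Reinterpret a signed 32-bit exit code as an unsigned 32-bit value."""
    return exit_code & 0xFFFFFFFF


def describe_exit_code(exit_code: int) -> str:
    """Format an exit code as hex, with a name when it is in the lookup table."""
    status = to_unsigned32(exit_code)
    name = KNOWN_NTSTATUS.get(status, "unknown")
    return f"0x{status:08X} ({name})"


print(describe_exit_code(-1073741515))
```

A missing DLL would be consistent with tasks failing within a few seconds, before any real computation starts.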
ID: 65658 · Rating: 0
Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 580
Credit: 94,200,158
RAC: 0
Message 65660 - Posted: 11 Nov 2016, 19:40:38 UTC

Vortac,

Can you please double check that it's both CPU and GPU giving you errors and not just GPU?

Jake
ID: 65660 · Rating: 0
Vortac
Joined: 22 Apr 09
Posts: 95
Credit: 4,808,181,963
RAC: 0
Message 65663 - Posted: 11 Nov 2016, 19:47:43 UTC - in response to Message 65660.  
Last modified: 11 Nov 2016, 19:49:51 UTC

I do confirm it - both CPU and GPU tasks errored out quickly.
I also got the following entries in stderr output:
(unknown error) - exit code -1073741515 (0xc0000135)

And here are my error tasks:
http://milkyway.cs.rpi.edu/milkyway/results.php?userid=36817&offset=0&show_names=0&state=6&appid=
ID: 65663 · Rating: 0

©2024 Astroinformatics Group