Scheduled Maintenance Friday November 11th
Author | Message |
---|---|
Send message Joined: 25 Feb 13 Posts: 580 Credit: 94,200,158 RAC: 0 |
Hi Everyone, The server will be down for scheduled maintenance on Friday while we update the server and client to allow for work unit bundling. The code for this update is written and currently undergoing testing. This should allow for longer work unit times while maintaining fast crunching for single likelihood values. The result should be lower traffic on the database providing for more stability on the server while providing more work units for everyone. I apologize for the issues we have been having recently with work unit availability and server stability. Hopefully this will be a permanent solution. If you have any questions please let me know. Jake |
Send message Joined: 13 Oct 16 Posts: 112 Credit: 1,174,293,644 RAC: 0 |
Sounds good. Improvements appreciated. Will this affect the way points are allocated? |
Send message Joined: 6 Oct 14 Posts: 46 Credit: 20,017,425 RAC: 0 |
|
Send message Joined: 31 Oct 10 Posts: 83 Credit: 38,632,375 RAC: 0 |
Jake, any idea on how long the maintenance will take? Maybe you can increase the number of tasks we can have in our stash to keep us busy during the down time? |
Send message Joined: 25 Feb 13 Posts: 580 Credit: 94,200,158 RAC: 0 |
Hey everyone, This will not affect the way points are allocated by the project. I am using a simple formula for calculating credits: (Number of credits per work unit) * (Number of work units crunched in a bundle) = (awarded credit). Seems like a pretty reasonable way of doing things to me, and it might actually improve your credit generation since the overhead should be reduced slightly. As for increasing work unit stashes in the meantime while the server is down, this is not possible. The whole reason for the maintenance and server/client update is to improve our throughput on work units while decreasing load on the server/database. If I could give you a stash of work units before the server went down, I wouldn't need to do the maintenance. My goal is for the maintenance to be quick (<1 hour); however, as we all know, big updates never go as planned. I am allocating myself the entire day (8 hours) to debug any issues that might arise due to this update, and of course I will check in over the weekend as well. Thank you all for your continued support. Jake |
Send message Joined: 13 Oct 16 Posts: 112 Credit: 1,174,293,644 RAC: 0 |
I like it! Good luck and have a cold one ready for when you finish ;) |
Send message Joined: 31 Oct 10 Posts: 83 Credit: 38,632,375 RAC: 0 |
I like it! Good luck and have a cold one ready for when you finish ;) If everything goes as planned - +-1 hour, I hope he has a LOT of cold ones ready for the balance of the allotted time! :) |
Send message Joined: 30 Apr 14 Posts: 67 Credit: 160,674,488 RAC: 0 |
Good Luck! How many WU will be bundled in a single task? (I hope for at least 10) :) I hope that my R280X will finally have enough work to do... it's getting cold in my place :) |
Send message Joined: 25 Feb 13 Posts: 580 Credit: 94,200,158 RAC: 0 |
Hey Arivald, The plan for the beginning is to start conservatively with 10 work units per bundle. Depending on how the CPU only crunchers handle this (about 5-10 hours of work per bundle) we can then decide to increase it. The current code is written to allow for an arbitrary number of work units to be bundled together so I can change it rather easily. It is also possible I may put up different runs using different numbers of bundled work units so one bundle may have 10 and the other may have 100. There are a lot of cool things I can do once I get this running. Jake |
Send message Joined: 27 Jul 14 Posts: 23 Credit: 921,261,826 RAC: 0 |
Jake, Thanks for your efforts. The plan for the beginning is to start conservatively with 10 work units per bundle. Does this mean we will be crunching 10 work units together (or whatever is in the bundle) at once? Right now I use an app_config to run six at a time. I intend to remove that before the new app is released, but I would not like to try running 60 at a time. Team USA forum | Team USA page Always crunching / Always recruiting |
Send message Joined: 2 Oct 16 Posts: 167 Credit: 1,006,078,878 RAC: 46,225 |
My GPU spends nearly half the day doing other work besides MW due to lack of units so 1 hour is no biggy. Jake, Haha I think you'll still do 6, but the 6 will take 10x as long. So you probably won't need to have it set to 6 to keep a GPU busy. |
Send message Joined: 10 Feb 09 Posts: 52 Credit: 16,287,560 RAC: 86 |
The plan for the beginning is to start conservatively with 10 work units per bundle. Depending on how the CPU only crunchers handle this (about 5-10 hours of work per bundle) we can then decide to increase it. The current code is written to allow for an arbitrary number of work units to be bundled together so I can change it rather easily. How much longer will the GPU WUs be? 2x? 5x? |
Send message Joined: 25 Feb 13 Posts: 580 Credit: 94,200,158 RAC: 0 |
Hey Everyone, Just to clear this up, essentially what will happen here is it will run X number of work units in series (not parallel) from a single WU request (what I now refer to as a bundle). This means if you need to run more than 1 current WU to fully utilize your GPU, you will still need to do this after the update. However, it should take X times longer to complete these runs. At the beginning, I will set X to 10, so expect a 10 times longer run time for a single WU bundle. After I see how that seems to be affecting things, I may increase that number to 50 or even 100. Just a quick update on how things are going with testing: I currently have all of the server side stuff running on a test server I set up. Seems like there aren't any obvious errors. Still trying to get it to serve test work units to a test computer. Hopefully, that will be working by the end of the day just to make sure the validator is working as expected. Looks like we are on schedule to release early tomorrow morning. Jake |
Send message Joined: 2 Oct 16 Posts: 167 Credit: 1,006,078,878 RAC: 46,225 |
So there will still be the data crunch on the CPU in the middle of each bundle? The part that was at the end will now be spread out across the unit or will that all occur at the end for all 10 tasks? This is really why I run multiple tasks at once as there is some time where the GPU is idle between tasks. |
Send message Joined: 25 Feb 13 Posts: 580 Credit: 94,200,158 RAC: 0 |
Hey Everyone, Just want to update everyone on the plan for today. I am fixing one small issue with the progress bars in the client (basically, the progress bar would reset back to 0% complete for every work unit in a bundle). After that, I have to recompile all of the clients. Then, I have to move all of the code from the test server to the production server. I am going to do a recompile on the production server for the server binaries, run an update, and do a restart. After that, I will monitor everything for any issues that might arise. I have also run into some technical issues when trying to bundle 10 WUs. I am going to limit to 5 (which I know works through testing), until I can figure out another way to pass parameters to the client. Jake |
Send message Joined: 6 Oct 14 Posts: 46 Credit: 20,017,425 RAC: 0 |
Waiting for good news and bundled tasks :) |
Send message Joined: 22 Apr 09 Posts: 95 Credit: 4,808,181,963 RAC: 0 |
Received first 1.40 workunits, both CPU and GPU. Unfortunately, all ended with computation errors within a few seconds. |
Send message Joined: 13 Oct 16 Posts: 112 Credit: 1,174,293,644 RAC: 0 |
Received first 1.40 workunits, both CPU and GPU. Unfortunately, all ended with computation errors within a few seconds. Yep, same deal here (GPU only). EDIT: Task Details... <core_client_version>7.6.22</core_client_version> <![CDATA[ <message> (unknown error) - exit code -1073741515 (0xc0000135) </message> ]]> |
Send message Joined: 25 Feb 13 Posts: 580 Credit: 94,200,158 RAC: 0 |
Vortac, Can you please double check that it's both CPU and GPU giving you errors and not just GPU? Jake |
Send message Joined: 22 Apr 09 Posts: 95 Credit: 4,808,181,963 RAC: 0 |
I do confirm it - both CPU and GPU tasks errored out quickly. I also got the following entries in stderr output: (unknown error) - exit code -1073741515 (0xc0000135) And here are my error tasks: http://milkyway.cs.rpi.edu/milkyway/results.php?userid=36817&offset=0&show_names=0&state=6&appid= |
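For readers following the numbers in this thread, here is a minimal Python sketch of the two pieces of arithmetic discussed above: the bundle credit and run-time scaling Jake describes, and the conversion between the signed exit code and the hexadecimal form shown in the error output. The function names and the example credit and run-time values are hypothetical and chosen only for illustration; this is not the project's actual server or client code.

```python
def bundle_credit(credit_per_work_unit: float, work_units_in_bundle: int) -> float:
    """Credit formula quoted in the thread:
    (credits per work unit) * (work units crunched in a bundle)."""
    return credit_per_work_unit * work_units_in_bundle


def bundle_runtime(single_runtime_seconds: float, work_units_in_bundle: int) -> float:
    """Work units in a bundle run in series, so the run time scales linearly
    with the bundle size (per-unit overhead is ignored in this sketch)."""
    return single_runtime_seconds * work_units_in_bundle


def exit_code_to_hex(exit_code: int) -> str:
    """Reinterpret a signed 32-bit exit code as the unsigned hex value
    that appears next to it in the task output."""
    return f"0x{exit_code & 0xFFFFFFFF:08x}"


# Hypothetical figures: 100 credits and 60 seconds per single work unit.
print(bundle_credit(100.0, 5))        # bundle of 5, the limit at release
print(bundle_runtime(60.0, 10))       # bundle of 10, the original plan
print(exit_code_to_hex(-1073741515))  # the exit code reported in the thread
```

The last call shows that -1073741515 and 0xc0000135 are the same 32-bit value written two ways. On Windows, 0xC0000135 is the NTSTATUS code STATUS_DLL_NOT_FOUND, which usually means the executable could not load a required DLL; the posts above do not confirm whether that was the cause here.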
©2024 Astroinformatics Group