Database Maintenance 9-4-2014

Author	Message
Jeffery M. Thompson Volunteer moderator Project administrator Project developer Project tester Project scientist Joined: 23 Sep 12 Posts: 159 Credit: 16,977,106 RAC: 0	Message 67765 - Posted: 4 Sep 2018, 17:08:03 UTC We are currently updating the database. The server will be down as of 1:30 pm EST.

vseven Joined: 26 Mar 18 Posts: 24 Credit: 102,912,937 RAC: 0	Message 67766 - Posted: 4 Sep 2018, 17:31:58 UTC It would be nice if we could have the WU limit increased and maybe the deadline decreased a bit, so that when things like this happen we can keep crunching. I'm using a Volta-based card and 80 WUs are gone in a couple of minutes.

Jake Weiss Volunteer moderator Project developer Project tester Project scientist Joined: 25 Feb 13 Posts: 580 Credit: 94,200,158 RAC: 0	Message 67767 - Posted: 4 Sep 2018, 17:59:56 UTC Hey vseven, We have to walk a fine line with the number of workunits we allow users to download and their deadlines. We have both CPUs and GPUs that we have to balance with vastly different work times. I think what we have now is a reasonable compromise, but I would be open to hearing your suggestions. Jake

JohnDK Joined: 18 Feb 10 Posts: 63 Credit: 225,601,240 RAC: 3,861	Message 67768 - Posted: 4 Sep 2018, 18:04:33 UTC Last modified: 4 Sep 2018, 18:08:13 UTC Just have to say I think Tuesday is a bad day for maintenance, since many SETI users have Milkyway as their backup project, and SETI also has maintenance on Tuesdays... (Also, maybe some Milkyway users have SETI as their backup.)

Jake Weiss Volunteer moderator Project developer Project tester Project scientist Joined: 25 Feb 13 Posts: 580 Credit: 94,200,158 RAC: 0	Message 67769 - Posted: 4 Sep 2018, 18:07:51 UTC Hey JohnDK, I had no idea their maintenance day was Tuesday. We just picked this day because it's one of the two days that Jeff is in the office. This won't be too common an occurrence, and we will consider switching to Thursdays. I just wanted to avoid being so close to the weekend when starting maintenance. Jake

Jeffery M. Thompson Volunteer moderator Project administrator Project developer Project tester Project scientist Joined: 23 Sep 12 Posts: 159 Credit: 16,977,106 RAC: 0	Message 67770 - Posted: 4 Sep 2018, 21:10:12 UTC The database is still updating; I am watching this through to completion. I expect to have the feeder dishing out units late tonight or early tomorrow morning.

gambatesa Joined: 23 Feb 18 Posts: 26 Credit: 4,744,416,145 RAC: 0	Message 67771 - Posted: 5 Sep 2018, 9:25:47 UTC - in response to Message 67766. vseven wrote: "It would be nice if we could have the WU limit increased and maybe the deadline decreased a bit so when things like this happen we can keep crunching. I'm using a Volta based card and 80 WU are gone in a couple minutes." 80 workunits per GPU is really too few. If the server is down, you run out of work in less than half an hour (on a 7970). I understand that this could be reasonable for CPUs, but the "hardcore business" is made of GPUs.

Gator 1-3 Joined: 21 Dec 12 Posts: 3 Credit: 207,504,988 RAC: 0	Message 67772 - Posted: 5 Sep 2018, 12:28:30 UTC Any update on the expected time for the maintenance to end? I have a computer with 72 WUs on it that needs to be reformatted today.

Wisesooth Joined: 2 Oct 14 Posts: 43 Credit: 55,516,331 RAC: 0	Message 67773 - Posted: 5 Sep 2018, 14:29:26 UTC IMHO, the database server seems to be the weakest link in your system. A DBMS is the most processor- and storage-intensive application in a system like this. RPI really needs a server with enough cores and solid-state memory to handle the throughput required to manage a grid computing environment, especially if the DBMS is enforcing referential integrity. Intel might give RPI some hardware help if they ask. Yes, I know you think "Yes, we already know that." However, do the people with the purse know that? If they are not listening, maybe forwarding this message might get their attention. Overloaded servers break down at the most inconvenient times. The cost and time of RPI's most talented people should be considered in the total cost of ownership.

Jake Weiss Volunteer moderator Project developer Project tester Project scientist Joined: 25 Feb 13 Posts: 580 Credit: 94,200,158 RAC: 0	Message 67774 - Posted: 5 Sep 2018, 16:32:06 UTC Hey Everyone, Our database maintenance is coming to a close. We should be done by 5pm today. It has been several months since we were last down for maintenance, so it is taking us a little while to clean everything up. As far as future plans, it is actually within our budget to upgrade the server and we plan to do that within the next few months. Otherwise, we have a few maintenance periods planned in the upcoming weeks to help optimize the database. In the last few months, it has been running pretty smoothly, but we think we can continue to improve it. Thank you all for your continued support. Jake

[H]auntjemima Joined: 6 Jul 18 Posts: 2 Credit: 596,287,750 RAC: 0	Message 67775 - Posted: 5 Sep 2018, 17:30:43 UTC Thanks for the update, Jake!

Marsinph Joined: 13 Nov 10 Posts: 23 Credit: 108,282,839 RAC: 0	Message 67776 - Posted: 5 Sep 2018, 17:36:49 UTC - in response to Message 67774. Thank you Jake. But I expect a lot of problems when everything starts up again. I think there will be millions of WUs reported at the same time. I hope it will not crash the server. In fact, the limit of 80 WUs is not a bad idea. For a day already, all my WUs have been finished and I have been unable to get any new ones. I understand that a big cleanup is sometimes needed, especially for the DB. Thanks for the update.

wb8ili Joined: 18 Jul 10 Posts: 76 Credit: 681,049,698 RAC: 81,703	Message 67777 - Posted: 5 Sep 2018, 21:56:12 UTC Jake wrote - "We have to walk a fine line with the number of workunits we allow users to download and their deadlines. We have both CPUs and GPUs that we have to balance with vastly different work times. I think what we have now is a reasonable compromise, but I would be open to hearing your suggestions." I understand you have to vary the number of workunits a user can download. But the number should be based on the capabilities of the user's computer, not some arbitrary number (80) that implies that one number fits all, whether it refers to CPU or GPU workunits. Your scheduling (workunit dispersal) algorithm knows everything about a user's computer (average computational time, number of invalid returns, up-time, etc.). It shouldn't be that hard for someone at a prestigious university like RPI to figure out a more equitable way of dispersing workunits. Fast computers get more, slow computers get fewer, "bad actors" get few. If my computer is returning valid results, and each workunit (GPU) takes 3 minutes, what is the problem with giving me 480 units (1 day), or 960 units (2 days), or more? The algorithm, if properly done, should work for CPU and GPU workunits.
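
The arithmetic in wb8ili's post (a 3-minute GPU workunit gives 480 units per day, 960 for two days) can be sketched roughly as below. This is only an illustration of the idea of sizing the limit to a machine's measured throughput and validity record. It is not the project's or BOINC's actual scheduling logic, and the function name `daily_task_limit` and all of its parameters are hypothetical.

```python
import math


def daily_task_limit(avg_task_minutes, valid_fraction=1.0,
                     days_of_work=1.0, minimum=1):
    """Hypothetical per-device workunit limit (not BOINC's real algorithm).

    avg_task_minutes: measured average run time of one workunit on this device
    valid_fraction:   share of returned results that validated (0.0 to 1.0)
    days_of_work:     how many days of work the device may hold at once
    minimum:          floor so that even a poorly behaving host gets some work
    """
    if avg_task_minutes <= 0:
        raise ValueError("avg_task_minutes must be positive")
    if not 0.0 <= valid_fraction <= 1.0:
        raise ValueError("valid_fraction must be between 0 and 1")

    # Workunits the device can finish in one day of continuous crunching.
    tasks_per_day = 24 * 60 / avg_task_minutes

    # Fast hosts get more, slow hosts get fewer, unreliable hosts get few.
    limit = math.floor(tasks_per_day * days_of_work * valid_fraction)
    return max(minimum, limit)


print(daily_task_limit(3))                  # GPU, 3 min per workunit, 1 day
print(daily_task_limit(3, days_of_work=2))  # same GPU, 2 days of work
print(daily_task_limit(120))                # slower device, 2 h per workunit
```

Under these assumptions a single formula covers both CPU and GPU hosts, because the limit follows from each device's own average run time instead of a flat 80.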

Jeffery M. Thompson Volunteer moderator Project administrator Project developer Project tester Project scientist Joined: 23 Sep 12 Posts: 159 Credit: 16,977,106 RAC: 0	Message 67778 - Posted: 5 Sep 2018, 22:31:45 UTC Work units are coming back in and the feeder should be serving them out again. I am monitoring. It will take a few hours for the load to balance; let me know if you see anything on your side as that processes through.

[H]auntjemima Joined: 6 Jul 18 Posts: 2 Credit: 596,287,750 RAC: 0	Message 67779 - Posted: 6 Sep 2018, 1:06:11 UTC I had 160 process through, but no more picked up. I'm sure it is being resolved.

Keith Myers Joined: 24 Jan 11 Posts: 739 Credit: 578,552,078 RAC: 158,913	Message 67780 - Posted: 6 Sep 2018, 1:46:24 UTC I got 34 but nothing more since then. I assume the transitioner being offline has something to do with it.

Manfred Reiff Joined: 27 Apr 18 Posts: 11 Credit: 72,923,580 RAC: 0	Message 67781 - Posted: 6 Sep 2018, 10:09:11 UTC Milkyway@Home is working again but unfortunately I don't get any GPU workunits (no changes to settings).

mikey Joined: 8 May 09 Posts: 3339 Credit: 524,398,788 RAC: 11	Message 67782 - Posted: 6 Sep 2018, 11:56:29 UTC I have workunits for my PCs, but when I look at the workunit status I see that A LOT of them have NOT been sent out to a wingman yet!! They say "unsent"; prior to this maintenance phase I had zero "unsent" tasks.

Gunnar Hjern Joined: 14 Oct 16 Posts: 4 Credit: 25,135,416 RAC: 8	Message 67783 - Posted: 6 Sep 2018, 16:47:32 UTC - in response to Message 67782. Yes, I can confirm this! I currently have 304 tasks "in progress" on my different computers, but none of them seems to have a "wing man". The same goes for the ones that I've completed and reported, which are now in "Validation inconclusive". My heap of "Validation inconclusive" tasks is constantly growing and is already at 267, while none of my reported tasks seems to be validated. Credits totally stuck! :-( What is happening??? //Gunnar

Saenger Joined: 28 Aug 07 Posts: 133 Credit: 29,423,179 RAC: 0	Message 67784 - Posted: 6 Sep 2018, 21:06:33 UTC Yep, all of my WUs are _0 as well, so it looks like the _1 copies are being held back for now. What went wrong after the restart of the machines? Greetings from Sänger