Database Maintenance 9-4-2018
Author | Message |
---|---|
Joined: 23 Sep 12 Posts: 159 Credit: 16,977,106 RAC: 0 |
We are currently updating the database, so the server will be down as of 1:30 pm EST. |
Joined: 26 Mar 18 Posts: 24 Credit: 102,912,937 RAC: 0 |
It would be nice if the WU limit could be increased, and maybe the deadline decreased a bit, so that when things like this happen we can keep crunching. I'm using a Volta-based card, and 80 WUs are gone in a couple of minutes. |
Joined: 25 Feb 13 Posts: 580 Credit: 94,200,158 RAC: 0 |
Hey vseven, We have to walk a fine line with the number of workunits we allow users to download and their deadlines. We have both CPUs and GPUs that we have to balance with vastly different work times. I think what we have now is a reasonable compromise, but I would be open to hearing your suggestions. Jake |
Joined: 18 Feb 10 Posts: 57 Credit: 222,650,083 RAC: 5,797 |
I just have to say that Tuesday is a bad day for maintenance, since many SETI users have Milkyway as a backup project, and SETI also has its maintenance on Tuesdays... (Also, some Milkyway users may have SETI as a backup.) |
Joined: 25 Feb 13 Posts: 580 Credit: 94,200,158 RAC: 0 |
Hey JohnDK, I had no idea their maintenance day was Tuesday. We picked this day because it's one of the two days that Jeff is in the office. This won't be a common occurrence, and we will consider switching to Thursdays. I just wanted to avoid starting maintenance so close to the weekend. Jake |
Joined: 23 Sep 12 Posts: 159 Credit: 16,977,106 RAC: 0 |
The database is still updating, and I am watching it through to completion. I expect the feeder to be dishing out units again late tonight or early tomorrow morning. |
Joined: 23 Feb 18 Posts: 26 Credit: 4,744,416,145 RAC: 0 |
vseven wrote - "It would be nice if we could have the WU limit increased and maybe the deadline decreased a bit so when things like this happen we can keep crunching. I'm using a Volta based card and 80 WU are gone in a couple minutes." 80 workunits per GPU is really too few. If the server is down, you run out of work in less than half an hour (on a 7970). I understand that this may be reasonable for CPUs, but the "hardcore business" is done on GPUs. |
Joined: 21 Dec 12 Posts: 3 Credit: 207,504,988 RAC: 0 |
Any update on the expected time for the maintenance to end? I have a computer with 72 WUs on it that needs to be reformatted today. |
Joined: 2 Oct 14 Posts: 43 Credit: 55,168,353 RAC: 1,289 |
IMHO, the database server seems to be the weakest link in your system. A DBMS is the most processor- and storage-intensive application in a system like this. RPI really needs a server with enough cores and solid-state memory to handle the throughput required to manage a grid computing environment, especially if the DBMS is enforcing referential integrity. Intel might give RPI some hardware help if asked. Yes, I know you are thinking, "We already know that." However, do the people holding the purse strings know that? If they are not listening, forwarding this message might get their attention. Overloaded servers break down at the most inconvenient times, and the cost and time of RPI's most talented people should be considered in the total cost of ownership. |
Joined: 25 Feb 13 Posts: 580 Credit: 94,200,158 RAC: 0 |
Hey Everyone, Our database maintenance is coming to a close. We should be done by 5 pm today. It has been several months since we were last down for maintenance, so it is taking us a little while to clean everything up. As for future plans, upgrading the server is actually within our budget, and we plan to do that within the next few months. Otherwise, we have a few maintenance periods planned in the upcoming weeks to help optimize the database. It has been running pretty smoothly for the last few months, but we think we can continue to improve it. Thank you all for your continued support. Jake |
Joined: 6 Jul 18 Posts: 2 Credit: 596,287,750 RAC: 0 |
Thanks for the update, Jake! |
Joined: 13 Nov 10 Posts: 23 Credit: 108,282,839 RAC: 0 |
Thank you, Jake. But I expect a lot of problems when everything starts up normally again. I think millions of WUs will be reported at the same time, and I hope that will not crash the server. In fact, the 80 WU limit is not a bad idea. Once before, all my WUs were finished and I was unable to get any new ones. I understand that sometimes a big cleanup is needed, especially for a database. Thanks for the update. |
Joined: 18 Jul 10 Posts: 76 Credit: 639,959,631 RAC: 60,011 |
Jake wrote - "We have to walk a fine line with the number of workunits we allow users to download and their deadlines. We have both CPUs and GPUs that we have to balance with vastly different work times. I think what we have now is a reasonable compromise, but I would be open to hearing your suggestions." I understand you have to vary the number of workunits a user can download. But the number should be based on the capabilities of the user's computer, not on some arbitrary number (80) that implies one number fits all, whether it refers to CPU or GPU workunits. Your scheduling (workunit dispersal) algorithm knows everything about a user's computer (average computation time, number of invalid returns, uptime, etc.). It shouldn't be that hard for someone at a prestigious university like RPI to figure out a more equitable way of dispersing workunits: fast computers get more, slow computers get less, and "bad actors" get few. If my computer is returning valid results, and each workunit (GPU) takes 3 minutes, what is the problem with giving me 480 units (1 day), or 960 units (2 days), or more? The algorithm, if properly done, should work for both CPU and GPU workunits. |
Joined: 23 Sep 12 Posts: 159 Credit: 16,977,106 RAC: 0 |
Workunits are coming back in, and the feeder should be serving them out again. I am monitoring. It will take a few hours for the load to balance; let me know if you see anything on your side while that processes through. |
Joined: 6 Jul 18 Posts: 2 Credit: 596,287,750 RAC: 0 |
I had 160 process through, but no more have been picked up. I'm sure it is being resolved. |
Joined: 24 Jan 11 Posts: 715 Credit: 556,872,139 RAC: 43,483 |
I got 34 but nothing more since then. I assume the transitioner being offline has something to do with it. |
Joined: 27 Apr 18 Posts: 11 Credit: 72,923,580 RAC: 0 |
Milkyway@Home is working again but unfortunately I don't get any GPU workunits (no changes to settings). |
Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0 |
I have workunits for my PCs, but when I look at the workunit status I see that a LOT of them have NOT been sent out to a wingman yet!! They say "unsent"; prior to this maintenance phase I had zero "unsent" tasks. |
Joined: 14 Oct 16 Posts: 4 Credit: 25,072,475 RAC: 0 |
Yes, I can confirm this! I currently have 304 tasks "in progress" on my different computers, but none of them seem to have a "wingman". The same goes for the ones I've completed and reported, which are now in "Validation inconclusive". My heap of "Validation inconclusive" tasks is constantly growing and is already at 267, while none of my reported tasks seem to have been validated. Credits totally stuck! :-( What is happening??? //Gunnar |
Joined: 28 Aug 07 Posts: 133 Credit: 29,423,179 RAC: 0 |
|
©2024 Astroinformatics Group