Welcome to MilkyWay@home

Database Maintenance 9-4-2014

Jeffery M. Thompson
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 23 Sep 12
Posts: 159
Credit: 16,977,106
RAC: 0
Message 67765 - Posted: 4 Sep 2018, 17:08:03 UTC

We are currently updating the database. The server will be down as of 1:30 pm EST.
vseven

Joined: 26 Mar 18
Posts: 24
Credit: 102,912,937
RAC: 0
Message 67766 - Posted: 4 Sep 2018, 17:31:58 UTC

It would be nice if we could have the WU limit increased and maybe the deadline decreased a bit so when things like this happen we can keep crunching. I'm using a Volta based card and 80 WU are gone in a couple minutes.
Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist

Joined: 25 Feb 13
Posts: 580
Credit: 94,200,158
RAC: 0
Message 67767 - Posted: 4 Sep 2018, 17:59:56 UTC

Hey vseven,

We have to walk a fine line with the number of workunits we allow users to download and their deadlines. We have both CPUs and GPUs that we have to balance with vastly different work times. I think what we have now is a reasonable compromise, but I would be open to hearing your suggestions.

Jake
JohnDK
Joined: 18 Feb 10
Posts: 57
Credit: 222,650,083
RAC: 5,797
Message 67768 - Posted: 4 Sep 2018, 18:04:33 UTC
Last modified: 4 Sep 2018, 18:08:13 UTC

Just have to say I think Tuesday is a bad day for maintenance, since many SETI users have Milkyway as a backup project and SETI also has its maintenance on Tuesdays... (Also, maybe some Milkyway users have SETI as a backup.)
Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist

Joined: 25 Feb 13
Posts: 580
Credit: 94,200,158
RAC: 0
Message 67769 - Posted: 4 Sep 2018, 18:07:51 UTC

Hey JohnDK,

I had no idea their maintenance day was Tuesday. We just picked this day because it's one of the two days that Jeff is in the office. This won't be too common an occurrence, and we will consider switching to Thursdays. I just wanted to avoid starting maintenance so close to the weekend.

Jake
Jeffery M. Thompson
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 23 Sep 12
Posts: 159
Credit: 16,977,106
RAC: 0
Message 67770 - Posted: 4 Sep 2018, 21:10:12 UTC

The database is still updating, and I am watching it through to completion. I expect to have the feeder dishing out units late tonight or early tomorrow morning.
gambatesa
Joined: 23 Feb 18
Posts: 26
Credit: 4,744,416,145
RAC: 0
Message 67771 - Posted: 5 Sep 2018, 9:25:47 UTC - in response to Message 67766.  

vseven wrote: "It would be nice if we could have the WU limit increased and maybe the deadline decreased a bit so when things like this happen we can keep crunching. I'm using a Volta based card and 80 WU are gone in a couple minutes."


80 workunits per GPU is really too few. If the server is down, you run out of work in less than half an hour (on a 7970). I understand that it may be reasonable for CPUs, but the "hardcore business" is made of GPUs.
Gator 1-3

Joined: 21 Dec 12
Posts: 3
Credit: 207,504,988
RAC: 0
Message 67772 - Posted: 5 Sep 2018, 12:28:30 UTC

Any update on the expected time for the maintenance to end? I have a computer with 72 WUs on it that needs to be reformatted today.
Wisesooth

Joined: 2 Oct 14
Posts: 43
Credit: 55,168,353
RAC: 1,289
Message 67773 - Posted: 5 Sep 2018, 14:29:26 UTC

IMHO, the database server seems to be the weakest link in your system. A DBMS is the most processor- and storage-intensive application in a system like this. RPI really needs a server with enough cores and solid-state memory to handle the throughput required to manage a grid computing environment, especially if the DBMS is enforcing referential integrity. Intel might give RPI some hardware help if they ask.
Yes, I know you think "Yes, we already know that." However, do the people with the purse know that? If they are not listening, maybe forwarding this message might get their attention. Overloaded servers break down at the most inconvenient times. The cost and time of RPI's most talented people should be considered in the total cost of ownership.
Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist

Joined: 25 Feb 13
Posts: 580
Credit: 94,200,158
RAC: 0
Message 67774 - Posted: 5 Sep 2018, 16:32:06 UTC

Hey Everyone,

Our database maintenance is coming to a close. We should be done by 5pm today. It has been several months since we were last down for maintenance, so it is taking us a little while to clean everything up.

As far as future plans, it is actually within our budget to upgrade the server and we plan to do that within the next few months. Otherwise, we have a few maintenance periods planned in the upcoming weeks to help optimize the database. In the last few months, it has been running pretty smoothly, but we think we can continue to improve it.

Thank you all for your continued support.
Jake
[H]auntjemima

Joined: 6 Jul 18
Posts: 2
Credit: 596,287,750
RAC: 0
Message 67775 - Posted: 5 Sep 2018, 17:30:43 UTC

Thanks for the update, Jake!
Marsinph

Joined: 13 Nov 10
Posts: 23
Credit: 108,282,839
RAC: 0
Message 67776 - Posted: 5 Sep 2018, 17:36:49 UTC - in response to Message 67774.  

Thank you Jake.
But I expect a lot of problems when everything starts up again normally.
I think there will be millions of WUs reported at the same time.
I hope it will not crash the server.
In fact, the limit of 80 WUs is not a bad idea.
It has already been one day; all my WUs are finished and I am unable to get any new ones.
I understand that a big cleanup is sometimes needed,
especially for the DB.
Thanks for the update.
wb8ili

Joined: 18 Jul 10
Posts: 76
Credit: 639,959,631
RAC: 60,011
Message 67777 - Posted: 5 Sep 2018, 21:56:12 UTC

Jake wrote -

"We have to walk a fine line with the number of workunits we allow users to download and their deadlines. We have both CPUs and GPUs that we have to balance with vastly different work times. I think what we have now is a reasonable compromise, but I would be open to hearing your suggestions."

I understand you have to vary the number of workunits a user can download. But the number should be based on the capabilities of the user's computer, not some arbitrary number (80) that implies one number fits all, whether it refers to CPU or GPU workunits.

Your scheduling (workunit dispersal) algorithm knows everything about a user's computer (average computational time, number of invalid returns, up-time, etc.).

It shouldn't be that hard for someone at a prestigious university like RPI to figure out a more equitable way of dispersing workunits. Fast computers get more, slow computers get less, "bad actors" get few.

If my computer is returning valid results, and each workunit (GPU) takes 3 minutes, what is the problem with giving me 480 units (1 day), or 960 units (2 days), or more?

The algorithm, if properly done, should work for CPU and GPU workunits.
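
As a rough sketch of the idea above (this is not the project's actual scheduler; the `HostStats` fields, the `workunit_limit` function, and the threshold values are all hypothetical names and numbers chosen for illustration), a per-host limit could be derived from the average run time and the share of valid results:

```python
from dataclasses import dataclass


@dataclass
class HostStats:
    """Hypothetical per-host statistics; field names are illustrative only."""
    avg_task_minutes: float    # average run time of one workunit, in minutes
    valid_fraction: float      # share of returned results that validated (0..1)
    concurrent_tasks: int = 1  # workunits the host runs at the same time


def workunit_limit(host, buffer_days=1.0, floor=10, min_valid=0.9):
    """Return how many workunits a host may hold at once.

    Hosts with too many invalid results ("bad actors") only get the floor.
    Everyone else gets enough work to stay busy for `buffer_days`.
    """
    if host.valid_fraction < min_valid:
        return floor
    minutes_to_cover = buffer_days * 24 * 60
    per_slot = int(minutes_to_cover / host.avg_task_minutes)
    return max(floor, per_slot * host.concurrent_tasks)


# A GPU host that needs 3 minutes per workunit and returns valid results.
fast_gpu = HostStats(avg_task_minutes=3.0, valid_fraction=1.0)
print(workunit_limit(fast_gpu))                   # one day of work
print(workunit_limit(fast_gpu, buffer_days=2.0))  # two days of work
```

With 3-minute workunits this gives 480 units for one day and 960 for two, matching the figures above; a slow CPU host or a host with many invalid results would fall back to the assumed floor of 10.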
Jeffery M. Thompson
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 23 Sep 12
Posts: 159
Credit: 16,977,106
RAC: 0
Message 67778 - Posted: 5 Sep 2018, 22:31:45 UTC

Work units are coming back in and the feeder should be serving them out again. I am monitoring. It will take a few hours for the load to balance; let me know if you see anything on your side as that processes through.
[H]auntjemima

Joined: 6 Jul 18
Posts: 2
Credit: 596,287,750
RAC: 0
Message 67779 - Posted: 6 Sep 2018, 1:06:11 UTC

I had 160 process through, but no more picked up. I'm sure it is being resolved.
Keith Myers
Joined: 24 Jan 11
Posts: 715
Credit: 556,872,139
RAC: 43,483
Message 67780 - Posted: 6 Sep 2018, 1:46:24 UTC

I got 34 but nothing more since then. I assume the transitioner being offline must have something to do with it.
Manfred Reiff
Joined: 27 Apr 18
Posts: 11
Credit: 72,923,580
RAC: 0
Message 67781 - Posted: 6 Sep 2018, 10:09:11 UTC

Milkyway@Home is working again but unfortunately I don't get any GPU workunits (no changes to settings).
mikey
Joined: 8 May 09
Posts: 3339
Credit: 524,010,781
RAC: 0
Message 67782 - Posted: 6 Sep 2018, 11:56:29 UTC

I have workunits for my PCs, but when I look at the workunit status I see that A LOT of them have NOT been sent out to a wingman yet!! They say "unsent"; prior to this maintenance phase I had zero "unsent" tasks.
Gunnar Hjern

Joined: 14 Oct 16
Posts: 4
Credit: 25,072,475
RAC: 0
Message 67783 - Posted: 6 Sep 2018, 16:47:32 UTC - in response to Message 67782.  

Yes, I can confirm this!

I currently have 304 "in progress" on my different computers, but none of them seems to have a "wing man".

The same goes for the ones that I've completed and reported, which are now in "Validation inconclusive".

My heap of "Validation inconclusive" is constantly growing and is already at 267, while none of my reported tasks seem to be validated.
Credits totally stuck! :-(

What is happening???

//Gunnar
Saenger
Joined: 28 Aug 07
Posts: 133
Credit: 29,423,179
RAC: 0
Message 67784 - Posted: 6 Sep 2018, 21:06:33 UTC

Yep, all of my WUs are _0 as well, so it looks like _1 are kept behind for now. What went wrong after the restart of the machines?
Greetings from Sänger

©2024 Astroinformatics Group