Database Maintenance 9-4-2014

Jeffery M. Thompson
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 23 Sep 12
Posts: 151
Credit: 12,997,107
RAC: 48

Message 67765 - Posted: 4 Sep 2018, 17:08:03 UTC

We are currently updating the database. The server will be down as of 1:30 pm EST.

vseven
Joined: 26 Mar 18
Posts: 15
Credit: 90,591,976
RAC: 88,173

Message 67766 - Posted: 4 Sep 2018, 17:31:58 UTC

It would be nice if we could have the WU limit increased and maybe the deadline decreased a bit so when things like this happen we can keep crunching. I'm using a Volta based card and 80 WU are gone in a couple minutes.

Profile Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 502
Credit: 34,647,251
RAC: 1

Message 67767 - Posted: 4 Sep 2018, 17:59:56 UTC

Hey vseven,

We have to walk a fine line with the number of workunits we allow users to download and their deadlines. We have both CPUs and GPUs that we have to balance with vastly different work times. I think what we have now is a reasonable compromise, but I would be open to hearing your suggestions.

Jake

JohnDK
Joined: 18 Feb 10
Posts: 4
Credit: 10,800,036
RAC: 246

Message 67768 - Posted: 4 Sep 2018, 18:04:33 UTC
Last modified: 4 Sep 2018, 18:08:13 UTC

Just have to say I think Tuesday is a bad day for maintenance, since many SETI users have Milkyway as a backup project, and SETI also has maintenance on Tuesdays... (Also, maybe some Milkyway users have SETI as a backup.)

Profile Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 502
Credit: 34,647,251
RAC: 1

Message 67769 - Posted: 4 Sep 2018, 18:07:51 UTC

Hey JohnDK,

I had no idea their maintenance day was Tuesday. We just picked this day because it's one of the two days that Jeff is in the office. This won't be too common an occurrence, and we will consider switching to Thursdays. I just wanted to avoid being so close to the weekend when starting maintenance.

Jake

Jeffery M. Thompson
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 23 Sep 12
Posts: 151
Credit: 12,997,107
RAC: 48

Message 67770 - Posted: 4 Sep 2018, 21:10:12 UTC

The database is still updating; I am watching this through to completion. I expect to have the feeder dishing out units late tonight or early tomorrow morning.

gambatesa
Joined: 23 Feb 18
Posts: 7
Credit: 718,400,418
RAC: 4,979,277

Message 67771 - Posted: 5 Sep 2018, 9:25:47 UTC - in response to Message 67766.

"It would be nice if we could have the WU limit increased and maybe the deadline decreased a bit so when things like this happen we can keep crunching. I'm using a Volta based card and 80 WU are gone in a couple minutes."

80 workunits per GPU is really too few. If the server is down, you run out of work in less than half an hour (on a 7970). I understand that this may be reasonable for CPUs, but the "hardcore business" is done on GPUs.

Gator 1-3
Joined: 21 Dec 12
Posts: 3
Credit: 83,530,344
RAC: 290,316

Message 67772 - Posted: 5 Sep 2018, 12:28:30 UTC

Any update on the expected time for the maintenance to end? I have a computer with 72 WUs on it that needs to be reformatted today.

Profile Wisesooth
Joined: 2 Oct 14
Posts: 39
Credit: 29,415,074
RAC: 27,514

Message 67773 - Posted: 5 Sep 2018, 14:29:26 UTC

IMHO, the database server seems to be the weakest link in your system. A DBMS is the most processor- and storage-intensive application in a system like this. RPI really needs a server with enough cores and solid-state memory to handle the throughput required to manage a grid computing environment, especially if the DBMS is enforcing referential integrity. Intel might give RPI some hardware help if they ask.
Yes, I know you think "Yes, we already know that." However, do the people with the purse know that? If they are not listening, maybe forwarding this message might get their attention. Overloaded servers break down at the most inconvenient times. The cost and time of RPI's most talented people should be considered in the total cost of ownership.

Profile Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 502
Credit: 34,647,251
RAC: 1

Message 67774 - Posted: 5 Sep 2018, 16:32:06 UTC

Hey Everyone,

Our database maintenance is coming to a close. We should be done by 5 pm today. It has been several months since we were last down for maintenance, so it is taking us a little while to clean everything up.

As far as future plans, it is actually within our budget to upgrade the server and we plan to do that within the next few months. Otherwise, we have a few maintenance periods planned in the upcoming weeks to help optimize the database. In the last few months, it has been running pretty smoothly, but we think we can continue to improve it.

Thank you all for your continued support.
Jake

[H]auntjemima
Joined: 6 Jul 18
Posts: 2
Credit: 56,114,380
RAC: 734,351

Message 67775 - Posted: 5 Sep 2018, 17:30:43 UTC

Thanks for the update, Jake!

Profile Marsinph
Joined: 13 Nov 10
Posts: 7
Credit: 6,314,151
RAC: 433

Message 67776 - Posted: 5 Sep 2018, 17:36:49 UTC - in response to Message 67774.

Thank you, Jake.
But I expect a lot of problems when everything starts up normally again.
I think there will be millions of WUs reported at the same time.
I hope it will not crash the server.
In fact, the limit of 80 WUs is not a bad idea.
For a day already, all my WUs have been finished and I am unable to get any new ones.
I understand that a big cleanup is sometimes needed, especially for the DB.
Thanks for the update.

wb8ili
Joined: 18 Jul 10
Posts: 61
Credit: 227,276,485
RAC: 220,985

Message 67777 - Posted: 5 Sep 2018, 21:56:12 UTC

Jake wrote -

"We have to walk a fine line with the number of workunits we allow users to download and their deadlines. We have both CPUs and GPUs that we have to balance with vastly different work times. I think what we have now is a reasonable compromise, but I would be open to hearing your suggestions."

I understand you have to vary the number of workunits a user can download. But the number should be based on the capabilities of the user's computer, not some arbitrary number (80) that implies that one number fits all, whether it refers to CPU or GPU workunits.

Your scheduling (workunit dispersal) algorithm knows everything about a user's computer (average computational time, number of invalid returns, up-time, etc.).

It shouldn't be that hard for someone at a prestigious university like RPI to figure out a more equitable way of dispersing workunits. Fast computers get more, slow computers get less, "bad actors" get few.

If my computer is returning valid results, and each workunit (GPU) takes 3 minutes, what is the problem with giving me 480 units (1 day), or 960 units (2 days), or more?

The algorithm, if properly done, should work for CPU and GPU workunits.
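
As a rough illustration of the arithmetic above, here is a minimal sketch of a per-host limit that scales with how fast a host finishes work. This is not how the BOINC scheduler computes anything; the function name `workunit_limit` and its parameters (`avg_task_seconds`, `buffer_days`, `valid_fraction`, `floor`) are hypothetical and exist only for this example.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60


def workunit_limit(avg_task_seconds, buffer_days, valid_fraction=1.0, floor=1):
    """Hypothetical per-host limit: enough tasks to keep busy for buffer_days.

    avg_task_seconds: average time the host needs per workunit (assumed known).
    buffer_days:      how many days of work the host may hold at once.
    valid_fraction:   share of the host's past results that validated (0.0-1.0);
                      hosts returning bad results get proportionally less.
    floor:            minimum number of tasks any host may hold.
    """
    if avg_task_seconds <= 0:
        raise ValueError("avg_task_seconds must be positive")
    if not 0.0 <= valid_fraction <= 1.0:
        raise ValueError("valid_fraction must be between 0.0 and 1.0")

    # Tasks the host can finish in the buffer window at its average speed.
    tasks_needed = buffer_days * SECONDS_PER_DAY / avg_task_seconds
    # Scale down for unreliable hosts, but never go below the floor.
    return max(floor, math.floor(tasks_needed * valid_fraction))
```

With the figures from this post (3 minutes, i.e. 180 seconds, per GPU workunit), `workunit_limit(180, 1)` gives 480 and `workunit_limit(180, 2)` gives 960, while a CPU host averaging an hour per workunit would get 24 for one day. Whether a real project could afford limits this large depends on server load and deadlines, which this sketch ignores.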

Jeffery M. Thompson
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 23 Sep 12
Posts: 151
Credit: 12,997,107
RAC: 48

Message 67778 - Posted: 5 Sep 2018, 22:31:45 UTC

Work units are coming back in and the feeder should be serving them out again. I am monitoring. It will take a few hours for the load to balance. Let me know if you see anything on your side as that processes through.

[H]auntjemima
Joined: 6 Jul 18
Posts: 2
Credit: 56,114,380
RAC: 734,351

Message 67779 - Posted: 6 Sep 2018, 1:06:11 UTC

I had 160 process through, but no more picked up. I'm sure it is being resolved.

Profile Keith Myers
Joined: 24 Jan 11
Posts: 157
Credit: 103,575,104
RAC: 23,545

Message 67780 - Posted: 6 Sep 2018, 1:46:24 UTC

I got 34 but nothing more since then. I assume the transitioner being offline must have something to do with it.

Manfred Reiff
Joined: 27 Apr 18
Posts: 5
Credit: 25,338,698
RAC: 152,916

Message 67781 - Posted: 6 Sep 2018, 10:09:11 UTC

Milkyway@Home is working again but unfortunately I don't get any GPU workunits (no changes to settings).

Profile mikey
Joined: 8 May 09
Posts: 2195
Credit: 246,963,227
RAC: 221,261

Message 67782 - Posted: 6 Sep 2018, 11:56:29 UTC

I have workunits for my PCs, but when I look at the workunit status I see that A LOT of them have NOT been sent out to a wingman yet!! They say "unsent". Prior to this maintenance phase I had zero "unsent" tasks.

Gunnar Hjern
Joined: 14 Oct 16
Posts: 1
Credit: 11,705,379
RAC: 75,681

Message 67783 - Posted: 6 Sep 2018, 16:47:32 UTC - in response to Message 67782.

Yes, I can confirm this!

I currently have 304 "in progress" on my different computers, but none of them seems to have a "wingman".

The same goes for the ones that I've completed and reported, which are now in "Validation inconclusive".

My heap of "Validation inconclusive" is constantly growing, and is already at 267, while none of my reported tasks seems to be validated.
Credits totally stuck! :-(

What is happening???

//Gunnar

Profile Saenger
Joined: 28 Aug 07
Posts: 130
Credit: 10,204,169
RAC: 11,942

Message 67784 - Posted: 6 Sep 2018, 21:06:33 UTC

Yep, all of my WUs are _0 as well, so it looks like _1 are kept behind for now. What went wrong after the restart of the machines?
____________
Greetings from Sänger


Copyright © 2018 AstroInformatics Group