Welcome to MilkyWay@home

Database Maintenance 9-4-2014


Jeffery M. Thompson
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 23 Sep 12
Posts: 156
Credit: 12,997,107
RAC: 0
Message 67765 - Posted: 4 Sep 2018, 17:08:03 UTC

We are currently updating the database. The server will be down as of 1:30 pm EST.

vseven
Joined: 26 Mar 18
Posts: 20
Credit: 93,326,014
RAC: 105,702
Message 67766 - Posted: 4 Sep 2018, 17:31:58 UTC

It would be nice if we could have the WU limit increased and maybe the deadline decreased a bit so when things like this happen we can keep crunching. I'm using a Volta-based card and 80 WUs are gone in a couple of minutes.

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 542
Credit: 43,237,217
RAC: 106,753
Message 67767 - Posted: 4 Sep 2018, 17:59:56 UTC

Hey vseven,

We have to walk a fine line with the number of workunits we allow users to download and their deadlines. We have both CPUs and GPUs that we have to balance with vastly different work times. I think what we have now is a reasonable compromise, but I would be open to hearing your suggestions.

Jake

JohnDK
Joined: 18 Feb 10
Posts: 4
Credit: 11,047,976
RAC: 709
Message 67768 - Posted: 4 Sep 2018, 18:04:33 UTC
Last modified: 4 Sep 2018, 18:08:13 UTC

Just have to say I think Tuesday is a bad day for maintenance, since many SETI users have Milkyway as a backup project, and SETI also has maintenance on Tuesdays... (Also, maybe some Milkyway users have SETI as a backup.)

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 542
Credit: 43,237,217
RAC: 106,753
Message 67769 - Posted: 4 Sep 2018, 18:07:51 UTC

Hey JohnDK,

I had no idea their maintenance day was Tuesday. We just picked this day because it's one of the two days that Jeff is in the office. This won't be too common an occurrence, and we will consider switching to Thursdays. I just wanted to avoid being so close to the weekend when starting maintenance.

Jake

Jeffery M. Thompson
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 23 Sep 12
Posts: 156
Credit: 12,997,107
RAC: 0
Message 67770 - Posted: 4 Sep 2018, 21:10:12 UTC

The database is still updating; I am watching this through to completion. I expect the feeder to be dishing out units late tonight or early tomorrow morning.

gambatesa
Joined: 23 Feb 18
Posts: 7
Credit: 1,206,226,303
RAC: 2,198,285
Message 67771 - Posted: 5 Sep 2018, 9:25:47 UTC - in response to Message 67766.  

"It would be nice if we could have the WU limit increased and maybe the deadline decreased a bit so when things like this happen we can keep crunching. I'm using a Volta-based card and 80 WUs are gone in a couple of minutes."


80 workunits per GPU is really too few. If the server is down, you run out of work in less than half an hour (on a 7970). I understand that this could be reasonable for CPUs, but the "hardcore business" is done on GPUs.

Gator 1-3
Joined: 21 Dec 12
Posts: 3
Credit: 106,104,737
RAC: 1,385
Message 67772 - Posted: 5 Sep 2018, 12:28:30 UTC

Any update on the expected time for the maintenance to end? I have a computer with 72 WUs on it that needs to be reformatted today.

Wisesooth
Joined: 2 Oct 14
Posts: 39
Credit: 32,617,818
RAC: 17,213
Message 67773 - Posted: 5 Sep 2018, 14:29:26 UTC

IMHO, the database server seems to be the weakest link in your system. A DBMS is the most processor- and storage-intensive application in a system like this. RPI really needs a server with enough cores and solid-state memory to handle the throughput required to manage a grid computing environment, especially if the DBMS is enforcing referential integrity. Intel might give RPI some hardware help if they ask.
Yes, I know you think "Yes, we already know that." However, do the people who hold the purse know that? If they are not listening, maybe forwarding this message might get their attention. Overloaded servers break down at the most inconvenient times. The cost and time of RPI's most talented people should be considered in the total cost of ownership.

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 542
Credit: 43,237,217
RAC: 106,753
Message 67774 - Posted: 5 Sep 2018, 16:32:06 UTC

Hey Everyone,

Our database maintenance is coming to a close. We should be done by 5 pm today. It has been several months since we were last down for maintenance, so it is taking us a little while to clean everything up.

As far as future plans, it is actually within our budget to upgrade the server and we plan to do that within the next few months. Otherwise, we have a few maintenance periods planned in the upcoming weeks to help optimize the database. In the last few months, it has been running pretty smoothly, but we think we can continue to improve it.

Thank you all for your continued support.
Jake

[H]auntjemima
Joined: 6 Jul 18
Posts: 2
Credit: 113,931,934
RAC: 616,001
Message 67775 - Posted: 5 Sep 2018, 17:30:43 UTC

Thanks for the update, Jake!

Marsinph
Joined: 13 Nov 10
Posts: 12
Credit: 14,455,689
RAC: 82,999
Message 67776 - Posted: 5 Sep 2018, 17:36:49 UTC - in response to Message 67774.  

Thank you, Jake.
But I expect a lot of problems when everything starts up normally again.
I think there will be millions of WUs reported at the same time.
I hope it will not crash the server.
In fact, the limit of 80 WUs is not a bad idea.
It has already been a day: all my WUs are finished and I am unable to get any new ones.
I understand that sometimes a big cleanup is needed,
especially for a DB.
Thanks for the update.

wb8ili
Joined: 18 Jul 10
Posts: 63
Credit: 252,681,094
RAC: 137,067
Message 67777 - Posted: 5 Sep 2018, 21:56:12 UTC

Jake wrote -

"We have to walk a fine line with the number of workunits we allow users to download and their deadlines. We have both CPUs and GPUs that we have to balance with vastly different work times. I think what we have now is a reasonable compromise, but I would be open to hearing your suggestions."

I understand you have to vary the number of workunits a user can download. But the number should be based on the capabilities of the user's computer, not some arbitrary number (80) that implies that one number fits all, whether it refers to CPU or GPU workunits.

Your scheduling (workunit dispersal) algorithm knows everything about a user's computer (average computational time, number of invalid returns, up-time, etc.).

It shouldn't be that hard for someone at a prestigious university like RPI to figure out a more equitable way of dispersing workunits. Fast computers get more, slow computers get less, "bad actors" get few.

If my computer is returning valid results, and each workunit (GPU) takes 3 minutes, what is the problem with giving me 480 units (1 day), or 960 units (2 days), or more?

The algorithm, if properly done, should work for CPU and GPU workunits.
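
To make the arithmetic above concrete, here is a minimal sketch of a throughput-based quota. It is only an illustration of the idea, not the BOINC scheduler or the project's actual code: the function name `wu_quota` and its parameters (`avg_runtime_s`, `valid_fraction`, `buffer_days`, `floor`) are hypothetical. It assumes the server knows a host's average run time per workunit and the fraction of its results that validate.

```python
import math


def wu_quota(avg_runtime_s, valid_fraction, buffer_days=1.0, floor=1):
    """Hypothetical per-host workunit quota (not the BOINC scheduler's logic)."""
    if avg_runtime_s <= 0:
        raise ValueError("avg_runtime_s must be positive")
    if not 0.0 <= valid_fraction <= 1.0:
        raise ValueError("valid_fraction must be between 0 and 1")
    # Workunits the host can finish in one day of continuous crunching.
    per_day = 86400 / avg_runtime_s
    # Scale by the desired buffer length and by how reliable the host is.
    quota = math.floor(per_day * buffer_days * valid_fraction)
    # Never drop below the floor, so an unreliable host can still recover.
    return max(floor, quota)


print(wu_quota(180, 1.0))                  # 3-minute GPU workunits, 1 day: 480
print(wu_quota(180, 1.0, buffer_days=2))   # same host, 2 days: 960
print(wu_quota(3600, 1.0))                 # 1-hour CPU workunits, 1 day: 24
```

With the same rule, fast hosts get more, slow hosts get less, and a host with a low valid fraction is cut back toward the floor.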

Jeffery M. Thompson
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 23 Sep 12
Posts: 156
Credit: 12,997,107
RAC: 0
Message 67778 - Posted: 5 Sep 2018, 22:31:45 UTC

Work units are coming back in and the feeder should be serving them out again. I am monitoring. It will take a few hours for the load to balance; let me know if you see anything on your side as that processes through.

[H]auntjemima
Joined: 6 Jul 18
Posts: 2
Credit: 113,931,934
RAC: 616,001
Message 67779 - Posted: 6 Sep 2018, 1:06:11 UTC

I had 160 process through, but no more picked up. I'm sure it is being resolved.

Keith Myers
Joined: 24 Jan 11
Posts: 172
Credit: 106,319,162
RAC: 16,424
Message 67780 - Posted: 6 Sep 2018, 1:46:24 UTC

I got 34 but nothing more since then. I assume the transitioner being offline must have something to do with it.

Manfred Reiff
Joined: 27 Apr 18
Posts: 6
Credit: 34,172,885
RAC: 7,705
Message 67781 - Posted: 6 Sep 2018, 10:09:11 UTC

Milkyway@Home is working again but unfortunately I don't get any GPU workunits (no changes to settings).

mikey
Joined: 8 May 09
Posts: 2213
Credit: 250,022,407
RAC: 8
Message 67782 - Posted: 6 Sep 2018, 11:56:29 UTC

I have workunits for my PCs, but when I look at the workunits' status I see that A LOT of them have NOT been sent out to a wingman yet!! They say "unsent"; prior to this maintenance phase I had zero "unsent" tasks.

Gunnar Hjern
Joined: 14 Oct 16
Posts: 1
Credit: 12,455,074
RAC: 811
Message 67783 - Posted: 6 Sep 2018, 16:47:32 UTC - in response to Message 67782.  

Yes, I can confirm this!

I currently have 304 "in progress" on my different computers, but none of them seems to have a "wingman".

The same goes for the ones that I've completed and reported, and that are now in "Validation inconclusive".

My heap of "Validation inconclusive" is constantly growing, and is already at 267, while none of my reported tasks seems to be validated.
Credits totally stuck! :-(

What is happening???

//Gunnar

Saenger
Joined: 28 Aug 07
Posts: 130
Credit: 11,627,169
RAC: 8,836
Message 67784 - Posted: 6 Sep 2018, 21:06:33 UTC

Yep, all of my WUs are _0 as well, so it looks like _1 are kept behind for now. What went wrong after the restart of the machines?
Greetings from Sänger

©2019 Astroinformatics Group