Server can't open database?

Lord Tedric
Joined: 9 Nov 07
Posts: 151
Credit: 8,391,608
RAC: 0

Message 30649 - Posted: 13 Sep 2009, 10:03:44 UTC

What's this all about?

13/09/2009 11:01:18 Milkyway@home update requested by user
13/09/2009 11:01:19 Milkyway@home Sending scheduler request: Requested by user.
13/09/2009 11:01:19 Milkyway@home Reporting 20 completed tasks, requesting new tasks
13/09/2009 11:01:24 Milkyway@home Scheduler request completed: got 0 new tasks
13/09/2009 11:01:24 Milkyway@home Message from server: Server can't open database

Then the update backs off for 60 minutes.
____________

Ross*
Joined: 17 May 09
Posts: 22
Credit: 160,965,535
RAC: 0

Message 30651 - Posted: 13 Sep 2009, 10:07:45 UTC - in response to Message 30649.

Hi guys
Why is the server not able to open the database?
I have 100 WUs waiting
The server seems fine, but someone is not looking at what's going on.
Ross

____________

mrchips
Joined: 31 Oct 10
Posts: 2
Credit: 2,212,156
RAC: 41,095

Message 68076 - Posted: 28 Jan 2019, 18:34:48 UTC

I see this has been happening for over 9 years. Can't anyone figure out and fix the root cause?
It happened again this morning.

Dylan
Joined: 27 Dec 12
Posts: 2
Credit: 203,943
RAC: 1,243

Message 68079 - Posted: 30 Jan 2019, 16:24:16 UTC - in response to Message 68076.

> I see this has been happening for over 9 years. Can't anyone figure out and fix the root cause?
> It happened again this morning.


I see it, too. Idk what's wrong, it seems to always fix itself. I would just ensure you have enough work to get through it, and it shouldn't be an issue.

mikey
Joined: 8 May 09
Posts: 2212
Credit: 250,022,407
RAC: 96

Message 68080 - Posted: 31 Jan 2019, 0:14:30 UTC - in response to Message 68076.

> I see this has been happening for over 9 years. Can't anyone figure out and fix the root cause?
> It happened again this morning.


I always assumed it was some process running, like a backup, cleanup, whatever that needed to close the database. But you are right every morning about the same time it goes off line.

Tackleway
Joined: 17 Mar 10
Posts: 17
Credit: 4,989,899
RAC: 796

Message 68091 - Posted: 1 Feb 2019, 21:39:00 UTC - in response to Message 68080.

>> I see this has been happening for over 9 years. Can't anyone figure out and fix the root cause?
>> It happened again this morning.
>
> I always assumed it was some process running, like a backup, cleanup, whatever that needed to close the database. But you are right every morning about the same time it goes off line.


But three hours a day...every day! It's long overdue for some explanation if not a resolution.
____________

Jim1348
Joined: 9 Jul 17
Posts: 45
Credit: 2,315,360
RAC: 8,940

Message 68126 - Posted: 9 Feb 2019, 16:08:51 UTC - in response to Message 68091.

> But three hours a day...every day! It's long overdue for some explanation if not a resolution.

I also think an explanation is in order. But I am now putting my RX 570 on Folding. They are willing to give you work, and it is useful science.

Ged
Joined: 22 Apr 09
Posts: 6
Credit: 10,757,781
RAC: 333

Message 68136 - Posted: 11 Feb 2019, 8:30:18 UTC

These DB outages are bad enough on the Milkyway project, but they also impact other projects I run on my machines. When BOINC Manager issues an update request for Milkyway@Home, either to report completed work or because I asked for an update against the project, BOINC puts that request on a sequential stack. If other running projects trigger a status update while the Milkyway request is being processed and Milkyway's DB is down, then, because it seems to take an extraordinarily long time just to say "it's broken", those other projects' requests can't be serviced until Milkyway issues a reply.

So can Milkyway project admins please stop ignoring this problem and put some technical effort into operational efficiency improvements at the brief expense of the cascade of application development? I'm sure it will be worth the effort.

Ged

Keith Myers
Joined: 24 Jan 11
Posts: 169
Credit: 105,790,734
RAC: 23,841

Message 68139 - Posted: 11 Feb 2019, 17:55:40 UTC

You can change the connection timeout to something shorter than stock in cc_config.xml so that it gives up faster on MW and can then service your other projects.

<http_transfer_timeout>90</http_transfer_timeout>

But as you state, it would be best if the project fixed its server database unavailability problem as soon as possible.
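For context, here is a minimal sketch of where that element would sit in a complete cc_config.xml. It assumes the usual BOINC layout, in which client settings go inside an <options> section; the value 90 is the one suggested above, and the rest of the file is illustrative rather than a copy of anyone's actual configuration.

```xml
<!-- cc_config.xml, placed in the BOINC data directory -->
<cc_config>
  <options>
    <!-- give up on a stalled HTTP operation after 90 seconds instead of the stock value -->
    <http_transfer_timeout>90</http_transfer_timeout>
  </options>
</cc_config>
```

The client reads this file at startup. In recent BOINC Manager versions it can usually also be reloaded from the Options menu ("Read config files"), though the exact menu name may differ between versions.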
____________

Ged
Joined: 22 Apr 09
Posts: 6
Credit: 10,757,781
RAC: 333

Message 68146 - Posted: 12 Feb 2019, 9:00:05 UTC - in response to Message 68139.

Thanks Keith.

I get what you're suggesting, but it would bother me that cc_config.xml is a global configuration item for CPU-bound projects; nvc-config.xml would have to be set to cover GPU-bound apps, too.

Also, I thought that <http_transfer_timeout> has to do with project file transfers between client and server: if a transfer takes longer than the value you specify (the default is 300 seconds), the client aborts it. Together with the <http_transfer_timeout_bps> configuration item, which lets you specify a minimum acceptable data transfer rate, it lets those using dial-up modem connections (nostalgia!!!) avoid wasting call time on an unacceptably slow connection. For example:

<http_transfer_timeout>90</http_transfer_timeout>
<http_transfer_timeout_bps>9600</http_transfer_timeout_bps>

This would tell the client to abort the file transfer if the effective connection speed stayed below 9600 bits per second for 90 seconds.
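Put together in one file, the pair of settings discussed here might look like the sketch below. It assumes both elements belong in the <options> section of cc_config.xml; the values are the illustrative ones from this post, not recommendations.

```xml
<cc_config>
  <options>
    <!-- time window, in seconds, over which the transfer is judged -->
    <http_transfer_timeout>90</http_transfer_timeout>
    <!-- minimum acceptable rate, in bits per second, during that window -->
    <http_transfer_timeout_bps>9600</http_transfer_timeout_bps>
  </options>
</cc_config>
```

Because both settings live in the client-wide options, they would apply to every attached project rather than to Milkyway alone.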

I'm still of the mind that the Milkyway project team need to fix the fundamental issue at their end ;-)

Ged

Tackleway
Joined: 17 Mar 10
Posts: 17
Credit: 4,989,899
RAC: 796

Message 68150 - Posted: 12 Feb 2019, 18:32:21 UTC

Sorry, but I'm disappointed that the project admins are ignoring this issue, so I'll run down the last few work units I've got left and leave MW for now.
____________

Keith Myers
Joined: 24 Jan 11
Posts: 169
Credit: 105,790,734
RAC: 23,841

Message 68152 - Posted: 13 Feb 2019, 0:12:12 UTC

I tried to reply 5 hours earlier but the website went down while composing.

As far as I can tell, the transfer timeout watches the handshake between the client and the scheduler. If the client doesn't get an acknowledgment from the scheduler in the allotted time, it aborts the connection attempt. It doesn't put a clock on any actual transfer of tasks. This is what I observe.

Set http_debug in your Event Log options to see all the handshake communications on a scheduler connection.
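For anyone who prefers editing the file to using the Event Log options dialog, the same flag can apparently be set in the <log_flags> section of cc_config.xml. A minimal sketch, assuming no other flags or options are needed:

```xml
<cc_config>
  <log_flags>
    <!-- log the HTTP exchanges, including scheduler requests, to the Event Log -->
    <http_debug>1</http_debug>
  </log_flags>
</cc_config>
```

The output is verbose, so it is probably worth setting the flag back to 0 once the scheduler exchange has been captured.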
____________

Ged
Joined: 22 Apr 09
Posts: 6
Credit: 10,757,781
RAC: 333

Message 68156 - Posted: 13 Feb 2019, 8:04:14 UTC - in response to Message 68152.

Thanks again Keith.

Ged



Copyright © 2019 AstroInformatics Group