Welcome to MilkyWay@home

Server can't open database?

Message boards : Number crunching : Server can't open database?

Lord Tedric
Joined: 9 Nov 07
Posts: 151
Credit: 8,391,608
RAC: 0
Message 30649 - Posted: 13 Sep 2009, 10:03:44 UTC

What's this all about?

13/09/2009 11:01:18 Milkyway@home update requested by user
13/09/2009 11:01:19 Milkyway@home Sending scheduler request: Requested by user.
13/09/2009 11:01:19 Milkyway@home Reporting 20 completed tasks, requesting new tasks
13/09/2009 11:01:24 Milkyway@home Scheduler request completed: got 0 new tasks
13/09/2009 11:01:24 Milkyway@home Message from server: Server can't open database

Then the update backs off for 60 minutes.

Ross*
Joined: 17 May 09
Posts: 22
Credit: 161,135,083
RAC: 0
Message 30651 - Posted: 13 Sep 2009, 10:07:45 UTC - in response to Message 30649.  

Hi guys
Why is the server not able to open the database?
I have 100 WUs waiting.
The server seems fine, but someone is not looking at what's going on.
Ross

mrchips
Joined: 31 Oct 10
Posts: 15
Credit: 281,009,768
RAC: 1
Message 68076 - Posted: 28 Jan 2019, 18:34:48 UTC

I see this has been happening for over 9 years. Can't anyone figure out and fix the root cause?
It happened again this morning.

Dylan
Joined: 27 Dec 12
Posts: 2
Credit: 3,150,504
RAC: 0
Message 68079 - Posted: 30 Jan 2019, 16:24:16 UTC - in response to Message 68076.  

I see this has been happening for over 9 years. Can't anyone figure out and fix the root cause?
It happened again this morning.


I see it, too. I don't know what's wrong; it always seems to fix itself. I would just make sure you have enough work to get through it, and it shouldn't be an issue.

mikey
Joined: 8 May 09
Posts: 3339
Credit: 524,010,781
RAC: 0
Message 68080 - Posted: 31 Jan 2019, 0:14:30 UTC - in response to Message 68076.  

I see this has been happening for over 9 years. Can't anyone figure out and fix the root cause?
It happened again this morning.


I always assumed it was some process running, like a backup, cleanup, or whatever, that needed to close the database. But you are right: every morning at about the same time it goes offline.

Tackleway
Joined: 17 Mar 10
Posts: 20
Credit: 5,641,904
RAC: 0
Message 68091 - Posted: 1 Feb 2019, 21:39:00 UTC - in response to Message 68080.  

I see this has been happening for over 9 years. Can't anyone figure out and fix the root cause?
It happened again this morning.


I always assumed it was some process running, like a backup, cleanup, whatever that needed to close the database. But you are right every morning about the same time it goes off line.


But three hours a day...every day! It's long overdue for some explanation if not a resolution.

Jim1348
Joined: 9 Jul 17
Posts: 100
Credit: 16,967,906
RAC: 0
Message 68126 - Posted: 9 Feb 2019, 16:08:51 UTC - in response to Message 68091.  

But three hours a day...every day! It's long overdue for some explanation if not a resolution.

I also think an explanation is in order. But I am now putting my RX 570 on Folding. They are willing to give you work, and it is useful science.

Ged
Joined: 22 Apr 09
Posts: 6
Credit: 10,757,781
RAC: 0
Message 68136 - Posted: 11 Feb 2019, 8:30:18 UTC

These DB outages are bad enough on the Milkyway project, but they also impact other projects I run on my machines. When BOINC Manager issues an update request for Milkyway@Home, either to report completed work or because I asked for an update against the project, BOINC puts that request on a sequential stack. If other running projects trigger a status update while the Milkyway request is being processed and Milkyway's DB is down, then, because it takes an extraordinarily long time just to say "it's broken", those other project requests can't be serviced until Milkyway issues a reply.

So can Milkyway project admins please stop ignoring this problem and put some technical effort into operational efficiency improvements at the brief expense of the cascade of application development? I'm sure it will be worth the effort.

Ged

Keith Myers
Joined: 24 Jan 11
Posts: 715
Credit: 556,162,116
RAC: 54,690
Message 68139 - Posted: 11 Feb 2019, 17:55:40 UTC

You can change the connection timeout to something shorter than stock in cc_config.xml so that it gives up faster on MW and can then service your other projects.

<http_transfer_timeout>90</http_transfer_timeout>

But as you state, it would be best if the project fixed its server database unavailability problem as soon as possible.
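
For anyone unsure where that line goes, here is a minimal sketch of a complete cc_config.xml. It assumes you have no existing cc_config.xml and that 90 seconds is the value you want. As far as I know the option belongs inside the <options> section, and the file sits in the BOINC data directory; if you already have a cc_config.xml, merge the option into it rather than replacing the file.

```xml
<!-- cc_config.xml: sketch only, not a drop-in file; merge with any existing settings -->
<cc_config>
  <options>
    <!-- timeout in seconds, shortened from the stock value -->
    <http_transfer_timeout>90</http_transfer_timeout>
  </options>
</cc_config>
```

The client should pick the change up after a restart, or after asking the Manager to re-read its config files.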

Ged
Joined: 22 Apr 09
Posts: 6
Credit: 10,757,781
RAC: 0
Message 68146 - Posted: 12 Feb 2019, 9:00:05 UTC - in response to Message 68139.  

Thanks Keith.

I get what you're suggesting but it would bother me that cc_config.xml is a global configuration item for cpu-bound projects; nvc-config.xml would have to be set to cover gpu-bound apps, too.

Also, I thought that <http_transfer_timeout> has to do with project file transfers between client and server: if a transfer is taking longer than the value you specify (default is 300 seconds...), then 'Abort' the transfer. This, along with the <http_transfer_timeout_bps> configuration item, which lets you specify a minimum acceptable data transfer rate, allows those using dial-up modem connections (nostalgia!!!) to not waste call time on an unacceptably slow connection. For example:

<http_transfer_timeout>90</http_transfer_timeout>
<http_transfer_timeout_bps>9600</http_transfer_timeout_bps>

This would tell the client to abort the file transfer if the effective connection speed stayed below 9600 bits per second for 90 seconds.

I'm still of the mind that the Milkyway project team need to fix the fundamental issue at their end ;-)

Ged

Tackleway
Joined: 17 Mar 10
Posts: 20
Credit: 5,641,904
RAC: 0
Message 68150 - Posted: 12 Feb 2019, 18:32:21 UTC

Sorry, but I'm disappointed with the project admins ignoring this issue, so I'll run down the last few work units I've got left and leave MW for now.

Keith Myers
Joined: 24 Jan 11
Posts: 715
Credit: 556,162,116
RAC: 54,690
Message 68152 - Posted: 13 Feb 2019, 0:12:12 UTC

I tried to reply 5 hours earlier but the website went down while composing.

As far as I can tell the transfer timeout is watching the handshake between the client and the scheduler. If the client doesn't get an acknowledgment from the scheduler in the allotted time, then abort the connection attempt. It doesn't put a clock on any actual transfer of tasks. This is what I observe.

Set http_debug in your Event Log options to see all the handshake communications on a scheduler connection.
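
If you would rather edit the file by hand than use the Event Log options dialog, the same flag can be set in cc_config.xml. This is a minimal sketch, assuming no other log flags are set; to my knowledge the flag sits in the <log_flags> section. Turn it back off afterwards, because it makes the Event Log very chatty.

```xml
<!-- cc_config.xml: sketch only; merge with any existing settings -->
<cc_config>
  <log_flags>
    <!-- 1 = write HTTP debugging output to the Event Log, 0 = off -->
    <http_debug>1</http_debug>
  </log_flags>
</cc_config>
```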

Ged
Joined: 22 Apr 09
Posts: 6
Credit: 10,757,781
RAC: 0
Message 68156 - Posted: 13 Feb 2019, 8:04:14 UTC - in response to Message 68152.  

Thanks again Keith.

Ged

Joseph Stateson
Joined: 18 Nov 08
Posts: 291
Credit: 2,461,693,501
RAC: 0
Message 68198 - Posted: 4 Mar 2019, 17:12:40 UTC - in response to Message 68150.  

Sorry! but I'm disappointed with the project admins ignoring this issue so I'll run down the last few work units I've got left and will leave MW for now.


I noticed that both SETI and Milkyway are no longer on the Gridcoin whitelist. Both went on the "Excluded Projects" list in March.
https://gridcoinstats.eu/project
With Collatz excluded since February, that leaves only Amicable Numbers as the remaining ATI GPU-capable project that still pays rewards.

Vortac
Joined: 22 Apr 09
Posts: 95
Credit: 4,808,181,963
RAC: 0
Message 68200 - Posted: 4 Mar 2019, 22:05:52 UTC - in response to Message 68198.  

I noticed that both seti and milkyway are no longer on the gridcoin whitelist. Both went on the "Excluded Projects" list in March

Those are just occasional glitches. Milkyway and SETI are back already. Collatz will be back soon too, I guess.

rcthardcore
Joined: 30 Dec 08
Posts: 30
Credit: 6,999,702
RAC: 0
Message 68213 - Posted: 6 Mar 2019, 21:16:51 UTC - in response to Message 68152.  

Doing what you suggested doesn't actually fix the problem. It is clearly a problem on the server side, not ours. No matter what I set the timeout to, the problem still happens.