server issues


Message boards : News : server issues

Sidd
Project developer
Project tester
Project scientist
Joined: 19 May 14
Posts: 60
Credit: 325,261
RAC: 1,011

Message 63664 - Posted: 3 Jun 2015, 18:47:39 UTC

Hey all,

We are currently having a few issues with the server. Everything seems to be up and running, yet no work units are being sent out. If any of you see any erroneous behavior on your end, please let us know.

Thanks,
Sidd

Richard Haselgrove
Joined: 4 Sep 12
Posts: 218
Credit: 448,778
RAC: 0

Message 63665 - Posted: 3 Jun 2015, 19:26:08 UTC

If we are - finally - to pay some attention to the server, could I remind you of three messages where I've posted about the BOINC server code being outdated?

Message 63188 - unfinished web update, corrupts < and > in [ pre ] and [ code ] blocks.
Message 63274 - PHP warning when 'don't move stickies to top' is selected.
BOINC message 62439 - recent ATI cards aren't recognised as being OpenCL capable.

And you'll know about the connection errors and timeouts since I started drafting the above.

trekkie0
Joined: 27 Mar 15
Posts: 1
Credit: 3,505,784
RAC: 0

Message 63666 - Posted: 3 Jun 2015, 19:31:03 UTC - in response to Message 63664.

I know I am not getting any work units. Other than that and the n-body issue, everything seems fine.

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 438
Credit: 9,896,245
RAC: 175,386

Message 63668 - Posted: 4 Jun 2015, 12:48:28 UTC

Hey guys,

Looks like we have the server getting some work units out again. As for the other issues, now that the spring semester is over, maybe I can get some help from Travis on fixing some of the persistent server issues.

The timeouts and connection errors were the result of us working on the server trying to unstick the runs. You can expect those to continue a bit as we try to fix the nbody runs, but after today they shouldn't happen as frequently.

Sorry these issues are taking so long to resolve.

Jake W.

PtrHurricane
Joined: 5 Aug 11
Posts: 1
Credit: 257,694
RAC: 2

Message 63671 - Posted: 4 Jun 2015, 15:53:33 UTC

It has been the same for days. I received the same work unit twice and completed it both times. Each time it sat at 100% for the longest time, still running and never becoming ready to report, so after about 16 hours I reset the project. Since the second reset, it has only given me a "communication deferred" message, no matter how many times I try to update the project.

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 438
Credit: 9,896,245
RAC: 175,386

Message 63672 - Posted: 4 Jun 2015, 17:56:10 UTC

The work units getting stuck is a different issue than what we are having with the server. No worries though, Sidd thinks he found the issue with the nbody client and will be working on a solution to that.

The server issue resulted in no work units being sent out for separation and nbody, but that issue has been resolved. There are still a few other unresolved issues on the server so expect a few restarts over the next few days.

Jake W.

Cohan
Joined: 2 May 15
Posts: 1
Credit: 87,313,200
RAC: 0

Message 63678 - Posted: 6 Jun 2015, 5:49:44 UTC

The problem is not solved. I didn't receive WUs for ATI just now and had to update the project. It's very unstable since the power outage.

Wisesooth
Joined: 2 Oct 14
Posts: 33
Credit: 19,809,053
RAC: 29,363

Message 63683 - Posted: 7 Jun 2015, 23:43:23 UTC - in response to Message 63668.
Last modified: 7 Jun 2015, 23:57:43 UTC

Something else seems strange. I am getting responses that some of my results (14) are inconclusive. I see no errors for my other BOINC project, and all results there are verified correct. Also, the cobblestones on my profile do not match what BOINC charts on my machine. According to BOINC, I have 2.02 billion cobblestones, yet my profile shows slightly less than 2.0 billion. That is a 2 percent difference. Now the count for this post in the signature area shows slightly over 2 billion. Thought you might like to know.

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 438
Credit: 9,896,245
RAC: 175,386

Message 63689 - Posted: 9 Jun 2015, 17:06:43 UTC

Hey Wisesooth,

I'm not sure what is causing your credit discrepancy. I think your profile on our website will only show the credits you earned through MilkyWay@Home while BOINC will add in credits from other projects. That is just a guess though and maybe someone else can give you more insight on that.

As for the inconclusive problem, that is standard operating procedure for our project. Our server does not award credits until after your results have been validated against the results of other users. Inconclusive just means the server is waiting to hear back from other users who are still crunching the workunit so it can be validated. As far as I understand, this is different than many other projects where they use different validation strategies.


Jake W.
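
The validation behavior Jake describes above can be sketched roughly as follows. This is only an illustration of the idea, not the project's or BOINC's actual validator code: the names `Result`, `validate`, `MIN_QUORUM`, and `REL_TOLERANCE`, and the values chosen for them, are all assumptions made for the example.

```python
from dataclasses import dataclass
from math import isclose

# Hypothetical settings for illustration; not the project's real values.
MIN_QUORUM = 2          # how many agreeing results are needed
REL_TOLERANCE = 1e-9    # how closely two results must match to "agree"


@dataclass
class Result:
    user: str
    value: float


def validate(results):
    """Return (status, credited_users) for one work unit.

    Until MIN_QUORUM results agree with each other, the work unit
    stays "inconclusive" and nobody is credited.
    """
    if len(results) < MIN_QUORUM:
        # Still waiting to hear back from other users.
        return ("inconclusive", [])

    # Look for a group of at least MIN_QUORUM results that agree.
    for candidate in results:
        agreeing = [
            r for r in results
            if isclose(r.value, candidate.value, rel_tol=REL_TOLERANCE)
        ]
        if len(agreeing) >= MIN_QUORUM:
            return ("valid", [r.user for r in agreeing])

    # Enough results came back, but no two of them match yet.
    return ("inconclusive", [])
```

With a single returned result, `validate` reports "inconclusive"; once a second, matching result arrives, both users are credited. A result that never matches any other is left out of the credited list, which is roughly the situation a validate error reflects.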

Frank J Curtis
Joined: 12 Feb 11
Posts: 1
Credit: 924,143
RAC: 0

Message 63699 - Posted: 10 Jun 2015, 21:47:11 UTC

For the last several weeks I have seen projects listed as "100% complete". Why do you insist on listing completed projects?

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 438
Credit: 9,896,245
RAC: 175,386

Message 63704 - Posted: 12 Jun 2015, 14:36:12 UTC

Hey Richard,

I am currently looking into updating our BOINC libraries for both the client and server. I think this will fix many of the issues people have been running into, both the errors you mentioned here and errors mentioned in other places on the forum. I don't know how long it will take us to get these libraries updated, but it is being looked into.

For the sake of some transparency, the major hurdle we have is that the current libraries we use were customized for our project. This means simply building the newest version of the BOINC libraries and deploying that on the server and client will likely cause other unforeseen problems. Hopefully we can get a stable version working soon though.

Sorry for the delayed response.

Jake W.

metamorphoses
Joined: 8 Oct 10
Posts: 1
Credit: 4,986,575
RAC: 1,302

Message 63705 - Posted: 12 Jun 2015, 16:01:56 UTC - in response to Message 63699.

same issue here.
"job" states 100% complete:
'stuck',
no change in last 3-days;
i suspended it so that i could free-up resources for other "idle-work-projects" on this machine.
no other MW tasks received.

mikey
Joined: 8 May 09
Posts: 2032
Credit: 180,428,585
RAC: 281,183

Message 63706 - Posted: 13 Jun 2015, 11:08:47 UTC - in response to Message 63705.

> same issue here.
> "job" states 100% complete:
> 'stuck',
> no change in last 3-days;
> i suspended it so that i could free-up resources for other "idle-work-projects" on this machine.
> no other MW tasks received.


As long as it is suspended you won't get other work as MW thinks you already have plenty. Personally after 1 day of no progress I would have aborted the unit and moved on to another one. Crunching is one thing, wasting time is another.

mikey
Joined: 8 May 09
Posts: 2032
Credit: 180,428,585
RAC: 281,183

Message 63707 - Posted: 13 Jun 2015, 11:17:20 UTC - in response to Message 63683.

> Something else seems strange. I am getting responses that some of my results (14) are inconclusive. I see no errors for my other BOINC project and all results are verified correct.


The only 2 errors I can see now (I am just a cruncher, not an admin here) are validation errors. You can't compare one BOINC project to another, as each project writes its own application files and each has its own set of priorities and things it is looking for. This project could be much more picky about the details than your other project.

> Also, the cobblestones on my profile do not match what BOINC charts on my machine. According to BOINC, I have 2.02 billion cobblestones, yet my profile shows slightly less than 2.0 billion. That is a 2 percent difference. Now the count for this post in the signature area shows slightly over 2 billion. Thought you might like to know.


Try this page for your stats:
http://stats.free-dc.org/stats.php?page=userbycpid&cpid=da1cb0a1901cbb23c6241de969db356e

It shows you at 2.8 million combined cobblestones, with MW at just over 2 million.

SLRE
Joined: 26 Jan 09
Posts: 12
Credit: 21,914,416
RAC: 22,165

Message 63730 - Posted: 16 Jun 2015, 20:57:37 UTC - in response to Message 63664.

Seem to have a lot of 'validation inconclusive' nvidia opencl units - about 50 over the last couple of days since the wu's started coming in again. Not sure whether that's an anomaly, but it seems unusual enough to let you guys know ...

swiftmallard
Joined: 18 Jul 09
Posts: 289
Credit: 302,980,648
RAC: 0

Message 63731 - Posted: 16 Jun 2015, 21:21:22 UTC - in response to Message 63730.

> Seem to have a lot of 'validation inconclusive' nvidia opencl units - about 50 over the last couple of days since the wu's started coming in again. Not sure whether that's an anomaly, but it seems unusual enough to let you guys know ...

This is normal. The results are simply waiting to be confirmed by other crunchers.

SLRE
Joined: 26 Jan 09
Posts: 12
Credit: 21,914,416
RAC: 22,165

Message 63734 - Posted: 17 Jun 2015, 11:03:17 UTC - in response to Message 63731.

Ermm...pretty much the whole batch shows as 'validate errors'. In fact everything back to the server dropout shows as a validation error. All but two out of 188 tasks show as validation errors. That seems much less normal ...

To be fair, I put a new card in a week or so back - a new Asus GTX 750. But that model ran fine on another machine and is handling SETI with no problems.

Comments?

swiftmallard
Joined: 18 Jul 09
Posts: 289
Credit: 302,980,648
RAC: 0

Message 63737 - Posted: 17 Jun 2015, 19:49:28 UTC - in response to Message 63734.

>>> Seem to have a lot of 'validation inconclusive' nvidia opencl units - about 50 over the last couple of days since the wu's started coming in again. Not sure whether that's an anomaly, but it seems unusual enough to let you guys know ...

>> This is normal. The results are simply waiting to be confirmed by other crunchers.

> Ermm...pretty much the whole batch shows as 'validate errors'. In fact everything back to the server dropout shows as a validation error. All but two out of 188 tasks show as validation errors. That seems much less normal ...
>
> To be fair, I put a new card in a week or so back - a new Asus GTX 750. But that model ran fine on another machine and is handling SETI with no problems.
>
> Comments?


You asked about "validation inconclusive". Validate errors are a different thing and can be caused by any number of issues. Sit tight and one of the crunchers who knows far more about nvidia than I do will be able to assist you.

Wisesooth
Joined: 2 Oct 14
Posts: 33
Credit: 19,809,053
RAC: 29,363

Message 63739 - Posted: 17 Jun 2015, 23:47:06 UTC - in response to Message 63689.

Thanks for the info, Jake. You asked about server issues at our end following the scheduled power outage. Earlier today, I saw that user update requests were not handling tasks ready to report. I checked the "server status" button and found that about 7 or so server tasks were down with errors. All of them seemed to be related to database corruption in MySQL. I suspect that you may have corrupted indexes. I know little about the "secret sauce" that lubricates the grid, but I have had some past experiences with MySQL that were not very pretty.

The servers may think they are running, but not be aware that they are running in circles. Seti@home had a similar problem months ago. They had to rebuild their database from scratch.

Hope this is helpful.

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 438
Credit: 9,896,245
RAC: 175,386

Message 63740 - Posted: 18 Jun 2015, 12:22:59 UTC

Hey Wisesooth,

That was me doing some server maintenance. I put up some new runs on Tuesday using some obscure settings to see if they helped get the runs to finish faster. Turns out they broke the work unit generator for modfit, so that was just me rebooting everything to get it running again. Sorry, maybe I should have made a news post.

Jake W.



Copyright © 2017 AstroInformatics Group