Welcome to MilkyWay@home

server issues


Message boards : News : server issues

Sidd
Project developer
Project tester
Project scientist
Joined: 19 May 14
Posts: 73
Credit: 356,131
RAC: 0
100 thousand credit badge · 4 year member badge
Message 63664 - Posted: 3 Jun 2015, 18:47:39 UTC

Hey all,

We are currently having a few issues with the server. Everything seems to be up and running yet no work units are being sent out. If any of you see any erroneous behavior on your end please let us know.

Thanks,
Sidd
ID: 63664 · Rating: 0

Richard Haselgrove
Joined: 4 Sep 12
Posts: 219
Credit: 448,778
RAC: 0
100 thousand credit badge · 6 year member badge
Message 63665 - Posted: 3 Jun 2015, 19:26:08 UTC

If we are - finally - to pay some attention to the server, could I remind you of three messages where I've posted about the BOINC server code being outdated?

Message 63188 - unfinished web update, corrupts < and > in [ pre ] and [ code ] blocks.
Message 63274 - php warning when 'don't move stickies to top' is selected.
BOINC message 62439 - recent ATI cards aren't recognised as being OpenCL capable.

And you'll know about the connection errors and timeouts since I started drafting the above.
ID: 63665 · Rating: 0

trekkie0
Joined: 27 Mar 15
Posts: 1
Credit: 3,639,053
RAC: 9,148
3 million credit badge · 4 year member badge
Message 63666 - Posted: 3 Jun 2015, 19:31:03 UTC - in response to Message 63664.  

I know I am not getting any work units. Other than that and the n-body issue everything seems fine.
ID: 63666 · Rating: 0

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 579
Credit: 58,996,321
RAC: 427,313
50 million credit badge · 6 year member badge · extraordinary contributions badge
Message 63668 - Posted: 4 Jun 2015, 12:48:28 UTC

Hey guys,

Looks like we have the server getting some work units out again. As for the other issues, now that the spring semester is over maybe I can get some help from Travis on fixing some of the persistent server issues.

The timeouts and connection errors were the result of us working on the server trying to unstick the runs. You can expect those to continue a bit as we try to fix the nbody runs, but after today they shouldn't happen as frequently.

Sorry these issues are taking so long to resolve.

Jake W.
ID: 63668 · Rating: 0

PtrHurricane
Joined: 5 Aug 11
Posts: 1
Credit: 1,250,812
RAC: 223
1 million credit badge · 7 year member badge
Message 63671 - Posted: 4 Jun 2015, 15:53:33 UTC

It has been the same for days. I received the same work unit and completed it twice. Each time it sat at 100% for a long time, still running and never showing "ready to report", so after about 16 hours I reset the project. After the second reset, all I get is a "communication deferred" message, no matter how many times I try to update the project.
ID: 63671 · Rating: 0

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 579
Credit: 58,996,321
RAC: 427,313
50 million credit badge · 6 year member badge · extraordinary contributions badge
Message 63672 - Posted: 4 Jun 2015, 17:56:10 UTC

The work units getting stuck is a different issue than what we are having with the server. No worries though, Sidd thinks he found the issue with the nbody client and will be working on a solution to that.

The server issue resulted in no work units being sent out for separation and nbody, but that issue has been resolved. There are still a few other unresolved issues on the server so expect a few restarts over the next few days.

Jake W.
ID: 63672 · Rating: 0

Cohan
Joined: 2 May 15
Posts: 1
Credit: 87,313,200
RAC: 0
50 million credit badge · 3 year member badge
Message 63678 - Posted: 6 Jun 2015, 5:49:44 UTC

The problem is not solved. I didn't receive any WUs for ATI just now and had to update the project. It has been very unstable since the power outage.
ID: 63678 · Rating: 0

Wisesooth
Joined: 2 Oct 14
Posts: 39
Credit: 33,698,157
RAC: 29,374
30 million credit badge · 4 year member badge
Message 63683 - Posted: 7 Jun 2015, 23:43:23 UTC - in response to Message 63668.  
Last modified: 7 Jun 2015, 23:57:43 UTC

Something else seems strange. I am getting responses that some of my results (14) are inconclusive. I see no errors for my other BOINC project and all results are verified correct. Also, the cobblestones on my profile do not match what BOINC charts on my machine. According to BOINC, I have 2.02 billion cobblestones, yet my profile shows slightly less than 2.0 billion. That is a 2 percent difference. Now the count for this post in the signature area shows slightly over 2 billion. Thought you might like to know.
ID: 63683 · Rating: 0

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 579
Credit: 58,996,321
RAC: 427,313
50 million credit badge · 6 year member badge · extraordinary contributions badge
Message 63689 - Posted: 9 Jun 2015, 17:06:43 UTC

Hey Wisesooth,

I'm not sure what is causing your credit discrepancy. I think your profile on our website will only show the credits you earned through MilkyWay@Home while BOINC will add in credits from other projects. That is just a guess though and maybe someone else can give you more insight on that.

As for the inconclusive problem, that is standard operating procedure for our project. Our server does not award credits until after your results have been validated against the results of other users. Inconclusive just means the server is waiting to hear back from other users who are still crunching the workunit so it can be validated. As far as I understand, this is different than many other projects where they use different validation strategies.
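As a rough illustration of the idea described above, here is a minimal sketch of quorum-style validation. It is not the project's actual validator: the quorum size of 2, the numeric tolerance, and the names `Result` and `validation_state` are all assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical values for illustration only; the real project settings
# are not stated anywhere in this thread.
MIN_QUORUM = 2
TOLERANCE = 1e-9


@dataclass
class Result:
    host: str
    value: float  # stand-in for whatever number a work unit computes


def validation_state(results: List[Result]) -> str:
    """Return 'inconclusive' until enough results agree, then 'valid'."""
    if len(results) < MIN_QUORUM:
        # Still waiting to hear back from other hosts.
        return "inconclusive"
    # For each result, count how many results (itself included) match it
    # within the tolerance, and keep the largest such count.
    best_agreement = max(
        sum(1 for other in results if abs(other.value - r.value) <= TOLERANCE)
        for r in results
    )
    return "valid" if best_agreement >= MIN_QUORUM else "inconclusive"
```

In this sketch a single returned result stays "inconclusive" simply because nothing has arrived to compare it against, which matches the waiting behavior described above. No credit would be granted until the state becomes "valid".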


Jake W.
ID: 63689 · Rating: 0

Frank J Curtis
Joined: 12 Feb 11
Posts: 1
Credit: 972,109
RAC: 0
500 thousand credit badge · 8 year member badge
Message 63699 - Posted: 10 Jun 2015, 21:47:11 UTC

For the last several weeks I have seen projects listed as "100% complete". Why do you insist on listing completed projects?
ID: 63699 · Rating: 0

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 579
Credit: 58,996,321
RAC: 427,313
50 million credit badge · 6 year member badge · extraordinary contributions badge
Message 63704 - Posted: 12 Jun 2015, 14:36:12 UTC

Hey Richard,

I am currently looking into updating our BOINC libraries for both the client and server. I think this will fix many of the issues people have been running into, both the errors you mentioned here and errors mentioned in other places on the forum. I don't know how long it will take us to get these libraries updated, but it is being looked into.

For the sake of some transparency, the major hurdle we have is that the current libraries we use were customized for our project. This means simply building the newest version of the BOINC libraries and deploying that on the server and client will likely cause other unforeseen problems. Hopefully we can get a stable version working soon though.

Sorry for the delayed response.

Jake W.
ID: 63704 · Rating: 0

metamorphoses
Joined: 8 Oct 10
Posts: 1
Credit: 7,668,304
RAC: 5,322
5 million credit badge · 8 year member badge
Message 63705 - Posted: 12 Jun 2015, 16:01:56 UTC - in response to Message 63699.  

same issue here.
"job" states 100% complete:
'stuck',
no change in last 3-days;
i suspended it so that i could free-up resources for other "idle-work-projects" on this machine.
no other MW tasks received.
ID: 63705 · Rating: 0

mikey
Joined: 8 May 09
Posts: 2220
Credit: 252,807,544
RAC: 99,558
200 million credit badge · 9 year member badge · extraordinary contributions badge
Message 63706 - Posted: 13 Jun 2015, 11:08:47 UTC - in response to Message 63705.  

> same issue here.
> "job" states 100% complete:
> 'stuck',
> no change in last 3-days;
> i suspended it so that i could free-up resources for other "idle-work-projects" on this machine.
> no other MW tasks received.


As long as it is suspended you won't get other work as MW thinks you already have plenty. Personally after 1 day of no progress I would have aborted the unit and moved on to another one. Crunching is one thing, wasting time is another.
ID: 63706 · Rating: 0

mikey
Joined: 8 May 09
Posts: 2220
Credit: 252,807,544
RAC: 99,558
200 million credit badge · 9 year member badge · extraordinary contributions badge
Message 63707 - Posted: 13 Jun 2015, 11:17:20 UTC - in response to Message 63683.  

> Something else seems strange. I am getting responses that some of my results (14) are inconclusive. I see no errors for my other BOINC project and all results are verified correct.


The only 2 errors I can see now (I am just a cruncher, not an admin here) are validation errors. You can't compare one BOINC project to another, as each project writes its own application files and each has its own set of priorities and things it is looking for. This project could be much pickier about the details than your other project.

> Also, the cobblestones on my profile do not match what BOINC charts on my machine. According to BOINC, I have 2.02 billion cobblestones, yet my profile shows slightly less than 2.0 billion. That is a 2 percent difference. Now the count for this post in the signature area shows slightly over 2 billion. Thought you might like to know.


Try this page for your stats:
http://stats.free-dc.org/stats.php?page=userbycpid&cpid=da1cb0a1901cbb23c6241de969db356e

It shows you at 2.8 million combined cobblestones, with MW at just over 2 million.
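
To make the gap between a combined total and a single project's total concrete, here is a small sketch of the arithmetic. It uses the two rounded figures quoted just above, and it rounds "just over 2 million" down to exactly 2,000,000, which is an assumption. The helper name `percent_difference` is made up for this example.

```python
def percent_difference(larger: float, smaller: float) -> float:
    """How far 'smaller' falls below 'larger', as a percentage of 'larger'."""
    return (larger - smaller) / larger * 100.0


# Rounded figures from the post above, in cobblestones (credits).
combined_all_projects = 2_800_000
milkyway_only = 2_000_000  # assumption: "just over 2 million" rounded down

# Credit that must come from projects other than MilkyWay@home.
other_projects = combined_all_projects - milkyway_only
gap = percent_difference(combined_all_projects, milkyway_only)

print(f"credit from other projects: {other_projects:,}")
print(f"MilkyWay@home total is {gap:.1f}% below the combined total")
```

If a client-side chart sums credit across every attached project while a project profile page counts only that project, a mismatch of this kind is what one would expect to see. Whether that is what happened in this case is not confirmed in the thread.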
ID: 63707 · Rating: 0

SLRE
Joined: 26 Jan 09
Posts: 12
Credit: 31,798,538
RAC: 24,547
30 million credit badge · 10 year member badge
Message 63730 - Posted: 16 Jun 2015, 20:57:37 UTC - in response to Message 63664.  

Seem to have a lot of 'validation inconclusive' nvidia opencl units - about 50 over the last couple of days since the wu's started coming in again. Not sure whether that's an anomaly, but it seems unusual enough to let you guys know ...
ID: 63730 · Rating: 0

swiftmallard
Joined: 18 Jul 09
Posts: 289
Credit: 302,980,648
RAC: 0
300 million credit badge · 9 year member badge · extraordinary contributions badge
Message 63731 - Posted: 16 Jun 2015, 21:21:22 UTC - in response to Message 63730.  

> Seem to have a lot of 'validation inconclusive' nvidia opencl units - about 50 over the last couple of days since the wu's started coming in again. Not sure whether that's an anomaly, but it seems unusual enough to let you guys know ...

This is normal. The results are simply waiting to be confirmed by other crunchers.
ID: 63731 · Rating: 0

SLRE
Joined: 26 Jan 09
Posts: 12
Credit: 31,798,538
RAC: 24,547
30 million credit badge · 10 year member badge
Message 63734 - Posted: 17 Jun 2015, 11:03:17 UTC - in response to Message 63731.  

Ermm...pretty much the whole batch shows as 'validate errors'. In fact everything back to the server dropout shows as a validation error. All but two out of 188 tasks show as validation errors. That seems much less normal ...

To be fair, I put a new card in a week or so back - a new Asus GTX 750. But that model ran fine on another machine and is handling SETI with no problems.

Comments?
ID: 63734 · Rating: 0

swiftmallard
Joined: 18 Jul 09
Posts: 289
Credit: 302,980,648
RAC: 0
300 million credit badge · 9 year member badge · extraordinary contributions badge
Message 63737 - Posted: 17 Jun 2015, 19:49:28 UTC - in response to Message 63734.  

> > > Seem to have a lot of 'validation inconclusive' nvidia opencl units - about 50 over the last couple of days since the wu's started coming in again. Not sure whether that's an anomaly, but it seems unusual enough to let you guys know ...
>
> > This is normal. The results are simply waiting to be confirmed by other crunchers.
>
> Ermm...pretty much the whole batch shows as 'validate errors'. In fact everything back to the server dropout shows as a validation error. All but two out of 188 tasks show as validation errors. That seems much less normal ...
>
> To be fair, I put a new card in a week or so back - a new Asus GTX 750. But that model ran fine on another machine and is handling SETI with no problems.
>
> Comments?


You asked about "validation inconclusive". Validate errors are a different thing and can be caused by any number of issues. Sit tight and one of the crunchers who know far more about nvidia than I will be able to assist you.
ID: 63737 · Rating: 0

Wisesooth
Joined: 2 Oct 14
Posts: 39
Credit: 33,698,157
RAC: 29,374
30 million credit badge · 4 year member badge
Message 63739 - Posted: 17 Jun 2015, 23:47:06 UTC - in response to Message 63689.  

Thanks for the info, Jake. You asked about server issues at our end following the scheduled power outage. Earlier today, I saw that user update requests were not handling tasks ready to report. I checked the "server status" button and found that about 7 server tasks were down with errors. All of them seemed to be related to database corruption in MySQL. I suspect that you may have corrupted indexes. I know little about the "secret sauce" that lubricates the grid, but I have had some past experiences with MySQL that were not very pretty.

The servers may think they are running, but not be aware that they are running in circles. Seti@home had a similar problem months ago. They had to rebuild their database from scratch.

Hope this is helpful.
ID: 63739 · Rating: 0

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 579
Credit: 58,996,321
RAC: 427,313
50 million credit badge · 6 year member badge · extraordinary contributions badge
Message 63740 - Posted: 18 Jun 2015, 12:22:59 UTC

Hey Wisesooth,

That was me doing some server maintenance. I put up some new runs on Tuesday using some obscure settings to see if they helped get the runs to finish faster. It turns out they broke the work unit generator for modfit, so that was just me rebooting everything to get it running again. Sorry, I probably should have made a news post.

Jake W.
ID: 63740 · Rating: 0

©2019 Astroinformatics Group