Welcome to MilkyWay@home

The Last Couple Days


Message boards : News : The Last Couple Days
Tom Donlon
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 10 Apr 19
Posts: 80
Credit: 48,880,301
RAC: 45,271
30 million credit badge · 1 year member badge
Message 70519 - Posted: 3 Feb 2021, 16:49:59 UTC

Hi Everyone,

As I'm sure you're all aware by now, the server went through a bit of maintenance the last couple days and now seems to be working normally. We had to clear out a lot of backlog workunits due to technical problems on our side. The bad news is that if you crunched workunits in the last two or three days, those workunits may have been deleted before they were validated, meaning that you might not be getting credit for them. The good news is that the server is working again and that we have taken precautions so that this won't happen again in the future.

Apologies for any inconvenience this may have caused. This caught us by surprise just as it caught all of you by surprise, and know that we will always try our best to communicate with you all when these sorts of things happen.

Best,
Tom

Artstein
Joined: 2 Apr 11
Posts: 2
Credit: 4,048,677
RAC: 40,183
3 million credit badge · 9 year member badge
Message 70521 - Posted: 3 Feb 2021, 16:58:52 UTC - in response to Message 70519.  

Thank you Tom!

I have a quick question. You mentioned that you have taken precautions. Are they related to the INT -> BIGINT transition you mentioned before? Or are they something else?

Back to crunching as soon as my client downloads the new units.
Proud Gridcoin Team member!

Holdolin
Joined: 9 Dec 11
Posts: 30
Credit: 651,255,696
RAC: 10,983,817
500 million credit badge · 9 year member badge
Message 70522 - Posted: 3 Feb 2021, 17:00:28 UTC - in response to Message 70519.  

Communication, in my most humble of opinions, is vitally important. Not only does it keep us donors up to date, but it shows that those running the project have an interest not only in the project's goal but also in those who donate cycles to the cause. Thank you greatly for continuing to keep us as up to date as possible :)

Tom Donlon
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 10 Apr 19
Posts: 80
Credit: 48,880,301
RAC: 45,271
30 million credit badge · 1 year member badge
Message 70524 - Posted: 3 Feb 2021, 17:24:23 UTC - in response to Message 70521.  
Last modified: 3 Feb 2021, 17:24:47 UTC

Hi Artstein,

Thanks for the question. We are going through the code that generated the faulty workunit that jammed things up and fixing the logic in that code. Also, we are no longer going to leave converged runs up after they have converged, and will instead refresh the runs before they cause these issues.

I will probably make the INT --> BIGINT change at some convenient point in the future. We did not make that change yet because we were more concerned about getting the server back to functional and less concerned about future proofing.
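For readers wondering why that change counts as future proofing, here is a minimal sketch. It assumes the column in question is a signed 32-bit INT and that BIGINT is a signed 64-bit type, as in MySQL-style databases; the thread does not say which column is affected, and the helper names `signed_max` and `fits` are made up for illustration.

```python
# Assumption: INT is a signed 32-bit column type and BIGINT a signed
# 64-bit one (MySQL-style widths). The thread does not name the column.
INT_BITS = 32
BIGINT_BITS = 64


def signed_max(bits: int) -> int:
    """Largest value a signed two's-complement integer of this width holds."""
    return 2 ** (bits - 1) - 1


def fits(value: int, bits: int) -> bool:
    """True if value is representable in a signed integer of this width."""
    return -(2 ** (bits - 1)) <= value <= signed_max(bits)


if __name__ == "__main__":
    print(f"INT max:    {signed_max(INT_BITS):,}")
    print(f"BIGINT max: {signed_max(BIGINT_BITS):,}")

    # The first ID past the INT limit no longer fits in INT,
    # but fits in BIGINT with a great deal of room to spare.
    next_id = signed_max(INT_BITS) + 1
    print("fits in INT:   ", fits(next_id, INT_BITS))
    print("fits in BIGINT:", fits(next_id, BIGINT_BITS))
```

Under those assumptions, a counter stored as INT tops out at 2,147,483,647, which a busy project can reach; the BIGINT limit is far beyond any realistic workunit count.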

Tom

Holdolin
Joined: 9 Dec 11
Posts: 30
Credit: 651,255,696
RAC: 10,983,817
500 million credit badge · 9 year member badge
Message 70525 - Posted: 3 Feb 2021, 17:35:12 UTC

So in watching how things are operating, I came to a question. Was it intentional to greatly reduce the number of WUs a user gets per request? Before the outage I was getting 900 (at most) WUs per request; now I'm getting 22.

Tom Donlon
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 10 Apr 19
Posts: 80
Credit: 48,880,301
RAC: 45,271
30 million credit badge · 1 year member badge
Message 70526 - Posted: 3 Feb 2021, 17:54:42 UTC - in response to Message 70525.  
Last modified: 3 Feb 2021, 17:56:37 UTC

That may be a temporary side effect of flushing out the server. Hopefully, as things run, workunits will fill up (the "tasks ready to send" are steadily climbing) and you will be able to pull more per request. Regardless, I will look into this problem. Please keep me posted on whether it continues to be an issue.

JohnDK
Joined: 18 Feb 10
Posts: 11
Credit: 114,389,014
RAC: 445,622
100 million credit badge · 11 year member badge
Message 70527 - Posted: 3 Feb 2021, 17:58:00 UTC

Got over 300 WUs on my 2 PCs on the first request after the outage.

Dirk Riesener
Joined: 19 Aug 19
Posts: 2
Credit: 6,052,245
RAC: 6,688
5 million credit badge · 1 year member badge
Message 70528 - Posted: 3 Feb 2021, 18:45:38 UTC

Hello Tom,

You wrote: "We had to clear out a lot of backlog workunits due to technical problems on our side. The bad news is that if you crunched workunits in the last two or three days, those workunits may have been deleted before they were validated, meaning that you might not be getting credit for them."
I found three unconfirmed WUs from Jan. 25 and 30. After clicking on the work package links, I got no information on these WUs. The server couldn't find them. Is that what you meant?

Holdolin
Joined: 9 Dec 11
Posts: 30
Credit: 651,255,696
RAC: 10,983,817
500 million credit badge · 9 year member badge
Message 70531 - Posted: 3 Feb 2021, 19:14:22 UTC

Ok, time to turn in my "guy card" and give y'all a laugh at my expense. Like many of you, I participate in more than one project. For me it's this one, Prime Grid, and Einstein. My Nvidia GPUs are running PG and my AMD cards are here. I hit up Einstein when this project went through that rough spot the last couple days. Now this all relates to the thread, I promise. I noticed that Einstein seemed to be slow to give me work as well, but there is chatter over there about WUs that allowed me to believe it could be me. Now this project is back to normal and, as I said, I was getting few WUs while others have said they're getting work no prob. So I got to thinking, "gee Hold, what would cause all your projects to get limited work?" I strolled over to PG to see if I'd found any primes, it being Tour de Prime and all, and it hit me: you idiot, you set your WU queue to 0 days. So, ever hopeful, I set up an alternate venue that allowed a queue of several days and plopped my primary MW cruncher in that venue. Told MW to get work and POW...900 WUs. Sorry if I caused you any worry, Tom. Looks like all is actually well with the servers; I'm just a scrub. Ok, y'all can stop laughing now lol.

Tom Donlon
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 10 Apr 19
Posts: 80
Credit: 48,880,301
RAC: 45,271
30 million credit badge · 1 year member badge
Message 70532 - Posted: 3 Feb 2021, 19:17:50 UTC - in response to Message 70531.  

Don't worry Holdolin, after 3 days sweating over a database I'm not in a position to make fun of anyone for dumb tech mistakes. Glad to hear that everything is working as it should be!

Artstein
Joined: 2 Apr 11
Posts: 2
Credit: 4,048,677
RAC: 40,183
3 million credit badge · 9 year member badge
Message 70533 - Posted: 3 Feb 2021, 19:50:07 UTC

Confirmed it's working, BOINC downloaded 200 tasks and GPUs are currently crunching!

jpb
Joined: 6 Apr 20
Posts: 31
Credit: 451,893,681
RAC: 1,094,991
300 million credit badge
Message 70534 - Posted: 3 Feb 2021, 21:47:50 UTC

Thank you for keeping us informed!

Max_Pirx
Joined: 13 Dec 17
Posts: 15
Credit: 917,919,272
RAC: 1,307,080
500 million credit badge · 3 year member badge
Message 70535 - Posted: 3 Feb 2021, 22:03:06 UTC

I somehow ended up with a substantial number of inconclusive tasks (more than 50%, actually). Is anyone else seeing something like that?

Holdolin
Joined: 9 Dec 11
Posts: 30
Credit: 651,255,696
RAC: 10,983,817
500 million credit badge · 9 year member badge
Message 70536 - Posted: 3 Feb 2021, 23:01:45 UTC - in response to Message 70535.  

Ya, that happens. What's going on is older stats don't delete immediately, so many of those inconclusives are from older WUs. Same thing with all the invalids from stripes 84 and 85. Was kinda shocked at seeing so many till I stopped to look and saw it's older data. Out of 380 invalids, only 2 are recent and were server cancellations. All the rest were older.

mikey
Joined: 8 May 09
Posts: 2482
Credit: 462,415,785
RAC: 19,030
300 million credit badge · 11 year member badge · extraordinary contributions badge
Message 70537 - Posted: 4 Feb 2021, 2:09:34 UTC - in response to Message 70535.  

"I somehow end up with substantial amount of inconclusive tasks (more than 50% actually). Is anyone else seeing something like that?"

You have your PCs hidden, but Windows is rolling out some updates and you could have gotten caught by them. They LOOK like the official drivers but don't have all the crunching stuff in them. If you are using Windows, try reloading your drivers right over the top of the old ones and see if it helps.

Fardringle
Joined: 4 Nov 07
Posts: 3
Credit: 60,485,225
RAC: 312,843
50 million credit badge · 13 year member badge
Message 70540 - Posted: 4 Feb 2021, 3:47:33 UTC - in response to Message 70535.  

Yes. A very large percentage of new tasks processed in the last 24 hours are getting "Validation Inconclusive" for me.

Max_Pirx
Joined: 13 Dec 17
Posts: 15
Credit: 917,919,272
RAC: 1,307,080
500 million credit badge · 3 year member badge
Message 70541 - Posted: 4 Feb 2021, 8:19:20 UTC - in response to Message 70537.  

Yeah, I'm aware of that and have disabled driver updates. Otherwise Windows shoves in crap drivers all the time.

mikey
Joined: 8 May 09
Posts: 2482
Credit: 462,415,785
RAC: 19,030
300 million credit badge · 11 year member badge · extraordinary contributions badge
Message 70542 - Posted: 4 Feb 2021, 11:05:00 UTC - in response to Message 70540.  

"Yes. A very large percentage of new tasks processed in the last 24 hours are getting 'Validation Inconclusive' for me."

Validation inconclusive is MilkyWay's way of saying 'waiting for a wingman'. Other projects do it other ways, but this is MilkyWay and they like it this way.
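
To make the "waiting for a wingman" idea concrete, here is a minimal sketch of quorum-style validation. It is a simplified model and not the project's actual validator: the function name `validation_state`, the default quorum of 2, and the exact-match comparison are all assumptions for illustration (a real validator would typically compare floating-point results within a tolerance).

```python
from collections import Counter
from typing import Hashable, Sequence


def validation_state(results: Sequence[Hashable], min_quorum: int = 2) -> str:
    """Classify one workunit from the results returned so far.

    Hypothetical helper: a workunit counts as "valid" once at least
    min_quorum returned results agree exactly, and stays "inconclusive"
    until then.
    """
    if len(results) < min_quorum:
        # Only one host has reported: still waiting for a wingman.
        return "inconclusive"

    # Find the most common result and how many hosts returned it.
    _, agreeing = Counter(results).most_common(1)[0]
    return "valid" if agreeing >= min_quorum else "inconclusive"


if __name__ == "__main__":
    print(validation_state([0.125]))                # one result so far
    print(validation_state([0.125, 0.125]))         # wingman agrees
    print(validation_state([0.125, 0.5]))           # wingman disagrees
    print(validation_state([0.125, 0.5, 0.125]))    # third host settles it
```

In this model a freshly flushed server produces many inconclusive tasks simply because most workunits have only one result back so far, which would match what people are seeing.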

johnnymc
Joined: 10 Mar 11
Posts: 8
Credit: 13,244,977
RAC: 16,704
10 million credit badge · 9 year member badge
Message 70543 - Posted: 4 Feb 2021, 11:05:34 UTC

Running any project embraces its own level of stress.

I'm certain everyone who participates in this project thanks you for what you've created and supported and empathetically cheers you on as you deal with the issues related to this endeavor.

Cheers Tom!

~johnnymc
Life's short; make fun of it!

DCL
Joined: 26 Feb 15
Posts: 3
Credit: 1,847,105
RAC: 7,081
1 million credit badge · 6 year member badge
Message 70552 - Posted: 5 Feb 2021, 13:58:35 UTC - in response to Message 70519.  

I thank you for the good communication. Just a very small request: there are people in many, if not most, of the time zones. When you give a time, could you also give the time zone? Thanks for your work!

©2021 Astroinformatics Group