Server Downtime March 28, 2022 (12 hours starting 00:00 UTC)
Kiska · Joined: 31 Mar 12 · Posts: 90 · Credit: 150,890,343 · RAC: 526

> I get the feeling that the server thinks there are 17M jobs ready to send out, so it doesn't make more jobs. However, I cancelled all of those jobs in order to try to clear the validation backlog. I'm not sure where the jobs are stuck, but I will turn things off and reset their transition times, and see if that clears them.

I would say don't touch the database at all; I am going to spin up a few machines to clear this issue.

EDIT: There really are 17M tasks ready to send. There is no validation backlog.

EDIT2: If there weren't 17M tasks, the following graph wouldn't be decreasing or showing any trend:
[Graph not preserved: last 3 days, time in UTC]

EDIT3: Here are the last 7 days of the same graph (the graph above covers the last 3 days):
[Graph not preserved: last 7 days]
Joined: 4 Jul 09 · Posts: 53 · Credit: 15,597,528 · RAC: 12,785

All of the credits stagger out of the system when they are ready... I have been credited with over 140K in the last 3 days.

Bill F
Kiska · Joined: 31 Mar 12 · Posts: 90 · Credit: 150,890,343 · RAC: 526

Congratulations on having the transitioner remake at least 120K tasks:
[Graph not preserved]
Mr P Hucker · Joined: 5 Jul 11 · Posts: 759 · Credit: 361,871,315 · RAC: 4

> Congratulations on having the transitioner remake at least 120K tasks:

I get the feeling you understand BOINC servers. Perhaps you could remotely control the MW server?
BillK · Joined: 14 Mar 21 · Posts: 3 · Credit: 797,393 · RAC: 0

I have 450 "validation inconclusive" tasks on 4/3. No valid, no invalid. Is there any hope?

Bill K
San-Fernando-Valley · Joined: 13 Apr 17 · Posts: 235 · Credit: 575,641,850 · RAC: 2,375,322

> I have 450 "validation inconclusive" tasks on 4/3. No valid, no invalid. Is there any hope?

What do you mean by 4/3? Check out the workunit: there you can see your "wingmen" and deduce the reason for the status and what is still needed for the quorum to be fulfilled, and also whether another copy is on its way or yet to be scheduled.
San-Fernando-Valley · Joined: 13 Apr 17 · Posts: 235 · Credit: 575,641,850 · RAC: 2,375,322

> It says 600 in progress:

OK, now these 600 "lost" tasks, which I couldn't find anywhere, are erroring out with "Timed out - no response". They never started. They are all from 22 March 2022. Nothing lost for me, except that my error rate is going up. Just glad the "problem" is solved in a "harmless" manner.
Septimus · Joined: 8 Nov 11 · Posts: 205 · Credit: 2,790,949 · RAC: 3,038

I had 2 WUs from 8th March validated.
Mr P Hucker · Joined: 5 Jul 11 · Posts: 759 · Credit: 361,871,315 · RAC: 4

> OK, now these 600 "lost" tasks, which I couldn't find anywhere, are erroring out with "Timed out - no response".

Hopefully Tom can scrounge as much data as he can from the things that went upside down, so our processing was meaningful. If not, we'll just have to do it again. Shit happens.
poppinfresh99 · Joined: 28 Feb 22 · Posts: 16 · Credit: 2,400,538 · RAC: 30

> I have 450 "validation inconclusive" tasks on 4/3. No valid, no invalid. Is there any hope?

I assume 4/3 is April 3... https://en.as.com/en/2022/01/01/latest_news/1641063320_406325.html

I am also not getting valid tasks (I only run N-Body Simulation). Here are my tasks:

State: All (2591) · In progress (72) · Validation pending (0) · Validation inconclusive (2518) · Valid (1) · Invalid (0) · Error (0)

Application: All (2591) · Milkyway@home N-Body Simulation (2591) · Milkyway@home Separation (0)

The workunits all look like the following one (Workunit 403393471): minimum quorum 1, initial replication 2.

- Task 175677525 · Computer 921221 · Sent 3 Apr 2022, 4:06:43 UTC · Reported 3 Apr 2022, 13:44:29 UTC · Completed, validation inconclusive · Run time 491.58 s · CPU time 1,527.09 s · Credit pending · Milkyway@home N-Body Simulation v1.82 (mt) windows_x86_64
- Task 200923667 · Unsent

The single valid task I have is one where *I* was the wingman (on April 1).
Max_Pirx · Joined: 13 Dec 17 · Posts: 46 · Credit: 2,367,251,294 · RAC: 2,515,395

Most of my current and past work went to the 'validation inconclusive' pile. Quite disappointing. The WUs are duplicated, but for some reason both results are unsatisfactory. The third copies of the WUs are just 'unsent'. Such a waste of time and resources.
Mr P Hucker · Joined: 5 Jul 11 · Posts: 759 · Credit: 361,871,315 · RAC: 4

> > What do you mean by 4/3?
>
> I assume 4/3 is April 3...

Or more correctly, March 4th. Day, month, year: increasing order. Month, day, year: pure ludicrousness.
Mr P Hucker · Joined: 5 Jul 11 · Posts: 759 · Credit: 361,871,315 · RAC: 4

> Most of my current and past work went to the 'validation inconclusive' pile. Quite disappointing. The WUs are duplicated, but for some reason both results are unsatisfactory. The third copies of the WUs are just 'unsent'. Such a waste of time and resources.

I have suggested we all chip in and buy some up-to-date hardware. Quite how Tom came up with 10 grand just for SSDs I don't know.
Kiska · Joined: 31 Mar 12 · Posts: 90 · Credit: 150,890,343 · RAC: 526

> > Most of my current and past work went to the 'validation inconclusive' pile. Quite disappointing. The WUs are duplicated, but for some reason both results are unsatisfactory. The third copies of the WUs are just 'unsent'. Such a waste of time and resources.
>
> I have suggested we all chip in and buy some up-to-date hardware. Quite how Tom came up with 10 grand just for SSDs I don't know.

Or you could rent some cloud resources and attach them to the project to help speed up clearing the ready-to-send queue? I am renting some Azure and AWS offerings to help speed this up a bit.
unixchick · Joined: 21 Feb 22 · Posts: 66 · Credit: 817,008 · RAC: 13

Having an inconclusive WU isn't a waste. It just means that the result is hard to confirm, so another copy of the WU will be sent out, and then usually with the 3rd result it will become valid (at least that is the pattern for me). The valid result will be found, and credit will be issued.

The Separation queue of WUs waiting to be sent is just under 3 million. I'm currently getting resends from March 30, so hopefully the resends generated from your WUs yesterday will be sent out in a day or two.
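The inconclusive-then-resend pattern described above can be illustrated with a simplified quorum check. This is a hypothetical sketch, not BOINC's actual validator or transitioner code: the function name `check_quorum` is made up, results are compared for exact equality, and a workunit counts as valid once `min_quorum` matching results are in, whereas real BOINC projects supply their own result-comparison logic and replication policy.

```python
from collections import Counter
from typing import Optional


def check_quorum(results: list[str], min_quorum: int = 2) -> tuple[str, Optional[str]]:
    """Hypothetical, simplified validation step for one workunit.

    Returns ("valid", canonical_result) once at least `min_quorum` returned
    results agree, ("inconclusive", None) when enough results are in but they
    do not agree (so another copy would be sent out), and ("pending", None)
    while fewer than `min_quorum` results have been returned.
    """
    if min_quorum < 1:
        raise ValueError("min_quorum must be at least 1")
    if len(results) < min_quorum:
        return ("pending", None)
    # Most common result and how many hosts returned it (exact-match comparison).
    value, count = Counter(results).most_common(1)[0]
    if count >= min_quorum:
        return ("valid", value)
    return ("inconclusive", None)


# Two hosts disagree: inconclusive, so a third copy would be issued.
print(check_quorum(["a", "b"]))       # ('inconclusive', None)
# The third result matches one of the first two: the workunit validates.
print(check_quorum(["a", "b", "a"]))  # ('valid', 'a')
```

In this simplified model an inconclusive workunit is not wasted work: the first two results stay in the list, and a single matching third result is enough to reach the quorum.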
alanb1951 · Joined: 16 Mar 10 · Posts: 168 · Credit: 97,052,237 · RAC: 64,605

> I have suggested we all chip in and buy some up-to-date hardware. Quite how Tom came up with 10 grand just for SSDs I don't know.

Peter,

Enterprise SSDs are designed to meet much more stressful usage situations than the sort of SSD that we might have in a PC or laptop... There are typically fewer bits stored per cell, a far higher level of over-provisioning to allow for the eventual failure of memory cells, and lots more error-detection and correction logic; also there needs to be some sort of mechanism for protection against unexpected power loss. All of those push up the price!

Assuming one doesn't just buy the cheapest items labelled "Enterprise SSD", typical UK prices seem to be about £250 per terabyte. If those prices are truly representative of what might be available and usable by the MW server's RAID system (without needing to replace that as well), £10,000 would get about 40 terabytes; how much user storage that would provide would depend on the RAID level, the number of redundant drives, and so on...

Obviously, I can't know what prices might be available in the USA, or what the Computing technical people at RPI will be willing to acquire, so the above sizing is merely indicative... :-)

Cheers - Al.

P.S. If I/O bandwidth doesn't slow things down, there's a fair chance that in a single-server BOINC environment memory bandwidth will become a problem instead; dividing work between multiple servers can help with that. More expense :-)
Mr P Hucker · Joined: 5 Jul 11 · Posts: 759 · Credit: 361,871,315 · RAC: 4

> Enterprise SSDs are designed to meet much more stressful usage situations than the sort of SSD that we might have in a PC or laptop... There are typically fewer bits stored per cell, a far higher level of over-provisioning to allow for the eventual failure of memory cells, and lots more error-detection and correction logic; also there needs to be some sort of mechanism for protection against unexpected power loss. All of those push up the price!

£250 a TB seems about right to me. £100 a TB for desktop, enterprise starts at £150, so £250 for a decent one sounds OK.

The missing variable here is how much storage they need; I don't know what that is. Tom quoted $10,000, which is £7,600, which would buy about 30TB. At the moment I think they use 3 disks, 1 redundant, so 30TB of SSD would provide 20TB of storage. The work units they send out are pretty small, but there are millions of them, and we don't know how big the source data is or how much needs to be stored afterwards. But perhaps only the user-facing part of the storage needs to be SSD? Long-term storage of a few months of collected data can go on slow disks.

At any rate, many of us chipping in can raise a lot of money. He did say he was going to put the donations page on the homepage; I didn't even know they took donations.
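The capacity arithmetic in the last two posts can be checked with a short sketch. The price (about £250 per TB), the $10,000 ≈ £7,600 conversion and the "3 disks, 1 redundant" layout are the figures quoted in the thread, not confirmed details of the MilkyWay server. The sketch also assumes a parity-style array in which one drive's worth of capacity goes to redundancy, and it ignores filesystem and formatting overhead; the function names are illustrative only.

```python
def raw_capacity_tb(budget: float, price_per_tb: float) -> float:
    """Terabytes of raw SSD a budget buys at a flat price per TB."""
    return budget / price_per_tb


def usable_capacity_tb(raw_tb: float, disks: int, redundant: int) -> float:
    """Usable space when `redundant` of `disks` drives' worth of capacity
    is spent on redundancy (e.g. a RAID 5-style layout has redundant=1).
    Assumes all drives are the same size."""
    if not 0 <= redundant < disks:
        raise ValueError("need 0 <= redundant < disks")
    return raw_tb * (disks - redundant) / disks


# alanb1951's estimate: £10,000 at about £250/TB.
print(raw_capacity_tb(10_000, 250))             # 40.0

# Mr P Hucker's estimate: $10,000 ≈ £7,600 at £250/TB, 3 disks, 1 redundant.
raw = raw_capacity_tb(7_600, 250)
print(round(raw, 1))                            # 30.4
print(round(usable_capacity_tb(raw, 3, 1), 1))  # 20.3
```

With 3 disks and 1 redundant, only two thirds of the raw capacity is usable, which is where the "30TB of SSD would provide 20TB of storage" figure comes from; more disks per parity drive would raise that fraction.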
San-Fernando-Valley · Joined: 13 Apr 17 · Posts: 235 · Credit: 575,641,850 · RAC: 2,375,322

Donations can happily be made here: https://securelb.imodules.com/s/1225/giving/index.aspx?sid=1225&gid=1&pgid=3676

Don't forget to tick the box near the bottom, so that the contributions go directly and only to Milkyway!
BillK · Joined: 14 Mar 21 · Posts: 3 · Credit: 797,393 · RAC: 0

3 of my 505 inconclusive tasks went to valid. It's catching up!

Bill K
Septimus · Joined: 8 Nov 11 · Posts: 205 · Credit: 2,790,949 · RAC: 3,038

Some of mine too. I have a lot today that got validated straight away as well. Slightly worried that the "waiting for validation" number on the server has shot up to over 55,000; maybe it's just timing.
©2023 Astroinformatics Group