Server Downtime March 28, 2022 (12 hours starting 00:00 UTC)
Author | Message |
---|---|
Joined: 31 Mar 12 Posts: 96 Credit: 152,502,177 RAC: 0 |
I get the feeling that the server thinks there are 17M jobs ready to send out, so it doesn't make more jobs. However, I cancelled all of those jobs in order to try to clear the validation backlog. I'm not sure where the jobs are stuck, but I will turn things off and reset their transition times, and see if that clears them. I would say don't touch the database at all; I am going to spin up a few machines to clear this issue. EDIT: There really are 17M tasks ready to send. There is no validation backlog. EDIT2: If there weren't 17M tasks, the following graph wouldn't be decreasing or showing any trend: Time is in UTC. EDIT3: Here is the last 7 days of the same graph; the graph above is the last 3 days. |
Joined: 4 Jul 09 Posts: 97 Credit: 17,381,330 RAC: 1,658 |
All of the credits stagger out of the system when they are ready... I have been credited with over 140K in the last 3 days. Bill F |
Joined: 31 Mar 12 Posts: 96 Credit: 152,502,177 RAC: 0 |
Congratulations on having the transitioner remake at least 120K tasks: |
Joined: 5 Jul 11 Posts: 990 Credit: 376,143,149 RAC: 0 |
Congratulations on having the transitioner remake at least 120K tasks: I get the feeling you understand BOINC servers. Perhaps you could remotely control the MW server? |
Joined: 14 Mar 21 Posts: 3 Credit: 797,393 RAC: 0 |
I have 450 "validation inconclusive" tasks on 4/3. No valid, no invalid. Is there any hope? Bill K |
Joined: 13 Apr 17 Posts: 256 Credit: 604,411,638 RAC: 0 |
I have 450 "validation inconclusive" tasks on 4/3. No valid, no invalid. Is there any hope? What do you mean by 4/3? Check out the workunit; there you can see your "wingmen" and deduce the reason for the status and the quorum still to be fulfilled. You can also see whether another send is on its way or still to be scheduled. |
Joined: 13 Apr 17 Posts: 256 Credit: 604,411,638 RAC: 0 |
It says 600 in progress: OK, now these 600 "lost" tasks, which I couldn't find anywhere, are erroring out with "Timed out - no response". They never started. They are all from 22 March 2022. Nothing lost for me - except my error rate is going up. Just glad the "problem" is solved in a "harmless" manner. |
Joined: 8 Nov 11 Posts: 205 Credit: 2,900,464 RAC: 0 |
I had 2 WUs validated from 8th March. |
Joined: 5 Jul 11 Posts: 990 Credit: 376,143,149 RAC: 0 |
OK, now these 600 "lost" tasks, which I couldn't find anywhere, are erroring out with "Timed out - no response". Hopefully Tom can scrounge as much data as he can from things that went upside down so our processing was meaningful. If not, we'll just have to do it again. Shit happens. |
Joined: 28 Feb 22 Posts: 16 Credit: 2,400,538 RAC: 0 |
I have 450 "validation inconclusive" tasks on 4/3. No valid, no invalid. Is there any hope? I assume 4/3 is April 3... https://en.as.com/en/2022/01/01/latest_news/1641063320_406325.html I also am not getting valid tasks (I only run N-Body Simulation). Here are my tasks... State: All (2591) · In progress (72) · Validation pending (0) · Validation inconclusive (2518) · Valid (1) · Invalid (0) · Error (0) Application: All (2591) · Milkyway@home N-Body Simulation (2591) · Milkyway@home Separation (0) The workunits all look like the following one (Workunit 403393471)... Minimum quorum: 1. Initial replication: 2. Task 175677525: computer 921221; sent 3 Apr 2022, 4:06:43 UTC; reported 3 Apr 2022, 13:44:29 UTC; status Completed, validation inconclusive; run time 491.58 s; CPU time 1,527.09 s; credit pending; application Milkyway@home N-Body Simulation v1.82 (mt) windows_x86_64. Task 200923667: status Unsent; all other fields ---. The single valid task I have is when *I* was the wingman (on April 1). |
Joined: 13 Dec 17 Posts: 46 Credit: 2,421,362,376 RAC: 0 |
Most of my current and past work went to the 'validation inconclusive' pile. Quite disappointing. The WUs are duplicated, but for some reason both results are unsatisfactory. The third copies of the WUs are just 'unsent'. Such a waste of time and resources. |
Joined: 5 Jul 11 Posts: 990 Credit: 376,143,149 RAC: 0 |
What do you mean by 4/3? I assume 4/3 is April 3... Or more correctly March 4th. Day, month, year: increasing order. Month, day, year: purely ludicrous. |
Joined: 5 Jul 11 Posts: 990 Credit: 376,143,149 RAC: 0 |
Most of my current and past work went to 'validation inconclusive' pile. Quite disappointing. The WUs are duplicated but for some reason both results are unsatisfactory. The third copy of the WUs are just 'unsent'. Such a waste of time and resources. I have suggested we all chip in and buy some up-to-date hardware. Quite how Tom came up with 10 grand just for SSDs I don't know. |
Joined: 31 Mar 12 Posts: 96 Credit: 152,502,177 RAC: 0 |
Most of my current and past work went to 'validation inconclusive' pile. Quite disappointing. The WUs are duplicated but for some reason both results are unsatisfactory. The third copy of the WUs are just 'unsent'. Such a waste of time and resources. I have suggested we all chip in and buy some up-to-date hardware. Quite how Tom came up with 10 grand just for SSDs I don't know. Or you could rent some cloud resources and attach them to the project to help speed up clearing the ready-to-send queue? I am renting some Azure and AWS offerings to help speed this up a bit. |
Joined: 21 Feb 22 Posts: 66 Credit: 817,008 RAC: 0 |
Having an inconclusive WU isn't a waste. It just means that the result is hard to confirm and another copy of the WU will be sent out; usually, with the 3rd result, it will become valid (at least that is the pattern for me). So it isn't a waste: the valid result will be found, and credit will be issued. The Separation queue of WUs waiting to be sent is just under 3 million, and I'm currently getting resends from March 30, so hopefully the resends generated from your WUs yesterday will be sent out in a day or two. |
Joined: 16 Mar 10 Posts: 213 Credit: 108,360,140 RAC: 4,629 |
I have suggested we all chip in and buy some up to date hardware. Quite how Tom came up with 10 grand just for SSDs I don't know. Peter, Enterprise SSDs are designed to meet much more stressful usage situations than the sort of SSD that we might have in a PC or laptop... There are typically fewer bits stored per cell, a far higher level of over-provisioning (spare capacity) to allow for the eventual failure of memory cells, and lots more error-detection and correction logic; there also needs to be some sort of mechanism for protection against unexpected power loss. All of those push up the price! Assuming one doesn't just buy the cheapest items labelled "Enterprise SSD", typical UK prices seem to be about £250 per Terabyte; if those prices are truly representative of what might be available and usable by the MW server's RAID system (without needing to replace that as well), £10,000 would get about 40 Terabytes -- how much user storage that would provide would depend on the RAID version, number of redundant drives, and so on... Obviously, I can't know what prices might be available in the USA, or what the Computing technical people at RPI will be willing to acquire, so the above sizing is merely indicative... :-) Cheers - Al. P.S. If I/O bandwidth doesn't slow things down, there's a fair chance that in a single-server BOINC environment memory bandwidth will become a problem instead; dividing work between multiple servers can help with that. More expense :-) |
Joined: 5 Jul 11 Posts: 990 Credit: 376,143,149 RAC: 0 |
Enterprise SSDs are designed to meet much more stressful usage situations than the sort of SSD that we might have in a PC or laptop... There are typically less bits stored per cell, a far higher level of under-provisioning to allow for the eventual failure of memory cells, and lots more error-detection and correction logic; also there needs to be some sort of mechanism for protection against unexpected power loss. All of those push up the price! £250 a TB seems about right to me. Desktop drives are £100 a TB and Enterprise starts at £150, so £250 for a decent one sounds OK. The missing variable here is how much storage they need; I don't know what that is. It was $10,000 that Tom quoted, which is £7,600, which would buy 30TB. At the moment I think they use 3 disks, 1 redundant, so 30TB of SSD would provide 20TB of storage. The work units they send out are pretty small, but there are millions of them, and we don't know how big the source data is or how much needs to be stored afterwards. But perhaps only the user-facing bit of storage needs to be SSD? Long-term storage of a few months of collected data can go on slow disks. At any rate, many of us chipping in can raise a lot of money. He did say he was going to put the donations page on the homepage; I didn't even know they took donations. |
Joined: 13 Apr 17 Posts: 256 Credit: 604,411,638 RAC: 0 |
Donations can be happily made here: https://securelb.imodules.com/s/1225/giving/index.aspx?sid=1225&gid=1&pgid=3676 Don't forget to put a checkmark in the box near the bottom, so that the contributions only go directly to Milkyway! |
Joined: 14 Mar 21 Posts: 3 Credit: 797,393 RAC: 0 |
3 of my 505 Inconclusive went to Valid. It's getting caught up! Bill K |
Joined: 8 Nov 11 Posts: 205 Credit: 2,900,464 RAC: 0 |
Some of mine too. Have a lot today that got validated straight away as well. Slightly worried the waiting for validation number on the server has shot up to over 55000, maybe it’s just timing. |
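
For anyone who wants to replay the SSD sizing sums from earlier in the thread, here is a rough Python sketch. It only redoes the arithmetic already posted (budget divided by price per TB, then scaled by the share of drives not used for redundancy). The function name `ssd_capacity_tb` and its parameters are made up for illustration, and the "3 disks, 1 redundant" layout is the guess made in the thread, not a confirmed description of the MW server.

```python
def ssd_capacity_tb(budget_gbp, price_per_tb_gbp, total_drives, redundant_drives):
    """Return (raw_tb, usable_tb) for a given budget.

    Hypothetical helper: assumes every drive is the same size and that
    usable space is simply the fraction of drives not used for redundancy.
    """
    if price_per_tb_gbp <= 0:
        raise ValueError("price per TB must be positive")
    if not 0 <= redundant_drives < total_drives:
        raise ValueError("need 0 <= redundant_drives < total_drives")
    raw_tb = budget_gbp / price_per_tb_gbp
    usable_fraction = (total_drives - redundant_drives) / total_drives
    return raw_tb, raw_tb * usable_fraction


# Figures quoted in the thread: 10,000 GBP or 7,600 GBP at 250 GBP per TB,
# with an assumed layout of 3 drives of which 1 is redundant.
for budget in (10_000, 7_600):
    raw, usable = ssd_capacity_tb(budget, 250, total_drives=3, redundant_drives=1)
    print(f"{budget} GBP -> {raw:.1f} TB raw, {usable:.1f} TB usable")
```

With those inputs the sketch gives about 40 TB raw for £10,000 and about 30 TB raw (roughly 20 TB usable) for £7,600, which matches the rough numbers posted above. A different RAID level or drive count would change the usable fraction.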
©2024 Astroinformatics Group