Huge number of 'Validation inconclusive' WUs
Author | Message |
---|---|
Max_Pirx Joined: 13 Dec 17 Posts: 46 Credit: 2,373,982,081 RAC: 2,566,850 |
Hello, at present I have more than 2000 'validation inconclusive' WUs (MilkyWay@Home v1.46 (opencl_ati_101)), all of the 'Unsent' variety, on my three machines: https://milkyway.cs.rpi.edu/milkyway//results.php?hostid=764666&offset=0&show_names=0&state=3&appid= https://milkyway.cs.rpi.edu/milkyway//results.php?hostid=765378&offset=0&show_names=0&state=3&appid= https://milkyway.cs.rpi.edu/milkyway//results.php?hostid=763440&offset=0&show_names=0&state=3&appid= Any idea what's going on? Many thanks, max |
Joined: 8 May 09 Posts: 3105 Credit: 518,036,387 RAC: 23,941 |
Hello, I'm over 600 myself, going to put a stop to my crunching here until it comes back down again!! |
iwajabitw Joined: 16 Nov 14 Posts: 16 Credit: 335,683,507 RAC: 0 |
Over 6300+ here. |
pututu Joined: 24 Aug 17 Posts: 5 Credit: 221,483,901 RAC: 561 |
I've got almost 2000 here. There are 1.4M unsent tasks according to the server status here: https://milkyway.cs.rpi.edu/milkyway/server_status.php Maybe wait a few days to see what happens next? |
mmonnin Joined: 2 Oct 16 Posts: 162 Credit: 1,004,124,246 RAC: 1,903 |
My count has gone down since last night. From 2.2k to 1.7k. |
Joined: 8 May 09 Posts: 3105 Credit: 518,036,387 RAC: 23,941 |
"My count has gone down since last night. From 2.2k to 1.7k." Mine came down almost 40%, but on my list the first dozen are still "unsent" and I stopped crunching yesterday after I posted. Either a ton of us stopped crunching or there are some significant problems with the way things work. I'm not creating new "inconclusives" but one would think the existing workunits should have been sent out long before now. |
Max_Pirx Joined: 13 Dec 17 Posts: 46 Credit: 2,373,982,081 RAC: 2,566,850 |
Slight decrease in my inconclusive WUs, but I still have around 2000 of them. I just started more detailed monitoring: I picked a few of those and will watch them to see when (if ever) they are sent out again. |
Jake Weiss Volunteer moderator Project developer Project tester Project scientist Joined: 25 Feb 13 Posts: 580 Credit: 94,200,158 RAC: 135 |
Hey Everyone, Looks like the workunit generator went a little overboard over the weekend and clogged things up. The validation inconclusives should clear out of the queue over the next two or three days as the server works through the backlog. On another note, the new RAM for the server should be here tomorrow and that should help prevent this from happening again. Jake |
pututu Joined: 24 Aug 17 Posts: 5 Credit: 221,483,901 RAC: 561 |
I'm speculating that it may take some time, since for the 2nd quorum the task number is not sequential, or even close to the first task number, but is spread very far apart. See one example here for one of my WUs: https://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=1579238967 For this WU, my task number is 2,264,897,549 and my soon-to-be wingman's task number for this same WU is 2,266,161,449. That's a difference of 1,263,900, which is close to the current number of unsent tasks reported by the server status. I also suspect that no one has started crunching task numbers 2,266,xxx,xxx yet? |
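The gap between the two task numbers quoted in the post above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the variable names are made up for illustration, the 1.4M figure is the rounded unsent count quoted earlier in the thread, and the idea that task numbers are handed out sequentially is the poster's speculation, not a confirmed property of the server.

```python
# Task numbers quoted in the post for workunit 1579238967.
my_task_id = 2_264_897_549
wingman_task_id = 2_266_161_449

# Rounded unsent-task count quoted from the server status page (about 1.4M).
approx_unsent_tasks = 1_400_000

# Gap between the two task numbers.
task_id_gap = wingman_task_id - my_task_id

# Share of the unsent backlog the gap would account for, IF task numbers
# are assigned sequentially (an assumption, not confirmed by the project).
backlog_fraction = task_id_gap / approx_unsent_tasks

print(f"Task number gap: {task_id_gap:,}")
print(f"Gap as a share of ~1.4M unsent tasks: {backlog_fraction:.0%}")
```

If the sequential-numbering assumption holds, the wingman task would only go out once most of the reported backlog has been sent, which fits the "two or three days" estimate given earlier in the thread.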
Joined: 8 May 09 Posts: 3105 Credit: 518,036,387 RAC: 23,941 |
Hey Everyone, I sure hope so!! Thanks for working on this AND letting us know what's going on!!! |
mmonnin Joined: 2 Oct 16 Posts: 162 Credit: 1,004,124,246 RAC: 1,903 |
I saw the site was down earlier today, a couple of hours ago. Count is now 2.5k so even higher than last night. Just a single 280x so pretty much most of them. |
Joined: 8 May 09 Posts: 3105 Credit: 518,036,387 RAC: 23,941 |
"I saw the site was down earlier today, a couple of hours ago. Count is now 2.5k so even higher than last night. Just a single 280x so pretty much most of them." I now have 211 inconclusive but 69 of those haven't even been sent to another computer yet!! Until that catches up my PCs will stay elsewhere and the units I have in my cache won't be crunched! Sorry guys but I don't like the idea of MW making NEW workunits and sending them out BEFORE the existing workunits are sent out!! I have 211 workunits that I have already completed and that are waiting for validation, and 69 of those haven't even been SENT to another computer yet!!! I haven't even gotten a new workunit in almost 2 days now!! |
Max_Pirx Joined: 13 Dec 17 Posts: 46 Credit: 2,373,982,081 RAC: 2,566,850 |
"I saw the site was down earlier today, a couple of hours ago. Count is now 2.5k so even higher than last night. Just a single 280x so pretty much most of them." The same with me... mine are steadily going up, 3500+ at the moment. Pretty much all of my completed WUs end up as inconclusive (Unsent). So I am moving my machines away from MW; I don't see any point in keeping them crunching here. |
Joined: 8 May 09 Posts: 3105 Credit: 518,036,387 RAC: 23,941 |
"I saw the site was down earlier today, a couple of hours ago. Count is now 2.5k so even higher than last night. Just a single 280x so pretty much most of them." I will bring mine back, I always seem to anyway, but they need to fix the problems so I'm not just spinning my wheels!! |
vseven Joined: 26 Mar 18 Posts: 24 Credit: 102,912,937 RAC: 0 |
I'm confused. I mean they will eventually be validated so what does it matter, correct? |
Joined: 18 Nov 08 Posts: 291 Credit: 2,419,729,574 RAC: 3,430,404 |
Hello, The problem is a bug in the program. Looking at the set of errors, the first user I looked at had 3 Titans but almost 20,000 errors. If all tasks error out, the number of "pending" will rise to the total number of work units. I thought only 80 were allowed per day. Even with 3 Titans my math suggests it should have taken a week at 80 per 24 hours. All 19,736 were from May 7, 5am to May 8, 1300 |
Joined: 8 May 09 Posts: 3105 Credit: 518,036,387 RAC: 23,941 |
Hello, No it's 80 per GPU, but if you buzz right thru them at 2 seconds each you can zip thru thousands of them per day, all errors of course!!! Jake wants us to send him the link to computers like the one you found so he can put it on the 'suspicious' list. Unfortunately when he did it automatically a while back LOTS of people couldn't bring new GPUs on here to crunch so it's now a manual process. |
vseven Joined: 26 Mar 18 Posts: 24 Credit: 102,912,937 RAC: 0 |
I was crunching on 3 Tesla V100s at the same time. A WU took around 35 seconds while running 6 at a time. So averaging a little under 6 seconds per WU per card, x 3 cards, which averages out to under 2 seconds per WU. That's over 43,000 a day. Now I only ran like this for a couple of hours (testing) but my inconclusive total was over 2,000 in those couple of hours. Out of those I threw 6 errors and 2 invalids, I have 134 still at inconclusive, and all the rest validated. I still don't know what the issue is that you guys are posting about..... |
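The throughput estimate in the post above can be reproduced with a short Python sketch. The function names are hypothetical, and the calculation assumes every batch of 6 concurrent WUs finishes in about 35 seconds of wall-clock time and that all three cards stay fully loaded around the clock, which the post only claims for a couple of hours of testing.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def seconds_per_wu(batch_seconds: float, concurrent_per_card: int, cards: int) -> float:
    """Average wall-clock seconds per finished WU across all cards."""
    return batch_seconds / (concurrent_per_card * cards)


def wus_per_day(batch_seconds: float, concurrent_per_card: int, cards: int) -> float:
    """WUs finished in 24 hours if the cards never sit idle."""
    return SECONDS_PER_DAY / seconds_per_wu(batch_seconds, concurrent_per_card, cards)


# Figures from the post: ~35 s per batch, 6 WUs at a time, 3 cards.
per_card = seconds_per_wu(35, 6, 1)
all_cards = seconds_per_wu(35, 6, 3)
daily = wus_per_day(35, 6, 3)

print(f"Per card:  {per_card:.2f} s per WU")
print(f"All cards: {all_cards:.2f} s per WU")
print(f"Per day:   {daily:,.0f} WUs")
```

Rounding up to 2 seconds per WU gives 86,400 / 2 = 43,200 per day, which is the "over 43,000" in the post; the unrounded figure is a little higher.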
Joined: 18 Nov 08 Posts: 291 Credit: 2,419,729,574 RAC: 3,430,404 |
"I was crunching on 3 Tesla V100s at the same time. A WU took around 35 seconds while running 6 at a time. So averaging a little under 6 seconds per WU x 3 cards which averages out to under 2 seconds per WU. That's over 43,000 a day. Now I only ran like this for a couple hours (testing) but my inconclusive total was over 2,000 in those couple hours. Out of those I threw 6 errors and 2 invalids, I have 134 still at inconclusive, and all the rest validated." The original post was about inconclusive validations, and you are correct that if you wait them out all will eventually be validated. OTOH, the post I made was to point out that the first system listed had 3 Titans and 19,000 of the work units errored, which is not the same as inconclusive validations. Looking at the task details, one observes that OpenCL was unable to find any nVidia devices although the system had 3 Titans. My S9100 is nowhere near as fast as your Tesla. It did cost me under $300 and uses only a single 8-pin power connector, which is a plus. Your computers are hidden, but you might want to run my program at http://new.stateson.net/HostProjectStats to get an accurate measurement of completion time. |
vseven Joined: 26 Mar 18 Posts: 24 Credit: 102,912,937 RAC: 0 |
Oh...they are not "my" Teslas. I wish. I got permission to play with the machines they were in. That's also why they are hidden...didn't want the host names to get out. And I no longer have access to them but I might get another chance in the future. Here is the output from your program for a V100 16GB SXM2 interface, listed as Run Time (sec) / CPU Time (sec) / Credit: 32.6 / 28.1 / 231.2; 33.6 / 29.5 / 227.3; 36.6 / 32.0 / 229.3; 24.5 / 21.1 / 228.5; 35.6 / 30.8 / 230.8; 29.6 / 26.3 / 227.6; 29.6 / 24.4 / 227.6; 32.7 / 28.5 / 227.7; 30.5 / 26.5 / 227.6; 30.5 / 26.6 / 227.7; 33.7 / 29.3 / 227.7; 30.6 / 27.3 / 231.2; 31.6 / 28.4 / 227.6; 40.6 / 35.8 / 229.3; 34.6 / 30.4 / 229.7; 32.5 / 26.3 / 227.6; 25.5 / 22.1 / 227.6; 33.7 / 30.2 / 227.7; 34.5 / 31.2 / 229.4; 33.7 / 28.9 / 229.4. AVG: 32.3 / 28.2 / 228.6. STD: 3.5 / 3.3 / 1.3. Keep in mind I'm running 6 WU at a time in the above. Here is from a P100 16GB PCIe interface, in the same column order: 53.4 / 51.4 / 227.3; 61.5 / 59.4 / 229.1; 53.4 / 51.5 / 227.3; 59.5 / 57.6 / 227.3; 48.4 / 46.9 / 227.2; 54.4 / 52.7 / 227.3; 65.5 / 63.8 / 229.1; 57.4 / 55.8 / 227.3; 49.3 / 46.3 / 227.2; 57.4 / 54.6 / 227.3; 60.4 / 58.3 / 227.3; 50.4 / 48.8 / 227.3; 54.4 / 52.4 / 228.1; 55.4 / 53.6 / 227.3; 53.4 / 51.3 / 227.2; 53.4 / 51.9 / 228.1; 56.4 / 54.7 / 229.3; 57.4 / 55.3 / 227.3; 57.6 / 49.3 / 227.3; 55.6 / 48.1 / 227.2. AVG: 55.7 / 53.2 / 227.6. STD: 4.0 / 4.3 / 0.7. Also running 6 WU at a time. I do not have 20 consecutive valid results using 1 per WU but I might be able to borrow a V100 for 20 minutes to get it if you think it would make a big difference. I know overall 1 WU at a time is slower since the card is barely loaded with 1. |
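The AVG and STD rows for the run-time column in the post above can be recomputed from the 20 listed values. The sketch below is not the HostProjectStats program; it only uses Python's standard `statistics` module, and the choice of the population standard deviation (`pstdev`) is an assumption made because it rounds to the reported 3.5 for the V100 data. The "effective seconds per WU" figure simply divides the average run time by the 6 concurrent WUs mentioned in the post.

```python
from statistics import mean, pstdev

# Run Time (sec) column, copied from the post.
v100_run_times = [32.6, 33.6, 36.6, 24.5, 35.6, 29.6, 29.6, 32.7, 30.5, 30.5,
                  33.7, 30.6, 31.6, 40.6, 34.6, 32.5, 25.5, 33.7, 34.5, 33.7]
p100_run_times = [53.4, 61.5, 53.4, 59.5, 48.4, 54.4, 65.5, 57.4, 49.3, 57.4,
                  60.4, 50.4, 54.4, 55.4, 53.4, 53.4, 56.4, 57.4, 57.6, 55.6]

CONCURRENT_WUS = 6  # both cards were running 6 WUs at a time


def summarize(run_times, concurrent):
    """Average, population std dev, and per-WU time at the given concurrency."""
    avg = mean(run_times)
    return {
        "avg": avg,
        "std": pstdev(run_times),  # assumption: population, not sample, std dev
        "effective_s_per_wu": avg / concurrent,
    }


v100 = summarize(v100_run_times, CONCURRENT_WUS)
p100 = summarize(p100_run_times, CONCURRENT_WUS)

for name, stats in (("V100", v100), ("P100", p100)):
    print(f"{name}: avg {stats['avg']:.1f} s, std {stats['std']:.1f} s, "
          f"~{stats['effective_s_per_wu']:.1f} s per WU at 6 concurrent")
```

Dividing by the concurrency puts the two cards on a per-WU footing, so they can be compared with each other and with the "a little under 6 seconds per WU" estimate made earlier in the thread.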
©2023 Astroinformatics Group