New(er) Searches
Author | Message |
---|---|
Joined: 6 May 09 Posts: 217 Credit: 6,856,375 RAC: 0 |
I've started some new searches: ps_separation_15_3s_stndrd_1, ps_separation_22_3s_free0_1, ps_separation_22_3s_edge0_2, de_separation_15_3s_stndrd_1, de_separation_22_3s_free0_1, de_separation_22_3s_edge0_2. I found a few minor errors in the parameter files. I wouldn't have expected them to be significant, but given the strange errors we are seeing, I figured it would be best to clean up the files and restart the runs. I also started a set of Stripe 15 runs, since these have been successful in the past. When the NaN/Inf errors occurred, we were seeing a strange exit status; I'm trying to track that down. More news soon. Cheers, Matthew N. |
Joined: 19 Jul 10 Posts: 624 Credit: 19,290,230 RAC: 2,073 |
These seem to run OK, but a maximum of 1 (or actually 2) errors is not enough: wuid=261886822. I returned a result that "looks OK", but it needed confirmation from a wingman. Unfortunately, both wingmen are doing little more than generating computation errors, so the WU ended up as "completed, can't validate". |
Joined: 29 Aug 12 Posts: 31 Credit: 40,781,945 RAC: 0 |
I also have two that completed but can't validate. http://milkyway.cs.rpi.edu/milkyway/results.php?userid=461415&offset=0&show_names=0&state=4&appid= |
Joined: 19 Jul 10 Posts: 624 Credit: 19,290,230 RAC: 2,073 |
Quote: "I also have two that completed but can't validate." http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=464745&offset=0&show_names=0&state=4&appid= (only you can open the other URL). Yes, that's exactly the same problem: there are too many malfunctioning computers for the current max-errors setting. |
Joined: 3 Jul 12 Posts: 13 Credit: 7,601,982 RAC: 0 |
Hello! All my errors are still of type 0x1 (incorrect function), including on the new types of tasks. Kind regards, Martin |
Joined: 16 Jun 08 Posts: 93 Credit: 366,882,323 RAC: 0 |
These new searches are looking much, much better, at least for me. My error rate is down to fewer than 20/day, whereas it had consistently been 60-80/day. Only 4 errors in the last 12 hours with the edge0/free0 runs. |
Joined: 29 Aug 12 Posts: 31 Credit: 40,781,945 RAC: 0 |
Quote: "I also have two that completed but can't validate." It looks like they are probably bad in any case; see the following WU: http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=261364915 |
Joined: 19 Jul 10 Posts: 624 Credit: 19,290,230 RAC: 2,073 |
Quote: "Hello!" ??? All errors (except one) are from the old separation_22_3s_edge/free_3 batches. |
Joined: 19 Jul 10 Posts: 624 Credit: 19,290,230 RAC: 2,073 |
Quote: "Looks like they are probably bad in any case, see the following wu." No, I don't think so. The first two results look OK, but they needed a third one as confirmation (many WUs need 3 results at the moment). Unfortunately, hosts 144013 and 143124, to which _2 and _3 were assigned, are doing little besides trashing WUs; look at their task lists. |
Joined: 3 Jul 12 Posts: 13 Credit: 7,601,982 RAC: 0 |
Hello! On my machines, about 3% of tasks still end in failure, and 16% cannot be validated immediately. It may be a bit better than before, but I'm not sure. On the other hand, the total crunching power, now 349 TFLOPS, has been increasing since yesterday, which is a good sign. It had dropped by roughly 36% from the 545 TFLOPS it stood at before this bad series of tasks. This drop is due not only to the higher failure rate but also to the very high share of tasks that must be verified by more than one wingman. This extra crunching ties up a lot of computing effort. Kind regards and happy crunching, Martin |
Joined: 19 Jul 10 Posts: 624 Credit: 19,290,230 RAC: 2,073 |
Another WU killed by hosts with issues: 262295807. |
Joined: 6 May 09 Posts: 217 Credit: 6,856,375 RAC: 0 |
Thanks for the feedback! I'm still looking into this; I hope to have it figured out tomorrow. -Matthew N. |
Joined: 19 Jul 10 Posts: 624 Credit: 19,290,230 RAC: 2,073 |
One more. Apparently, a WU aborted by the user is also counted as an error, i.e. as if it indicated a bug in the WU. |
Joined: 26 Feb 11 Posts: 170 Credit: 205,557,553 RAC: 0 |
In total, I've had only 4 computation errors yesterday and today, so things seem to be returning to normal for me ^^ Oh yes, and good decision to double the computing time per WU :) DSKAG Austria Research Team: http://www.research.dskag.at |
Joined: 3 Jul 12 Posts: 13 Credit: 7,601,982 RAC: 0 |
Hello! I haven't had any crunching errors for more than 3 days, but I still have lots of tasks that haven't been validated. But I see under "pending tasks" / WU ID that there are still a lot of failed tasks. The total number of pending tasks listed is also shrinking, and the total crunching power is still increasing. So you're on the right path! Hopefully!!! Kind regards and happy crunching, Martin |
Joined: 5 Nov 10 Posts: 69 Credit: 15,064,831 RAC: 0 |
Is it just me, or do all WUs suddenly take longer to crunch? Check my stats: e.g., ATI GPU tasks take 30% longer and most CPU tasks need about the same extra time, but on an Android machine (ODROID-X) they can take more than 60% longer: 46,362.69 seconds (12.88 hours) where they used to take 27,830.55 seconds (7.73 hours)??? Credits per WU appear to have increased, so maybe it all balances out in the end? |
Joined: 19 Jul 10 Posts: 624 Credit: 19,290,230 RAC: 2,073 |
Quote: "But I see under 'pending tasks' / WU ID that there are still a lot of failed tasks." Pending tasks haven't failed; they are just waiting for a wingman to confirm your result. |
Joined: 3 Jul 12 Posts: 13 Credit: 7,601,982 RAC: 0 |
Hello! Quote: "Pending tasks haven't failed; they are just waiting for a wingman to confirm your result." That's right, but they also require extra crunching power and so slow down the project's progress. So a high rate of tasks that can't be validated immediately is still a drawback. See also this message below. Kind regards and happy crunching, Martin |
Joined: 19 Jul 10 Posts: 624 Credit: 19,290,230 RAC: 2,073 |
Quote: "Pending tasks haven't failed; they are just waiting for a wingman to confirm your result." The extra crunching ensures that only correct results get validated; see this post. The quality of the results is more important than pure throughput. Many projects send the same WU to at least two computers by default; Milkyway still does that only if the validator is not sure about the first (and possibly also the second) result. It just happens more often now than it did in the past. |
Joined: 3 Jul 12 Posts: 13 Credit: 7,601,982 RAC: 0 |
Hello Link! Quote: "Pending tasks haven't failed; they are just waiting for a wingman to confirm your result." Well, I know about this. But the rate of extra validation runs is now about 4 to 5 times higher than before, and that ties up crunching power. Only the software developers can decide whether this has to be accepted as unavoidable or can be improved by modifying the program code. Until about a month ago it was better, and it would be good to get back to that situation. That's it. I don't want to discuss this any further; it's not worth it. Kind regards and happy crunching, Martin |
©2024 Astroinformatics Group