Dissecting the new validator.
Author | Message |
---|---|
Joined: 24 Dec 07 Posts: 1947 Credit: 240,884,648 RAC: 0 |
Quoting Travis from the front page: "Validation will now work as follows: Every result that could improve one of our searches will be validated (with a min quorum of 3 -- and the accuracy of the fitness reported must be within 10e-11 of the quorum results, this means that single precision GPU results will be flagged invalid). Results that won't improve a search will be validated 50% of the time until the error rates of hosts stabilizes in the database (this will probably take a couple weeks). Afterwards, for the results that don't improve our searches, we'll be using BOINC's adaptive validation based on hosts error rates (which will be between 10% and 100% depending on how many errors the host typically has)." Point 1: "Every result that could improve one of our searches will be validated (with a min quorum of 3 -- and the accuracy of the fitness reported must be within 10e-11 of the quorum results, this means that single precision GPU results will be flagged invalid)." Excellent. Get rid of the cheating scum! Point 2: "Results that won't improve a search will be validated 50% of the time until the error rates of hosts stabilizes in the database (this will probably take a couple weeks)." Does this mean that work completed with a valid app on valid hardware will not be granted credit 50% of the time through no fault of its own, solely because the parameters assigned to it didn’t ‘improve’ the genetic mutation towards a better outcome? (Hopefully I have the gist of how this all works basically right.) Surely this is totally unfair! Point 3: "Afterwards, for the results that don't improve our searches, we'll be using BOINC's adaptive validation based on hosts error rates (which will be between 10% and 100% depending on how many errors the host typically has)." OK, this is slightly better than the above, but it's still unfair. Do you really want people to drop MW like a hot potato once their credit starts dropping like a stone?
I can’t agree more with Point 1, but Points 2 and 3 are over the top! |
Joined: 12 Aug 09 Posts: 172 Credit: 645,240,165 RAC: 0 |
Great post Gas Giant. I am running with an error rate of about 20%. If it is only for a couple of weeks I can live with it. If it runs on longer, I will also live with it, but many others won't. For me we are talking about a loss of half a million credits per day, enough to be noticed! It seems only fair to grant credit for all valid crunching. After all, we are not setting a precedent here; this is what CPDN does, whether the WU is used or not. Regards |
Joined: 20 Sep 08 Posts: 1391 Credit: 203,563,566 RAC: 0 |
All of my rigs have suddenly got these error messages ..... Anyone know why? <core_client_version>6.10.18</core_client_version> Don't drink water, that's the stuff that rusts pipes |
Joined: 8 Mar 08 Posts: 17 Credit: 4,411,459 RAC: 0 |
I'm having that problem also.. :S |
Joined: 18 Nov 07 Posts: 280 Credit: 2,442,757 RAC: 0 |
Point 2: "Results that won't improve a search will be validated 50% of the time until the error rates of hosts stabilizes in the database (this will probably take a couple weeks)." I think any successfully completed WU will get credit as before. It may take a bit longer before getting credit because it has to be checked against other results, but it will get its credits eventually. Think of it this way: currently, WUs that don't improve their search aren't used at all, but they still get credits. So why would the extra validation requirement change this? (If they could predict in advance which parameters would yield better results they wouldn't need WUs in the first place, so it's not like the work is wasted.) Point 3: "Afterwards, for the results that don't improve our searches, we'll be using BOINC's adaptive validation based on hosts error rates (which will be between 10% and 100% depending on how many errors the host typically has)." Again, I don't think this will affect which WUs get credits; it merely means that if your host has a large error rate, its results will be checked against others' more often. Now, if you were getting credits for invalid results before due to the lax validation, obviously you won't be getting credits for those anymore, but that's the whole point of the new validator. Of course, I don't speak for Travis; but I don't think your interpretation of his post makes sense. |
Joined: 19 Feb 08 Posts: 350 Credit: 141,284,369 RAC: 0 |
Many of my WUs have that problem too. I'm running MW 0.20b and 0.22 from the optimized apps site. Is something wrong with these apps? Which apps work perfectly? |
Joined: 4 Jan 10 Posts: 86 Credit: 51,753,924 RAC: 0 |
Is something wrong with these apps? I think something is wrong with the validator. |
Joined: 18 Nov 07 Posts: 280 Credit: 2,442,757 RAC: 0 |
The validator is still very much in flux. Things should get better in the next few days as more of the issues are tracked down and as new application versions are readied for release. Remember that Travis has to sleep too - but at least some of the issues reported last night have now been fixed. Additionally, there appears to be a problem with the ATI GPU apps which is holding things up - at this point it's not clear whether the issue can be worked around on the application level or if it is due to driver/SDK bugs that ATI will have to fix; let's hope for the former. If you're worried about losing crunching time, you may want to set Milkyway to No New Tasks until things stabilize. If you want to help out with testing, just keep crunching as normal and report any oddities you see (if they haven't been reported already). |
Joined: 6 Nov 09 Posts: 2 Credit: 1,500,164 RAC: 0 |
All of my rigs have suddenly got these error messages ..... Anyone know why? In the news thread "testing new validator", Travis answered this question: That's just getting ready for the new version of the application. The new application will take the parameters it's using from the command line (that way we don't have to generate a new parameter file for each workunit). |
Joined: 19 Feb 08 Posts: 350 Credit: 141,284,369 RAC: 0 |
Additionally, there appears to be a problem with the ATI GPU apps which is holding things up - at this point it's not clear whether the issue can be worked around on the application level or if it is due to driver/SDK bugs that ATI will have to fix; let's hope for the former. Travis wrote somewhere that it looks like 48xx and 58xx cards produce different results. I use both of them, and WUs from both cards produce results that are not granted credit. Both cards are in the same machine, so CAL is the same. If it is a driver problem, that can be solved by an update. If it is within CAL, we might have a problem for the next weeks or so. I've seen some posts about apps running in single precision. So my question is: have the 0.20b and 0.22 apps been tested and confirmed correct? It is nice to have credit, but it is much nicer to have a deeper understanding of our Milky Way. The main goal is to have a working computer grid. And sometimes this means trial and error. No, I will not stop running WUs. Regards, Alexander |
Joined: 6 Mar 09 Posts: 41 Credit: 38,856,291 RAC: 0 |
In order to be as 'compliant' as possible, I migrated my 2 systems with a 4870 to the stock application (v21) from the optimized v22. This resulted in 160 invalid, 300 valid and 100 pending results. Besides some practical issues (the quadXPC system is highly unstable since I returned airflow to normal), credits are about a third of what they used to be... I hope we may expect the situation to return to 'normal' sometime this week ... |
Joined: 1 Sep 08 Posts: 520 Credit: 302,524,931 RAC: 15 |
As I see these 'failed validation results' popping up on all of my workstations (4850 GPU and CPU MW clients alike), at this point I plan to set 'no new work' on ALL of my MW workstations and let the current queues flush out. The new validation scheme suggests the project has more data than it needs or wants, and that by imposing the scheme it seeks to migrate folks out of the project. When the project does have a need for processed work, hopefully they will post a news update. |
Joined: 12 Nov 07 Posts: 2425 Credit: 524,164 RAC: 0 |
I'm thinking it's validating whatever result comes in first and then rejecting the later ones. In this example I finished first and then the others were sent out: a 4800 came in second, then a 5800, then a 4800. The 5800 was rejected. http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=90067556 Doesn't expecting the unexpected make the unexpected the expected? If it makes sense, DON'T do it. |
Joined: 13 Mar 08 Posts: 804 Credit: 26,380,161 RAC: 0 |
Quorum Down to 2 |
Joined: 18 Nov 07 Posts: 280 Credit: 2,442,757 RAC: 0 |
It also looks like the problem with HD5800 series cards has been tracked down. It's not a problem with the cards or the SDK, but a poorly publicized change in recommended programming practices for how to properly load floating point values (hope I'm saying that right) that only applies to the newer cards. It'll hopefully be fixed soon (I'm also hoping the CUDA applications will get the same accuracy from project-side, but they're at least well within the required range). |
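
For anyone trying to picture the quorum rule discussed in this thread, here is a rough Python sketch. It is not the project's validator code (none is shown in the thread); the names `validate`, `to_single`, `TOLERANCE` and `MIN_QUORUM` are made up for illustration, and the fitness value is an arbitrary number. The tolerance is taken literally from the announcement as `10e-11`, which is numerically 1e-10. The sketch accepts a result as canonical once at least three results (itself included) agree with it within the tolerance, and shows why a fitness rounded to single precision falls outside that window.

```python
import struct

TOLERANCE = 10e-11   # as written in the announcement; numerically equal to 1e-10
MIN_QUORUM = 3       # minimum quorum quoted in the announcement


def to_single(x):
    """Round a Python float (double) to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def validate(fitnesses, tolerance=TOLERANCE, min_quorum=MIN_QUORUM):
    """Hypothetical quorum check.

    Returns (canonical_fitness, valid_flags). A result becomes canonical when at
    least min_quorum results (itself included) agree with it within tolerance.
    Returns (None, []) when no quorum can be formed yet.
    """
    for candidate in fitnesses:
        agree = [abs(f - candidate) <= tolerance for f in fitnesses]
        if sum(agree) >= min_quorum:
            return candidate, agree
    return None, []


# Arbitrary illustrative fitness value, not taken from a real workunit.
fitness = -3.0123456789012

# Three double-precision hosts that differ only by tiny rounding noise,
# plus one host that computed (or stored) the fitness in single precision.
results = [fitness, fitness + 2e-12, to_single(fitness), fitness - 3e-12]

canonical, flags = validate(results)
print(canonical, flags)

# With only two results returned, no quorum of 3 can be reached yet.
pending = validate([fitness, fitness + 2e-12])
print(pending)
```

Single precision carries roughly 7 significant decimal digits, so for a value of this size the rounding error is on the order of 1e-7, about a thousand times larger than the allowed difference. That matches the statement in the announcement that single precision GPU results will be flagged invalid.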