Dissecting the new validator.

The Gas Giant Joined: 24 Dec 07 Posts: 1947 Credit: 240,884,648 RAC: 0
Message 38081 - Posted: 5 Apr 2010, 11:22:46 UTC

Quoting Travis from the front page:

"Validation will now work as follows: Every result that could improve one of our searches will be validated (with a min quorum of 3 -- and the accuracy of the fitness reported must be within 10e-11 of the quorum results, this means that single precision GPU results will be flagged invalid). Results that won't improve a search will be validated 50% of the time until the error rates of hosts stabilizes in the database (this will probably take a couple weeks). Afterwards, for the results that don't improve our searches, we'll be using BOINC's adaptive validation based on hosts error rates (which will be between 10% and 100% depending on how many errors the host typically has)."

Point 1. "Every result that could improve one of our searches will be validated (with a min quorum of 3 -- and the accuracy of the fitness reported must be within 10e-11 of the quorum results, this means that single precision GPU results will be flagged invalid)."

Excellent. Get rid of the cheating scum!

Point 2. "Results that won't improve a search will be validated 50% of the time until the error rates of hosts stabilizes in the database (this will probably take a couple weeks)."

Does this mean that work completed with a valid app on valid hardware will not be granted credit 50% of the time through no fault of its own, but solely because the parameters assigned to it didn't 'improve' the genetic mutation towards a better outcome? (Hopefully I have the gist of how this all works basically right.) Surely this is totally unfair!

Point 3. "Afterwards, for the results that don't improve our searches, we'll be using BOINC's adaptive validation based on hosts error rates (which will be between 10% and 100% depending on how many errors the host typically has)."

OK, this is slightly better than the above, but it's still unfair. Do you really want people to drop MW like a hot potato once their credit starts dropping like a stone?

I can't agree more with Point 1, but Points 2 and 3 are over the top!
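For readers trying to picture Point 1, here is a minimal Python sketch of a quorum check against a tolerance. It assumes the validator simply looks for at least three reported fitness values that agree within 10e-11 of one another; the names `validate` and `to_single` are invented for this example, and this is not the project's actual validator code. It also shows why a fitness value that has been rounded to single precision would fall outside that tolerance.

```python
import struct

TOLERANCE = 10e-11   # as quoted from the front page (equal to 1e-10)
MIN_QUORUM = 3       # minimum number of agreeing results, as quoted


def to_single(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def validate(fitnesses, tolerance=TOLERANCE, min_quorum=MIN_QUORUM):
    """Hypothetical quorum check.

    Returns (canonical_fitness, flags), where flags[i] says whether result i
    agrees with the canonical value. Returns (None, []) if no quorum exists yet.
    """
    for candidate in fitnesses:
        agreeing = [abs(f - candidate) <= tolerance for f in fitnesses]
        if sum(agreeing) >= min_quorum:
            return candidate, agreeing
    return None, []


if __name__ == "__main__":
    fitness = -3.00000001                      # made-up fitness value
    reported = [
        fitness,                               # double-precision host
        fitness + 1e-13,                       # double-precision host, tiny rounding difference
        fitness - 1e-13,                       # double-precision host, tiny rounding difference
        to_single(fitness),                    # host that computed in single precision
    ]
    canonical, flags = validate(reported)
    for value, ok in zip(reported, flags):
        print(f"{value!r}: {'valid' if ok else 'invalid'}")
```

In this sketch the three double-precision results form the quorum, and the single-precision result is flagged invalid because its rounding error is far larger than the tolerance.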

David Glogau* Joined: 12 Aug 09 Posts: 172 Credit: 645,240,165 RAC: 0
Message 38084 - Posted: 5 Apr 2010, 11:38:05 UTC

Great post, Gas Giant. I am running with an error rate of about 20%. If it is only for a couple of weeks I can live with it. If it runs on longer, I will also live with it, but many others won't. For me we are talking about a loss of half a million credits per day, enough to be noticed! It seems only fair to grant credit for all valid crunching. After all, we are not setting a precedent here; this is what CPDN does, whether the WU is used or not.

Regards

Chris S Joined: 20 Sep 08 Posts: 1391 Credit: 203,563,566 RAC: 0
Message 38085 - Posted: 5 Apr 2010, 12:41:41 UTC

All of my rigs have suddenly got these error messages ..... Anyone know why?

<core_client_version>6.10.18</core_client_version>
<![CDATA[
<stderr_txt>
Running Milkyway@home ATI GPU application version 0.20b (Win32, SSE2, CAL 1.3) by Gipsel
ignoring unknown input argument in app_info.xml: -np
ignoring unknown input argument in app_info.xml: 20
ignoring unknown input argument in app_info.xml: -p
ignoring unknown input argument in app_info.xml: 0.8204714877234080000000000
ignoring unknown input argument in app_info.xml: 6.2644417249787670000000000
ignoring unknown input argument in app_info.xml: -1.1275059800827940000000000
ignoring unknown input argument in app_info.xml: 171.3489450424705200000000000
ignoring unknown input argument in app_info.xml: 25.5968204295114100000000000
ignoring unknown input argument in app_info.xml: 0.4638059718261400000000000
ignoring unknown input argument in app_info.xml: 6.2831853071795860000000000
ignoring unknown input argument in app_info.xml: 6.7755620233591070000000000
ignoring unknown input argument in app_info.xml: -7.1501757118781240000000000
ignoring unknown input argument in app_info.xml: 179.3048252880004400000000000
ignoring unknown input argument in app_info.xml: 38.0862970666705200000000000
ignoring unknown input argument in app_info.xml: 2.5868948022107680000000000
ignoring unknown input argument in app_info.xml: 4.7261433922311420000000000
ignoring unknown input argument in app_info.xml: 7.0656503094651130000000000
ignoring unknown input argument in app_info.xml: -13.5765285909904500000000000
ignoring unknown input argument in app_info.xml: 211.2459262577329200000000000
ignoring unknown input argument in app_info.xml: 15.0673173424509500000000000
ignoring unknown input argument in app_info.xml: 0.0000000000000000000000000
ignoring unknown input argument in app_info.xml: 6.2592519338112480000000000
ignoring unknown input argument in app_info.xml: 12.94646831325726500000

Don't drink water, that's the stuff that rusts pipes

UBT - Ben Joined: 8 Mar 08 Posts: 17 Credit: 4,411,459 RAC: 0
Message 38086 - Posted: 5 Apr 2010, 12:43:45 UTC

I'm having that problem also.. :S

Emanuel Joined: 18 Nov 07 Posts: 280 Credit: 2,442,757 RAC: 0
Message 38088 - Posted: 5 Apr 2010, 13:53:01 UTC - in response to Message 38081. Last modified: 5 Apr 2010, 13:57:19 UTC

Quote:
Point 2. "Results that won't improve a search will be validated 50% of the time until the error rates of hosts stabilizes in the database (this will probably take a couple weeks)." Does this mean that work that is completed with a valid app on valid hardware, will not be granted credit 50% of the time due to no fault of its own, but solely because the parameters assigned to it didn't 'improve' the genetic mutation towards a better outcome? (hopefully I got the gist of how this all works basically right). Surely this is totally unfair!

I think any successfully completed WU will get credit as before. It may take a bit longer to get credit because the result has to be checked against other results, but it will get its credit eventually. Think of it this way: currently, WUs that don't improve their search aren't used at all, but they still get credit. So why would the extra validation requirement change this? (If they could predict in advance which parameters would yield better results they wouldn't need WUs in the first place, so it's not as if the work is wasted.)

Quote:
Point 3. "Afterwards, for the results that don't improve our searches, we'll be using BOINC's adaptive validation based on hosts error rates (which will be between 10% and 100% depending on how many errors the host typically has)." OK, this is slightly better than the above, but it's still unfair. Do you really want people to drop MW like a hot potato once their credit starts dropping like a stone?

Again, I don't think this will affect which WUs get credit. It merely means that if your host has a large error rate, its results will be checked against others' more often.

Now, if you were getting credit for invalid results before because of the lax validation, obviously you won't be getting credit for those anymore, but that's the whole point of the new validator. Of course, I don't speak for Travis, but I don't think your interpretation of his post makes sense.
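To make this reading of Points 2 and 3 concrete, here is a small Python sketch of adaptive validation as described in the quoted announcement: the host's error rate only changes how often a non-improving result is cross-checked, not whether valid work earns credit. The linear mapping, the `full_check_at` threshold and the function names are assumptions made for illustration; BOINC's real adaptive replication logic is not reproduced here.

```python
import random

MIN_CHECK = 0.10   # lowest cross-check probability mentioned in the announcement
MAX_CHECK = 1.00   # highest cross-check probability mentioned in the announcement


def check_probability(host_error_rate, full_check_at=0.05):
    """Hypothetical mapping from a host's error rate to the chance of a cross-check.

    full_check_at is an invented threshold: at or above this error rate,
    every result from the host is checked.
    """
    scaled = MIN_CHECK + (MAX_CHECK - MIN_CHECK) * (host_error_rate / full_check_at)
    return max(MIN_CHECK, min(MAX_CHECK, scaled))


def needs_validation(could_improve_search, host_error_rate, rng=random):
    """Decide whether a result is sent out for comparison with other hosts."""
    if could_improve_search:
        return True   # results that could improve a search are always validated
    return rng.random() < check_probability(host_error_rate)


if __name__ == "__main__":
    for rate in (0.0, 0.01, 0.05, 0.20):
        print(f"error rate {rate:.2f} -> check probability {check_probability(rate):.2f}")
```

Under these assumptions a reliable host has most of its non-improving results accepted without a cross-check, while an unreliable host is checked every time.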

Werkstatt Joined: 19 Feb 08 Posts: 350 Credit: 141,284,369 RAC: 0
Message 38094 - Posted: 5 Apr 2010, 14:48:11 UTC - in response to Message 38088.

Many of my WUs have that problem too. I'm running MW 0.20b and 0.22 from the optimized apps site. Is something wrong with these apps? Which apps work perfectly?

CTAPbIi Joined: 4 Jan 10 Posts: 86 Credit: 51,753,924 RAC: 0
Message 38097 - Posted: 5 Apr 2010, 15:21:41 UTC - in response to Message 38094.

Quote:
Is something wrong with these apps?

I think something is wrong with the validator.

Emanuel Joined: 18 Nov 07 Posts: 280 Credit: 2,442,757 RAC: 0
Message 38102 - Posted: 5 Apr 2010, 16:24:36 UTC Last modified: 5 Apr 2010, 16:25:15 UTC

The validator is still very much in flux. Things should get better in the next few days as more of the issues are tracked down and as new application versions are readied for release. Remember that Travis has to sleep too - but at least some of the issues reported last night have now been fixed. Additionally, there appears to be a problem with the ATI GPU apps which is holding things up - at this point it's not clear whether the issue can be worked around on the application level or if it is due to driver/SDK bugs that ATI will have to fix; let's hope for the former. If you're worried about losing crunching time, you may want to set Milkyway to No New Tasks until things stabilize. If you want to help out with testing, just keep crunching as normal and report any oddities you see (if they haven't been reported already).

uwe Joined: 6 Nov 09 Posts: 2 Credit: 1,500,164 RAC: 0
Message 38110 - Posted: 5 Apr 2010, 17:31:10 UTC - in response to Message 38085.

Quote:
All of my rigs have suddenly got these error messages ..... Anyone know why?
<core_client_version>6.10.18</core_client_version> <![CDATA[ <stderr_txt> Running Milkyway@home ATI GPU application version 0.20b (Win32, SSE2, CAL 1.3) by Gipsel
ignoring unknown input argument in app_info.xml: -np
...
ignoring unknown input argument in app_info.xml: 12.94646831325726500000

In the thread "news:testing new validator" Travis answered this question:

"That's just getting ready for the new version of the application. The new application will take the parameters it's using from the command line (that way we don't have to generate a new parameter file for each workunit)."
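As an illustration of what "taking the parameters from the command line" could look like, here is a short Python sketch that reads arguments shaped like the ones in the log above. It assumes `-np` gives the number of parameters and `-p` is followed by that many floating-point values; that reading is only inferred from the log (which shows `-np 20` followed by `-p` and 20 numbers), and `parse_search_args` is a name invented for this example, not part of the real application.

```python
import sys


def parse_search_args(argv):
    """Extract the search parameters from an argument list.

    Assumed layout (inferred from the stderr output, not confirmed):
        -np <count> -p <value_1> ... <value_count>
    Raises ValueError if a flag is missing or too few values follow -p.
    """
    count = int(argv[argv.index("-np") + 1])
    start = argv.index("-p") + 1
    values = [float(token) for token in argv[start:start + count]]
    if len(values) != count:
        raise ValueError(f"expected {count} parameters after -p, got {len(values)}")
    return values


if __name__ == "__main__":
    # Example invocation (hypothetical script name):
    #   python parse_args.py -np 3 -p 0.82 6.26 -1.13
    if len(sys.argv) > 1:
        print(parse_search_args(sys.argv[1:]))
```

An older application that only knows about a parameter file would not recognise these extra arguments, which would be consistent with the "ignoring unknown input argument" lines being harmless.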

Werkstatt Joined: 19 Feb 08 Posts: 350 Credit: 141,284,369 RAC: 0
Message 38112 - Posted: 5 Apr 2010, 18:09:10 UTC - in response to Message 38102.

Quote:
Additionally, there appears to be a problem with the ATI GPU apps which is holding things up - at this point it's not clear whether the issue can be worked around on the application level or if it is due to driver/SDK bugs that ATI will have to fix; let's hope for the former. If you're worried about losing crunching time, you may want to set Milkyway to No New Tasks until things stabilize.

Travis wrote somewhere that it looks like 48xx and 58xx cards produce different results. I use both of them, and WUs from both cards produce results that are not granted credit. Both cards are in the same machine, so CAL is the same. If it is a driver problem, that can be solved by an update. If it is within CAL, we might have a problem for the next few weeks or so. I've seen some posts about apps running in single precision. So my question is: are the apps 0.20b and 0.22 tested and known to be correct? It is nice to have credit, but it is much nicer to have a deeper understanding of our Milky Way. The main goal is to have a working computer grid, and sometimes this means trial and error. No, I will not stop running WUs.

Regards, Alexander

SkyeHunter Joined: 6 Mar 09 Posts: 41 Credit: 38,856,291 RAC: 0
Message 38115 - Posted: 5 Apr 2010, 18:56:15 UTC

In order to be as 'compliant' as possible, I migrated my 2 systems with a 4870 to the stock application (v21) from the optimized one (v22). This resulted in 160 invalid, 300 valid and 100 pending results. Besides some practical issues (the quadXPC system is highly unstable since I returned airflow to normal), credits are about a third of what they used to be... I hope we may expect the situation to return to 'normal' sometime this week ...

BarryAZ Joined: 1 Sep 08 Posts: 520 Credit: 302,538,504 RAC: 0
Message 38118 - Posted: 5 Apr 2010, 19:03:55 UTC

As I see these 'failed validation results' popping up on all of my workstations (4850 GPU and CPU MW clients as well), at this point I plan to set 'no new work' on ALL of my MW workstations and let the current queues flush out. When the project has a need for processed work, hopefully they will post a news update. (The new validation schema suggests they have more data than they need or want, and that by imposing the new schema they seek to migrate folks out of the project.)

banditwolf Joined: 12 Nov 07 Posts: 2425 Credit: 524,164 RAC: 0
Message 38122 - Posted: 5 Apr 2010, 19:23:53 UTC

I'm thinking it's validating whichever result comes in first and then rejecting the later ones. In this example I finished first and then the others were sent out: a 4800 came in second, then a 5800, then a 4800. The 5800 was rejected.
http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=90067556

Doesn't expecting the unexpected make the unexpected the expected? If it makes sense, DON'T do it.

Blurf Volunteer moderator Project administrator Joined: 13 Mar 08 Posts: 804 Credit: 26,380,161 RAC: 0
Message 38167 - Posted: 6 Apr 2010, 3:05:25 UTC

Quorum Down to 2

The database is having a bit of trouble keeping up with all the new results due to a quorum of 3, so for the time being I'm dropping it to a quorum of 2. On another note, we should have source code for the new application available tomorrow.

6 Apr 2010 2:11:10 UTC

Emanuel Joined: 18 Nov 07 Posts: 280 Credit: 2,442,757 RAC: 0
Message 38180 - Posted: 6 Apr 2010, 10:58:41 UTC

It also looks like the problem with HD5800 series cards has been tracked down. It's not a problem with the cards or the SDK, but a poorly publicized change in recommended programming practices for how to properly load floating point values (hope I'm saying that right) that only applies to the newer cards. It'll hopefully be fixed soon (I'm also hoping the CUDA applications will get the same accuracy from project-side, but they're at least well within the required range).