Marked as Invalid? (Part 2)
Author | Message |
---|---|
Joined: 6 Mar 09 Posts: 41 Credit: 38,856,291 RAC: 0 |
My "invalids" usually are paired with a combination of 5800 series running v0.23 and a 4700/4800 series running optimized applications (anonymous platform)... I am running on 4870s: Catalyst 10.2 drivers (10.3 is rumoured to be unstable?) and the stock application v0.23. One system runs XP 32-bit and the second one XP 64-bit. The XP 32-bit machine is currently producing most (if not all) of the invalids, the 64-bit almost none... |
Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
"My 'invalids' usually are paired with a combination of 5800 series running v0.23 and a 4700/4800 series running optimized applications (anonymous platform)..." It sounds like there's some issue with your 32-bit machine, or the 32-bit ATI application. Is anyone else having a similar problem? The server is down to < 3% invalids, so I think that means the ones flagged as invalid are actually invalid now. Have you tried doing a detach and reattach to make sure you have the right brook32.dll file? Maybe manually delete the old one to force the new one to download? |
Joined: 18 Nov 07 Posts: 280 Credit: 2,442,757 RAC: 0 |
Speaking of detaching/reattaching, I think a project reset should do just as well if you're using BOINC client version 6.10.45, due to the following bug fix: "David 4 Apr 2010 - client: clean out project dir on reset. fixes #978". Could be wrong though. |
Joined: 27 Nov 09 Posts: 108 Credit: 430,760,953 RAC: 0 |
"The server is down to < 3% invalids, so I think that means the ones flagged as invalid are actually invalid now." Actually the vast majority of invalids still being reported against my HD5870 are due to the earlier buggy versions of anonymous platforms out-voting it. I've notified 12 owners now by PM, but only about 4 of them that I know of have upgraded their app to 0.23. |
Joined: 29 Aug 07 Posts: 486 Credit: 576,548,171 RAC: 0 |
"The server is down to < 3% invalids, so I think that means the ones flagged as invalid are actually invalid now." "Actually the vast majority of invalids still being reported against my HD5870 are due to the earlier buggy versions of anonymous platforms out-voting it." The project just needs to stop sending work to any host running an application older than v0.23, and I bet you would see them update in a jiffy .... STE\/E |
Joined: 11 Nov 07 Posts: 232 Credit: 178,229,009 RAC: 0 |
"The server is down to < 3% invalids, so I think that means the ones flagged as invalid are actually invalid now." "Actually the vast majority of invalids still being reported against my HD5870 are due to the earlier buggy versions of anonymous platforms out-voting it." Does that not mean that the results that are getting the status 'Valid' in this case are actually 'Invalid', or in other words that the validation system is not working? |
Joined: 6 Mar 09 Posts: 41 Credit: 38,856,291 RAC: 0 |
"Have you tried doing a detach and reattach to make sure you have the right brook32.dll file? Maybe manually delete the old one to force the new one to download?" I think you nailed the issue; I noticed an old (0.20b) executable in the directory. I erased all files in the directory and reset the project. Is it correct that the DLL is brook32a_ati.dll? I'll let it run for a while and keep an eye on it. |
Joined: 1 Mar 09 Posts: 56 Credit: 1,984,937,499 RAC: 0 |
"Does that not mean that the results that are getting the status 'Valid' in this case are actually 'Invalid' ..." I don't know why you would think that. If <3% of results are marked invalid then >97% are marked valid. All that means is that >97% of results agree with each other. This makes no statement about whether the results are actually right or wrong. All non-5800 series crunching methods gave correct answers anyway. All 5800 series cards that are running V0.23 are giving correct answers now. The chance of two remaining 'bad versions' on 5800 series cards getting to form a quorum is decreasing and is probably quite low now. So I would think that the majority of 'valid' results are also correct results. "... or in other words that the validation system is not working?" The validation system is working as designed. The validator can't rectify the situation if two hosts each send back the same incorrect answer. The situation will improve further if those still using the 'bad' versions get 'encouraged' into upgrading to V0.23. Send some PMs to offenders rather than suggesting that the validator is broken. Cheers, Gary. |
Joined: 6 Mar 09 Posts: 41 Credit: 38,856,291 RAC: 0 |
"Have you tried doing a detach and reattach to make sure you have the right brook32.dll file? Maybe manually delete the old one to force the new one to download?" Nope, no change. I still have about 40% invalids on that machine, after a full cleanout of the directory and a reset of the project... |
Joined: 11 Nov 07 Posts: 232 Credit: 178,229,009 RAC: 0 |
"Does that not mean that the results that are getting the status 'Valid' in this case are actually 'Invalid' ..." The case was this... "Actually the vast majority of invalids still being reported against my HD5870 are due to the earlier buggy versions of anonymous platforms out-voting it." "If <3% of results are marked invalid then >97% are marked valid. All that means is that >97% of results agree with each other. This makes no statement about whether the results are actually right or wrong." I believed that the main purpose of a validation system was to guarantee that the results accepted by the system had errors that did not exceed the accepted limit. If the system will still mark a correct output file as 'Invalid' just because the majority agree with each other but are not correct, I cannot understand why there is no check that the delivered results / output file were created using the correct application. |
Joined: 24 Feb 09 Posts: 620 Credit: 100,587,625 RAC: 0 |
".... Nope, no change. I still have about 40% invalids on that machine, after a full cleanout of the directory and reset of the project..." Was it a reset or a detach? If it was a reset from inside BAM, do a detach using the host list at BOINCstats. Another thing to try .... you are running 822/995, which seems high for that card without overvolting; try running at defaults to see if it gets going, then you'll know. If that does get it going, reduce memory down to 200-300MHz; memory speed is irrelevant for MW, so that will either allow a cooler-running PC and save a bunch on power, or allow a slightly higher o/c without overvolting. Regards Zy |
Joined: 6 Mar 09 Posts: 41 Credit: 38,856,291 RAC: 0 |
".... Nope, no change. I still have about 40% invalids on that machine, after a full cleanout of the directory and reset of the project..." Will do both changes... Meanwhile I upgraded the CCC/driver combo, but alas, only a slight improvement... |
Joined: 24 Feb 09 Posts: 620 Credit: 100,587,625 RAC: 0 |
The 30 mins since you went to defaults are looking good: you were getting around 8 invalids an hour, and there has been one since the change. Let it run for a couple of hours like that to see if it stays stable. If, as is likely, it does, it's then a case of looking at the overclocking as the likely cause. The chances are it will happen to the other card eventually, as it's similarly overclocked. When overclocking, gently, step by step (say 15-20MHz steps), increase the GPU clocks (memory at defaults) while running a WU, until it locks up; that's the limit. Reboot, back off 25MHz (50MHz to sleep at nights!) and you should be OK, but keep your eye on temperatures: heat is your enemy when o/c. Then lower the memory to around 200-300MHz for MW (that value will differ wildly from project to project depending on whether or not they are memory dependent). The lower memory will give you a bit of room re power, and run cooler. If it's pushed to the last possible MHz, there is no room for "peak" usage, and eventually you will get caught. The top end value will differ from card to card, even of the same card type, due to the manufacturing process and the way the dies are binned. Regards Zy |
Joined: 6 Mar 09 Posts: 41 Credit: 38,856,291 RAC: 0 |
Zydor, thanks for the info. Apparently the new (stock) app stresses the GPU more than the previous one did (it has been running with these settings for almost 8 months). I'll have to find a new sweet spot ... |
Joined: 9 Apr 09 Posts: 10 Credit: 117,669,581 RAC: 0 |
So how does this work? On this WU I got marked as invalid: http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=91766141 The first person reporting was running the stock .23 application on a 5800. I reported second using the opti .23 application on a 5800. The third result was from someone running the .20b application on a 5800, and they and the first person got marked as valid and I got marked as invalid. |
Joined: 18 Nov 07 Posts: 280 Credit: 2,442,757 RAC: 0 |
Huh, I thought they were supposed to show the reported fitness in the stderr.txt log ... but I don't see it in there. Otherwise you would be able to compare the fitness you got to the fitness the others got. The only way the situation -should- occur is if their fitnesses were very close together, and yours was somehow off. That would still be strange, but would be correct as far as the validator is concerned. |
Joined: 24 Feb 09 Posts: 620 Credit: 100,587,625 RAC: 0 |
"Zydor, thanks for the info. Apparently the new (stock) app stresses the GPU more than the previous one did (it has been running with these settings for almost 8 months). I'll have to find a new sweet spot ..." When you settle on new preferred values, if you still get regular invalids - however infrequently - and you still get them when reverting to stock, I would suspect mainboard memory. You should only get invalids very rarely. If it happens, it's worth shutting down, removing all memory sticks (remembering to keep the pairs intact), and reseating them in different slots. It can happen, more frequently than many imagine, that the sticks work very slightly loose, creating a less than optimum contact. Reseating like this can often sort that. If invalids are still there - in any quantity at stock values - then it's likely there is a bad memory stick lurking and a memory test should be done (memtest is a good util for it). Happy Crunching :) Regards Zy |
Joined: 15 Jul 08 Posts: 383 Credit: 729,299,809 RAC: 547 |
It seems the validator is doing something wonky: http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=1658&nowrap=true#38383 http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=1658&nowrap=true#38390 |
Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
If you read the news, there's an issue with some of the 0.23 applications not correctly updating their brookX.dll file, which means they can return the same result as a 0.20 application, and thus form a quorum against valid results. The validator is doing exactly what it should -- matching fitnesses and making quorums. The problem is that if enough computers out there are returning bad results, they can form a quorum against good results. For the most part we're down to < 4% invalid results, which means most of the people running the bad 58x0 application have upgraded, but there are a few where, even though they're using the 0.23 application downloaded from the server, the BOINC client didn't upgrade the brookX.dll file, so they're still returning bad results. When we swap over to the new application this should be completely fixed, because people will have to upgrade to use it (and the corresponding DLL files). |
Joined: 11 Nov 07 Posts: 232 Credit: 178,229,009 RAC: 0 |
Does that not mean that if the results that were validated against each other were all produced by the same bad application, they could be equal to each other but have errors that were too large, and they would still be marked as 'Valid'? Or in other words, you will end up with all results marked 'Valid' although they are all incorrect. How do you know that the results you are collecting have errors within the expected limits? |
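
To make the point argued in this thread concrete, here is a minimal sketch of agreement-based validation. It is not the project's actual validator: the function names, the tolerance, the quorum size of two, and the fitness values are all assumptions made for illustration. It only shows that matching fitnesses decides what is marked valid, so two hosts returning the same wrong answer can outvote one correct host, as in the workunit described earlier in the thread.

```python
from typing import Dict, List, Optional

# Assumed values: the thread does not state the validator's real tolerance
# or quorum size, so both constants are illustrative only.
FITNESS_TOLERANCE = 1e-10
MIN_QUORUM = 2


def find_quorum(fitnesses: Dict[str, float],
                tol: float = FITNESS_TOLERANCE,
                min_quorum: int = MIN_QUORUM) -> Optional[List[str]]:
    """Return the largest group of hosts whose fitness lies within `tol`
    of one reference result, or None if no group reaches `min_quorum`."""
    best: List[str] = []
    for ref in fitnesses.values():
        group = [host for host, fit in fitnesses.items() if abs(fit - ref) <= tol]
        if len(group) > len(best):
            best = group
    return best if len(best) >= min_quorum else None


def mark_results(fitnesses: Dict[str, float]) -> Dict[str, str]:
    """Label each host's result by agreement only; correctness is never checked."""
    quorum = find_quorum(fitnesses)
    if quorum is None:
        return {host: "pending" for host in fitnesses}
    return {host: "valid" if host in quorum else "invalid" for host in fitnesses}


def prob_good_outvoted(bad_fraction: float) -> float:
    """Chance that a correct result is outvoted, assuming the other two
    results on the workunit are independently bad with this probability."""
    return bad_fraction ** 2


# Invented fitness values: -3.131 stands for the shared wrong answer,
# -3.125 for the correct one.
workunit = {
    "stock_0.23_stale_dll": -3.131,
    "optimized_0.23": -3.125,
    "anonymous_0.20b": -3.131,
}
print(mark_results(workunit))
print(prob_good_outvoted(0.2))
```

In this sketch the two hosts sharing the wrong fitness are marked valid and the correct host is marked invalid. Under the stated independence assumption, and assuming all bad versions return the same wrong fitness, the chance of a correct result being outvoted falls with the square of the fraction of bad hosts, which is consistent with the invalid rate dropping as owners upgrade.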