Welcome to MilkyWay@home

Marked as Invalid? (Part 2)

SkyeHunter

Joined: 6 Mar 09
Posts: 41
Credit: 38,856,291
RAC: 0
Message 38471 - Posted: 9 Apr 2010, 14:31:32 UTC - in response to Message 38465.  

My "invalids" usually are paired with a combination of 5800 series running v0.23 and a 4700/4800 series running optimized applications (anonymous platform)...

I am running on 4870s:

Catalyst 10.2 drivers (10.3 is rumoured to be unstable?) and the stock application v0.23. One system is under XP 32-bit and the second under XP 64-bit. The XP 32-bit machine is currently producing most (if not all) of the invalids, the 64-bit one almost none...
ID: 38471 · Rating: 0
Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 38482 - Posted: 9 Apr 2010, 16:12:33 UTC - in response to Message 38471.  

My "invalids" usually are paired with a combination of 5800 series running v0.23 and a 4700/4800 series running optimized applications (anonymous platform)...

I am running on 4870s:

Catalyst 10.2 drivers (10.3 is rumoured to be unstable?) and the stock application v0.23. One system is under XP 32-bit and the second under XP 64-bit. The XP 32-bit machine is currently producing most (if not all) of the invalids, the 64-bit one almost none...


It sounds like there's some issue with your 32 bit machine, or the 32 bit ATI application. Is anyone else having a similar problem? The server is down to < 3% invalids, so I think that means the ones flagged as invalid are actually invalid now.

Have you tried doing a detach and reattach to make sure you have the right brook32.dll file? Maybe manually delete the old one to force the new one to download?
ID: 38482 · Rating: 0
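One way to check whether a stale DLL is still sitting in the project directory is to list every brook*.dll found there with its size, modification time and hash, then compare the output with a machine that validates cleanly. The following is only a minimal sketch: the helper name `describe_dlls` is made up for illustration, and the location of the project directory is something you have to supply yourself.

```python
import hashlib
import time
from pathlib import Path


def describe_dlls(project_dir, pattern="brook*.dll"):
    """Return (name, size, mtime, sha256) for each matching file in project_dir."""
    rows = []
    for path in sorted(Path(project_dir).glob(pattern)):
        data = path.read_bytes()
        stat = path.stat()
        rows.append((
            path.name,
            stat.st_size,
            # Local modification time, to spot files left over from an old install
            time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(stat.st_mtime)),
            hashlib.sha256(data).hexdigest(),
        ))
    return rows
```

Calling `describe_dlls(...)` with the MilkyWay folder inside your BOINC data directory and printing each row on both machines should show quickly whether the two installs hold the same file. The script cannot tell you which hash is the correct one; it only shows whether two machines differ.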
Emanuel

Joined: 18 Nov 07
Posts: 280
Credit: 2,442,757
RAC: 0
Message 38486 - Posted: 9 Apr 2010, 18:20:08 UTC

Speaking of Detaching/Reattaching, I think a project Reset should do just as well if you're using BOINC client version 6.10.45 due to the following bug fix:
David  4 Apr 2010 - client: clean out project dir on reset.  fixes #978

Could be wrong though.
ID: 38486 · Rating: 0
Brian Priebe

Joined: 27 Nov 09
Posts: 108
Credit: 430,760,953
RAC: 0
Message 38491 - Posted: 9 Apr 2010, 19:50:40 UTC - in response to Message 38482.  
Last modified: 9 Apr 2010, 19:52:33 UTC

The server is down to < 3% invalids, so I think that means the ones flagged as invalid are actually invalid now.
Actually the vast majority of invalids still being reported against my HD5870 are due to the earlier buggy versions of anonymous platforms out-voting it.

I've notified 12 owners now by PM but only about 4 of them that I know of have upgraded their app to 0.23.
ID: 38491 · Rating: 0
STE\/E

Joined: 29 Aug 07
Posts: 486
Credit: 576,517,594
RAC: 36,582
Message 38492 - Posted: 9 Apr 2010, 19:58:06 UTC - in response to Message 38491.  

The server is down to < 3% invalids, so I think that means the ones flagged as invalid are actually invalid now.
Actually the vast majority of invalids still being reported against my HD5870 are due to the earlier buggy versions of anonymous platforms out-voting it.

I've notified 12 owners now by PM but only about 4 of them that I know of have upgraded their app to 0.23.


The Project just needs to stop sending work to any Host with less than v0.23 Application and I bet you would see them Update in a Jiffy ....

STE\/E
ID: 38492 · Rating: 0
Simplex0

Joined: 11 Nov 07
Posts: 232
Credit: 178,229,009
RAC: 0
Message 38493 - Posted: 9 Apr 2010, 20:04:08 UTC - in response to Message 38491.  

The server is down to < 3% invalids, so I think that means the ones flagged as invalid are actually invalid now.
Actually the vast majority of invalids still being reported against my HD5870 are due to the earlier buggy versions of anonymous platforms out-voting it.

I've notified 12 owners now by PM but only about 4 of them that I know of have upgraded their app to 0.23.


Does that not mean that the results in this case that are getting the status 'Valid' actually are 'Invalid', or in other words that the validation system is not working?
ID: 38493 · Rating: 0
SkyeHunter

Joined: 6 Mar 09
Posts: 41
Credit: 38,856,291
RAC: 0
Message 38504 - Posted: 9 Apr 2010, 23:12:49 UTC - in response to Message 38482.  
Last modified: 9 Apr 2010, 23:14:03 UTC

Have you tried doing a detach and reattach to make sure you have the right brook32.dll file? Maybe manually delete the old one to force the new one to download?


I think you nailed the issue; I noticed an old (0.20b) executable in the directory. Erased all files in the directory and reset the project. Is it correct that the dll is brook32a_ati.dll ? I'll let it run for a while and keep an eye on it.
ID: 38504 · Rating: 0
Gary Roberts

Joined: 1 Mar 09
Posts: 56
Credit: 1,984,937,499
RAC: 0
Message 38509 - Posted: 9 Apr 2010, 23:49:05 UTC - in response to Message 38493.  

Does that not mean that the results in this case that are getting the status 'Valid' actually are 'Invalid' ...

I don't know why you would think that.

If <3% of results are marked invalid then >97% are marked valid. All that means is that >97% of results agree with each other. This makes no statement about whether the results are actually right or wrong. All non-5800 series crunching methods gave correct answers anyway. All 5800 series cards that are running V0.23 are giving correct answers now. The chance of two remaining 'bad versions' on 5800 series cards getting to form a quorum is decreasing and is probably quite low now. So I would think that the majority of 'valid' results are also correct results.

... or in other words that the validation system is not working?

The validation system is working as designed. The validator can't rectify the situation if two hosts each send back the same incorrect answer. The situation will improve further if those still using the 'bad' versions get 'encouraged' into upgrading to V0.23. Send some PMs to offenders rather than suggesting that the validator is broken.

Cheers,
Gary.
ID: 38509 · Rating: 0
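The point about the chance of a bad quorum shrinking can be put into rough numbers with a toy model. This is a sketch under loudly stated assumptions, not a description of the project's actual statistics: it assumes a quorum of two, that each returned result independently comes from a 'bad' host with probability `p_bad`, that all bad results agree with each other, and that all good results agree with each other.

```python
def p_good_outvoted(p_bad):
    """Chance that a correct result meets two bad results before one good one."""
    return p_bad ** 2


def p_bad_canonical(p_bad):
    """Chance that two bad results arrive before two good ones.

    Orderings that produce a bad quorum: bad-bad, bad-good-bad, good-bad-bad.
    """
    p_good = 1.0 - p_bad
    return p_bad ** 2 + 2 * p_bad ** 2 * p_good


# Illustrative shares of bad results only; these are not measured values.
for share in (0.20, 0.10, 0.05):
    print(f"bad share {share:.0%}: "
          f"good result outvoted {p_good_outvoted(share):.2%}, "
          f"bad quorum {p_bad_canonical(share):.2%}")
```

Both probabilities fall with the square of the bad share, which is why every host that upgrades helps more than proportionally.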
SkyeHunter

Joined: 6 Mar 09
Posts: 41
Credit: 38,856,291
RAC: 0
Message 38518 - Posted: 10 Apr 2010, 3:28:52 UTC - in response to Message 38504.  

Have you tried doing a detach and reattach to make sure you have the right brook32.dll file? Maybe manually delete the old one to force the new one to download?


I think you nailed the issue; I noticed an old (0.20b) executable in the directory. Erased all files in the directory and reset the project. Is it correct that the dll is brook32a_ati.dll ? I'll let it run for a while and keep an eye on it.


Nope, No change. I still have about 40% invalids on that machine, after a full cleanout of the directory and reset of the project...

ID: 38518 · Rating: 0
Simplex0

Joined: 11 Nov 07
Posts: 232
Credit: 178,229,009
RAC: 0
Message 38521 - Posted: 10 Apr 2010, 6:37:12 UTC - in response to Message 38509.  
Last modified: 10 Apr 2010, 6:57:47 UTC

Does that not mean that the results in this case that are getting the status 'Valid' actually are 'Invalid' ...

I don't know why you would think that.


The case was this...
"Actually the vast majority of invalids still being reported against my HD5870 are due to the earlier buggy versions of anonymous platforms out-voting it."


If <3% of results are marked invalid then >97% are marked valid. All that means is that >97% of results agree with each other. This makes no statement about whether the results are actually right or wrong.



I believed that the main purpose of a validation system was to guarantee that the results accepted by the system had errors that did not exceed the accepted limit. If the system will still mark a correct output file as 'Invalid' just because the majority agree with each other, but are not correct, I cannot understand why there is no check that the delivered result/output file was created using the correct application.
ID: 38521 · Rating: 0
Zydor
Joined: 24 Feb 09
Posts: 620
Credit: 100,587,625
RAC: 0
Message 38523 - Posted: 10 Apr 2010, 6:44:49 UTC - in response to Message 38518.  
Last modified: 10 Apr 2010, 6:50:19 UTC

.... Nope, No change. I still have about 40% invalids on that machine, after a full cleanout of the directory and reset of the project...


Was it a reset or a detach? If it was a reset from inside BAM, do a detach using the host list at BOINCstats.

Another thing to try: you are running 822/995, which seems high for that card without overvolting. Try running at defaults to see if it gets going; then you'll know. If that does get it going, reduce memory down to 200-300. Memory speed is irrelevant for MW, so that will either allow a cooler-running PC and save a bunch on power, or allow a slightly higher o/c without overvolting.

Regards
Zy
ID: 38523 · Rating: 0
SkyeHunter

Joined: 6 Mar 09
Posts: 41
Credit: 38,856,291
RAC: 0
Message 38525 - Posted: 10 Apr 2010, 10:49:26 UTC - in response to Message 38523.  

.... Nope, No change. I still have about 40% invalids on that machine, after a full cleanout of the directory and reset of the project...


Was it a reset or a detach? If it was a reset from inside BAM, do a detach using the host list at BOINCstats.

Another thing to try: you are running 822/995, which seems high for that card without overvolting. Try running at defaults to see if it gets going; then you'll know. If that does get it going, reduce memory down to 200-300. Memory speed is irrelevant for MW, so that will either allow a cooler-running PC and save a bunch on power, or allow a slightly higher o/c without overvolting.

Regards
Zy

Will do both changes... Meanwhile I upgraded the CCC/driver combo, but alas, only slight improvement...

ID: 38525 · Rating: 0
Zydor
Joined: 24 Feb 09
Posts: 620
Credit: 100,587,625
RAC: 0
Message 38526 - Posted: 10 Apr 2010, 12:19:59 UTC - in response to Message 38525.  
Last modified: 10 Apr 2010, 12:21:37 UTC

The 30 mins since you went to defaults is looking good: you were getting around 8 invalids an hour, and there has been one since the change. Let it run for a couple of hours like that to see if it stays stable. If it does, as is likely, it's then a case of looking at the overclocking as the likely cause. The chances are it will happen to the other card eventually, as it's similarly overclocked.

When overclocking, gently, step by step (say 15-20 MHz steps), increase the GPU clocks (memory at defaults) while running a WU, until it locks up; that's the limit. Reboot, back off 25 MHz (50 MHz to sleep at night!) and you should be OK, but keep your eye on temperatures: heat is your enemy when o/c. Then lower the memory to around 200-300 MHz for MW (that value will differ wildly from project to project depending on whether or not they are memory dependent). The lower memory will give you a bit of room re power, and run cooler.

If it's pushed to the last possible MHz, there is no room for "peak" usage, and eventually you will get caught. The top-end value will differ from card to card, even of the same card type, due to the manufacturing process and the way the dies are binned.

Regards
Zy
ID: 38526 · Rating: 0
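The step-up-then-back-off routine described above can be written down as a small search loop. This is a sketch of the logic only: `find_core_clock` and the `is_stable` callback are hypothetical names, nothing here talks to the driver, and in practice "stable" means "ran a work unit at that clock without locking up", which you have to test by hand with your overclocking tool.

```python
def find_core_clock(start_mhz, is_stable, step_mhz=15, backoff_mhz=25,
                    ceiling_mhz=2000):
    """Raise the core clock in small steps until is_stable() fails, then back off.

    is_stable(clock_mhz) is supplied by the caller and returns True if a
    work unit completed at that clock without a lock-up.
    """
    clock = start_mhz
    while clock + step_mhz <= ceiling_mhz:
        candidate = clock + step_mhz
        if not is_stable(candidate):
            # candidate is the lock-up point; settle a safety margin below it,
            # but never below the clock we started from.
            return max(start_mhz, candidate - backoff_mhz)
        clock = candidate
    # No lock-up seen below the ceiling: keep the last clock that was tested.
    return clock


# Simulated card that locks up at 840 MHz and above (made-up number).
print(find_core_clock(750, lambda mhz: mhz < 840))
print(find_core_clock(750, lambda mhz: mhz < 840, backoff_mhz=50))
```

The larger `backoff_mhz` value corresponds to the "sleep at night" margin: it gives up a little speed for headroom during peak load.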
SkyeHunter

Joined: 6 Mar 09
Posts: 41
Credit: 38,856,291
RAC: 0
Message 38531 - Posted: 10 Apr 2010, 16:15:18 UTC

Zydor, thanks for the info. Apparently the new (stock) app stresses the GPU more than the previous one did (it has been running with these settings for almost 8 months). I'll have to find a new sweet spot ...

ID: 38531 · Rating: 0
Mr. Hankey

Joined: 9 Apr 09
Posts: 10
Credit: 117,669,581
RAC: 0
Message 38532 - Posted: 10 Apr 2010, 17:34:46 UTC


So how does this work?

on this WU I got marked as invalid

http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=91766141

The first person reporting was stock .23 application on a 5800.

I reported second using the opti .23 application on a 5800.

The third result was someone running the .20b application on a 5800 and they and the first person got marked as valid and I got marked as invalid.
ID: 38532 · Rating: 0
Emanuel

Joined: 18 Nov 07
Posts: 280
Credit: 2,442,757
RAC: 0
Message 38534 - Posted: 10 Apr 2010, 17:46:33 UTC

Huh, I thought they were supposed to show the reported fitness in the stderr.txt log ... but I don't see it in there. Otherwise you would be able to compare the fitness you got to the fitness the others got. The only way the situation -should- occur is if their fitnesses were very close together, and yours was somehow off. That would still be strange, but would be correct as far as the validator is concerned.
ID: 38534 · Rating: 0
Zydor
Joined: 24 Feb 09
Posts: 620
Credit: 100,587,625
RAC: 0
Message 38561 - Posted: 11 Apr 2010, 1:36:50 UTC - in response to Message 38531.  
Last modified: 11 Apr 2010, 1:40:06 UTC

Zydor, thanks for the info. Apparently the new (stock) app stresses the GPU more than the previous one did (it has been running with these settings for almost 8 months). I'll have to find a new sweet spot ...


When you settle on new preferred values, if you still get regular invalids, however infrequently, and you still get them when reverting to stock, I would suspect mainboard memory. You should only get invalids very rarely.

If it happens, it is worth shutting down, removing all memory sticks (remembering to keep the pairs intact), and reseating them in different slots. It can happen, more frequently than many imagine, that the sticks work very slightly loose, creating a less than optimum contact. Reseating like this can often sort that.

If invalids are still there, in any quantity at stock values, then it's likely there is a bad memory stick lurking and a memory test should be done (memtest is a good util for it).

Happy Crunching :)

Regards
Zy
ID: 38561 · Rating: 0
Beyond

Joined: 15 Jul 08
Posts: 383
Credit: 729,293,740
RAC: 0
Message 38565 - Posted: 11 Apr 2010, 2:23:19 UTC - in response to Message 38532.  
Last modified: 11 Apr 2010, 2:24:39 UTC


So how does this work?

on this WU I got marked as invalid

http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=91766141

The first person reporting was stock .23 application on a 5800.

I reported second using the opti .23 application on a 5800.

The third result was someone running the .20b application on a 5800 and they and the first person got marked as valid and I got marked as invalid.

It seems the validator is doing something wanky:

http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=1658&nowrap=true#38383

http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=1658&nowrap=true#38390
ID: 38565 · Rating: 0
Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 38569 - Posted: 11 Apr 2010, 5:10:53 UTC - in response to Message 38565.  


So how does this work?

on this WU I got marked as invalid

http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=91766141

The first person reporting was stock .23 application on a 5800.

I reported second using the opti .23 application on a 5800.

The third result was someone running the .20b application on a 5800 and they and the first person got marked as valid and I got marked as invalid.

It seems the validator is doing something wanky:

http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=1658&nowrap=true#38383

http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=1658&nowrap=true#38390



If you read the news, there's an issue with some of the 0.23 applications not correctly updating their brookX.dll file, which means they can return the same result as a 0.20 application, and thus quorum against valid results.

The validator is doing exactly what it should -- matching fitnesses and making quorums. The problem is that if enough computers out there are returning bad results they can quorum against good results.

For the most part we're down to < 4% invalid results, which means most of the people running the bad 58x0 application have upgraded, but there are a few where, even though they're using the 0.23 application downloaded from the server, the BOINC client didn't upgrade the brookX.dll file, so they're still returning bad results.

When we swap over to the new application this should be completely fixed, because people will have to upgrade their application to use the new application (and the corresponding dll files).
ID: 38569 · Rating: 0
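To make "matching fitnesses and making quorums" concrete, here is a minimal sketch of agreement-only validation applied to the work unit described earlier in the thread. It is not the project's validator code: the function `validate`, the quorum size of two, the tolerance and the fitness numbers are all assumptions chosen for illustration.

```python
from math import isclose


def validate(results, min_quorum=2, rel_tol=1e-9):
    """Mark results valid or invalid purely by agreement.

    results: list of (host, fitness) pairs in the order they were reported.
    Returns {host: "valid" | "invalid"}, or {} if no group reaches min_quorum.
    """
    groups = []  # each group holds (host, fitness) pairs that agree
    for host, fitness in results:
        for group in groups:
            if isclose(fitness, group[0][1], rel_tol=rel_tol):
                group.append((host, fitness))
                break
        else:
            groups.append([(host, fitness)])

    winner = next((g for g in groups if len(g) >= min_quorum), None)
    if winner is None:
        return {}
    agreed = {host for host, _ in winner}
    return {host: ("valid" if host in agreed else "invalid")
            for host, _ in results}


# Made-up fitness values: the stock 0.23 host is assumed to have a stale DLL
# and therefore returns the same wrong number as the 0.20b host.
reported = [
    ("stock 0.23, stale dll", -3.1),
    ("opti 0.23", -2.9),
    ("0.20b", -3.1),
]
print(validate(reported))
```

In this sketch the validator never sees which application produced a result, only whether the numbers match, so the one correct result is the one that ends up marked invalid.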
Simplex0

Joined: 11 Nov 07
Posts: 232
Credit: 178,229,009
RAC: 0
Message 38570 - Posted: 11 Apr 2010, 6:47:09 UTC - in response to Message 38569.  


So how does this work?

on this WU I got marked as invalid

http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=91766141

The first person reporting was stock .23 application on a 5800.

I reported second using the opti .23 application on a 5800.

The third result was someone running the .20b application on a 5800 and they and the first person got marked as valid and I got marked as invalid.

It seems the validator is doing something wanky:

http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=1658&nowrap=true#38383

http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=1658&nowrap=true#38390



If you read the news, there's an issue with some of the 0.23 applications not correctly updating their brookX.dll file, which means they can return the same result as a 0.20 application, and thus quorum against valid results.

The validator is doing exactly what it should -- matching fitnesses and making quorums. The problem is that if enough computers out there are returning bad results they can quorum against good results.

For the most part we're down to < 4% invalid results which means most of the people running the bad 58x0 application have upgraded


Doesn't that mean that if the results validated against each other all came from the same bad application, they could agree with each other, have errors that are too large, and still be marked as 'Valid'? In other words, you would end up with all results marked 'Valid' even though they are all incorrect.

How do you know that the results you are collecting have errors within the expected limits?
ID: 38570 · Rating: 0

©2024 Astroinformatics Group