Getting lots of Invalid WUs - Help?
Joined: 3 May 10 Posts: 6 Credit: 104,596,950 RAC: 0
Hi, this is Jed Smith, I crunch for Team Seti.Usa under the username DrPop. My ATI 4870 used to do about 95-100K credits per day, but for the last 4 days only around 40-60K. I checked my account, and it seems like I am getting TONS of invalid or inconclusive WUs now. I never had this happen before - is there something I need to set up differently? Thank you for any help, DrPop
Joined: 17 Feb 08 Posts: 363 Credit: 258,227,990 RAC: 0
Quote: Hi, this is Jed Smith, I crunch for Team Seti.Usa under the username DrPop.
GPU core clock: 800 MHz, memory clock: 800 MHz

Reduce the core clock to stock settings or slightly above that, say 775 MHz. Along with that, drop the memory clock to 500 MHz. That'll save energy, and your card will run quite a bit cooler. MW doesn't depend on memory bandwidth.

Support science! Join Team BOINC United now!
Joined: 3 May 10 Posts: 6 Credit: 104,596,950 RAC: 0
Thanks, I will try that! I wonder why it never bothered anything before? Strange. I'll let you know how it goes. Jed
Joined: 3 May 10 Posts: 6 Credit: 104,596,950 RAC: 0
Hmmm...not sure what's going on. I dropped the rates to 750 / 500 like you said, and now it runs REALLY cool - 56 degrees under load. Amazing. But it's still erroring WUs, I guess... it says many of them are inconclusive. This never happened before. Do you think I should reinstall the driver, or remove the optimized app for a while or something? Thanks for any tips.
Joined: 24 Feb 09 Posts: 620 Credit: 100,587,625 RAC: 0
You're fine - inconclusive just means "waiting for a wingman" to validate against, hence it's chucked into pendings. Most get sorted within 2-10 days, although inevitably you'll get some stuck in pendings for 2-4 weeks waiting for wingmen to time out. Pendings will climb for circa 2 weeks+, then even off as the WUs going into pendings equal the WUs coming out. RAC takes a temporary dip for that period, then all's well. Alarm bells only ring if the invalid or error tab starts filling; pendings (inconclusive) are fine.

EDIT: Forgot this bit:
Quote: Hmmm...not sure what's going on. I dropped the rates to 750 / 500 like you said, and now it runs REALLY cool - 56 degrees under load. Amazing.

Think of memory "speed" (and it's bandwidth, not speed) like a freeway. If you normally only have light traffic, a 3-lane freeway, say, will do. If it's mega busy with heavy city traffic, you probably need a 6-lane freeway. It's pointless building a 6-lane freeway for light traffic; it's a waste of resources. Same with memory: the rating refers to bandwidth (lanes on the freeway). MW does not need high bandwidth for the data it passes through; only relatively light loads of data pass at any one instant, so a lower bandwidth is fine and uses fewer resources, so less heat is generated. It's pointless having higher bandwidth: like a car on an empty freeway, the data will not travel any faster (as such).

Regards Zy
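The "pendings climb for a couple of weeks, then even off" behaviour is a steady-state queue effect: once as many WUs leave pending each day as enter it, the total stops growing. The sketch below assumes a steady daily output and a made-up spread of wingman delays (`DELAY_MIX`, `pending_after` and `steady_state_pending` are hypothetical names, and the numbers are illustrative, not measured). It shows the pending count rising until the longest delay has passed, then flattening at daily output times the average wait (Little's law).

```python
# Hypothetical mix: of the WUs returned each day, how many wait how many days
# for a wingman before validating. Illustrative numbers only, not measured data,
# loosely shaped on "most in 2-10 days, some stuck for 2-4 weeks".
DELAY_MIX = {2: 30, 5: 30, 10: 30, 28: 10}  # delay in days -> WUs per day


def pending_after(days: int, delay_mix: dict[int, int]) -> int:
    """Pending WUs at the end of day `days`, crunching steadily from day 1.

    A WU returned `age` days ago (age 0 = today) is still pending while
    age < delay, so each delay class contributes count * min(days, delay).
    """
    return sum(count * min(days, delay) for delay, count in delay_mix.items())


def steady_state_pending(delay_mix: dict[int, int]) -> int:
    """Plateau level: WUs entering per day times days each one waits."""
    return sum(count * delay for delay, count in delay_mix.items())


if __name__ == "__main__":
    for day in (1, 7, 14, 28, 42):
        print(day, pending_after(day, DELAY_MIX))
    print("plateau:", steady_state_pending(DELAY_MIX))
```

With these made-up numbers the count goes 100, 490, 650, 790 and then stays at 790: the long tail of slow wingmen is what stretches the climb past the first week or two.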
Joined: 12 Nov 07 Posts: 2425 Credit: 524,164 RAC: 0
Quote: Hmmm...not sure what's going on. I dropped the rates to 750 / 500 like you said, and now it runs REALLY cool - 56 degrees under load. Amazing.

You only have one listed as invalid; you should be OK.

Doesn't expecting the unexpected make the unexpected the expected? If it makes sense, DON'T do it.
Joined: 3 May 10 Posts: 6 Credit: 104,596,950 RAC: 0
Thanks for the replies, it's making more sense, then. Just have to wait for the wingmen to catch up. I appreciate you explaining the memory too - I'm glad Crunch3r told me to drop it that low... I thought 800 was low already. :-) It runs a good 10 to 15 degrees cooler now, depending on time of day and other factors.
Joined: 24 Feb 09 Posts: 620 Credit: 100,587,625 RAC: 0
Just keep your eye on invalid/error WUs in the tasks tab until it all settles. If you still get regular invalids - albeit infrequently, circa one or two a day - reduce your setting from 775 down to, say, 765 or 760. Probably not needed, as it's looking good at present; just keep your eye on it.

Don't be tempted to push the card to the absolute limit, with settings within a few MHz of where errors appear - it will, eventually, burn out. It's like a car: drive it flat out at high revs forever and it wears out much faster than if it were driven conservatively. Don't be fooled by months of "it's OK" running at full stretch to the last possible MHz; all that's happening is that wear and tear is drastically increased, and it will die on you. A general rule of thumb is to find the settings where there are no errors over time and reduce by 15 MHz (or just leave it at default settings...), and it will purr away happily forever.

In a very general sense, if you overclock more than 5-10% without taking other precautions and giving the card extra TLC, you are pushing your luck in the long run. The card will run at those 5-10%+ faster speeds, but leave it alone; don't go there until you really understand overclocking - it's an expensive way to learn lessons (!).

edit: Just thought... for a bit of fun... if you are looking at learning more about overclocking, take a look at the link below. It's an extreme overclocking session with liquid nitrogen by one of the leading gurus of LN overclocking. During the session he broke three world records. Guru3d also do overclocking sessions with each card as it comes out; it's always worthwhile looking up your own card in past Guru3d reviews to get a feel for what it can do. It's basic everyday overclocking done by them, but useful to read up on; these guys know what they are doing. Do be careful... don't get carried away on a wave of enthusiasm for it (!)... it's expensive going that learning route.

There is a forum attached to the site where there are some genuine Uber Geek guys, the types to take subliminal learning whilst asleep about their latest nano computer build, and who think going to a seminar on the latest IBM registers is a holiday treat. They are a good bunch of guys, and will help anyone as long as you remember the pleases and thank-yous. Bowing and scraping is not expected, just common decency and courtesy. (Just watch out for the usual loudmouth wannabe surfacing there; the guys slap them down hard when they surface. Just pause to make sure you are dealing with a reputable guy - the rest will dive in to protect you if they spot a numpty sounding off.)

Liquid Nitrogen Session at Guru3d

Regards Zy
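The two rules of thumb in that post (back off about 15 MHz from the highest clock that has run error-free over time, and treat more than 5-10% over stock as pushing your luck) reduce to simple arithmetic. A minimal sketch follows; the function names and band labels are hypothetical, and `STOCK_CORE_MHZ = 750` is an assumption about a reference HD 4870 (the thread itself only says "stock settings or slightly above that, say 775 MHz"), so substitute your own card's stock clock.

```python
# Rules of thumb from the post, as a sketch. STOCK_CORE_MHZ = 750 is an
# assumption about a reference HD 4870; check your own card's stock clock.
STOCK_CORE_MHZ = 750


def overclock_percent(clock_mhz: float, stock_mhz: float) -> float:
    """How far above (positive) or below (negative) stock a clock is, in percent."""
    return (clock_mhz - stock_mhz) / stock_mhz * 100.0


def backed_off_clock(highest_error_free_mhz: int, margin_mhz: int = 15) -> int:
    """Back off a fixed margin (15 MHz in the post) from the highest clock
    that has run without errors over time."""
    return highest_error_free_mhz - margin_mhz


def risk_band(clock_mhz: float, stock_mhz: float) -> str:
    """Rough bands around the post's '5-10%' warning; thresholds are illustrative."""
    pct = overclock_percent(clock_mhz, stock_mhz)
    if pct <= 0:
        return "stock or below"
    if pct < 5:
        return "mild"
    if pct <= 10:
        return "5-10%: grey zone"
    return "over 10%: pushing your luck"


if __name__ == "__main__":
    for clock in (750, 775, 800, 850):
        pct = overclock_percent(clock, STOCK_CORE_MHZ)
        print(clock, round(pct, 1), risk_band(clock, STOCK_CORE_MHZ))
    print(backed_off_clock(775))  # 760
```

Under the 750 MHz assumption, the original 800 MHz core clock sits about 6.7% over stock, inside the range the post warns about, and backing off 15 MHz from 775 lands on the 760 MHz figure Zy suggests.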
Joined: 3 May 10 Posts: 6 Credit: 104,596,950 RAC: 0
Thank you, I enjoyed the detailed explanation of all this. I will check out that link when I have some time to learn more. Things seem to be going better on the card - I have not had an invalid WU since 9/14, and my numbers are slowly going back up (did nearly 70K yesterday). Thanks again, Jed
Joined: 29 Nov 07 Posts: 39 Credit: 74,300,629 RAC: 0
What I find odd is that your 4870 is taking almost 400 seconds per work unit. My 4850 at 727 MHz is running right around 200 seconds per unit, so I'm not sure what is going on there.
Joined: 24 Dec 07 Posts: 1947 Credit: 240,884,648 RAC: 0
Quote: What I find odd is your 4870 is running almost 400 seconds per work unit. My 4850 at 727 MHz is running right around 200 seconds per unit so not sure what is going on there.

Running 2 at once, most likely. DrPop may want to try going back to 1 WU at a time and see if the number of errors drops.
Joined: 24 Feb 09 Posts: 620 Credit: 100,587,625 RAC: 0
A GPU can take more than one WU at any one time, but they are not processed any faster. Each gets its "turn" individually for processing as time goes on. The time taken therefore appears to be doubled (400 secs), but on completion two WUs are ready, not one. The effective throughput is therefore the same as such, although there is a minimal marginal advantage in terms of overall loading time into the GPU. Two loaded at the same time (via an app_info file) therefore (in this case) come out every 400 secs, so the effective time is 200 seconds per WU, precisely what you are getting.

The advantage of this method is that it cuts down on "thermal cycling". When a WU first starts, and finally finishes, there is a load/unload time. During that time the card cools, then rapidly reheats as the loaded WU(s) start crunching.

The worst enemies of computer silicon are the various forms of heat it puts up with. Overwhelmingly, it's the interior temperature of the case, and pushing the card too hard together with poor airflow. However, thermal cycling plays its part as enemy number two, albeit in a far, far less abrasive way. The latter is why it's better from a longevity point of view to minimise turning PCs on and off during use. So thermal cycling is not something to lose sleep over as such, but it's there, so if it's easily minimised, as in this case, why not...

Regards Zy
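The throughput argument in that post is plain division: a batch of N WUs that finishes every T seconds yields T/N seconds per WU, and if the batch starts and finishes together there are only 86400/T load/unload gaps per day. The sketch below uses the thread's round numbers (about 400 s for 2 at once versus about 200 s for 1 at a time); the function names are hypothetical, and the "start and finish together" condition is an assumption, since in practice the two WUs may drift out of step.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def effective_seconds_per_wu(wall_seconds: float, concurrent_wus: int) -> float:
    """Wall-clock time for a batch divided by the WUs finished in that batch."""
    return wall_seconds / concurrent_wus


def wus_per_day(wall_seconds: float, concurrent_wus: int) -> float:
    """Daily throughput when `concurrent_wus` finish together every `wall_seconds`."""
    return SECONDS_PER_DAY / wall_seconds * concurrent_wus


def load_unload_gaps_per_day(wall_seconds: float) -> float:
    """Start/finish gaps per day, assuming the WUs in a batch start and finish together."""
    return SECONDS_PER_DAY / wall_seconds


if __name__ == "__main__":
    # Round numbers from the thread: 2 at once taking ~400 s vs 1 at a time taking ~200 s.
    for wall, n in ((400, 2), (200, 1)):
        print(f"{n} at once, {wall} s: "
              f"{effective_seconds_per_wu(wall, n):.0f} s/WU, "
              f"{wus_per_day(wall, n):.0f} WUs/day, "
              f"{load_unload_gaps_per_day(wall):.0f} load/unload gaps/day")
```

Both configurations come out at 432 WUs per day, but running two at once halves the number of cool-down/reheat gaps (216 versus 432 per day), which is the thermal-cycling point made above.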
Joined: 2 Jan 08 Posts: 123 Credit: 69,524,507 RAC: 958
I too am having problems with work units that error out. I have three 5870 cards, all the same type, with one in one computer and two in the other. The single card is running just fine without a problem or an error. The two-card setup is the one with the issues.

I was wondering why my RAC was not moving up very fast on the dual-card machine, so I checked the work units. A very large number were getting 0.00 credit and classed as invalid. Nearly all were paired with a 64-bit computer, and when another 64-bit computer was added to check results I was always kicked out of the quorum. I have a large pending amount as well; this is probably related to the inconclusive results, but all are now getting 0.00 as well when quorum is met. Settings are the same on all cards, and the app_info file is at default settings.

I have now stopped these two cards from doing Milkyway and gone back to DNETC, which is having no such problems at all, leaving my single card doing Milkyway. I seem to recall I may have had this problem before with my 4870 cards, so I stopped them as well. I now have new cards and have updated the application as well, but still have the same issue. Not sure what is going on. I did downclock the single card, but as I was doing Collatz as a backup project I put it back to normal, as it all slowed down.
Joined: 22 Jan 10 Posts: 3 Credit: 71,094,774 RAC: 0
Lots of recent errors on my 5870 as well. Typically they fall into two cases: either 1) "too many errors", where nobody can validate, which results in "Completed, marked as invalid" for me, or 2) I get ganged up on by a couple of v0.19 users compared to my 0.23 (ati13ati) version.

I haven't changed video drivers or anything BOINC-related, as things were running pretty smoothly since I put in the ATI 58x0 fix. Well, until perhaps about a week ago, when I noticed I was getting lots of "Completed, marked as invalid" and "Completed, can't validate." I have always been running the memory downclocked and have been dropping the core back towards stock speed, but errors are still occurring.

Have there been any recent requirements to upgrade drivers for GPU crunching that I may have missed on the message boards? Thanks for any insight/advice, Brent
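Being "kicked out of the quorum" or "ganged up on" by two wingmen follows from how replicated validation works in general: results are compared, and whichever group agrees (and is large enough) wins. The sketch below is a toy model under stated assumptions, not MilkyWay's actual validator: each result is reduced to one hypothetical number, agreement means `math.isclose` within `rel_tol`, and `judge` and `min_quorum` are illustrative names. With two disagreeing results nothing reaches quorum, so both stay inconclusive until another wingman reports; with three, the odd one out is marked invalid.

```python
import math


def judge(results: dict[str, float], min_quorum: int = 2,
          rel_tol: float = 1e-9) -> dict[str, str]:
    """Toy quorum check: NOT MilkyWay's real validator.

    For each result, collect every result that agrees with it within rel_tol,
    keep the largest such group (the first one found wins ties), and declare it
    canonical only if it has at least min_quorum members.
    """
    best: list[str] = []
    for value in results.values():
        group = [h for h, v in results.items()
                 if math.isclose(v, value, rel_tol=rel_tol)]
        if len(group) > len(best):
            best = group
    if len(best) < min_quorum:
        # No agreement yet: wait for another wingman.
        return {h: "inconclusive" for h in results}
    return {h: ("valid" if h in best else "invalid") for h in results}


if __name__ == "__main__":
    # Hypothetical values standing in for a WU's numeric result.
    print(judge({"me (0.23)": 1.0, "wingman (0.19)": 1.5}))
    print(judge({"me (0.23)": 1.0, "wingman A (0.19)": 1.5, "wingman B (0.19)": 1.5}))
```

Note the model never says which result is "right": if two hosts share the same systematic difference, they outvote a third host regardless of who is correct.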
©2024 Astroinformatics Group