ps_separation and ps_test WUs all erroring out
Author | Message |
---|---|
Joined: 8 Aug 08 Posts: 30 Credit: 74,566,409 RAC: 0 |
All of the above-mentioned WUs are erroring out on my ATI HD 5850. The same goes for my wingmen. As some of them are running nVidia cards, the issue seems to be WU-related. |
Joined: 25 Jan 11 Posts: 8 Credit: 3,824,187 RAC: 0 |
I got a batch of ps_separation_13_3s_fix20 WUs today and they all end with computation errors. I'm using an ATI Radeon HD 4770. What should I do: throw them all out and wait until I get some good ones? |
Joined: 19 Feb 08 Posts: 350 Credit: 141,284,369 RAC: 0 |
Sorry to tell you, Lukfi, but my _13_3s_fix20 WUs finish and validate. I found one difference: I use newer drivers. Maybe an update can help. |
Joined: 28 Feb 10 Posts: 120 Credit: 109,840,492 RAC: 0 |
Well, your BOINC client is 6.10.58; maybe that's a problem. I work with older Catalyst drivers (1.4.1016, 1.4.900), and both machines (1 x HD4850, 2 x HD4850) work fine. |
Joined: 19 Jul 10 Posts: 589 Credit: 18,926,725 RAC: 4,683 |
"Well your Boinc Client is 6.10.58": I don't think so. I'm still on 6.10.18, and it works fine. Also, the only task on his GPU that errored out had a -177 error with "Maximum elapsed time exceeded"; it seems many people have this problem at the moment. |
Joined: 8 Feb 08 Posts: 261 Credit: 104,050,322 RAC: 0 |
BOINC client 6.10.58 works without problems here. What I found is that ps_separation_13_3s_free_2 WUs have their run times set far too short. See Message 49774. This, together with a DCF that got messed up by a list of faulty WUs, might lead to "Maximum elapsed time exceeded" too. So my first try would be to reset the DCF. |
Joined: 19 Jul 10 Posts: 589 Credit: 18,926,725 RAC: 4,683 |
"What I found is that ps_separation_13_3s_free_2 WUs have set their run times far too short. See Message 49774": see my reply to that. But in general, yes: increasing the DCF value to about double its current value should help. |
Joined: 25 Jan 11 Posts: 8 Credit: 3,824,187 RAC: 0 |
What's a DCF? |
Joined: 24 Feb 09 Posts: 620 Credit: 100,587,625 RAC: 0 |
The elapsed time bug is being dealt with at: http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=2468 DCF has nothing to do with the elapsed time bug; it has a completely separate, unconnected purpose. "What's a DCF?" That's a Pandora's box ... I'll try to give a short version, so bear with me, please; it's a complex topic at the detailed level, and the latter is not for this discussion. DCF stands for "Duration Correction Factor". (Click on the Projects tab, select the project whose DCF you want, click Properties at the top left in BAM, and look at the bottom of the resulting screen: you'll see the DCF for that project.) A major problem in BOINC is tracking and predicting the anticipated run times of work units. It's far, far from easy to get right and causes grief in many situations, and I doubt it will ever be "right" 100% of the time; it's the nature of the beast. The reason for its importance (in the way BOINC is set up at present) is that the predicted run time in BAM has a profound effect on the BOINC scheduler, and it's the latter that goes and fetches the correct WUs at the right time and in the right quantity. In understanding DCF, it's important to understand the "division of labour" between BOINC and BOINC projects. Contrary to some beliefs, they are two separate entities: BOINC Server (there are others, but that will do as a generalisation) and Project Work Units. BOINC exists to provide a framework into which any project can slot its project software and run it, almost invariably inside a beast called the BOINC Wrapper. The latter enables work units written in many different languages, from ancient COBOL to C++ (whatever, zillions of them), to run in BOINC without too many drastic changes to their project application (not strictly true, but pretty close). 
Project admins do have access to the BOINC Server code and BOINC Wrapper code, but these are not maintained by project staff (that includes the BOINC Client most people use to manage projects and WUs), nor should they be; project staff have enough to do. Projects are responsible for presenting the project work unit to the BOINC scheduler (part of BOINC Server), and from there until the results are returned to the project, BOINC, not the project, is responsible for processing the WU, collecting the results and sending them back to the project. {bear with me ... here comes DCF ...} In order to do that for projects, BOINC needs to work out how long a WU will take on user hardware. Even projects can't predict that, and they own the WU, so BOINC has no chance of a simple prediction. Here is where it gets messy. For BOINC to take on board an "anonymous" WU, which it must do to do its job, and to get the BOINC scheduler to work out how many WUs to send for each project in your list so that your set % preferences are met, it has to have a mechanism for working out the run time. So far so good ... not too bad: it just "learns" from the WUs crunched and over time gets more accurate through experience. Now comes the rub ... having worked out the time a project WU takes on the cruncher's hardware (a literally infinite number of combinations), it then has to apply that to the percentages set in your preferences so that the correct number of WUs from your crunching projects arrive at your client, in good time and in the correct proportion. To do the latter it runs a mechanism that, over time, ensures Project "X" gets the 30% you set, Project "Y" the 25%, and Project "Z" the 45% (... whatever, depending on what you set as your preference). The core of that mechanism is DCF. 
The DCF value indicates to BOINC the differences in WU run times and gives a "weighted" factor for use in grabbing the right WUs, for the right projects you run, at the right time and in the right quantity. Sometimes it gets out of whack and you'll hear about "resetting DCF"; that means setting it back to its default value of 1 to get it going again. The detail of all that is for another day ... a few chapters of an encyclopedia ... and a coder along the lines of the BOINC devs / Claggy / Crunch3r / Richard (way too many to list), the many, many wonderful volunteer coders (that fervently excludes me) who beaver away in their own time on behalf of BOINC, keeping the Titanic off the iceberg :) {You did ask ... :) } Regards Zy |
Joined: 25 Jan 11 Posts: 8 Credit: 3,824,187 RAC: 0 |
Well, that is all incredibly wonderful, but it doesn't solve the problem I'm having. Worse still, neither does the thread you're pointing at. |
Joined: 1 Jul 11 Posts: 10 Credit: 422,543 RAC: 0 |
On my host http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=299828 all the GPU ps_separation_13_3s WUs ended with a computation error, e.g. http://milkyway.cs.rpi.edu/milkyway/result.php?resultid=64434802 (Outcome: Computation error; Client state: Compute error; Exit status: -177 (0xffffffffffffff4f)). In other cases I also got the -179 error. Previously I had an FX 580; I removed it, cleaned up with Driver Sweeper, and installed a 6950. My specs are: Win 7 64-bit, BOINC 6.12.26 x64, Catalyst 11.6, 6950 downclocked to 500/1250. SETI with the ATI OpenCL build seems to be working OK. Any suggestions? |
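
Several posts in this thread turn on the same few numbers: exit status -177 (shown on the result page as 0xffffffffffffff4f), which earlier posts pair with "Maximum elapsed time exceeded", and the DCF. The Python below is a rough, hypothetical model of that bookkeeping, not BOINC's actual code. It assumes that (1) the predicted run time is the WU's estimated FLOP count divided by the GPU's estimated FLOPS, multiplied by the project DCF; (2) the abort limit is a separate FLOP bound divided by the same FLOPS figure, with DCF not applied (the position taken in the post signed Zy, not a settled fact in this thread); and (3) DCF learns with a simplified rule. All function names, the 1.1/0.9/0.1 constants, the [0.01, 100] clamp and the example WU numbers are illustrative assumptions.

```python
"""Hypothetical model of the BOINC client bookkeeping discussed in this thread.

All names, constants and numbers are illustrative assumptions, not BOINC code.
"""

MASK64 = (1 << 64) - 1


def exit_status_from_hex(hex_str: str) -> int:
    """Interpret a 64-bit hex exit status as a signed (two's complement) integer."""
    value = int(hex_str, 16)
    if value >= 1 << 63:  # top bit set -> the value is negative
        value -= 1 << 64
    return value


def exit_status_to_hex(status: int) -> str:
    """Inverse of the above: format a signed status as 64-bit hex."""
    return hex(status & MASK64)


def estimated_runtime(fpops_est: float, flops: float, dcf: float) -> float:
    """Assumed estimate: raw run time (fpops_est / flops) scaled by the DCF."""
    return fpops_est / flops * dcf


def exceeds_elapsed_limit(elapsed: float, fpops_bound: float, flops: float) -> bool:
    """Assumed abort rule: limit is fpops_bound / flops; DCF is NOT applied."""
    return elapsed > fpops_bound / flops


def update_dcf(dcf: float, actual: float, raw_estimate: float) -> float:
    """Simplified DCF learning step with assumed constants.

    raw_estimate is the run-time estimate *before* DCF scaling.
    Overruns raise the DCF at once; underruns lower it only gradually.
    """
    ratio = actual / raw_estimate          # the DCF that would have been exact
    if actual > 1.1 * raw_estimate * dcf:  # ran >10% longer than predicted
        new_dcf = ratio
    else:
        new_dcf = 0.9 * dcf + 0.1 * ratio  # move 10% of the way to the ratio
    return min(max(new_dcf, 0.01), 100.0)  # clamp to an assumed [0.01, 100]


if __name__ == "__main__":
    # The exit status from the result quoted in the last post.
    print(exit_status_from_hex("0xffffffffffffff4f"))  # -177
    print(exit_status_to_hex(-177))                    # 0xffffffffffffff4f

    # Hypothetical WU: 1e13 FLOPs estimated, bound only 2x that,
    # on a GPU the client believes does 5e10 FLOPS.
    fpops_est, fpops_bound, flops = 1e13, 2e13, 5e10
    print(estimated_runtime(fpops_est, flops, dcf=1.0))  # 200.0 s predicted
    print(estimated_runtime(fpops_est, flops, dcf=2.0))  # 400.0 s with DCF doubled
    # A task that really needs 450 s is aborted whatever the DCF (in this model).
    print(exceeds_elapsed_limit(450.0, fpops_bound, flops))  # True (limit is 400 s)
    # That overrun would push the DCF straight up to 450 / 200.
    print(update_dcf(1.0, actual=450.0, raw_estimate=200.0))  # 2.25
```

In this model, doubling the DCF doubles the predicted time but leaves the 400 s abort limit untouched, so a WU whose FLOP bound is set too low (as suggested above for ps_separation_13_3s_free_2) still fails. If the real client does fold DCF into the limit, as other posters suggest, that second conclusion changes.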
©2024 Astroinformatics Group