Welcome to MilkyWay@home

ps_separation and ps_test WUs all erroring out



Message boards : Number crunching : ps_separation and ps_test WUs all erroring out


Senilix

Joined: 8 Aug 08
Posts: 30
Credit: 74,566,409
RAC: 0
Message 49644 - Posted: 27 Jun 2011, 22:43:56 UTC

All of the above-mentioned WUs are erroring out on my ATI HD 5850. The same goes for my wingmen. As some of them are running nVidia cards, the issue seems to be WU-related.
Lukfi

Joined: 25 Jan 11
Posts: 8
Credit: 3,824,187
RAC: 0
Message 49818 - Posted: 3 Jul 2011, 8:20:11 UTC
Last modified: 3 Jul 2011, 8:20:25 UTC

I got a batch of ps_separation_13_3s_fix20 today and all of them end with computation errors. I'm using an ATI Radeon HD 4770. What should I do, throw them all out and wait until I get some good ones?
Werkstatt

Joined: 19 Feb 08
Posts: 350
Credit: 140,436,377
RAC: 13
Message 49819 - Posted: 3 Jul 2011, 8:59:17 UTC

Sorry to tell you, Lukfi, but my _13_3s_fix20 WUs finish and validate.
I found one difference: I use newer drivers. Maybe an update can help.
FruehwF

Joined: 28 Feb 10
Posts: 120
Credit: 109,840,492
RAC: 0
Message 49820 - Posted: 3 Jul 2011, 9:11:04 UTC

Well, your BOINC client is 6.10.58.

Maybe that's a problem.

I work with older Catalyst drivers (1.4.1016, 1.4.900), and both machines (1 x HD4850, 2 x HD4850) work fine.

Link

Joined: 19 Jul 10
Posts: 357
Credit: 16,332,748
RAC: 4
Message 49824 - Posted: 3 Jul 2011, 10:41:06 UTC - in response to Message 49820.  

Well, your BOINC client is 6.10.58.

Maybe that's a problem.

I don't think so; I'm still on 6.10.18 and it works fine. Also, the only task on his GPU that errored out failed with error -177, "Maximum elapsed time exceeded", and it seems many people have this problem at the moment.
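For context on error -177: as far as I know, BOINC clients of this era abort a task with "Maximum elapsed time exceeded" once its elapsed time passes a limit computed from the workunit's rsc_fpops_bound divided by the client's speed estimate (flops) for the app version. The sketch below models only that assumed rule; the function names and the numbers are illustrative, not BOINC's own code.

```python
# Sketch of the client-side elapsed-time check behind error -177.
# Assumption: limit = rsc_fpops_bound / flops (flops = client's speed estimate
# for the app version). Helper names are hypothetical.

ERR_RSC_LIMIT_EXCEEDED = -177  # BOINC's code for "Maximum elapsed time exceeded"


def max_elapsed_time(rsc_fpops_bound: float, flops: float) -> float:
    """Seconds a task may run before the client aborts it."""
    return rsc_fpops_bound / flops


def check_elapsed(elapsed: float, rsc_fpops_bound: float, flops: float) -> int:
    """Return 0 if the task may keep running, else the -177 error code."""
    if elapsed > max_elapsed_time(rsc_fpops_bound, flops):
        return ERR_RSC_LIMIT_EXCEEDED
    return 0


# Made-up workunit: bound of 2e14 FLOPs on a GPU rated at 1e12 FLOPS
# gives a 200 s limit, so a task that really needs 300 s is killed.
print(max_elapsed_time(2e14, 1e12))      # 200.0
print(check_elapsed(300.0, 2e14, 1e12))  # -177
```

Under this model, a bound that is too low for a given card, or a client that overrates the card's speed, would make every task on that host hit the limit.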
Len LE/GE

Joined: 8 Feb 08
Posts: 261
Credit: 104,050,322
RAC: 0
Message 49825 - Posted: 3 Jul 2011, 11:39:09 UTC

BOINC client 6.10.58 works without problems here.

What I found is that ps_separation_13_3s_free_2 WUs have their run times set far too short. See Message 49774.

This, together with a DCF that got messed up by a batch of faulty WUs, might lead to "Maximum elapsed time exceeded" as well. So my first step would be to reset the DCF.
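To see why a skewed DCF matters, here is a rough sketch of how a client of this era turns a workunit's FLOP estimate into the run time it predicts, assuming estimate = rsc_fpops_est / flops × DCF. The field names follow BOINC's workunit terminology; the helper and the values are made up for illustration.

```python
def estimated_runtime(rsc_fpops_est: float, flops: float, dcf: float) -> float:
    """Predicted run time in seconds: the raw FLOP-based estimate scaled by DCF.

    Assumed model, not BOINC source code.
    """
    return rsc_fpops_est / flops * dcf


raw = estimated_runtime(1e14, 1e12, 1.0)     # 100 s with the neutral DCF of 1
skewed = estimated_runtime(1e14, 1e12, 0.2)  # same WU, DCF that has drifted to 0.2
print(raw, skewed)  # 100.0 20.0
```

With the DCF at 0.2 the client would believe each task takes a fifth of its real time; putting the DCF back to 1 removes that distortion from the estimate.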
Link

Joined: 19 Jul 10
Posts: 357
Credit: 16,332,748
RAC: 4
Message 49830 - Posted: 3 Jul 2011, 12:31:24 UTC - in response to Message 49825.  

What I found is that ps_separation_13_3s_free_2 WUs have their run times set far too short. See Message 49774.

See my reply to that.

But in general, yes: increasing the DCF to about double its current value should help.
Lukfi

Joined: 25 Jan 11
Posts: 8
Credit: 3,824,187
RAC: 0
Message 49837 - Posted: 3 Jul 2011, 14:06:39 UTC

What's a DCF?
Zydor

Joined: 24 Feb 09
Posts: 620
Credit: 100,587,625
RAC: 0
Message 49841 - Posted: 3 Jul 2011, 15:16:56 UTC
Last modified: 3 Jul 2011, 15:42:56 UTC

The elapsed time bug is being dealt with at:

http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=2468

DCF has nothing to do with the elapsed time bug; it has a completely separate, unconnected purpose.

What's a DCF?

That's a Pandora's box... I'll try to give a short version, so please bear with me. It's a complex topic at the detailed level, and the details are not for this discussion.

DCF
DCF stands for "Duration Correction Factor". To find it, click the Projects tab, select the project whose DCF you want to see, and click Properties at the top left in BAM; the DCF for that project is shown at the bottom of the resulting screen.
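The same value is also stored on disk: each project element in client_state.xml (in the BOINC data directory) carries a duration_correction_factor entry. A sketch that lists it per project, assuming that layout; read_dcfs is a hypothetical helper, and the sample below is a trimmed, made-up excerpt rather than a real state file.

```python
import xml.etree.ElementTree as ET


def read_dcfs(state_xml: str) -> dict[str, float]:
    """Map each project's name to its duration correction factor (read-only)."""
    root = ET.fromstring(state_xml)
    dcfs = {}
    for project in root.findall("project"):
        # Fall back to the master URL if the project name is missing.
        name = project.findtext("project_name") or project.findtext("master_url", "?")
        value = project.findtext("duration_correction_factor")
        if value is not None:
            dcfs[name] = float(value)
    return dcfs


sample = """<client_state>
  <project>
    <master_url>http://milkyway.cs.rpi.edu/milkyway/</master_url>
    <project_name>Milkyway@Home</project_name>
    <duration_correction_factor>1.250000</duration_correction_factor>
  </project>
</client_state>"""
print(read_dcfs(sample))  # {'Milkyway@Home': 1.25}
```

For a real check, pass the contents of your own client_state.xml instead of the sample; the function only reads, and any hand edit of that file should be done with BOINC fully shut down.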

A major problem in BOINC is predicting how long a work unit will run. It is far from easy to get right, it causes grief in many situations, and I doubt it will ever be "right" 100% of the time; that's the nature of the beast. It matters because, in the way BOINC is set up at present, the predicted run time shown in BAM has a profound effect on the BOINC scheduler, and it is the scheduler that fetches the correct WUs at the right time and in the right quantity.
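Why the predicted run time drives how much work arrives can be shown with a deliberately simplified model: the client asks for enough tasks to fill its work buffer, so the number requested scales inversely with the per-task estimate. This ignores resource shares, deadlines and per-device queues; tasks_to_fetch is a hypothetical helper, not BOINC's work-fetch code.

```python
import math


def tasks_to_fetch(buffer_seconds: float, est_runtime: float, queued: int = 0) -> int:
    """Tasks still needed to fill a work buffer, given a per-task run-time estimate.

    Simplified model: buffer length / estimate, minus tasks already queued.
    """
    wanted = math.ceil(buffer_seconds / est_runtime)
    return max(wanted - queued, 0)


# A one-day buffer (86,400 s): an estimate ten times too short asks for ten times the work.
print(tasks_to_fetch(86_400, 600))  # 144
print(tasks_to_fetch(86_400, 60))   # 1440
```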

To understand DCF, it's important to understand the "division of labour" between BOINC and BOINC projects. Contrary to some beliefs, they are two separate entities: the BOINC server (there are other components, but that will do as a generalisation) and the project work units.

BOINC exists to provide a framework into which any project can slot its own software and run it, either built against the BOINC API or placed inside a beast called the BOINC wrapper. The wrapper enables applications written in many different languages, from ancient COBOL to C++ (zillions of them), to run in BOINC without too many drastic changes to the project application (not strictly true, but pretty close). Project admins do have access to the BOINC server code and the BOINC wrapper code, but these are not maintained by project staff (nor is the BOINC client most people use to manage projects and WUs), and nor should they be; project staff have enough to do.

Projects are responsible for presenting the project work unit to the BOINC scheduler (part of the BOINC server). From there until the results are returned to the project, BOINC, not the project, is responsible for processing the WU, collecting the results and sending them back.

{bear with me... here comes DCF...}

To do that for projects, BOINC needs to work out how long a WU will take on the user's hardware. Even the projects can't predict that, and they own the WU, so BOINC has no chance of a simple prediction. Here is where it gets messy. For BOINC to take on board an "anonymous" WU (which it must do to do its job), and for the BOINC scheduler to work out how many WUs of each project in your list to send so that your percentage preferences are met, it needs a mechanism for working out the run time. So far so good... not too bad: it just "learns" from the WUs crunched and, over time, gets more accurate through experience.
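That "learning" can be sketched as a per-project correction updated after each finished task. The rule below is modelled on the 6.x client's DCF update as I recall it (jump up at once when a task runs long, drift down slowly when it runs short, clamp to a fixed range); the exact thresholds and weights are from memory and should be read as illustrative, and update_dcf is a hypothetical name.

```python
def update_dcf(dcf: float, elapsed: float, raw_estimate: float) -> float:
    """Return a project's new DCF after a task finishes.

    raw_estimate is the uncorrected estimate (rsc_fpops_est / flops), in seconds.
    Thresholds (1.1, 0.1), weights (0.9/0.1, 0.99/0.01) and the [0.01, 100]
    clamp are assumptions recalled from 6.x clients, not verified source.
    """
    raw_ratio = elapsed / raw_estimate          # the DCF this task alone suggests
    adj_ratio = elapsed / (raw_estimate * dcf)  # actual vs. corrected estimate
    if adj_ratio > 1.1:
        dcf = raw_ratio                         # underestimated: jump straight up
    elif adj_ratio < 0.1:
        dcf = 0.99 * dcf + 0.01 * raw_ratio     # wildly early: barely move
    else:
        dcf = 0.9 * dcf + 0.1 * raw_ratio       # otherwise drift 10% toward it
    return min(max(dcf, 0.01), 100.0)


dcf = update_dcf(1.0, elapsed=200, raw_estimate=100)  # task took twice as long
print(dcf)            # 2.0
dcf = update_dcf(dcf, elapsed=100, raw_estimate=100)  # next one is on target
print(round(dcf, 3))  # 1.9
```

The asymmetry is the design choice: overestimating run time only costs some idle buffer, while underestimating it risks missed deadlines, so the factor rises fast and falls slowly.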

Now comes the rub... Having worked out how long a project's WU takes on the cruncher's hardware (out of a practically endless number of combinations), it then has to apply that to the percentages set in your preferences, so that the correct number of WUs from each of your projects arrives at your client in good time and in the correct proportion. To do the latter, it runs a mechanism that, over time, ensures project "X" gets the 30% you set, project "Y" the 25%, and project "Z" the 45% (or whatever you set as your preferences).
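In BOINC the preference is actually a per-project resource share, and each project's target fraction is its share divided by the sum of all shares. The sketch computes only that target for the 30/25/45 example; the bookkeeping the client uses to steer toward it over time (debts, in 6.x clients) is not modelled, and share_fractions is a hypothetical helper.

```python
def share_fractions(shares: dict[str, float]) -> dict[str, float]:
    """Turn resource-share weights into each project's target fraction of compute."""
    total = sum(shares.values())
    return {name: share / total for name, share in shares.items()}


prefs = {"X": 30, "Y": 25, "Z": 45}
fractions = share_fractions(prefs)
hours_per_day = {name: round(24 * f, 1) for name, f in fractions.items()}
print(hours_per_day)  # {'X': 7.2, 'Y': 6.0, 'Z': 10.8}
```

Because shares are relative weights, settings of 60/50/90 would give exactly the same split.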

The core of that mechanism is DCF. The DCF value tells BOINC how far actual WU run times differ from the estimates, and gives a "weighting" factor used in grabbing the right WUs, for the right projects you run, at the right time, and in the right quantity. Sometimes it gets out of whack and you'll hear about "resetting DCF", which means setting it back to its default value of 1 to get it going again. The details of all that are for another day... a few chapters of an encyclopedia... and a coder along the lines of the BOINC devs, Claggy, Crunch3r, Richard and many, many other wonderful volunteer coders (way too many to list, and that fervently excludes me) who beaver away in their own time on behalf of BOINC, keeping the Titanic off the iceberg :)

{You did ask... :) }

Regards
Zy
Lukfi

Joined: 25 Jan 11
Posts: 8
Credit: 3,824,187
RAC: 0
Message 49846 - Posted: 3 Jul 2011, 17:30:00 UTC

Well, that is all incredibly wonderful, but it doesn't solve the problem I'm having. Worse still, neither does the thread you're pointing at.
cristipurdel

Joined: 1 Jul 11
Posts: 10
Credit: 422,543
RAC: 0
Message 49870 - Posted: 4 Jul 2011, 13:32:26 UTC
Last modified: 4 Jul 2011, 13:34:53 UTC

On my host http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=299828
all the GPU WUs ps_separation_13_3s ended with a computation error,
e.g. http://milkyway.cs.rpi.edu/milkyway/result.php?resultid=64434802
Outcome Computation error
Client state Compute error
Exit status -177 (0xffffffffffffff4f)
In other cases I also got error -179.
Previously I had an FX 580; I removed it, cleaned up with Driver Sweeper, and installed a 6950.
My specs: Win 7 64-bit, BOINC 6.12.26 x64, Catalyst 11.6, 6950 downclocked to 500/1250.
SETI with the ATI OpenCL build seems to be working OK.
Any suggestions?
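The hex value on the result page is the same negative exit status written as a 64-bit two's-complement number; -177 is BOINC's ERR_RSC_LIMIT_EXCEEDED ("Maximum elapsed time exceeded"). A quick way to check the conversion (exit_status_hex is a hypothetical helper):

```python
def exit_status_hex(code: int, bits: int = 64) -> str:
    """Render a negative exit status as the unsigned hex shown on result pages."""
    return hex(code & ((1 << bits) - 1))  # mask to `bits`-wide two's complement


print(exit_status_hex(-177))  # 0xffffffffffffff4f
print(exit_status_hex(-179))  # 0xffffffffffffff4d
```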


©2022 Astroinformatics Group