Message boards : News : testing new validator
| Author | Message |
|---|---|
|
I've started up the new validator, so please be patient as I get all the kinks worked out over the next few days. Validation will now work as follows: Every result that could improve one of our searches will be validated (with a min quorum of 3 -- and the fitness reported must be within 10e-11 of the quorum results; this means that single precision GPU results will be flagged invalid). Results that won't improve a search will be validated 50% of the time until the error rates of hosts stabilize in the database (this will probably take a couple of weeks). Afterwards, for the results that don't improve our searches, we'll be using BOINC's adaptive validation based on each host's error rate (the validation rate will be between 10% and 100% depending on how many errors the host typically has). | |
| ID: 38016 | Rating: 0 | rate:
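The policy in the post above can be sketched roughly as follows. This is only an illustration in Python (the validator itself is not shown in the thread); the function names and the `host_probability` parameter are hypothetical, and how BOINC derives a per-host rate from the error history is not described here, so it is passed in rather than computed.

```python
import random

INITIAL_SPOT_CHECK = 0.50   # "validated 50% of the time" during the shakedown
ADAPTIVE_MIN = 0.10         # lower bound quoted in the post
ADAPTIVE_MAX = 1.00         # upper bound quoted in the post


def validation_probability(improves_search, rates_stabilized, host_probability=0.0):
    """Return the chance that a returned result is sent for validation.

    host_probability is a hypothetical per-host value; the thread does not
    say how it is derived from the host's error history.
    """
    if improves_search:
        return 1.0                       # improving results are always validated
    if not rates_stabilized:
        return INITIAL_SPOT_CHECK        # fixed rate until error rates settle
    # Afterwards the per-host rate is kept inside the quoted 10%..100% range.
    return min(ADAPTIVE_MAX, max(ADAPTIVE_MIN, host_probability))


def should_validate(improves_search, rates_stabilized, host_probability=0.0,
                    rng=random.random):
    """Draw a random number and decide whether this result gets validated."""
    p = validation_probability(improves_search, rates_stabilized, host_probability)
    return rng() < p
```

Passing `rng` as a parameter is just a convenience so the decision can be checked deterministically.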
| |
|
Are these only test units just to verify the validator? I've had 2 issues with them - Completed, validation inconclusive and when the consensus does come in, I get - Completed, can't validate due to the settings of - max # of error/total/success tasks 1, 6, 1 errors Too many success results | |
| ID: 38019 | Rating: 0 | rate:
| |
Are these only test units just to verify the validator? I've had 2 issues with them - Completed, validation inconclusive and when the consensus does come in, I get - Completed, can't validate due to the settings of - max # of error/total/success tasks 1, 6, 1 errors Too many success results Ahh, that's the issue. I was wondering why WUs weren't coming back for additional validation. This should be fixed with new WUs. ____________ | |
| ID: 38020 | Rating: 0 | rate:
| |
Are these only test units just to verify the validator? I've had 2 issues with them - Completed, validation inconclusive and when the consensus does come in, I get - Completed, can't validate due to the settings of - max # of error/total/success tasks 1, 6, 1 errors Too many success results Thanks, I'll give 'em a try after I quit banging on Slicker's server. | |
| ID: 38021 | Rating: 0 | rate:
| |
Validation will now work as follows: Every result that could improve one of our searches will be validated (with a min quorum of 3[...] Why a quorum of 3? Why not two, and if they don't match closely enough, a third task is sent? It seems like a waste of resources. ____________ | |
| ID: 38022 | Rating: 0 | rate:
| |
|
So what about the wus that say "Checked, but no consensus yet" also come up as pending. Will they be granted credit eventually? | |
| ID: 38023 | Rating: 0 | rate:
| |
Validation will now work as follows: Every result that could improve one of our searches will be validated (with a min quorum of 3[...] Depending on how the validation goes I'll probably bring it down to 2. But right now I want to flush out all the clients returning bad results, which means a higher quorum -- so there's less chance of two bad clients returning results for the same WU and getting credit. ____________ | |
| ID: 38024 | Rating: 0 | rate:
| |
So what about the wus that say "Checked, but no consensus yet" also come up as pending. Will they be granted credit eventually? They'll be granted credit when there's a quorum of 3. Checked but no consensus yet means that the result was looked at but there weren't 2 other similar results to validate it. ____________ | |
| ID: 38025 | Rating: 0 | rate:
| |
|
Most of mine (edit: many, not most) are ending up with "Completed, validation inconclusive". 4 results, with a max of 4, and then no one gets any credit. Something doesn't seem right with this scheme. | |
| ID: 38027 | Rating: 0 | rate:
| |
Most of mine (edit: many, not most) are ending up with "Completed, validation inconclusive". 4 results, with a max of 4, and then no one gets any credit. Something doesn't seem right with this scheme. The max results should be 6. These must be from some of the older workunits (from the old validator); I'll update the database so hopefully they'll be fixed. ____________ | |
| ID: 38028 | Rating: 0 | rate:
| |
|
There also seems to be a set of applications out there which are giving close, but not close enough results -- accurate to about ~10e-8, when we really need ~10e-11 or more. I'm not sure if this is due to overclocking or single precision GPUs or maybe older optimized versions of the application which need to be updated. | |
| ID: 38029 | Rating: 0 | rate:
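To see why a single precision result would land in the "close, but not close enough" range, here is a small sketch that rounds a double precision value to single precision and compares the loss against the tolerance. The fitness value is made up for illustration, and `to_single` is a helper defined here, not part of any project code.

```python
import struct

TOLERANCE = 10e-11  # written as in the thread; numerically equal to 1e-10


def to_single(x):
    """Round a Python float (double precision) to the nearest IEEE 754 single."""
    return struct.unpack("f", struct.pack("f", x))[0]


fitness = -3.1415926535                  # hypothetical double precision fitness
rounding_error = abs(to_single(fitness) - fitness)

# Near magnitude 3, adjacent single precision values are about 2.4e-7 apart,
# so the rounding error alone is far larger than the tolerance.
print(rounding_error, rounding_error > TOLERANCE)
```

This only shows the loss from storing the final value in single precision; an application that computes in single precision throughout would typically accumulate more error than this.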
| |
|
Okay, I've run across a few with: | |
| ID: 38030 | Rating: 0 | rate:
| |
|
Looks to be more hardware related than application related to me. Many of the results marked invalid are using the stock application that is automatically downloaded. | |
| ID: 38031 | Rating: 0 | rate:
| |
|
Ouch. If correct, that's going to make this messy. Does HR (homogeneous redundancy) exist for GPUs? | |
| ID: 38033 | Rating: 0 | rate:
| |
Looks to be more hardware related than application related to me. Many of the results marked invalid are using the stock application that is automatically downloaded. Now that you mention it... I think I read somewhere that the 58xx cards give incorrect results with the latest SDK.
Source -> http://setiathome.berkeley.edu/forum_thread.php?id=59506&nowrap=true#986347 Since OpenCL is just some sort of wrapper for CAL/brook... Seems to me that someone should do a standalone test and compare results between 48xx and 58xx again, just to make sure everything works properly. ____________ Join BOINC United now! | |
| ID: 38034 | Rating: 0 | rate:
| |
Seems to me that someone should do a standalone test and compare results between 48xx and 58xx again, just to make sure everything works properly. And if they do produce the same results, then perhaps the validator needs to be tested as well. ____________ | |
| ID: 38035 | Rating: 0 | rate:
| |
Seems to me that someone should do a standalone test and compare results between 48xx and 58xx again, just to make sure everything works properly. I'd guess comparing a few numbers against each other shouldn't be that hard to do... But you never know ;) ____________ Join BOINC United now! | |
| ID: 38036 | Rating: 0 | rate:
| |
As stated in corresponding thread, 5xxx ability to work is very questionable right now. Bugs in ATI's OpenCL SDK implementation. They promised to fix those in new SDK release, will see... I recall GPUGRID was saying that ATI OpenCL was completely unusable. Kept locking up the machine at random. Also major problems with 4xxx performance that rendered them useless for any purpose. | |
| ID: 38037 | Rating: 0 | rate:
| |
Looks to be more hardware related than application related to me. Many of the results marked invalid are using the stock application that is automatically downloaded. My own observations seem to tie in pretty much with the above. I've checked a number of my results (48xx series - stock app) and every time so far that I'm teamed up with non-58xx GPUs or even CPUs, the results are valid. If there are three 58xx GPUs, my result is always invalid. I've not yet seen a quorum where both 48xx and 58xx GPUs validate against each other. It does take time to check so I haven't looked at enough quorums yet to be absolutely sure. Here's a quorum that is a bit strange. There are two 48xx results that validate against each other and there are three 58xx results that have been declared invalid. These three did come in last but how did the two manage to trump them when there are supposed to be three for a quorum? Also, the use of 1,6,6 for the error/total/success numbers is a bit strange. If the min quorum is 3 then the max errors should really be 3 also since you could still get 3 successful results and form a quorum. By leaving the errors at 1, a second error will immediately junk an otherwise potentially successful quorum. EDIT: Does anyone know if this is the bit of the returned data that is used for validation purposes? probability calculation (stars) Calculated about 3.34818e+009 floatingpoint ops on FPU. If not, what exactly is used? ____________ Cheers, Gary. | |
| ID: 38039 | Rating: 0 | rate:
| |
|
HD5870 running ati13ati app. factory oc (875, 1250). | |
| ID: 38040 | Rating: 0 | rate:
| |
As stated in corresponding thread, 5xxx ability to work is very questionable right now. Bugs in ATI's OpenCL SDK implementation. They promised to fix those in new SDK release, will see... I recall GPUGRID was saying that ATI OpenCL was completely unusable. Kept locking up the machine at random. Also major problems with 4xxx performance that rendered them useless for any purpose. We have an OpenCL version of the MW@Home GPU application... and it's about 10x slower on both NVIDIA and ATI cards. OpenCL still needs a lot of work it seems... If someone with both cards could do some comparison the numbers would be very helpful. When I release the code for the new application I'll have some real-sized workunit examples and the output that will be required (it will have to be within at least 10e-11). Hopefully this will help us figure out the problem. ____________ | |
| ID: 38041 | Rating: 0 | rate:
| |
The 1 max error is because our application really shouldn't error out. Chances are if there's an error it was our fault (ie, a badly generated or specified workunit), and we don't want to send out more bad WUs. I don't mind upping it to 3 if people would prefer that, however. ____________ | |
| ID: 38042 | Rating: 0 | rate:
| |
It knows the time taken but doesn't use this for validation. I'm not quite sure how that would be helpful. ____________ | |
| ID: 38043 | Rating: 0 | rate:
| |
Good catch. There was a small bug in the check_set code for the validator. This shouldn't happen anymore.
The only thing used is the fitness value reported by the application. If the fitness returned is within 10e-11 of 2 other fitnesses for the quorum, it's valid. ____________ | |
| ID: 38044 | Rating: 0 | rate:
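The rule stated in the post above (a result is valid if its fitness is within 10e-11 of two other fitnesses in the quorum) can be sketched like this. It is a minimal illustration in Python with a hypothetical function name and made-up fitness values, not the project's validator code.

```python
TOLERANCE = 10e-11  # written as in the thread; numerically equal to 1e-10


def find_quorum(fitnesses, min_quorum=3, tolerance=TOLERANCE):
    """Return the indices of the first group of agreeing results, or None.

    A result is valid when at least (min_quorum - 1) *other* results
    report a fitness within `tolerance` of its own.
    """
    for i, candidate in enumerate(fitnesses):
        # This list includes i itself, so its length is the size of the group.
        agreeing = [j for j, other in enumerate(fitnesses)
                    if abs(candidate - other) <= tolerance]
        if len(agreeing) >= min_quorum:
            return agreeing
    return None


# Hypothetical fitness values: three agree, one is off by about 9e-8.
results = [-3.0, -3.0 + 2e-11, -3.00000009, -3.0 - 3e-11]
print(find_quorum(results))
```

Comparing every result against one candidate, rather than against an average, means a single outlier cannot drag an otherwise good group out of tolerance.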
| |
|
Shouldn't it read: | |
| ID: 38045 | Rating: 0 | rate:
| |
|
Just counting pages at 20 tasks each, I am currently at 9 valid and 2 invalid. 82% valid tasks, is not going to work in the long run, obviously. But I'll hang around for the shakedown. | |
| ID: 38046 | Rating: 0 | rate:
| |
If not, what exactly is used? Thanks very much for the reply. All we can see in the data returned is what's shown below. This is one of the invalids from the quorum I linked previously. Can't see any 'fitness' value in there so can you advise if it's possible to get that value from somewhere? I imagine you could trawl the slot directory and find it there for your own host before the result is uploaded but that doesn't help with finding the fitness for each of your wingmen. Device 0: ATI Radeon HD5800 series (Cypress) 1024 MB local RAM (remote 2047 MB cached + 2047 MB uncached) GPU core clock: 850 MHz, memory clock: 1200 MHz 1600 shader units organized in 20 SIMDs with 16 VLIW units (5-issue), wavefront size 64 threads supporting double precision Starting WU on GPU 0 main integral, 640 iterations predicted runtime per iteration is 123 ms (33.3333 ms are allowed), dividing each iteration in 4 parts borders of the domains at 0 400 800 1200 1600 Calculated about 3.28897e+013 floatingpoint ops on GPU, 2.47165e+008 on FPU. Approximate GPU time 84.7168 seconds. probability calculation (stars) Calculated about 3.34818e+009 floatingpoint ops on FPU. WU completed. CPU time: 3.04202 seconds, GPU time: 84.7168 seconds, wall clock time: 86.535 seconds, CPU frequency: 2.87056 GHz </stderr_txt> ____________ Cheers, Gary. | |
| ID: 38048 | Rating: 0 | rate:
| |
|
Here is another one: | |
| ID: 38049 | Rating: 0 | rate:
| |
Here is another one: Bug already fixed. Check the third post in this thread. ____________ Cheers, Gary. | |
| ID: 38050 | Rating: 0 | rate:
| |
Here is another one: Thank you. I think I need more coffee... | |
| ID: 38051 | Rating: 0 | rate:
| |
82% valid tasks, is not going to work in the long run, obviously. But I'll hang around for the shakedown. Well I'm out of here for the time being as 82% is not satisfactory for me! I realize that MilkyWay is still classed (as far as I know) as an Alpha project, but IMHO it is mature enough that they shouldn't be running tests in a production environment - at least some of these bugs (if not the majority of them) SHOULD have been caught in testing before releasing this new version validator into the wild. See y'all later. ____________ | |
| ID: 38052 | Rating: 0 | rate:
| |
82% valid tasks, is not going to work in the long run, obviously. But I'll hang around for the shakedown. Right now it looks like the problem isn't the validator but the (optimized?) GPU applications. I don't think it will take us too long to sort this out. And honestly, I put the new validator out tonight only screwing up a few workunits. I don't think that's too bad :P There are a lot of things you just can't catch until you put that kind of thing out in the wild anyway. Like I mentioned in the previous post, I rewrote the assimilator/validator code from the ground up in Java. This is going to make debugging and testing a LOT easier (yay garbage collection, exceptions and no more segmentation faults), and the validator much more stable (no memory leaks, no writing to bad areas of memory). Oddly enough, it seems to be using significantly less CPU than the older version (which was C/C++). ____________ | |
| ID: 38053 | Rating: 0 | rate:
| |
The 1 max error is because our application really shouldn't error out. Chances are if there's an error it was our fault (ie, a badly generated or specified workunit), and we don't want to send out more bad WUs. With an IR of 3, if the whole WU is bad all 3 will be bad and you'll quickly hit the 3 error results limit. You shouldn't underestimate the ability of the average cruncher to trash the tasks even if the app itself really shouldn't error out :-). Also, it's very frustrating to the CPU crunchers to see many hours of work down the drain just because of a second error result in a quorum before the third success result has had a chance to come in. What problem is there in sending out an extra copy or two of the task to see if you can get a quorum? I don't mind upping it to 3 if people would prefer that, however. Well, at least make it 2 so as to give a bit more protection to those who have invested their resources (and put a memo on your monitor bezel to "Not send out any bad WUs" :-). ____________ Cheers, Gary. | |
| ID: 38054 | Rating: 0 | rate:
| |
The 1 max error is because our application really shouldn't error out. Chances are if there's an error it was our fault (ie, a badly generated or specified workunit), and we don't want to send out more bad WUs. Good points. I upped the max error results to 3. This should be reflected in all the current (and new) workunits. ____________ | |
| ID: 38055 | Rating: 0 | rate:
| |
|
Here's another example of the 'Too many success results' bug. Note that one of the victims actually invested over a day of CPU time for no reward. I guess he won't be particularly impressed. | |
| ID: 38057 | Rating: 0 | rate:
| |
|
So it is possible that most of these results would be accurate to 10e-11 if compared only against an unoptimised CPU application but the results from ATI 48xx, NVIDIA and optimised CPU applications are on one side of the required fitness value and the results from ATI 58xx and 5970 are on the other side. Therefore the difference between these two sets of hardware is less accurate than 10e-11, even though individual results compared against an unoptimised CPU application may still have the required accuracy. | |
| ID: 38058 | Rating: 0 | rate:
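The scenario suggested in the post above is easy to check with numbers. The values below are invented purely to illustrate the argument: two results that each sit inside the tolerance of a reference value, but on opposite sides of it, can still disagree with each other.

```python
TOLERANCE = 10e-11  # written as in the thread; numerically equal to 1e-10


def agree(a, b, tolerance=TOLERANCE):
    """True when two fitness values are within the tolerance of each other."""
    return abs(a - b) <= tolerance


reference = -3.0                 # hypothetical unoptimised CPU result
older_gpu = reference - 8e-11    # hypothetical result just below the reference
newer_gpu = reference + 8e-11    # hypothetical result just above the reference

print(agree(reference, older_gpu))   # each is 8e-11 from the reference
print(agree(reference, newer_gpu))
print(agree(older_gpu, newer_gpu))   # but 1.6e-10 from each other
```

In other words, "within tolerance of" is not transitive, so which results end up in a quorum together matters.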
| |
|
Is the "Canonical" result used in any way in determining the validity of results? I haven't checked too many, but have noted that the first result in sometimes determines validity or invalidity. | |
| ID: 38059 | Rating: 0 | rate:
| |
Is the "Canonical" result used in any way in determining the validity of results? I haven't checked too many, but have noted that the first result in sometimes determines validity or invalidity. I think it's the other way around. The validator selects those results that agree (within specification) and one of them (perhaps the first one) is nominated as 'canonical'. Maybe it's the one whose answer is the closest to the average of all valid results for that quorum. I guess it depends on how the validator has been written. .... Take a look more closely. The numbers you are quoting are 'flops' not 'fitness' and they are e+013 and e+009 rather than 'minus'. Travis has already said that only 'fitness' is used for validation but he hasn't answered (yet) about where we might be able to observe the actual 'fitness' values for results in a quorum. I suspect we can't access those values which will make it rather unsatisfactory for anyone trying to understand why results are being deemed invalid. Seeing as the program is being modified at the moment, it might be a good opportunity to add some code to display on the website the fitness value returned by each successful task. ____________ Cheers, Gary. | |
| ID: 38061 | Rating: 0 | rate:
| |
|
Oh well...NNT until the problems with the validator have been overcome. | |
| ID: 38062 | Rating: 0 | rate:
| |
|
I'm still gathering up a lot of "can't validate" messages. What does "check skipped" mean anyway? | |
| ID: 38065 | Rating: 0 | rate:
| |
Oh well...NNT until the problems with the validator have been overcome. I think that's probably a very wise move. I'm now seeing quite a lot of examples of quorums that are giving the "Too many total results" error message when there are 6 successful results that apparently don't agree closely enough. There has got to be some sort of a problem with the validator, I would guess. I've seen a few examples where the 6 results have been split 3/3 (or 4/2 or 2/4) between 48xx and 58xx GPUs and yet the validator can't seem to find 3 that agree closely enough. There's something wrong with the validation process somewhere. Here's an example of a 3/3 split that can't validate and gives the 'Too many total results'. ____________ Cheers, Gary. | |
| ID: 38066 | Rating: 0 | rate:
| |
|
I think all these examples are definitely helping, at least. Other than that, it may be best to wait for the new application versions if you hate losing crunching time (if you don't, I'm sure Travis would appreciate your continued help testing). | |
| ID: 38067 | Rating: 0 | rate:
| |
I'm still gathering up a lot of "can't validate" messages. What does "check skipped" mean anyway? The validator will skip the validation check if any of the limits are exceeded. In your case I think you will find that the IR has gone to 7 and the WU as a whole has errored out with the 'Too many total results' error message. Click on the WU ID and look at what the error message for the WU as a whole actually says. ____________ Cheers, Gary. | |
| ID: 38068 | Rating: 0 | rate:
| |
|
http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=90246907 | |
| ID: 38069 | Rating: 0 | rate:
| |
http://milkyway.cs.rpi.edu/milkyway/result.php?resultid=96603214 That's the task ID and NOT the WU ID. If you have the task ID open you can actually see the WU ID on the second line of that particular page of output. However, when you are looking at all your results on the website you can see both task ID and WU ID side by side. Click on the WU ID to see what I'm talking about. ____________ Cheers, Gary. | |
| ID: 38070 | Rating: 0 | rate:
| |
|
Ok. I edited the previous post as you asked me to do. It says too many total results. Status: Completed, can't validate | |
| ID: 38071 | Rating: 0 | rate:
| |
|
What they have in common is ignoring unknown input argument in app_info.xml. Normal behaviour? | |
| ID: 38073 | Rating: 0 | rate:
| |
Ok. I edited the previous post ... That's OK. I obviously captured for posterity what you wrote before editing :-). as you asked me to do. I certainly didn't ask you to edit your post. I was trying to explain to you the difference between a 'task' and a 'WU'. There is no error in any of the six tasks that make up the complete WU but the WU has errored out simply because the IR went to 7 - ie there were more than six tasks in total making up the workunit. If you look at any of the six tasks in the WU, none of them actually have a problem that's visible. However we can deduce that the validator couldn't find three that agreed closely enough, before the IR was bumped from 6 to 7. As soon as it was bumped, the limit of 6 total tasks was exceeded and the whole WU was marked as an error and any further validation checks were skipped. The problem is more likely to be with validation rather than how your machine crunched your particular task. It says too many total results. Status: Completed, can't validate Which is exactly what it should say seeing as Travis has set the total tasks (results) limit for a WU to be 6. The real question is why the hell the validator can't find three results that agree when it has six to choose from? ____________ Cheers, Gary. | |
| ID: 38077 | Rating: 0 | rate:
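The way the error/total/success limits turn an otherwise healthy workunit into "can't validate" can be sketched as below. This is an assumption-laden illustration of the behaviour described in this thread, not BOINC's server code; the function name is hypothetical and the limits are passed as plain integers.

```python
def workunit_error(n_error, n_success, n_total, has_consensus,
                   max_error=1, max_total=6, max_success=1):
    """Return the workunit-level error message, or None if no limit is exceeded.

    n_total counts every task created for the workunit, including one that
    has only just been generated (the "IR went to 7" case in the thread).
    """
    if n_error > max_error:
        return "Too many error results"
    if not has_consensus:
        if n_success > max_success:
            return "Too many success results"
        if n_total > max_total:
            return "Too many total results"
    return None


# Limits 1, 6, 1: a second successful result without consensus is already fatal.
print(workunit_error(n_error=0, n_success=2, n_total=2, has_consensus=False))

# Limits 1, 6, 6: six successes that do not agree, then a seventh task is created.
print(workunit_error(n_error=0, n_success=6, n_total=7, has_consensus=False,
                     max_error=1, max_total=6, max_success=6))
```

Once any of these limits trips, no further validation is attempted for the workunit, which matches the "check skipped" status discussed earlier in the thread.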
| |
What they have in common is ignoring unknown input argument in app_info.xml. Normal behaviour? I'm seeing that on all recent tasks that do validate and are being crunched by stock apps anyway (no app_info.xml file) so I'm assuming it is something in new tasks that is not right but otherwise is actually harmless. Only Travis can sort that out and he's obviously not listening at the moment - probably in bed. ____________ Cheers, Gary. | |
| ID: 38079 | Rating: 0 | rate:
| |
Here's another example of the 'Too many success results' bug. Note that one of the victims actually invested over a day of CPU time for no reward. I guess he won't be particularly impressed. Yeah, I bumped it up to 3,9,6 for the time being. ____________ | |
| ID: 38089 | Rating: 0 | rate:
| |
|
Dumb Question time: | |
| ID: 38090 | Rating: 0 | rate:
| |
|
I've updated the validator so it will add the fitness that the results reported to the server at the end of the standard output. | |
| ID: 38091 | Rating: 0 | rate:
| |
What they have in common is ignoring unknown input argument in app_info.xml. Normal behaviour? That's just getting ready for the new version of the application. The new application will take the parameters it's using from the command line (that way we don't have to generate a new parameter file for each workunit). ____________ | |
| ID: 38092 | Rating: 0 | rate:
| |
Dumb Question time: One workunit is being used, and multiple results are generated for each workunit until it reaches a quorum. Right now the way it works is one copy of the workunit is initially issued, when it comes back, we check to see if it needs validation. If it does we send out 2 more copies (to try and get a quorum of 3). If those 2 come back and there is no quorum, we send out a 4th copy, then a 5th, etc until we have a quorum of 3.
This doesn't solve the problem because we need the accuracy specified. We need to find out which cards are sending back incorrect results and update the applications accordingly.
That's what we're trying to figure out. I just updated the validator to append information about the fitness returned from your results to the std_err field -- so you can see it when you look at a task. Hopefully this will help.
Not quite sure what this issue is... seems maybe BOINC client related?
This is because we're still not validating EVERY workunit. We validate every workunit that will improve the searches we're running (if there's a better fitness found than what we currently know about). If the fitness isn't going to improve the search, we're still validating those workunits 50% of the time. This is so we can get these accuracy issues worked out and so people can't scam the server for credit using single precision GPU applications or other things. ____________ | |
| ID: 38093 | Rating: 0 | rate:
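The issuing scheme described at the start of the post above (one copy first, two more if validation is needed, then one extra copy at a time) can be sketched as a small decision function. The function name and the `max_total` default of 9 (taken from the 3,9,6 limits mentioned earlier in the thread) are assumptions for illustration.

```python
def copies_to_send(issued, returned, needs_validation, has_quorum,
                   min_quorum=3, max_total=9):
    """Return how many new copies of a workunit to send out right now."""
    if issued == 0:
        return 1                      # every workunit starts with a single copy
    if returned < issued:
        return 0                      # still waiting on outstanding copies
    if not needs_validation or has_quorum:
        return 0                      # nothing more to do for this workunit
    if issued >= max_total:
        return 0                      # limit reached; no further copies
    if issued < min_quorum:
        return min_quorum - issued    # top up to a full quorum (2 more after the first)
    return 1                          # afterwards, one extra copy at a time


# Walk through the sequence from the post: 1 copy, then 2 more, then a 4th.
print(copies_to_send(0, 0, False, False))
print(copies_to_send(1, 1, True, False))
print(copies_to_send(3, 3, True, False))
```

Sending extra copies one at a time after the first three keeps duplicate work low when the disagreement is caused by a single bad host.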
| |
Yeah I was passed out. But I found part of the problem. The validator was actually trying to get a quorum of 4. I was looking for matches == getMinQuorum(), and since I wasn't comparing a result to itself, what I actually needed was matches == getMinQuorum()-1. Debugging at 4am is bad news, lol. Was pretty obvious with a good night's sleep. ____________ | |
| ID: 38095 | Rating: 0 | rate:
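The off-by-one described in the post above is easy to reproduce. The sketch below is in Python rather than the Java the validator is written in, and the helper names are hypothetical; only the `getMinQuorum()` comparison comes from the post. One deliberate difference: the fixed check uses `>=` instead of `==`, so that a fourth agreeing result does not break an existing quorum.

```python
def count_matches(index, fitnesses, tolerance=10e-11):
    """Count the *other* results that agree with fitnesses[index]."""
    return sum(1 for j, other in enumerate(fitnesses)
               if j != index and abs(fitnesses[index] - other) <= tolerance)


def has_quorum_buggy(index, fitnesses, min_quorum=3):
    # The result itself is not counted, so this silently demands
    # min_quorum + 1 agreeing results in total.
    return count_matches(index, fitnesses) == min_quorum


def has_quorum_fixed(index, fitnesses, min_quorum=3):
    # min_quorum results in total means min_quorum - 1 matches besides this one.
    return count_matches(index, fitnesses) >= min_quorum - 1


three_agreeing = [-3.0, -3.0, -3.0]
print(has_quorum_buggy(0, three_agreeing), has_quorum_fixed(0, three_agreeing))
```

With three identical results the buggy check reports no quorum and only succeeds once a fourth agreeing result arrives, which is consistent with workunits running into their total-results limit.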
| |
|
Well, here's the first "magic number" failure I have in my records: | |
| ID: 38096 | Rating: 0 | rate:
| |
82% valid tasks, is not going to work in the long run, obviously. But I'll hang around for the shakedown. I'm not using an optimized app, I'm using what is given to me by the project. I have NEVER before had invalid results, after the "upgrade" I was getting many. ____________ | |
| ID: 38098 | Rating: 0 | rate:
| |
|
Since I don't know if purges are running real quick, and I don't know what the major amount of noise is in this thread since I last read it, I'm going ahead and posting this. It will likely be formatted badly, and may already be covered by the numerous postings, but I just wanted to state that it's quite unfair to me to have an app that is known to be working fine and spend 4.5 hours on a task for zip, zap, zero... Oh, and in case the 4.5 hours didn't tell you which system is mine, it's the non-GPU system, the first one in the quorum... | |
| ID: 38100 | Rating: 0 | rate:
| |
Since I don't know if purges are running real quick, and I don't know what the major amount of noise is in this thread since I last read it, I'm going ahead and posting this. It will likely be formatted badly, and may already be covered by the numerous postings, but I just wanted to state that it's quite unfair to me to have an app that is known to be working fine and spend 4.5 hours on a task for zip, zap, zero... Oh, and in case the 4.5 hours didn't tell you which system is mine, it's the non-GPU system, the first one in the quorum... This was one of the older WUs sent out with bad values for max error/total/success:
That issue shouldn't happen anymore. I've also loosened up the validation a little bit which may help some workunits not being flagged invalid. If we can't figure out a good solution to the 48xx vs 58xx ATI GPUs issue, I'll probably lower the validation to having fitness within 10e-10 (or 10e-9) to see if that helps. The new application will be 10e-11 however. ____________ | |
| ID: 38103 | Rating: 0 | rate:
| |
|
On another note, does anyone know if the 48xx or the 58xx ATI GPU is the one validating correctly vs the stock application? | |
| ID: 38104 | Rating: 0 | rate:
| |
Ditto, I stopped using the opti apps (except for the CPU) several weeks ago.. I am getting hundreds of invalids.. none of my cards are overclocked everything stock. So, how do you intend to correct this? If 1 or 2 hosts in the quorum are using stock apps but are returning invalid results, could there be something wrong with the stock app and not the validation? ____________ . | |
| ID: 38105 | Rating: 0 | rate:
| |
On another note, does anyone know if the 48xx or the 58xx ATI GPU is the one validating correctly vs the stock application? Here is a host that I have switched from stock to opti and back to stock.. both apps seem to be having problems. 5870 - single card, no overclock. http://milkyway.cs.rpi.edu/milkyway/show_host_detail.php?hostid=47682 ____________ . | |
| ID: 38107 | Rating: 0 | rate:
| |
On another note, does anyone know if the 48xx or the 58xx ATI GPU is the one validating correctly vs the stock application? What I'm trying to figure out is if say a stock application result and 2 48xx GPU results come back, do they validate to a quorum? (that would mean the issue is with the 58xx GPU application). Otherwise, if a stock application result and 2 58xx GPU results make a quorum, that would mean the 48xx GPU application is the problem. ____________ | |
| ID: 38108 | Rating: 0 | rate:
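The test proposed in the post above amounts to checking each GPU family against a stock result. A minimal sketch, with a hypothetical function name and invented fitness values (the thread does not list actual fitness numbers at this point):

```python
def agrees_with_stock(stock, results_by_hardware, tolerance=10e-11):
    """Map each hardware label to True if all of its results match the stock fitness."""
    return {label: all(abs(fitness - stock) <= tolerance for fitness in values)
            for label, values in results_by_hardware.items()}


# Hypothetical numbers: one family matches the stock result, the other is off by ~9e-8.
stock_fitness = -3.0
observed = {
    "48xx": [-3.0, -3.0 + 1e-12],
    "58xx": [-3.0 + 9e-8, -3.0 + 9e-8],
}
print(agrees_with_stock(stock_fitness, observed))
```

Whichever family fails this comparison is the one whose application (or driver stack) needs attention, under the assumption that the stock result is the trusted reference.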
| |
|
Seems to be a lot of wasted computer power. | |
| ID: 38113 | Rating: 0 | rate:
| |
Seems to be a lot of wasted computer power. It won't be as bad once we get the GPU issue sorted out and the hosts' error rates updated in the database. We'll be moving to a quorum of 2 and ~10% validation for results that don't improve our searches (unless that host is known to be a repeat offender when it comes to errors), which will really be minimal duplicate work. Right now I'm just trying to flush out bad clients that are running single precision GPU applications or scripts. ____________ | |
| ID: 38114 | Rating: 0 | rate:
| |
What I'm trying to figure out is if say a stock application result and 2 48xx GPU results come back, do they validate to a quorum? (that would mean the issue is with the 58xx GPU application). Not all GPUs are the same brand or architecture class but neither are all CPUs. So what is considered a stock application result, an unoptimised CPU application running on a certain model Intel CPU or an unoptimised CPU application running on a certain model AMD CPU? Or does the CPU architecture make no difference to the result? I am just trying to make sure that the term "stock application result" is understood in the same way by all reading or contributing to the thread. Many believe that the ATI GPU application automatically sent by the server is the "stock" application and the one that is manually downloaded and installed by contributors is the "optimised" application, but they are often both the same application. In this sense all GPU applications are "optimised" and only the original CPU application automatically downloaded could be considered the stock application. If unoptimised application CPU results are in between the results of 48xx and 58xx, then is it possible that they would not validate with either GPU class or with both? Hopefully we will find out soon, if there was a test application available I still couldn't work it out myself unless I also had a test validator. | |
| ID: 38117 | Rating: 0 | rate:
| |
What I'm trying to figure out is if say a stock application result and 2 48xx GPU results come back, do they validate to a quorum? (that would mean the issue is with the 58xx GPU application). As far as I know, all the applications we provide (the stock applications) provide results within 10e-13 of each other, regardless of them being on a CPU or GPU. ____________ | |
| ID: 38119 | Rating: 0 | rate:
| |
As far as I know, all the applications we provide (the stock applications) provide results within 10e-13 of each other, regardless of them being on a CPU or GPU. A while back you said you only needed to the 8th place and would like to get to the 10th. That is what I thought all of these applications were based off of. ____________ Doesn't expecting the unexpected make the unexpected the expected? If it makes sense, DON'T do it. | |
| ID: 38120 | Rating: 0 | rate:
| |
As far as I know, all the applications we provide (the stock applications) provide results within 10e-13 of each other, regardless of them being on a CPU or GPU. As far as I remember, we've always wanted 10+ digits of precision. If we were content with 8 we could be using single precision applications. ____________ | |
| ID: 38124 | Rating: 0 | rate:
| |
|
Double precision is accurate to the 12th decimal. What about these "completed, validation inconclusive" ones? Are they waiting for the other quorums to be validated? | |
| ID: 38126 | Rating: 0 | rate:
| |
As far as I know, all the applications we provide (the stock applications) provide results within 10e-13 of each other, regardless of them being on a CPU or GPU. Thanks for the clarification of what stock means. I thought the GPU applications were compared to a CPU for accuracy when they were developed, not compared to other GPU results. | |
| ID: 38127 | Rating: 0 | rate:
| |
As far as I know, all the applications we provide (the stock applications) provide results within 10e-13 of each other, regardless of them being on a CPU or GPU. Could have been talking about single precision apps. ____________ Doesn't expecting the unexpected make the unexpected the expected? If it makes sense, DON'T do it. | |
| ID: 38128 | Rating: 0 | rate:
| |
|
Two comparisons which include unoptimised CPU application in the quorum. | |
| ID: 38140 | Rating: 0 | rate:
| |
|
I'm wondering about the "new" observation that different architectures lead to different results even with the same application. I've been crunching CPDN for about five years and have hardly seen two identical or nearly identical results even with the same application (because the applications are closed). Maybe the accuracy goal is too challenging. | |
| ID: 38142 | Rating: 0 | rate:
| |
|
I've posted some results including the optimized CPU and CUDA application in the other thread. Only the HD5800 series GPUs deviate substantially. All others are well within the bounds set by the project. | |
| ID: 38144 | Rating: 0 | rate:
| |
I've posted some results including the optimized CPU and CUDA application in the other thread. Only the HD5800 series GPUs deviate substantially. All others are well within the bounds set by the project. Yeah, I've modified the validator on my end to get more information about what's going on, and the 5800s are the ones giving results that are off (by about 9e-8). 4800s validate vs. stock and CUDA just fine. ____________ | |
| ID: 38145 | Rating: 0 | rate:
| |
I'm wondering about the "new" observation that different architectures lead to different results even with the same application. I've been crunching CPDN for about five years and have hardly seen two identical or nearly identical results even with the same application (because the applications are closed). Maybe the accuracy goal is too challenging. As shown in the other thread, HD3800/4700/4800 GPUs return the exact same fitness value as CPUs with my versions. So at least here it is possible. I already told Anthony and Travis some time ago that I think the CUDA version could probably return the same values too. The behaviour of the HD58x0 GPUs is definitely peculiar and probably some kind of mess-up somewhere that simply needs to be found and fixed. | |
| ID: 38146 | Rating: 0 | rate:
| |
|
Ok, well I just set all my 5850s to NNW. I have been using the optimized .20b application.... Let us know when it is safe to return :) | |
| ID: 38149 | Rating: 0 | rate:
| |
HD3800/4700/4800 GPUs return the exact same fitness value as CPUs with my versions. So at least here it is possible. I've told Anthony and Travis already some time ago, that I think the CUDA version could probably return the same values too. I'm only a CPU cruncher on MW, but my accurate result was marked as an error because four 58x0s overruled it, as can be seen here: http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=90277453 Will I get my credits back? I have a valid result together with a 4800, my CPU and a CUDA app, where the 5800 was marked as invalid (http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=90148812). This supports the statement of Cluster Physik. | |
| ID: 38151 | Rating: 0 | rate:
| |
|
Is it possible for you to compile a test case for those of us who have both 48xx and 58xx cards so we can run them and see what is really doing the good work. | |
| ID: 38155 | Rating: 0 | rate:
| |
|
This WU: | |
| ID: 38157 | Rating: 0 | rate:
| |
Is it possible for you to compile a test case for those of us who have both 48xx and 58xx cards so we can run them and see what is really doing the good work. I'll run some sample WUs standalone on my laptop tonight so I can be sure of the fitness. I'll put out the input files and the expected output when they're done. ____________ | |
| ID: 38158 | Rating: 0 | rate:
| |
|
Am running down MW wus until problem solved. | |
| ID: 38169 | Rating: 0 | rate:
| |
Am running down MW wus until problem solved. Well, unless the 58x0 series is more accurate than CPUs (which I doubt), they're the culprit. From an email Anthony just sent me:
(this is a 2 stream workunit with sgr coordinates)
(uses hardcoded values in atSurveyGeometry.c)
-2.558875331749281 v0.19 CPU application (SSE3)
-2.558875331749119 v0.20 apps (ati 48xx)
-2.558875331749284 v0.18 optimized
-2.558875331749081 nvidia on boinc
-2.558875355118770 (not v0.20) ati on boinc 58xx
(use computed values in atSurveyGeometry.c)
-2.558875329826787 cpu (from repository)
-2.558875329826697 nvidia (old version, circa oct 2009)
-2.558875329826689 nvidia (new unreleased version)
The 58x0 series just isn't matching up to anything we have. ____________ | |
| ID: 38173 | Rating: 0 | rate:
| |
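The fitness values quoted in the email above can be checked against a tolerance with a few lines of Python. This is only a sketch under stated assumptions: each value is paired with the label that follows it in the post, the v0.19 CPU result is taken as the reference, and the tolerance of `1e-10` is an assumed reading of the "10e-11" validator bound. The function name `within_tolerance` is hypothetical and not part of the project's validator.

```python
# Assumption: the v0.19 CPU (SSE3) result is used as the reference value.
reference = -2.558875331749281

# Assumption: each value belongs to the label that follows it in the post.
results = {
    "v0.20 apps (ati 48xx)": -2.558875331749119,
    "v0.18 optimized": -2.558875331749284,
    "nvidia on boinc": -2.558875331749081,
    "ati on boinc 58xx": -2.558875355118770,
}

# Assumption: "10e-11" from the announcement is read literally as 1e-10.
tolerance = 1e-10


def within_tolerance(a: float, b: float, tol: float) -> bool:
    """Return True if two fitness values differ by no more than tol."""
    return abs(a - b) <= tol


# Map each application label to whether it agrees with the reference.
report = {
    name: within_tolerance(value, reference, tolerance)
    for name, value in results.items()
}

for name, value in results.items():
    diff = abs(value - reference)
    status = "matches" if report[name] else "DEVIATES"
    print(f"{name:24s} diff = {diff:.3e}  {status}")
```

With these inputs, only the 58xx result falls outside the assumed bound, differing from the reference by roughly 2e-8, while the others agree to about 1e-13.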
|
<removed> | |
| ID: 38176 | Rating: 0 | rate:
| |
|
A few questions. | |
| ID: 38291 | Rating: 0 | rate:
| |
A few questions. I think this was maybe a more recent change? At any rate, the results we've been getting for the searches have always been validated. The issue just didn't show up as much because we were not validating the vast majority of the workunits; we were only validating the ones that improved the searches we were doing. So while they had the error, it didn't affect our results very much at all. The reason it's been a big deal lately is that, in order to fix scripting and single precision app issues, we started validating most workunits (even those that didn't improve our searches). So before we were only validating 2-5% of WUs; now we're validating 50-75%.
Right now my focus is on trying to get the server running stably again and upgrading to the new application. I'm not sure if I'm going to have time to go through the database manually and fix everyone's lost credit. Most of these workunits have also been purged from the database by now, so there's really no good way to update and grant lost credit. I think it's just something everyone is going to have to live with, and I apologize for that.
Well the real issue here was that we went from doing nearly no validation (we were only validating a minority of results which actually improved our search populations), to doing a lot more validation which made the problem really apparent -- so I guess the swap was a good thing :) On our end, we don't really need this extra validation because results which don't improve our search populations aren't particularly important, other than to weed out bad applications (which in this case we were unlucky enough to have one). But at any rate, I think with the more strict validation we have in place now, this kind of thing shouldn't happen again.
Glad after all of this we aren't totally hated here :) ____________ | |
| ID: 38325 | Rating: 0 | rate:
| |
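The validation policy described in this thread (always validate results that improve a search, validate the rest only some of the time) could be sketched as follows. This is a hypothetical illustration, not the project's validator or BOINC's adaptive validation code; the function name `should_validate` and its parameters are assumptions made for the example.

```python
import random


def should_validate(improves_search: bool, validation_rate: float,
                    rng: random.Random) -> bool:
    """Decide whether a result is sent through quorum validation.

    Results that improve a search are always validated; all other
    results are validated with probability validation_rate.
    """
    if improves_search:
        return True
    return rng.random() < validation_rate


# Seeded generator so the example is repeatable.
rng = random.Random(0)

# Estimate the fraction of non-improving results validated at a 50% rate.
trials = 10_000
sampled = sum(should_validate(False, 0.5, rng) for _ in range(trials))
print(f"validated fraction: {sampled / trials:.3f}")
```

An adaptive scheme would replace the fixed `0.5` with a per-host rate derived from that host's error history, which is outside the scope of this sketch.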
But at any rate, I think with the more strict validation we have in place now, this kind of thing shouldn't happen again. For a possible counter-example, check Workunit 90623954. Two anonymous platforms sporting versions 0.20b and 0.22 out-quorumed an HD5870 running version 0.23. All of the results were from ATI Cypress boards (HD5870 and HD5850 apparently). Shouldn't 5xxx-series GPU results from applications prior to 0.23 be automatically discarded? | |
| ID: 38336 | Rating: 0 | rate:
| |
They should... but that takes a couple of extra database queries per workunit, and the server is crashing enough as it is. I had the check in there for a while and the server couldn't keep up with it. ____________ | |
| ID: 38341 | Rating: 0 | rate:
| |
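The check being discussed (discarding 5xxx-series results from applications older than 0.23) might look roughly like the sketch below. Everything here is hypothetical: the function names, the way the GPU model and application version are passed in as strings, and the regular expressions are assumptions for illustration. The real check needs extra database queries per workunit, which this sketch ignores.

```python
import re

# Assumption: 0.23 is the first version that is valid on 5xxx-series GPUs.
MIN_5XXX_VERSION = (0, 23)


def parse_version(text: str) -> tuple[int, int]:
    """Parse a version such as '0.20b' or '0.23' into (major, minor)."""
    match = re.match(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return int(match.group(1)), int(match.group(2))


def accept_result(gpu_model: str, app_version: str) -> bool:
    """Reject results from HD 5xxx GPUs running an application before 0.23."""
    is_5xxx = re.search(r"HD\s?5\d{3}", gpu_model) is not None
    if is_5xxx and parse_version(app_version) < MIN_5XXX_VERSION:
        return False
    return True


for model, version in [("HD5870", "0.20b"), ("HD5850", "0.22"),
                       ("HD5870", "0.23"), ("HD4870", "0.20b")]:
    print(model, version, "accepted" if accept_result(model, version) else "discarded")
```

Comparing versions as integer tuples avoids the pitfalls of comparing the raw strings, and a letter suffix such as the "b" in 0.20b is simply ignored.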
Two anonymous platforms sporting versions 0.20b and 0.22 out-quorumed an HD5870 running version 0.23. There are probably a significant number of people running AP and not paying close attention to the boards. Anybody noticing cases of 5800 series cards still running the wrong app should send a PM to the owner (if possible) since that will give them an email as well. Hopefully they are monitoring their email a bit more closely. ____________ Cheers, Gary. | |
| ID: 38344 | Rating: 0 | rate:
| |
Anybody noticing cases of 5800 series cards still running the wrong app should send a PM to the owner (if possible) since that will give them an email as well. I personally think it would be more appropriate if RPI were sending out these e-mails but... ...I've notified 4 other owners as requested. | |
| ID: 38353 | Rating: 0 | rate:
| |
Anybody noticing cases of 5800 series cards still running the wrong app should send a PM to the owner (if possible) since that will give them an email as well. I personally think it would be more appropriate if RPI were sending out these e-mails but... Doing a mass email to everyone should be easy. Only the ones who don't want emails from the project would be left out, and they could then get individual emails. ____________ Doesn't expecting the unexpected make the unexpected the expected? If it makes sense, DON'T do it. | |
| ID: 38356 | Rating: 0 | rate:
| |
Until that problem is sorted out I will run Folding@home instead. Hope you will have this fixed soon. | |
| ID: 38362 | Rating: 0 | rate:
| |
|
I have unfortunately had to swap 7 machines running various 3850, 4850, and 4870 cards on to another project as each one was producing 90% computation errors or work not validated. I'll check back in a few days and see if this is still continuing. | |
| ID: 38627 | Rating: 0 | rate:
| |
I have unfortunately had to swap 7 machines running various 3850, 4850, and 4870 cards on to another project as each one was producing 90% computation errors or work not validated. I'll check back in a few days and see if this is still continuing. Did your machines upgrade to the correct application (0.23) and are they running the right brook32/64.dll? If they're giving that many errors it's probably because they're using the wrong application. ____________ | |
| ID: 38643 | Rating: 0 | rate:
| |
|
hi Travis | |
| ID: 38646 | Rating: 0 | rate:
| |
hi Travis If you're running windows, for CPU the highest app version is still 0.19. So if it's running 0.19 on the CPU that's not a problem. ____________ | |
| ID: 38648 | Rating: 0 | rate:
| |
|
yes i run Win7 and XP | |
| ID: 38649 | Rating: 0 | rate:
| |
Did your machines upgrade to the correct application (0.23) and are they running the right brook32/64.dll? Thanks for the response. I've had a bit of a change round and they seem OK now, so they are back crunching for MW after many days of lost work! See my post here. http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=1679&nowrap=true#38670 ____________ Don't drink water, that's the stuff that rusts pipes | |
| ID: 38671 | Rating: 0 | rate:
| |
hi Travis I think I know what he's talking about. Yesterday, for some reason, I had the same thing happen here on two machines. Although they run on anonymous platform with only the GPU app specified, the server sent some tasks assigned to the CPU app (0.19), which was neither selected in the prefs (Don't use CPU) nor specified in the app_info.xml. That shouldn't have happened at all. The BOINC client started all of those tasks at once using the GPU app (I checked that in the task manager) and labeled them as CPU app in the BOINC manager. So my 8 core machine had 2 GPU apps running (as specified in the app_info.xml) and another 8 active tasks showing up as CPU tasks in the manager, although it actually used the GPU app for them. The result was that the V8 slowed down quite a bit and the other one locked up completely. EDIT Same thing has happened on Collatz -> http://boinc.thesonntags.com/collatz/forum_thread.php?id=370 So I think that it's a server-side bug, where it doesn't honor prefs or the apps specified in an app_info.xml. ____________ Join BOINC United now! | |
| ID: 38677 | Rating: 0 | rate:
| |
|
My second WU to be marked invalid: Workunit 91877832 | |
| ID: 38716 | Rating: 0 | rate:
| |
|
When is this crap going to work again? It can't be right that 50% of it is for nothing. | |
| ID: 38764 | Rating: 0 | rate:
| |
When is this crap going to work again? It can't be right that 50% of it is for nothing. The "crap" works the way it should. It is very likely down to your crappy overclocking... your crappy card apparently can't handle it and only produces wrong results. That's why you aren't getting any crappy credits anymore... What a load of crap... ____________ Join BOINC United now! | |
| ID: 38767 | Rating: 0 | rate:
| |
|
Overclocked or not, the result is nothing nice to look at. | |
| ID: 38774 | Rating: 0 | rate:
| |
|
Please, please, | |
| ID: 38775 | Rating: 0 | rate:
| |
|
If my card works on all the other projects and only fails here, then the problem isn't my card. | |
| ID: 38789 | Rating: 0 | rate:
| |
|
You can't say that so generally either. Maybe other projects simply tolerate larger deviations in the results. A lower error tolerance is not a "fault" of the project, though. :-P | |
| ID: 38791 | Rating: 0 | rate:
| |
|
It's not as if my card has never worked here. It used to work 100% of the time. | |
| ID: 38792 | Rating: 0 | rate:
| |
|
A few people have written that they had to reduce their overclock a bit with the new ATI app. | |
| ID: 38799 | Rating: 0 | rate:
| |
|
Maybe it would be a good idea to send WUs for validation to different types of systems if possible (for example, no more than 1 to a 58X0), or no more than one to systems that have had many validation problems recently (the 58X0 would currently qualify for that, but in future that could of course change). | |
| ID: 38810 | Rating: 0 | rate:
| |
|
But if the error tolerance is so low that you can't even move the mouse anymore, that's maybe a bit excessive. Sometimes you actually have things to do on the PC. | |
| ID: 38811 | Rating: 0 | rate:
| |
But if the error tolerance is so low that you can't even move the mouse anymore, that's maybe a bit excessive. Sometimes you actually have things to do on the PC. If your cards already spit out faulty results just because you move the mouse, I would be seriously worried. For me it makes no difference at all. I also run MW on my regular work PC (I put a 3870X2 in it); I can do whatever I want there and it runs without invalid results. It would probably help if you gave some details about your machine (GPU, clocks, OS, driver, MW version). PS: If possible, posts in English are recommended, by the way. | |
| ID: 38829 | Rating: 0 | rate:
| |
|
So, am I the only one running nVidia that's still having troubles? | |
| ID: 38835 | Rating: 0 | rate:
| |
I uploaded 24 WUs a little bit ago, only to have 9 of them go Validation inconclusive. For some obscure reason WUs are listed as "Completed, validation inconclusive" instead of "Completed, waiting for validation" even if there is no other deviating result that would make the validation inconclusive. Currently I see 8 of 10 of your WUs in the "validation inconclusive" state although there are no deviating results. One WU is labeled as "Completed, waiting for validation" with one task "Completed, waiting for validation" and the other task "Completed, validation inconclusive". The validator is still buggy in this regard. | |
| ID: 38836 | Rating: 0 | rate:
| |
|
I posted the following message earlier, but I may have stuck it in the wrong thread: | |
| ID: 38841 | Rating: 0 | rate:
| |
I uploaded 24 WUs a little bit ago, only to have 9 of them go Validation inconclusive. "Completed, waiting for validation" = waiting on another task "Completed, validation inconclusive" = wu wasn't quite on target, and another was sent out and waiting for it to come back. ____________ Doesn't expecting the unexpected make the unexpected the expected? If it makes sense, DON'T do it. | |
| ID: 38842 | Rating: 0 | rate:
| |
I uploaded 24 WUs a little bit ago, only to have 9 of them go Validation inconclusive. The entry "Completed, validation inconclusive" should only be set if there are deviating results and not if the result is the only one sent back. If the other WU is unsent or no result is reported back the entry should be set to "Completed, waiting for validation". Just my two cents... | |
| ID: 38844 | Rating: 0 | rate:
| |
I uploaded 24 WUs a little bit ago, only to have 9 of them go Validation inconclusive. Offhand, except for Workunit 90594183, all of the pending WUs you submitted appear to be in the expected state based on what my own units do. All results require a confirmation from another client (quorum=2) and your work units are showing 'Validation inconclusive' because none of the other results have come back yet. In fact, some of those confirmation tasks haven't even been sent out yet. You should post a complaint about 90594183 in the "Number Crunching: Waiting for Validation..." thread (http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=1678) to let Travis know there is a problem. A lot of us had problems with validation around the same time. | |
| ID: 38848 | Rating: 0 | rate:
| |
I uploaded 24 WUs a little bit ago, only to have 9 of them go Validation inconclusive. Sadly this is just how the BOINC server relays messages. If your WU gets set to Completed, validation inconclusive, that just means the server has sent out more results and is waiting to complete validation. Completed, waiting for validation means that the result hasn't gone through the validator at all yet. ____________ | |
| ID: 38855 | Rating: 0 | rate:
| |
|
OOPS it would seem that the server is out of work.. | |
| ID: 38857 | Rating: 0 | rate:
| |
OOPS it would seem that the server is out of work.. Generating work right now (see the other news post). Things should be up and running... ____________ | |
| ID: 38858 | Rating: 0 | rate:
| |