Broken WUs
Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0
I just had a look at some gs_constrainted_82_2s_4 WUs. The search parameters appear to be messed up, causing invalid WUs.
gs_constrainted_82_2s_4
There may still be issues with the new apps on these new WU types (the new apps were tested, but on different WU types). But with all those bad WUs flying around, it is hard to tell whether there are such issues.
Joined: 4 Oct 08 Posts: 1734 Credit: 64,228,409 RAC: 0
> I just had a look at some gs_constrainted_82_2s_4 WUs. The search parameters appear to be messed up, causing invalid WUs.
Cluster, I think there is more to the invalid WU results than that. I started a thread here on the same subject (I think). But I have swapped back to 0.19, and this is also giving more than 70% invalid results (all CPU work). My 3850 GPU seems to be OK on 0.20.
Go away, I was asleep
Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0
> I think there is more to the invalid WU results than that. I started a thread here on the same subject (I think). But I have swapped back to 0.19, and this is also giving more than 70% invalid results (all CPU work). My 3850 GPU seems to be OK on 0.20.
So apparently it is nothing introduced with 0.20. Some parts of the GPU app were designed to be more fault tolerant than the CPU apps. If some calculations don't make sense because the input parameters are all zero (as in the example above), the GPU app is less likely to produce an infinity in some intermediate step. But the results are crap either way; they just might pass the validator more easily.
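Here is a minimal Python sketch of the point above. It assumes, hypothetically, that the validator compares the fitness of two replicas with a relative tolerance. Under that assumption, a NaN result can never validate. A "fault tolerant" step that replaces a non-finite intermediate with zero, on the other hand, yields matching but meaningless values. The names `results_agree` and `guarded` are illustrative only and are not code from the MilkyWay@home apps or validator.

```python
import math


def results_agree(a: float, b: float, rel_tol: float = 1e-6) -> bool:
    """Hypothetical validator check: two replicas agree if their fitness values are close.

    math.isclose() returns False whenever either argument is NaN,
    so a NaN fitness can never match another result.
    """
    return math.isclose(a, b, rel_tol=rel_tol)


def guarded(x: float) -> float:
    """Hypothetical 'fault tolerant' step: replace a non-finite intermediate with 0.0."""
    return x if math.isfinite(x) else 0.0


inf = float("inf")
nan = inf - inf  # IEEE 754: inf - inf is NaN (Python does not raise here)

print(results_agree(nan, nan))  # False: NaN is not equal to anything, not even itself
print(results_agree(guarded(nan), guarded(nan)))  # True, although 0.0 means nothing here
```

Because the guarded version only hides the NaN, both replicas agree on a value that carries no information. That is exactly the "passes the validator, but the result is crap" situation.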
Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0
I'll be keeping a close eye on the server today to see if anything weird like this happens. The astronomers were trying some new things with the new search parameters, so there might be some kind of error there.
Joined: 6 Mar 09 Posts: 1 Credit: 37,785,089 RAC: 0
I have many invalidated results from both the 0.19 and 0.20 applications. http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=91486&offset=0&show_names=0&state=4
Joined: 16 Oct 08 Posts: 18 Credit: 164,409,593 RAC: 0
Still getting invalid results from 2s WUs.
Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0
I am seeing some invalid results coming in with the 2s WUs. Unfortunately, these aren't reporting which application they're coming from, so that's making the problem a bit hard to track down. I think there might be some kind of math issue in the application (I don't know if this is happening with the stock application, which is part of the problem) that makes it return NaN for the fitness. That in turn corrupts the reported parameters and fitness and prevents the application from reporting its version correctly. Either that, or the server isn't reading the version correctly. I'm pretty sure the parameter sets the server is generating are valid.
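One way to narrow this down on the server side is to flag results whose fitness is not a finite number, or whose application version is missing, before comparing them. The sketch below is hypothetical. The `Result` fields, the version format, and the sample values are made-up assumptions for illustration, not the actual MilkyWay@home result-file layout or validator code.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Result:
    """Hypothetical parsed result file (field names are assumptions)."""
    fitness: float
    parameters: list[float] = field(default_factory=list)
    app_version: str = ""


def sanity_problems(r: Result) -> list[str]:
    """Return the reasons a result looks broken (an empty list means it looks sane)."""
    problems = []
    if not math.isfinite(r.fitness):
        problems.append(f"non-finite fitness: {r.fitness}")
    bad = [i for i, p in enumerate(r.parameters) if not math.isfinite(p)]
    if bad:
        problems.append(f"non-finite parameters at indices {bad}")
    if not r.app_version.strip():
        problems.append("missing application version")
    return problems


# Made-up example values, not real MilkyWay@home results.
ok = Result(fitness=-3.0, parameters=[0.5, 1.2], app_version="0.20")
broken = Result(fitness=float("nan"), parameters=[float("nan"), 1.2], app_version="")
print(sanity_problems(ok))      # []
print(sanity_problems(broken))  # three problems: fitness, parameter 0, version
```

Logging these reasons together with the host ID would show whether the NaN results and the missing versions come from the same machines or app builds.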
Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0
Can anyone give me the input search parameters file for a bad workunit, not what the result/stderr files are reporting? I think there's some kind of weird numerical error with the new bounds of the workunits.
Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0
> Can anyone give me the input search parameters file for a bad workunit, not what the result/stderr files are reporting?
Just look at the very first post in this thread. That is the content of the search parameter file for a failing WU, straight from my computer.
Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0
> I am seeing some invalid results coming in with the 2s WUs. Unfortunately, these aren't reporting which application they're coming from, so that's making the problem a bit hard to track down.
They are possibly coming from all versions. At least, neither the 0.19 nor the new 0.20 apps can do anything meaningful with the parameters of those _2s WUs. And I really doubt the stock app would fare much better with the search parameters from the first post in the thread.
Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0
> I am seeing some invalid results coming in with the 2s WUs. Unfortunately, these aren't reporting which application they're coming from, so that's making the problem a bit hard to track down.
I think I'm seeing the problem now. For some reason, it looks like some of the search data is getting corrupted on the server... maybe a memory leak, I'm not sure.
Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0
> I am seeing some invalid results coming in with the 2s WUs. Unfortunately, these aren't reporting which application they're coming from, so that's making the problem a bit hard to track down.
But it could also point to a problem with the validator reading the results. The peculiar thing I found about the failing WUs is a very long metadata string. By failing WUs I mean those marked as invalid even though the search_parameters look okay, unlike the wrong parameters posted in the first message. The buffer in the app holding this data is large enough to handle it, but that may not be the case for the validator reading the result files. Maybe there is some kind of buffer overflow that messes up the reading of the application version string which follows (you mentioned there is a problem with this, afaik). Here is an example from an 82_2s_6 WU where I've seen this:
ps_constrainted_82_2s_6
"Normal" WUs have very short metadata (something like "i:26, redundancy" rather than a complete parameter set), so this may be a problem.
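A related failure mode can be illustrated without any real validator code. A C reader that calls `fgets(buf, size, f)` stores at most `size - 1` characters. If a metadata line is longer than the buffer, the next `fgets` call returns the rest of that line instead of the following one. The Python sketch below emulates this with `io.StringIO.readline(limit)`. The two-line result layout (`metadata:` followed by `app_version:`) and the function name are made-up assumptions, not the real result-file format or validator. The sketch also shows truncation rather than a true overflow, which in C would be undefined behaviour.

```python
import io


def read_two_fields(text: str, buf_size: int) -> tuple[str, str]:
    """Emulate a reader that calls fgets(buf, buf_size, f) once per expected line.

    fgets stores at most buf_size - 1 characters, so readline(buf_size - 1)
    mimics it: an over-long line gets split across two calls.
    """
    f = io.StringIO(text)
    metadata = f.readline(buf_size - 1).rstrip("\n")
    version = f.readline(buf_size - 1).rstrip("\n")
    return metadata, version


# Hypothetical result file: a long metadata line followed by the version line.
result_text = "metadata: " + "x" * 50 + "\n" + "app_version: 0.20\n"

print(read_two_fields(result_text, 2048)[1])  # app_version: 0.20
print(read_two_fields(result_text, 32)[1])    # tail of the metadata line, not the version
```

With a buffer large enough for the whole line, the version is read correctly. With a buffer that is too small, the "version" is leftover metadata. That would match the symptom of results not reporting their application version.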
Joined: 23 Aug 09 Posts: 1 Credit: 5,489,712 RAC: 0
Limit is 1500 s (25 min). Overclocking and downclocking the CPU show the problem.
Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0
> I am seeing some invalid results coming in with the 2s WUs. Unfortunately, these aren't reporting which application they're coming from, so that's making the problem a bit hard to track down.
That's what a particle swarm WU should look like. The metadata buffer size is 2048, so it should be more than enough to cover that. The shorter metadata comes from genetic search and differential evolution. I'm putting in some code to check whether the server is somehow sending out bad WUs. I already have a lot of checks in there to make sure everything is within bounds, so I'm not quite sure what the issue is yet.
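Here is a minimal sketch of the kind of server-side check described above. It assumes each search parameter has a known [min, max] range. Besides the range test, it also rejects all-zero parameter sets like the one in the first post of this thread. The function name and the bounds are made-up placeholders, not the real MilkyWay@home search bounds or server code.

```python
import math


def check_parameters(params: list[float], bounds: list[tuple[float, float]]) -> list[str]:
    """Return the problems with a generated parameter set (an empty list means OK).

    Hypothetical check: every value must be finite and inside its (lo, hi) range,
    and a set that is entirely zero is treated as corrupted.
    """
    if len(params) != len(bounds):
        return [f"expected {len(bounds)} parameters, got {len(params)}"]
    problems = []
    for i, (p, (lo, hi)) in enumerate(zip(params, bounds)):
        # Check finiteness first so NaN gets a clear message
        # (every comparison involving NaN is False).
        if not math.isfinite(p):
            problems.append(f"parameter {i} is not finite: {p}")
        elif not lo <= p <= hi:
            problems.append(f"parameter {i} = {p} outside [{lo}, {hi}]")
    if params and all(p == 0.0 for p in params):
        problems.append("all parameters are zero")
    return problems


bounds = [(-1.0, 1.0), (0.0, 10.0), (0.0, 10.0)]  # placeholder ranges
print(check_parameters([0.3, 2.5, 7.0], bounds))  # []
print(check_parameters([0.0, 0.0, 0.0], bounds))  # ['all parameters are zero']
print(check_parameters([0.3, 12.0, float("nan")], bounds))
```

Note that an all-zero set can sit inside every individual range and still be garbage. That is why it needs its own test in addition to the per-parameter bounds.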
Joined: 9 Nov 08 Posts: 41 Credit: 92,786,635 RAC: 0
Hi! Still getting errors... http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=94222&offset=0&show_names=0&state=4
A proud member of the Polish National Team
COME VISIT US at Polish National Team FORUM
Joined: 4 Jul 08 Posts: 165 Credit: 364,966 RAC: 0
I've been using the version 0.20 app for about 24 hours now and I haven't seen any invalid WUs yet, but that could change. I will keep an eye on it and let you folks know if anything changes.
Joined: 26 Jan 09 Posts: 589 Credit: 497,834,261 RAC: 0
|
Joined: 6 May 08 Posts: 1 Credit: 16,952,704 RAC: 0
Anyway, I'm still having trouble, also with _6 WUs. See: http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=103258426
Joined: 4 Oct 08 Posts: 1734 Credit: 64,228,409 RAC: 0
Still getting more than 50% of the work on this host (32120) marked as invalid. This host uses only CPUs (a quad), while the old quad (64209) uses a 4850 without producing any invalid results.
Travis, regarding your request in your Home Page post "Work Unit errors Part II": there still seem to be problems, as you can see from my first link in this post.
Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0
@Travis: I think the really strange thing is that the GPU apps now appear to get the 82_2s_6 WUs validated without any problems, while the validation of CPU results still has quite a few issues (0.19 as well as 0.20 are affected). So the changes to the validator did help, but only for the GPU apps, as far as I can see.
I was trying to find a reason for this, and I have to admit I can't find one. I've pulled some random 82_2s_6 WUs from my computer and crunched them on both the GPU (which validates and gives credits) and the CPU (where people can't get them validated). At least on my system, the CPU and the GPU give exactly the same, reasonable-looking result file (apart from the application signature), as promised for 0.20. So I really have no idea what is causing the validation issues.
Can someone save the search_parameters file of an 82_2s_6 WU that fails to validate on their CPU and post it here? I can give instructions on how to run the app offline with those WUs. You could then analyze the result file and the checkpoint file produced on that specific CPU and compare them with the output of a GPU or of CPUs on other systems.
As an example, I've taken the file "de_constrainted_82_2s_6_search_parameters_794609_1253021251":
de_constrainted_82_2s_6
I calculated it on a PhenomX4 with the 0.20 Win64_SSE3 CPU application, which fails to validate on these WU types for a lot of people, as does the 0.19 version. The result was:
de_constrainted_82_2s_6
I calculated the exact same WU with the 0.20 ATI_Win64 app (which actually sent this result to the server, where it validated without problems) and got this:
de_constrainted_82_2s_6
As I said, the results are identical apart from the application string, and I don't see the slightest reason why the CPU result should not be validated.
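The comparison described above (identical output apart from the application string) is easy to automate when checking results from different machines. The sketch below assumes a line-based result file in which the application signature sits on a line starting with a known prefix. The prefix `application:`, the function name, and the sample contents are all hypothetical, because the real file contents are not reproduced in this thread.

```python
import difflib


def diff_ignoring_app(a: str, b: str, app_prefix: str = "application:") -> list[str]:
    """Return unified-diff lines between two result files, skipping the app signature line.

    An empty list means the files match apart from the application string.
    The prefix is a placeholder; adjust it to the real result-file layout.
    """
    def strip_app(text: str) -> list[str]:
        return [ln for ln in text.splitlines() if not ln.startswith(app_prefix)]

    return list(difflib.unified_diff(strip_app(a), strip_app(b),
                                     fromfile="cpu", tofile="gpu", lineterm=""))


# Made-up sample contents, not real MilkyWay@home result files.
cpu = "fitness: -3.0\nparameters: 0.1, 0.2\napplication: cpu build\n"
gpu = "fitness: -3.0\nparameters: 0.1, 0.2\napplication: gpu build\n"
print(diff_ignoring_app(cpu, gpu))  # [] -> identical apart from the application line
```

An exact text diff is deliberately strict. If two hosts differed only in the last printed digit, it would show up here even if a tolerance-based validator would accept it.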
©2024 Astroinformatics Group