Is Anyone Addressing This Constant Computation Error Problem?
Author | Message |
---|---|
Send message Joined: 24 Jul 10 Posts: 21 Credit: 465,205 RAC: 0 |
Is anyone addressing the WUs that constantly fail due to computation error? See the Bad WU thread also; no one seems to be at home in that thread lately. At least then we would know when it's safe to resume work fetching. The reason I ask is that I see no mention of it in the News. Peter Toronto, Canada |
Send message Joined: 15 Jul 08 Posts: 383 Credit: 729,293,740 RAC: 0 |
Mercifully the malformed WUs seem to have mostly burned their way through the system. Haven't seen any of the really bad de_separation_17_3s_fix_5 WUs for a couple of days. Had one de_separation_10_3s_free_2 WU yesterday and 1 so far today. The latter aren't as troublesome because they didn't have a tendency to get stuck and run for hours like the de_separation_17_3s_fix_5 WUs. Hopefully in the future the admins will cancel malformed WUs and not let them simply run through the system creating havoc. Please? |
Send message Joined: 24 Jul 10 Posts: 21 Credit: 465,205 RAC: 0 |
Mercifully the malformed WUs seem to have mostly burned their way through the system. Haven't seen any of the really bad de_separation_17_3s_fix_5 WUs for a couple of days. Had one de_separation_10_3s_free_2 WU yesterday and 1 so far today. The latter aren't as troublesome because they didn't have a tendency to get stuck and run for hours like the de_separation_17_3s_fix_5 WUs. Hopefully in the future the admins will cancel malformed WUs and not let them simply run through the system creating havoc. Please? So you reckon it's safe to turn on work fetching? Peter Toronto, Canada |
Send message Joined: 15 Jul 08 Posts: 383 Credit: 729,293,740 RAC: 0 |
So you reckon it's safe to turn on work fetching? For me the answer is that I never turned work fetch off. I did try to catch the de_separation_17_3s_fix_5 WUs and abort them when they appeared so that they wouldn't put a GPU out of commission for hours. I think the project should institute a policy that no new WU runs be created until after at least the 2nd cup of coffee :) |
Send message Joined: 24 Jul 10 Posts: 21 Credit: 465,205 RAC: 0 |
So you reckon it's safe to turn on work fetching? Well for now it seems the supply has dried up anyway. When they failed what got me were the dozens of errors Boinc/Windows threw up. Fine if one was sitting in front of the machine when they occurred but a bit off-putting when one had been away from the machine for a few hours. Peter Toronto, Canada |
Send message Joined: 18 Nov 08 Posts: 291 Credit: 2,461,693,501 RAC: 0 |
Mercifully the malformed WUs seem to have mostly burned their way through the system. Haven't seen any of the really bad de_separation_17_3s_fix_5 WUs for a couple of days. Had one de_separation_10_3s_free_2 WU yesterday and 1 so far today. The latter aren't as troublesome because they didn't have a tendency to get stuck and run for hours like the de_separation_17_3s_fix_5 WUs. Hopefully in the future the admins will cancel malformed WUs and not let them simply run through the system creating havoc. Please? The next time this happens can you terminate BOINC and restart? I have had several tasks here (MilkyWay) and at SETI@home complete successfully immediately after restarting BOINC. It is as if they were done but were unable to signal that they had completed. |
Send message Joined: 15 Jul 08 Posts: 383 Credit: 729,293,740 RAC: 0 |
Mercifully the malformed WUs seem to have mostly burned their way through the system. Haven't seen any of the really bad de_separation_17_3s_fix_5 WUs for a couple of days. Had one de_separation_10_3s_free_2 WU yesterday and 1 so far today. The latter aren't as troublesome because they didn't have a tendency to get stuck and run for hours like the de_separation_17_3s_fix_5 WUs. Hopefully in the future the admins will cancel malformed WUs and not let them simply run through the system creating havoc. Please? No, because both of these WU runs were improperly formatted. They all fail. The sooner they're dead the better. The de_separation_17_3s_fix_5 were the worst because sometimes they refused to terminate at the normal time. They all fail nevertheless, no matter what is done. |
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
I'm pretty sure Matt N. has taken them down. We're looking into what's causing the problem. |
Send message Joined: 24 Jul 10 Posts: 21 Credit: 465,205 RAC: 0 |
The computation error idiocy continues. I'm about to withdraw my meagre support for this project. Peter Toronto, Canada |
Send message Joined: 1 Sep 08 Posts: 204 Credit: 219,354,537 RAC: 0 |
Are you talking about faulty WUs (not seeing any on my hosts) or are you talking about hosts which are still using the old optimized apps, which generate nothing but errors with the new setup? MrS Scanning for our furry friends since Jan 2002 |
Send message Joined: 24 Jul 10 Posts: 21 Credit: 465,205 RAC: 0 |
I have no idea what the difference is, sorry. All I know is I keep getting work units that fail with "computation error" and I'm not getting that with any other projects. Peter Toronto, Canada |
Send message Joined: 28 Jan 10 Posts: 7 Credit: 23,771,725 RAC: 85 |
Computation error since the app update in April. I've tried everything I could find on this forum: updated drivers, restarted the project, detached and reattached. Config: BOINC 6.10.58, Radeon HD3850, Catalyst 11.3, Windows XP SP3 32-bit, Athlon XP 2600+. MilkyWay was running well with the optimized app before the app update. I have been looking for a solution for a month and have found nothing! What is the problem? Does anyone have the same problem, or THE solution, please? Error code: Unhandled Exception Detected... - Unhandled Exception Record - Reason: Illegal Instruction (0xc000001d) at address 0x004051F9 |
Send message Joined: 1 Sep 08 Posts: 204 Credit: 219,354,537 RAC: 0 |
@Skwi: it seems the current GPU app uses SSE2 instructions somewhere, probably in some 3rd party libraries. Your CPU does not support these. The old app used a different method. Matt is aware of the issue and a fix is planned for the next release, but that's probably not going to happen tomorrow. @Ex_Brit: I'm asking because I am not seeing, and have not seen, any project-related task failures on my machines. Difficult to tell, though, as the results disappear so quickly. Currently you don't have any results shown for your host (I wouldn't want to run "all error" WUs either), so I can't take a look at the error. The nVidia drivers 270.6x are reported to behave quite badly, so you may want to go straight to 275.xx. Can't promise it'll help, though. And you might consider running other projects on your GTX295. They're still powerful, but can only use 1/8th of their power here (double precision), which is a lot less than ATI cards. Your cards still rock at GPU-Grid, though :) MrS Scanning for our furry friends since Jan 2002 |
Send message Joined: 24 Jul 10 Posts: 21 Credit: 465,205 RAC: 0 |
I understand, however since I had overheating issues I don't allow any projects to use the GPU any more. I also am not interested in using beta graphics drivers just to suit a project. This project needs to fix the work view problem too, there is no reason why recent work units should vanish from the records so quickly. This problem started with THIS thread so isn't anything new. Peter Toronto, Canada |
Send message Joined: 28 Jan 10 Posts: 7 Credit: 23,771,725 RAC: 85 |
Thank you, ETA, for confirming the origin of the problem. I will wait for the new release. |
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
Computation error since app update in April. The optimized applications are deprecated and won't validate anymore -- there were quite a few news threads about this. We've updated the server code and what we were sending to clients to help reduce server load, so the old optimized applications aren't receiving certain files they need to run. I don't know if anyone has released any newer optimized applications that run with the new versions of the workunits, but you can always use the ones provided by the server. |
Send message Joined: 24 Jul 10 Posts: 21 Credit: 465,205 RAC: 0 |
but you can always use the ones provided by the server. Are you saying it's fixed or what? I wasn't aware we can pick and choose where our work comes from. Peter Toronto, Canada |
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
but you can always use the ones provided by the server. I know Matt A. has been working on the GPU applications pretty constantly. You might want to shoot him a message with the details of your problem, if the applications provided by the server aren't running correctly. |
Send message Joined: 24 Jul 10 Posts: 21 Credit: 465,205 RAC: 0 |
My problem is with the regular WUs, not the GPU ones. As previously stated, I don't use my GPU: it had overheating issues and caused problems with other applications. What gets me is that the issue hasn't even been mentioned in the News and it's been going on for ages. They also need to alter this website interface so that more WUs are kept on view. At present they disappear within hours, so no one can check which past WUs gave the issue without a lot of delving. Peter Toronto, Canada |
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
My problem is with the regular WU's not the GPU ones. I don't use my GPU as previously stated, overheating issues and it caused problems with other applications. Try crunching one or two so I can see the error results. Do they immediately error out, or does it take a while? |
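
For anyone hitting the "Illegal Instruction (0xc000001d)" failure discussed above: the explanation given in the thread is that the current app uses SSE2 instructions, which an Athlon XP does not have. A rough way to check a host is sketched below. This is only an illustration, not anything provided by the project. The helper names `flags_include_sse2` and `cpu_has_sse2` are made up for this example; the Windows branch relies on the Win32 call `IsProcessorFeaturePresent`, and the Linux branch assumes the usual `flags` line in `/proc/cpuinfo`.

```python
import ctypes
import platform

# Win32 feature code for IsProcessorFeaturePresent: SSE2 instructions available.
PF_XMMI64_INSTRUCTIONS_AVAILABLE = 10


def flags_include_sse2(cpuinfo_text):
    """Return True if a 'flags' line in /proc/cpuinfo-style text lists sse2."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only look at the CPU feature list, not at model names or other fields.
        if sep and key.strip() == "flags" and "sse2" in value.split():
            return True
    return False


def cpu_has_sse2():
    """Best-effort check; returns None when the platform is not handled here."""
    system = platform.system()
    if system == "Windows":
        kernel32 = ctypes.windll.kernel32
        return bool(
            kernel32.IsProcessorFeaturePresent(PF_XMMI64_INSTRUCTIONS_AVAILABLE)
        )
    if system == "Linux":
        try:
            with open("/proc/cpuinfo") as cpuinfo:
                return flags_include_sse2(cpuinfo.read())
        except OSError:
            return None
    return None


if __name__ == "__main__":
    print("SSE2 supported:", cpu_has_sse2())
```

If this reports that SSE2 is not supported, the host would be expected to hit the same error with any build that needs SSE2, whatever drivers are installed. A result of `None` just means the sketch does not cover that operating system.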