Welcome to MilkyWay@home

Is Anyone Addressing This Constant Computation Error Problem?

Message boards : Number crunching : Is Anyone Addressing This Constant Computation Error Problem?

Ex_Brit
Joined: 24 Jul 10
Posts: 21
Credit: 465,205
RAC: 0
Message 48830 - Posted: 17 May 2011, 10:15:51 UTC
Last modified: 17 May 2011, 10:18:09 UTC

Is anyone addressing the WUs that constantly fail with computation errors? See also the Bad WU thread; no one seems to be at home in that thread lately.

At the very least, let us know when it's safe to resume work fetching.

The reason I ask is that I see no mention of it in the News.
Peter
Toronto, Canada

Beyond
Joined: 15 Jul 08
Posts: 383
Credit: 729,293,740
RAC: 0
Message 48831 - Posted: 17 May 2011, 12:55:03 UTC

Mercifully, the malformed WUs seem to have mostly burned their way through the system. I haven't seen any of the really bad de_separation_17_3s_fix_5 WUs for a couple of days. I had one de_separation_10_3s_free_2 WU yesterday and one so far today. The latter aren't as troublesome because they don't tend to get stuck and run for hours like the de_separation_17_3s_fix_5 WUs. Hopefully, in the future, the admins will cancel malformed WUs rather than letting them run through the system and create havoc. Please?

Ex_Brit
Joined: 24 Jul 10
Posts: 21
Credit: 465,205
RAC: 0
Message 48832 - Posted: 17 May 2011, 13:04:39 UTC - in response to Message 48831.  

So you reckon it's safe to turn on work fetching?

Peter
Toronto, Canada

Beyond
Joined: 15 Jul 08
Posts: 383
Credit: 729,293,740
RAC: 0
Message 48833 - Posted: 17 May 2011, 13:23:45 UTC - in response to Message 48832.  

So you reckon it's safe to turn on work fetching?

For me the answer is that I never turned work fetch off. I did try to catch the de_separation_17_3s_fix_5 WUs and abort them when they appeared so that they wouldn't put a GPU out of commission for hours. I think the project should institute a policy that no new WU runs be created until after at least the 2nd cup of coffee :)

Ex_Brit
Joined: 24 Jul 10
Posts: 21
Credit: 465,205
RAC: 0
Message 48834 - Posted: 17 May 2011, 13:30:41 UTC - in response to Message 48833.  

Well, for now it seems the supply has dried up anyway. When they failed, what got me was the dozens of errors BOINC/Windows threw up. That's fine if you're sitting in front of the machine when they occur, but a bit off-putting when you've been away from it for a few hours.
Peter
Toronto, Canada

Joseph Stateson
Joined: 18 Nov 08
Posts: 291
Credit: 2,461,693,501
RAC: 0
Message 48840 - Posted: 17 May 2011, 20:29:01 UTC - in response to Message 48831.  

The next time this happens, can you terminate BOINC and restart it? I have had several tasks here (MilkyWay) and at SETI@home complete successfully immediately after restarting BOINC. It is as if they were done but unable to signal that they had completed.

Beyond
Joined: 15 Jul 08
Posts: 383
Credit: 729,293,740
RAC: 0
Message 48850 - Posted: 18 May 2011, 15:02:30 UTC - in response to Message 48840.  

The next time this happens, can you terminate BOINC and restart it? I have had several tasks here (MilkyWay) and at SETI@home complete successfully immediately after restarting BOINC. It is as if they were done but unable to signal that they had completed.

No, because both of these WU runs were improperly formatted. They all fail. The sooner they're dead, the better. The de_separation_17_3s_fix_5 WUs were the worst because they sometimes refused to terminate at the normal time. They all fail nevertheless, no matter what is done.

Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 48854 - Posted: 18 May 2011, 22:13:24 UTC - in response to Message 48850.  

I'm pretty sure Matt N. has taken them down. We're looking into what's causing the problem.

Ex_Brit
Joined: 24 Jul 10
Posts: 21
Credit: 465,205
RAC: 0
Message 48913 - Posted: 21 May 2011, 12:45:11 UTC

The computation error idiocy continues. I'm about to withdraw my meagre support for this project.
Peter
Toronto, Canada

ExtraTerrestrial Apes
Joined: 1 Sep 08
Posts: 204
Credit: 219,354,537
RAC: 0
Message 48951 - Posted: 24 May 2011, 20:35:17 UTC - in response to Message 48913.  

Are you talking about faulty WUs (not seeing any on my hosts) or are you talking about hosts which are still using the old optimized apps, which generate nothing but errors with the new setup?

MrS
Scanning for our furry friends since Jan 2002

Ex_Brit
Joined: 24 Jul 10
Posts: 21
Credit: 465,205
RAC: 0
Message 48953 - Posted: 24 May 2011, 20:44:25 UTC

I have no idea what the difference is, sorry. All I know is that I keep getting work units that fail with "computation error", and I'm not getting that with any other project.
Peter
Toronto, Canada

[AF>France>Est>Bourgogne]Skwi
Joined: 28 Jan 10
Posts: 7
Credit: 23,771,725
RAC: 85
Message 49003 - Posted: 25 May 2011, 19:59:19 UTC

Computation errors since the app update in April.

I've tried everything I could find on this forum:
Update drivers
Restart project
Detached and reattached

Config:

BOINC 6.10.58
Radeon HD3850, Catalyst 11.3
Windows XP SP3 32Bit
Athlon XP 2600+


MilkyWay was running well with the optimized app before the app update.
I've been looking for a solution for a month and found nothing!
What is the problem? Does anyone have the same problem, or THE solution, please?


Error code:

Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Illegal Instruction (0xc000001d) at address 0x004051F9

ExtraTerrestrial Apes
Joined: 1 Sep 08
Posts: 204
Credit: 219,354,537
RAC: 0
Message 49005 - Posted: 25 May 2011, 21:06:45 UTC - in response to Message 49003.  

@Skwi: it seems the current GPU app uses SSE2 instructions somewhere, probably in some 3rd party libraries. Your CPU does not support these. The old app used a different method. Matt is aware of the issue and a fix is planned for the next release, but that's probably not going to happen tomorrow.
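
For anyone who wants to check this diagnosis on their own hardware: the 0xc000001d code in Skwi's log is the Windows status code for an illegal instruction, and on x86 CPUs SSE2 support is reported by CPUID leaf 1 in bit 26 of the EDX register. Below is a minimal Python sketch, assuming a Linux host where the kernel lists CPU feature flags in /proc/cpuinfo (on Windows, a tool such as CPU-Z shows the supported instruction sets instead). The function names are hypothetical helpers, not part of BOINC or the MilkyWay@home app.

```python
# Hypothetical helpers for checking SSE2 support; not part of BOINC or MilkyWay@home.

# On x86, CPUID leaf 1 reports SSE2 support in bit 26 of the EDX register.
SSE2_EDX_BIT = 26

# Windows status code seen in the crash log: STATUS_ILLEGAL_INSTRUCTION.
STATUS_ILLEGAL_INSTRUCTION = 0xC000001D


def edx_has_sse2(edx: int) -> bool:
    """Return True if the SSE2 bit is set in a CPUID leaf-1 EDX value."""
    return ((edx >> SSE2_EDX_BIT) & 1) == 1


def cpuinfo_has_sse2(cpuinfo_text: str) -> bool:
    """Return True if the first 'flags' line of Linux /proc/cpuinfo lists sse2."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            _, _, flags = line.partition(":")
            # Exact token match, so e.g. "sse4_2" does not count as "sse2".
            return "sse2" in flags.split()
    return False


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print("SSE2 supported:", cpuinfo_has_sse2(f.read()))
    except OSError:
        print("/proc/cpuinfo is not available on this system")
```

An Athlon XP such as Skwi's supports SSE but not SSE2, so a check like this would report False there, consistent with the illegal-instruction crash.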

@Ex_Brit: I'm asking because I'm not seeing, and haven't seen, any project-related task failures on my machines. It's difficult to tell, though, as the results disappear so quickly. Currently no results are shown for your host (I wouldn't want to run "all error" WUs either), so I can't look at the error. The nVidia 270.6x drivers are reported to behave quite badly, so you may want to go straight to 275.xx. I can't promise it'll help, though.
And you might consider running other projects on your GTX 295. They're still powerful, but here they can use only 1/8th of their power (double precision), which is a lot less than ATI cards manage. Your cards still rock at GPU-Grid, though :)

MrS
Scanning for our furry friends since Jan 2002

Ex_Brit
Joined: 24 Jul 10
Posts: 21
Credit: 465,205
RAC: 0
Message 49007 - Posted: 25 May 2011, 21:37:40 UTC - in response to Message 49005.  

I understand. However, since I had overheating issues, I no longer allow any project to use the GPU. I'm also not interested in using beta graphics drivers just to suit a project.

This project needs to fix the work view problem too; there is no reason why recent work units should vanish from the records so quickly.

This problem started with THIS thread, so it isn't anything new.
Peter
Toronto, Canada

[AF>France>Est>Bourgogne]Skwi
Joined: 28 Jan 10
Posts: 7
Credit: 23,771,725
RAC: 85
Message 49031 - Posted: 26 May 2011, 11:48:59 UTC

Thank you, ETA, for confirming the origin of the problem.

I will wait for the new release.

Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 49032 - Posted: 26 May 2011, 11:55:05 UTC - in response to Message 49003.  

The optimized applications are deprecated and won't validate anymore; there were quite a few news threads about this. We've updated the server code and what we send to clients to help reduce server load, so the old optimized applications aren't receiving certain files they need to run.

I don't know if anyone has released any newer optimized applications that run with the new versions of the workunits, but you can always use the ones provided by the server.

Ex_Brit
Joined: 24 Jul 10
Posts: 21
Credit: 465,205
RAC: 0
Message 49033 - Posted: 26 May 2011, 12:04:57 UTC - in response to Message 49032.  

but you can always use the ones provided by the server.


Are you saying it's fixed, or what? I wasn't aware we could pick and choose where our work comes from.
Peter
Toronto, Canada

Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 49036 - Posted: 26 May 2011, 12:28:03 UTC - in response to Message 49033.  

Are you saying it's fixed, or what? I wasn't aware we could pick and choose where our work comes from.


I know Matt A. has been working on the GPU applications pretty constantly. You might want to shoot him a message with the details of your problem, if the applications provided by the server aren't running correctly.

Ex_Brit
Joined: 24 Jul 10
Posts: 21
Credit: 465,205
RAC: 0
Message 49038 - Posted: 26 May 2011, 12:41:49 UTC - in response to Message 49036.  

My problem is with the regular WUs, not the GPU ones. As previously stated, I don't use my GPU: it had overheating issues and caused problems with other applications.

What gets me is that the issue hasn't even been mentioned in the News, and it's been going on for ages.

They also need to change the website interface so that more WUs are kept on view. At present they disappear within hours, so no one can check which past WUs caused the issue without a lot of delving.
Peter
Toronto, Canada

Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 49040 - Posted: 26 May 2011, 12:47:48 UTC - in response to Message 49038.  

Try crunching one or two so I can see the error results. Do they error out immediately, or does it take a while?

©2024 Astroinformatics Group