Sudden mass of WU's finishing with Computation Error
Author | Message |
---|---|
Send message Joined: 21 Jun 09 Posts: 4 Credit: 4,119,496 RAC: 0 |
Ok, so I checked my system this morning, and found that overnight, a serious amount of WU's for MilkyWay had resulted in Computation Error, all with 0s elapsed. Anyone else experiencing this? I'm running BOINC 6.10.18; the MilkyWay app is 0.21 (cuda23); NVidia driver is 191.07 (haven't had time to update to 195.62 as yet); OS: Win7 64bit. |
Send message Joined: 12 Aug 09 Posts: 172 Credit: 645,240,165 RAC: 0 |
"Ok, so I checked my system this morning, and found that overnight, a serious amount of WU's for MilkyWay had resulted in Computation Error, all with 0s elapsed." Happened to me the other day. I reverted to Vista and the problem went away. PS: I tried the 195.62 driver and that just caused VDU crashes all the time. So much so that out of 48 WU's only 10 did NOT error. |
Send message Joined: 18 Oct 07 Posts: 35 Credit: 4,684,314 RAC: 0 |
Not a mass, but I had four of them on my GTX260 with 191.07 on Friday afternoon (UTC), too. OS is WinXP32 SP2 with BOINC 6.10.17. All errors were 0x1, invalid function. |
Send message Joined: 28 Jan 09 Posts: 31 Credit: 85,934,108 RAC: 0 |
Yes I am. Only on 1 machine though, but every unit dies. Reinstalled every ATI driver, from the 8.6x driver to 9.4 through 9.11 Catalyst, to no avail. Redownloaded the apps. Checked for the right version, i.e. ATI or AMD. For me it's only the machine with ATI and nVidia in it and crappy cards, but it's driving me mad. 02:36 am and I am still here fighting it. Oh, and the Collatz optimised app is fine, grrr |
Send message Joined: 21 Jun 09 Posts: 4 Credit: 4,119,496 RAC: 0 |
Still happening - every single WU - must be almost 1000 in total by now. Gonna hafta suspend this project 'til I figure out what's going down. |
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
"Still happening - every single WU - must be almost 1000 in total by now. Gonna hafta suspend this project 'til I figure out what's going down." When did this start? With the recent batch of new workunits? What GPU are you using? |
Send message Joined: 12 Nov 07 Posts: 2425 Credit: 524,164 RAC: 0 |
@ Aaron: Your result gives this: "Can't get shared memory segment name: shmget() failed". I remember this problem, but can't remember the solution. @ Travis: If you look at his first message, it is from a day ago, so it wouldn't be the new tasks. Doesn't expecting the unexpected make the unexpected the expected? If it makes sense, DON'T do it. |
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
Yeah I just saw that. Not getting shared memory sounds like a problem with the BOINC client. |
Send message Joined: 18 Sep 09 Posts: 1 Credit: 694,971 RAC: 0 |
Good evening guys (and girls). Starting with the increased size, all new workunits end up as invalid. For the time being I have stopped the project. Any clues? Thanks a lot. |
Send message Joined: 12 Apr 08 Posts: 621 Credit: 161,934,067 RAC: 0 |
Well, I cannot see any difference between my tasks that complete successfully and those that fail. I put the one system into NNT ... GTX295 cards are what I see the failures on. Just checked, my dual GTX260 system is also showing a high failure rate. Most interesting is that the system with a GTX280 is not showing a single failure. My ATI system is running off Collatz tasks so can't check there till tomorrow sometime. The tasks complete successfully but fail validation. Most interestingly I have had some tasks validate... Not enough information for me to tell you more ... |
Send message Joined: 26 Jan 09 Posts: 589 Credit: 497,834,261 RAC: 0 |
|
Send message Joined: 12 Apr 08 Posts: 621 Credit: 161,934,067 RAC: 0 |
"6 x 4870s here - no invalid tasks." Doubt it ... Tried running some of the ATI tasks I have on hand but they all seem to be the short ones. So, no comparison. My GTX280 card seems to be running the long tasks just fine. Not sure why the 295 and 260 cards are having problems unless it has to do with multiple GPUs ... not sure why that would crop up now ... worked fine before ... |
Send message Joined: 18 Oct 07 Posts: 35 Credit: 4,684,314 RAC: 0 |
All new WUs seem to finish invalid on both of my CUDA machines. Have no ATI here. Machine 1: Intel Q9650, WinXP SP2, GTX260 191.07, BOINC 6.10.17. Machine 2: Intel Pentium D, Win2003 SP2, GTX260 191.07, BOINC 6.10.17. Project degraded to NNT until this is fixed. Oh boy, this reminds me of the song Flakes from Frank Zappa ;-) |
Send message Joined: 9 Sep 08 Posts: 96 Credit: 336,443,946 RAC: 0 |
I have had no errors/invalid tasks yet with my W7 64 bit, Q6600, NVIDIA GeForce GTX 260 (896MB) driver: 19107... |
Send message Joined: 18 Oct 07 Posts: 35 Credit: 4,684,314 RAC: 0 |
"I have had no errors/invalid tasks yet with my W7 64 bit, Q6600, NVIDIA GeForce GTX 260 (896MB) driver: 19107..." Hidden computers are always very helpful for diagnostics. How big is your cache? Maybe you're still crunching older WUs. When did you get them? |
Send message Joined: 9 Sep 08 Posts: 96 Credit: 336,443,946 RAC: 0 |
I'll get right back to you on that... |
Send message Joined: 13 Nov 07 Posts: 1 Credit: 27,656,801 RAC: 1,296 |
Another problem machine: AMD Phenom II X4 940 Processor, Linux 64 bit, Nvidia GTX 285, BOINC 6.10.18 |
Send message Joined: 12 Nov 07 Posts: 2425 Credit: 524,164 RAC: 0 |
"Another problem machine" Gives this message: <core_client_version>6.10.18</core_client_version> <![CDATA[ <stderr_txt> Device index specified on the command line was 0 Looking for a Double Precision capable NVIDIA GPU The device GeForce GTX 285 specified on the command line can be used called boinc_finish </stderr_txt> Doesn't expecting the unexpected make the unexpected the expected? If it makes sense, DON'T do it. |
Send message Joined: 18 Oct 07 Posts: 35 Credit: 4,684,314 RAC: 0 |
OK, now that I've tried some more things like project reset, reboot, and detach and reattach, I'm completely stumped about what to do next. It seems that I can no longer deliver valid results. I browsed through the user stats, and it can be seen there that it's not a global problem. All of the top users seem to have no problems crunching the new WUs, and it's independent of BOINC client version or GPU type. My GPUs are almost new. One has been crunching for two weeks, the other has been running since the beginning of November, and it's very unlikely that both cards would die in the same second. The fact that I'm not the only one having this problem makes it less likely that all those GPUs are malfunctioning. Other projects like SETI and Collatz are running fine on both machines, so I assume the GPUs are all fine. So what's going on here? Due to excessive DB purging here at MW there is not much of a history, but I'm almost sure that it all began when the WU size was increased. And the only thing that has changed on both machines is the antivirus pattern files. Here are the machines again: Machine 1: Intel Q9650, WinXP SP2, GTX260 191.07, BOINC 6.10.17. Machine 2: Intel Pentium D, Win2003 SP2, GTX260 191.07, BOINC 6.10.17. Please help, any suggestion will be appreciated. |
Send message Joined: 17 Feb 08 Posts: 363 Credit: 258,227,990 RAC: 0 |
The only thing I can think of is that, now that the WU size has increased 'dramatically', the CUDA app is reaching the maximum time that a CUDA kernel is allowed to run and therefore crashes. One could try dividing the 'domain size' to prevent that. Anyway, that's just a guess without having a look at the CUDA code at all. Might be something totally different. Only Anthony will know for sure. Join Support science! Join Team BOINC United now! |
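
The guess in the last post (one long kernel launch hitting a run-time limit, avoided by dividing the 'domain size') can be illustrated with a small host-side sketch. This is plain Python, not the MilkyWay CUDA code, which nobody in the thread had looked at. The names `split_domain`, `run_slice`, and `run_in_slices` are hypothetical, and `run_slice` is only a CPU stand-in for what would be one short GPU kernel launch.

```python
def split_domain(total_steps, max_steps_per_launch):
    """Yield (start, count) slices that together cover range(total_steps)."""
    if total_steps < 0 or max_steps_per_launch <= 0:
        raise ValueError("total_steps must be >= 0 and max_steps_per_launch > 0")
    start = 0
    while start < total_steps:
        # The last slice may be shorter than the others.
        count = min(max_steps_per_launch, total_steps - start)
        yield start, count
        start += count


def run_slice(start, count):
    """Hypothetical stand-in for one short kernel launch.

    Here it simply sums i*i over the slice on the CPU.
    """
    return sum(i * i for i in range(start, start + count))


def run_in_slices(total_steps, max_steps_per_launch):
    """Accumulate partial results from many short launches instead of one long one."""
    total = 0
    for start, count in split_domain(total_steps, max_steps_per_launch):
        total += run_slice(start, count)
    return total
```

With this layout, `run_in_slices(1000, 64)` gives the same result as a single `run_slice(0, 1000)`, but no individual call covers more than 64 steps. Whether the actual failures in this thread come from such a time limit is, as the post says, only a guess.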