Message boards : Number crunching : Client errors
Author | Message |
---|---|
Joined: 2 Jan 08 Posts: 123 Credit: 69,796,808 RAC: 487 |
Have had a number of recent failures, all on my Windows machine, none on my Linux machine. Both are Opteron 285 machines and I leave the application in memory, I am running Win XP and B/M 5.10.38. The error I am getting is "One or more arguments are invalid" and a whole heap of stuff in the error report:-- WU 2692733 WU 2692835 WU 2692843 WU 2692881 WU 2692897 WU 2707581 WU 2712199 WU 2712209 Thanks, hope it is a simple problem. |
Joined: 12 Nov 07 Posts: 2425 Credit: 524,164 RAC: 0 |
I still get ones that freeze. Out of the last two groups of 140s, each has had 2 that froze so far. |
Joined: 8 Oct 07 Posts: 289 Credit: 3,690,838 RAC: 0 |
Have had a number of recent failures, all on my Windows machine, none on my Linux machine. Both are Opteron 285 machines and I leave the application in memory, I am running Win XP and B/M 5.10.38. Conan - I have to wonder if it has anything to do with the Boinc client you are using. Is 5.10.38 a stable version? |
Joined: 2 Jan 08 Posts: 123 Credit: 69,796,808 RAC: 487 |
Have had a number of recent failures, all on my Windows machine, none on my Linux machine. Both are Opteron 285 machines and I leave the application in memory, I am running Win XP and B/M 5.10.38. Maybe, maybe not, but I had to upgrade to it to fix a problem that developed on Ralph with compiled libs. So I am stuck with it for now. I did notice in another thread that another Tester has got the same problem and they were using 5.10.30 I think. |
Joined: 12 Nov 07 Posts: 2425 Credit: 524,164 RAC: 0 |
This current bunch I have had 6 that froze. @conan - Do you need an older version of Boinc? I have a few past versions. |
Joined: 2 Jan 08 Posts: 123 Credit: 69,796,808 RAC: 487 |
This current bunch I have had 6 that froze. Thanks banditwolf, I can go back to 5.8.15, 5.10.30 and 5.10.35 if I need to, but all has been running OK over the last day. |
Joined: 12 Nov 07 Posts: 2425 Credit: 524,164 RAC: 0 |
I have found that when I click to open up Boinc manager, the current MW WU will freeze and the client then goes on to the next. It's happened twice today, and it doesn't seem to be every time; so far only when the WUs changed back to 0%. |
Joined: 27 Dec 07 Posts: 35 Credit: 1,432,926 RAC: 0 |
I had two freeze this week. For both of these the manager said they were running, but the time and progress did not advance. About 5 hours on 2761512 and about 10 hours on 2650511. I don't always have access to this computer, so it can take time before I get to abort them, and I can't go to an older BOINC as the computer has Vista on it and older versions kill other programs' work when shut down, which the owner does often. Not mine, so I can't do anything about that. Even with this I will take the credits for work done on a computer where the owner will let it run. We all have to find others who will let us run it; way too many CPU cycles are wasted. |
Joined: 18 Nov 07 Posts: 280 Credit: 2,442,757 RAC: 0 |
It would be nice if those hangs were fixed, but as a possibly simpler solution, could a second thread be added to each WU that polls it every few seconds and gives it a jolt if it hangs? |
Joined: 8 Oct 07 Posts: 289 Credit: 3,690,838 RAC: 0 |
I don't think it is individual workunits causing the freeze. I think it is the way the application communicates with the Boinc client. I say this because yesterday, while the Cosmology server was busy aborting the work it had just given me, every instance of milkyway running across all hosts ended up with computation errors. Every freeze of a result I have seen here has to do with an application switch to another project. While individual workunits do freeze, that seems to me to be the symptom and not the cause. Methinks the application needs some bugs fixed to get rid of this. |
Joined: 9 Nov 07 Posts: 20 Credit: 39,712 RAC: 0 |
Hi, I don't think it is individual workunits causing the freeze. I think it is the way the application communicates with the Boinc client.... There may well be something to this. I should've mentioned this before, but about a week or so ago I had to restart my host. At the time BOINC was crunching a non-Milkyway WU and had three Milkyway tasks queued up ready to run. When BOINC restarted, all three of the Milkyway WUs immediately "errored out", showing compute errors in the BOINC manager, even though, AFAIK, BOINC hadn't even attempted to run them yet.... [EDIT] These would be WUs #2486445, #2486167 and #2485667. If you look at my results page here you can see the three units in question. TTFN - Pete. |
Joined: 12 Nov 07 Posts: 2425 Credit: 524,164 RAC: 0 |
Yesterday I had a full 20 that went through with no problems. Today, in my previous bunch, I had 1 fail for a reason unknown to me (wu id # 3328146). Had a pop-up. In my current bunch I am having a lot (so far 6 of 13) that are freezing and going on to the next wu (at various completion times, >10 secs & 21 to 23 min). A couple of times it seems to be when I come back to my computer to do something. (Leave app in mem is checked.) |
Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
Yesterday I had a full 20 that went through with no problems. Are these having a pop-up? What exactly is happening with the work units? Do they just stop and move on? |
Joined: 12 Nov 07 Posts: 2425 Credit: 524,164 RAC: 0 |
The only one that had a pop-up was the one that completely failed. The others just stop and the next starts, and I end up with a bunch of applications running in memory. When it gets to the bottom of the list it picks back up at the top and runs through those that froze. Today for the first time I had 2 Rosetta units do the same (never had that before), but that could possibly be due to a new app version. |
Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
The only one that had a pop-up was the one that completely failed. The others just stop and the next starts, and I end up with a bunch of applications running in memory. When it gets to the bottom of the list it picks back up at the top and runs through those that froze. When it goes back through, do they still freeze, or do they work? |
Joined: 12 Nov 07 Posts: 2425 Credit: 524,164 RAC: 0 |
Generally they work, I have seen a couple that freeze again and then cycle through again and run. Do you want me to keep track of which ones? I have had more Rosetta do that today, but I do think that has to do with their new release, it still has bugs. I think I'll start a topic there about this as well. |
Joined: 8 Feb 08 Posts: 261 Credit: 104,050,322 RAC: 0 |
Today I had this client error too. The MW program crashed when I stopped it manually (in Boinc manager) to switch the CPU over to another project. I got a popup about the crash. WU: client error, compute error, "at least one argument is invalid". Could it be that this error is related to the client being forced to stop? |
Joined: 12 Nov 07 Posts: 2425 Credit: 524,164 RAC: 0 |
errored out as I got on computer, wu id #3436046. |
Joined: 12 Nov 07 Posts: 2425 Credit: 524,164 RAC: 0 |
another errored out as I got on computer, wu id #3435988. Got a pop-up on this one, not sure on the last if I got one. |
Joined: 12 Nov 07 Posts: 2425 Credit: 524,164 RAC: 0 |
errored out, wu id #3452929. The WUs that error out for me have been failing at around 1300 sec. |
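
One post in this thread asks whether a second thread could poll each WU every few seconds and "give it a jolt" if it hangs. A rough sketch of that watchdog idea in Python is below. It is not the MilkyWay application's code and it does not use the BOINC API: the `Watchdog` class, its `beat` method and the `on_stall` callback are hypothetical names made up for illustration, and what a real "jolt" would do is left entirely to the callback.

```python
import threading
import time


class Watchdog:
    """Hypothetical helper: calls on_stall when the worker stops reporting progress."""

    def __init__(self, timeout, poll_interval, on_stall):
        self.timeout = timeout              # seconds without a beat before we call it a stall
        self.poll_interval = poll_interval  # how often the watchdog thread checks
        self.on_stall = on_stall            # callback receiving the idle time in seconds
        self._last_beat = time.monotonic()
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def beat(self):
        """Called by the worker whenever it makes progress."""
        with self._lock:
            self._last_beat = time.monotonic()

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def _run(self):
        # Event.wait returns False on timeout and True once stop() has been called.
        while not self._stop.wait(self.poll_interval):
            with self._lock:
                idle = time.monotonic() - self._last_beat
            if idle > self.timeout:
                self.on_stall(idle)
                self.beat()  # reset so the callback does not fire on every poll


def demo():
    """Simulate a healthy phase followed by a hang; return stall counts for each."""
    stalls = []
    dog = Watchdog(timeout=0.3, poll_interval=0.05, on_stall=stalls.append)
    dog.start()
    for _ in range(5):       # healthy phase: the worker beats every 0.05 s
        time.sleep(0.05)
        dog.beat()
    stalls_while_healthy = len(stalls)
    time.sleep(1.0)          # simulated hang: no beats at all
    dog.stop()
    return stalls_while_healthy, len(stalls)


if __name__ == "__main__":
    print(demo())
```

Note the limits of this approach: the worker has to call `beat()` from inside its compute loop, and the watchdog can only notice that progress has stopped. Whether a stalled task can actually be nudged back to life depends on why it stalled, which this thread has not established.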