Welcome to MilkyWay@home

Client errors

Conan
Joined: 2 Jan 08
Posts: 122
Credit: 69,480,026
RAC: 1,421
Message 1573 - Posted: 22 Jan 2008, 9:42:43 UTC

I have had a number of recent failures, all on my Windows machine and none on my Linux machine. Both are Opteron 285 machines, and I leave the application in memory. I am running Win XP and BOINC Manager (B/M) 5.10.38.

The error I am getting is "One or more arguments are invalid", along with a whole heap of stuff in the error report:

WU 2692733
WU 2692835
WU 2692843
WU 2692881
WU 2692897
WU 2707581
WU 2712199
WU 2712209

Thanks, hope it is a simple problem.

banditwolf
Joined: 12 Nov 07
Posts: 2425
Credit: 524,164
RAC: 0
Message 1574 - Posted: 22 Jan 2008, 15:55:55 UTC

I still get ones that freeze. Of the last two groups of 140, each has had 2 freeze so far.

Jayargh
Joined: 8 Oct 07
Posts: 289
Credit: 3,690,838
RAC: 0
Message 1575 - Posted: 22 Jan 2008, 16:46:32 UTC - in response to Message 1573.  

Conan - I have to wonder if it has anything to do with the BOINC client you are using... is 5.10.38 a stable version?

Conan
Joined: 2 Jan 08
Posts: 122
Credit: 69,480,026
RAC: 1,421
Message 1576 - Posted: 22 Jan 2008, 21:35:29 UTC - in response to Message 1575.  

Maybe, maybe not, but I had to upgrade to it to fix a problem that developed on Ralph with compiled libs, so I am stuck with it for now.
I did notice in another thread that another tester has the same problem, and they were using 5.10.30, I think.

banditwolf
Joined: 12 Nov 07
Posts: 2425
Credit: 524,164
RAC: 0
Message 1577 - Posted: 22 Jan 2008, 22:03:16 UTC

In this current bunch I have had 6 that froze.

@Conan - Do you need an older version of BOINC? I have a few past versions.

Conan
Joined: 2 Jan 08
Posts: 122
Credit: 69,480,026
RAC: 1,421
Message 1578 - Posted: 23 Jan 2008, 14:53:49 UTC - in response to Message 1577.  

Thanks banditwolf. I can go back to 5.8.15, 5.10.30 or 5.10.35 if I need to, but everything has been running OK over the last day.

banditwolf
Joined: 12 Nov 07
Posts: 2425
Credit: 524,164
RAC: 0
Message 1591 - Posted: 25 Jan 2008, 15:25:29 UTC

I have found that when I click to open BOINC Manager, the current MW WU freezes and BOINC then goes on to the next one. It has happened twice today. It doesn't seem to happen every time; so far only when the WUs changed back to 0%.

[B@H] Ray
Joined: 27 Dec 07
Posts: 35
Credit: 1,432,926
RAC: 0
Message 1594 - Posted: 25 Jan 2008, 22:29:00 UTC

I had two freeze this week.

The manager said both of these were running, but their time and progress did not advance.

About 5 hours on 2761512 and about 10 hours on 2650511.

I don't always have access to this computer, so it can take time before I get to abort them. I can't go to an older BOINC because the computer has Vista on it, and older versions kill other programs' work when the computer is shut down, which the owner does often. It is not mine, so I can't do anything about that. Even so, I will take the credits for work done on a computer where the owner will let it run. We all have to find others who will let us run it; way too many CPU cycles are wasted.

Emanuel
Joined: 18 Nov 07
Posts: 280
Credit: 2,442,757
RAC: 0
Message 1595 - Posted: 25 Jan 2008, 23:16:27 UTC - in response to Message 1594.  

It would be nice if those hangs were fixed, but as a possibly simpler solution, could a second thread be added to each WU that polls it every few seconds and gives it a jolt if it hangs?
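
To make the idea concrete, here is a minimal sketch of such a watchdog thread in Python. It is not taken from the MilkyWay@home application or from the BOINC API; the names `Watchdog`, `heartbeat` and `on_hang` are hypothetical. It assumes the worker calls `heartbeat()` whenever it makes progress, and that the "jolt" is whatever the `on_hang` callback does (for example, logging the stall or requesting a restart from a checkpoint). A watchdog like this can only detect a stall; it cannot by itself unstick a thread that is truly hung.

```python
import threading
import time


class Watchdog:
    """Polls a worker's last heartbeat and reports when it goes quiet.

    Hypothetical helper for illustration only, not part of BOINC.
    """

    def __init__(self, timeout, poll_interval, on_hang):
        self.timeout = timeout              # seconds of silence treated as a hang
        self.poll_interval = poll_interval  # seconds between checks
        self.on_hang = on_hang              # callback invoked with the idle time
        self._last_beat = time.monotonic()
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def heartbeat(self):
        """Called by the worker each time it makes progress."""
        with self._lock:
            self._last_beat = time.monotonic()

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def _run(self):
        # Event.wait returns False on timeout, True once stop() is called.
        while not self._stop.wait(self.poll_interval):
            with self._lock:
                idle = time.monotonic() - self._last_beat
            if idle > self.timeout:
                self.on_hang(idle)
                # Restart the clock so the callback does not fire on every poll.
                self.heartbeat()
```

In use, the worker would create `Watchdog(timeout=30, poll_interval=5, on_hang=...)`, call `start()` once, and call `heartbeat()` inside its main loop. The timeout has to be longer than the slowest legitimate step, otherwise a healthy task would be reported as hung.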

Jayargh
Joined: 8 Oct 07
Posts: 289
Credit: 3,690,838
RAC: 0
Message 1596 - Posted: 25 Jan 2008, 23:53:18 UTC
Last modified: 26 Jan 2008, 0:09:27 UTC

I don't think it is individual workunits causing the freeze. I think it is the way the application communicates with the BOINC client.

I say this because yesterday, while the Cosmology server was busy aborting the work it had just given me, every instance of MilkyWay running across all hosts ended up with computation errors.

Every freeze of a result I have seen here has to do with an application switch to another project. While individual workunits freeze, that seems to me to be the symptom and not the cause. Methinks the application needs some bugs fixed to get rid of this.

Ensor
Joined: 9 Nov 07
Posts: 20
Credit: 39,712
RAC: 0
Message 1598 - Posted: 26 Jan 2008, 1:40:26 UTC - in response to Message 1596.  
Last modified: 26 Jan 2008, 1:47:17 UTC

Hi,

I don't think it is individual workunits causing the freeze. I think it is the way the application communicates with the Boinc client....

There may well be something to this.

I should've mentioned this before, but about a week or so ago I had to restart my host. At the time BOINC was crunching a non-MilkyWay WU and had three MilkyWay tasks queued up ready to run.

When BOINC restarted all three of the Milkyway WUs immediately "errored out" showing compute errors in the BOINC manager, even though, AFAIK, BOINC hadn't even attempted to run them yet....

[EDIT] These would be WUs #2486445, #2486167 and #2485667. If you look at my results page here you can see the three units in question.


TTFN - Pete.


banditwolf
Joined: 12 Nov 07
Posts: 2425
Credit: 524,164
RAC: 0
Message 1671 - Posted: 8 Feb 2008, 22:40:38 UTC
Last modified: 8 Feb 2008, 22:40:56 UTC

Yesterday I had a full 20 that went through with no problems.

Today, in my previous bunch, I had 1 fail for a reason unknown to me (WU ID #3328146). It had a pop-up.

In my current bunch I am having a lot (so far 6 of 13) that freeze and then go on to the next WU (at various completion times, >10 secs and 21 to 23 min). A couple of times it seemed to happen when I came back to my computer to do something. ("Leave app in mem" is checked.)

Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 1673 - Posted: 9 Feb 2008, 11:21:44 UTC - in response to Message 1671.  

Are these having a pop-up? What exactly is happening with the work units? Do they just stop and move on?

banditwolf
Joined: 12 Nov 07
Posts: 2425
Credit: 524,164
RAC: 0
Message 1674 - Posted: 9 Feb 2008, 14:22:42 UTC

The only one that had a pop-up was the one that completely failed. The others just stop and the next one starts, and I end up with a bunch of applications running in memory. When it gets to the bottom of the list it picks back up at the top and runs through those that froze.

Today for the first time I had 2 Rosetta units do the same (never had that before), but that could possibly be due to a new app version.

Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 1676 - Posted: 9 Feb 2008, 23:13:12 UTC - in response to Message 1674.  

When it goes back through, do they still freeze, or do they work?

banditwolf
Joined: 12 Nov 07
Posts: 2425
Credit: 524,164
RAC: 0
Message 1677 - Posted: 10 Feb 2008, 0:02:57 UTC

Generally they work. I have seen a couple that freeze again and then cycle through again and run. Do you want me to keep track of which ones?

I have had more Rosetta do that today, but I do think that has to do with their new release, it still has bugs. I think I'll start a topic there about this as well.

Len LE/GE
Joined: 8 Feb 08
Posts: 261
Credit: 104,050,322
RAC: 0
Message 1678 - Posted: 10 Feb 2008, 9:32:49 UTC

Today I had this client error too.
The MW program crashed when I stopped it manually (in BOINC Manager) to switch the CPU over to another project.

Pop-up about a crash
WU: client error, compute error, at least one argument is invalid

Could it be that this error is related to the client being forced to stop?

banditwolf
Joined: 12 Nov 07
Posts: 2425
Credit: 524,164
RAC: 0
Message 1680 - Posted: 10 Feb 2008, 19:28:42 UTC

One errored out as I got on the computer, WU ID #3436046.

banditwolf
Joined: 12 Nov 07
Posts: 2425
Credit: 524,164
RAC: 0
Message 1681 - Posted: 10 Feb 2008, 22:46:32 UTC

Another errored out as I got on the computer, WU ID #3435988. I got a pop-up on this one; I'm not sure whether I got one on the last.

banditwolf
Joined: 12 Nov 07
Posts: 2425
Credit: 524,164
RAC: 0
Message 1683 - Posted: 11 Feb 2008, 15:23:59 UTC

Errored out, WU ID #3452929.
The tasks that have been erroring out for me fail at around 1300 sec.
©2024 Astroinformatics Group