Welcome to MilkyWay@home

nm_s82_r7/r8 computation errors

Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 10542 - Posted: 13 Feb 2009, 22:17:59 UTC - in response to Message 10539.  

If the problem is with the stars files, then how come I can take the stars files from a machine not reporting any errors and copy them to another machine that has about 50% errors and still have the same problem?


Well, then that probably isn't the issue :P Are you using an optimized app? I'm wondering if something went wrong with them for whatever reason when the database got corrupted.

Again, it's pretty hard for me to debug an optimized application because they aren't something we've released, so it could be hardware problems or any other thing.

One thing I think might be happening is that when you use the stock app the BOINC client tries to verify the downloaded workunit files with a checksum, and I'm pretty sure the client won't do this for an optimized application. So what might be happening is that some of the workunit files might not be downloading completely or correctly, and the optimized app runs on them anyway and errors out.
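
For anyone who wants to check a downloaded star file by hand, the checksum comparison described above can be sketched roughly as follows. This is a minimal Python sketch, not the BOINC client's actual code: the helper names `md5_of_file` and `verify_download` are hypothetical, and the expected MD5 value is assumed to come from somewhere you trust (for example, the same file on a machine that reports no errors).

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Return True if the file's MD5 matches `expected_md5` (hypothetical helper).

    The expected value is assumed to be a hex string; surrounding whitespace
    and letter case are ignored.
    """
    return md5_of_file(path) == expected_md5.strip().lower()
```

If two machines produce different digests for what should be the same star file, the copy on one of them is incomplete or corrupted, which would be consistent with the partial-download theory.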

Cori
Joined: 27 Aug 07
Posts: 647
Credit: 27,592,547
RAC: 0
Message 10543 - Posted: 13 Feb 2009, 22:22:20 UTC - in response to Message 10542.  
Last modified: 13 Feb 2009, 22:23:30 UTC

... One thing I think might be happening is that when you use the stock app the BOINC client tries to verify the downloaded workunit files with a checksum, and I'm pretty sure the client won't do this for an optimized application. So what might be happening is that some of the workunit files might not be downloading completely or correctly, and the optimized app runs on them anyway and errors out.

But I have tested with both the standard and the opti app and got the same download/checksum errors no matter which app! And the client always tried to verify the checksums.

I think the errors might happen because some "star" files are downloaded multiple times for different WU types. Maybe then the checksums don't fit?

However these errors have stopped for me now.
Lovely greetings, Cori

Logan
Joined: 15 Aug 08
Posts: 163
Credit: 3,876,869
RAC: 0
Message 10544 - Posted: 13 Feb 2009, 22:23:53 UTC - in response to Message 10542.  
Last modified: 13 Feb 2009, 22:25:48 UTC

Well, then that probably isn't the issue :P Are you using an optimized app? I'm wondering if something went wrong with them for whatever reason when the database got corrupted.

Again, it's pretty hard for me to debug an optimized application because they aren't something we've released, so it could be hardware problems or any other thing.

One thing I think might be happening is that when you use the stock app the BOINC client tries to verify the downloaded workunit files with a checksum, and I'm pretty sure the client won't do this for an optimized application. So what might be happening is that some of the workunit files might not be downloading completely or correctly, and the optimized app runs on them anyway and errors out.



Travis... A lot of users have reported the same with the stock app... Don't try to use the excuse of third-party apps... The problem is on your side, and all of us know it...
Logan.

BOINC FAQ Service (now also available in Spanish)

Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 10546 - Posted: 13 Feb 2009, 22:29:27 UTC - in response to Message 10544.  

Travis... A lot of users have reported the same with the stock app... Don't try to use the excuse of third-party apps... The problem is on your side, and all of us know it...


I'm not blaming 3rd party apps, I'm just trying to narrow down where the error is. You're always so negative, man :P

Our database settings got lost when everything went down. What I'm thinking is that we might have some kind of error where it's dropping connections early or something along those lines, which is all I can think of that would be making the application crash, considering the applications were working fine before and that we're not doing anything new when we're generating the workunits...

Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 10547 - Posted: 13 Feb 2009, 22:30:21 UTC - in response to Message 10546.  

One more question to everyone. Is it still just happening on stripe 82, or happening with 79 and 86 as well?

Cori
Joined: 27 Aug 07
Posts: 647
Credit: 27,592,547
RAC: 0
Message 10548 - Posted: 13 Feb 2009, 22:32:35 UTC - in response to Message 10547.  

One more question to everyone. Is it still just happening on stripe 82, or happening with 79 and 86 as well?

I had problems with 82 and 79, see also my older post below. ;-)
Lovely greetings, Cori

Logan
Joined: 15 Aug 08
Posts: 163
Credit: 3,876,869
RAC: 0
Message 10550 - Posted: 13 Feb 2009, 22:35:15 UTC

Logan
Joined: 15 Aug 08
Posts: 163
Credit: 3,876,869
RAC: 0
Message 10552 - Posted: 13 Feb 2009, 22:50:38 UTC - in response to Message 10546.  
Last modified: 13 Feb 2009, 23:11:12 UTC

I'm not blaming 3rd party apps, I'm just trying to narrow down where the error is. You're always so negative, man :P

Our database settings got lost when everything went down. What I'm thinking is that we might have some kind of error where it's dropping connections early or something along those lines, which is all I can think of that would be making the application crash, considering the applications were working fine before and that we're not doing anything new when we're generating the workunits...


And I'm not a negative man... I'm only explaining to you which side the problem is on... Telling the volunteers to 'restart the project' or 'detach and attach the project' (with all the disruption and inconvenience to the users) without checking whether the origin of the trouble is on your side, and putting the blame on the third-party apps without any reason... Well... What must I think of you...? :)

Best regards
Logan.

BOINC FAQ Service (now also available in Spanish)

Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 10556 - Posted: 13 Feb 2009, 23:06:17 UTC - in response to Message 10552.  

And I'm not a negative man... I'm only explaining to you which side the problem is on... Telling the volunteers to 'restart the project' or 'detach and attach the project' (with all the disruption and inconvenience to the users) without checking whether the origin of the trouble is on your side, and putting the blame on the third-party apps without any reason... Well... What must I think of you...? :)

Best regards


We're checking on our side as well, it's just not an easy thing to pinpoint. Hence asking users to try a detach and see if the problem is still happening.

banditwolf
Joined: 12 Nov 07
Posts: 2425
Credit: 524,164
RAC: 0
Message 10557 - Posted: 13 Feb 2009, 23:17:06 UTC

Myself, I have only had the one group of units that had an error that I know of so far. I think they were all the same type.
Doesn't expecting the unexpected make the unexpected the expected?
If it makes sense, DON'T do it.

Cori
Joined: 27 Aug 07
Posts: 647
Credit: 27,592,547
RAC: 0
Message 10561 - Posted: 13 Feb 2009, 23:42:51 UTC
Last modified: 14 Feb 2009, 0:08:23 UTC

*EEK* Just found this one (after several hours without problems):

stderr out

<core_client_version>6.4.5</core_client_version>
<![CDATA[
<message>
WU download error: couldn't get input files:
<file_xfer_error>
<file_name>/stars-79.txt</file_name>
<error_code>-119</error_code>
<error_message>MD5 check failed</error_message>
</file_xfer_error>

</message>
]]>


---> http://milkyway.cs.rpi.edu/milkyway/result.php?resultid=2328011
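
If you want to scan many results for the same kind of failure, the error report above can be pulled apart mechanically. What follows is a minimal Python sketch, assuming the stderr text has the shape shown in this post; `find_transfer_errors` is a hypothetical helper, not part of BOINC. A regular expression is used rather than an XML parser because the stderr dump as displayed has no single root element.

```python
import re

# Matches one <file_xfer_error> block of the shape shown in the post above.
XFER_ERROR = re.compile(
    r"<file_xfer_error>\s*"
    r"<file_name>(?P<file_name>.*?)</file_name>\s*"
    r"<error_code>(?P<error_code>-?\d+)</error_code>\s*"
    r"<error_message>(?P<error_message>.*?)</error_message>\s*"
    r"</file_xfer_error>",
    re.DOTALL,
)


def find_transfer_errors(stderr_text):
    """Return a list of dicts, one per file transfer error found in the text."""
    return [
        {
            "file_name": match["file_name"].strip(),
            "error_code": int(match["error_code"]),
            "error_message": match["error_message"].strip(),
        }
        for match in XFER_ERROR.finditer(stderr_text)
    ]
```

Run against the text of this result, the helper would report the file `/stars-79.txt` with error code -119 and the message "MD5 check failed", which makes it easy to tally which star files fail most often across a batch of results.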


EDIT I think (read: hope) this was the only WU with download problems at this time... phew! :-)))
Lovely greetings, Cori

Beau
Joined: 3 Jan 09
Posts: 270
Credit: 124,346
RAC: 0
Message 10563 - Posted: 13 Feb 2009, 23:57:17 UTC - in response to Message 10561.  

I am using an opti win32 app (zslip) and am processing s70, s82, & s86 without any errors whatsoever. They are all giving valid results and granting credit. I think if it was a system wide server issue, there would be a lot more people complaining. It seems to just be a handful of people having the problem.

msattler
Joined: 15 Jul 08
Posts: 288
Credit: 5,474,012
RAC: 0
Message 10567 - Posted: 14 Feb 2009, 0:07:05 UTC - in response to Message 10563.  

I am using an opti win32 app (zslip) and am processing s70, s82, & s86 without any errors whatsoever. They are all giving valid results and granting credit. I think if it was a system wide server issue, there would be a lot more people complaining. It seems to just be a handful of people having the problem.

I think since the server crash, there has been some change in the server settings on MW's side, which is causing some to have problems with downloads, specifically the star files....and if they are not downloaded correctly, all WUs associated with them will error out....
That is the problem in a nutshell, I think.
I am the Kittyman.

Please visit and give a Click for Seti City.




speedimic
Joined: 22 Feb 08
Posts: 260
Credit: 57,387,048
RAC: 0
Message 10572 - Posted: 14 Feb 2009, 0:11:18 UTC - in response to Message 10563.  

I am using an opti win32 app (zslip) and am processing s70, s82, & s86 without any errors whatsoever. They are all giving valid results and granting credit. I think if it was a system wide server issue, there would be a lot more people complaining. It seems to just be a handful of people having the problem.


With the insta-purge on, most people won't see their results error out...
mic.


Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 10577 - Posted: 14 Feb 2009, 0:21:43 UTC - in response to Message 10572.  

With the insta-purge on, most people won't see their results error out...


Yeah, this is true. However, I've been looking in the database and it doesn't seem like there are TOO many errors happening.

On another note, I started up a new set of searches with fresh parameter and star files. So please let me know if these are causing any problems.

Misfit
Joined: 27 Aug 07
Posts: 915
Credit: 1,503,319
RAC: 0
Message 10587 - Posted: 14 Feb 2009, 0:46:53 UTC - in response to Message 10572.  

With the insta-purge on, most people won't see their results error out...

No errors showing here in BOINC messages.
me@rescam.org

Glenn Rogers
Joined: 4 Jul 08
Posts: 165
Credit: 364,966
RAC: 0
Message 10632 - Posted: 14 Feb 2009, 4:52:56 UTC
Last modified: 14 Feb 2009, 4:57:09 UTC

G'day all. So far I'm crunching a mix of all the WUs being talked about. Woke up, checked my results, and had only one error, at http://milkyway.cs.rpi.edu/milkyway/result.php?resultid=2294022
Downloading and reporting OK at the moment.
Glenn

Now crunching the newer searches that Travis was talking about; will let everyone know if I get errors after they report.
Glenn

Glenn Rogers
Joined: 4 Jul 08
Posts: 165
Credit: 364,966
RAC: 0
Message 10633 - Posted: 14 Feb 2009, 5:05:21 UTC

Just reported WUs for the new searches: 6 reported, 6 more downloaded with no errors, and all reported got credit.
Glenn