1)
Message boards :
Number crunching :
Computer errors
(Message 56007)
Posted 202 days ago by John Black
Hi, I have also been getting "error while computing" on about 15% of my WUs. Here is the result for one of them. I will keep on trying, but it looks like a bad batch of WUs to me. Does anybody see the same thing with CPU-calculated WUs? I see that the normal supply of tasks has been suspended, so maybe someone has figured this out?

Thanks
John

p.s. This looks similar to a problem that we had a few months ago.

de_separation_22_3s_free_3_1350173304_7373798_0
Workunit: 261080625
Created: 29 Oct 2012 | 13:37:46 UTC
Sent: 29 Oct 2012 | 13:38:37 UTC
Received: 29 Oct 2012 | 22:02:02 UTC
Server state: Over
Outcome: Computation error
Client state: Compute error
Exit status: 1 (0x1) Unknown error number
Computer ID: 174739
Report deadline: 10 Nov 2012 | 13:38:37 UTC
Run time: 7,488.69
CPU time: 5,522.79
Validate state: Invalid
Credit: 0.00
Application version: MilkyWay@Home v1.00

Stderr output
<core_client_version>7.0.28</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
<search_application> milkyway_separation 1.00 Windows x86 double </search_application>
Unrecognized XML in project preferences: nvidia_block_amount
Skipping: 128
Skipping: /nvidia_block_amount
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Error reading astronomy parameters from file 'astronomy_parameters.txt'
Trying old parameters file
Using SSE3 path
Integral 0 time = 3877.755773 s
Integral 1 time = 1918.157672 s
<search_application> milkyway_separation 1.00 Windows x86 double </search_application>
Unrecognized XML in project preferences: nvidia_block_amount
Skipping: 128
Skipping: /nvidia_block_amount
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Error reading astronomy parameters from file 'astronomy_parameters.txt'
Trying old parameters file
Using SSE3 path
Integral 2 time = 454.460341 s
Running likelihood with 66200 stars
Likelihood time = 0.515756 s
Non-finite result
Failed to calculate likelihood
<background_integral> 0.000354347638725 </background_integral>
<stream_integral> 0.000000000000000 284.252325256453730 750.030118419354270 </stream_integral>
<background_likelihood> -3.084160458746776 </background_likelihood>
<stream_only_likelihood> -1.#IND00000000000 -11.375050105105538 -5.712595199774682 </stream_only_likelihood>
<search_likelihood> -241.000000000000000 </search_likelihood>
21:29:52 (11232): called boinc_finish
</stderr_txt>
]]> |
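For anyone trying to read a log like the one above, here is a minimal Python sketch that pulls the numbers out of the tagged lines and flags any that are not finite. The "-1.#IND" token is how the Microsoft C runtime prints an indeterminate NaN, which fits the "Non-finite result" line. The function names (parse_value, tagged_values, non_finite_tags) are made up for this illustration and are not part of BOINC or the MilkyWay@Home application.

```python
import math
import re

def parse_value(token):
    """Convert one numeric token from the log to a float.

    MSVC-style markers such as '-1.#IND' or '1.#INF' are mapped to nan/inf,
    because float() cannot parse them directly.
    """
    if "#IND" in token or "#QNAN" in token or "#SNAN" in token:
        return math.nan
    if "#INF" in token:
        return -math.inf if token.startswith("-") else math.inf
    return float(token)

def tagged_values(stderr, tag):
    """Return the numbers found between <tag> and </tag>, or [] if absent."""
    match = re.search(rf"<{tag}>(.*?)</{tag}>", stderr, re.S)
    if match is None:
        return []
    return [parse_value(tok) for tok in match.group(1).split()]

def non_finite_tags(stderr, tags):
    """List the tags that contain at least one NaN or infinite value."""
    return [
        tag for tag in tags
        if any(not math.isfinite(v) for v in tagged_values(stderr, tag))
    ]

# Two lines copied from the stderr output in the post.
SAMPLE = """
<stream_integral> 0.000000000000000 284.252325256453730 750.030118419354270 </stream_integral>
<stream_only_likelihood> -1.#IND00000000000 -11.375050105105538 -5.712595199774682 </stream_only_likelihood>
"""

integrals = tagged_values(SAMPLE, "stream_integral")
bad_tags = non_finite_tags(SAMPLE, ["stream_integral", "stream_only_likelihood"])
print(integrals)
print(bad_tags)
```

In this log the first stream integral is exactly zero and the first stream_only_likelihood value is the non-finite one. Whether those two positions refer to the same stream is an assumption here, but if they do, it would point at the work unit's parameters rather than at the PC.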
|
3)
Message boards :
Number crunching :
MW@H Computing Failures
(Message 55510)
Posted 255 days ago by John Black
Hi Guys, OK Mikey, fair enough: I just run straight BOINC on a dual-core CPU. Len LE/GE: the fsutil routine for the file transaction log seems to have worked so far, as MW@H is still running perfectly 12 hours later. Thanks guys, this is what I like about BOINC: everyone is so helpful. John |
|
4)
Message boards :
Number crunching :
MW@H Computing Failures
(Message 55507)
Posted 256 days ago by John Black
Hi Len LE/GE, thanks for your informative post. I ran the Microsoft Fixit routine, "fsutil resource setautoreset true c:\", to clean out the file system transaction log. Now everything seems to be working OK, but as it is 01:35 in Scotland I am going back to my bed. If everything is OK tomorrow, I will post here to let you know. Thanks again John |
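For reference, the command quoted in the post above can be wrapped in a small Python sketch that defaults to a dry run, so nothing is changed unless you ask for it. The helper names (reset_command, run_reset) are hypothetical. fsutil is a Windows-only tool and normally has to be run from an administrator prompt, and the assumption here is that the affected volume is C:, as in the post.

```python
import platform
import subprocess

def reset_command(volume="C:\\"):
    """Build the fsutil command quoted in the post for the given volume."""
    return ["fsutil", "resource", "setautoreset", "true", volume]

def run_reset(volume="C:\\", dry_run=True):
    """Print the command, and only execute it on Windows when dry_run is False.

    Returns the process exit code, or None if nothing was executed.
    """
    cmd = reset_command(volume)
    if dry_run or platform.system() != "Windows":
        print("would run:", " ".join(cmd))
        return None
    return subprocess.run(cmd, check=False).returncode

command = reset_command()
result = run_reset()  # dry run: prints the command and returns None
```

The dry-run default is deliberate: the setting affects the file system's transaction log, so it seems safer to look at the exact command before running it for real with run_reset(dry_run=False).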
|
5)
Message boards :
Number crunching :
MW@H Computing Failures
(Message 55502)
Posted 256 days ago by John Black
Hi Mikey, I don't know the answer to your question. Is there a way I can find out and then post it here? Thanks John |
|
6)
Message boards :
Number crunching :
MW@H Computing Failures
(Message 55498)
Posted 256 days ago by John Black
Hi again, as before a reboot has restored normal calculation of MW@H tasks. I am at a total loss to figure this out, so can anyone help with interpreting the stderr output? Thanks John |
|
7)
Message boards :
Number crunching :
MW@H Computing Failures
(Message 55495)
Posted 256 days ago by John Black
Hi again people, since I last reported, MW@H tasks ran successfully, but today the tasks started to show "error while computing" again. As per my previous posts I have rebooted to see if that restores things, and I will report here later. My enquiry here is for someone to translate the stderr output, as I am unsure what it is indicating. I notice that it says "Unrecognized XML in project preferences: nvidia_block_amount" and, as I have said, I do not use my wee NVIDIA GPU to do calculations, so I cannot figure it out.

Stderr output
<core_client_version>7.0.28</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
<search_application> milkyway_separation 1.00 Windows x86 double </search_application>
Unrecognized XML in project preferences: nvidia_block_amount
Skipping: 128
Skipping: /nvidia_block_amount
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Error reading astronomy parameters from file 'astronomy_parameters.txt'
Trying old parameters file
Using SSE3 path
Failed to move file 'separation_checkpoint_tmp' to 'separation_checkpoint' (6801): Transaction support within the specified file system resource manager is not started or was shutdown due to an error.
Failed to move file 'separation_checkpoint_tmp' to 'separation_checkpoint' (6801): Transaction support within the specified file system resource manager is not started or was shutdown due to an error.
Failed to move file 'separation_checkpoint_tmp' to 'separation_checkpoint' (6801): Transaction support within the specified file system resource manager is not started or was shutdown due to an error.
Failed to move file 'separation_checkpoint_tmp' to 'separation_checkpoint' (6801): Transaction support within the specified file system resource manager is not started or was shutdown due to an error.
Failed to move file 'separation_checkpoint_tmp' to 'separation_checkpoint' (6801): Transaction support within the specified file system resource manager is not started or was shutdown due to an error.
Failed to move file 'separation_checkpoint_tmp' to 'separation_checkpoint' (6801): Transaction support within the specified file system resource manager is not started or was shutdown due to an error.
Failed to move file 'separation_checkpoint_tmp' to 'separation_checkpoint' (6801): Transaction support within the specified file system resource manager is not started or was shutdown due to an error.
Failed to update checkpoint file ('separation_checkpoint_tmp' to 'separation_checkpoint') (2): No such file or directory
Write checkpoint failed
16:54:28 (5448): called boinc_finish
</stderr_txt>
]]>

Thanks in advance for any help offered. John |
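One way to "translate" a log like this is to sort its lines into messages the application says it is skipping and messages that stop the task. Below is a minimal Python sketch of that idea; the function name summarize_stderr and the choice of which strings to look for are assumptions made for this illustration, based only on the text of the log above. The reading it encodes is that the nvidia_block_amount and astronomy_parameters.txt lines look like warnings (the log says "Skipping" and "Trying old parameters file" and then carries on), while the repeated rename failures with code 6801 are what end in "Write checkpoint failed".

```python
import re
from collections import Counter

# Matches the repeated checkpoint-rename failures quoted in the post and
# captures the source file, target file and the numeric error code.
MOVE_FAIL = re.compile(r"Failed to move file '([^']+)' to '([^']+)' \((\d+)\)")

def summarize_stderr(stderr):
    """Return a small dict saying which known messages appear in the text."""
    codes = Counter(int(m.group(3)) for m in MOVE_FAIL.finditer(stderr))
    return {
        "prefs_warning": "Unrecognized XML in project preferences" in stderr,
        "fell_back_to_old_params": "Trying old parameters file" in stderr,
        "move_failures_by_code": dict(codes),
        "checkpoint_write_failed": "Write checkpoint failed" in stderr,
    }

# A shortened copy of the log from the post (two of the seven failures).
SAMPLE = """Unrecognized XML in project preferences: nvidia_block_amount
Skipping: 128
Trying old parameters file
Using SSE3 path
Failed to move file 'separation_checkpoint_tmp' to 'separation_checkpoint' (6801): Transaction support within the specified file system resource manager is not started or was shutdown due to an error.
Failed to move file 'separation_checkpoint_tmp' to 'separation_checkpoint' (6801): Transaction support within the specified file system resource manager is not started or was shutdown due to an error.
Write checkpoint failed"""

report = summarize_stderr(SAMPLE)
print(report)
```

If the only failures counted are the 6801 ones, the problem is more likely on the Windows file system side than in the work unit or the GPU preference, which matches the fact that a reboot cleared it for a while.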
|
8)
Message boards :
Number crunching :
MW@H Computing Failures
(Message 55454)
Posted 259 days ago by John Black
Hi, firstly Len LE/GE: yep, standard 7.0.28. I have checked both the filesystem and the hard drive: no problems (now, after the reboot). Secondly mikey: as I am set to do no GPU tasks, and SETI tasks were running OK on the same CPU whilst the MW@H tasks were erroring out, I cannot figure out why a GPU fault would give this result. When I rebooted, the MW@H tasks started to run OK, so whatever it was, it was not a permanent hardware fault. Unfortunately the error occurred overnight and I had managed to run about 20 errored tasks before I found out what was happening. Anyway, all is now OK (for the moment?). I cannot interpret the stderr results or figure out why they refer to the NVIDIA GPU. Thanks to both of you for your input. John |
|
9)
Message boards :
Number crunching :
MW@H Computing Failures
(Message 55445)
Posted 260 days ago by John Black
Hi again people, I rebooted after a power down and the first MW@H task has been running for about 30 minutes whereas they had been erroring out just after a minute. I still don't know what went wrong but perhaps there was a wee bit of corruption somewhere on my system and the power down and reboot fixed it. I would appreciate any suggestions as to what caused this. I will repost the news as it develops. Thanks for any suggestions. John |
|
10)
Message boards :
Number crunching :
MW@H Computing Failures
(Message 55442)
Posted 260 days ago by John Black
Hi people, I have had a number of computation failures over the last 24 hours. Before that, MW@H calculations went well, and I am currently running SETI with no problems. I should say that all these calculations are done on a dual-core E4700 CPU, as my graphics card is too wimpy. The stderr output looks like this:

<core_client_version>7.0.28</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
<search_application> milkyway_separation 1.00 Windows x86 double </search_application>
Unrecognized XML in project preferences: nvidia_block_amount
Skipping: 128
Skipping: /nvidia_block_amount
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Error reading astronomy parameters from file 'astronomy_parameters.txt'
Trying old parameters file
Using SSE3 path
Failed to move file 'separation_checkpoint_tmp' to 'separation_checkpoint' (6801): Transaction support within the specified file system resource manager is not started or was shutdown due to an error.
Failed to move file 'separation_checkpoint_tmp' to 'separation_checkpoint' (6801): Transaction support within the specified file system resource manager is not started or was shutdown due to an error.
Failed to move file 'separation_checkpoint_tmp' to 'separation_checkpoint' (6801): Transaction support within the specified file system resource manager is not started or was shutdown due to an error.
Failed to move file 'separation_checkpoint_tmp' to 'separation_checkpoint' (6801): Transaction support within the specified file system resource manager is not started or was shutdown due to an error.
Failed to move file 'separation_checkpoint_tmp' to 'separation_checkpoint' (6801): Transaction support within the specified file system resource manager is not started or was shutdown due to an error.
Failed to move file 'separation_checkpoint_tmp' to 'separation_checkpoint' (6801): Transaction support within the specified file system resource manager is not started or was shutdown due to an error.
Failed to move file 'separation_checkpoint_tmp' to 'separation_checkpoint' (6801): Transaction support within the specified file system resource manager is not started or was shutdown due to an error.
Failed to update checkpoint file ('separation_checkpoint_tmp' to 'separation_checkpoint') (2): No such file or directory
Write checkpoint failed
12:50:15 (4548): called boinc_finish

As I am not computer literate, I don't know whether this is a fault at my end (i.e. my PC is goosed) or an error with the workunit. Does anybody have any advice? Thanks John |