Posts by John Black
1) Message boards : Number crunching : Computer errors (Message 56007)
Posted 202 days ago by John Black
Hi,

I have also been getting "error while computing" on about 15% of my wu's. Here is the result for one of them. I will keep on trying but it looks like a bad batch of wu's to me. Does anybody see the same thing with CPU calculated wu's?

I see that the normal supply of tasks has been suspended, so maybe someone has figured this out?

Thanks
John
P.S. This looks similar to a problem that we had a few months ago.

de_separation_22_3s_free_3_1350173304_7373798_0
Workunit 261080625
Created 29 Oct 2012 | 13:37:46 UTC
Sent 29 Oct 2012 | 13:38:37 UTC
Received 29 Oct 2012 | 22:02:02 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status 1 (0x1) Unknown error number
Computer ID 174739
Report deadline 10 Nov 2012 | 13:38:37 UTC
Run time 7,488.69
CPU time 5,522.79
Validate state Invalid
Credit 0.00
Application version MilkyWay@Home v1.00
Stderr output

<core_client_version>7.0.28</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
<search_application> milkyway_separation 1.00 Windows x86 double </search_application>
Unrecognized XML in project preferences: nvidia_block_amount
Skipping: 128
Skipping: /nvidia_block_amount
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Error reading astronomy parameters from file 'astronomy_parameters.txt'
Trying old parameters file
Using SSE3 path
Integral 0 time = 3877.755773 s
Integral 1 time = 1918.157672 s
<search_application> milkyway_separation 1.00 Windows x86 double </search_application>
Unrecognized XML in project preferences: nvidia_block_amount
Skipping: 128
Skipping: /nvidia_block_amount
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Error reading astronomy parameters from file 'astronomy_parameters.txt'
Trying old parameters file
Using SSE3 path
Integral 2 time = 454.460341 s
Running likelihood with 66200 stars
Likelihood time = 0.515756 s
Non-finite result
Failed to calculate likelihood
<background_integral> 0.000354347638725 </background_integral>
<stream_integral> 0.000000000000000 284.252325256453730 750.030118419354270 </stream_integral>
<background_likelihood> -3.084160458746776 </background_likelihood>
<stream_only_likelihood> -1.#IND00000000000 -11.375050105105538 -5.712595199774682 </stream_only_likelihood>
<search_likelihood> -241.000000000000000 </search_likelihood>
21:29:52 (11232): called boinc_finish

</stderr_txt>
]]>
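
For anyone reading a result like the one above: "-1.#IND" is how older Microsoft C runtimes print an indeterminate NaN, and here it appears in the first stream_only_likelihood value, the same position where stream_integral is exactly zero. Whether the zero integral is the actual cause is only a guess. The sketch below is not part of the MilkyWay@Home application; the function names are hypothetical, and it only shows one way to scan a pasted stderr block for non-finite values.

```python
import math
import re

# Matches single-line tags such as
# "<stream_integral> 0.0 284.25 750.03 </stream_integral>".
TAG_RE = re.compile(r"<(\w+)>\s*(.*?)\s*</\1>")

def parse_value(token):
    """Convert one printed number to a float.

    Tokens containing '#' (e.g. '-1.#IND', '1.#INF', '1.#QNAN') are how older
    Microsoft C runtimes print NaN and infinity; Python's float() rejects them.
    """
    if "#" in token:
        if "INF" in token.upper():
            return -math.inf if token.startswith("-") else math.inf
        return math.nan
    return float(token)

def non_finite_fields(stderr_text):
    """Return {tag: [positions of non-finite values]} for numeric tags."""
    problems = {}
    for tag, body in TAG_RE.findall(stderr_text):
        try:
            values = [parse_value(tok) for tok in body.split()]
        except ValueError:
            continue  # not a numeric tag, e.g. <search_application>
        bad = [i for i, v in enumerate(values) if not math.isfinite(v)]
        if bad:
            problems[tag] = bad
    return problems

# Lines copied from the stderr output in this post.
sample = """\
<stream_integral> 0.000000000000000 284.252325256453730 750.030118419354270 </stream_integral>
<stream_only_likelihood> -1.#IND00000000000 -11.375050105105538 -5.712595199774682 </stream_only_likelihood>
<search_likelihood> -241.000000000000000 </search_likelihood>
"""
print(non_finite_fields(sample))  # {'stream_only_likelihood': [0]}
```

A position of 0 means the first stream, which lines up with the zero entry in stream_integral.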

3) Message boards : Number crunching : MW@H Computing Failures (Message 55510)
Posted 255 days ago by John Black
Hi Guys,

OK Mikey, fair enough. I just run a straight BOINC on a dual core CPU.

Len LE/GE: the fsutil routine with the file transaction log seems to have worked so far, as MW@H is still running perfectly 12 hours later.

Thanks guys this is what I like about BOINC, everyone is so helpful.

John
4) Message boards : Number crunching : MW@H Computing Failures (Message 55507)
Posted 256 days ago by John Black
Hi Len LE/GE,

Thanks for your informative post. I ran the Microsoft Fixit routine, "fsutil resource setautoreset true c:\", to clean out the file system transaction log.

Now everything seems to be working ok but as it is 01:35 in Scotland I am going back to my bed. If everything is ok tomorrow then I will post here to let you know.

Thanks again

John
5) Message boards : Number crunching : MW@H Computing Failures (Message 55502)
Posted 256 days ago by John Black
Hi Mikey,

I don't know the answer to your question. Is there a way in which I can find out and then post it here?

Thanks

John
6) Message boards : Number crunching : MW@H Computing Failures (Message 55498)
Posted 256 days ago by John Black
Hi again,

As before, a reboot has restored normal calculations of MW@H tasks.

I am at a total loss to figure this out, so can anyone help with interpretation of the stderr output?

Thanks
John
7) Message boards : Number crunching : MW@H Computing Failures (Message 55495)
Posted 256 days ago by John Black
Hi again people,

Since I last reported, MW@H tasks ran successfully, but today the tasks started to show "error while computing" again. As per my previous posts I have rebooted to see if that restores things and I will report here later.

My enquiry here is for someone to translate the stderr output, as I am unsure what it is indicating. I notice that it says "Unrecognized XML in project preferences: nvidia_block_amount", and as I have said I do not use my wee NVIDIA GPU to do calculations, so I cannot figure it out.

Stderr output

<core_client_version>7.0.28</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
<search_application> milkyway_separation 1.00 Windows x86 double </search_application>
Unrecognized XML in project preferences: nvidia_block_amount
Skipping: 128
Skipping: /nvidia_block_amount
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Error reading astronomy parameters from file 'astronomy_parameters.txt'
Trying old parameters file
Using SSE3 path
Failed to move file 'separation_checkpoint_tmp' to 'separation_checkpoint' (6801): Transaction support within the specified file system resource manager is not started or was shutdown due to an error.

Failed to move file 'separation_checkpoint_tmp' to 'separation_checkpoint' (6801): Transaction support within the specified file system resource manager is not started or was shutdown due to an error.

Failed to move file 'separation_checkpoint_tmp' to 'separation_checkpoint' (6801): Transaction support within the specified file system resource manager is not started or was shutdown due to an error.

Failed to move file 'separation_checkpoint_tmp' to 'separation_checkpoint' (6801): Transaction support within the specified file system resource manager is not started or was shutdown due to an error.

Failed to move file 'separation_checkpoint_tmp' to 'separation_checkpoint' (6801): Transaction support within the specified file system resource manager is not started or was shutdown due to an error.

Failed to move file 'separation_checkpoint_tmp' to 'separation_checkpoint' (6801): Transaction support within the specified file system resource manager is not started or was shutdown due to an error.

Failed to move file 'separation_checkpoint_tmp' to 'separation_checkpoint' (6801): Transaction support within the specified file system resource manager is not started or was shutdown due to an error.

Failed to update checkpoint file ('separation_checkpoint_tmp' to 'separation_checkpoint') (2): No such file or directory
Write checkpoint failed
16:54:28 (5448): called boinc_finish

</stderr_txt>
]]>


Thanks in advance for any help offered.

John
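
The log in this post shows a write-to-temporary-file-then-rename checkpoint: the application writes 'separation_checkpoint_tmp' and then tries to move it over 'separation_checkpoint'. The move fails seven times with Windows error 6801 before the task gives up. The sketch below illustrates that general pattern only; it is not the MilkyWay@Home source, and the function name, retry count, and delay are assumptions chosen to mirror what the log shows.

```python
import os
import tempfile
import time

def write_checkpoint(path, data, retries=7, delay=0.1):
    """Write data to path via a temporary file and a rename.

    retries and delay are illustrative values, not the application's settings.
    """
    tmp_path = path + "_tmp"
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # push the bytes to disk before renaming
    last_error = None
    for _ in range(retries):
        try:
            os.replace(tmp_path, path)  # replaces any existing checkpoint
            return
        except OSError as exc:  # e.g. the filesystem refuses the move
            last_error = exc
            time.sleep(delay)
    raise RuntimeError(
        f"Write checkpoint failed after {retries} attempts"
    ) from last_error

# Usage: two successive checkpoints in a scratch directory.
with tempfile.TemporaryDirectory() as scratch:
    target = os.path.join(scratch, "separation_checkpoint")
    write_checkpoint(target, b"state 1")
    write_checkpoint(target, b"state 2")
    with open(target, "rb") as f:
        print(f.read())  # b'state 2'
```

The point of the rename step is that a reader never sees a half-written checkpoint. If the filesystem itself rejects the rename, as the 6801 messages suggest happened here, no amount of retrying inside the application will help.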
8) Message boards : Number crunching : MW@H Computing Failures (Message 55454)
Posted 259 days ago by John Black
Hi,

firstly Len LE/GE: yep standard 7.0.28. I have checked both filesystem and hard drive no problems (now after reboot).

secondly mikey: as I am set to do no GPU tasks and SETI tasks were running ok on the same CPU whilst the MW@H tasks were erroring out I cannot figure out why a GPU fault would give this result.

When I rebooted the MW@H tasks started to run ok so whatever it was it was not a permanent hardware fault. Unfortunately the error occurred overnight and I had managed to run about 20 errored tasks before I found out what was happening. Anyway all is now ok (for the moment?).

I cannot interpret the stderr results or figure out why they refer to the NVIDIA GPU.

Thanks to both of you for your input.

John
9) Message boards : Number crunching : MW@H Computing Failures (Message 55445)
Posted 260 days ago by John Black
Hi again people,

I rebooted after a power down and the first MW@H task has been running for about 30 minutes whereas they had been erroring out just after a minute.

I still don't know what went wrong but perhaps there was a wee bit of corruption somewhere on my system and the power down and reboot fixed it. I would appreciate any suggestions as to what caused this.

I will repost the news as it develops.

Thanks for any suggestions.

John
10) Message boards : Number crunching : MW@H Computing Failures (Message 55442)
Posted 260 days ago by John Black
Hi people,

I have had a number of computation failures over the last 24 hours. Before that MW@H calculations went well and I am currently running SETI with no problems. I should say that all these calculations are done on a dual core E4700 CPU as my graphics card is too wimpy.

The stderr output looks like this:


<core_client_version>7.0.28</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
<search_application> milkyway_separation 1.00 Windows x86 double </search_application>
Unrecognized XML in project preferences: nvidia_block_amount
Skipping: 128
Skipping: /nvidia_block_amount
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Error reading astronomy parameters from file 'astronomy_parameters.txt'
Trying old parameters file
Using SSE3 path
Failed to move file 'separation_checkpoint_tmp' to 'separation_checkpoint' (6801): Transaction support within the specified file system resource manager is not started or was shutdown due to an error.

Failed to move file 'separation_checkpoint_tmp' to 'separation_checkpoint' (6801): Transaction support within the specified file system resource manager is not started or was shutdown due to an error.

Failed to move file 'separation_checkpoint_tmp' to 'separation_checkpoint' (6801): Transaction support within the specified file system resource manager is not started or was shutdown due to an error.

Failed to move file 'separation_checkpoint_tmp' to 'separation_checkpoint' (6801): Transaction support within the specified file system resource manager is not started or was shutdown due to an error.

Failed to move file 'separation_checkpoint_tmp' to 'separation_checkpoint' (6801): Transaction support within the specified file system resource manager is not started or was shutdown due to an error.

Failed to move file 'separation_checkpoint_tmp' to 'separation_checkpoint' (6801): Transaction support within the specified file system resource manager is not started or was shutdown due to an error.

Failed to move file 'separation_checkpoint_tmp' to 'separation_checkpoint' (6801): Transaction support within the specified file system resource manager is not started or was shutdown due to an error.

Failed to update checkpoint file ('separation_checkpoint_tmp' to 'separation_checkpoint') (2): No such file or directory
Write checkpoint failed
12:50:15 (4548): called boinc_finish


As I am not computer literate I don't know whether this is a fault at my end i.e. my pc is goosed or an error with the workunit.

Does anybody have any advice?

Thanks
John
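
On the question of whether the fault is local or in the workunit, one rough approach is to count which kinds of messages a stderr block contains. The "(6801)" lines come from the Windows filesystem layer rather than from the science calculation, which points toward the local machine, though that is a reading of the log and not a confirmed diagnosis. The sketch below is a hypothetical helper: the category labels are made up, and the marker strings are copied from stderr outputs on this page.

```python
import re

# Hypothetical categories; marker strings copied from the stderr outputs.
MARKERS = [
    ("local filesystem", re.compile(r"\(6801\)")),
    ("checkpoint write", re.compile(r"Write checkpoint failed")),
    ("numerical result", re.compile(r"Non-finite result")),
]

def triage(stderr_text):
    """Count the lines matching each marker; only non-zero counts are kept."""
    counts = {}
    for line in stderr_text.splitlines():
        for label, pattern in MARKERS:
            if pattern.search(line):
                counts[label] = counts.get(label, 0) + 1
    return counts

# Shortened excerpt of the log in this post.
excerpt = """\
Using SSE3 path
Failed to move file 'separation_checkpoint_tmp' to 'separation_checkpoint' (6801): Transaction support within the specified file system resource manager is not started or was shutdown due to an error.
Failed to move file 'separation_checkpoint_tmp' to 'separation_checkpoint' (6801): Transaction support within the specified file system resource manager is not started or was shutdown due to an error.
Write checkpoint failed
"""
print(triage(excerpt))  # {'local filesystem': 2, 'checkpoint write': 1}
```

A block that only reports "Non-finite result" and no filesystem messages would land in the "numerical result" category instead, which is more consistent with a problem in the task or the application than with the disk.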




Copyright © 2013 AstroInformatics Group