Computation errors

Author	Message
Phil · Joined: 20 Nov 12 · Posts: 3 · Credit: 4,248,153 · RAC: 0
Message 56752 - Posted: 5 Jan 2013, 19:17:51 UTC
I have been running MW@home for about six weeks with no errors or problems until now. Every task the CPU completes now ends with "computation error", but the GPU tasks are OK. It also no longer downloads new work unless I hit the Update button. My hardware hasn't changed and I haven't added or removed any programs lately. Any help would be appreciated; it's frustrating to have the computer do all the work only to get wasted results and no credit.
Phil

TeeVeeEss · Joined: 5 Nov 08 · Posts: 2 · Credit: 13,898,752 · RAC: 0
Message 56753 - Posted: 5 Jan 2013, 22:54:55 UTC
1. Set the project to No New Work (NNW).
2. When all Milkyway work is finished, reset the Milkyway project.
3. Stop BOINC.
4. Check whether the project directory (..\projects\milkyway.cs.rpi.edu_milkyway) is empty. If not, manually delete all files in that directory.
5. Shut down your computer and power it off.
6. Start your computer.
7. Start BOINC.
8. Allow new work for Milkyway.
If your N-Body CPU tasks still end in error after 0 seconds, ask someone else for help in this thread :)
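The "check whether the project directory is empty" step can be sketched in Python as below. This is a minimal sketch, not part of BOINC: the function name `leftover_files` is hypothetical, it only lists what is left and deletes nothing, and you must pass the full path to your own `projects\milkyway.cs.rpi.edu_milkyway` directory (the `..` prefix in the post above depends on where your BOINC data directory lives).

```python
from pathlib import Path


def leftover_files(project_dir):
    """Return the sorted names of entries still present in a project directory.

    Hypothetical helper: an empty list means the directory is empty (or does
    not exist), so the reset step is complete. Nothing is deleted here.
    """
    path = Path(project_dir)
    if not path.is_dir():
        return []
    # iterdir() yields files and subdirectories directly inside the directory.
    return sorted(entry.name for entry in path.iterdir())
```

Run it with BOINC stopped; any names it returns are the files to remove by hand before restarting the computer and BOINC.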

Phil · Joined: 20 Nov 12 · Posts: 3 · Credit: 4,248,153 · RAC: 0
Message 56754 - Posted: 5 Jan 2013, 23:02:54 UTC - in response to Message 56753.
Thanks for the info. I tried it, but I'm still having the same problem. Thank you for your help, though.
Phil

mikey · Joined: 8 May 09 · Posts: 3339 · Credit: 524,010,781 · RAC: 0
Message 56763 - Posted: 6 Jan 2013, 13:18:00 UTC - in response to Message 56754.
Read some of the other threads; lots of people are having problems with the new 1.04 CPU units. This thread in particular: http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=3102&nowrap=true#56727

Ronald R Codney · Joined: 29 Nov 11 · Posts: 18 · Credit: 815,433 · RAC: 0
Message 56769 - Posted: 6 Jan 2013, 17:37:52 UTC
On an N-Body error, this is the info sent back to me:
<core_client_version>7.0.28</core_client_version>
<![CDATA[
<message>
 - exit code -1073740940 (0xc0000374)
</message>
<stderr_txt>
<search_application> milkyway_nbody 1.04 Windows x86_64 double OpenMP, Crlibm </search_application>
Using OpenMP 1 max threads on a system with 4 processors
Warning: not applying timestep correction for workunit with min version 0.80
Using OpenMP 1 max threads on a system with 4 processors
Using OpenMP 1 max threads on a system with 4 processors
<search_likelihood>-62200.827114903703000</search_likelihood>
</stderr_txt>
]]>
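The exit code in this log is the Windows status value printed as a signed 32-bit integer. Reinterpreting -1073740940 as unsigned gives 0xC0000374, which Windows defines as STATUS_HEAP_CORRUPTION; that matches the heap-corruption diagnosis later in this thread. A minimal Python sketch of the conversion follows; `decode_exit_code` and its one-entry lookup table are hypothetical illustrations, not part of BOINC or the N-Body application.

```python
# Hypothetical lookup table: only the one status code seen in this thread.
KNOWN_NTSTATUS = {
    0xC0000374: "STATUS_HEAP_CORRUPTION",
}


def decode_exit_code(code, bits=32):
    """Reinterpret a signed exit code as an unsigned value of the given width."""
    # Masking with 2**bits - 1 yields the two's-complement bit pattern.
    unsigned = code & ((1 << bits) - 1)
    return unsigned, KNOWN_NTSTATUS.get(unsigned, "unknown")


value, name = decode_exit_code(-1073740940)
print(f"0x{value:08x} {name}")  # 0xc0000374 STATUS_HEAP_CORRUPTION
```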

mikey · Joined: 8 May 09 · Posts: 3339 · Credit: 524,010,781 · RAC: 0
Message 56779 - Posted: 7 Jan 2013, 12:18:36 UTC - in response to Message 56769.
Look in the N-Body thread under News; some admins are responding in that thread.

Kanyo Anastasov · Joined: 26 Jun 09 · Posts: 1 · Credit: 1,331,983 · RAC: 0
Message 56808 - Posted: 10 Jan 2013, 8:24:02 UTC
I have computation errors on 6 WUs from N-Body Simulation v1.04. Is this an issue with the application or with my computer?
<core_client_version>7.0.28</core_client_version>
<![CDATA[
<message>
 - exit code -1073740940 (0xc0000374)
</message>
<stderr_txt>
<search_application> milkyway_nbody 1.04 Windows x86_64 double OpenMP, Crlibm </search_application>
Using OpenMP 1 max threads on a system with 4 processors
Warning: not applying timestep correction for workunit with min version 0.80
Using OpenMP 1 max threads on a system with 4 processors
Number of particles in bins is very small compared to total. (334 << 100000). Skipping distance calculation
<search_likelihood>-194.117647058823540</search_likelihood>
</stderr_txt>
]]>

Richard Haselgrove · Joined: 4 Sep 12 · Posts: 219 · Credit: 456,474 · RAC: 0
Message 56809 - Posted: 10 Jan 2013, 9:07:35 UTC - in response to Message 56808.
I have a suspicion that the N-Body application fails with exit code -1073740940 if it was stopped and restarted (from the checkpoint file) during the course of a run. Somebody help me test this, please?

Alinator · Joined: 7 Jun 08 · Posts: 464 · Credit: 56,639,936 · RAC: 0
Message 56819 - Posted: 11 Jan 2013, 11:07:05 UTC - in response to Message 56809. Last modified: 11 Jan 2013, 11:25:33 UTC
Hmmm... OK, I have an nBody task running on the host I'm posting from, so to test that I forced the BOINC service to restart (to make sure all tasks had to be unloaded from memory), and the nBody task restarted from its checkpoint without a problem. So I don't think restarting from a checkpoint is the problem, at least not in and of itself.
Al

Richard Haselgrove · Joined: 4 Sep 12 · Posts: 219 · Credit: 456,474 · RAC: 0
Message 56821 - Posted: 11 Jan 2013, 11:32:14 UTC - in response to Message 56819.
Any error will only reveal itself at the end of the run, as heap corruption detected during the memory deallocation and cleanup phase.

Alinator · Joined: 7 Jun 08 · Posts: 464 · Credit: 56,639,936 · RAC: 0
Message 56823 - Posted: 11 Jan 2013, 12:05:20 UTC - in response to Message 56821. Last modified: 11 Jan 2013, 12:15:11 UTC
So in other words, I'm going to have to wait until the run completes, correct? If so, here's a link to the task in case I'm not around to catch the results: http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=293818334
Estimated time to completion is 25.5 hours at this point (but increasing).
Al

Richard Haselgrove · Joined: 4 Sep 12 · Posts: 219 · Credit: 456,474 · RAC: 0
Message 56824 - Posted: 11 Jan 2013, 13:16:39 UTC
OK, I sacrificed a short one to the cause: task 379042021. I stopped BOINC at about 75-80%, so the task was flushed from memory, and then restarted it. It ran for another 20 minutes after the restart and then failed with the expected 0xc0000374 error, at about the point where I expected it to reach 100%.

Alinator · Joined: 7 Jun 08 · Posts: 464 · Credit: 56,639,936 · RAC: 0
Message 56827 - Posted: 11 Jan 2013, 13:39:57 UTC - in response to Message 56824.
LOL... Oh great! :-) The one I was messing with looks to be a long one. However, that was the XPP-64 host, and I'm pretty sure I have restarted BOINC, rebooted the machine, etc. while an nBody task was running during this latest test run. So hopefully XPP is immune to this restart-from-checkpoint heap corruption problem.

Alinator · Joined: 7 Jun 08 · Posts: 464 · Credit: 56,639,936 · RAC: 0
Message 56839 - Posted: 11 Jan 2013, 21:55:17 UTC Last modified: 11 Jan 2013, 21:56:30 UTC
OK, the task I was testing on and linked to in the earlier post completed and validated with no issues.

Overtonesinger · Joined: 15 Feb 10 · Posts: 63 · Credit: 1,836,010 · RAC: 0
Message 57439 - Posted: 7 Mar 2013, 8:42:26 UTC
Suddenly, all separation tasks I get are erroring out immediately on this computer! 6 tasks, then another 6 tasks, etc.... :O http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=344187
Even after resetting the project! ... Is there a "bad" series of WUs? What's happening? :(

Len LE/GE · Joined: 8 Feb 08 · Posts: 261 · Credit: 104,050,322 · RAC: 0
Message 57440 - Posted: 7 Mar 2013, 10:15:04 UTC - in response to Message 57439.
Client state: Compute error
Exit status: -185 (0xffffffffffffff47) ERR_RESULT_START
<core_client_version>7.0.44</core_client_version>
<![CDATA[
<message>
couldn't start app: CreateProcess() failed - Klient není držitelem požadovaného oprávnění [A required privilege is not held by the client]. (0x522)
</message>
]]>
Your computer has a problem starting the MW app. Do you have problems with other projects on this computer too?
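Both numbers in this result decode the same way as the N-Body exit code. -185 is BOINC's ERR_RESULT_START printed as a signed value, and the hex form beside it is its 64-bit two's-complement bit pattern. 0x522 is Win32 error 1314, ERROR_PRIVILEGE_NOT_HELD, which matches the translated CreateProcess() message and points at account rights rather than at the workunits. A minimal Python sketch; `to_unsigned_hex` and the one-entry table are hypothetical helpers, not BOINC code.

```python
def to_unsigned_hex(code, bits):
    """Format a signed integer as its unsigned two's-complement hex pattern."""
    # bits // 4 hex digits, zero-padded, e.g. 16 digits for a 64-bit value.
    return f"0x{code & ((1 << bits) - 1):0{bits // 4}x}"


# Win32 error code from the CreateProcess() failure above.
WIN32_ERRORS = {0x522: "ERROR_PRIVILEGE_NOT_HELD"}

print(to_unsigned_hex(-185, 64))   # 0xffffffffffffff47
print(0x522, WIN32_ERRORS[0x522])  # 1314 ERROR_PRIVILEGE_NOT_HELD
```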

Carsten Milkau · Joined: 10 Feb 13 · Posts: 6 · Credit: 1,994,863 · RAC: 0
Message 57465 - Posted: 9 Mar 2013, 20:59:12 UTC Last modified: 9 Mar 2013, 21:06:38 UTC
Got a different kind of error, which makes every WU fail immediately:
<core_client_version>7.0.29</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)
</message>
<stderr_txt>
<search_application> milkyway_separation 1.02 Linux x86_64 double OpenCL </search_application>
Unrecognized XML in project preferences: max_gfx_cpu_pct
Skipping: 20
Skipping: /max_gfx_cpu_pct
Unrecognized XML in project preferences: allow_non_preferred_apps
Skipping: 1
Skipping: /allow_non_preferred_apps
Unrecognized XML in project preferences: nbody_graphics_poll_period
Skipping: 30
Skipping: /nbody_graphics_poll_period
Unrecognized XML in project preferences: nbody_graphics_float_speed
Skipping: 5
Skipping: /nbody_graphics_float_speed
Unrecognized XML in project preferences: nbody_graphics_textured_point_size
Skipping: 250
Skipping: /nbody_graphics_textured_point_size
Unrecognized XML in project preferences: nbody_graphics_point_point_size
Skipping: 40
Skipping: /nbody_graphics_point_point_size
BOINC GPU type suggests using OpenCL vendor 'NVIDIA Corporation'
Setting process priority to 0 (13): Permission denied
Opening Lua script 'astronomy_parameters.txt' (2): No such file or directory
Error reading astronomy parameters from file 'astronomy_parameters.txt'
Trying old parameters file
Opening astronomy parameters file 'astronomy_parameters.txt' (2): No such file or directory
Failed to read parameters file
20:40:09 (12990): called boinc_finish
</stderr_txt>
]]>
I completely removed and re-initialized the project several times without any luck. Apparently GPU tasks need this file but don't download it.
http://milkyway.cs.rpi.edu/milkyway/result.php?resultid=415923043
http://milkyway.cs.rpi.edu/milkyway/results.php?userid=765878&state=5
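The numbers in parentheses in this log ("(13): Permission denied", "(2): No such file or directory") look like standard C errno values; that reading is an assumption, since the application source isn't quoted in the thread. Python's standard `errno` and `os` modules can confirm the mapping: 2 is ENOENT, so the parameter file was genuinely absent rather than merely unreadable, and 13 is EACCES, from the priority change.

```python
import errno
import os

# Numbers copied from the parenthesised codes in the stderr log above.
for code in (13, 2):
    # errno.errorcode maps a number to its symbolic name; os.strerror returns
    # the platform's message text for the same number.
    print(code, errno.errorcode[code], os.strerror(code))
```

On Linux this prints the same messages the application logged ("Permission denied", "No such file or directory"); the exact wording of `os.strerror` can vary by platform.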

Overtonesinger · Joined: 15 Feb 10 · Posts: 63 · Credit: 1,836,010 · RAC: 0
Message 57466 - Posted: 9 Mar 2013, 22:30:40 UTC - in response to Message 57440.
Yep! Same error with all projects! I reinstalled the STABLE version, and the issue is resolved! :) (So the special users for running BOINC apps and for BOINC administration were probably re-created with the proper rights. It is still unknown how or why they were damaged: either some Windows update... or maybe a mass user-rights change made by the company domain administrator.)

TJ · Joined: 12 Aug 09 · Posts: 262 · Credit: 92,631,041 · RAC: 0
Message 57467 - Posted: 9 Mar 2013, 23:24:53 UTC Last modified: 9 Mar 2013, 23:26:54 UTC
All tasks error out immediately with BOINC 7.0.28. See this one: 416025759 (I can't make it clickable). Is it something on my end? Einstein@home is now running on the GPUs and Rosetta@home on 5 CPUs, leaving one CPU free. This is the error message (for all of the last 33 tasks):
Stderr output
<core_client_version>7.0.28</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
BOINC: parse gpu_opencl_dev_index 0
<search_application> milkyway_separation 1.02 Windows x86_64 double OpenCL </search_application>
Unrecognized XML in project preferences: nvidia_block_amount
Skipping: 128
Skipping: /nvidia_block_amount
BOINC GPU type suggests using OpenCL vendor 'Advanced Micro Devices, Inc.'
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Error reading astronomy parameters from file 'astronomy_parameters.txt'
Trying old parameters file
Using SSE4.1 path
Found 1 platform
Platform 0 information:
Name: AMD Accelerated Parallel Processing
Version: OpenCL 1.2 AMD-APP (1124.2)
Vendor: Advanced Micro Devices, Inc.
Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices cl_khr_d3d10_sharing cl_khr_d3d11_sharing
Profile: FULL_PROFILE
Using device 0 on platform 0
Found 2 CL devices
Device 'Cypress' (Advanced Micro Devices, Inc.:0x1002) (CL_DEVICE_TYPE_GPU)
Driver version: 1124.2 (VM)
Version: OpenCL 1.2 AMD-APP (1124.2)
Compute capability: 0.0
Max compute units: 20
Clock frequency: 850 Mhz
Global mem size: 1073741824
Local mem size: 32768
Max const buf size: 65536
Double extension: cl_khr_fp64
Build log:
--------------------------------------------------------------------------------
"C:\Users\TJ\AppData\Local\Temp\OCL4D8.tmp.cl", line 30: warning: OpenCL extension is now part of core
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
^
--------------------------------------------------------------------------------
clBuildProgram: Build failure (-11): CL_BUILD_PROGRAM_FAILURE
Error building program from source (-11): CL_BUILD_PROGRAM_FAILURE
Error creating integral program from source
Failed to calculate likelihood
<background_integral> 1.#QNAN0000000000 </background_integral>
<stream_integral> 1.#QNAN0000000000 1.#QNAN0000000000 1.#QNAN0000000000 </stream_integral>
<background_likelihood> 1.#QNAN0000000000 </background_likelihood>
<stream_only_likelihood> 1.#QNAN0000000000 1.#QNAN0000000000 1.#QNAN0000000000 </stream_only_likelihood>
<search_likelihood> 1.#QNAN0000000000 </search_likelihood>
00:10:58 (3332): called boinc_finish
</stderr_txt>
]]>
Greetings from,
TJ

mikey · Joined: 8 May 09 · Posts: 3339 · Credit: 524,010,781 · RAC: 0
Message 57471 - Posted: 10 Mar 2013, 11:22:56 UTC - in response to Message 57467. Last modified: 10 Mar 2013, 11:23:30 UTC
Are you averse to upgrading to BOINC version 7.0.52? It fixed my problems at another project, and for me it seems to be stable. You can get it here: http://boinc.berkeley.edu/dl/?C=M;O=D
I see 7.0.54 is now available; here are the changes: http://boinc.berkeley.edu/dev/forum_thread.php?id=6698