Computation errors
Author | Message |
---|---|
Joined: 20 Nov 12 Posts: 3 Credit: 4,248,153 RAC: 0 |
I have been running MW@home for about six weeks with no errors or problems until now. Every task the CPU completes now ends with "Computation error", but the GPU tasks are OK. It also no longer downloads new work unless I hit the Update button. My computer hardware hasn't changed, and I have not added or deleted any programs lately. Any help would be appreciated; it's frustrating to have the computer do all the work only to get wasted results and no credit. Phil |
Joined: 5 Nov 08 Posts: 2 Credit: 13,898,752 RAC: 0 |
First, set the project to No New Work (NNW). When all MilkyWay work is finished, reset the MilkyWay project. Stop BOINC. Check whether the project directory (..\projects\milkyway.cs.rpi.edu_milkyway) is empty; if not, manually delete all files in that directory. Shut down your computer and power it off. Start your computer. Start BOINC. Allow new work for MilkyWay. If your N-Body CPU tasks still end in error after 0 seconds, ask someone else for help in this thread :) |
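For readers who prefer the command line, the first steps of that sequence can be sketched in Python driving `boinccmd`, the BOINC command-line tool. This is a minimal sketch under assumptions the post does not state: `boinccmd` is on the PATH and can reach the running client, `MW_URL` (the project master URL) is assumed, and the BOINC data directory that `..` stands for is left for you to supply. Shutting down, powering off and restarting stay manual.

```python
import subprocess
from pathlib import Path

# Assumed master URL; compare it with the URL your client shows for the project.
MW_URL = "http://milkyway.cs.rpi.edu/milkyway/"


def project_cmd(op: str, url: str = MW_URL) -> list[str]:
    """Build a `boinccmd --project URL OP` command, e.g. OP = nomorework or reset."""
    return ["boinccmd", "--project", url, op]


def reset_steps(url: str = MW_URL) -> list[list[str]]:
    """Commands for the first steps of the post, in order.

    Wait for all MilkyWay tasks to finish and report between the first
    command (No New Work) and the second (reset).
    """
    return [
        project_cmd("nomorework", url),
        project_cmd("reset", url),
        ["boinccmd", "--quit"],  # stop the BOINC client
    ]


def leftover_files(project_dir: Path) -> list[Path]:
    """Files still in the project directory; empty if it is clean or missing."""
    if not project_dir.is_dir():
        return []
    return sorted(project_dir.iterdir())


def run(cmd: list[str]) -> None:
    """Run one boinccmd command, raising CalledProcessError if it fails."""
    subprocess.run(cmd, check=True)
```

Typical use: call `run()` on each command from `reset_steps()` in turn, then pass `leftover_files()` the `projects/milkyway.cs.rpi.edu_milkyway` folder under your own BOINC data directory. After power-cycling and restarting BOINC, `run(project_cmd("allowmorework"))` re-enables downloads.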
Joined: 20 Nov 12 Posts: 3 Credit: 4,248,153 RAC: 0 |
Thanks for the info. I tried it but I'm still having the same problem. Thank you for your help though. Phil |
Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0 |
Quote: "Thanks for the info. I tried it but I'm still having the same problem. Thank you for your help though." Read some of the other threads; lots of people are having problems with the new 1.04 CPU units, this thread in particular: http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=3102&nowrap=true#56727 |
Joined: 29 Nov 11 Posts: 18 Credit: 815,433 RAC: 0 |
On an N-Body error, this is the info sent back to me: <core_client_version>7.0.28</core_client_version> <![CDATA[ <message> - exit code -1073740940 (0xc0000374) </message> <stderr_txt> <search_application> milkyway_nbody 1.04 Windows x86_64 double OpenMP, Crlibm </search_application> Using OpenMP 1 max threads on a system with 4 processors Warning: not applying timestep correction for workunit with min version 0.80 Using OpenMP 1 max threads on a system with 4 processors Using OpenMP 1 max threads on a system with 4 processors <search_likelihood>-62200.827114903703000</search_likelihood> </stderr_txt> ]]> |
Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0 |
Quote: "On an N-Body error, this is the info sent back to me:" Look in the N-Body thread under News; some admins are responding in that thread. |
Joined: 26 Jun 09 Posts: 1 Credit: 1,331,983 RAC: 0 |
I have computation errors on 6 WUs from N-Body Simulation v1.04. Is this an issue with the application or with my computer? <core_client_version>7.0.28</core_client_version> <![CDATA[ <message> - exit code -1073740940 (0xc0000374) </message> <stderr_txt> <search_application> milkyway_nbody 1.04 Windows x86_64 double OpenMP, Crlibm </search_application> Using OpenMP 1 max threads on a system with 4 processors Warning: not applying timestep correction for workunit with min version 0.80 Using OpenMP 1 max threads on a system with 4 processors Number of particles in bins is very small compared to total. (334 << 100000). Skipping distance calculation <search_likelihood>-194.117647058823540</search_likelihood> </stderr_txt> ]]> |
Joined: 4 Sep 12 Posts: 219 Credit: 456,474 RAC: 0 |
I have a suspicion that the N-Body application fails with exit code -1073740940 if it was stopped and restarted (from the checkpoint file) during the course of a run. Somebody help me test this, please? |
Joined: 7 Jun 08 Posts: 464 Credit: 56,639,936 RAC: 0 |
Quote: "I have a suspicion that the N-Body application fails with exit code -1073740940 if it was stopped and restarted (from the checkpoint file) during the course of a run." Hmmm... OK, I have an N-Body task running on the host I'm posting from, so to test that I just forced the BOINC service to restart (to ensure that all tasks had to be unloaded from memory), and the N-Body task restarted from checkpoint without a problem. So I don't think restarting from checkpoint is the problem, at least not in and of itself. Al |
Joined: 4 Sep 12 Posts: 219 Credit: 456,474 RAC: 0 |
Quote: "I have a suspicion that the N-Body application fails with exit code -1073740940 if it was stopped and restarted (from the checkpoint file) during the course of a run." The error won't show up at the restart itself: it reveals itself as heap corruption detected during the memory deallocation and cleanup phase at the end of the run. |
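The exit code in these logs is printed twice, as a signed decimal (-1073740940) and in hex (0xc0000374). A small Python sketch shows the arithmetic linking the two: masking the signed value to its low 32 bits gives the unsigned status, and 0xC0000374 is the Windows status STATUS_HEAP_CORRUPTION, which fits the heap-corruption explanation above. The lookup table holds only that one code; it is an illustration, not a complete list.

```python
# Illustrative lookup holding only the Windows status code discussed in this thread.
NTSTATUS_NAMES = {0xC0000374: "STATUS_HEAP_CORRUPTION"}


def to_unsigned(code: int, bits: int = 32) -> int:
    """Reinterpret a signed exit code as the unsigned value with the same bit pattern."""
    return code & ((1 << bits) - 1)


def describe_exit_code(code: int) -> str:
    """Pair the decimal exit code with its 32-bit hex form and, if known, its name."""
    status = to_unsigned(code)
    name = NTSTATUS_NAMES.get(status, "unknown status")
    return f"{code} = 0x{status:08X} ({name})"


print(describe_exit_code(-1073740940))
# -1073740940 = 0xC0000374 (STATUS_HEAP_CORRUPTION)
```

The same masking with `bits=64` turns the "Exit status -185" reported later in this thread into 0xffffffffffffff47, the hex value shown beside it.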
Joined: 7 Jun 08 Posts: 464 Credit: 56,639,936 RAC: 0 |
So IOW, I'm going to have to wait until the run completes, correct? If so, here's a link to the task in case I'm not around to catch the results. http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=293818334 Estimated ToC is 25.5 hours at this point (but increasing). Al |
Joined: 4 Sep 12 Posts: 219 Credit: 456,474 RAC: 0 |
OK, I sacrificed a short one to the cause: task 379042021. I stopped BOINC at about 75-80%, so the task was flushed from memory, and then restarted it. It ran for another 20 minutes after the restart and then failed with the expected 0xc0000374 error, right when I was expecting it to reach 100%. |
Joined: 7 Jun 08 Posts: 464 Credit: 56,639,936 RAC: 0 |
Quote: "OK, I sacrificed a short one to the cause: task 379042021." LOL... Oh great! :-) The one I was messing with looks to be a long one. However, this is the XPP-64 host, and I'm pretty sure I have restarted BOINC, rebooted the machine, etc. while an N-Body task was running during this latest test run. So hopefully XPP is immune to this restart-from-checkpoint heap corruption problem. |
Joined: 7 Jun 08 Posts: 464 Credit: 56,639,936 RAC: 0 |
OK, the task I was testing on and linked to in the earlier post completed and validated with no issues. |
Joined: 15 Feb 10 Posts: 63 Credit: 1,836,010 RAC: 0 |
Suddenly, all Separation tasks I get are erroring out immediately on this computer! 6 tasks, then another 6 tasks, etc.... :O http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=344187 This happens even after resetting the project! Is there a "bad" series of WUs? What's happening? :( |
Joined: 8 Feb 08 Posts: 261 Credit: 104,050,322 RAC: 0 |
Quote: "Suddenly, all Separation tasks I get are erroring out immediately on this computer! 6 tasks, then another 6 tasks, etc.... :O" Client state Compute error Exit status -185 (0xffffffffffffff47) ERR_RESULT_START <core_client_version>7.0.44</core_client_version> <![CDATA[ <message> couldn't start app: CreateProcess() failed - Klient není držitelem požadovaného oprávnění. (0x522) </message> ]]> (The Czech message means "A required privilege is not held by the client.") Your computer has a problem starting the MW app. Do you have problems with other projects on this computer too? |
Joined: 10 Feb 13 Posts: 6 Credit: 1,994,863 RAC: 0 |
Got a different kind of error, which fails all WUs immediately: <core_client_version>7.0.29</core_client_version> I completely removed and re-initialized the project several times without any luck. Apparently GPU tasks need this file but don't download it. http://milkyway.cs.rpi.edu/milkyway/result.php?resultid=415923043 http://milkyway.cs.rpi.edu/milkyway/results.php?userid=765878&state=5 |
Joined: 15 Feb 10 Posts: 63 Credit: 1,836,010 RAC: 0 |
Yep! Same error with all projects! I reinstalled the STABLE BOINC version: issue resolved! :) (Probably the special users for running BOINC apps and for admin.boinc ... were re-created with the proper rights. It is still unknown how or why they were damaged: either some Windows update... or *maybe* a mass user-rights change made by the company domain administrator.) |
Joined: 12 Aug 09 Posts: 262 Credit: 92,631,041 RAC: 0 |
All tasks error out immediately with BOINC 7.0.28. See, for example, 416025759 (I can't make it clickable). Is it something on my end? Einstein@home is now running on the GPUs and Rosetta@home on 5 CPUs, leaving one CPU free. This is the error message (for all of the last 33 tasks): Stderr output <core_client_version>7.0.28</core_client_version> <![CDATA[ <message> Incorrect function. (0x1) - exit code 1 (0x1) </message> <stderr_txt> BOINC: parse gpu_opencl_dev_index 0 <search_application> milkyway_separation 1.02 Windows x86_64 double OpenCL </search_application> Unrecognized XML in project preferences: nvidia_block_amount Skipping: 128 Skipping: /nvidia_block_amount BOINC GPU type suggests using OpenCL vendor 'Advanced Micro Devices, Inc.' Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4' Error reading astronomy parameters from file 'astronomy_parameters.txt' Trying old parameters file Using SSE4.1 path Found 1 platform Platform 0 information: Name: AMD Accelerated Parallel Processing Version: OpenCL 1.2 AMD-APP (1124.2) Vendor: Advanced Micro Devices, Inc. Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices cl_khr_d3d10_sharing cl_khr_d3d11_sharing Profile: FULL_PROFILE Using device 0 on platform 0 Found 2 CL devices Device 'Cypress' (Advanced Micro Devices, Inc.:0x1002) (CL_DEVICE_TYPE_GPU) Driver version: 1124.2 (VM) Version: OpenCL 1.2 AMD-APP (1124.2) Compute capability: 0.0 Max compute units: 20 Clock frequency: 850 Mhz Global mem size: 1073741824 Local mem size: 32768 Max const buf size: 65536 Double extension: cl_khr_fp64 Build log: -------------------------------------------------------------------------------- "C:\Users\TJ\AppData\Local\Temp\OCL4D8.tmp.cl", line 30: warning: OpenCL extension is now part of core #pragma OPENCL EXTENSION cl_khr_fp64 : enable ^ -------------------------------------------------------------------------------- clBuildProgram: Build failure (-11): CL_BUILD_PROGRAM_FAILURE Error building program from source (-11): CL_BUILD_PROGRAM_FAILURE Error creating integral program from source Failed to calculate likelihood <background_integral> 1.#QNAN0000000000 </background_integral> <stream_integral> 1.#QNAN0000000000 1.#QNAN0000000000 1.#QNAN0000000000 </stream_integral> <background_likelihood> 1.#QNAN0000000000 </background_likelihood> <stream_only_likelihood> 1.#QNAN0000000000 1.#QNAN0000000000 1.#QNAN0000000000 </stream_only_likelihood> <search_likelihood> 1.#QNAN0000000000 </search_likelihood> 00:10:58 (3332): called boinc_finish </stderr_txt> ]]> Greetings from TJ |
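Logs like this one bury the decisive lines among a lot of device information. A minimal Python sketch for scanning a task's stderr for the failure markers that appear in the logs quoted in this thread follows. The marker strings are copied from those logs; the labels are interpretive summaries, not official error messages, and the `triage` helper is hypothetical, not part of BOINC or MilkyWay@home.

```python
import re

# Marker strings copied from the stderr excerpts in this thread; the labels
# are interpretive summaries, not official error descriptions.
MARKERS = {
    "CL_BUILD_PROGRAM_FAILURE": "OpenCL kernel failed to compile on the GPU",
    "Error reading astronomy parameters": "Lua parameter parse failed (app tries the old format)",
    "1.#QNAN": "likelihood values are NaN, so no usable result",
    "couldn't start app": "client could not launch the application",
}

# Matches e.g. "exit code 1 (0x1)" or "exit code -1073740940 (0xc0000374)".
EXIT_CODE_RE = re.compile(r"exit code (-?\d+) \((0x[0-9a-fA-F]+)\)")


def triage(stderr: str) -> list[str]:
    """Return a short label for each known failure marker found in a stderr dump."""
    findings = [label for marker, label in MARKERS.items() if marker in stderr]
    match = EXIT_CODE_RE.search(stderr)
    if match:
        findings.append(f"exit code {match.group(1)} ({match.group(2)})")
    return findings


# Usage on a trimmed excerpt of the log above.
sample = ("<message> Incorrect function. (0x1) - exit code 1 (0x1) </message> "
          "clBuildProgram: Build failure (-11): CL_BUILD_PROGRAM_FAILURE "
          "<search_likelihood> 1.#QNAN0000000000 </search_likelihood>")
for finding in triage(sample):
    print(finding)
```

Plain substring checks keep the sketch simple; the markers are exact strings from the logs, so no fuzzy matching is needed here.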
Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0 |
Quote: "All tasks error out immediately with BOINC 7.0.28. See, for example, 416025759 (I can't make it clickable)." Are you averse to upgrading to BOINC version 7.0.52? It fixed my problems at another project and seems stable for me. You can get it here: http://boinc.berkeley.edu/dl/?C=M;O=D I see 7.0.54 is now available; here are the changes: http://boinc.berkeley.edu/dev/forum_thread.php?id=6698 |
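To check which client version produced a failing result, look at the `<core_client_version>` tag at the top of each stderr dump. A small Python sketch, assuming the tag format seen in the logs above and using 7.0.52 as the threshold only because it is the version suggested in this reply, compares versions as integer tuples, because plain string comparison would rank "7.0.6" above "7.0.52".

```python
import re
from typing import Optional, Tuple

# Threshold taken from the reply above; a suggestion, not a project requirement.
SUGGESTED = (7, 0, 52)

VERSION_RE = re.compile(r"<core_client_version>(\d+)\.(\d+)\.(\d+)</core_client_version>")


def client_version(stderr: str) -> Optional[Tuple[int, int, int]]:
    """Extract the BOINC client version from a stderr dump, or None if the tag is absent."""
    match = VERSION_RE.search(stderr)
    if match is None:
        return None
    major, minor, patch = (int(g) for g in match.groups())
    return (major, minor, patch)


def older_than_suggested(stderr: str) -> bool:
    """True when the dump names a client older than SUGGESTED."""
    version = client_version(stderr)
    return version is not None and version < SUGGESTED
```

An older client only makes an upgrade worth trying; the reply reports that 7.0.52 fixed problems at another project, not that it cures this OpenCL build failure.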
©2025 Astroinformatics Group