Computation errors?
Author | Message |
---|---|
Send message Joined: 25 Feb 09 Posts: 82 Credit: 15,824,247 RAC: 0 |
Hi, it seems there's something wrong here. No idea if it's on my side or on MW's side, but I get computation errors extremely often. If I don't get those, then I get invalid results. What's going on? Out of 7 WUs, only one has been successful thus far. This is on a 64-bit Linux system. All my other projects (einstein, qmc, rosetta and cosmology@home) compute correctly, so I suspect it's not something on my side. I'm really thinking of just suspending MW as I can't seem to get decent outcomes. Any ideas? http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=112757 |
Send message Joined: 8 May 10 Posts: 576 Credit: 15,979,383 RAC: 0 |
Hi, this is a limitation of the N-body treecode that can be hit in rare circumstances, which I believe is usually related to precision issues. I've been debating treating it as non-fatal. You're supposed to get most of the credit when this happens, but something looks wrong with that. As for other Linux-specific errors, there is currently a random crash in the BOINC libraries that sometimes happens. The dev list says it may have already been fixed for the next release. |
Send message Joined: 25 Feb 09 Posts: 82 Credit: 15,824,247 RAC: 0 |
Hi, thanks for the answer. Well, it's not that rare over here. As I said, I've been crunching for 4 days now and only one out of all WUs has succeeded. And yeah, I don't get credit for those errors, so in the end I'm just burning CPU time for nothing, which I could spend instead on something less broken :/ Any ideas if I can do something to fix it? You mention a random crash in the libs. Does this crash bring down the GUI? Because if it does, I haven't experienced it thus far. |
Send message Joined: 8 May 10 Posts: 576 Credit: 15,979,383 RAC: 0 |
"Thanks for the answer. Well, it's not that rare over here. As I said, I've been crunching for 4 days now and only one out of all WUs has succeeded." That's rather odd. I'm still not exactly clear on why this can happen. It's been my main question about it since I started working on it. "Any ideas if I can do something to fix it?" Nope. It's on this end. "You mention a random crash in the libs. Does this crash bring down the GUI? Because if it does, I haven't experienced it thus far." No, it's just that the majority of crashes I see now in the database are random segfaults on Linux. I asked about it on the BOINC dev list, and apparently Einstein@home has been having this problem for a while. |
Send message Joined: 25 Feb 09 Posts: 82 Credit: 15,824,247 RAC: 0 |
Hmm, interesting, because I've got only 2 invalid WUs for einstein (it seems only the S5 searches are affected, though some succeed; ABP succeeds each time). Most of the rest are still pending, while the others have succeeded: http://einstein.phys.uwm.edu/results.php?userid=198995 About MW, this is how it goes here. Either it crunches and at some point I see a computation error in BOINC (it usually appears near the end, say between 80-90%), or it fully succeeds, but when I go to my tasks on here, I see that after a while it gets marked as invalid. I've suspended MW for the time being until this can be resolved. No point in wasting cycles for nothing :/ If it matters, my system is a Phenom II X4 955 (no OC or anything), currently without a CUDA GPU, but I'll add one soon. |
Send message Joined: 31 Oct 10 Posts: 137 Credit: 3,755,067 RAC: 0 |
I got a computation error only because I shut down the BOINC Manager. 6.10.58 is running. I'm very surprised. Also, the units were still running after the shutdown. I had to cancel them in Task Manager. Vista 64 Ultimate. |
Send message Joined: 2 Jun 09 Posts: 1 Credit: 20,555,072 RAC: 3 |
I am also getting many computation errors on my machines. One machine in particular errors immediately while the others make it part way through computations before failing. In checking results, my wingmen often fail computing the same workunit. --Art |
Send message Joined: 14 Dec 08 Posts: 2 Credit: 9,981,120 RAC: 0 |
|
Send message Joined: 30 Dec 07 Posts: 311 Credit: 149,490,184 RAC: 0 |
Fermi class NVIDIA cards are not supported by the default MilkyWay CUDA application. You need to download and install version 0.24: Fermi CUDA (MW_0.24_CUDA.zip) modified by Crunch3r from http://www.arkayn.us/milkyway 1) Stop BOINC by going to the advanced menu and selecting "Shut down connected client", click okay and then click cancel. Then exit out of the BOINC Manager. 2) Unzip the MW_0.24_CUDA.zip file you downloaded. 3) Copy the 4 files (app_info.xml, milkyway_windows_intelx86__cuda23.exe, cutil.dll, cudart.dll) to your BOINC\projects\milkyway.cs.rpi.edu_milkyway folder. BOINC data folder locations: Windows 98/SE/ME: C:\Windows\All Users\BOINC\ or C:\Windows\Profiles\All Users\BOINC\ Windows 2000/XP: C:\Documents and Settings\All Users\Application Data\BOINC\ Windows Vista and Windows 7: C:\ProgramData\BOINC\ The included "app_info.xml" file will make BOINC use this app automatically. 4) Restart BOINC by reopening BOINC Manager. BOINC will tell you in the Messages tab that it found app_info.xml and is using an "anonymous platform". |
Send message Joined: 25 Feb 09 Posts: 82 Credit: 15,824,247 RAC: 0 |
"Fermi class NVIDIA cards are not supported by the default MilkyWay CUDA application. You need to download and install version 0.24: Fermi CUDA (MW_0.24_CUDA.zip) modified by Crunch3r from http://www.arkayn.us/milkyway" I'm on Linux, so if I add a CUDA GPU soon, will I face the same problem? If so, is there a mod for Linux too? |
Send message Joined: 14 Feb 09 Posts: 999 Credit: 74,932,619 RAC: 0 |
If I remember correctly, it was only the Windows version that had the problem. Linux should be fine. |
Send message Joined: 19 Oct 10 Posts: 4 Credit: 67,375,789 RAC: 224 |
"Fermi class NVIDIA cards are not supported by the default MilkyWay CUDA application. You need to download and install version 0.24: Fermi CUDA (MW_0.24_CUDA.zip) modified by Crunch3r from http://www.arkayn.us/milkyway" Hi, I did as you described and found a row reading "23/11/2010 23:02:55 Milkyway@home Found app_info.xml; using anonymous platform" BUT!!! The next messages from the project were: 23/11/2010 23:02:55 Milkyway@home [error] State file error: missing application milkyway_nbody 23/11/2010 23:02:55 Milkyway@home [error] Can't handle workunit in state file 23/11/2010 23:02:55 Milkyway@home [error] State file error: missing application milkyway_nbody 23/11/2010 23:02:55 Milkyway@home [error] Can't handle workunit in state file 23/11/2010 23:02:55 Milkyway@home [error] State file error: missing task de_nbody_model6_3_21065_1290515179 23/11/2010 23:02:55 Milkyway@home [error] Can't link task de_nbody_model6_3_21065_1290515179_0 in state file 23/11/2010 23:02:55 Milkyway@home [error] State file error: missing task de_nbody_model5_3_810_1290534245 23/11/2010 23:02:55 Milkyway@home [error] Can't link task de_nbody_model5_3_810_1290534245_0 in state file 23/11/2010 23:02:55 Milkyway@home [error] No application found for task: windows_intelx86 45 sse2; discarding 23/11/2010 23:02:55 Milkyway@home [error] State file error: result de_nbody_model6_3_21065_1290515179_0 not found for task What changes are expected in the .xml file? |
Send message Joined: 19 Oct 10 Posts: 4 Credit: 67,375,789 RAC: 224 |
Oh, how interesting. Just found the reason. The file name given in: "<app_info> <app> <name>milkyway</name> </app> <file_info> <name>milkyway_windows_intelx86__cuda23.exe</name> <executable/> </file_info>" is wrong! I copied the file name from the file manager and pasted it into the <name>*********(here)**********.exe</name> element. The two names looked identical from the start, but only the newly pasted one produces no errors when BOINC starts. I hope this helps somebody. |
Send message Joined: 19 Oct 10 Posts: 4 Credit: 67,375,789 RAC: 224 |
A-a-a-and, YES!!! The job finally started without a crash! Goood. All tasks were successfully loaded (18), 1 has finished and is under validation now. Unfortunately, I can't use the CPU for the Milkyway project now with these modified files: "23/11/2010 23:13:54 Milkyway@home Message from server: Your app_info.xml file doesn't have a version of MilkyWay@Home N-Body Simulation." |
Send message Joined: 24 Feb 09 Posts: 620 Credit: 100,587,625 RAC: 0 |
You can still do them: add the app to the app_info file. I don't have the syntax for the N-Body apps; hopefully someone reading this will post it. Regards Zy |
Send message Joined: 12 Nov 07 Posts: 2425 Credit: 524,164 RAC: 0 |
See this post: http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=1977#42826 Doesn't expecting the unexpected make the unexpected the expected? If it makes sense, DON'T do it. |
Send message Joined: 4 Jul 09 Posts: 91 Credit: 17,266,898 RAC: 2,725 |
I am having an immediate error problem on only one of my systems, an older single-CPU 1.8 GHz Windows box running Win2000. The work unit downloads showing an estimated time of 451 hours and immediately errors when it starts to run. None of my other systems is doing this. Suggestions, anyone? 11/27/2010 8:31:17 PM Milkyway@home Scheduler request completed: got 1 new tasks 11/27/2010 8:31:19 PM Milkyway@home Started download of de_separation_14_3s_fix_1_1690243_1290911426_search_parameters 11/27/2010 8:31:21 PM Milkyway@home Finished download of de_separation_14_3s_fix_1_1690243_1290911426_search_parameters 11/27/2010 8:31:21 PM Milkyway@home Starting de_separation_14_3s_fix_1_1690243_1290911426_0 11/27/2010 8:31:21 PM Milkyway@home [error] Process creation failed: 11/27/2010 8:31:21 PM Milkyway@home [error] Process creation failed: 11/27/2010 8:31:21 PM Milkyway@home [error] Process creation failed: 11/27/2010 8:31:21 PM Milkyway@home [error] Process creation failed: 11/27/2010 8:31:22 PM Milkyway@home [error] Process creation failed: 11/27/2010 8:31:23 PM Milkyway@home Computation for task de_separation_14_3s_fix_1_1690243_1290911426_0 finished In October of 1969 I took an oath to support and defend the Constitution of the United States against all enemies, foreign and domestic; There was no expiration date. |
Send message Joined: 28 Apr 10 Posts: 16 Credit: 9,668,276 RAC: 0 |
I got exactly the same error on my Windows 2000 box. As a test, I decided to try an older version of BOINC (which is supposed to work on NT, 98 and 2000), and here is what I see in the logs: 09/02/2011 11:04:03 OS: Microsoft Windows 2000: Professional x86 Edition, Service Pack 4, (05.00.2195.00) 09/02/2011 11:04:03 Memory: 383.47 MB physical, 921.68 MB virtual 09/02/2011 11:04:03 Disk: 18.63 GB total, 16.32 GB free 09/02/2011 11:04:03 Local time is UTC +1 hours 09/02/2011 11:04:03 No CUDA-capable NVIDIA GPUs found 09/02/2011 11:04:03 No coprocessors 09/02/2011 11:04:03 Not using a proxy 09/02/2011 11:04:03 Version change (6.10.58 -> 6.6.38) 09/02/2011 11:04:03 No general preferences found - using BOINC defaults 09/02/2011 11:04:03 Preferences limit memory usage when active to 191.73MB 09/02/2011 11:04:03 Preferences limit memory usage when idle to 345.12MB 09/02/2011 11:04:03 Preferences limit disk usage to 9.31GB 09/02/2011 11:04:03 This computer is not attached to any projects 09/02/2011 11:04:03 Visit http://boinc.berkeley.edu for instructions 09/02/2011 11:04:03 Running CPU benchmarks 09/02/2011 11:04:03 Suspending computation - running CPU benchmarks 09/02/2011 11:04:14 Fetching configuration file from http://milkyway.cs.rpi.edu/milkyway/get_project_config.php 09/02/2011 11:04:33 Milkyway@home Master file download succeeded 09/02/2011 11:04:34 Benchmark results: 09/02/2011 11:04:34 Number of CPUs: 1 09/02/2011 11:04:34 596 floating point MIPS (Whetstone) per CPU 09/02/2011 11:04:34 1034 integer MIPS (Dhrystone) per CPU 09/02/2011 11:04:39 Milkyway@home Sending scheduler request: Project initialization. 
09/02/2011 11:04:39 Milkyway@home Requesting new tasks 09/02/2011 11:04:44 Milkyway@home Scheduler request completed: got 1 new tasks 09/02/2011 11:04:44 Milkyway@home General prefs: from Milkyway@home (last modified 01-May-2010 12:09:17) 09/02/2011 11:04:44 Milkyway@home Host location: none 09/02/2011 11:04:44 Milkyway@home General prefs: using your defaults 09/02/2011 11:04:44 Preferences limit memory usage when active to 268.43MB 09/02/2011 11:04:44 Preferences limit memory usage when idle to 345.12MB 09/02/2011 11:04:44 Preferences limit disk usage to 9.31GB 09/02/2011 11:04:46 Milkyway@home Started download of milkyway_0.50_windows_intelx86.exe 09/02/2011 11:04:46 Milkyway@home Started download of parameters-82-1s-fix.txt 09/02/2011 11:04:47 Milkyway@home Finished download of parameters-82-1s-fix.txt 09/02/2011 11:04:47 Milkyway@home Started download of stars-82-new.txt 09/02/2011 11:04:49 Milkyway@home Finished download of milkyway_0.50_windows_intelx86.exe 09/02/2011 11:04:49 Milkyway@home Started download of de_separation_82_1s_fix_1_60710_1297245785_search_parameters 09/02/2011 11:04:50 Milkyway@home Finished download of de_separation_82_1s_fix_1_60710_1297245785_search_parameters 09/02/2011 11:05:22 Milkyway@home Finished download of stars-82-new.txt 09/02/2011 11:05:23 Milkyway@home Starting de_separation_82_1s_fix_1_60710_1297245785_0 09/02/2011 11:05:23 Milkyway@home [error] Process creation failed: 09/02/2011 11:05:23 Milkyway@home [error] Process creation failed: 09/02/2011 11:05:24 Milkyway@home [error] Process creation failed: 09/02/2011 11:05:24 Milkyway@home [error] Process creation failed: 09/02/2011 11:05:25 Milkyway@home [error] Process creation failed: 09/02/2011 11:05:26 Milkyway@home Computation for task de_separation_82_1s_fix_1_60710_1297245785_0 finished 09/02/2011 11:05:37 Milkyway@home task de_separation_82_1s_fix_1_60710_1297245785_0 suspended by user Is there any workaround for this? Thanks |
Send message Joined: 8 May 10 Posts: 576 Credit: 15,979,383 RAC: 0 |
This MIGHT be because I think I built everything targeting Windows XP. |
Send message Joined: 16 Jan 09 Posts: 5 Credit: 400,627,866 RAC: 0 |
Every long task crashes on my HD 4770 under XP x86. All short tasks are running OK. ID 120357. Any idea? <core_client_version>6.10.58</core_client_version> <![CDATA[ <message> Incorrect function. (0x1) - exit code 1 (0x1) </message> <stderr_txt> Running Milkyway@home ATI GPU application version 0.23 (Win32, CAL 1.4) by Gipsel ignoring unknown input argument in app_info.xml: -np ignoring unknown input argument in app_info.xml: 8 ignoring unknown input argument in app_info.xml: -p ignoring unknown input argument in app_info.xml: 0.3519878880479614400000000 ignoring unknown input argument in app_info.xml: 26.1891288052719350000000000 ignoring unknown input argument in app_info.xml: -2.4507573159806090000000000 ignoring unknown input argument in app_info.xml: 42.2227791818824000000000000 ignoring unknown input argument in app_info.xml: 31.2596681887050780000000000 ignoring unknown input argument in app_info.xml: 2.2478147380792306000000000 ignoring unknown input argument in app_info.xml: 0.2000000000000000000000000 ignoring unknown input argument in app_info.xml: 2.0000000000000000000000000 instructed by BOINC client to use device 0 CPU: Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz (2 cores/threads) 2.9997 GHz (323ms) CAL Runtime: 1.4.900 Found 1 CAL device Device 0: ATI Radeon HD4700/4800 (RV740/RV770) 512 MB local RAM (remote 64 MB cached + 512 MB uncached) GPU core clock: 825 MHz, memory clock: 845 MHz 640 shader units organized in 8 SIMDs with 16 VLIW units (5-issue), wavefront size 64 threads supporting double precision Starting WU on GPU 0 main integral, 1500 iterations (3500x3000), 1 streams predicted runtime per iteration is 705 ms (33.3333 ms are allowed), dividing each iteration in 22 parts borders of the domains at 0 160 320 480 640 800 960 1120 1272 1432 1592 1752 1912 2072 2232 2392 2552 2704 2864 3024 3184 3344 3500 1, integration, Stream Allocation : Failed to create Buffer Kernel Execution : Uninitialized or Allocation failed Input streams. 
Stream Allocation : Failed to create Buffer </stderr_txt> ]]> |
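For anyone hitting the app_info.xml file-name mismatch reported earlier in this thread (a name that looked identical but only worked after being pasted from the file manager), a quick check along these lines might help. It is only a sketch under assumptions: `check_app_info` and `normalize` are hypothetical helper names, not part of BOINC; it looks only at top-level `<file_info><name>` entries like the snippet quoted above; and it does not claim to reproduce how the BOINC client itself compares names.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def normalize(name: str) -> str:
    """Drop whitespace and non-printable characters so look-alike names compare equal."""
    return "".join(ch for ch in name if ch.isprintable() and not ch.isspace())


def check_app_info(project_dir: Path) -> list[str]:
    """Return one message per <file_info><name> entry with no exact match on disk."""
    root = ET.parse(project_dir / "app_info.xml").getroot()
    on_disk = [p.name for p in project_dir.iterdir() if p.is_file()]
    problems = []
    for node in root.findall("./file_info/name"):
        declared = node.text or ""
        if declared in on_disk:
            continue  # exact match, nothing to report
        # Look for a file whose name differs only by invisible characters.
        close = [n for n in on_disk if normalize(n) == normalize(declared)]
        hint = f" (look-alike on disk: {close[0]!r})" if close else ""
        problems.append(f"missing file for {declared!r}{hint}")
    return problems


if __name__ == "__main__":
    # Demo in a throwaway directory; the stray space inside <name> is deliberate.
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        (folder / "app_info.xml").write_text(
            "<app_info><app><name>milkyway</name></app>"
            "<file_info><name>milkyway_windows_intelx86__cuda23.exe </name>"
            "<executable/></file_info></app_info>"
        )
        (folder / "milkyway_windows_intelx86__cuda23.exe").write_bytes(b"")
        for line in check_app_info(folder):
            print(line)
```

To use it on a real install, call `check_app_info` with the path of the project folder that holds app_info.xml; an empty list means every declared name has an exact match on disk.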
©2024 Astroinformatics Group