Computation errors?

microchip
Joined: 25 Feb 09
Posts: 77
Credit: 6,869,202
RAC: 0

Message 43817 - Posted: 14 Nov 2010, 10:25:17 UTC
Last modified: 14 Nov 2010, 10:28:09 UTC

Hi,

Seems there's something wrong here. No idea if it's on my side or on MW's side, but I get computation errors extremely often. If I don't get those, then I get invalid results. What's going on? Out of 7 WUs, only one has been successful thus far.

This is on a 64-bit Linux system. All my other projects (einstein, qmc, rosetta and cosmology@home) compute correctly, so I suspect it's not something on my side. I'm really thinking of just suspending MW, as I can't seem to get decent outcomes.

Any ideas?

http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=112757

Matt Arsenault
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 8 May 10
Posts: 576
Credit: 15,979,383
RAC: 0

Message 43833 - Posted: 14 Nov 2010, 16:18:07 UTC - in response to Message 43817.

This is a limitation of the N-body treecode that can be hit in rare circumstances, which I believe is usually related to precision issues. I've been debating treating it as non-fatal. You're supposed to get most of the credit when this happens, but something looks wrong with that. As for other Linux-specific errors, there is currently a random crash in the BOINC libraries that sometimes happens. The dev list says it may have already been fixed for the next release.

microchip
Joined: 25 Feb 09
Posts: 77
Credit: 6,869,202
RAC: 0

Message 43834 - Posted: 14 Nov 2010, 17:07:35 UTC

Hi

Thanks for the answer. Well, it's not that rare over here. As I said, I've been crunching for 4 days now and only one out of all the WUs has succeeded. And yeah, I don't get credit for those errors, so in the end I'm just burning CPU time for nothing, which I could spend instead on something less broken :/

Any ideas if I can do something to fix it?

You mention a random crash in the libs. Does this crash bring down the GUI? Because if it does, I haven't experienced it thus far.

Matt Arsenault
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 8 May 10
Posts: 576
Credit: 15,979,383
RAC: 0

Message 43840 - Posted: 15 Nov 2010, 0:40:08 UTC - in response to Message 43834.

Thanks for the answer. Well, it's not that rare over here. As I said, I've been crunching for 4 days now and only one out of all the WUs has succeeded.
That's rather odd. I'm still not exactly clear on why this can happen. It's been my main question about it since I started working on it.

Any ideas if I can do something to fix it?
Nope. It's on this end.

You mention a random crash in the libs. Does this crash bring down the GUI? Because if it does, I haven't experienced it thus far.
No, it's just that the majority of crashes I see now in the database are random segfaults on Linux. I asked about it on the boinc dev list, and apparently Einstein@home has been having this problem for a while.

microchip
Joined: 25 Feb 09
Posts: 77
Credit: 6,869,202
RAC: 0

Message 43848 - Posted: 15 Nov 2010, 8:39:46 UTC
Last modified: 15 Nov 2010, 8:49:38 UTC

hmm, interesting. Because I've got only 2 invalid WUs for einstein (it seems only the S5 searches are affected, though some succeed. ABP succeeds each time). Most of the rest are still pending, while the others have succeeded

http://einstein.phys.uwm.edu/results.php?userid=198995

About MW, this is how it goes here. Either it crunches and at some point I see a computation error in BOINC (it usually appears near the end, say between 80-90%), or it fully succeeds but, when I check my tasks on here, I see that after a while it gets marked as invalid. I've suspended MW for the time being until this can be resolved. No point in wasting cycles for nothing :/

If it matters, my system is a Phenom II X4 955 (no OC or anything), currently without CUDA GPU but soon I'll add one.

Mike
Joined: 31 Oct 10
Posts: 135
Credit: 2,062,190
RAC: 0

Message 43880 - Posted: 16 Nov 2010, 14:49:17 UTC

Henry W. Akeley
Joined: 2 Jun 09
Posts: 1
Credit: 13,154,951
RAC: 7,181

Message 43908 - Posted: 17 Nov 2010, 5:56:53 UTC

I am also getting many computation errors on my machines. One machine in particular errors immediately while the others make it part way through computations before failing. In checking results, my wingmen often fail computing the same workunit.
--Art

SM6GXQ Peter Lindquist
Joined: 14 Dec 08
Posts: 2
Credit: 9,325,972
RAC: 0

Message 43946 - Posted: 18 Nov 2010, 9:46:02 UTC

Just installed my new i7 and Windows 7 64bit and NVIDIA GTX470.

It seems I get a computation error every time, after only 3 seconds.

Was running before with a rather old P4 with GT265-card.

regards
Peter

kashi
Joined: 30 Dec 07
Posts: 311
Credit: 148,905,504
RAC: 2

Message 43948 - Posted: 18 Nov 2010, 10:24:26 UTC - in response to Message 43946.

Fermi class NVIDIA cards are not supported by the default MilkyWay CUDA application. You need to download and install version 0.24: Fermi CUDA (MW_0.24_CUDA.zip) modified by Crunch3r from http://www.arkayn.us/milkyway

1) Stop BOINC by going to the advanced menu and selecting "Shut down connected client", click okay and then click cancel. Then exit out of the BOINC Manager.

2) Unzip the MW_0.24_CUDA.zip file you downloaded.

3) Copy the 4 files, app_info.xml, milkyway_windows_intelx86__cuda23.exe, cutil.dll, cudart.dll to your BOINC\projects\milkyway.cs.rpi.edu_milkyway folder.

Windows 98/SE/ME: C:\Windows\All Users\BOINC\ or C:\Windows\Profiles\All Users\BOINC\

Windows 2000/XP: C:\Documents and Settings\All Users\Application Data\BOINC\

Windows Vista and Windows 7: C:\ProgramData\BOINC\

There is a file called "app_info.xml" included that will make BOINC use this app automatically.

4) Restart BOINC by reopening BOINC Manager.

BOINC will tell you in Messages tab that it found app_info.xml and is using an "anonymous platform".
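
Steps 2 and 3 can also be scripted. The following is only a minimal Python sketch, not part of the original package: the function name `install_app_files` and both directory arguments are hypothetical, the four file names are the ones listed in step 3, and it assumes BOINC has already been shut down as in step 1.

```python
import shutil
from pathlib import Path

# File names taken from step 3 above.
APP_FILES = [
    "app_info.xml",
    "milkyway_windows_intelx86__cuda23.exe",
    "cutil.dll",
    "cudart.dll",
]


def install_app_files(unzipped_dir, project_dir):
    """Copy the four application files into the project folder.

    Returns the destination paths. Raises FileNotFoundError before
    copying anything if the project folder or a source file is missing.
    """
    src = Path(unzipped_dir)
    dst = Path(project_dir)
    if not dst.is_dir():
        raise FileNotFoundError(f"project folder not found: {dst}")
    missing = [name for name in APP_FILES if not (src / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing from {src}: {', '.join(missing)}")
    # shutil.copy2 returns the path of the newly written file.
    return [Path(shutil.copy2(src / name, dst / name)) for name in APP_FILES]


# Example call (paths are assumptions; adjust to your own system):
# install_app_files(
#     r"C:\Temp\MW_0.24_CUDA",
#     r"C:\ProgramData\BOINC\projects\milkyway.cs.rpi.edu_milkyway",
# )
```

Checking for all four files before copying avoids leaving the project folder with an app_info.xml that refers to a file that never arrived.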

microchip
Joined: 25 Feb 09
Posts: 77
Credit: 6,869,202
RAC: 0

Message 43959 - Posted: 18 Nov 2010, 21:30:08 UTC - in response to Message 43948.

I'm on Linux so if I add a CUDA GPU soon, I'll face the same problem? If so, is there a mod for Linux too?

arkayn
Joined: 14 Feb 09
Posts: 999
Credit: 74,932,619
RAC: 0

Message 43961 - Posted: 18 Nov 2010, 22:07:46 UTC - in response to Message 43959.

If I remember correctly, it was only the Windows version that had the problem.

Linux should be fine.

Geras
Joined: 19 Oct 10
Posts: 4
Credit: 3,801,411
RAC: 4,106

Message 44130 - Posted: 23 Nov 2010, 21:07:37 UTC - in response to Message 43948.

Hi, I did as you described and found a row like
"23/11/2010 23:02:55 Milkyway@home Found app_info.xml; using anonymous platform"
BUT!!! The next messages from the project were:
23/11/2010 23:02:55 Milkyway@home [error] State file error: missing application milkyway_nbody
23/11/2010 23:02:55 Milkyway@home [error] Can't handle workunit in state file
23/11/2010 23:02:55 Milkyway@home [error] State file error: missing application milkyway_nbody
23/11/2010 23:02:55 Milkyway@home [error] Can't handle workunit in state file
23/11/2010 23:02:55 Milkyway@home [error] State file error: missing task de_nbody_model6_3_21065_1290515179
23/11/2010 23:02:55 Milkyway@home [error] Can't link task de_nbody_model6_3_21065_1290515179_0 in state file
23/11/2010 23:02:55 Milkyway@home [error] State file error: missing task de_nbody_model5_3_810_1290534245
23/11/2010 23:02:55 Milkyway@home [error] Can't link task de_nbody_model5_3_810_1290534245_0 in state file
23/11/2010 23:02:55 Milkyway@home [error] No application found for task: windows_intelx86 45 sse2; discarding
23/11/2010 23:02:55 Milkyway@home [error] State file error: result de_nbody_model6_3_21065_1290515179_0 not found for task

What changes am I expected to make in the .xml file?

Geras
Joined: 19 Oct 10
Posts: 4
Credit: 3,801,411
RAC: 4,106

Message 44132 - Posted: 23 Nov 2010, 21:18:32 UTC - in response to Message 44130.

Oh, how interesting.
Just found the reason.

The file name assigned in this section:

"<app_info>
<app>
<name>milkyway</name>
</app>
<file_info>
<name>milkyway_windows_intelx86__cuda23.exe</name>
<executable/>
</file_info>"

is wrong!

I copied the file name from the file manager and pasted it into the <name>*********(here)**********.exe</name> element, replacing the existing one. The two names looked identical from the beginning, but only the newly pasted one produces no errors when BOINC starts.

I hope it will help somebody.
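
A mismatch like this can be checked mechanically. The sketch below is an illustration only, not an official BOINC tool: the function name `check_app_info_files` is hypothetical, and it assumes that app_info.xml has `<file_info><name>` entries directly under the `<app_info>` root, as in the fragment quoted above. It compares each declared name, character for character, with the files actually present in the project folder.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def check_app_info_files(project_dir):
    """Return the <file_info><name> entries with no exactly matching file.

    The comparison is exact, so stray whitespace or look-alike
    characters in app_info.xml show up as a mismatch.
    """
    project = Path(project_dir)
    root = ET.parse(project / "app_info.xml").getroot()
    on_disk = {p.name for p in project.iterdir() if p.is_file()}
    declared = [(el.text or "") for el in root.findall("./file_info/name")]
    return [name for name in declared if name not in on_disk]


# Example call (path is an assumption; adjust to your own system):
# for name in check_app_info_files(
#     r"C:\ProgramData\BOINC\projects\milkyway.cs.rpi.edu_milkyway"
# ):
#     print("no file named", repr(name))
```

Printing with `repr()` makes invisible differences such as a trailing space visible. Note that the comparison is case-sensitive even though Windows file systems usually are not, so a difference in letter case alone would also be reported.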

Geras
Joined: 19 Oct 10
Posts: 4
Credit: 3,801,411
RAC: 4,106

Message 44133 - Posted: 23 Nov 2010, 21:38:17 UTC - in response to Message 44132.

A-a-a-and, YES!!!

The job started without crash finally! Goood.

All tasks were successfully loaded (18), 1 finished and under validation now.

Unfortunately, I can't use the CPU for the Milkyway project now with these modified files:
"23/11/2010 23:13:54 Milkyway@home Message from server: Your app_info.xml file doesn't have a version of MilkyWay@Home N-Body Simulation."

Zydor
Joined: 24 Feb 09
Posts: 620
Credit: 100,587,625
RAC: 0

Message 44148 - Posted: 24 Nov 2010, 2:20:58 UTC

You can still do them: add the app to the app_info file. I don't have the syntax for the N-Body apps; hopefully someone reading this will post it.

Regards
Zy
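
Before editing the file, it may help to see which applications app_info.xml currently declares. This is only a rough Python sketch with hypothetical function names (`declared_apps`, `has_app`); it assumes the `<app><name>` layout shown earlier in the thread and takes the application name `milkyway_nbody` from the "missing application" errors in the log above. It does not show the N-Body entry itself, whose exact syntax is not given in this thread.

```python
import xml.etree.ElementTree as ET


def declared_apps(app_info_path):
    """Return the set of application names declared in app_info.xml."""
    root = ET.parse(app_info_path).getroot()
    return {(el.text or "").strip() for el in root.findall("./app/name")}


def has_app(app_info_path, app_name):
    """True if app_info.xml declares an <app> with the given name."""
    return app_name in declared_apps(app_info_path)


# Example call (file location is an assumption):
# print(has_app("app_info.xml", "milkyway_nbody"))
```

If `has_app` returns False for `milkyway_nbody`, that would be consistent with the server message about the file not having a version of the N-Body application.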

banditwolf
Joined: 12 Nov 07
Posts: 2425
Credit: 524,164
RAC: 0

Message 44151 - Posted: 24 Nov 2010, 3:08:35 UTC

See this post: http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=1977#42826
____________
Doesn't expecting the unexpected make the unexpected the expected?
If it makes sense, DON'T do it.

Bill F
Joined: 4 Jul 09
Posts: 11
Credit: 1,119,188
RAC: 597

Message 44379 - Posted: 28 Nov 2010, 5:07:04 UTC - in response to Message 44151.

I am having an immediate error problem on only one of my systems, an older single-CPU 1.8 GHz Windows box running Win2000. The work unit downloads showing an estimated time of 451 hours and immediately errors when it starts to run. None of my other systems is doing this.

Suggestions anyone ?

11/27/2010 8:31:17 PM Milkyway@home Scheduler request completed: got 1 new tasks
11/27/2010 8:31:19 PM Milkyway@home Started download of de_separation_14_3s_fix_1_1690243_1290911426_search_parameters
11/27/2010 8:31:21 PM Milkyway@home Finished download of de_separation_14_3s_fix_1_1690243_1290911426_search_parameters
11/27/2010 8:31:21 PM Milkyway@home Starting de_separation_14_3s_fix_1_1690243_1290911426_0
11/27/2010 8:31:21 PM Milkyway@home [error] Process creation failed:
11/27/2010 8:31:21 PM Milkyway@home [error] Process creation failed:
11/27/2010 8:31:21 PM Milkyway@home [error] Process creation failed:
11/27/2010 8:31:21 PM Milkyway@home [error] Process creation failed:
11/27/2010 8:31:22 PM Milkyway@home [error] Process creation failed:
11/27/2010 8:31:23 PM Milkyway@home Computation for task de_separation_14_3s_fix_1_1690243_1290911426_0 finished


Volodymyr Shcherbyna
Joined: 28 Apr 10
Posts: 16
Credit: 9,668,276
RAC: 0

Message 46084 - Posted: 9 Feb 2011, 10:11:58 UTC - in response to Message 44379.

I got exactly the same error on my Windows 2000 box. For the sake of testing, I decided to try an older version of BOINC (which is supposed to work on NT, 98 and 2000), and here is what I see in the logs:

09/02/2011 11:04:03 OS: Microsoft Windows 2000: Professional x86 Edition, Service Pack 4, (05.00.2195.00)
09/02/2011 11:04:03 Memory: 383.47 MB physical, 921.68 MB virtual
09/02/2011 11:04:03 Disk: 18.63 GB total, 16.32 GB free
09/02/2011 11:04:03 Local time is UTC +1 hours
09/02/2011 11:04:03 No CUDA-capable NVIDIA GPUs found
09/02/2011 11:04:03 No coprocessors
09/02/2011 11:04:03 Not using a proxy
09/02/2011 11:04:03 Version change (6.10.58 -> 6.6.38)
09/02/2011 11:04:03 No general preferences found - using BOINC defaults
09/02/2011 11:04:03 Preferences limit memory usage when active to 191.73MB
09/02/2011 11:04:03 Preferences limit memory usage when idle to 345.12MB
09/02/2011 11:04:03 Preferences limit disk usage to 9.31GB
09/02/2011 11:04:03 This computer is not attached to any projects
09/02/2011 11:04:03 Visit http://boinc.berkeley.edu for instructions
09/02/2011 11:04:03 Running CPU benchmarks
09/02/2011 11:04:03 Suspending computation - running CPU benchmarks
09/02/2011 11:04:14 Fetching configuration file from http://milkyway.cs.rpi.edu/milkyway/get_project_config.php
09/02/2011 11:04:33 Milkyway@home Master file download succeeded
09/02/2011 11:04:34 Benchmark results:
09/02/2011 11:04:34 Number of CPUs: 1
09/02/2011 11:04:34 596 floating point MIPS (Whetstone) per CPU
09/02/2011 11:04:34 1034 integer MIPS (Dhrystone) per CPU
09/02/2011 11:04:39 Milkyway@home Sending scheduler request: Project initialization.
09/02/2011 11:04:39 Milkyway@home Requesting new tasks
09/02/2011 11:04:44 Milkyway@home Scheduler request completed: got 1 new tasks
09/02/2011 11:04:44 Milkyway@home General prefs: from Milkyway@home (last modified 01-May-2010 12:09:17)
09/02/2011 11:04:44 Milkyway@home Host location: none
09/02/2011 11:04:44 Milkyway@home General prefs: using your defaults
09/02/2011 11:04:44 Preferences limit memory usage when active to 268.43MB
09/02/2011 11:04:44 Preferences limit memory usage when idle to 345.12MB
09/02/2011 11:04:44 Preferences limit disk usage to 9.31GB
09/02/2011 11:04:46 Milkyway@home Started download of milkyway_0.50_windows_intelx86.exe
09/02/2011 11:04:46 Milkyway@home Started download of parameters-82-1s-fix.txt
09/02/2011 11:04:47 Milkyway@home Finished download of parameters-82-1s-fix.txt
09/02/2011 11:04:47 Milkyway@home Started download of stars-82-new.txt
09/02/2011 11:04:49 Milkyway@home Finished download of milkyway_0.50_windows_intelx86.exe
09/02/2011 11:04:49 Milkyway@home Started download of de_separation_82_1s_fix_1_60710_1297245785_search_parameters
09/02/2011 11:04:50 Milkyway@home Finished download of de_separation_82_1s_fix_1_60710_1297245785_search_parameters
09/02/2011 11:05:22 Milkyway@home Finished download of stars-82-new.txt
09/02/2011 11:05:23 Milkyway@home Starting de_separation_82_1s_fix_1_60710_1297245785_0
09/02/2011 11:05:23 Milkyway@home [error] Process creation failed:
09/02/2011 11:05:23 Milkyway@home [error] Process creation failed:
09/02/2011 11:05:24 Milkyway@home [error] Process creation failed:
09/02/2011 11:05:24 Milkyway@home [error] Process creation failed:
09/02/2011 11:05:25 Milkyway@home [error] Process creation failed:
09/02/2011 11:05:26 Milkyway@home Computation for task de_separation_82_1s_fix_1_60710_1297245785_0 finished
09/02/2011 11:05:37 Milkyway@home task de_separation_82_1s_fix_1_60710_1297245785_0 suspended by user

Is there any workaround for this?

Thanks

Matt Arsenault
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 8 May 10
Posts: 576
Credit: 15,979,383
RAC: 0

Message 46097 - Posted: 9 Feb 2011, 19:10:34 UTC - in response to Message 46084.


09/02/2011 11:05:23 Milkyway@home Starting de_separation_82_1s_fix_1_60710_1297245785_0
09/02/2011 11:05:23 Milkyway@home [error] Process creation failed:
09/02/2011 11:05:23 Milkyway@home [error] Process creation failed:
09/02/2011 11:05:24 Milkyway@home [error] Process creation failed:
09/02/2011 11:05:24 Milkyway@home [error] Process creation failed:
09/02/2011 11:05:25 Milkyway@home [error] Process creation failed:
09/02/2011 11:05:26 Milkyway@home Computation for task de_separation_82_1s_fix_1_60710_1297245785_0 finished
09/02/2011 11:05:37 Milkyway@home task de_separation_82_1s_fix_1_60710_1297245785_0 suspended by user

Is there any workaround for this?

Thanks
This MIGHT be because I think I built everything targeting Windows XP.

nenym
Joined: 16 Jan 09
Posts: 5
Credit: 102,424,707
RAC: 162

Message 46111 - Posted: 9 Feb 2011, 21:43:13 UTC
Last modified: 9 Feb 2011, 21:43:37 UTC


Copyright © 2018 AstroInformatics Group