Welcome to MilkyWay@home

Computation Error (de_modfit_09_3s_testwrap_2_1372784655_3079055)


BDDave
Joined: 21 May 10
Posts: 19
Credit: 100,867,126
RAC: 0
Message 59426 - Posted: 23 Jul 2013, 4:37:36 UTC
Last modified: 23 Jul 2013, 4:45:25 UTC

Hi All,

I've gotten around 16 "Error" tasks and 41 "Validation inconclusive" tasks this month. Any suggestions?

Work Unit error example: de_modfit_09_3s_testwrap_2_1372784655_3079055.
http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=401985372

I did see another post from earlier in the month as well. I'll post what I have.


Message Log:

7/20/2013 10:44:42 PM | | No config file found - using defaults
7/20/2013 10:44:42 PM | | Starting BOINC client version 7.0.64 for windows_x86_64
7/20/2013 10:44:42 PM | | log flags: file_xfer, sched_ops, task
7/20/2013 10:44:42 PM | | Libraries: libcurl/7.25.0 OpenSSL/1.0.1 zlib/1.2.6
7/20/2013 10:44:42 PM | | Data directory: C:\ProgramData\BOINC
7/20/2013 10:44:42 PM | | Running under account
7/20/2013 10:44:42 PM | | Processor: 4 GenuineIntel Intel(R) Core(TM)2 Quad CPU Q8300 @ 2.50GHz [Family 6 Model 23 Stepping 10]
7/20/2013 10:44:42 PM | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 syscall nx lm vmx tm2 pbe
7/20/2013 10:44:42 PM | | OS: Microsoft Windows 7: Home Premium x64 Edition, Service Pack 1, (06.01.7601.00)
7/20/2013 10:44:42 PM | | Memory: 4.00 GB physical, 8.00 GB virtual
7/20/2013 10:44:42 PM | | Disk: 139.73 GB total, 84.42 GB free
7/20/2013 10:44:42 PM | | Local time is UTC -7 hours
7/20/2013 10:44:42 PM | | CUDA: NVIDIA GPU 0: GeForce GTX 470 (driver version 314.22, CUDA version 5.0, compute capability 2.0, 1280MB, 1147MB available, 1089 GFLOPS peak)
7/20/2013 10:44:42 PM | | OpenCL: NVIDIA GPU 0: GeForce GTX 470 (driver version 314.22, device version OpenCL 1.1 CUDA, 1280MB, 1147MB available, 1089 GFLOPS peak)
7/20/2013 10:44:42 PM | Milkyway@Home | URL http://milkyway.cs.rpi.edu/milkyway/; Computer ID 220768; resource share 100
7/20/2013 10:44:42 PM | | Reading preferences override file
7/20/2013 10:44:42 PM | | Preferences:
7/20/2013 10:44:42 PM | | max memory usage when active: 2457.07MB
7/20/2013 10:44:42 PM | | max memory usage when idle: 3685.61MB
7/20/2013 10:44:42 PM | | max disk usage: 10.00GB
7/20/2013 10:44:42 PM | | suspend work if non-BOINC CPU load exceeds 25 %
7/20/2013 10:44:42 PM | | max download rate: 499999 bytes/sec
7/20/2013 10:44:42 PM | | max upload rate: 499999 bytes/sec
7/20/2013 10:44:42 PM | | (to change preferences, visit a project web site or select Preferences in the Manager)
7/20/2013 10:44:42 PM | | Resetting file projects/www.gpugrid.net/logogpugrid.png: md5 checksum failed for file
7/20/2013 10:44:42 PM | | Not using a proxy
7/20/2013 10:44:46 PM | Milkyway@Home | Restarting task ps_nbody_07_11_dark_1372784655_473366_3 using milkyway_nbody version 132 (mt) in slot 7
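
For anyone who wants to sift through a message log like the one above, here is a minimal sketch. It assumes every line follows the `timestamp | project | message` layout seen in this excerpt. The names `parse_log_line` and `lines_for_project` are made up for illustration and are not part of BOINC.

```python
from datetime import datetime

# Timestamp layout as it appears in the pasted log, e.g. "7/20/2013 10:44:42 PM".
# This format string is an assumption based on the excerpt only.
STAMP_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def parse_log_line(line):
    """Split one log line into (timestamp, project, message).

    Returns None when the line does not have three '|'-separated
    fields or when the timestamp cannot be parsed.
    """
    parts = [p.strip() for p in line.split("|", 2)]
    if len(parts) != 3:
        return None
    stamp, project, message = parts
    try:
        when = datetime.strptime(stamp, STAMP_FORMAT)
    except ValueError:
        return None
    return when, project, message


def lines_for_project(lines, project):
    """Keep only the parsed entries whose project field matches (case-insensitive)."""
    wanted = project.lower()
    parsed = (parse_log_line(line) for line in lines)
    return [entry for entry in parsed if entry and entry[1].lower() == wanted]
```

Feeding it the saved log text line by line and filtering on "Milkyway@Home" would leave only the project-specific entries, which is usually where task restarts and errors show up.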


Also, here is the "Error Task" link: http://milkyway.cs.rpi.edu/milkyway/results.php?userid=105340&offset=0&show_names=0&state=6&appid=

and "Validation inconclusive tasks" link: http://milkyway.cs.rpi.edu/milkyway/results.php?userid=105340&offset=0&show_names=0&state=3&appid=


Thanks!
Keep Crunchin'
BDDave
ID: 59426
Alinator

Joined: 7 Jun 08
Posts: 464
Credit: 56,639,936
RAC: 0
Message 59428 - Posted: 23 Jul 2013, 12:01:22 UTC - in response to Message 59426.  

Well...

All the errors except for one were on nBody tasks. There have been some issues with the new app for these, most of which have been taken care of at this point. However, there appears to be at least one which is resisting being eradicated. I'm starting to wonder if it might be a bug in OpenMP, since it hits both Windows and Linux hosts (maybe Mac too, but you don't see as many of them). In any event, yours wasn't the only host to crap out on the tasks in question, so I wouldn't worry about those.

The regular MW Sep task which failed looked to be a GPU memory allocation failure. Since there was only one of them, I wouldn't worry too much about it either. Unfortunately, my experience is that nVidia drivers can be a little cranky at times, and I've had failures like that from time to time for no reason I could determine. I suspect it's some kind of timing issue when one task ends and another is beginning, at the same time the OS says "Oh gee, I can get some time on the GPU at last!". Needless to say, something like that is going to be pretty tough to nail down unless you have set up a pretty specialized test to trap and catch it.

As far as the Inconclusives go, it just means the output of the task was outside the range the validator was expecting, so it sends the task out again to verify the result. As long as you aren't the odd man out when the WU validates, it's normal and nothing to be concerned over.
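
To picture what "odd man out" means here, a toy model might help. This is only a sketch under my own assumptions, not the project's actual validator. It treats each returned result as a single number, calls the work unit valid once at least `min_quorum` results agree within a tolerance, and otherwise leaves it inconclusive so another copy can be sent out. The function name, the tolerance, and the quorum size are all hypothetical.

```python
import math


def check_quorum(results, min_quorum=2, rel_tol=1e-9):
    """Toy quorum check over a list of numeric results.

    Returns (state, canonical, odd_ones_out) where state is "valid" or
    "inconclusive", canonical is the agreed value (or None), and
    odd_ones_out lists the indices of results that disagree with it.
    """
    for candidate in results:
        # Indices of all results that match this candidate within tolerance.
        agree = [
            i for i, r in enumerate(results)
            if math.isclose(r, candidate, rel_tol=rel_tol)
        ]
        if len(agree) >= min_quorum:
            odd = [i for i in range(len(results)) if i not in agree]
            return "valid", candidate, odd
    # No group of matching results is large enough yet.
    return "inconclusive", None, []
```

With two results that disagree, the sketch reports "inconclusive". Once a third result comes back and matches one of them, the work unit validates and the remaining host is the odd man out.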

HTH
ID: 59428
BDDave
Joined: 21 May 10
Posts: 19
Credit: 100,867,126
RAC: 0
Message 59475 - Posted: 29 Jul 2013, 1:49:45 UTC - in response to Message 59428.  

Hi Alinator,

I figured that we should expect a certain percentage of errors on an experiment like this. From time to time when I turn off BOINC, it stays active anyway and I have to kill the application in Windows Task Manager. Also, it sometimes crashes my NVIDIA drivers, or I crash them. I'm sure that could account for one or two a month. Thank you for the abundance of knowledge!



Keep Crunchin'

BDDave
ID: 59475

©2024 Astroinformatics Group