Welcome to MilkyWay@home

Marked as Invalid?

Message boards : Number crunching : Marked as Invalid?

Brian Priebe
Joined: 27 Nov 09
Posts: 108
Credit: 430,760,953
RAC: 0
Message 37341 - Posted: 14 Mar 2010, 10:10:24 UTC

In the past few hours, it seems I have quite a number of WU's marked invalid. What does "marked as invalid" mean? All of the 'invalid' ones show version 19 MW (non-GPU) being run. Yet the task names look very similar to GPU task names.

There's also one GPU WU uploaded a week ago that still shows as pending. Any ideas?

Task name | Work unit ID | Sent | Time reported | Status | Run time (sec) | CPU time (sec) | Claimed credit | Granted credit | Application
de_13_3s_free_5_882287_1268553142_0 | 74583672 | 14 Mar 2010 7:53:52 UTC | 14 Mar 2010 9:24:53 UTC | Completed, marked as invalid | 7.22 | 5.08 | 0.03 | 0.00 | MilkyWay@Home v0.19
de_14_3s_free_5_873342_1268551982_0 | 74574727 | 14 Mar 2010 7:34:38 UTC | 14 Mar 2010 7:53:52 UTC | Completed, marked as invalid | 7.03 | 4.31 | 0.02 | 0.00 | MilkyWay@Home v0.19
de_12_3s_free_5_865100_1268550960_1 | 74566485 | 14 Mar 2010 7:28:48 UTC | 14 Mar 2010 7:53:52 UTC | Completed, marked as invalid | 5.19 | 4.91 | 0.03 | 0.00 | MilkyWay@Home v0.19
de_13_3s_const_1_866713_1268551149_0 | 74568098 | 14 Mar 2010 7:20:29 UTC | 14 Mar 2010 7:53:52 UTC | Completed, marked as invalid | 6.05 | 6.28 | 0.03 | 0.00 | MilkyWay@Home v0.19
de_14_3s_free_5_865385_1268550966_0 | 74566770 | 14 Mar 2010 7:17:47 UTC | 14 Mar 2010 7:28:48 UTC | Completed, marked as invalid | 5.03 | 4.44 | 0.02 | 0.00 | MilkyWay@Home v0.19
de_14_3s_free_5_863897_1268550774_0 | 74565282 | 14 Mar 2010 7:14:57 UTC | 14 Mar 2010 7:20:29 UTC | Completed, marked as invalid | 7.03 | 4.20 | 0.02 | 0.00 | MilkyWay@Home v0.19
de_14_3s_free_5_855846_1268549737_0 | 74557231 | 14 Mar 2010 6:57:22 UTC | 14 Mar 2010 7:14:57 UTC | Completed, marked as invalid | 7.03 | 4.30 | 0.02 | 0.00 | MilkyWay@Home v0.19
de_11_3s_free_5_850061_1268548992_0 | 74551446 | 14 Mar 2010 6:44:40 UTC | 14 Mar 2010 6:48:30 UTC | Completed, marked as invalid | 4.05 | 4.16 | 0.02 | 0.00 | MilkyWay@Home v0.19
de_11_3s_free_5_846534_1268548536_0 | 74547919 | 14 Mar 2010 6:36:39 UTC | 14 Mar 2010 6:44:40 UTC | Completed, marked as invalid | 7.03 | 4.13 | 0.02 | 0.00 | MilkyWay@Home v0.19
de_13_3s_free_5_817792_1268544655_0 | 74519177 | 14 Mar 2010 5:32:32 UTC | 14 Mar 2010 6:36:39 UTC | Completed, marked as invalid | 7.03 | 4.97 | 0.03 | 0.00 | MilkyWay@Home v0.19
de_13_3s_free_5_804775_1268542908_0 | 74506160 | 14 Mar 2010 5:03:14 UTC | 14 Mar 2010 5:17:21 UTC | Completed, marked as invalid | 8.05 | 5.05 | 0.03 | 0.00 | MilkyWay@Home v0.19
de_13_3s_free_5_793319_1268541399_0 | 74494704 | 14 Mar 2010 4:38:14 UTC | 14 Mar 2010 5:03:14 UTC | Completed, marked as invalid | 6.17 | 5.14 | 0.03 | 0.00 | MilkyWay@Home v0.19
de_12_3s_free_5_781617_1268539879_0 | 74483002 | 14 Mar 2010 4:12:34 UTC | 14 Mar 2010 4:38:14 UTC | Completed, marked as invalid | 7.39 | 5.16 | 0.03 | 0.00 | MilkyWay@Home v0.19
ps_s222_pes_2_v04_3915991_1267951891_0 | 67695882 | 7 Mar 2010 8:52:34 UTC | 7 Mar 2010 9:43:24 UTC | Completed, waiting for validation | 170.14 | 166.30 | 0.93 | pending | MilkyWay@Home v0.21 (ati13ati)


Picking one of the invalid ones at random, it claims it ran for only 5 seconds. "stderr out" shows:

<core_client_version>6.10.18</core_client_version>
<![CDATA[
<stderr_txt>
Unrecognized XML in parse_init_data_file: hostid
Skipping: 124294
Skipping: /hostid
Unrecognized XML in parse_init_data_file: starting_elapsed_time
Skipping: 0.000000
Skipping: /starting_elapsed_time
Unrecognized XML in parse_init_data_file: computation_deadline
Skipping: 1269230688.000000
Skipping: /computation_deadline
Unrecognized XML in GLOBAL_PREFS::parse_override: mod_time
Skipping: /mod_time
Unrecognized XML in GLOBAL_PREFS::parse_override: run_gpu_if_user_active
Skipping: 1
Skipping: /run_gpu_if_user_active
Unrecognized XML in GLOBAL_PREFS::parse_override: max_ncpus_pct
Skipping: 100.000000
Skipping: /max_ncpus_pct

</stderr_txt>
]]>
ID: 37341 · Rating: 0
kashi
Joined: 30 Dec 07
Posts: 311
Credit: 149,490,184
RAC: 0
Message 37350 - Posted: 14 Mar 2010, 16:29:41 UTC - in response to Message 37341.  
Last modified: 14 Mar 2010, 16:46:01 UTC

They are "marked as invalid" because they completed in 5 seconds, which is an impossibly short time for a CPU to process a task. I don't know why your CPU is not processing the tasks correctly. If I had to guess, I would say that your computer does not have enough free memory to process current MilkyWay tasks on the CPU. User Purple Rabbit was experiencing the same issue in another thread. GPU and CPU tasks are the same, so they have the same names. Your computer ID 124294 does not have a double precision capable GPU suitable for MilkyWay processing, so the tasks are trying to process on the CPU. That is why they are marked as MilkyWay version 19.

As to the pending task from a week ago on one of your other computers, this happens sometimes when there is a server malfunction. I have 28 tasks myself still pending from a few days ago. My ATI GPU is very efficient on MilkyWay processing, so I do not concern myself with minor glitches such as a few pending that may not be granted credit.
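To make the run-time argument concrete, here is a minimal Python sketch that flags results which finished implausibly fast. It is only an illustration: the 60-second threshold and the name `looks_too_fast` are assumptions made up for this example, the figures are the first timing value of three rows from the task list (taken here to be run time in seconds), and nothing below is the project's actual validator.

```python
# Hypothetical sanity check, NOT MilkyWay@home's real validator.
# (task name, run time in seconds) pairs copied from the task list in this thread.
TASKS = [
    ("de_13_3s_free_5_882287_1268553142_0", 7.22),
    ("de_14_3s_free_5_873342_1268551982_0", 7.03),
    ("ps_s222_pes_2_v04_3915991_1267951891_0", 170.14),
]

MIN_PLAUSIBLE_SECONDS = 60.0  # assumed threshold, chosen only for this example


def looks_too_fast(run_time_s, threshold_s=MIN_PLAUSIBLE_SECONDS):
    """Return True when a task finished faster than the assumed minimum."""
    return run_time_s < threshold_s


# Collect the names of tasks that finished under the assumed threshold.
suspicious = [name for name, secs in TASKS if looks_too_fast(secs)]
for name in suspicious:
    print(name, "finished suspiciously fast")
```

With these three rows, only the two de_ tasks are flagged; the ps_s222 task, which ran on a GPU, is not.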
ID: 37350 · Rating: 0
Brian Priebe
Joined: 27 Nov 09
Posts: 108
Credit: 430,760,953
RAC: 0
Message 37362 - Posted: 15 Mar 2010, 0:58:50 UTC - in response to Message 37350.  

Thanks for the response. I wasn't aware that the same WU could execute on either a GPU or CPU. (That particular machine has an HD5770 in it so can't execute MW on the GPU.)

I happened to catch yet another that was marked invalid later in the day. There is nothing in the message log to indicate it ran into a memory problem. You would think something that aborts the WU, like a memory allocation failure, would be flagged somewhere. (That particular machine is showing 270MB free RAM at the moment.)

I'll try dropping the number of tasks running to see if that improves matters.

14-Mar-2010 20:19:05 	Milkyway@home	Started download of de_13_3s_const_1_search_parameters_1351184_1268612254
14-Mar-2010 20:19:06 	Milkyway@home	Finished download of de_13_3s_const_1_search_parameters_1351184_1268612254
14-Mar-2010 20:19:19 	Milkyway@home	Starting de_13_3s_const_1_1351184_1268612254_0
14-Mar-2010 20:19:20 	Milkyway@home	Starting task de_13_3s_const_1_1351184_1268612254_0 using milkyway version 19
14-Mar-2010 20:19:29 	Milkyway@home	Computation for task de_13_3s_const_1_1351184_1268612254_0 finished
14-Mar-2010 20:19:31 	Milkyway@home	Started upload of de_13_3s_const_1_1351184_1268612254_0_0
14-Mar-2010 20:19:32 	Milkyway@home	Finished upload of de_13_3s_const_1_1351184_1268612254_0_0
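For anyone who wants to check their own message log the same way, here is a rough Python sketch that measures the time between "Starting task" and "Computation for task ... finished". It assumes the tab-separated layout and the exact message wording shown in the log above; other BOINC client versions may format these lines differently, and the function name `task_durations` is made up for this example.

```python
from datetime import datetime

# Two lines copied from the message log above (fields are tab-separated).
LOG = """\
14-Mar-2010 20:19:20 \tMilkyway@home\tStarting task de_13_3s_const_1_1351184_1268612254_0 using milkyway version 19
14-Mar-2010 20:19:29 \tMilkyway@home\tComputation for task de_13_3s_const_1_1351184_1268612254_0 finished
"""


def task_durations(log_text):
    """Map task name -> seconds between 'Starting task' and 'finished'."""
    started, durations = {}, {}
    for line in log_text.splitlines():
        fields = [f.strip() for f in line.split("\t")]
        if len(fields) != 3:
            continue  # skip anything that is not timestamp / project / message
        when = datetime.strptime(fields[0], "%d-%b-%Y %H:%M:%S")
        message = fields[2]
        if message.startswith("Starting task "):
            started[message.split()[2]] = when
        elif message.startswith("Computation for task "):
            name = message.split()[3]
            if name in started:
                durations[name] = (when - started[name]).total_seconds()
    return durations


for name, seconds in task_durations(LOG).items():
    print(f"{name}: {seconds:.0f} s")
```

For the task in the log above this gives 9 seconds between start and finish.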
ID: 37362 · Rating: 0
Gavin Shaw
Joined: 16 Jan 08
Posts: 98
Credit: 1,371,299
RAC: 0
Message 37364 - Posted: 15 Mar 2010, 1:30:03 UTC

I think MW units running on a CPU need around 10 - 15MB of RAM per unit in memory. At least that is the case on my machines. Machine 124294 has 16 CPUs, so you need 16 x 10 - 15MB, which is 160 - 240MB minimum.
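The arithmetic behind that estimate, as a small Python sketch. The 10 - 15MB per-task range is only the rough figure quoted above, and the function name `cpu_memory_estimate_mb` is made up for this example.

```python
def cpu_memory_estimate_mb(n_tasks, per_task_mb=(10, 15)):
    """Return (low, high) total RAM in MB for n_tasks simultaneous CPU tasks."""
    low, high = per_task_mb
    return n_tasks * low, n_tasks * high


# 16 CPUs, one task per CPU, using the rough per-task range quoted above.
low, high = cpu_memory_estimate_mb(16)
print(f"{low} - {high} MB")  # 160 - 240 MB
```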

I also notice all of your failed units are part of the de_11_x, de_12_x, de_13_x or de_14_x runs. There is an issue with these units not correctly having their progress bar updated in the BOINC manager, but other than that they seem to process fine.

I assume you are running the standard MW app and not an optimised app (this is based on the output of your tasks)?

Never surrender and never give up. In the darkest hour there is always hope.

ID: 37364 · Rating: 0
Brian Priebe
Joined: 27 Nov 09
Posts: 108
Credit: 430,760,953
RAC: 0
Message 37365 - Posted: 15 Mar 2010, 1:42:40 UTC - in response to Message 37364.  

I was sitting here minutes ago when one of the DE_12* 7-second wonders blew through. Performance Monitor was showing that physical RAM available was never less than 600MB during that time. So I am at a loss.
ID: 37365 · Rating: 0
Brian Priebe
Joined: 27 Nov 09
Posts: 108
Credit: 430,760,953
RAC: 0
Message 37366 - Posted: 15 Mar 2010, 1:46:45 UTC - in response to Message 37364.  
Last modified: 15 Mar 2010, 1:50:57 UTC

Gavin Shaw wrote: "I think MW units running on a CPU need around 10 - 15MB of RAM per unit in memory."
FYI, the two running as I type this have peak memory usage of 7,648KB (each).

Gavin Shaw wrote: "I assume you are running the standard MW app and not an optimised app (this is based on the output of your tasks)?"
You assume correctly.
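As a quick check of the measured figure against free RAM, here is a short Python sketch. It assumes a worst case in which all 16 CPUs run a MilkyWay task at that peak at the same time (only two were running when the figure was taken), treats 1 MB as 1024 KB, and uses the 600MB minimum free RAM seen in Performance Monitor earlier in the thread.

```python
PEAK_KB_PER_TASK = 7_648  # peak usage per task reported above
N_CPUS = 16               # CPU count of the machine, from earlier in the thread
FREE_RAM_MB = 600         # lowest free physical RAM observed earlier

# Worst case (assumed): every CPU runs one task at its peak at the same time.
total_mb = PEAK_KB_PER_TASK * N_CPUS / 1024
print(f"{total_mb:.1f} MB needed for {N_CPUS} tasks")
print("fits in free RAM:", total_mb < FREE_RAM_MB)
```

Even under that assumption the total comes to about 119.5 MB, well under the free RAM observed.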
ID: 37366 · Rating: 0
kashi
Joined: 30 Dec 07
Posts: 311
Credit: 149,490,184
RAC: 0
Message 37368 - Posted: 15 Mar 2010, 3:15:35 UTC

Well the amount of memory was only a guess. People have had problems with GPU processing before when running Win XP or 7 with a smaller amount of memory and I just thought that a similar thing may affect CPU processing seeing as these longer tasks would require more resources.

There was also an issue a while ago when a new longer series of tasks was introduced affecting the CUDA application and Windows XP where the memory was not getting allocated properly.

Perhaps the overflow of the 32-bit integer that causes the progress bar to malfunction is also somehow causing the memory allocation to fail. I wouldn't know if this only affects 32-bit operating systems; it is beyond my area of knowledge, so it is just another guess.
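For readers unfamiliar with the overflow being guessed at, here is a generic Python sketch of how a signed 32-bit counter wraps around and what that does to a progress fraction. The step counts and the name `wrap_int32` are hypothetical; this is not the MilkyWay application's code, and it says nothing about whether memory allocation is affected.

```python
INT32_MAX = 2**31 - 1


def wrap_int32(value):
    """Reduce an integer to the signed 32-bit range, mimicking wraparound."""
    return (value + 2**31) % 2**32 - 2**31


total_steps = 3_000_000_000  # hypothetical work size, larger than INT32_MAX
done_steps = 1_500_000_000   # halfway through

true_fraction = done_steps / total_steps
# If the total were stored in a signed 32-bit integer it would wrap negative,
# and the computed progress fraction would become nonsense.
broken_fraction = done_steps / wrap_int32(total_steps)

print("stored total:", wrap_int32(total_steps))
print("true progress:", true_fraction)
print("progress from wrapped total:", broken_fraction)
```

A negative or otherwise nonsensical fraction like this is the kind of value that could leave a progress bar stuck or wrong.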
ID: 37368 · Rating: 0
Brian Priebe
Joined: 27 Nov 09
Posts: 108
Credit: 430,760,953
RAC: 0
Message 37374 - Posted: 15 Mar 2010, 8:04:56 UTC - in response to Message 37350.  
Last modified: 15 Mar 2010, 8:07:16 UTC

There is more to this than meets the eye. I've checked the message log since the last reboot of that machine. There is not a single WU downloaded for the CPU version of MW that did NOT complete in less than 20sec.

Apparently all of the WU's returned for version 19 are being rejected as 'invalid'. At least all of them that I can still see under View Tasks.

What's going on?
ID: 37374 · Rating: 0
Brian Priebe
Joined: 27 Nov 09
Posts: 108
Credit: 430,760,953
RAC: 0
Message 37376 - Posted: 15 Mar 2010, 13:13:02 UTC - in response to Message 37374.  
Last modified: 15 Mar 2010, 13:13:59 UTC

Yet more info. There are two DE_S222-series WU's that have apparently been running successfully for hours. All of the ones that I could see from the log and View Tasks that are 'invalid' are the new ones from DE_11 through DE_14.
ID: 37376 · Rating: 0
Cannibal Corpse
Joined: 21 Mar 09
Posts: 25
Credit: 11,410,869
RAC: 0
Message 37390 - Posted: 15 Mar 2010, 22:05:58 UTC

Hello all. I am having the same problem, with 3 to 5 second WUs also marked invalid, so I looked at the possible problem with RAM. I still had 2 gigs at the ready during those WUs. So I was wondering: I have several S@H WUs in transfer mode, could that be a problem? Running the standard app on my AMD quad-core machine. My page file is rather large at 1.18 gigs; it usually runs 700-900 MB. Thx
ID: 37390 · Rating: 0
Brian Priebe
Joined: 27 Nov 09
Posts: 108
Credit: 430,760,953
RAC: 0
Message 37449 - Posted: 17 Mar 2010, 4:42:38 UTC - in response to Message 37390.  
Last modified: 17 Mar 2010, 4:43:47 UTC

I'm not buying that it's a RAM problem. I've seen WU's get trashed with 600MB+ of physical RAM still unused.

I just checked the WU results and the two DE_S222 series successfully completed on the CPU client. The ones that are failing are all the DE_11 through DE_14 series.
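One way to see the pattern at a glance is to count results per series. Here is a minimal Python sketch using a handful of task names from the list earlier in the thread. Treating the first two underscore-separated fields of the name as the "series" is an assumption about the naming scheme, and `series_of` is a made-up helper.

```python
from collections import Counter

# (task name, status) pairs copied from the task list earlier in this thread.
RESULTS = [
    ("de_13_3s_free_5_882287_1268553142_0", "invalid"),
    ("de_14_3s_free_5_873342_1268551982_0", "invalid"),
    ("de_12_3s_free_5_865100_1268550960_1", "invalid"),
    ("de_11_3s_free_5_850061_1268548992_0", "invalid"),
    ("de_13_3s_const_1_866713_1268551149_0", "invalid"),
    ("ps_s222_pes_2_v04_3915991_1267951891_0", "pending"),
]


def series_of(task_name):
    """Use the first two underscore-separated fields as the series label (assumed)."""
    return "_".join(task_name.split("_")[:2])


# Count only the invalid results, grouped by series label.
invalid_by_series = Counter(
    series_of(name) for name, status in RESULTS if status == "invalid"
)
for series, count in sorted(invalid_by_series.items()):
    print(series, count)
```

Run over a full task list, a tally like this would show whether every invalid result falls in de_11 through de_14.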
ID: 37449 · Rating: 0
LalitM
Joined: 1 Mar 10
Posts: 1
Credit: 141,064
RAC: 0
Message 37454 - Posted: 17 Mar 2010, 16:07:37 UTC

I'm having the same problem as well. All of the de_11_x, de_12_x, de_13_x and de_14_x tasks are doing this: they finish in 3-4 seconds and are "marked as invalid". The one DE_S222 task is running okay. I don't think it's a RAM issue; Task Manager shows > 1G free.
ID: 37454 · Rating: 0
