Welcome to MilkyWay@home

Very Strange Time To Completion

XB-STX
Joined: 9 Aug 08
Posts: 18
Credit: 56,863,533
RAC: 0
Message 11699 - Posted: 20 Feb 2009, 0:18:38 UTC

This workunit - http://milkyway.cs.rpi.edu/milkyway/result.php?resultid=5847786 - shows 307 seconds of CPU time; however, the client records nearly 25 times that amount (verified, since these ultra-long-running WUs tend to p*ss me off royally).

Any ideas...?

I get maybe one in ten like this on my quad-core, but nothing like this on any of the other boxes.

Thx
XB
ID: 11699 · Rating: 0
m4rtyn
Joined: 16 Jan 08
Posts: 18
Credit: 4,111,257
RAC: 0
Message 11710 - Posted: 20 Feb 2009, 0:41:12 UTC - in response to Message 11699.  

Do you have "leave applications in memory while suspended" checked? If not, and you run another project, it can affect the time displayed in the client.
m4rtyn
ID: 11710 · Rating: 0
Cluster Physik
Joined: 26 Jul 08
Posts: 627
Credit: 94,940,203
RAC: 0
Message 11716 - Posted: 20 Feb 2009, 0:56:46 UTC - in response to Message 11710.  

Do you have "leave applications in memory while suspended" checked? If not, and you run another project, it can affect the time displayed in the client.

Yep, it appears it restarted from a checkpoint. Otherwise the first lines with the CPU info in the task details wouldn't appear twice:

Running Milkyway@home version 0.19 by Gipsel
CPU: Intel(R) Core(TM)2 Quad CPU @ 2.40GHz (4 cores/threads) 2.40007 GHz (742ms)

Running Milkyway@home version 0.19 by Gipsel
CPU: Intel(R) Core(TM)2 Quad CPU @ 2.40GHz (4 cores/threads) 2.40008 GHz (1346ms)

WU completed. It took 307.797 seconds CPU time and 322.693 seconds wall clock time @ 2.40008 GHz.
ID: 11716 · Rating: 0
XB-STX
Joined: 9 Aug 08
Posts: 18
Credit: 56,863,533
RAC: 0
Message 11725 - Posted: 20 Feb 2009, 1:36:15 UTC - in response to Message 11716.  

Did not even think about that one... DOH!

Will see what happens now that the change has been made.

Much obliged!
XB
ID: 11725 · Rating: 0
XB-STX
Joined: 9 Aug 08
Posts: 18
Credit: 56,863,533
RAC: 0
Message 11911 - Posted: 20 Feb 2009, 23:11:32 UTC - in response to Message 11725.  

Made the changes, and still am getting excessive times for the run:

http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=6110598 ran > 3.5 hours, and got 8.18 credits on a Q6600

http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=6002848 ran > 5.5 hours, and got 12.35 credits on a Q6600

Plus three more currently running, all three of which will be over 4+ hours, all on the same Q6600.

Strangely, most other WU's run in around 12-15 minutes for 21/22/23, and 6-10 minutes for 79/82/86.

Travis (or anyone else, for that matter) - any ideas? What makes some WU's run extraordinarily long, and others right in line with expectations?

Thanks
XB
ID: 11911 · Rating: 0
Cluster Physik
Joined: 26 Jul 08
Posts: 627
Credit: 94,940,203
RAC: 0
Message 11914 - Posted: 20 Feb 2009, 23:37:43 UTC - in response to Message 11911.  

Made the changes, and still am getting excessive times for the run:

http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=6110598 ran > 3.5 hours, and got 8.18 credits on a Q6600

Something is really bogging down your computer. Just looking at the WU's task details

<stderr_txt>
Running Milkyway@home version 0.19 by Gipsel
CPU: Intel(R) Core(TM)2 Quad CPU @ 2.40GHz (4 cores/threads) 2.40008 GHz (79033ms)

WU completed. It took 12983.9 seconds CPU time and 13124.3 seconds wall clock time @ 2.40008 GHz.

</stderr_txt>

reveals that 79 seconds pass before the actual calculation of the WU even starts. The time after the clock frequency (79033ms in this case) shows how long the application took to initialize and read the input parameter files. On a Q6600 it shouldn't take more than half a second. So something really slows down your PC during those work units.
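
The same reading of the task details can be scripted. Below is a minimal Python sketch, assuming the stderr text keeps the exact wording quoted above (other app versions may print something different). The names `summarize` and `init_limit_ms` are made up for illustration, and the 500 ms limit is only the "half a second" rule of thumb:

```python
import re

# Patterns follow the two stderr lines quoted above; this is an assumption
# about the 0.19 output format, not a documented interface.
INIT_RE = re.compile(r"GHz \((\d+)ms\)")
DONE_RE = re.compile(
    r"It took ([\d.]+) seconds CPU time and ([\d.]+) seconds wall clock time"
)

def summarize(stderr_txt, init_limit_ms=500):
    """Extract init time(s), CPU time and wall time from a task's stderr."""
    init_ms = [int(m) for m in INIT_RE.findall(stderr_txt)]
    done = DONE_RE.search(stderr_txt)
    return {
        "init_ms": init_ms,
        # More than one CPU line means the app was started more than once.
        "restarts": max(len(init_ms) - 1, 0),
        "cpu_s": float(done.group(1)) if done else None,
        "wall_s": float(done.group(2)) if done else None,
        "slow_init": any(ms > init_limit_ms for ms in init_ms),
    }

sample = """Running Milkyway@home version 0.19 by Gipsel
CPU: Intel(R) Core(TM)2 Quad CPU @ 2.40GHz (4 cores/threads) 2.40008 GHz (79033ms)

WU completed. It took 12983.9 seconds CPU time and 13124.3 seconds wall clock time @ 2.40008 GHz.
"""

print(summarize(sample))
```

Pasting the stderr of a normal result and of a slow one into `summarize` makes it easy to see whether the slow start shows up on every long-running task or only on some.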
ID: 11914 · Rating: 0
XB-STX
Joined: 9 Aug 08
Posts: 18
Credit: 56,863,533
RAC: 0
Message 11927 - Posted: 21 Feb 2009, 0:02:13 UTC - in response to Message 11914.  

Well, I'm running (on this box) equal shares for ABC, Einstein, Cosmology, plus a reduced share for Climate Prediction. Any possibility there are conflicts between projects...?

Strangely, if I'm actually sitting at the machine, and I see a task taking longer than it should, and I suspend the task for a few minutes, then restart it, everything runs just the way it should. But since I'm not constantly sitting at this particular machine all day, it's tough to monitor it in that fashion. Should I get more than one (such as today, when FOUR WU's were all running excessively at the same time), it pretty much negates any real progress on this (or any other) project.

I'm open to suggestions - when I posted a similar thread a couple months back, it was suggested that the CPU was running too hot, so I upgraded the cooler, and now the cores run between 55-62 C. Hot, but within spec limits. So, it has to be something else that I have not considered.

XB
ID: 11927 · Rating: 0
Paul D. Buck
Joined: 12 Apr 08
Posts: 621
Credit: 161,934,067
RAC: 0
Message 11929 - Posted: 21 Feb 2009, 0:07:29 UTC

I don't think anyone else asked these questions; if they did, forgive me ...

When the tasks are running long, have you tried to find out what the CPU is doing with either the MB monitoring tool or CPU-Z? What I am getting at: could the MB be stepping down the clock rate because it thinks you are idle?

Have you looked at the CPU settings in the BIOS to see if it is set to green or performance?
ID: 11929 · Rating: 0
Cluster Physik
Joined: 26 Jul 08
Posts: 627
Credit: 94,940,203
RAC: 0
Message 11931 - Posted: 21 Feb 2009, 0:16:36 UTC - in response to Message 11929.  

When the tasks are running long, have you tried to find out what the CPU is doing with either the MB monitoring tool or CPU-Z? What I am getting at: could the MB be stepping down the clock rate because it thinks you are idle?

Or maybe even easier, he could just have a look in the task manager (be sure to enable the display of all tasks).
ID: 11931 · Rating: 0
XB-STX
Joined: 9 Aug 08
Posts: 18
Credit: 56,863,533
RAC: 0
Message 11942 - Posted: 21 Feb 2009, 0:40:44 UTC - in response to Message 11931.  

Task Manager reports that all tasks are running a consistent 24-25% each, so there are no other tasks hogging CPU cycles in the background.
ID: 11942 · Rating: 0
Cluster Physik
Joined: 26 Jul 08
Posts: 627
Credit: 94,940,203
RAC: 0
Message 11952 - Posted: 21 Feb 2009, 1:44:43 UTC - in response to Message 11942.  

Task Manager reports that all tasks are running a consistent 24-25% each, so there are no other tasks hogging CPU cycles in the background.

That can't be true for the slow tasks, according to their output.

CPU time 20302.56
stderr out <core_client_version>6.4.5</core_client_version>
<![CDATA[
<stderr_txt>
Running Milkyway@home version 0.19 by Gipsel
CPU: Intel(R) Core(TM)2 Quad CPU @ 2.40GHz (4 cores/threads) 2.40008 GHz (93592ms)

WU completed. It took 20302.6 seconds CPU time and 32987.4 seconds wall clock time @ 2.40008 GHz.

</stderr_txt>
]]>

One and a half minutes just for reading the input files really indicates your computer has a serious problem. Also, the difference between the CPU and the wall clock time clearly shows there is some background activity going on.
If it is not some kind of downclocking / power saving feature, I would really start to worry.
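
To put a number on that gap: dividing CPU time by wall clock time gives the share of the run in which the task actually had a core. A quick Python sketch using the figures from the two results quoted in this thread (`cpu_share` is just an illustrative helper name, not part of BOINC):

```python
def cpu_share(cpu_s, wall_s):
    """Fraction of the wall clock time the task spent running on a CPU."""
    return cpu_s / wall_s

# (label, CPU seconds, wall clock seconds) as reported in the task details
results = [
    ("3.5 hour task", 12983.9, 13124.3),
    ("9 hour task", 20302.6, 32987.4),
]

for label, cpu_s, wall_s in results:
    share = cpu_share(cpu_s, wall_s)
    print(f"{label}: {share:.1%} on the CPU, {wall_s - cpu_s:.0f} s spent waiting")
```

The first task comes out near 99%, so it was slow even though it had a core almost the whole time. The second is only around 62%, meaning it sat waiting for more than a third of its run.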
ID: 11952 · Rating: 0
Paul D. Buck
Joined: 12 Apr 08
Posts: 621
Credit: 161,934,067
RAC: 0
Message 11954 - Posted: 21 Feb 2009, 1:50:40 UTC - in response to Message 11942.  
Last modified: 21 Feb 2009, 1:53:11 UTC

Task Manager reports that all tasks are running a consistent 24-25% each, so there are no other tasks hogging CPU cycles in the background.


The reason I suggested the look at the clock speed is that some systems will "stretch" the clock, slowing the system ... so you will be showing 24% CPU, but the CPU is walking with a broken leg. When you fiddle with the system it may speed up, and thus you see more normal operation ... then the system decides to slow down and tasks take a long time to complete.

This is the only thing I can think of that would explain the symptom if you are only showing a normal task load and they are running at the right percentage of load ...
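
One rough way to catch this without sitting at the machine is to time a fixed chunk of work every so often. This is only a sketch in Python, with made-up function names (`probe`, `watch`) and an arbitrary loop size; interpreter timings are noisy, so only large changes mean anything:

```python
import time

def probe(iterations=2_000_000):
    """Run a fixed amount of arithmetic; return (cpu_seconds, wall_seconds)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    total = 0
    for i in range(iterations):
        total += i * i  # identical work on every call
    return time.process_time() - cpu_start, time.perf_counter() - wall_start

def watch(samples=3, pause_s=0.2):
    """Repeat the probe and collect the timings for comparison."""
    readings = []
    for _ in range(samples):
        readings.append(probe())
        time.sleep(pause_s)
    return readings

if __name__ == "__main__":
    for cpu_s, wall_s in watch():
        print(f"cpu {cpu_s:.3f} s, wall {wall_s:.3f} s")
```

If the CPU seconds for the same work climb sharply between readings while the load percentage still looks normal, a slowed clock is a plausible suspect. If only the wall seconds grow, something else is more likely taking the core away.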

{edit}

A sick hard drive? What drive activity do you see?
ID: 11954 · Rating: 0
Phil
Joined: 13 Feb 08
Posts: 1124
Credit: 46,740
RAC: 0
Message 11981 - Posted: 21 Feb 2009, 3:04:26 UTC - in response to Message 11952.  

One and a half minutes just for reading the input files really indicates your computer has a serious problem. Also, the difference between the CPU and the wall clock time clearly shows there is some background activity going on.
If it is not some kind of downclocking / power saving feature, I would really start to worry.

He's doing Einstein and CPDN which both do a serious amount of disk activity when started.
Does this problem occur when starting BOINC, or when BOINC's been up for a fair while and just starting a new MW?
ID: 11981 · Rating: 0
XB-STX
Joined: 9 Aug 08
Posts: 18
Credit: 56,863,533
RAC: 0
Message 11990 - Posted: 21 Feb 2009, 3:14:20 UTC - in response to Message 11981.  

As a test, I suspended all other projects, moved Boinc Projects to a different physical disk, then restarted BOINC. So, the only projects in memory are one GPU Grid task (CUDA) and four MW tasks. As each task completes, I monitor the start-up progress of the new task. If it looks like it will complete in a reasonable time period, I let it run to completion. If it appears it will run for more than 45 minutes, I kill it.

Boincmgr.exe is, however, eating up around 4% of the CPU, which is also strange, since it should be around 1% or less.

There might still be something throttling the CPU (which, I concur, can be seen in the larger differentials between the CPU time and the clock time on completed tasks), but for the life of me, I cannot find anything in either the BIOS or in the running tasks that would do so. I also disable many of the Windows services that might otherwise gobble up resources.

However, none of this would explain the large CPU time indicated on the post earlier. Yes, the differential is large, but the basis is huge as well.

Puzzling.
XB
ID: 11990 · Rating: 0
Paul D. Buck
Joined: 12 Apr 08
Posts: 621
Credit: 161,934,067
RAC: 0
Message 12038 - Posted: 21 Feb 2009, 6:02:56 UTC

The last thing I can think to try ... desperation time ...

Save your BOINC folders off to another system or CD ... do a fresh install of the OS ... In my case I go so far as to drop the partition table and make a new one just to try to ensure I kill off lingering stuff ...


ID: 12038 · Rating: 0
GalaxyIce
Joined: 6 Apr 08
Posts: 2018
Credit: 100,142,856
RAC: 0
Message 12061 - Posted: 21 Feb 2009, 11:13:32 UTC
Last modified: 21 Feb 2009, 11:14:54 UTC

I have an Intel Celeron 1.70GHz 512MB RAM WinXP which jumped to over 3 hours per WU after applying the SSE2 optimization v.0.19.

I put it back to v.0.16 and it's now taking just over 16 minutes per WU.

v.0.16 can still be found on zslip

ID: 12061 · Rating: 0
Conan
Joined: 2 Jan 08
Posts: 123
Credit: 69,522,293
RAC: 1,631
Message 12069 - Posted: 21 Feb 2009, 13:13:03 UTC

I made mention of the same behavior in the "unusual time to completion" thread. See here

I am not running anything other than Milkyway, but occasional work units, and sometimes several together, will just start running longer than they should.

So far I have only noticed this on my P4 253 machine (curiously the only Intel I have).

It is still doing it on any work unit type.
ID: 12069 · Rating: 0
bobgoblin
Joined: 8 Dec 07
Posts: 60
Credit: 67,028,931
RAC: 0
Message 12072 - Posted: 21 Feb 2009, 13:21:32 UTC

I'm noticing some really long wu's on my older machines but not many. I posted the details of one here:

http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=642&nowrap=true#12070

but it seems like it may be just a random super long wu
ID: 12072 · Rating: 0
Cluster Physik
Joined: 26 Jul 08
Posts: 627
Credit: 94,940,203
RAC: 0
Message 12074 - Posted: 21 Feb 2009, 13:28:00 UTC - in response to Message 12069.  

I made mention of the same behavior in the "unusual time to completion" thread. See here

I am not running anything other than Milkyway, but occasional work units, and sometimes several together, will just start running longer than they should.

So far I have only noticed this on my P4 253 machine (curiously the only Intel I have).

It is still doing it on any work unit type.

As said in the thread you linked, there is a general problem with some of the ps WUs (for the particle swarm search method). They experience an underflow condition and take quite a bit longer (up to twice as long on a K10 or Core2, happens also with the stock app, no increase for the GPU app) to finish. Maybe the P4 isn't as efficient in handling the situation.

But it could also be an unrelated problem (like for XB-STX, the increases he sees can't be caused by the slow WUs). But in this case it apparently happens only on a small set of computers. Therefore I would conclude it is a problem with these computers and not with the MW apps.
ID: 12074 · Rating: 0
Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 12079 - Posted: 21 Feb 2009, 14:26:43 UTC - in response to Message 12072.  
Last modified: 21 Feb 2009, 14:28:31 UTC

I'm noticing some really long wu's on my older machines but not many. I posted the details of one here:

http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=642&nowrap=true#12070

but it seems like it may be just a random super long wu


The WUs aren't doing anything nondeterministic, so they all do the same amount of work (for a particular stripe).

If there are some out there that are taking 2x as long or so, this might be a problem with the alpha == delta == 1 optimization; some older machines might not be able to do the comparison as effectively. Other than that, I honestly don't see what would make a WU take twice as long, unless there's something funky going on with the processor or other running processes on that machine.

I think I'm going to update the code to do an initial check to see if alpha == delta == 1 with some low tolerance instead, and see if this fixes the problem.
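
A tolerance check of that kind could look roughly like the Python sketch below. It is not the MilkyWay@home code: the `density` formula, the parameter names and the tolerance value are all assumptions, chosen only to show a fast path guarded by a near-equality test instead of an exact `==`:

```python
import math

def is_one(x, tol=1e-9):
    """True when x equals 1 within an absolute tolerance (tol is a guess)."""
    return math.isclose(x, 1.0, rel_tol=0.0, abs_tol=tol)

def density(r, r0, alpha, delta):
    """Hypothetical profile 1 / (r**alpha * (r + r0)**(3 - alpha + delta)).

    Used here only as a stand-in for an expression that gets cheaper
    when alpha and delta are both 1.
    """
    if is_one(alpha) and is_one(delta):
        s = r + r0
        return 1.0 / (r * s * s * s)  # fast path: multiplications only
    return 1.0 / (r ** alpha * (r + r0) ** (3.0 - alpha + delta))

print(density(2.0, 1.0, 1.0, 1.0))          # fast path
print(density(2.0, 1.0, 1.0 + 1e-12, 1.0))  # still fast path, within tolerance
print(density(2.0, 1.0, 1.5, 1.0))          # general path
```

The point of the tolerance is that parameters which are 1 only up to rounding error still take the cheap branch, rather than depending on an exact floating-point match.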
ID: 12079 · Rating: 0

©2024 Astroinformatics Group