Very Strange Time To Completion
Author | Message |
---|---|
Send message Joined: 9 Aug 08 Posts: 18 Credit: 56,863,533 RAC: 0 |
This workunit - http://milkyway.cs.rpi.edu/milkyway/result.php?resultid=5847786 - shows 307 seconds of CPU time; however, the client records nearly 25 times that amount (verified, since these ultra-long running WU's tend to p*ss me off royally). Any ideas...? I get maybe one in ten like this on my quad-core, but nothing like this on any of the other boxes. Thx XB |
Send message Joined: 16 Jan 08 Posts: 18 Credit: 4,111,257 RAC: 0 |
Do you have "leave applications in memory while suspended" checked? If not, and you run another project, it can affect the time displayed in the client. m4rtyn |
Send message Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0 |
Do you have "leave applications in memory while suspended" checked? If not, and you run another project, it can affect the time displayed in the client. Yep, it appears it restarted from a checkpoint. Otherwise the first lines with the CPU info in the task details wouldn't appear twice: Running Milkyway@home version 0.19 by Gipsel |
Send message Joined: 9 Aug 08 Posts: 18 Credit: 56,863,533 RAC: 0 |
Did not even think about that one... DOH! Will see what happens now that the change has been made. Much obliged! XB |
Send message Joined: 9 Aug 08 Posts: 18 Credit: 56,863,533 RAC: 0 |
Made the changes, and still am getting excessive times for the run: http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=6110598 ran > 3.5 hours, and got 8.18 credits on a Q6600 http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=6002848 ran > 5.5 hours, and got 12.35 credits on a Q6600 Plus three more currently running, all three of which will run over 4 hours, all on the same Q6600. Strangely, most other WU's run in around 12-15 minutes for 21/22/23, and 6-10 minutes for 79/82/86. Travis (or anyone else, for that matter) - any ideas? What makes some WU's run extraordinarily long, and others right in line with expectations? Thanks XB |
Send message Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0 |
Made the changes, and still am getting excessive times for the run: Something is really bogging down your computer. Just looking at the <stderr_txt> in the WU's task details reveals that it already takes 79 seconds before the actual calculation of the WU starts. The time shown after the clock frequency (79033 ms in this case) is how long it took to initialize the application and read the input parameter files. On a Q6600 it shouldn't take more than half a second. So something really slows down your PC during those work units. |
Send message Joined: 9 Aug 08 Posts: 18 Credit: 56,863,533 RAC: 0 |
Well, I'm running (on this box) equal shares for ABC, Einstein, Cosmology, plus a reduced share for Climate Prediction. Any possibility there are conflicts between projects...? Strangely, if I'm actually sitting at the machine, and I see a task taking longer than it should, and I suspend the task for a few minutes, then restart it, everything runs just the way it should. But since I'm not constantly sitting at this particular machine all day, it's tough to monitor it in that fashion. Should I get more than one (such as today, when FOUR WU's were all running excessively at the same time), it pretty much negates any real progress on this (or any other) project. I'm open to suggestions - when I posted a similar thread a couple months back, it was suggested that the CPU was running too hot, so I upgraded the cooler, and now the cores run between 55-62 C. Hot, but within spec limits. So, it has to be something else that I have not considered. XB |
Send message Joined: 12 Apr 08 Posts: 621 Credit: 161,934,067 RAC: 0 |
I don't think anyone else asked these questions, if they did forgive me ... when the tasks are running long, have you tried to find out what the CPU is doing with either the MB monitoring tool or CPU-z? What I am getting at, could the MB be stepping down the clock rate because it thinks you are idle? Have you looked at the CPU settings in the BIOS to see if it is set to green or performance? |
Send message Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0 |
when the tasks are running long, have you tried to find out what the CPU is doing with either the MB monitoring tool or CPU-z? What I am getting at, could the MB be stepping down the clock rate because it thinks you are idle? Or maybe even easier, he could just have a look in the task manager (be sure to enable the display of all tasks). |
Send message Joined: 9 Aug 08 Posts: 18 Credit: 56,863,533 RAC: 0 |
Task Manager reports that all tasks are running a consistent 24-25% each, so there are no other tasks hogging CPU cycles in the background. |
Send message Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0 |
Task Manager reports that all tasks are running a consistent 24-25% each, so there are no other tasks hogging CPU cycles in the background. That can't be true for the slow tasks, according to their output. CPU time 20302.56 One and a half minutes just for reading the input files really indicates your computer has a serious problem. Also, the difference between the CPU and the wall clock time clearly shows there is some background activity going on. If it is not some kind of downclocking / power saving feature, I would really start to worry. |
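The CPU time versus wall clock comparison mentioned above can be tried out with a small sketch. This is not how BOINC or the MW app measures anything; the function names `cpu_wall_ratio`, `busy` and `stalled` are made up for illustration. A ratio near 1 means the process was computing nearly the whole time, while a much lower ratio means it spent most of the wall clock time waiting on something else.

```python
import time


def cpu_wall_ratio(work):
    """Run work() and return (cpu_seconds, wall_seconds, cpu/wall ratio)."""
    wall_start = time.perf_counter()   # elapsed real time
    cpu_start = time.process_time()    # CPU time charged to this process
    work()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    ratio = cpu / wall if wall > 0 else 0.0
    return cpu, wall, ratio


def busy():
    """Hypothetical workload that keeps the CPU busy."""
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total


def stalled():
    """Hypothetical workload that mostly waits instead of computing."""
    time.sleep(0.3)


if __name__ == "__main__":
    for name, fn in (("busy", busy), ("stalled", stalled)):
        cpu, wall, ratio = cpu_wall_ratio(fn)
        print(f"{name}: cpu={cpu:.3f}s wall={wall:.3f}s ratio={ratio:.2f}")
```

Note that this only catches time spent waiting. If the clock is being throttled, the process still counts as "busy", so the ratio can look healthy while both numbers are far larger than they should be.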
Send message Joined: 12 Apr 08 Posts: 621 Credit: 161,934,067 RAC: 0 |
Task Manager reports that all tasks are running a consistent 24-25% each, so there are no other tasks hogging CPU cycles in the background. The reason I suggested the look at the clock speed is that some systems will "stretch" the clock, slowing the system ... so you will be showing 24% CPU, but the CPU is walking with a broken leg. When you fiddle with the system it may speed up and thus you see more normal operation ... then the system decides to slow and tasks take long times to complete. This is the only thing I can think of that would explain the symptom if you are only showing a normal task load and they are running at the right percentage of load ... {edit} A sick hard drive? What drive activity do you see? |
Send message Joined: 13 Feb 08 Posts: 1124 Credit: 46,740 RAC: 0 |
One and a half minute just for reading the input files really indicates your computer has a serious problem. Also the difference between the CPU and the wall clock time clearly shows there is some background activity going on. He's doing Einstein and CPDN which both do a serious amount of disk activity when started. Does this problem occur when starting BOINC, or when BOINC's been up for a fair while and just starting a new MW? |
Send message Joined: 9 Aug 08 Posts: 18 Credit: 56,863,533 RAC: 0 |
As a test, I suspended all other projects, moved Boinc Projects to a different physical disk, then restarted BOINC. So, the only projects in memory are one GPU Grid task (CUDA) and four MW tasks. As each task completes, I monitor the start-up progress of the new task. If it looks like it will complete in a reasonable time period, I let it run to completion. If it appears it will run for more than 45 minutes, I kill it. Boincmgr.exe is, however, eating up around 4% of the CPU, which is also strange, since it should be around 1% or less. There might still be something throttling the CPU (which, I concur, can be seen in the larger differentials between the CPU time and the clock time on completed tasks), but for the life of me, I cannot find anything in either the BIOS or in the running tasks that would do so. I also disable many of the Windows services that might otherwise gobble up resources. However, none of this would explain the large CPU time indicated in the earlier post. Yes, the differential is large, but the basis is huge as well. Puzzling. XB |
Send message Joined: 12 Apr 08 Posts: 621 Credit: 161,934,067 RAC: 0 |
The last thing I can think to try ... desperation time ... Save your BOINC folders off to another system or CD ... do a fresh install of the OS ... In my case I go so far as to drop the partition table and make a new one just to try to ensure I kill off lingering stuff ... |
Send message Joined: 2 Jan 08 Posts: 123 Credit: 69,522,293 RAC: 1,631 |
I made mention of the same behavior in the "unusual time to completion" thread See here I am not running anything other than Milkyway but occasional work units and some together will just start running longer than they should. So far I have only noticed this on my P4 253 machine (curiously the only Intel I have). It is still doing it on any work unit type. |
Send message Joined: 8 Dec 07 Posts: 60 Credit: 67,028,931 RAC: 0 |
I'm noticing some really long WUs on my older machines, but not many. I posted the details of one here: http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=642&nowrap=true#12070 but it seems like it may be just a random super long WU |
Send message Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0 |
I made mention of the same behavior in the "unusual time to completion" thread See here As said in the thread you linked, there is a general problem with some of the ps WUs (for the particle swarm search method). They experience an underflow condition and take quite a bit longer (up to twice as long on a K10 or Core2, happens also with the stock app, no increase for the GPU app) to finish. Maybe the P4 isn't as efficient in handling the situation. But it could also be an unrelated problem (like for XB-STX, the increases he sees can't be caused by the slow WUs). But in this case it apparently happens only on a small set of computers. Therefore I would conclude it is a problem with these computers and not with the MW apps. |
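For anyone wondering what an underflow condition looks like in practice, here is a minimal sketch. It assumes the slowdown comes from values drifting into the subnormal (denormal) range of IEEE 754 doubles, which the post does not state explicitly; on many CPUs arithmetic on such values is noticeably slower than on normal ones. The function name `classify` is made up for illustration and is not part of the MW code.

```python
import sys


def classify(x):
    """Label a float as 'zero', 'subnormal' or 'normal' (illustrative helper)."""
    if x == 0.0:
        return "zero"
    # Anything non-zero but smaller in magnitude than the smallest normal
    # double is stored with reduced precision as a subnormal number.
    if abs(x) < sys.float_info.min:
        return "subnormal"
    return "normal"


if __name__ == "__main__":
    value = 1.0
    seen = set()
    # Keep shrinking the value, as repeated multiplication of small
    # probabilities might, and report each time the category changes.
    while value != 0.0:
        kind = classify(value)
        if kind not in seen:
            seen.add(kind)
            print(f"{value!r} is {kind}")
        value /= 1e10
    print("value finally underflowed to zero")
```

The loop passes through the subnormal range before reaching zero, which is the region where an app can keep producing "valid" results while running slower than usual.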
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
I'm noticing some really long wu's on my older machines but not many. I posted the details of one here: The WUs aren't doing anything nondeterministic, so they all do the same amount of work (for a particular stripe). If there are some out there that are taking 2x as long or so, this might be a problem with the alpha == delta == 1 optimization; some older machines might not be able to do the comparison as effectively. Other than that, I honestly don't see what would make a WU take twice as long, unless there's something funky going on with the processor or other running processes on that machine. I think I'm going to update the code to do an initial check to see if alpha == delta == 1 with some low tolerance instead and see if this fixes the problem. |
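A tolerance-based check along the lines described above might look roughly like this. It is only a sketch in Python, not the actual MW code: the name `is_unit_case` and the tolerance value are assumptions, since the post only says "some low tolerance".

```python
import math

# Hypothetical tolerance; the post does not give a value.
TOLERANCE = 1e-9


def is_unit_case(alpha, delta, tol=TOLERANCE):
    """Return True when alpha and delta are both within tol of 1.0."""
    return (math.isclose(alpha, 1.0, rel_tol=0.0, abs_tol=tol)
            and math.isclose(delta, 1.0, rel_tol=0.0, abs_tol=tol))


if __name__ == "__main__":
    nearly_one = 1.0 - 1e-12          # differs from 1.0 only by rounding-sized noise
    print(nearly_one == 1.0)          # exact comparison misses it
    print(is_unit_case(nearly_one, 1.0))
    print(is_unit_case(1.5, 1.0))
```

The point of the tolerance is that a value which should be 1 but picked up a tiny rounding error still takes the fast path, whereas an exact `==` test would silently send it down the slow general path.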