Welcome to MilkyWay@home

Posts by jdzukley

21) Message boards : News : N-Body 1.18 (Message 58855)
Posted 14 Jun 2013 by jdzukley
Post:
My observations: it appears to me that the more CPU cores you have, the more "OK" it is to overcommit. For reference, I have 12 cores, and the MT tasks can run alongside 2 GPU tasks quite well, with total machine CPU load at roughly 85%. The scheduler does allow GPU tasks to load and execute while MT tasks are running, with the apparent exception of when a high-priority MT task changes to normal priority. WHEN THIS HAPPENS, the scheduler stops the MT task, puts it in "waiting" status, and will not allow any other CPU-based tasks to execute until ALL other work, INCLUDING ALL GPU tasks, is complete, including GPU tasks in the queue. That means that if the BOINC scheduler keeps obtaining more GPU tasks, CPU tasks may never commence. The scheduler will not allow any other CPU work to start until the MT task is complete. Remember, it is in the "waiting" state, waiting for 100% CPU availability; essentially all 12 cores are available yet are only supporting GPU work.

Bottom line, my opinion is that the more cores you have, the better it is to allow overcommitting. This lets cores that would otherwise sit idle, waiting on the MT task, contribute to GPU activity. I also agree that the sum of the CPU-core reservations for the GPU tasks needs to be <= 1 CPU core, or perhaps <= 0.1 * the number of cores (0.1 * 12 cores = 1.2), provided that no single GPU task requires a whole core.
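For what it's worth, here is a minimal sketch of how the per-GPU-task CPU reservation could be capped with a BOINC app_config.xml. The app name "milkyway" and the 0.05 figure are my assumptions, so check client_state.xml for the real app name on your host:

    <app_config>
      <app>
        <!-- hypothetical app name; confirm against client_state.xml -->
        <name>milkyway</name>
        <gpu_versions>
          <gpu_usage>1.0</gpu_usage>   <!-- one task per GPU -->
          <cpu_usage>0.05</cpu_usage>  <!-- reserve only 0.05 of a core per GPU task -->
        </gpu_versions>
      </app>
    </app_config>

With two GPU tasks running, that reserves 0.1 of a core in total, comfortably inside the <= 1 core (or <= 0.1 * cores) budget suggested above.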

I would be glad to record or snapshot total CPU use on this computer with 12 cores. Please advise...

There are a number of issues here, the most serious being that when an MT task changes status from high priority to normal, it goes into the "waiting" state and holds all remaining CPU tasks hostage.
22) Message boards : News : N-Body 1.18 (Message 58772)
Posted 12 Jun 2013 by jdzukley
Post:
I am moving on to other projects, as I received 3 different MT tasks tonight that halted all work on my 12-core CPU when the MT task reached 9x% complete. I suspended all GPU tasks to allow the MT task to complete and then released all GPU tasks; BOINC returned to normal operation at that point, including downloading more tasks. All worked well for many MT task cycles. The stalled MT task, in every case over the last few days, always had hundreds of estimated hours to go... In all cases, GPU work continued without incident. I'll look forward to when this condition is fixed, and then I will be back for more.
23) Message boards : News : N-Body 1.18 (Message 58717)
Posted 11 Jun 2013 by jdzukley
Post:
If you have a graphics card, you must manually suspend all graphics jobs and wait a few moments for the MT task to complete. As soon as the MT task completes, restart all graphics jobs. Note that you must hold ALL graphics-card jobs, not just the jobs currently running. The error was noted below and reported as a problem. The few MT tasks that did this all had something OTHER than dark or nodark in the task name, all had estimated run times in the thousands of hours (xxxx. hours), and all arrived at 98% after about 1 hour of run time.
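For anyone who wants to script this workaround instead of clicking through the manager, something like the following should work with the boinccmd tool that ships with the BOINC client (syntax from memory, so treat it as a sketch):

    boinccmd --set_gpu_mode never 0    # suspend GPU work for all projects
    # ... wait for the N-body MT task to finish ...
    boinccmd --set_gpu_mode auto 0     # return to your normal GPU preferences

The "never" mode holds ALL GPU jobs, not just the ones currently running, which is exactly what the MT task seems to need.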
24) Message boards : News : N-Body 1.18 (Message 58696)
Posted 11 Jun 2013 by jdzukley
Post:
FYI, I just noticed that during the last 48 hours I have had 5 long-running MT dark tasks error out after reaching 100%... The work unit details indicate that they are erroring out for other hosts too, with the "Too many errors (may have bug)" message...
25) Message boards : News : N-Body 1.18 (Message 58689)
Posted 11 Jun 2013 by jdzukley
Post:
Posted twice; I need to eliminate this one but do not see the delete button. Might need new eyeglasses or reading lessons...
26) Message boards : News : N-Body 1.18 (Message 58688)
Posted 11 Jun 2013 by jdzukley
Post:
Yes, but on my computer, for MT dark tasks estimated at less than 10 minutes, 6 of 12 CPUs are parked for the entire duration of the task. Also, look at run time versus CPU time in the results file: they are roughly equal. Why does the above group of tasks always have this condition? Bottom line, the above-referenced group of tasks is executing very inefficiently, perhaps with correct results, and is reserving 1100% more resources than needed: only 1 CPU is actually required, yet 12 CPUs are reserved for the entire run time!

Also note that actual run time is most often 5 to 8 times greater than the original estimated run time. In other words, if MT were really working on the above group, the original estimate would be about right.
27) Message boards : News : N-Body 1.18 (Message 58610)
Posted 10 Jun 2013 by jdzukley
Post:
Morning here. I checked my tasks from last night; some 100 MT tasks have run since 10 Jun UTC, and I find that about 20% of them have run time roughly equal to CPU time. All of the tasks I checked are MT dark.
28) Message boards : News : N-Body 1.18 (Message 58605)
Posted 9 Jun 2013 by jdzukley
Post:
More observations... I have yet to see one of these x,xxx-hour jobs actually go over 1 hour of actual run time. But a strange thing often does happen at 98% complete... I also have 2 NVIDIA GPU cards running tasks for SETI Astropulse. At 98% the N-body task stops running and wants 100% of my CPUs, which includes the 0.2 CPUs of the running SETI Astropulse NVIDIA jobs. I have to manually suspend all NVIDIA tasks on the computer until the N-body MT task finishes, which actually takes less than 2 minutes of real time (estimated at xxx.xx hours).

Bottom line, these tasks run, but they take monitoring and active involvement by the host operator; otherwise the computer essentially goes into a stall mode...
29) Message boards : News : N-Body 1.18 (Message 58599)
Posted 9 Jun 2013 by jdzukley
Post:
I have checked and am certain that all the jobs I was referencing were marked as dark and MT. Furthermore, a second dark MT job just started that did fully utilize all 12 processors; however, its estimated run time is 30+ minutes. All of the other dark MT jobs that did not fully deploy all processors were estimated at 1 to 10 minutes.

Furthermore, I did check my tasks and found that most of the short dark MT jobs have CPU time roughly equal to run time in seconds. I would have expected to see CPU seconds equal to something like 10 * run time.
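As a rough sanity check (my own arithmetic, not taken from any task log): a fully multithreaded 5-minute run on 12 cores should accumulate on the order of 12 * 300 s = 3,600 CPU-seconds, so a task that reports only about 300 CPU-seconds for 300 seconds of run time is, in effect, using a single core.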

Just to continue the same thread... the long dark task above just finished. A short dark task just started; 6 of 12 CPUs are "parked", and the other 6 are running at low utilization.

The short dark MT task finished; a 1+ hour dark MT task just started and is functioning as expected. I would recommend that the checks focus on the short dark jobs, as perhaps not all of the threads are actually being engaged there.

Suggestion: have the system admin run a query on the database to return work units where the task type contains "mt" and CPU time < run time * 1.2.
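Here is a sketch of what that query could look like against the project's MySQL database. The table and column names (result, name, cpu_time, elapsed_time, workunitid) follow the standard BOINC server schema as I understand it, and the connection details and the name filter are placeholders, so treat the whole thing as an assumption to be adjusted:

    # Sketch: list MT results whose CPU time barely exceeds run time,
    # i.e. tasks that effectively ran single-threaded.
    import mysql.connector  # assumes MySQL Connector/Python is installed

    conn = mysql.connector.connect(
        host="localhost", user="boincadm", password="...", database="milkyway"
    )  # hypothetical credentials
    cur = conn.cursor()
    cur.execute(
        """
        SELECT workunitid, name, elapsed_time, cpu_time
        FROM result
        WHERE name LIKE '%nbody%'            -- crude stand-in for "task type contains mt"
          AND elapsed_time > 0
          AND cpu_time < elapsed_time * 1.2  -- effectively single-threaded
        """
    )
    for workunitid, name, elapsed_time, cpu_time in cur.fetchall():
        print(workunitid, name, round(elapsed_time), round(cpu_time))
    cur.close()
    conn.close()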
30) Message boards : News : N-Body 1.18 (Message 58596)
Posted 9 Jun 2013 by jdzukley
Post:
Continued observations: so far, as I have observed, only 1 dark MT job has utilized all 12 cores. All of the short jobs (estimated at less than 10 minutes) have had many cores "parked". The one dark MT job that used all cores had an estimated time in the thousands of hours, and took, say, 45 minutes to run...
31) Message boards : News : N-Body 1.18 (Message 58595)
Posted 9 Jun 2013 by jdzukley
Post:
Continued observations: so far, as I have observed, only 1 dark MT job has utilized all 12 cores. All of the short jobs (estimated at less than 10 minutes) have had many cores "parked". The one dark MT job that used all cores had an estimated time in the thousands of hours, and took, say, 45 minutes to run...
32) Message boards : News : N-Body 1.18 (Message 58593)
Posted 9 Jun 2013 by jdzukley
Post:
FYI, regarding MT and N-body: I have a 12-core computer, and the nodark tasks use all 12 cores just fine. However, I have yet to observe an MT dark task utilize more than 1 equivalent core, even though the task has taken control of all 12 cores. I am basing these comments on watching the Resource Monitor. Dark never gets above 10% CPU utilization, and this is after viewing many tasks. Many of the "dark" CPU cores are marked "parked" in the Resource Monitor and are not utilized.

Time to eat my words somewhat; I finally got a dark task that is utilizing all cores...



©2024 Astroinformatics Group