Very Long WU's
Author | Message |
---|---|
Joined: 10 Apr 19 Posts: 408 Credit: 120,203,200 RAC: 0 | That's weird that the estimated runtime was so different from the actual runtime. I'll keep an eye on that. Has anyone else seen that problem? |
Joined: 8 Nov 11 Posts: 205 Credit: 2,900,464 RAC: 0 | Yes, I had several. One I recall was estimated at 11 minutes and ended up running over 7 hours. The highest estimate I had was 1 hour something, but that one also ran for over 7 hours. |
Joined: 13 Oct 21 Posts: 44 Credit: 226,923,406 RAC: 2,496 | Quote: "That's weird that the estimated runtime was so different from the actual runtime. I'll keep an eye on that. Has anyone else seen that problem?" I've seen this with another project, LHC, which tends to have highly variable runtimes from task to task (and probably highly variable estimated computation sizes too), so I didn't think of it as unusual when I saw it here. I just figured that something had changed in the science that made runtimes difficult to estimate accurately. It could be that BOINC doesn't cope well with a lot of variability and keeps trying to find consistency. |
Joined: 10 Apr 19 Posts: 408 Credit: 120,203,200 RAC: 0 | I'm not sure whether the estimation is done by BOINC or by MilkyWay. I'll ask around and see if anyone in the group knows. |
Joined: 8 Nov 11 Posts: 205 Credit: 2,900,464 RAC: 0 | Based on a small sample of WUs, I think the actual elapsed time on N-Body runs 2.2 to 2.7 times the estimate. Will keep monitoring over the weekend. A WU estimated at 29 mins took 78 mins; a WU estimated at 59 mins took 138 mins. This was on an Intel i7 using 4 CPUs. |
Joined: 8 Nov 11 Posts: 205 Credit: 2,900,464 RAC: 0 | One more: estimated elapsed 9 mins 59 secs, actual elapsed 152 mins 20 secs. Using 8 CPUs. |
Joined: 8 Nov 11 Posts: 205 Credit: 2,900,464 RAC: 0 | Another one: estimated elapsed 12 mins 29 secs, actual 234 mins 38 secs. |
Joined: 17 Feb 09 Posts: 24 Credit: 3,501,120 RAC: 840 | To add to the long w/u examples: estimated time of roughly 1 hr 30 min, ran for 22 hrs 11 mins. It's a 4-core i5 running Fedora 36. On the other hand, a w/u on a fresh install of Windows 10 on an Intel i5 had an estimated run time of 19 hrs and looks like it will take 14 hrs or so, which is the preferred way round. |
Joined: 13 Apr 17 Posts: 256 Credit: 604,411,638 RAC: 0 | My three pence (Progress / Elapsed / Remaining): 3.005% / 00:22:00 / 00:25:15; 11.224% / 00:22:02 / 00:24:25; 5.054% / 00:22:02 / 00:20:55. All three are on the same PC, read at the same clock time. I'll report the real elapsed times when they are finished. |
Joined: 13 Apr 17 Posts: 256 Credit: 604,411,638 RAC: 0 | Now I am starting to get "not started by deadline - canceled"! The cause is the very long runtimes, even though each task is shown at the start as a "normal" long-running task. Frustrating. |
Joined: 8 Nov 11 Posts: 205 Credit: 2,900,464 RAC: 0 | Completed another one. Estimated elapsed was 20 mins 20 secs; actual was 187 mins 18 secs. What is annoying is the disparity in credits. The previous task got 2,732.2 credits for 65,612 secs of CPU time, while the task just completed used 80,624 secs of CPU time and got 2,430.85 credits. I'm winding down and going back to Separation tasks. |
Joined: 13 Apr 17 Posts: 256 Credit: 604,411,638 RAC: 0 | Following up on my three-pence post, here are the actual elapsed times, in the same order as above: 08:13:52, 01:50:29, 04:28:01. Something to think about. |
Joined: 13 Oct 21 Posts: 44 Credit: 226,923,406 RAC: 2,496 | N-Body tasks have a 12-day deadline, so even with long run times there shouldn't be any "not started by deadline" errors. The only reasons I can think of for running out of time are a large BOINC queue and a PC that isn't running close to 24/7. |
Joined: 13 Apr 17 Posts: 256 Credit: 604,411,638 RAC: 0 | Quote: "N-Body tasks have a 12-day deadline, so even with long run times there shouldn't be any 'not started by deadline' errors. The only reasons I can think of for running out of time are a large BOINC queue and a PC that isn't running close to 24/7." Well, the N-Body queue here has held a constant 100 tasks for months. The PC is running 24/7, although I did stop it for about 8 hours of maintenance. The 12-day deadline can get a bit short, since run times have lately increased significantly, sometimes up to 27 hours. So I guess reducing the number of tasks in the queue should do it. Or perhaps "somebody" could extend the deadline? That would give the old chunk of metal a decent chance ... |
Joined: 13 Apr 17 Posts: 256 Credit: 604,411,638 RAC: 0 | Just noticed a new run-time situation for an N-Body task: Progress 52.126%, Elapsed 00:17:31, Remaining 01:56:59. I am curious what the actual run time will be. I'll post it as soon as it is finished. |
Joined: 13 Apr 17 Posts: 256 Credit: 604,411,638 RAC: 0 | Follow-up on the N-Body task with the odd remaining-time estimate: it is done, and the elapsed time is 00:33:40. Somebody/something is trying to throw me off track! |
Joined: 16 Mar 10 Posts: 213 Credit: 108,322,900 RAC: 3,806 | Quote: "N-Body tasks have a 12-day deadline, so even with long run times there shouldn't be any 'not started by deadline' errors. The only reasons I can think of for running out of time are a large BOINC queue and a PC that isn't running close to 24/7." It took my systems a while to cut down the number of N-Body tasks when they started getting bigger, but eventually they seem to have got the hang of it; however, I suspect my configuration is very different from yours :-) On one system I've been running nothing but N-Body since WCG went on hiatus in February. I eventually tuned it down from 0.5 days of work cached to 0.2 days, and that seems to be enough to keep the number of tasks under 10. That system runs one task at a time on 3 of its 4 "CPUs". On the other system running N-Body I also run various GPU projects and have re-enabled WCG. It caches 0.6 days of work, and the mix of WCG and N-Body seems to keep the N-Body count below 10 while allowing the maximum number of WCG tasks I'm prepared to accept at once. That system runs one N-Body task at a time on 3 of the 12 "CPUs" I allow BOINC on an 8-core/16-thread processor. So it can be tamed :-) Quote: "Or perhaps 'somebody' could extend the deadline? That would give the old chunk of metal a decent chance ..." As AndreyOR says, there's no reason a system should be having these problems in the first place. While I understand that there may be problems for folks who attach to projects in what some have called "fire and forget" mode, it seems unreasonable to expect deadlines to be extended for users who either can't change their configuration (Science United? other BOINC projects with huge jobs?) or won't (for whatever reason). The only excuse I can think of for a large queue nowadays is flaky or irregular internet access :-) Good luck with your system "tuning"; hopefully things will improve. Cheers - Al. |
Joined: 13 Apr 17 Posts: 256 Credit: 604,411,638 RAC: 0 | ... I remember that there has been "an extension of the deadline" ... Managing the length of the queue is actually not the problem. I just wanted to point out that run times have increased rather strangely (more input data, I guess). But I'm sure not everyone enjoys this! The number of active users has gone down quite markedly. What I'm really wondering about is the "remaining time" calculation (see my posts). Have a great day. |
Joined: 13 Oct 21 Posts: 44 Credit: 226,923,406 RAC: 2,496 | It's best to think of BOINC credit in terms of averages rather than absolute credit per task, especially when there's a lot of variability in runtimes. If you compare the average credit per runtime or per CPU time for a batch of N-Body tasks from before the runtime variability showed up with a batch from now, they're probably similar. BOINC doesn't handle a lot of variability well in the short term, but over the long term things average out. I'd suggest you just keep crunching and let the credit average itself out. Unless someone can find evidence otherwise, I don't think users get shorted on credit in the long term. I'd say "long term" means at least a couple of weeks, as one probably needs to complete a lot of tasks for BOINC to figure things out. |
Joined: 13 Oct 21 Posts: 44 Credit: 226,923,406 RAC: 2,496 | "Remaining time" being off by a lot is unusual for MilkyWay but somewhat common for projects with a lot of variability in runtimes (LHC) or with very long runtimes of days to weeks (CPDN). It's new to MilkyWay, and Tom explained the reason for it earlier this week. |
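
The figures posted in this thread can be checked with a short calculation. Below is a minimal Python sketch that uses only numbers quoted in the posts. The helper names are hypothetical, and the "11:224%" snapshot is assumed to mean 11.224%. The projection is a naive elapsed ÷ fraction-done extrapolation, not BOINC's own remaining-time calculation.

```python
"""Back-of-the-envelope checks on figures posted in this thread.

All helper names are hypothetical; only the numbers come from the posts.
"""


def to_minutes(hours=0, minutes=0, seconds=0):
    """Convert an h/m/s duration to decimal minutes."""
    return hours * 60 + minutes + seconds / 60


def underestimate_factor(estimated_min, actual_min):
    """How many times longer a task ran than its initial estimate."""
    return actual_min / estimated_min


def credit_per_cpu_hour(credit, cpu_seconds):
    """Granted credit divided by CPU time in hours, for one task."""
    return credit / (cpu_seconds / 3600)


def pooled_credit_per_cpu_hour(tasks):
    """Total credit over total CPU-hours for (credit, cpu_seconds) pairs."""
    total_credit = sum(credit for credit, _ in tasks)
    total_hours = sum(cpu for _, cpu in tasks) / 3600
    return total_credit / total_hours


def linear_total_minutes(elapsed_min, fraction_done):
    """Naive projection of total runtime: elapsed / fraction done.

    Assumes progress advances at a constant rate (an assumption,
    not how BOINC computes "Remaining").
    """
    return elapsed_min / fraction_done


# (estimated, actual) elapsed times quoted in the posts.
REPORTED = [
    (to_minutes(minutes=29), to_minutes(minutes=78)),
    (to_minutes(minutes=59), to_minutes(minutes=138)),
    (to_minutes(minutes=9, seconds=59), to_minutes(minutes=152, seconds=20)),
    (to_minutes(minutes=12, seconds=29), to_minutes(minutes=234, seconds=38)),
    (to_minutes(minutes=20, seconds=20), to_minutes(minutes=187, seconds=18)),
]

# (credit, CPU seconds) for the two tasks compared in the credit post.
CREDITS = [(2732.2, 65_612), (2430.85, 80_624)]

# (elapsed at snapshot, fraction done, actual total); 0.11224 assumes "11:224%" meant 11.224%.
SNAPSHOTS = [
    (to_minutes(minutes=22), 0.03005, to_minutes(8, 13, 52)),
    (to_minutes(minutes=22, seconds=2), 0.11224, to_minutes(1, 50, 29)),
    (to_minutes(minutes=22, seconds=2), 0.05054, to_minutes(4, 28, 1)),
    (to_minutes(minutes=17, seconds=31), 0.52126, to_minutes(minutes=33, seconds=40)),
]

if __name__ == "__main__":
    for est, act in REPORTED:
        print(f"estimated {est:6.1f} min, actual {act:6.1f} min, "
              f"factor {underestimate_factor(est, act):5.2f}")
    for credit, cpu in CREDITS:
        print(f"{credit:8.2f} credits / {cpu} s CPU = "
              f"{credit_per_cpu_hour(credit, cpu):6.1f} per CPU-hour")
    print(f"pooled: {pooled_credit_per_cpu_hour(CREDITS):6.1f} per CPU-hour")
    for elapsed, frac, actual in SNAPSHOTS:
        print(f"projected {linear_total_minutes(elapsed, frac):6.1f} min, "
              f"actual {actual:6.1f} min")
```

For the first two tasks, the sketch gives underestimate factors of about 2.3 and 2.7, in line with the 2.2 to 2.7 range reported. The later tasks are off by roughly 9 to 19 times. The two credit figures work out to about 150 and 109 credits per CPU-hour, and about 127 when pooled. That pooled number is the kind of average worth tracking over many tasks. The linear projection overshoots all three tasks in the first progress snapshot, yet lands within a minute of the actual 00:33:40 for the task at 52%.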
©2024 Astroinformatics Group