Welcome to MilkyWay@home

Very Long WU's

Message boards : Number crunching : Very Long WU's
Tom Donlon
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 10 Apr 19
Posts: 408
Credit: 120,203,200
RAC: 0
Message 74364 - Posted: 6 Oct 2022, 14:22:19 UTC - in response to Message 74358.  

That's weird that the estimated runtime was so different from the actual runtime. I'll keep an eye on that. Has anyone else seen that problem?
ID: 74364 · Rating: 0
Septimus

Joined: 8 Nov 11
Posts: 205
Credit: 2,893,225
RAC: 365
Message 74368 - Posted: 6 Oct 2022, 18:10:32 UTC - in response to Message 74364.  
Last modified: 6 Oct 2022, 18:20:45 UTC

Yes, I had several. One, as I recall, was estimated at 11 minutes and ended up taking over 7 hours.
From what I recall, the highest estimate I had was 1 hour something, and that task ran for over 7 hours.
ID: 74368 · Rating: 0
AndreyOR

Joined: 13 Oct 21
Posts: 44
Credit: 225,163,287
RAC: 7,247
Message 74369 - Posted: 6 Oct 2022, 18:59:34 UTC - in response to Message 74364.  

That's weird that the estimated runtime was so different from the actual runtime. I'll keep an eye on that. Has anyone else seen that problem?

I've seen this with another project, LHC, that tends to have highly variable runtimes from task to task (probably also highly variable estimated computation size) so I didn't think of it as unusual when I saw it here. Just figured that something changed with the science of things that made it difficult to estimate accurately. It could be that BOINC doesn't do well with a lot of variability and keeps trying to find consistency.
ID: 74369 · Rating: 0
Tom Donlon
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 10 Apr 19
Posts: 408
Credit: 120,203,200
RAC: 0
Message 74379 - Posted: 7 Oct 2022, 13:59:07 UTC

I'm not sure if the estimation is done by BOINC or by Milkyway. I'll ask around and see if anyone in the group knows.
ID: 74379 · Rating: 0
Septimus

Joined: 8 Nov 11
Posts: 205
Credit: 2,893,225
RAC: 365
Message 74380 - Posted: 7 Oct 2022, 17:54:55 UTC - in response to Message 74379.  
Last modified: 7 Oct 2022, 17:56:51 UTC

Based on a small sample of WUs, I think the estimated elapsed time on N-Body is off by a factor of between 2.2 and 2.7. I will keep monitoring over the weekend. A WU estimated at 29 minutes took 78 minutes. A WU estimated at 59 minutes took 138 minutes.
This was on an Intel i7 using 4 CPUs.
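
For anyone who wants to check the factor, here is a minimal Python sketch that divides actual by estimated runtime for the two WUs quoted above. The variable and function names are made up for illustration, and two samples are far too few to say anything general.

```python
# Estimated vs. actual runtimes in minutes, taken from the two WUs quoted above.
samples = [(29, 78), (59, 138)]


def slowdown(estimated_min, actual_min):
    """Return how many times longer the task ran than the estimate."""
    return actual_min / estimated_min


ratios = [slowdown(est, act) for est, act in samples]

for (est, act), ratio in zip(samples, ratios):
    print(f"estimated {est} min, actual {act} min, ratio {ratio:.2f}")
```

The two ratios come out at roughly 2.7 and 2.3, which sits inside the 2.2 to 2.7 range mentioned above.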
ID: 74380 · Rating: 0
Septimus

Joined: 8 Nov 11
Posts: 205
Credit: 2,893,225
RAC: 365
Message 74383 - Posted: 8 Oct 2022, 11:59:37 UTC - in response to Message 74380.  
Last modified: 8 Oct 2022, 12:03:02 UTC

One more: estimated elapsed 9 mins 59 secs, actual elapsed 152 mins 20 secs, using 8 CPUs.
ID: 74383 · Rating: 0
Septimus

Joined: 8 Nov 11
Posts: 205
Credit: 2,893,225
RAC: 365
Message 74385 - Posted: 8 Oct 2022, 15:54:03 UTC - in response to Message 74383.  

Another one: estimated elapsed 12 mins 29 secs, actual 234 mins 38 secs.
ID: 74385 · Rating: 0
nairb

Joined: 17 Feb 09
Posts: 24
Credit: 3,432,392
RAC: 68
Message 74388 - Posted: 8 Oct 2022, 21:45:30 UTC

To add to the long WU examples:
Estimated time of 1 hr 30 min (approx.). Ran for 22 hrs 11 mins. It's a 4-core i5 running Fedora 36.

On the other hand, a WU on a fresh install of Windows 10 and an Intel i5 had an estimated run time of 19 hrs and looks like taking 14 hrs or so. Overestimating like that is the preferred way round.
ID: 74388 · Rating: 0
San-Fernando-Valley

Joined: 13 Apr 17
Posts: 256
Credit: 604,411,638
RAC: 0
Message 74389 - Posted: 9 Oct 2022, 9:07:03 UTC

My three pence:

Progress   Elapsed    Remaining
3.005%     00:22:00   00:25:15
11.224%    00:22:02   00:24:25
5.054%     00:22:02   00:20:55

All three on same PC and same clock time ...

I'll report the real "elapsed time" when they are finished ...
ID: 74389 · Rating: 0
San-Fernando-Valley

Joined: 13 Apr 17
Posts: 256
Credit: 604,411,638
RAC: 0
Message 74390 - Posted: 9 Oct 2022, 12:00:33 UTC

Now I am starting to get "not started by deadline - canceled"!
The reason is the very long runtimes, yet at the beginning each task is shown as a "normal" long-running task.
Frustrating.
ID: 74390 · Rating: 0
Septimus

Joined: 8 Nov 11
Posts: 205
Credit: 2,893,225
RAC: 365
Message 74391 - Posted: 9 Oct 2022, 13:41:51 UTC - in response to Message 74390.  

Completed another one. Estimated elapsed was 20 mins 20 secs. Actual 187 mins 18 secs.

What is annoying is the disparity in credits. The previous task got 2,732.2 credits for 65,612 secs of CPU time.

The task just completed used 80,624 secs of CPU time and got 2,430.85 credits.

I am winding down and going back to separation tasks.
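
To put a number on that disparity, here is a small Python sketch that works out credit per 1,000 CPU seconds for the two tasks above. The names are illustrative only, and it compares just these two tasks, so it says nothing about the long-term average.

```python
# (credit granted, CPU seconds) for the two tasks quoted above.
previous_task = (2732.2, 65612)
latest_task = (2430.85, 80624)


def credit_per_kcpu_second(credit, cpu_seconds):
    """Credit granted per 1,000 seconds of CPU time."""
    return credit / cpu_seconds * 1000


rate_previous = credit_per_kcpu_second(*previous_task)
rate_latest = credit_per_kcpu_second(*latest_task)

print(f"previous: {rate_previous:.2f} credits per 1000 CPU s")
print(f"latest:   {rate_latest:.2f} credits per 1000 CPU s")
print(f"latest task earned {(1 - rate_latest / rate_previous) * 100:.0f}% less per CPU second")
```

On these figures the later task earned a bit over a quarter less credit per CPU second than the earlier one.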
ID: 74391 · Rating: 0
San-Fernando-Valley

Joined: 13 Apr 17
Posts: 256
Credit: 604,411,638
RAC: 0
Message 74392 - Posted: 9 Oct 2022, 17:08:38 UTC - in response to Message 74389.  

My three pence:

Progress   Elapsed    Remaining
3.005%     00:22:00   00:25:15
11.224%    00:22:02   00:24:25
5.054%     00:22:02   00:20:55

All three on same PC and same clock time ...

I'll report the real "elapsed time" when they are finished ...


Here they are:

08:13:52
01:50:29
04:28:01

Sequence as above.

Something to think about
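
As a rough way to think about it, the Python sketch below compares the total runtime implied by the display at the 22-minute mark (elapsed plus remaining) with the final elapsed time of each task. The helper name `to_seconds` is made up for this example; the figures are the ones posted above.

```python
def to_seconds(hms):
    """Convert an HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# (elapsed, remaining, final elapsed) for the three tasks, in the order posted.
tasks = [
    ("00:22:00", "00:25:15", "08:13:52"),
    ("00:22:02", "00:24:25", "01:50:29"),
    ("00:22:02", "00:20:55", "04:28:01"),
]

# Total runtime the display implied at the time of the snapshot.
implied_totals = [to_seconds(e) + to_seconds(r) for e, r, _ in tasks]
# Total runtime the tasks actually needed.
actual_totals = [to_seconds(a) for _, _, a in tasks]

for implied, actual in zip(implied_totals, actual_totals):
    print(f"implied {implied / 60:.0f} min, actual {actual / 60:.0f} min, "
          f"ratio {actual / implied:.1f}")
```

All three tasks were shown as needing well under an hour in total, yet the real runtimes were several times longer, and by a different factor for each task.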
ID: 74392 · Rating: 0
AndreyOR

Joined: 13 Oct 21
Posts: 44
Credit: 225,163,287
RAC: 7,247
Message 74393 - Posted: 9 Oct 2022, 20:14:12 UTC - in response to Message 74390.  

N-Body tasks have a 12-day deadline, so even with long run times there shouldn't be any "not started by deadline" errors. The only reasons I can think of that would make one run out of time are a large BOINC queue and not having one's PC run close to 24/7.
ID: 74393 · Rating: 0
San-Fernando-Valley

Joined: 13 Apr 17
Posts: 256
Credit: 604,411,638
RAC: 0
Message 74394 - Posted: 10 Oct 2022, 5:13:23 UTC - in response to Message 74393.  
Last modified: 10 Oct 2022, 5:23:45 UTC

N-Body tasks have a 12-day deadline, so even with long run times there shouldn't be any "not started by deadline" errors. The only reasons I can think of that would make one run out of time are a large BOINC queue and not having one's PC run close to 24/7.

Well, the N-Body queue has, as always for months now, constantly been 100 tasks long.
The PC is running 24/7, but I did stop it for about 8 hours of maintenance.
The 12-day deadline can get a bit short, since the run times have lately increased significantly, sometimes up to 27 hours.
So, I guess decreasing the number of tasks in the queue should do it.

Or "somebody" should/could perhaps extend the deadline? That way giving the old chunk of metal a decent chance ...
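
A back-of-the-envelope sketch in Python shows why a queue of 100 can collide with a 12-day deadline. It assumes the worst case quoted above (27 hours per task) for every task and a machine running 24/7. The number of N-Body tasks running at once is not stated in this thread, so `concurrent_tasks` is a purely hypothetical parameter.

```python
def max_tasks_before_deadline(deadline_days, hours_per_task, concurrent_tasks):
    """Upper bound on tasks finished before the deadline on a 24/7 machine.

    Each slot runs tasks back to back; a task only counts if it finishes in time.
    """
    deadline_hours = deadline_days * 24
    tasks_per_slot = int(deadline_hours // hours_per_task)
    return tasks_per_slot * concurrent_tasks


# Worst case from the post: 12-day deadline, 27 hours per task.
for slots in (1, 2, 4):  # hypothetical numbers of tasks running at once
    finished = max_tasks_before_deadline(12, 27, slots)
    print(f"{slots} at a time: at most {finished} of 100 queued tasks finish in time")
```

Since most tasks run for far less than 27 hours, this is a pessimistic bound. It still suggests that a 100-task queue only works out if the typical runtime stays at a few hours.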
ID: 74394 · Rating: 0
San-Fernando-Valley

Joined: 13 Apr 17
Posts: 256
Credit: 604,411,638
RAC: 0
Message 74395 - Posted: 10 Oct 2022, 5:24:59 UTC

Just noticed a new run time situation for an N-Body task:

Progress:   52.126%
Elapsed:    00:17:31
Remaining:  01:56:59

I am curious what the correct run time will be.
I'll post it as soon as it is finished.
ID: 74395 · Rating: 0
San-Fernando-Valley

Joined: 13 Apr 17
Posts: 256
Credit: 604,411,638
RAC: 0
Message 74396 - Posted: 10 Oct 2022, 6:42:58 UTC - in response to Message 74395.  

Just noticed a new run time situation for an N-Body task:

Progress:   52.126%
Elapsed:    00:17:31
Remaining:  01:56:59

I am curious what the correct run time will be.
I'll post it as soon as it is finished.

OK, it is done. Elapsed time: 00:33:40

Somebody/something is trying to get me off track!
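
For comparison, here is a small Python sketch that extrapolates the total runtime from the progress percentage alone. It assumes progress advances at a constant rate, which is not guaranteed for N-Body tasks. The helper names are made up for illustration, and this is not a claim about how BOINC computes "Remaining".

```python
def to_seconds(hms):
    """Convert an HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def linear_total(elapsed_s, progress_pct):
    """Total runtime if progress keeps advancing at the same rate."""
    return elapsed_s * 100.0 / progress_pct


elapsed = to_seconds("00:17:31")
remaining = to_seconds("01:56:59")
actual = to_seconds("00:33:40")

displayed_total = elapsed + remaining          # what the display implied
extrapolated = linear_total(elapsed, 52.126)   # what the progress figure implied

print(f"displayed total:     {displayed_total} s")
print(f"linear extrapolation: {extrapolated:.0f} s")
print(f"actual:               {actual} s")
```

For this one task the progress-based figure (about 2,016 s, or 33:36) lands within seconds of the actual 33:40, while elapsed plus remaining came to 8,070 s (2:14:30). One task proves nothing, but it does suggest the progress percentage was the more reliable signal here.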
ID: 74396 · Rating: 0
alanb1951

Joined: 16 Mar 10
Posts: 210
Credit: 105,939,401
RAC: 24,844
Message 74397 - Posted: 10 Oct 2022, 6:43:38 UTC - in response to Message 74394.  

N-Body tasks have a 12-day deadline, so even with long run times there shouldn't be any "not started by deadline" errors. The only reasons I can think of that would make one run out of time are a large BOINC queue and not having one's PC run close to 24/7.

Well, the N-Body queue has, as always for months now, constantly been 100 tasks long.
The PC is running 24/7, but I did stop it for about 8 hours of maintenance.
The 12-day deadline can get a bit short, since the run times have lately increased significantly, sometimes up to 27 hours.
So, I guess decreasing the number of tasks in the queue should do it.

It took my systems a while to cut down the number of N-Body tasks when they started getting bigger, but eventually they seem to have got the hang of it; however, I suspect my configuration is very different to yours :-)

On one system I've been running nothing but N-body since WCG went on hiatus in February, and I eventually tuned it down from 0.5 days work cached to 0.2 days work cached and that seems to be enough to keep the number of tasks under 10... That system runs one at a time on 3 out of 4 "CPUs".

On the other system that is running N-Body I also run various GPU projects and have re-enabled WCG -- that caches 0.6 days work and the mix of WCG and N-Body seems to keep the N-Body count below 10 whilst allowing the maximum numbers of WCG tasks I'm prepared to accept at once. That system runs one N-Body at a time on 3 of the 12 "CPUs" I allow to BOINC on an 8 core/16 thread processor.

So it can be tamed :-)

Or "somebody" should/could perhaps extend the deadline? That way giving the old chunk of metal a decent chance ...

As AndreyOR says, there's no reason a system should be having these problems in the first place -- whilst I understand that there may be problems for folks who attach to projects in what some folks have been known to call "fire and forget" mode, it seems unreasonable to expect deadlines to be extended to cope with users who either can't (Science United?; other BOINC projects with huge jobs?) or won't (for whatever reason) change configuration. The only excuse I can think of for a large queue nowadays is flaky or irregular internet access :-)

Good luck with your system "tuning" -- hopefully things will improve.

Cheers - Al.
ID: 74397 · Rating: 0
San-Fernando-Valley

Joined: 13 Apr 17
Posts: 256
Credit: 604,411,638
RAC: 0
Message 74399 - Posted: 10 Oct 2022, 7:12:18 UTC - in response to Message 74397.  

... I remember that there has been "an extension of the deadline" ...

Managing the length of the queue is actually not the problem.
I just wanted to point out that the run times have increased somewhat strangely; well, I guess there is more input data.
But I'm sure not everyone enjoys this!
The active user number has gone down quite markedly.

What I am really wondering about is the "remaining time" calculations (see my posts).

Have a great day.
ID: 74399 · Rating: 0
AndreyOR

Joined: 13 Oct 21
Posts: 44
Credit: 225,163,287
RAC: 7,247
Message 74400 - Posted: 10 Oct 2022, 7:27:20 UTC - in response to Message 74391.  

It's best to think of BOINC credit in terms of average rather than absolute credit per task, especially when there's a lot of variability in runtimes. If you compare average credit per runtime or CPU time for a bunch of N-Body tasks from before the variability in runtimes showed up with a bunch of tasks now, they're probably similar. BOINC doesn't do well with a lot of variability in the short term, but in the long term things average out. I'd suggest just continuing to crunch and letting the credit average itself out. Unless someone can find evidence otherwise, I don't think users get shorted on credit long term. I'd say long term is at least a couple of weeks, as one probably needs to complete a lot of tasks for BOINC to figure things out.
ID: 74400 · Rating: 0
AndreyOR

Joined: 13 Oct 21
Posts: 44
Credit: 225,163,287
RAC: 7,247
Message 74401 - Posted: 10 Oct 2022, 7:50:38 UTC - in response to Message 74399.  

"Remaining time" being off by a lot is unusual for MilkyWay but somewhat common for projects that have a lot of variability in runtimes (LHC) or have very long (days to weeks) runtimes (CPDN). It's new to MilkyWay and Tom explained the reason for it earlier this week.
ID: 74401 · Rating: 0

©2024 Astroinformatics Group