Welcome to MilkyWay@home

Long-running tasks

Steven Gaber

Joined: 20 May 21
Posts: 20
Credit: 2,284,816
RAC: 3,013
Message 71593 - Posted: 6 Jan 2022, 1:06:35 UTC

Usually, most Milky Way tasks take from one to three hours to complete.
But lately, I've gotten some that take a day or more to finish.

This is a basic Windows 7 computer with no dedicated GPU. It runs 24/7/365, crunching Milky Way, World Community Grid and Rosetta (although Rosetta has not sent me any work in five days, so it's only running Milky Way and WCG).

Avg. credit: 928.89
Total credit: 198,653
BOINC version: 7.16.20
CPU: Authentic AMD AMD A6-6400K APU with Radeon(tm) HD Graphics [Family 21 Model 19 Stepping 1] (2 processors)
GPU: AMD AMD Radeon HD 7400/7500/8300/8400 series (Scrapper) (768MB) driver: 1.4.1848 OpenCL: 1.2
Operating System: Microsoft Windows 7 Home Premium x64 Edition, Service Pack 1, (06.01.7601.00)

The current task is 46% complete with 5.24 hours elapsed and 10.50 hours remaining. Yesterday, one task finished after more than 23 hours.

Do the tasks vary that much or is the computer just not up to processing some tasks?
In the activity preferences, I checked the option to suspend CPU use, and I have it storing 0.02 days of work plus an additional 0.02 days of work, switching tasks every 30 minutes.

S. Gaber
Oldsmar, FL

mikey
Joined: 8 May 09
Posts: 3339
Credit: 524,010,781
RAC: 0
Message 71594 - Posted: 6 Jan 2022, 12:54:36 UTC - in response to Message 71593.  

First, I would uncheck the box to 'suspend CPU use' and change 'switching tasks' to something longer so you can actually finish one task before switching to a different one; mine is set to 9000 minutes. Other than that, it seems like you are chugging along just fine, finishing units and getting credit for them. The AMD 6 core is an older CPU, so it doesn't have all the fancy things built in to help it crunch faster. I actually have one running as well, but mine is running under Linux.

Rosetta doesn't have consistent tasks to send out right now. The people making the tasks build the next set based on the results of the current set, so it takes a bit to get the next set out. They are trying to find more researchers wanting to use Rosetta but haven't had a lot of luck so far; they are hoping it's a Covid blip. The other thing Rosetta did is come up with a super fast processing system to crunch tasks in-house that never come to us users, which is another reason we don't get as many tasks to run.

Steven Gaber
Joined: 20 May 21
Posts: 20
Credit: 2,284,816
RAC: 3,013
Message 71595 - Posted: 6 Jan 2022, 17:56:53 UTC - in response to Message 71594.  

Thank you for the reply and advice.

"First I would uncheck the box to 'suspend cpu use' and change the 'switching tasks' to something longer so you can actually finish 1 task before switching to a different task,"

I will do that right away.

I get messages from Rosetta telling me that Virtual Box is not installed. That's because I uninstalled Virtual Box. My computer is too old and didn't like it.

So now I will look for another BOINC project to replace Rosetta. Any suggestions about which one? I need one that is suitable for this machine's limitations.

S.Gaber

.clair.
Joined: 3 Mar 13
Posts: 84
Credit: 779,527,712
RAC: 0
Message 71597 - Posted: 6 Jan 2022, 19:41:46 UTC

I have had some unusually long N-body tasks recently, anything from a few hundred seconds to this biggie. I began to think it was a dud (but this isn't a Rosetta python that stalls at startup).
On a Q9450 with Win 7 SP1:
Completed and validated . run time 71,870.88 . cpu time 203,854.70 . credit 3,748.69 . Milkyway@home N-Body Simulation v1.82 (mt) windows_x86_64

mikey
Joined: 8 May 09
Posts: 3339
Credit: 524,010,781
RAC: 0
Message 71598 - Posted: 7 Jan 2022, 10:53:54 UTC - in response to Message 71595.  

TNGrid has tasks and, I think, is still doing Covid stuff: http://gene.disi.unitn.it/test/index.php. World Community Grid is an option if you stay away from the TB and the Africa Rainfall tasks. Mine is running Amicable Numbers using 3 of the 6 cores at a time. It can do NFS and ODLK tasks, SiDock tasks too, as well as the shorter SRBase tasks. It can also do some of the PrimeGrid tasks, but most of them are faster if you have an Intel-based PC.

Wailing Angus Beef
Joined: 24 Dec 07
Posts: 33
Credit: 1,923,257,146
RAC: 29,061
Message 71600 - Posted: 8 Jan 2022, 3:50:33 UTC - in response to Message 71593.  
Last modified: 8 Jan 2022, 3:52:56 UTC

skip

HRFMguy
Joined: 12 Nov 21
Posts: 236
Credit: 575,038,236
RAC: 0
Message 71635 - Posted: 20 Jan 2022, 19:10:23 UTC - in response to Message 71600.  

Is this task legit? It started out with an estimated run time of 20 minutes or so, but it has at least 12 more hours to go after running for more than 6 hours. I have suspended it until I get some feedback one way or the other. Thanks.

computer ID: 911207

Application: Milkyway@home N-Body Simulation 1.82 (mt)
Name: de_nbody_08_31_2021_v176_40k__data__12_1640968180_250031
State: Task suspended by user
Received: 1/19/2022 1:15:12 AM
Report deadline: 1/31/2022 1:15:14 AM
Resources: 4 CPUs
Estimated computation size: 15,623 GFLOPs
CPU time: 1d 01:53:17
CPU time since checkpoint: 00:03:23
Elapsed time: 06:48:06
Estimated time remaining: 12:50:43
Fraction done: 34.619%
Virtual memory size: 15.50 MB
Working set size: 18.89 MB
Directory: slots/0
Process ID: 9988
Progress rate: 5.040% per hour
Executable: milkyway_nbody_1.82_windows_x86_64__mt.exe
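
For anyone wanting to sanity-check the remaining-time figure in the task properties above, here is a minimal sketch of a plain linear extrapolation from elapsed time and fraction done. It is only an assumption that the client does something similar; the function names are made up for illustration. With the numbers from this task it lands within a second of the 12:50:43 shown, and it gives an average progress rate of about 5.09% per hour, a little different from the displayed 5.040%.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an 'HH:MM:SS' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def seconds_to_hms(total_seconds: float) -> str:
    """Format seconds as 'HH:MM:SS', truncating any fractional second."""
    total = int(total_seconds)
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


def remaining_seconds(elapsed_s: float, fraction_done: float) -> float:
    """Linear extrapolation: assume the rest runs at the average pace so far."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_s * (1.0 - fraction_done) / fraction_done


# Values copied from the task properties in this post.
elapsed = hms_to_seconds("06:48:06")
fraction = 0.34619

remaining = remaining_seconds(elapsed, fraction)
rate_percent_per_hour = 100.0 * fraction / (elapsed / 3600.0)

print("extrapolated time remaining:", seconds_to_hms(remaining))
print(f"average progress rate: {rate_percent_per_hour:.3f}% per hour")
```

The point is that the estimate only reflects the pace so far, so it says nothing about whether the task is stuck or simply large.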

HRFMguy
Joined: 12 Nov 21
Posts: 236
Credit: 575,038,236
RAC: 0
Message 71636 - Posted: 20 Jan 2022, 19:15:14 UTC - in response to Message 71635.  

Here is another long runner, also suspended for now.....

computer ID: 911207

Application: Milkyway@home N-Body Simulation 1.82 (mt)
Name: de_nbody_08_31_2021_v176_40k__data__11_1640968180_198374
State: Running
Received: 1/19/2022 1:15:12 AM
Report deadline: 1/31/2022 1:15:14 AM
Resources: 4 CPUs
Estimated computation size: 18,297 GFLOPs
CPU time: 04:55:34
CPU time since checkpoint: 00:02:15
Elapsed time: 01:20:11
Estimated time remaining: 08:15:59
Fraction done: 13.918%
Virtual memory size: 13.64 MB
Working set size: 17.04 MB
Directory: slots/1
Process ID: 9988
Progress rate: 10.800% per hour
Executable: milkyway_nbody_1.82_windows_x86_64__mt.exe

Keith Myers
Joined: 24 Jan 11
Posts: 715
Credit: 555,779,534
RAC: 43,400
Message 71641 - Posted: 21 Jan 2022, 2:12:33 UTC

Hopefully someone experienced with running the N-body tasks chimes in here for you.

That said, there can be a lot of variability in task difficulty within each application. So you may have run "easy" tasks until you just tackled a "hard" one. Unfortunately, BOINC has only the one APR parameter for each application and can't discriminate between an APR developed on easy tasks and one suddenly faced with hard tasks. The APR is how BOINC calculates the estimated processing speed.
The APR needs at least 10 validated tasks to be calculated. So it will take 10 validated tasks of the "hard" type to recalculate the APR, which would then provide a more accurate estimate of the runtime.
You can also examine these new tasks in the Manager via the Properties of the tasks, look at the Estimated Computation Size in GFLOPs, and compare the new ones to the ones you've previously done to see if they are now 10X larger in computation size. If so, the long-running tasks may be running perfectly fine.

Also, since these are the MT N-body tasks, they use all of the computer's CPU cores at once. If you have added more CPU usage for other applications without limiting the N-body tasks so they do not use all the CPU cores, that will also slow down the computation.
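
As a rough illustration of how a single APR feeds the estimate, here is a small sketch that divides a task's estimated computation size by an APR. This is a simplification, not BOINC's actual code. The 15,623 GFLOPs size comes from a task quoted earlier in this thread, while the APR of 13.0 GFLOPS is a made-up value chosen only so the result comes out near the 20 minutes reported there.

```python
def estimated_runtime_seconds(size_gflop: float, apr_gflops: float) -> float:
    """Rough runtime estimate: work to do divided by average processing rate."""
    if apr_gflops <= 0:
        raise ValueError("APR must be positive")
    return size_gflop / apr_gflops


def format_hms(total_seconds: float) -> str:
    """Format seconds as 'HH:MM:SS', truncating any fractional second."""
    total = int(total_seconds)
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"


size_gflop = 15_623.0   # estimated computation size quoted in this thread
assumed_apr = 13.0      # hypothetical APR in GFLOPS, NOT taken from the thread

estimate = estimated_runtime_seconds(size_gflop, assumed_apr)
print("estimated runtime:", format_hms(estimate))
```

Because the same APR is applied to every task of the application, a "hard" task with a similar stated size gets the same short estimate, which is why the estimate can be far off until the APR catches up.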

HRFMguy
Joined: 12 Nov 21
Posts: 236
Credit: 575,038,236
RAC: 0
Message 71643 - Posted: 21 Jan 2022, 6:19:23 UTC - in response to Message 71641.  

Thanks Keith, I'm collecting data now. 7 finished tasks appear to be spot on with your comments. 3 more do not; they are definitely out of family and are suspended. Another one is running now.

HRFMguy
Joined: 12 Nov 21
Posts: 236
Credit: 575,038,236
RAC: 0
Message 71647 - Posted: 23 Jan 2022, 2:56:51 UTC - in response to Message 71643.  

Here is the final box score. Everything completed normally and all were credited. I guess I need to tamp down my paranoia just a bit.....
Sorry about the column alignment....
Finished task                                              Estimated computation size   Elapsed time   CPU time
de_nbody_08_31_2021_v176_40k__data__12_1640968180_921880   21,096 GFLOPs                00:19:14       01:04:27
de_nbody_08_31_2021_v176_40k__data__12_1640968180_955666   20,787 GFLOPs                00:18:39       01:02:48
de_nbody_08_31_2021_v176_40k__data__12_1640968180_944094   16,600 GFLOPs                00:18:51       01:02:35
de_nbody_08_31_2021_v176_40k__data__12_1640968180_613200   14,519 GFLOPs                00:17:46       00:58:43
de_nbody_08_31_2021_v176_40k__data__12_1640968180_974257   18,240 GFLOPs                00:18:48       01:01:55
de_nbody_08_31_2021_v176_40k__data__12_1640968180_277884   21,192 GFLOPs                00:12:44       00:40:02
de_nbody_08_31_2021_v176_40k__data__9_1639601048_805423    39,287 GFLOPs                00:25:21       01:26:01
de_nbody_08_31_2021_v176_40k__data__12_1640968180_277887   56,804 GFLOPs                03:39:20       13:42:17
de_nbody_08_31_2021_v176_40k__data__11_1640968180_198374   18,297 GFLOPs                10:34:30       40:11:49
de_nbody_08_31_2021_v176_40k__data__12_1640968180_250031   15,623 GFLOPs                19:46:15       75:06:56
de_nbody_08_31_2021_v176_40k__data__13_1640968180_278346   25,653 GFLOPs                08:47:25       33:13:24
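
One way to see the "out of family" tasks in the figures above is to divide each task's estimated computation size by its elapsed time and compare against the median. The sketch below does that; the 0.5 cutoff and the function names are arbitrary choices for illustration, and tasks are identified by the last six digits of their names.

```python
from statistics import median

# (task-name suffix, estimated size in GFLOPs, elapsed time) from the list above
TASKS = [
    ("921880", 21096, "00:19:14"),
    ("955666", 20787, "00:18:39"),
    ("944094", 16600, "00:18:51"),
    ("613200", 14519, "00:17:46"),
    ("974257", 18240, "00:18:48"),
    ("277884", 21192, "00:12:44"),
    ("805423", 39287, "00:25:21"),
    ("277887", 56804, "03:39:20"),
    ("198374", 18297, "10:34:30"),
    ("250031", 15623, "19:46:15"),
    ("278346", 25653, "08:47:25"),
]


def hms_to_seconds(hms: str) -> int:
    """Convert an 'HH:MM:SS' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def effective_rates(tasks):
    """Estimated size divided by elapsed seconds, per task."""
    return {name: size / hms_to_seconds(elapsed) for name, size, elapsed in tasks}


def slow_tasks(tasks, fraction_of_median=0.5):
    """Tasks whose effective rate falls below a fraction of the median rate."""
    rates = effective_rates(tasks)
    cutoff = fraction_of_median * median(rates.values())
    return [name for name, rate in rates.items() if rate < cutoff]


for name, rate in effective_rates(TASKS).items():
    print(f"{name}: {rate:6.2f} (estimated size per elapsed second)")
print("well below the median:", slow_tasks(TASKS))
```

The four long runners stand out even though their estimated sizes are in the same range as the quick ones, so the stated size alone does not predict the runtime here.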

Steven Gaber
Joined: 20 May 21
Posts: 20
Credit: 2,284,816
RAC: 3,013
Message 71828 - Posted: 26 Feb 2022, 18:38:45 UTC - in response to Message 71594.  
Last modified: 26 Feb 2022, 18:44:19 UTC

I only had one Milky Way task. After running awhile, that task indicated it would take 15 days to complete. So I aborted it.

Then, Milky Way sent me 160 new tasks, all with a deadline of March 10. They indicated that they would take 30 to 90 minutes to complete. Fine.

After running for 7 hours, the first of the 160 tasks indicated it would take 17 days to complete. So I aborted it, but the next one was running for 8 hours with the time remaining running backwards and hardly any progress being made.

So I reset the project.

Just now, Milky Way sent me 329 new tasks, all with a deadline of March 10.

Any help?

Steven Gaber

Steven Gaber
Joined: 20 May 21
Posts: 20
Credit: 2,284,816
RAC: 3,013
Message 71829 - Posted: 26 Feb 2022, 20:23:32 UTC

Now, the first of those 329 new tasks says it will take 147 DAYS to complete.

Something has to be wrong here.

Steven Gaber

Keith Myers
Joined: 24 Jan 11
Posts: 715
Credit: 555,779,534
RAC: 43,400
Message 71830 - Posted: 26 Feb 2022, 23:45:05 UTC

You have set WAY!!! too large a work cache. Reduce your cache size to 0.01 days and 0.0 additional days.

Steven Gaber
Joined: 20 May 21
Posts: 20
Credit: 2,284,816
RAC: 3,013
Message 71831 - Posted: 27 Feb 2022, 2:42:14 UTC - in response to Message 71830.  

Thanks.

Did that. I still have 327 tasks to go, all due on March 10. That ain't gonna happen.

The indicated completion times range from 30 minutes to 01:49:30. But the previous ones said the same thing and ended up taking three or more days, so I aborted them.
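
A back-of-the-envelope sketch of how many tasks could fit before the deadline, under loudly stated assumptions: the deadline is taken as midnight UTC on March 10 (only the date is known), both of this machine's 2 processors crunch nothing but Milky Way, and each task occupies one processor for a fixed number of hours. None of that is guaranteed, so treat the output as an upper bound at best.

```python
import math
from datetime import datetime, timezone


def tasks_completable(now: datetime, deadline: datetime,
                      cores: int, hours_per_task: float) -> int:
    """Upper bound on tasks finished by the deadline at a fixed time per task."""
    hours_left = (deadline - now).total_seconds() / 3600.0
    if hours_left <= 0:
        return 0
    return math.floor(hours_left * cores / hours_per_task)


now = datetime(2022, 2, 27, 2, 42, 14, tzinfo=timezone.utc)  # time of this post
deadline = datetime(2022, 3, 10, tzinfo=timezone.utc)        # assumed midnight UTC
cores = 2                                                    # "2 processors" per the host details

# 0.5 h and 1.825 h (01:49:30) are the indicated range; 15 h is the abort threshold.
for hours_per_task in (0.5, 1.825, 15.0):
    count = tasks_completable(now, deadline, cores, hours_per_task)
    print(f"{hours_per_task:6.3f} h per task -> at most {count} tasks")
```

Comparing those counts with the 327 tasks on hand shows how strongly the answer depends on whether the indicated completion times hold up.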

With this new batch of 329, I will abort those that take like 15 hours or more to complete.

Should I reset the project again or let it finish as many as it can till the deadline?

Steven Gaber

Keith Myers
Joined: 24 Jan 11
Posts: 715
Credit: 555,779,534
RAC: 43,400
Message 71832 - Posted: 27 Feb 2022, 3:15:07 UTC - in response to Message 71831.  

Your estimated time for completion is completely bogus until you have established an APR for each application.
Resetting the project throws away all the progress you have made toward establishing the APR.
It takes 11 validated tasks on each application to establish an APR upon which the estimated completion times can be believed.
I would abort at least some of the current cache you hold but leave enough to complete your APR for Separation and N-body.

Steven Gaber
Joined: 20 May 21
Posts: 20
Credit: 2,284,816
RAC: 3,013
Message 71833 - Posted: 27 Feb 2022, 3:26:14 UTC - in response to Message 71832.  
Last modified: 27 Feb 2022, 3:30:26 UTC

Thanks again.

They seem to be running OK now.

What is an APR?

I'll wait till I see one that takes longer than it should before aborting any. Only 315 to go now. :>)

I appreciate your help.

Steven Gaber

Keith Myers
Joined: 24 Jan 11
Posts: 715
Credit: 555,779,534
RAC: 43,400
Message 71835 - Posted: 28 Feb 2022, 0:58:12 UTC - in response to Message 71833.  

Open any host on your Computers list and go to the Details >> Show page. There it lists the APR (average processing rate) for each application.

Steven Gaber
Joined: 20 May 21
Posts: 20
Credit: 2,284,816
RAC: 3,013
Message 71836 - Posted: 28 Feb 2022, 8:25:18 UTC - in response to Message 71835.  
Last modified: 28 Feb 2022, 8:47:39 UTC

So far, since my last post yesterday, this computer has finished 23 of the 329 Milky Way tasks it received, plus the three that I aborted. I doubt it will complete all 301 of them by the March 10 deadline, but it will do a chunk of them.

I found the APR for Milky Way on this computer:
Average processing rate 6.76 GFLOPS

For Rosetta:
Average processing rate 4.25 GFLOPS

For SETI:
Average processing rate 31.40 GFLOPS

Steven Gaber

©2024 Astroinformatics Group