Welcome to MilkyWay@home

Tasks stuck after a few minutes and run indefinitely

Message boards : Number crunching : Tasks stuck after a few minutes and run indefinitely

KeithBriggs
Joined: 28 Apr 11
Posts: 36
Credit: 283,587,354
RAC: 0
Message 77172 - Posted: 9 Jul 2024, 15:33:48 UTC

I've had about 20 of these, and here's one example captured at two timestamps. I've suspended and restarted it without any effect.


Application: Milkyway@home N-Body Simulation with Orbit Fitting 1.87 (mt)
Name: de_nbody_07_01_2024_v186_pal5__data__13_1718187602_1773294
State: Running
Received: 7/7/2024 8:29:03 PM
Report deadline: 7/19/2024 8:29:01 PM
Estimated computation size: 61,319 GFLOPs
CPU time: 00:04:01
CPU time since checkpoint: 00:00:00
Elapsed time: 00:08:51
Estimated time remaining: 00:58:04
Fraction done: 2.980%
Virtual memory size: 15.50 MB
Working set size: 19.73 MB
Directory: slots/78
Process ID: 33716
Progress rate: 22.680% per hour
Executable: milkyway_nbody_orbit_fitting_1.87_windows_x86_64__mt.exe

10 minutes later:


Application: Milkyway@home N-Body Simulation with Orbit Fitting 1.87 (mt)
Name: de_nbody_07_01_2024_v186_pal5__data__13_1718187602_1773294
State: Running
Received: 7/7/2024 8:29:03 PM
Report deadline: 7/19/2024 8:29:01 PM
Estimated computation size: 61,319 GFLOPs
CPU time: 00:04:01
CPU time since checkpoint: 00:00:00
Elapsed time: 00:18:49
Estimated time remaining: 00:58:04
Fraction done: 2.980%
Virtual memory size: 15.50 MB
Working set size: 19.73 MB
Directory: slots/78
Process ID: 33716
Progress rate: 10.080% per hour
Executable: milkyway_nbody_orbit_fitting_1.87_windows_x86_64__mt.exe
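
The two snapshots above differ only in elapsed time: CPU time and fraction done have not moved. A minimal Python sketch of that comparison follows. The `Snapshot` type and the `looks_stalled` heuristic are hypothetical names made up for illustration, not part of BOINC, and the values are copied by hand from the task properties above.

```python
from dataclasses import dataclass


def hms_to_seconds(text: str) -> int:
    """Convert an HH:MM:SS string, as shown in the task properties, to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


@dataclass
class Snapshot:
    """Hypothetical container for three fields of the task properties."""
    cpu_time: str         # "CPU time"
    elapsed_time: str     # "Elapsed time"
    fraction_done: float  # "Fraction done", in percent


def looks_stalled(earlier: Snapshot, later: Snapshot) -> bool:
    """Illustrative heuristic: wall-clock time advanced, CPU time and progress did not."""
    elapsed_delta = hms_to_seconds(later.elapsed_time) - hms_to_seconds(earlier.elapsed_time)
    cpu_delta = hms_to_seconds(later.cpu_time) - hms_to_seconds(earlier.cpu_time)
    progress_delta = later.fraction_done - earlier.fraction_done
    return elapsed_delta > 0 and cpu_delta == 0 and progress_delta == 0


# Values copied from the two snapshots of task ..._1773294 above.
first = Snapshot(cpu_time="00:04:01", elapsed_time="00:08:51", fraction_done=2.980)
second = Snapshot(cpu_time="00:04:01", elapsed_time="00:18:49", fraction_done=2.980)

print(looks_stalled(first, second))
```

A task that is merely slow would still show some CPU time accumulating between snapshots, so this check separates "slow" from "frozen" only in the simple sense used here.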

KeithBriggs
Joined: 28 Apr 11
Posts: 36
Credit: 283,587,354
RAC: 0
Message 77173 - Posted: 9 Jul 2024, 15:57:27 UTC - in response to Message 77172.  

And another one:


Application: Milkyway@home N-Body Simulation with Orbit Fitting 1.87 (mt)
Name: de_nbody_07_01_2024_v186_pal5__data__13_1718187602_1773297
State: Running
Received: 7/7/2024 8:29:03 PM
Report deadline: 7/19/2024 8:29:01 PM
Estimated computation size: 59,585 GFLOPs
CPU time: 00:03:58
CPU time since checkpoint: 00:00:00
Elapsed time: 00:52:45
Estimated time remaining: 00:56:32
Fraction done: 2.777%
Virtual memory size: 15.51 MB
Working set size: 19.82 MB
Directory: slots/71
Process ID: 38312
Progress rate: 3.240% per hour
Executable: milkyway_nbody_orbit_fitting_1.87_windows_x86_64__mt.exe

Link
Joined: 19 Jul 10
Posts: 632
Credit: 19,379,218
RAC: 3,476
Message 77174 - Posted: 10 Jul 2024, 8:37:21 UTC
Last modified: 10 Jul 2024, 8:38:05 UTC

If you aren't already allowing BOINC to use 100% of CPU time, change that setting, as it's likely the issue. To get the stuck tasks running again, simply restart BOINC.
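
One way to check the CPU time setting outside BOINC Manager is to read the local preferences override file. The sketch below is in Python and rests on assumptions: that locally set computing preferences are stored in `global_prefs_override.xml` in the BOINC data directory, and that the "use at most N% of CPU time" value is the `<cpu_usage_limit>` element. The sample XML is made up for illustration; it is not taken from any poster's machine.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Made-up sample of an override file; a real one would be read from the BOINC data directory.
SAMPLE_OVERRIDE = """<global_preferences>
    <max_ncpus_pct>100.000000</max_ncpus_pct>
    <cpu_usage_limit>60.000000</cpu_usage_limit>
</global_preferences>"""


def cpu_time_limit(xml_text: str) -> Optional[float]:
    """Return the CPU time limit in percent, or None if the override does not set one."""
    root = ET.fromstring(xml_text)
    node = root.find("cpu_usage_limit")
    if node is None or not node.text:
        return None
    return float(node.text)


limit = cpu_time_limit(SAMPLE_OVERRIDE)
if limit is None:
    print("No local override for CPU time; the online preferences apply.")
elif limit < 100.0:
    print(f"CPU time is limited to {limit:.0f}%; consider raising it to 100%.")
else:
    print("CPU time is already at 100%.")
```

The same value can be changed in BOINC Manager's computing preferences, which is usually the simpler route.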

mikey
Joined: 8 May 09
Posts: 3339
Credit: 524,010,781
RAC: 0
Message 77175 - Posted: 10 Jul 2024, 10:29:23 UTC - in response to Message 77173.  

I have a question for you: how do you go from 32 CPU cores to using slot 71 and slot 78 for your tasks? Are you faking extra CPU cores on your PC to run something else, like the Yafu 64 and 128 tasks?

Tuba Jim
Joined: 8 Sep 21
Posts: 5
Credit: 21,243,528
RAC: 11,877
Message 77176 - Posted: 13 Jul 2024, 4:17:01 UTC - in response to Message 77175.  

Don't mean to be [enter any apologetic phrase], but I had the same outcome a couple of weeks ago and dumped them as something I did wrong. I only have 8 CPUs, I'm pretty sure, and was running the default setup. I saw that adjusting the setup to something that made no sense would help. Went to 1 task per CPU. Very ugly. Went back to default, 8 CPU per task, or something and then had to get rid of the 1 task per CPU that would not run while all the CPUs were busy with new tasks. Got those cleared out, but output kept going down. Turns out I don't really care how fast things are running, so I'm back to 1 task per CPU and I only check on it once a week to marvel how long some of these tasks take on my [computer doesn't have a front so I have no idea what it is] rescued hard drive computer. I'll keep plugging away. If I bust the program it was the computer's fault.
Tuba Jim, fresh from a week-long JHS/HS band camp.
May dark matter marvel at what an improvement can be made in music performance in only a week.

mikey
Joined: 8 May 09
Posts: 3339
Credit: 524,010,781
RAC: 0
Message 77177 - Posted: 13 Jul 2024, 10:43:33 UTC - in response to Message 77176.  

Your PCs seem to be doing fine as they are. Running more than 1 CPU core per task does make the tasks faster, but it's not like the project needs the results yesterday. This is a long-term project, and while yes, they would like the tasks crunched, there is no immediate deadline on finishing them.

Sounds like you had fun at band camp. My two sons said music helped them in math class because of the structure. I can't carry a tune in a bucket, but it sounds good to me.

doug
Joined: 22 Jun 18
Posts: 6
Credit: 47,208,335
RAC: 322
Message 77189 - Posted: 19 Jul 2024, 14:32:16 UTC

Hi all,

I am having what appears to be the same problem (with Orbit Fitting). Supposedly fairly short-runtime tasks running for half a day, a whole day or more, without ever finishing. Looking in BoincTasks, the CPU% is usually around 2.5% to 3.5%. Restarting doesn't help. Starting a new task and watching in BoincTasks, I can see the CPU% starting at a reasonable value, then almost immediately start dropping literally by the second. I have 4 CPUs. I've tried switching from 3 to 1 CPU (in the app_config.xml), which didn't work either (I can tell the switch is being accepted as the status in the BOINC Status column changes to "3 CPUs", etc.).

I've read all of this thread and I think tried everything. I don't know what else to do. Any suggestions? I'm about ready to drop Orbit Fitting and hope for more of the regular N-Body Simulation.

Thanks.

Doug
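
A CPU% that starts at a reasonable value and then falls second by second is what you would see if CPU time stops accumulating while elapsed time keeps growing. The Python sketch below assumes CPU% is simply CPU time divided by elapsed time. That definition is an assumption about how BoincTasks computes the column, and `cpu_percent` is a hypothetical helper. The frozen CPU time is the 00:04:01 figure from the first example in this thread.

```python
def cpu_percent(cpu_seconds: float, elapsed_seconds: float) -> float:
    """CPU time as a percentage of elapsed time (assumed definition of CPU%)."""
    return 100.0 * cpu_seconds / elapsed_seconds


# CPU time 00:04:01, taken from the first example in this thread, held constant.
frozen_cpu_seconds = 4 * 60 + 1

for elapsed_minutes in (10, 30, 60, 120, 180):
    pct = cpu_percent(frozen_cpu_seconds, elapsed_minutes * 60)
    print(f"{elapsed_minutes:>4} min elapsed: {pct:5.1f}% CPU")
```

Under that assumption the ratio decays as 1/elapsed, so a low single-digit CPU% after a few hours is consistent with a task that did a few minutes of work and then froze.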

Link
Joined: 19 Jul 10
Posts: 632
Credit: 19,379,218
RAC: 3,476
Message 77190 - Posted: 19 Jul 2024, 16:05:46 UTC - in response to Message 77189.  

Are you sure you are allowing BOINC to use 100% of CPU time? By the way, app_config.xml is no longer necessary here to select the number of threads per task; it's better to delete it and use the online preferences.

doug
Joined: 22 Jun 18
Posts: 6
Credit: 47,208,335
RAC: 322
Message 77191 - Posted: 20 Jul 2024, 14:13:50 UTC - in response to Message 77190.  

Thanks. That seems to have fixed it.

Doug

KeithBriggs
Joined: 28 Apr 11
Posts: 36
Credit: 283,587,354
RAC: 0
Message 77205 - Posted: 14 Aug 2024, 13:18:16 UTC - in response to Message 77175.  

I'm a novice at all the details, but regarding the slot question, I believe it has to do with the directory slot on the drive and nothing to do with which core it's attached to. I only have 32 tasks running at any one time.

I discovered that the frozen jobs have to do with a weakness in the suspend and restart feature: about 1% of restarts don't actually restart. I was starting 32 tasks, then deleting the tasks that give unfair credit and grabbing 32 more, enough to get through the night. The credit weakness revealed the suspend weakness.

©2024 Astroinformatics Group