Tasks get stuck after a few minutes and run indefinitely
Author | Message |
---|---|
Joined: 28 Apr 11 Posts: 36 Credit: 283,587,354 RAC: 0 |
I've had about 20 of these. Here's one example, shown in two snapshots taken ten minutes apart. I've suspended and restarted it without any effect. First snapshot: Application: Milkyway@home N-Body Simulation with Orbit Fitting 1.87 (mt); Name: de_nbody_07_01_2024_v186_pal5__data__13_1718187602_1773294; State: Running; Received: 7/7/2024 8:29:03 PM; Report deadline: 7/19/2024 8:29:01 PM; Estimated computation size: 61,319 GFLOPs; CPU time: 00:04:01; CPU time since checkpoint: 00:00:00; Elapsed time: 00:08:51; Estimated time remaining: 00:58:04; Fraction done: 2.980%; Virtual memory size: 15.50 MB; Working set size: 19.73 MB; Directory: slots/78; Process ID: 33716; Progress rate: 22.680% per hour; Executable: milkyway_nbody_orbit_fitting_1.87_windows_x86_64__mt.exe. 10 minutes later: Application: Milkyway@home N-Body Simulation with Orbit Fitting 1.87 (mt); Name: de_nbody_07_01_2024_v186_pal5__data__13_1718187602_1773294; State: Running; Received: 7/7/2024 8:29:03 PM; Report deadline: 7/19/2024 8:29:01 PM; Estimated computation size: 61,319 GFLOPs; CPU time: 00:04:01; CPU time since checkpoint: 00:00:00; Elapsed time: 00:18:49; Estimated time remaining: 00:58:04; Fraction done: 2.980%; Virtual memory size: 15.50 MB; Working set size: 19.73 MB; Directory: slots/78; Process ID: 33716; Progress rate: 10.080% per hour; Executable: milkyway_nbody_orbit_fitting_1.87_windows_x86_64__mt.exe |
Joined: 28 Apr 11 Posts: 36 Credit: 283,587,354 RAC: 0 |
And another one: Application: Milkyway@home N-Body Simulation with Orbit Fitting 1.87 (mt); Name: de_nbody_07_01_2024_v186_pal5__data__13_1718187602_1773297; State: Running; Received: 7/7/2024 8:29:03 PM; Report deadline: 7/19/2024 8:29:01 PM; Estimated computation size: 59,585 GFLOPs; CPU time: 00:03:58; CPU time since checkpoint: 00:00:00; Elapsed time: 00:52:45; Estimated time remaining: 00:56:32; Fraction done: 2.777%; Virtual memory size: 15.51 MB; Working set size: 19.82 MB; Directory: slots/71; Process ID: 38312; Progress rate: 3.240% per hour; Executable: milkyway_nbody_orbit_fitting_1.87_windows_x86_64__mt.exe |
Joined: 19 Jul 10 Posts: 632 Credit: 19,379,218 RAC: 3,476 |
If you aren't already allowing BOINC to use 100% of CPU time, do so; that's likely the issue. To get the stuck tasks running again, simply restart BOINC. |
Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0 |
(Quoting: "And another one:") I have a question for you: how do you go from 32 CPU cores to using slots 71 and 78 for your tasks? Are you faking extra CPU cores on your PC to run something else, like Yafu, the 64 and 128 tasks? |
Joined: 8 Sep 21 Posts: 5 Credit: 21,243,528 RAC: 11,877 |
Don't mean to be [enter any apologetic phrase], but I had the same outcome a couple of weeks ago and dumped those tasks, figuring it was something I did wrong. I only have 8 CPUs, I'm pretty sure, and was running the default setup. I saw a suggestion that changing the setup to something that made no sense to me would help, so I went to 1 task per CPU. Very ugly. I went back to the default, 8 CPUs per task or something like that, and then had to get rid of the 1-task-per-CPU tasks that wouldn't run while all the CPUs were busy with new tasks. I got those cleared out, but output kept going down. It turns out I don't really care how fast things run, so I'm back to 1 task per CPU, and I only check on it once a week to marvel at how long some of these tasks take on my [computer doesn't have a front, so I have no idea what it is] rescued-hard-drive computer. I'll keep plugging away. If I bust the program, it was the computer's fault. Tuba Jim, fresh from a week-long JHS/HS band camp. May dark matter marvel at how much music performance can improve in only a week. |
Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0 |
Your PCs seem to be doing fine as they are. Running more than 1 CPU core per task does make the tasks faster, but it's not like the project needs the results yesterday or anything like that. This is a long-term project, and while, yes, they would like the tasks crunched, there is no immediate deadline for finishing them. Sounds like you had fun at band camp! My two sons said music helped them in math class because of the structure. I can't carry a tune in a bucket, but it sounds good to me. |
Joined: 22 Jun 18 Posts: 6 Credit: 47,208,335 RAC: 322 |
Hi all, I am having what appears to be the same problem (with Orbit Fitting). Tasks that are supposed to have fairly short runtimes run for half a day, a whole day or more, without ever finishing. Looking in BoincTasks, the CPU% is usually around 2.5% to 3.5%. Restarting doesn't help. When I start a new task and watch it in BoincTasks, I can see the CPU% start at a reasonable value, then almost immediately begin dropping, second by second. I have 4 CPUs. I've tried switching from 3 CPUs to 1 (in app_config.xml), which didn't work either (I can tell the switch is being accepted, because the Status column in BOINC changes to "3 CPUs", etc.). I've read this whole thread and I think I've tried everything. I don't know what else to do. Any suggestions? I'm about ready to drop Orbit Fitting and hope for more of the regular N-Body Simulation. Thanks. Doug |
Joined: 19 Jul 10 Posts: 632 Credit: 19,379,218 RAC: 3,476 |
Are you sure you are allowing BOINC to use 100% of CPU time? Btw., app_info.xml isn't needed here anymore to select the number of threads per task; better to delete it and use the online preferences. |
Joined: 22 Jun 18 Posts: 6 Credit: 47,208,335 RAC: 322 |
Thanks. That seems to have fixed it. Doug |
Joined: 28 Apr 11 Posts: 36 Credit: 283,587,354 RAC: 0 |
I'm a novice at all the details, but regarding the slot question, I believe it has to do with the directory slot on the drive and nothing to do with which core it's attached to. I only have 32 tasks running at any one time. I discovered that the frozen jobs have to do with a weakness in the suspend and restart feature: about 1% of restarts don't actually restart. I was starting 32 tasks, then deleting the ones that give unfair credit and grabbing 32 more, enough to get through the night. The credit weakness revealed the suspend weakness. |
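The symptom in the first post can be checked mechanically: between the two snapshots of task ..._1773294, CPU time stayed at 00:04:01 and fraction done stayed at 2.980% while elapsed time grew from 00:08:51 to 00:18:49, so the task was not getting CPU at all. Below is a minimal Python sketch of that comparison. The helper names (`to_seconds`, `looks_stalled`) and the 10% CPU-share threshold are hypothetical choices for illustration, not anything provided by BOINC or BoincTasks; the values are typed in by hand from the snapshots.

```python
def to_seconds(hms: str) -> int:
    """Convert a BOINC Manager style 'HH:MM:SS' duration to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def looks_stalled(before: dict, after: dict, min_cpu_share: float = 0.10) -> bool:
    """Hypothetical check: flag a task whose CPU time barely moved and whose
    fraction done did not move at all while wall-clock time kept running.
    The 10% threshold is an arbitrary assumption."""
    elapsed = to_seconds(after["elapsed"]) - to_seconds(before["elapsed"])
    cpu = to_seconds(after["cpu"]) - to_seconds(before["cpu"])
    progressed = after["fraction_done"] > before["fraction_done"]
    return elapsed > 0 and cpu / elapsed < min_cpu_share and not progressed


# Values copied from the two snapshots of task ..._1773294 in the first post.
first = {"cpu": "00:04:01", "elapsed": "00:08:51", "fraction_done": 2.980}
second = {"cpu": "00:04:01", "elapsed": "00:18:49", "fraction_done": 2.980}

print(looks_stalled(first, second))  # True: no CPU time used in ~10 minutes of wall time
```

A task flagged this way matches what the replies above diagnose: the process exists but is not being scheduled, which is why allowing BOINC 100% of CPU time and restarting BOINC got tasks moving again.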
©2024 Astroinformatics Group