Welcome to MilkyWay@home

Very Long WU's


Message boards : Number crunching : Very Long WU's
JIM
Joined: 21 Jul 09
Posts: 4
Credit: 5,042,005
RAC: 3,733
5 million credit badge · 13 year member badge
Message 74181 - Posted: 14 Sep 2022, 20:37:40 UTC

What's with the sudden influx of long de_nbody work units? I have an 8-core machine and am used to seeing WUs that complete in 6 or 8 minutes running on all cores at once. Suddenly I am getting large numbers of WUs with run times of 5 to 8 hours. That's quite a change. What gives?
alanb1951
Joined: 16 Mar 10
Posts: 153
Credit: 89,237,334
RAC: 58,532
50 million credit badge · 12 year member badge · extraordinary contributions badge
Message 74183 - Posted: 15 Sep 2022, 1:34:45 UTC - in response to Message 74181.  

What's with the sudden influx of long de_nbody work units? I have an 8-core machine and am used to seeing WUs that complete in 6 or 8 minutes running on all cores at once. Suddenly I am getting large numbers of WUs with run times of 5 to 8 hours. That's quite a change. What gives?
Someone else raised the same point in the thread "NBody tasks taking much longer ..." just under a week ago. And I've been noticing these longer tasks since the middle of last month...

We'd need the project scientist to give a proper explanation, but the long-running tasks admit to being long-running before they start and seem to get credited accordingly, so it looks like expected behaviour for certain work units!

Cheers - Al.
Septimus
Joined: 8 Nov 11
Posts: 196
Credit: 2,561,512
RAC: 1,173
2 million credit badge · 11 year member badge
Message 74323 - Posted: 30 Sep 2022, 10:59:53 UTC - in response to Message 74183.  

Same here… WUs that took 4 minutes are now taking at least 4 hours, whether 4, 6, or 8 cores are allocated. Estimated run times bear no relation to reality. I am running my pile down as the times are unpredictable; I may even abort the last bunch. If validation takes as long, it's no wonder the waiting-for-validation pile is hardly moving.
Dave Studdert
Joined: 26 Mar 09
Posts: 2
Credit: 17,706,435
RAC: 0
10 million credit badge · 13 year member badge
Message 74324 - Posted: 30 Sep 2022, 12:35:17 UTC

Yeah, some of these CPU units are taking crazy amounts of time. 6 cores, and it's 9 hours in with only 27% complete on a 4.5 GHz CPU.
Septimus
Joined: 8 Nov 11
Posts: 196
Credit: 2,561,512
RAC: 1,173
2 million credit badge · 11 year member badge
Message 74325 - Posted: 30 Sep 2022, 13:26:10 UTC - in response to Message 74324.  

Same here. I have an Intel i7, and 2 WUs are running on 4 CPUs each, showing 7 hours to go after running for 2 hours so far.
Septimus
Joined: 8 Nov 11
Posts: 196
Credit: 2,561,512
RAC: 1,173
2 million credit badge · 11 year member badge
Message 74326 - Posted: 30 Sep 2022, 19:59:19 UTC - in response to Message 74325.  
Last modified: 30 Sep 2022, 20:07:23 UTC

Same here. I have an Intel i7, and 2 WUs are running on 4 CPUs each, showing 7 hours to go after running for 2 hours so far.


That's my N-body WUs finished at last; one took over 96,000 CPU seconds, the other over 107,000, each spread across 4 CPUs.

Apart from anything else, the credits are not consistent:

WU taking 61,707 CPU seconds got 3,939 credits

WU taking 96,940 CPU seconds got 2,139 credits

WU taking 52,276 CPU seconds got 2,228 credits

I will not be processing any more of these. Sorry.
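A quick way to see the inconsistency is to normalize each result to credit per CPU-second (a minimal sketch using the figures quoted above):

```python
# Normalizing each result to credit per CPU-second (figures from the post above).
tasks = [
    (61707, 3939),   # (CPU seconds, credits awarded)
    (96940, 2139),
    (52276, 2228),
]
for cpu_s, credit in tasks:
    print(f"{cpu_s:>6} s -> {credit} cr -> {credit / cpu_s:.4f} cr/s")
```

The per-second award rate ranges from about 0.022 to 0.064; the task that ran longest earned the least per unit of work, which is exactly the inconsistency being reported.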
San-Fernando-Valley
Joined: 13 Apr 17
Posts: 215
Credit: 131,476,898
RAC: 1,490
100 million credit badge · 5 year member badge · extraordinary contributions badge
Message 74330 - Posted: 1 Oct 2022, 6:58:03 UTC - in response to Message 74326.  

I can confirm the inconsistencies.

PC #1:

Run time (s)___CPU time (s)___Credit
8,650.77___57,509.30___2,036.67
7,802.28___51,971.95___4,544.88

Both ran on the same PC, same number of CPUs, nothing else running.

PC #2:

Run time (s)___CPU time (s)___Credit
10,502.32___37,127.45___3,211.39
13,434.64___49,270.17___2,041.10

Both ran on the same PC, same number of CPUs, nothing else running.

PC #3:

Run time (s)___CPU time (s)___Credit
458.05___1,426.72___54.40
462.13___1,433.34___51.95

Both ran on the same PC, same number of CPUs, nothing else running.

Hope there is a sensible explanation for this!

Maybe we should open a new thread for this topic?
Septimus
Joined: 8 Nov 11
Posts: 196
Credit: 2,561,512
RAC: 1,173
2 million credit badge · 11 year member badge
Message 74335 - Posted: 2 Oct 2022, 15:39:21 UTC - in response to Message 74330.  
Last modified: 2 Oct 2022, 15:52:27 UTC

Looking back at old screenshots I took in April, the N-body Simulation average was 0.12 or 0.13 of an hour, about 7 minutes. Now the average is over 2 hours, with some ridiculous maximums at 130-plus hours. Something has really changed. Also, the awaiting-validation count seems to be hovering between 1.69 and 1.80 million and has been like that for the last few weeks. The last N-bodys I did mostly failed to complete and were timed out, not surprisingly.
San-Fernando-Valley
Joined: 13 Apr 17
Posts: 215
Credit: 131,476,898
RAC: 1,490
100 million credit badge · 5 year member badge · extraordinary contributions badge
Message 74336 - Posted: 2 Oct 2022, 18:00:39 UTC - in response to Message 74335.  
Last modified: 2 Oct 2022, 18:04:30 UTC

... The last N-bodys I did mostly failed to complete and were timed out, not surprisingly.

Mine are doing fine.
On the "slowest" PC, run times are around 15 hours max.
On the others, around 3 hours max.

I wonder why Tom doesn't respond to this "problem" of very long runtimes.
Something must have changed - most likely the amount of input data?

I think this will "scare off" a lot of crunchers. We are already down to around 2000 users.
Not to mention the (long) response times of the "homepage" ...
AndreyOR
Joined: 13 Oct 21
Posts: 43
Credit: 115,353,426
RAC: 341,063
100 million credit badge · 1 year member badge
Message 74339 - Posted: 3 Oct 2022, 4:46:23 UTC
Last modified: 3 Oct 2022, 4:48:08 UTC

May I suggest that we don't worry too much about the variations in runtimes, credit, etc., and just crunch. We can ask questions and speculate, like here: https://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=4930&postid=74329 but let's not give up, temporarily or permanently. Credit inconsistencies are not unusual with BOINC when the estimated computation size of tasks changes or is variable; it averages itself out, though. If you're concerned, check the average credit per runtime or CPU time of a batch of tasks from before and compare it to the same-sized batch of current tasks; I bet they'll be similar even if the variability between tasks seems large right now. Anyone who has contributed to LHC, for example, knows that all of their subprojects have highly variable tasks, but the credit tends to be pretty consistent on average.
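The before/after comparison suggested here can be sketched as follows. The `old_batch` figures are hypothetical placeholders for a batch of short WUs; `new_batch` reuses the long-WU figures posted earlier in this thread:

```python
# Sketch of the batch comparison suggested above: average credit per
# CPU-second over a whole batch of tasks, before vs. now.
def avg_credit_rate(tasks):
    """tasks: list of (cpu_seconds, credit) pairs -> credits per CPU-second."""
    total_cpu = sum(cpu for cpu, _ in tasks)
    total_credit = sum(credit for _, credit in tasks)
    return total_credit / total_cpu

old_batch = [(400, 25), (420, 30), (380, 22)]               # hypothetical short WUs
new_batch = [(61707, 3939), (96940, 2139), (52276, 2228)]   # long WUs from this thread

print(f"old: {avg_credit_rate(old_batch):.4f} cr/s")   # old: 0.0642 cr/s
print(f"new: {avg_credit_rate(new_batch):.4f} cr/s")   # new: 0.0394 cr/s
```

Averaging over the whole batch smooths out the per-task swings, which is the point: individual tasks vary a lot, but the batch rate is what BOINC converges toward over time.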

The disk crash a few months ago was a painful time for the project, but everything worked out: nothing was lost and everyone got their credit. It just took time, but this is a long-term project, so in the end it was just one of the hurdles to overcome. This (the high validation and task queues) is nothing close to that. The project does seem to have some server issues, but there are plans to replace the servers soon, as mentioned in the forums. The longer runtimes are likely due to scientific reasons, as speculated in the post I linked above.

Some may already know this, but the project people most likely to watch and post on the forums are PhD students, and they are very busy with much higher priorities. We'll eventually get the info and things will eventually get fixed; it'll just take time. I'd encourage everyone to stay the course and crunch regardless of what's happening. That's what's most helpful to the project, and we'll get our credits and badges, even if it sometimes takes a bit of time.
San-Fernando-Valley
Joined: 13 Apr 17
Posts: 215
Credit: 131,476,898
RAC: 1,490
100 million credit badge · 5 year member badge · extraordinary contributions badge
Message 74341 - Posted: 3 Oct 2022, 5:22:19 UTC - in response to Message 74339.  

+1
--------------------------------------------------------------------------------------------------

I still don't understand the interesting situations where tasks that "run" longer receive fewer credits.
--------------------------------------------------------------------------------------------------

I personally am not keen on credits - OK, they're nice, but I always thought
one does (or should do) crunching for the pure sense of doing something
good for science.
Sort of like contributing to the understanding of our world/universe, etc.
--------------------------------------------------------------------------------------------------

Cheers
AndreyOR
Joined: 13 Oct 21
Posts: 43
Credit: 115,353,426
RAC: 341,063
100 million credit badge · 1 year member badge
Message 74345 - Posted: 4 Oct 2022, 14:15:45 UTC - in response to Message 74341.  

It's my understanding that the BOINC credit calculation system has been criticized for years. The same glitches and oddities apply to everyone within a project, though, so it's only useful to compare within a project, never between projects. Even total BOINC credit across all projects is pretty useless for comparisons.

It's human nature to want some kind of metrics and rewards, but I think there are simpler and better systems for providing them. The most important metric, I'd say, is the number of tasks completed; that's what matters most to the projects. A tally could be kept for each sub-project, and a badge awarded for completing every given number of tasks. I'd argue it's much more useful and meaningful to know that one has completed 1,000 tasks than that one got 1 million points, for example.
San-Fernando-Valley
Joined: 13 Apr 17
Posts: 215
Credit: 131,476,898
RAC: 1,490
100 million credit badge · 5 year member badge · extraordinary contributions badge
Message 74348 - Posted: 4 Oct 2022, 14:49:33 UTC - in response to Message 74345.  

Well said.

But I am/was not comparing between different PCs running MilkyWay,
nor comparing between projects - neither on the same PC nor on a different one.

The three examples I showed earlier were run on the same PC, using MilkyWay.
I was trying to show that one task had a shorter run time but was awarded
more credit than the other task, which ran longer.

I am not aware of any such discrepancies in credit calculations on, for example, Einstein.

Of course, a comparison of credits or run times between projects, on the same PC or
different ones, is of no value.
Tom Donlon
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 10 Apr 19
Posts: 373
Credit: 104,985,450
RAC: 132,072
100 million credit badge · 3 year member badge
Message 74351 - Posted: 4 Oct 2022, 15:00:37 UTC

I talk about this a little bit in this thread https://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=4930#74308, but I think I know what the problem is.

There are combinations of parameters (such as very dense dwarf galaxies) that cause the simulation to run for a long time. This is usually because the timestep resolution that you need to accurately simulate those systems is very small, so the simulation may choose to run 10,000 timesteps for very dense systems, but only 1,000 timesteps for a less dense system. Timesteps all take roughly the same amount of time to run, so in this example that would be a 10x increase in the time it would take to crunch that simulation.

Eric had implemented a system that avoided parameters that would cause very long runtimes. Specifically, if your client calculated that a simulation needed a very large number of timesteps, that workunit would abort and the client would move on to the next task. These very dense dwarf galaxies are not realistic, so we don't lose any scientific value by not running them.

This was working for a while, but I am seeing N-body workunits with very dense cores in the results pool. Something must have changed when Eric made his recent changes... I have made the team aware of this problem and we'll work on fixing it.
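An illustrative sketch of the guard described here (this is NOT the actual milkyway_nbody source; the function names and the way the timestep size is supplied are assumptions for illustration):

```python
# Illustrative sketch of the guard Tom describes: the step count follows
# from the time span to simulate and the timestep size, which must shrink
# for denser dwarf galaxies. If the count is too large, the client gives
# up on the work unit instead of crunching for days.
MAX_TIMESTEPS = 150_000   # cap quoted later in this thread

def timestep_count(evolve_time, dt):
    """Number of steps needed to cover evolve_time at step size dt."""
    return int(evolve_time / dt)

def should_abort(evolve_time, dt, cap=MAX_TIMESTEPS):
    """Refuse unrealistic (very dense) parameter combinations up front."""
    return timestep_count(evolve_time, dt) > cap

# Same evolve time, a 100x smaller timestep -> 100x more steps:
print(should_abort(4.0, 4.0 / 10_000))      # ~10,000 steps    -> False
print(should_abort(4.0, 4.0 / 1_000_000))   # ~1,000,000 steps -> True
```

Since every timestep costs roughly the same, a check like this is cheap to run before the simulation starts, which is why an aborted workunit finishes in seconds rather than hours.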
Septimus
Joined: 8 Nov 11
Posts: 196
Credit: 2,561,512
RAC: 1,173
2 million credit badge · 11 year member badge
Message 74352 - Posted: 4 Oct 2022, 16:53:30 UTC - in response to Message 74351.  

Thanks for the update Tom😁
AndreyOR
Joined: 13 Oct 21
Posts: 43
Credit: 115,353,426
RAC: 341,063
100 million credit badge · 1 year member badge
Message 74353 - Posted: 4 Oct 2022, 19:58:59 UTC - in response to Message 74348.  

I was commenting more generally about the BOINC credit system in that post. As for the discrepancies you noticed: that's partly what I was referring to earlier when I said that credit inconsistencies are not unusual with BOINC when the estimated computation size of tasks changes all of a sudden or is variable. It usually averages itself out over time. I believe the inconsistencies come from BOINC trying to average things out; it's just not good at doing that and takes a long time - weeks, I think, not days. The greater and more frequent the computation-size variability between tasks, the longer it takes. I don't think BOINC is trying to short users of credit; it's just slow to adapt to changes. I've seen or read about this with projects that rely at least in part on the BOINC default system, like Rosetta, LHC, and MilkyWay. Other projects, like Universe, CPDN, and Einstein, have a set credit per completed task (or per trickle, in the case of CPDN), which also varies by sub-project. That's why you haven't seen it on Einstein, for example.

We also just got an explanation from Tom as to why N-body tasks started taking so long. It's helpful to know there's a scientific reason and not just glitchy or bad tasks.
mikey
Joined: 8 May 09
Posts: 3038
Credit: 513,180,513
RAC: 251,779
500 million credit badge · 13 year member badge · extraordinary contributions badge
Message 74355 - Posted: 5 Oct 2022, 10:20:59 UTC - in response to Message 74345.  

It's my understanding that the BOINC credit calculation system has been criticized for years. The same glitches and oddities apply to everyone within a project, though, so it's only useful to compare within a project, never between projects. Even total BOINC credit across all projects is pretty useless for comparisons.

It's human nature to want some kind of metrics and rewards, but I think there are simpler and better systems for providing them. The most important metric, I'd say, is the number of tasks completed; that's what matters most to the projects. A tally could be kept for each sub-project, and a badge awarded for completing every given number of tasks. I'd argue it's much more useful and meaningful to know that one has completed 1,000 tasks than that one got 1 million points, for example.


Try running the WUProp project then - it counts the hours your PCs put in, on both CPU cores and GPUs, which is more in line with your last sentence:

https://wuprop.boinc-af.org

It runs as an NCI (Non Computationally Intensive) task, using about 0.25 of a CPU core, and you should run it on everything that crunches BOINC tasks so your hours get counted and you earn badges for reaching different milestones. The project also has a forum section letting you know when new apps are starting, or restarting after being off for a while:

https://wuprop.boinc-af.org/forum_thread.php?id=351
mikey
Joined: 8 May 09
Posts: 3038
Credit: 513,180,513
RAC: 251,779
500 million credit badge · 13 year member badge · extraordinary contributions badge
Message 74356 - Posted: 5 Oct 2022, 10:25:54 UTC - in response to Message 74353.  

I was commenting more generally about the BOINC credit system in that post. As for the discrepancies you noticed: that's partly what I was referring to earlier when I said that credit inconsistencies are not unusual with BOINC when the estimated computation size of tasks changes all of a sudden or is variable.


Credit inconsistencies come in when a project doesn't assign a fixed credit for each task. Yes, that has its own problems - like now, when some tasks run much longer than others - but the variable credit metric used here, although complicated, is a LOT better than the 'CreditNew' system that comes built into the server side of BOINC.
Tom Donlon
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 10 Apr 19
Posts: 373
Credit: 104,985,450
RAC: 132,072
100 million credit badge · 3 year member badge
Message 74357 - Posted: 5 Oct 2022, 13:45:11 UTC

Went through and looked at the N-body tasks yesterday. We have a system that throws out tasks if they have more than 150k timesteps (the number of timesteps is determined by how long the simulation needs to evolve, as well as how dense the dwarf galaxy is). It turns out that the current N-body runs have optimized to a point where the number of timesteps is very close to 150k - we calculated the number of timesteps for an arbitrary WU, and it was 147,500 timesteps.

Luckily, that means the length of the N-body tasks at the moment isn't because of a glitch; everything is working as intended. The bad news is that there isn't any way to shorten the N-body simulations unless we release a new client with a different timestep limit and put up new runs.

You may also have seen many of your N-body tasks recently take only a few seconds to run - that happens when the simulation calculates that it needs more than 150k timesteps.
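Some rough arithmetic consistent with Tom's numbers: assuming each timestep costs about the same to run, a WU near the 150k cap takes roughly 15x as long as a 10,000-step WU (the 10,000-step figure is an assumption, taken from the 1,000-10,000 step range Tom mentioned earlier in this thread):

```python
# Rough scaling check, assuming every timestep costs about the same to run.
typical_steps = 10_000     # a "normal" WU (assumed, from Tom's earlier example)
near_cap_steps = 147_500   # the arbitrary WU measured above

ratio = near_cap_steps / typical_steps
print(f"{ratio:.2f}x longer")                  # 14.75x longer

# Septimus reported ~7-minute N-body averages back in April:
old_minutes = 7
print(f"~{old_minutes * ratio:.0f} minutes")   # ~103 minutes: hours, not minutes
```

That is in line with the jump reported in this thread from a roughly 7-minute average to an average of over 2 hours.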
nairb
Joined: 17 Feb 09
Posts: 24
Credit: 3,267,551
RAC: 613
3 million credit badge · 13 year member badge
Message 74358 - Posted: 5 Oct 2022, 22:33:22 UTC - in response to Message 74357.  

You may also have seen many of your N-body tasks recently take only a few seconds to run - that happens when the simulation calculates that it needs more than 150k timesteps.

Yup, just had one of those - it lasted all of 1 second.
The other WU had an estimated runtime of about 1 hr 40 min before starting but took 19 hrs 43 min to complete. I am hoping some of the other WUs will be shorter; otherwise some will not meet the deadline.

©2023 Astroinformatics Group