21)
Message boards :
Number crunching :
MT Nbody 1.62 workunit locked up
(Message 65017)
Posted 11 Aug 2016 by Captiosus Post: Kewl, I'll be looking forward to the response, if any. |
22)
Message boards :
Number crunching :
MT Nbody 1.62 workunit locked up
(Message 65001)
Posted 8 Aug 2016 by Captiosus Post: I'd like to add (since it won't let me edit): I've also been getting a number of errors in Nbody, for both MT and ST tasks, all on v1.62. Two MT units (the worse of the two being the one highlighted above) and five (and counting) ST Nbody units have all failed with a computation error. Checking the log shows that all of the failed units hit "Exceeded disk limit xx.xx MB > 50MB". I have 4 GB of disk space set aside for BOINC, so there's no way my overall disk allocation is the cause. |
23)
Message boards :
Number crunching :
MT Nbody 1.62 workunit locked up
(Message 64998)
Posted 7 Aug 2016 by Captiosus Post: So, I thought I'd cook a few units last night and left my CPU and GPUs to finish running the batch I pulled from the servers. For the most part everything worked out fine (some of the CPU MT units went by really fast), and I left it running overnight. I woke up this morning and noticed that I still had CPU activity. I checked BOINC and saw that one of the MT units had frozen at 91.019% and had been in that state for the past 11 hours or so. There was also a fairly considerable memory leak in progress: when I aborted the unit I ended up freeing about 3 GB of memory, even though Task Manager, Resource Monitor, and Process Explorer all showed it using about 800 MB. The WU in question is de_nbody_8_1_16_v162_2k_3_1470395169_81723_4 |
24)
Message boards :
News :
Nbody Release 1.54
(Message 64178)
Posted 15 Dec 2015 by Captiosus Post: Please read what I wrote in the other topic, the announcement of N-body 1.52 (one month ago and now again); nobody ever answered me, but the tasks are not working on my Mac: running on 1 core out of 8, blocking the other BOINC apps from running (while pretending to be "mt"), and in 1.52 they would not even complete normally (AND they started running again now, when my setup does not allow Nbody)... It's because, as Sidd said, the initialization period cannot be made multi-threaded. Multi-threading it screws up the math needed to set the initial values of each body in the model, and when such a unit gets turned in along with other workunits from the same batch, the results correlate poorly due to the bad math. My suggestion in response to that was to use the otherwise idle initialization period to prime a batch of workunits for compute, then, once enough are primed and ready to go, process them one by one. Alternatively, interleave it by setting 87.5% of the available threads for MT (on my CPU that would be 14 threads of 16; on yours, 7 of 8) and use the remaining thread(s) to initialize the next unit, so that when an MT workunit finishes, another can immediately start processing; see the sketch after this post. |
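As a rough illustration of that interleaving idea, here is a minimal C++ sketch, not anything from the actual Nbody code: one dedicated thread keeps a queue of primed workunits full while the rest of the threads run the multi-threaded compute phase. The names initialize_workunit and compute_workunit are hypothetical placeholders for the real serial and MT phases.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

struct Workunit { int id = 0; bool primed = false; };

std::queue<Workunit> ready;   // workunits whose serial initialization is done
std::mutex m;
std::condition_variable cv;
bool no_more_work = false;

// Serial phase: stays single-threaded so results are reproducible.
Workunit initialize_workunit(int id) { return Workunit{id, true}; }

// MT phase: placeholder for the force/advance loop run across mt_threads.
void compute_workunit(const Workunit&, unsigned /*mt_threads*/) {}

int main()
{
    const unsigned total      = std::thread::hardware_concurrency(); // e.g. 16
    const unsigned mt_threads = total * 7 / 8;   // ~87.5%: 14 of 16, 7 of 8

    // Initializer thread: primes the next unit while the MT pool is busy.
    std::thread init_thread([] {
        for (int id = 0; id < 32; ++id) {
            Workunit wu = initialize_workunit(id);
            { std::lock_guard<std::mutex> lk(m); ready.push(wu); }
            cv.notify_one();
        }
        { std::lock_guard<std::mutex> lk(m); no_more_work = true; }
        cv.notify_one();
    });

    // Consumer: takes a primed unit and runs the multi-threaded compute phase,
    // so a new MT run can start the moment the previous one finishes.
    for (;;) {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [] { return !ready.empty() || no_more_work; });
        if (ready.empty()) break;
        Workunit wu = ready.front();
        ready.pop();
        lk.unlock();
        compute_workunit(wu, mt_threads);
    }
    init_thread.join();
}
```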
25)
Message boards :
News :
New Release- Nbody version 1.52
(Message 63948)
Posted 22 Sep 2015 by Captiosus Post: I've got one that seems to be causing problems. This one has been running for 78 hours, it's been at 100% for about 9 hours, and it never did save a checkpoint. The original estimate was about 1.5 hours. BOINC indicates it is using 8 CPUs, but the activity monitor shows it is only using 1.3% CPU with 2 threads. I've seen some units from the previous version of N-body behave like this, so I'll probably watch it for a little while longer to see if it finishes. I think the long period of low CPU utilization (1-2 cores at most) is the initialization period, the setting up of the work so that computation can actually proceed. The problem is that it is one of those serial tasks that can't easily be split up into multiple threads for processing. I get the same thing as well: a long stretch of single-threaded work (3-5 min) running on one core, and then a short burst using all of the assigned cores to do the actual compute. What I was thinking of suggesting was splitting the initialization period and the compute period into two distinct tasks. Uninitialized work is sent out in batches, and it gets prepped for computing in groups (with my Xeon that would be 15 tasks getting initialized at once). Once initialized, the workunit is passed through an SHA hash function, and the hash is sent to the MW@H servers for comparison. If the results from (n) clients match, the workunits on those computers are flagged as ready for compute and will be processed at the next task switch. Once the MT tasks are complete, they are sent in for standard end-of-work processing and credit is awarded. Alternatively, initialized work is sent to the MW@H servers (not just the hash) for comparison, to ensure it was initialized properly. A small block of credit is awarded if it was, and then the initialized work is sent back out to begin the actual compute process like any other work unit. Once complete, it's sent back in and normal end-of-work processing is done. A third alternative would be to have the MW@H program take uninitialized workunits, initialize them in batches, checkpoint them at the end of the initialization period, and then, once a number are ready, switch to MT mode and rip through them before sending the work in. Now, I am aware this would increase overhead for the project by a not-insignificant amount, but in the end the whole idea is to minimize idle time on the clients (the major issue at the moment) so more work can be done. Any ideas to improve it would be nice; a rough sketch of the hash-comparison idea follows this post. |
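To make the hash-comparison idea above concrete, here is a minimal, purely hypothetical C++ sketch: the initialized body array is reduced to a digest that could be reported to the server and compared across clients. FNV-1a stands in for the SHA hash the post mentions, and the Body layout and workflow are assumptions, not project code. One practical caveat is that the digest only matches across machines if initialization is bit-for-bit deterministic, which is exactly why the serial initialization matters.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <vector>

struct Body { double x, y, z, vx, vy, vz, mass; };

// Reduce the initialized bodies to a digest; two clients that produced
// bit-identical initial conditions produce the same digest.
std::uint64_t hash_initial_state(const std::vector<Body>& bodies)
{
    std::uint64_t h = 1469598103934665603ULL;   // FNV-1a offset basis
    for (const Body& b : bodies) {
        unsigned char buf[sizeof(Body)];
        std::memcpy(buf, &b, sizeof(Body));
        for (unsigned char byte : buf) {
            h ^= byte;
            h *= 1099511628211ULL;              // FNV-1a prime
        }
    }
    return h;
}

int main()
{
    // Deterministic, single-threaded initialization (placeholder values).
    std::vector<Body> bodies(1000);
    for (std::size_t i = 0; i < bodies.size(); ++i)
        bodies[i] = Body{double(i), 0, 0, 0, 0, 0, 1.0};

    // The digest would be reported to the server; if enough clients report
    // the same value, their locally initialized units would be flagged as
    // ready for the MT compute phase at the next task switch.
    std::printf("initial-state digest: %016llx\n",
                (unsigned long long)hash_initial_state(bodies));
}
```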
26)
Message boards :
News :
New Release- Nbody version 1.52
(Message 63938)
Posted 18 Sep 2015 by Captiosus Post: Hey, Oh goodie. Any ideas on doing that that seem viable? |
27)
Message boards :
News :
New Release- Nbody version 1.52
(Message 63934)
Posted 17 Sep 2015 by Captiosus Post: It works! Awesome! I just re-enabled it and my CPU is chewing through MT tasks like they're candy. I would like to ask, though: is there anything that can be done to further optimize CPU usage so there aren't long periods of low (single-thread) CPU utilization? As it stands right now, on my CPU the MT tasks show about a minute of single-thread activity (which, as I understand it, is the initialization period that cannot be multi-threaded), and once the initialization period is complete there's a quick (~30 sec) burst where the task uses all of the designated threads (in my case, 15) and completes. Is there any way this could be altered without breaking Nbody again? |
28)
Message boards :
News :
Nbody Status Update
(Message 63863)
Posted 10 Aug 2015 by Captiosus Post: Groovy. Looking forward to it. |
29)
Message boards :
News :
Nbody Status Update
(Message 63861)
Posted 9 Aug 2015 by Captiosus Post: Hey Sidd, can we get a possible ETA on the new N-body being ready? |
30)
Message boards :
News :
New Nbody Version 1.50
(Message 63582)
Posted 15 May 2015 by Captiosus Post: Hmm, it seems the problem with MilkyWay Nbody still remains even though it's been updated. M0CZY says it works in Linux. That raises the question: if Nbody won't run right on Windows, why not run it in a Linux VM like some other projects do? For me, both single-threaded and MT Nbody exhibit the stalling bug. |
31)
Message boards :
News :
New Nbody version 1.48
(Message 63166)
Posted 20 Feb 2015 by Captiosus Post: Interesting thoughts. I wonder if Sidd can enlighten us as to where the parallelisation 'sweet spot' is, or whether they'd like us to try and find it. Considering the MT phase only, I'd imagine there comes a point where the overhead of managing and synchronising multiple threads exceeds the benefit, but I wouldn't know whether the tipping point is above or below 15 threads. I think with the new version the parallelization of the work tops out at about 10 active threads; once my CPU gets going, it only uses about two-thirds of the available core count. As for your tests, that is precisely what I was thinking of. Here's what I cooked up for an earlier post but cut out: what I would like to know is whether the initialization and the actual computation of a run can be split for more effective use of the available resources. Instead of doing it the way it is done now, a variation would have it going from batch to stream mode as the workflow gets moving, and instead of one monolithic MT unit, it does 2 or 3 at once depending on the thread allocation count. On my rig, for example, 2 blocks of 7 threads would run MT, with the 15th thread initializing units. Batch mode would be suitable for systems that run under the effective thread count (11), while batch-to-stream mode would be better suited to machines with high core counts. Machines with extremely high thread counts (>=24 threads) would dedicate more than one thread to maintaining the work queue (1 init thread per work block). So a crazy person running an i4P loaded with 16c/32t Xeons (120T) could end up with 14 8-thread work blocks, with the remaining 8 threads feeding the beast, so to speak. Tuning that for optimal workflow would take some time, though; a toy calculation of that layout follows this post. I wonder if BOINC allows running apps to have their own daughter processes. |
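Here is a toy C++ calculator for the work-block layout described above, under the entirely made-up assumption of roughly one initializer thread per two compute blocks (the post itself does not pin the ratio down). With 15 usable threads and blocks of 7 it reproduces the post's 2 blocks plus 1 init thread; with 120 threads and blocks of 8 it lands near the 14-block figure mentioned.

```cpp
#include <cstdio>
#include <utility>

struct Layout { int blocks, init_threads, idle; };

// Greedily fit fixed-size MT compute blocks, reserving roughly one
// initializer thread per two blocks to keep the work queue fed.
Layout plan(int total_threads, int block_size)
{
    Layout best{0, 0, total_threads};
    for (int b = 1;; ++b) {
        int init = (b + 1) / 2;                 // ~1 init thread per 2 blocks
        int used = b * block_size + init;
        if (used > total_threads) break;
        best = {b, init, total_threads - used};
    }
    return best;
}

int main()
{
    // 15 usable threads, blocks of 7 -> 2 blocks + 1 init thread (as in the post).
    // 120 threads, blocks of 8       -> 14 blocks + 7 init threads + 1 spare.
    for (auto [threads, size] : {std::pair{15, 7}, std::pair{120, 8}}) {
        Layout l = plan(threads, size);
        std::printf("%3d threads, block size %d: %2d MT blocks, %d init, %d idle\n",
                    threads, size, l.blocks, l.init_threads, l.idle);
    }
}
```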
32)
Message boards :
News :
New Nbody version 1.48
(Message 63164)
Posted 19 Feb 2015 by Captiosus Post: Therefore, for now, we removed all multithreading of the initialization of the dwarf galaxy. This is why it will run on only one thread until the initialization completes. However, after performing a speed profile on the code, it was determined that a majority of the run time of the previous code was spent in a single function (thanks to Roland Judd for catching that!). We optimized this function, leading to a great decrease in the run time. I think the initialization period varies between workunits. IMO the big problem right now is that the initialization period is blocking the other cores/threads from doing meaningful work. I have an 8c/16t CPU in my main rig (E5-2690, 2.9 GHz stock, 3.3 GHz all-core turbo, a very powerful chip), and I have BOINC set to use 15 of those 16 threads explicitly for CPU tasks, with the remaining thread set aside to power the GPUs. What happens when BOINC runs Nbody 1.48 is that it allocates the 15 threads like it should, but then runs in single-threaded mode for about 10-30 minutes depending on the unit. IMO the way Milkyway runs needs further tweaking; a minimal sketch of the serial-init-then-parallel-compute pattern is below. I had a couple of ideas for helping to prevent excessive idle time, but getting them implemented would require a lot of extra work on the part of the devs. If anyone's interested I'll post those ideas, but they may seem a bit outlandish since I know little about programming. |
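For illustration only, here is a minimal C++/OpenMP sketch of the pattern the release note describes: a strictly single-threaded initialization phase followed by a compute loop spread across the allocated threads. The Body struct and the loops are hypothetical stand-ins, not the project's actual code.

```cpp
#include <omp.h>
#include <cstdio>
#include <vector>

struct Body { double pos[3], vel[3], acc[3], mass; };

int main()
{
    const int n = 20000;
    std::vector<Body> bodies(n);

    // Phase 1: serial initialization. Kept on one thread so every host that
    // runs this workunit derives exactly the same initial conditions.
    for (int i = 0; i < n; ++i)
        bodies[i] = Body{{double(i), 0.0, 0.0}, {0, 0, 0}, {0, 0, 0}, 1.0};

    // Phase 2: the pairwise force loop is where MT pays off, so only this
    // part is spread across the threads BOINC allocated (e.g. 15).
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            if (i != j)
                bodies[i].acc[0] += bodies[j].mass * 1e-9;  // fake contribution

    std::printf("computed with up to %d threads\n", omp_get_max_threads());
}
```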
33)
Message boards :
News :
New Nbody version 1.48
(Message 63155)
Posted 18 Feb 2015 by Captiosus Post: Andrew-PC and cowboy2199, check the version of the CPU client you are running. If it reads version 1.46, that's the bugged version and those units will never finish. Abort those units, then force an update for MilkyWay@home to get v1.48. Heads up, though: v1.48 has a lengthy initialization period during which it runs single-threaded but ties up all available cores. |
34)
Message boards :
News :
New Nbody version 1.48
(Message 63149)
Posted 17 Feb 2015 by Captiosus Post: Hey, so, as I understand it, what was happening was that on startup 1.46 computed some of the starting parameters for the dwarf galaxy's constituent stars across multiple threads. The way the code worked meant that the starting parameters would differ between machines even with the same settings, because the multi-threaded part was effectively random. This in turn would result in poor overall results even though the run itself completes (a toy demonstration of that non-determinism follows this post). Right/wrong? And is there anything you can do to reduce the amount of time resources sit idle? On my machine the initialization stage takes a good 5-10 minutes, during which there is only a single thread in use, with the other 14 allocated threads sitting there twiddling their thumbs and unable to do anything else. When it does go into multi-threaded mode, it only uses about 10 threads, leaving 5 free to sit there doing nothing. That's about a third of my CPU doing nothing. |
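A toy C++ demonstration of the non-determinism described above, under the assumption (made up for illustration, not taken from the 1.46 source) that several threads draw from one shared RNG while initializing bodies: which body receives which draw depends on thread scheduling, so two runs with the same seed can still produce different initial conditions.

```cpp
#include <cstdio>
#include <mutex>
#include <random>
#include <thread>
#include <vector>

int main()
{
    const int n_bodies = 8;
    std::vector<double> mass(n_bodies);
    std::mt19937 rng(42);                          // same seed on every machine
    std::uniform_real_distribution<double> dist(0.1, 1.0);
    std::mutex rng_mutex;

    // Four threads race to initialize the bodies. The RNG itself is locked,
    // so the stream of draws is fixed, but which body gets which draw depends
    // on thread scheduling and therefore differs from run to run.
    std::vector<std::thread> workers;
    for (int t = 0; t < 4; ++t) {
        workers.emplace_back([&, t] {
            for (int i = t; i < n_bodies; i += 4) {
                std::lock_guard<std::mutex> lk(rng_mutex);
                mass[i] = dist(rng);
            }
        });
    }
    for (auto& w : workers) w.join();

    for (int i = 0; i < n_bodies; ++i)
        std::printf("body %d: mass %.6f\n", i, mass[i]);   // varies across runs
}
```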
35)
Message boards :
News :
New Nbody version 1.48
(Message 63139)
Posted 14 Feb 2015 by Captiosus Post: Oh, I was wondering why it was only using one thread out of the 15 of 16 I have allocated for CPU tasks. Cycling BOINC and setting it to use all cores and threads had no effect in getting the app to use everything. I don't know if the "stalling at 100%" issue is still in effect like it was with 1.46, but there's a whole new can of worms opened with this one. The app is v1.48, the workunit it's chewing through is ps_nbody_2_13_orphan_sim_1_1422013803_462792_0, and the BOINC client is 7.4.36 x64. If there's any more information you need, let me know and when I wake up you'll get it to the best of my abilities. Quick edit: about 8 minutes into running single-threaded, the program finally realized that there's more than one thread available and started running on another 10 threads. Still not fully utilizing what's been made available, but it's better than nothing. I'll let it chew through units overnight and see how it goes. |