Welcome to MilkyWay@home

Posts by Captiosus

1) Message boards : News : Separation Project Coming To An End (Message 75814)
Posted 21 Jun 2023 by Captiosus
Post:
Checking out the app_config posted earlier here to see how well it works :)


Those settings are relevant to a specific computer; the one posted (copied below) makes no sense to me. It looks like it's for a 6-core machine, and for some reason it splits the work into three 2-thread tasks. Why not just run one 6-thread task? I also set avg_ncpus lower than nthreads so it runs more tasks than it nominally should, because there's always one task in its startup phase using only one core, which lets the other tasks take over the unused cores.


No, it's for any of my multi-core PCs that are also running tasks from other projects, and I'm limiting MilkyWay to 6 CPU cores. I prefer the three 2-CPU tasks because they take longer for me to crunch, and they also don't get hung up the way the 16-core tasks do. I never run both CPU and GPU tasks from the same project on my PCs, and no, I'm not running multiple BOINC setups, so this is an easy way to limit the MilkyWay tasks while running tasks from other projects at the same time. For what it's worth, I was also running the Yoyo ecmP2 tasks at the same time and had the last line set to 1 to 4 tasks depending on how much RAM was in each PC; I deleted all the other lines except the final one when I put the file in the Yoyo folder.

<app_config>
    <app_version>
        <app_name>milkyway_nbody</app_name>
        <plan_class>mt</plan_class>
        <!-- BOINC budgets 2 CPUs per task -->
        <avg_ncpus>2</avg_ncpus>
        <!-- the app itself spawns 2 threads -->
        <cmdline>--nthreads 2</cmdline>
    </app_version>
    <!-- at most 3 MilkyWay tasks at once: 3 x 2 = 6 cores -->
    <project_max_concurrent>3</project_max_concurrent>
</app_config>

I do wonder which would be more efficient: a bunch of small 2-thread Nbody tasks, a smaller number of larger 4-thread tasks, or a single very large 8-thread task. Right now I've got mine set to 2 avg CPUs, 2 threads per task on the command line, and 4 max concurrent tasks, so 4 tasks and 8 threads taken up by MW@H. Guess I'll have to experiment and see how my RAC changes.
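For reference, that setup written out as an app_config.xml would look something like this (a sketch, assuming the same milkyway_nbody app name and mt plan class as the config above):

<app_config>
    <app_version>
        <app_name>milkyway_nbody</app_name>
        <plan_class>mt</plan_class>
        <avg_ncpus>2</avg_ncpus>
        <cmdline>--nthreads 2</cmdline>
    </app_version>
    <!-- 4 tasks x 2 threads = 8 threads for MW@H -->
    <project_max_concurrent>4</project_max_concurrent>
</app_config>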
2) Message boards : News : Separation Project Coming To An End (Message 75612)
Posted 16 Jun 2023 by Captiosus
Post:
RIP Separation. I'll switch over to Nbody CPU on my dedicated cruncher using the app_config posted earlier in the thread, hopefully with it leashed so it behaves itself and doesn't try to murder other tasks or programs.
3) Message boards : News : Server Issues (Message 75241)
Posted 29 Mar 2023 by Captiosus
Post:
Was wondering why my MW@H RAC was dropping. Someone go give the server a kick.
4) Message boards : Number crunching : Abort or no? (Message 74679)
Posted 17 Nov 2022 by Captiosus
Post:
Nbody, I've found, is rather temperamental and viciously territorial. Anecdotal, but every time I've run Nbody, tasks from other projects would start to fail, and other processes on the computer (favorite targets include any sort of monitoring program, although BOINC itself has been a victim at least once) would mysteriously terminate with no error message or warning. It doesn't happen immediately, but left to its own devices, something will break.

IMO, if you just want to set and forget and not worry about any issues, I would stick with Separation. Use Nbody (or both; cap Nbody to a lower number of threads and let Separation run on the rest, as in the sketch below) only if you're checking up on the machine regularly, so that when it does decide to cause issues it gets corrected in a timely manner.
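For the "both" route, one way to leash Nbody is an app_config.xml in the project folder that pins its thread count (a sketch, assuming the standard milkyway_nbody app name; pick counts to fit your core count):

<app_config>
    <app_version>
        <app_name>milkyway_nbody</app_name>
        <plan_class>mt</plan_class>
        <!-- cap each Nbody task at 4 threads; Separation CPU tasks fill the rest -->
        <avg_ncpus>4</avg_ncpus>
        <cmdline>--nthreads 4</cmdline>
    </app_version>
</app_config>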
5) Message boards : Number crunching : Daily graphs of server_status (Message 74627)
Posted 2 Nov 2022 by Captiosus
Post:
IT LIIIIIIIIIIIIIIIIIIIIIIIVES! If it breaks again I had nothing to do with it.
6) Message boards : Number crunching : Daily graphs of server_status (Message 74601)
Posted 30 Oct 2022 by Captiosus
Post:
Still working its way down, but at least it's going in the correct direction.


*snip*

You jinxed it!

Shoulda kept my damn mouth shut.
7) Message boards : Number crunching : Daily graphs of server_status (Message 74595)
Posted 30 Oct 2022 by Captiosus
Post:
Still working its way down, but at least it's going in the correct direction.
8) Message boards : Number crunching : Daily graphs of server_status (Message 74584)
Posted 28 Oct 2022 by Captiosus
Post:
Very nice drop in the validation queue. Seeing lots of tasks getting validated and cleared out, and the resulting boost in RAC it brings. That tiny little upswing, though, can eff right off. Knowing our collective luck, it's going to start climbing again.
9) Message boards : Number crunching : Validation Pending too many tasks (Message 74579)
Posted 27 Oct 2022 by Captiosus
Post:
Yep, definitely noticed that something has become unstuck and WUs are getting processed. According to the BOINC statistics viewer in the client, the last 2-3 days' worth of RAC has been trending upwards.
10) Message boards : Number crunching : Daily graphs of server_status (Message 74545)
Posted 23 Oct 2022 by Captiosus
Post:
Seems there's some improvement every time there's a server reset, but then things start overloading and the pending validation count starts climbing again.

So, an experimental idea would be to do a server reset every couple of days and see if any progress gets made. If it has positive results, it could be a workable brute-force solution to the pending issue until the new hardware is in play. Is it a good idea? Probably not.
11) Message boards : Number crunching : New Benchmark Thread - times wanted for any hardware, CPU or GPU, old or new! (Message 74478)
Posted 17 Oct 2022 by Captiosus
Post:
Not sure how useful this tidbit is to the thread, but I shall post it anyways. I run CPU-only on a fairly slow Xeon, and it shares runtime with 4 other tasks split evenly across the threads. I averaged the runtimes of ~133 processed tasks, both Total and CPU: the average Total runtime comes out to 10,856.8 seconds and the average CPU runtime to 10,810.2 seconds, rounded to the nearest tenth. All tasks are from the Separation 1.46 application, no N-Body.

The CPU in question is an E5-2650L v4 (14C/28T); with all threads loaded it runs into the power limit (65 W) and pulls the clocks back to about 1.7 GHz to stay under it. The CPU shares runtime evenly with Yoyo, Einstein, and PrimeGrid. No GPU, for heat reasons.
12) Message boards : Number crunching : Splitting MT N-body? (Message 66825)
Posted 1 Dec 2017 by Captiosus
Post:
Ha yeah, I probably should have mentioned that your initial copy over from Cosmology@home completely mangled the code.
13) Message boards : Number crunching : Splitting MT N-body? (Message 66821)
Posted 30 Nov 2017 by Captiosus
Post:
Cheers! Took a little bit to get it right, but I got it, and hopefully it will help improve CPU utilization a bit.
14) Message boards : Number crunching : Splitting MT N-body? (Message 66814)
Posted 26 Nov 2017 by Captiosus
Post:
I was wondering: is there a way to split MT N-Body units so that instead of, say, 1 unit taking up all 12 threads, I can have 2 units using 6 threads each?
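The app_config.xml approach shown near the top of this page does exactly this; a sketch for the 2 x 6 split (assuming the milkyway_nbody app name):

<app_config>
    <app_version>
        <app_name>milkyway_nbody</app_name>
        <plan_class>mt</plan_class>
        <avg_ncpus>6</avg_ncpus>
        <cmdline>--nthreads 6</cmdline>
    </app_version>
    <!-- 2 tasks x 6 threads = 12 threads -->
    <project_max_concurrent>2</project_max_concurrent>
</app_config>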
15) Message boards : Number crunching : m2050 (Message 66208)
Posted 20 Feb 2017 by Captiosus
Post:
Will these work? OpenCL 1.1, but they are DP.
http://www.ebay.com/itm/NVidia-Tesla-M2050-GPU-3GB-DDR5-Graphics-ProcessingCard-PCI-E-X16-1YR-Warranty-/152300708200?_trksid=p2385738.m2548.l4275

If you can cool it, yeah, it will work. Before I accidentally broke mine, I was using my own M2050 for MilkyWay (amongst other projects), and I'm currently using a GTX 460 (same core architecture, Fermi) for MilkyWay, and it does well enough.

A word of warning, though: it IS very first-generation Fermi and thus will run hot and power-hungry. Make sure you have a strong enough PSU and ample airflow to keep it fed and cooled.

Finally, I recommend you set up your client to run multiple workunits on it; it is a powerful card, after all. A sketch of how, below.
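One way to do that is an app_config.xml in the project folder (a sketch; the app name here is an assumption, so check client_state.xml for the exact name of the GPU app on your machine):

<app_config>
    <app>
        <name>milkyway</name>
        <gpu_versions>
            <!-- 0.5 GPUs per task = 2 tasks share each card -->
            <gpu_usage>0.5</gpu_usage>
            <cpu_usage>0.05</cpu_usage>
        </gpu_versions>
    </app>
</app_config>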
16) Message boards : Number crunching : Just had a BOINC unexpectedly quit when starting an Nbody unit (Message 66151)
Posted 1 Feb 2017 by Captiosus
Post:
Has anyone been having issues with BOINC crashing while computing MilkyWay@home tasks? I just had BOINC quit while it was in the process of starting a workunit, and it gave no warning, no error, nothing.

The last line of the log says it was starting an Nbody Tighter Constraints MT unit (de_nbody_11_7_16_v162_20k_tighterconstraints_1_1484858102_344533_1).
The only other behavior I can add is that BOINC had just paused the CPU modfit programs (all 8 of them) to start the MT unit when it happened.

I'm not sure if it's Nbody being derp again, or I just got a bad unit, or what is going on. This rig had successfully gone through easily a dozen or two Nbody units before the crash.

This machine is overclocked, but I don't think that has anything to do with it, as it has passed 12 hours of Prime95 large in-place FFTs and has processed a number of both Collatz and MilkyWay@home units, with all of them considered valid by their respective projects. The Collatz CPU units in particular took the machine about 5 days to chew through the 8 units (one per thread) it was given, and all 8 were considered valid.

I dunno. I'll cook the remaining MilkyWay units I have left, then try other projects and see if anything can get BOINC to quit again.
17) Message boards : News : Scheduled Maintenance Concluded (Message 65731)
Posted 13 Nov 2016 by Captiosus
Post:
My GPU units are not executing on the GPUs; they're running on the CPU and sucking up CPU time. My machine is running Win7, and I typically run 2 units per card.

Oh well, looks like I'm waiting for the next fix. Off to SETI!
18) Message boards : Number crunching : Massive server issues and Wu validation delays (Message 65549)
Posted 28 Oct 2016 by Captiosus
Post:

I have so many tasks pending that the server has trouble loading the "tasks" page. I agree with the aforementioned solution of increasing WU size; the sheer number of these tiny WUs is probably the majority of the server-side problem. Since there is such a discrepancy between a high-DP card and a low-DP card, would doubling the runtime on a Tahiti, with its 1/4 DP, increase a 1/32-DP card's time by 16x? A 750 Ti looks to run 1 WU in about 100 sec, and a 1080 in about 25 sec. If the Tahiti time doubled, I think that would mean 1,600 seconds for a 750 Ti and 400 seconds for a 1080.


If you double the length of the WU, the runtime doubles for all cards.

I.e., right now all cards are "running a 100-yard dash." If you made it a 200-yard dash, every card's time would go up by a factor of 2.


Well, I'm wondering if it would be possible to hand out longer and shorter workunits to different video cards according to their processing capability (in GFLOPS) and manufacturer, so that cards within certain performance brackets get adequately sized workunits.

I mean, I have a mismatched pair of video cards in my main rig that struggle to get above 150 GFLOPS double precision, and they each chew through a pair of units in about 3-4 minutes tops. Why the hell am I getting the same-sized units as the machines equipped with an R9 280X or other TFLOP-class DP cards, cards that, as I understand it, have to run 8 or more concurrent workunits to maximize utilization because of how fast they burn through them?
19) Message boards : Number crunching : MT Nbody 1.62 workunit locked up (Message 65070)
Posted 24 Aug 2016 by Captiosus
Post:
Well, the unit hasn't locked up, but I now have an Nbody WU that is leaking memory as it progresses. Idle for this rig is about 1 GB. In the past hour and a half, this unit has pushed the computer from 1 GB of memory usage to 1.9 GB, and the climb started in earnest about 10 minutes in.
The increase looks to be about ~100 MB every 10 minutes, and it is linear after the first 10 minutes.

The leaking WU is de_nbody_8_1_16_v162_2k_3_1471352127_585064_0, although shorter workunits show the same memory behavior until they end (very slow growth for the first 10 minutes, then 100 MB every 10 minutes after that).
20) Message boards : Number crunching : MT Nbody 1.62 workunit locked up (Message 65069)
Posted 24 Aug 2016 by Captiosus
Post:
Adding that I just had an Nbody unit lock up on my secondary rig. It made it to 1.516% complete, then froze.

The WU is de_nbody_8_1_16_v162_2k_3_147135217_585063_0

I'm not entirely sure whether this one froze because the secondary rig is under a fairly significant overclock (+1.1 GHz on a C2Q), so I will back it down to 3.6 GHz (which I know is stable on this rig) just to be sure.


