Welcome to MilkyWay@home

Posts by KeithBriggs

1) Message boards : Number crunching : How much CPU cache milkyway@home cpu tasks occupy? (Message 77206)
Posted 14 Aug 2024 by KeithBriggs
Post:
I can't see from HWiNFO 64 (as of yet) how much of the L1 and L2 caches is being utilized. Generally, tasks are high CPU and low memory, so I'm guessing that includes the cache memory. The most important change to make is to run one task per core and not MT.
2) Message boards : Number crunching : Tasks stuck after a few minutes and run indefinitely (Message 77205)
Posted 14 Aug 2024 by KeithBriggs
Post:
I'm a novice at all the details, but regarding the slot question, I believe it has to do with the directory slot on the drive and nothing to do with which core it's attached to. I only have 32 tasks running at any one time.

I discovered that the frozen jobs have to do with a weakness in the suspend and restart feature. About 1% of restarts don't actually restart. I was starting 32 tasks, then deleting the tasks that give unfair credit and grabbing 32 more, enough to get through the night. The credit weakness revealed the suspend weakness.
3) Message boards : Number crunching : Tasks stuck after a few minutes and run indefinitely (Message 77173)
Posted 9 Jul 2024 by KeithBriggs
Post:
And another one:


Application: Milkyway@home N-Body Simulation with Orbit Fitting 1.87 (mt)
Name: de_nbody_07_01_2024_v186_pal5__data__13_1718187602_1773297
State: Running
Received: 7/7/2024 8:29:03 PM
Report deadline: 7/19/2024 8:29:01 PM
Estimated computation size: 59,585 GFLOPs
CPU time: 00:03:58
CPU time since checkpoint: 00:00:00
Elapsed time: 00:52:45
Estimated time remaining: 00:56:32
Fraction done: 2.777%
Virtual memory size: 15.51 MB
Working set size: 19.82 MB
Directory: slots/71
Process ID: 38312
Progress rate: 3.240% per hour
Executable: milkyway_nbody_orbit_fitting_1.87_windows_x86_64__mt.exe
4) Message boards : Number crunching : Tasks stuck after a few minutes and run indefinitely (Message 77172)
Posted 9 Jul 2024 by KeithBriggs
Post:
I've had about 20, and here's an example captured at two timestamps. I've suspended and restarted it without any effect.


Application: Milkyway@home N-Body Simulation with Orbit Fitting 1.87 (mt)
Name: de_nbody_07_01_2024_v186_pal5__data__13_1718187602_1773294
State: Running
Received: 7/7/2024 8:29:03 PM
Report deadline: 7/19/2024 8:29:01 PM
Estimated computation size: 61,319 GFLOPs
CPU time: 00:04:01
CPU time since checkpoint: 00:00:00
Elapsed time: 00:08:51
Estimated time remaining: 00:58:04
Fraction done: 2.980%
Virtual memory size: 15.50 MB
Working set size: 19.73 MB
Directory: slots/78
Process ID: 33716
Progress rate: 22.680% per hour
Executable: milkyway_nbody_orbit_fitting_1.87_windows_x86_64__mt.exe

10 minutes later:


Application: Milkyway@home N-Body Simulation with Orbit Fitting 1.87 (mt)
Name: de_nbody_07_01_2024_v186_pal5__data__13_1718187602_1773294
State: Running
Received: 7/7/2024 8:29:03 PM
Report deadline: 7/19/2024 8:29:01 PM
Estimated computation size: 61,319 GFLOPs
CPU time: 00:04:01
CPU time since checkpoint: 00:00:00
Elapsed time: 00:18:49
Estimated time remaining: 00:58:04
Fraction done: 2.980%
Virtual memory size: 15.50 MB
Working set size: 19.73 MB
Directory: slots/78
Process ID: 33716
Progress rate: 10.080% per hour
Executable: milkyway_nbody_orbit_fitting_1.87_windows_x86_64__mt.exe
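
The two snapshots above show the pattern of a stuck task: elapsed time keeps growing while CPU time and fraction done stay put. Here is a minimal Python sketch of that comparison. The Snapshot class and the looks_stalled helper are hypothetical names made up for illustration, not part of BOINC; the values are copied by hand from the task properties above.

```python
from dataclasses import dataclass


def parse_hms(text: str) -> int:
    """Convert an 'HH:MM:SS' string, as shown in the task properties, to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


@dataclass
class Snapshot:
    """Hypothetical container for three fields copied from the task properties."""
    cpu_time: str          # "CPU time"
    elapsed_time: str      # "Elapsed time"
    fraction_done: float   # "Fraction done", in percent


def looks_stalled(before: Snapshot, after: Snapshot) -> bool:
    """True if wall-clock time advanced but CPU time and progress did not."""
    wall_delta = parse_hms(after.elapsed_time) - parse_hms(before.elapsed_time)
    cpu_delta = parse_hms(after.cpu_time) - parse_hms(before.cpu_time)
    no_progress = after.fraction_done == before.fraction_done
    return wall_delta > 0 and cpu_delta == 0 and no_progress


# Values from the two snapshots of task ..._1773294 above.
first = Snapshot(cpu_time="00:04:01", elapsed_time="00:08:51", fraction_done=2.980)
later = Snapshot(cpu_time="00:04:01", elapsed_time="00:18:49", fraction_done=2.980)
print(looks_stalled(first, later))  # prints True
```

A single pair of snapshots can't tell a truly frozen task from one that is just between checkpoints, so in practice you'd want a longer gap or more than two samples.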
5) Message boards : Number crunching : Tasks slow to start (Message 77171)
Posted 3 Jul 2024 by KeithBriggs
Post:
I did switch from 2 cores to 1 core per work unit. Thanks again.
6) Message boards : Number crunching : Run Times Weird????? (Message 77164)
Posted 27 Jun 2024 by KeithBriggs
Post:
I run one task per core instead of multicore. My run times are about 9% greater than my CPU times. I wasn't exactly clear on your question, though.
7) Message boards : Number crunching : Tasks slow to start (Message 77148)
Posted 29 May 2024 by KeithBriggs
Post:
Great catch, btw.

I switched from 16 cores to 2 cores per WU. Oddly, the delay went from ~30 sec to ~60 sec, so the other cores must have been doing something "during idle" when 16 were processing each WU.

With 32 cores, I have 16 running now.

Maybe I'll try one core per WU later, but it seems to be much more efficient according to Task Manager.
8) Message boards : News : New Separation Runs (Message 60402)
Posted 16 Nov 2013 by KeithBriggs
Post:
My HD7870 cards are not running nearly as well as a month ago. I see % utilization on the cards consistently dropping below 100%. Something in the main work algorithm is quite off. They also do not allow the cards to run in overdrive like Seti does.
9) Message boards : News : New MilkyWay Separation Modified Fit Runs (Message 60158)
Posted 17 Oct 2013 by KeithBriggs
Post:
When you run 4 simultaneous tasks per GPU, it kind of makes more sense based on credit and time to complete:

http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=517443&offset=0&show_names=0&state=4&appid=

but when you run just 1 task per GPU, I'm seeing the same discrepancy:

http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=518108&offset=0&show_names=0&state=4&appid=

Different machines, but both have a pair of Sapphire HD 7870s.
10) Message boards : News : Users Auto-Aborting Work Units (Message 60060)
Posted 30 Sep 2013 by KeithBriggs
Post:
The key is to not count "Abort by User" or "Error While Computing" zero second runs toward the error count.
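
To make the suggestion concrete, here's a rough Python sketch of that counting rule. It is not the project's server code: the Result class and its field names are hypothetical, and it reads the rule as "never count user aborts, and don't count computing errors that ran for zero seconds".

```python
from dataclasses import dataclass


@dataclass
class Result:
    """Hypothetical record for one returned task."""
    outcome: str           # e.g. "Abort by User", "Error While Computing", "Completed"
    run_time_secs: float   # reported run time in seconds


def counts_toward_errors(result: Result) -> bool:
    """Apply the proposed rule to a single result."""
    if result.outcome == "Abort by User":
        return False                        # user aborts never count
    if result.outcome == "Error While Computing":
        return result.run_time_secs > 0     # zero-second failures don't count
    return False                            # other outcomes are ignored in this sketch


results = [
    Result("Abort by User", 0.0),
    Result("Error While Computing", 0.0),
    Result("Error While Computing", 812.4),
    Result("Completed", 8150.0),
]
error_count = sum(counts_toward_errors(r) for r in results)
print(error_count)  # prints 1
```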
11) Message boards : News : Users Auto-Aborting Work Units (Message 60040)
Posted 28 Sep 2013 by KeithBriggs
Post:
Probably the CAL driver message. Just disregard. On the main page is Statistics and under that is the GPU list. http://milkyway.cs.rpi.edu/milkyway/gpu_list.php.
12) Message boards : News : Users Auto-Aborting Work Units (Message 60035)
Posted 27 Sep 2013 by KeithBriggs
Post:
Hey AMueller91,
glad you figured it out. I have not seen any benefit beyond 4 tasks per GPU. The key is no down time, and the chance that 4 tasks finish at the same time is minimal.

If they are running in tandem, just pause one, then start it back up. For CPUs, all you need is to set logical cores = physical cores plus 1.
13) Message boards : News : Users Auto-Aborting Work Units (Message 60032)
Posted 27 Sep 2013 by KeithBriggs
Post:
The BOINC Manager won't delete WUs you've already received. Watch the newly downloaded ones and see if it is working correctly.

If your computer is listed as "school" or "home", you'll have to change the acceptable apps for each class of computers you have.

14) Message boards : News : Users Auto-Aborting Work Units (Message 60024)
Posted 27 Sep 2013 by KeithBriggs
Post:
First go to my account, then under Milkyway Preferences you'll see:

Use CPU (enforced by version 6.10+): yes
Use ATI GPU (enforced by version 6.10+): yes
Use NVIDIA GPU (enforced by version 6.10+): yes

A few more lines down you'll see:

Run only the selected applications
MilkyWay@Home: yes
MilkyWay@Home N-Body Simulation: no
Milkyway@Home Separation: no
Milkyway@Home Separation (Modified Fit): no
15) Message boards : News : Users Auto-Aborting Work Units (Message 60013)
Posted 27 Sep 2013 by KeithBriggs
Post:
Yes, that's right. Good point.

Here's my app_config

<app_config>
  <app>
    <name>milkyway_nbody</name>
    <max_concurrent>0</max_concurrent>
    <gpu_versions>
      <gpu_usage>.1</gpu_usage>
      <cpu_usage>1</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>milkyway</name>
    <max_concurrent>8</max_concurrent>
    <gpu_versions>
      <gpu_usage>.25</gpu_usage>
      <cpu_usage>.11</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>milkyway_separation__modified_fit</name>
    <max_concurrent>8</max_concurrent>
    <gpu_versions>
      <gpu_usage>.25</gpu_usage>
      <cpu_usage>.12</cpu_usage>
    </gpu_versions>
  </app>
</app_config>

and here's my cc_config

<cc_config>
  <log_flags>
  </log_flags>
  <options>
    <ncpus>4</ncpus>
    <max_file_xfers>30</max_file_xfers>
    <max_file_xfers_per_project>30</max_file_xfers_per_project>
    <http_transfer_timeout>30</http_transfer_timeout>
    <rec_half_life_days>10</rec_half_life_days>
    <report_results_immediately>0</report_results_immediately>
  </options>
</cc_config>

So one GPU is running 4 WUs at a time, with no down time. This particular machine has two CPU cores, but I have it set to 4 virtual cores. Again, no CPU down time. That's 4 CPU WUs and 4 GPU WUs, which is about 10% more work than letting them cycle down. I also get constant fan speeds and more stable temperatures.

Holler if anyone wants my 2-GPU XML files.
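
For anyone wondering where "4 WU at a time" comes from: gpu_usage is the fraction of a GPU each task claims, so .25 lets four tasks share one card. Here's a small Python sketch that reads an app_config and works that out. The tasks_per_gpu function is a made-up helper, and it assumes the client fits as many tasks on a GPU as the fractions allow; it ignores max_concurrent and how many CPUs are free.

```python
import math
import xml.etree.ElementTree as ET

# One <app> entry, copied from the app_config above.
APP_CONFIG = """
<app_config>
  <app>
    <name>milkyway</name>
    <max_concurrent>8</max_concurrent>
    <gpu_versions>
      <gpu_usage>.25</gpu_usage>
      <cpu_usage>.11</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
"""


def tasks_per_gpu(app_config_xml: str) -> dict:
    """Map each app name to the number of tasks that fit on one GPU."""
    root = ET.fromstring(app_config_xml)
    result = {}
    for app in root.findall("app"):
        name = app.findtext("name")
        usage = float(app.findtext("gpu_versions/gpu_usage"))
        # Small epsilon guards against floating-point results like 3.9999999.
        result[name] = math.floor(1 / usage + 1e-9)
    return result


print(tasks_per_gpu(APP_CONFIG))  # prints {'milkyway': 4}
```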

16) Message boards : News : Users Auto-Aborting Work Units (Message 60009)
Posted 26 Sep 2013 by KeithBriggs
Post:
I also use app_config, but it's easiest to just do it in the preferences.
17) Message boards : News : Users Auto-Aborting Work Units (Message 60006)
Posted 26 Sep 2013 by KeithBriggs
Post:
Here are some major aborters:

http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=104692 2900 aborts
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=529892 8800 aborts
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=322721 4300 aborts
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=520641 15000 aborts
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=529525 3400 aborts
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=366486 2800 aborts
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=485608 5000 aborts
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=484725 1600 aborts
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=452569 3700 aborts
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=532562 3400 aborts
18) Message boards : News : Users Auto-Aborting Work Units (Message 60005)
Posted 26 Sep 2013 by KeithBriggs
Post:
Maybe new users should have to opt into beta projects.

http://milkyway.cs.rpi.edu/milkyway/show_host_detail.php?hostid=437270 has about 5373 aborted WUs and counting.
19) Message boards : News : Separation Modified Fit v1.28 Release (Message 60000)
Posted 26 Sep 2013 by KeithBriggs
Post:
I agree with Conan. When there's a known fault with an app depending on execution source, the least you can do is adjust the max errors. I think I've mentioned it before but aborting by a user should not count toward errors either.

20) Message boards : News : Separation Modified Fit v1.28 Release (Message 59941)
Posted 20 Sep 2013 by KeithBriggs
Post:
Invalids are down considerably. Thanks. I have one CPU 1.26 task still showing in history: 8150 secs, and about 100 secs more for the new 1.28 (CPU). Intel.



©2024 Astroinformatics Group