Posts by KeithBriggs

1) Message boards : Number crunching : Tasks stuck after a few minutes and run indefinitely (Message 77173)
Posted 17 days ago by Profile KeithBriggs
Post:
And another one:


Application: Milkyway@home N-Body Simulation with Orbit Fitting 1.87 (mt)
Name: de_nbody_07_01_2024_v186_pal5__data__13_1718187602_1773297
State: Running
Received: 7/7/2024 8:29:03 PM
Report deadline: 7/19/2024 8:29:01 PM
Estimated computation size: 59,585 GFLOPs
CPU time: 00:03:58
CPU time since checkpoint: 00:00:00
Elapsed time: 00:52:45
Estimated time remaining: 00:56:32
Fraction done: 2.777%
Virtual memory size: 15.51 MB
Working set size: 19.82 MB
Directory: slots/71
Process ID: 38312
Progress rate: 3.240% per hour
Executable: milkyway_nbody_orbit_fitting_1.87_windows_x86_64__mt.exe
2) Message boards : Number crunching : Tasks stuck after a few minutes and run indefinitely (Message 77172)
Posted 17 days ago by Profile KeithBriggs
Post:
I've had about 20 of these. Here's an example, captured at two timestamps. I've suspended and restarted it without any effect.


Application: Milkyway@home N-Body Simulation with Orbit Fitting 1.87 (mt)
Name: de_nbody_07_01_2024_v186_pal5__data__13_1718187602_1773294
State: Running
Received: 7/7/2024 8:29:03 PM
Report deadline: 7/19/2024 8:29:01 PM
Estimated computation size: 61,319 GFLOPs
CPU time: 00:04:01
CPU time since checkpoint: 00:00:00
Elapsed time: 00:08:51
Estimated time remaining: 00:58:04
Fraction done: 2.980%
Virtual memory size: 15.50 MB
Working set size: 19.73 MB
Directory: slots/78
Process ID: 33716
Progress rate: 22.680% per hour
Executable: milkyway_nbody_orbit_fitting_1.87_windows_x86_64__mt.exe

10 minutes later:


Application: Milkyway@home N-Body Simulation with Orbit Fitting 1.87 (mt)
Name: de_nbody_07_01_2024_v186_pal5__data__13_1718187602_1773294
State: Running
Received: 7/7/2024 8:29:03 PM
Report deadline: 7/19/2024 8:29:01 PM
Estimated computation size: 61,319 GFLOPs
CPU time: 00:04:01
CPU time since checkpoint: 00:00:00
Elapsed time: 00:18:49
Estimated time remaining: 00:58:04
Fraction done: 2.980%
Virtual memory size: 15.50 MB
Working set size: 19.73 MB
Directory: slots/78
Process ID: 33716
Progress rate: 10.080% per hour
Executable: milkyway_nbody_orbit_fitting_1.87_windows_x86_64__mt.exe
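
The two snapshots above differ only in elapsed time: CPU time and fraction done are frozen. A rough sketch of that check in Python follows. The `Snapshot` type and the `to_seconds` and `is_stalled` helpers are made up for illustration and are not part of BOINC; the values are copied by hand from the task properties.

```python
from dataclasses import dataclass


def to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string, as shown in the task properties, to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


@dataclass
class Snapshot:
    # Hypothetical record holding three fields from the task properties.
    cpu_time: str         # "CPU time"
    elapsed: str          # "Elapsed time"
    fraction_done: float  # "Fraction done", in percent


def is_stalled(before: Snapshot, after: Snapshot) -> bool:
    """True if wall-clock time advanced but CPU time and progress did not."""
    wall_advanced = to_seconds(after.elapsed) > to_seconds(before.elapsed)
    cpu_frozen = to_seconds(after.cpu_time) == to_seconds(before.cpu_time)
    progress_frozen = after.fraction_done == before.fraction_done
    return wall_advanced and cpu_frozen and progress_frozen


first = Snapshot(cpu_time="00:04:01", elapsed="00:08:51", fraction_done=2.980)
second = Snapshot(cpu_time="00:04:01", elapsed="00:18:49", fraction_done=2.980)
print(is_stalled(first, second))
```

A task that is merely slow would still show CPU time creeping up between snapshots, so it would not be flagged by this check.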
3) Message boards : Number crunching : Tasks slow to start (Message 77171)
Posted 23 days ago by Profile KeithBriggs
Post:
I did switch from 2 cores to 1 core per work unit. Thanks again.
4) Message boards : Number crunching : Run Times Weird????? (Message 77164)
Posted 29 days ago by Profile KeithBriggs
Post:
I run one task per core instead of multicore. My run times are about 9% greater than my cpu times. I wasn't exactly clear on your question though.
5) Message boards : Number crunching : Tasks slow to start (Message 77148)
Posted 29 May 2024 by Profile KeithBriggs
Post:
Great catch btw.

I switched from 16 cores to 2 cores per WU. Odd that the delay went from ~30 sec to ~60 sec, so the other cores must have been doing something "during idle" when 16 were processing each WU.

With 32 cores, I have 16 running now.

Maybe I'll try one core per WU later, but it already seems to be much more efficient according to Task Manager.
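
As a quick check of the numbers above, the count of simultaneously running multithreaded tasks is just the core count divided by the cores given to each WU. This is a minimal sketch that assumes all 32 cores are available to BOINC; the function name is hypothetical.

```python
def concurrent_tasks(total_cores: int, cores_per_wu: int) -> int:
    """Number of WUs that can run at once when each one gets cores_per_wu cores."""
    if cores_per_wu < 1:
        raise ValueError("cores_per_wu must be at least 1")
    return total_cores // cores_per_wu


# Compare the settings mentioned in this thread on a 32-core machine.
for cores_per_wu in (16, 2, 1):
    print(cores_per_wu, "cores per WU ->", concurrent_tasks(32, cores_per_wu), "tasks")
```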
6) Message boards : News : New Separation Runs (Message 60402)
Posted 16 Nov 2013 by Profile KeithBriggs
Post:
My HD7870 cards are not running nearly as well as a month ago. I see % utilization on the cards consistently dropping below 100%. Something in the main work algorithm is quite off. They also do not allow the cards to run in overdrive like Seti does.
7) Message boards : News : New MilkyWay Separation Modified Fit Runs (Message 60158)
Posted 17 Oct 2013 by Profile KeithBriggs
Post:
When you run 4 simultaneous tasks per GPU it kind of makes more sense based on credit and time to complete:

http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=517443&offset=0&show_names=0&state=4&appid=

but when you run just 1 task per gpu, I'm seeing the same discrepancy:

http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=518108&offset=0&show_names=0&state=4&appid=

Different machines but both have a pair of Sapphire HD 7870's
8) Message boards : News : Users Auto-Aborting Work Units (Message 60060)
Posted 30 Sep 2013 by Profile KeithBriggs
Post:
The key is to not count "Abort by User" or "Error While Computing" zero second runs toward the error count.
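
A sketch of that counting rule in Python, read as: user aborts never count, and compute errors count only if the task actually ran. The outcome labels and the `Result` record are assumptions made for illustration; they are not the actual BOINC server schema.

```python
from dataclasses import dataclass


@dataclass
class Result:
    # Hypothetical outcome labels: "success", "user_abort", "compute_error".
    outcome: str
    run_seconds: float  # run time reported for the task


def counts_toward_errors(result: Result) -> bool:
    """Apply the proposed rule: skip user aborts and zero-second compute errors."""
    if result.outcome == "user_abort":
        return False
    if result.outcome == "compute_error":
        return result.run_seconds > 0
    return False


results = [
    Result("user_abort", 0.0),
    Result("compute_error", 0.0),
    Result("compute_error", 812.5),
    Result("success", 8150.0),
]
error_count = sum(counts_toward_errors(r) for r in results)
print(error_count)
```

Only the compute error with a nonzero run time is counted here, so a host that fails instantly on every task does not inflate the error count of the work units it touches.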
9) Message boards : News : Users Auto-Aborting Work Units (Message 60040)
Posted 28 Sep 2013 by Profile KeithBriggs
Post:
Probably the CAL driver message. Just disregard it. On the main page there is a Statistics section, and under that is the GPU list: http://milkyway.cs.rpi.edu/milkyway/gpu_list.php
10) Message boards : News : Users Auto-Aborting Work Units (Message 60035)
Posted 27 Sep 2013 by Profile KeithBriggs
Post:
Hey AMueller91,
glad you figured it out. I have not seen any benefit beyond 4 tasks per GPU. The key is no downtime, and the chance that 4 tasks finish at the same time is minimal.

If they are running in tandem, just pause one, then start it back up. All you need for CPUs is to set logical cores = physical cores plus 1.
11) Message boards : News : Users Auto-Aborting Work Units (Message 60032)
Posted 27 Sep 2013 by Profile KeithBriggs
Post:
The BOINC Manager won't delete WUs you've already received. Watch the newly downloaded ones and see if it is working correctly.

If your computer is listed as "school" or "home", you'll have to change the acceptable apps for each class of computers you have.

12) Message boards : News : Users Auto-Aborting Work Units (Message 60024)
Posted 27 Sep 2013 by Profile KeithBriggs
Post:
First go to your account page, then under the MilkyWay preferences you'll see:

Use CPU (enforced by version 6.10+): yes
Use ATI GPU (enforced by version 6.10+): yes
Use NVIDIA GPU (enforced by version 6.10+): yes

A few more lines down you'll see:

Run only the selected applications
MilkyWay@Home: yes
MilkyWay@Home N-Body Simulation: no
Milkyway@Home Separation: no
Milkyway@Home Separation (Modified Fit): no
13) Message boards : News : Users Auto-Aborting Work Units (Message 60013)
Posted 27 Sep 2013 by Profile KeithBriggs
Post:
Yes, that's right. Good point.

Here's my app_config

<app_config>
  <app>
    <name>milkyway_nbody</name>
    <max_concurrent>0</max_concurrent>
    <gpu_versions>
      <gpu_usage>.1</gpu_usage>
      <cpu_usage>1</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>milkyway</name>
    <max_concurrent>8</max_concurrent>
    <gpu_versions>
      <gpu_usage>.25</gpu_usage>
      <cpu_usage>.11</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>milkyway_separation__modified_fit</name>
    <max_concurrent>8</max_concurrent>
    <gpu_versions>
      <gpu_usage>.25</gpu_usage>
      <cpu_usage>.12</cpu_usage>
    </gpu_versions>
  </app>
</app_config>

and here's my cc_config

<cc_config>
  <log_flags>
  </log_flags>
  <options>
    <ncpus>4</ncpus>
    <max_file_xfers>30</max_file_xfers>
    <max_file_xfers_per_project>30</max_file_xfers_per_project>
    <http_transfer_timeout>30</http_transfer_timeout>
    <rec_half_life_days>10</rec_half_life_days>
    <report_results_immediately>0</report_results_immediately>
  </options>
</cc_config>

So one GPU is running 4 WUs at a time, with no downtime. This particular machine has two CPU cores, but I have 4 virtual cores, so again there is no CPU downtime. That's 4 CPU WUs and 4 GPU WUs, which is about 10% more work than letting them cycle down. I also get constant fan speeds and more stable temperatures.

Holler if anyone wants my 2-GPU XML files.
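
To see how the gpu_usage and max_concurrent values in the app_config above turn into "4 WU at a time", here is a small Python sketch. The helper names are hypothetical, and treating a max_concurrent of 0 as "no limit" is an assumption of this sketch, not something confirmed in this thread.

```python
import math


def tasks_per_gpu(gpu_usage: float) -> int:
    """Tasks that share one GPU when each task claims gpu_usage of it."""
    # The small epsilon guards against floating-point rounding in 1 / gpu_usage.
    return math.floor(1 / gpu_usage + 1e-9)


def running_gpu_tasks(gpu_usage: float, max_concurrent: int, gpu_count: int = 1) -> int:
    """GPU tasks running at once, capped by max_concurrent.

    Assumption: a max_concurrent of 0 is treated as 'no limit' in this sketch.
    """
    by_gpu = tasks_per_gpu(gpu_usage) * gpu_count
    return by_gpu if max_concurrent == 0 else min(by_gpu, max_concurrent)


# Values from the "milkyway" app entry: gpu_usage .25, cpu_usage .11, max_concurrent 8.
n = running_gpu_tasks(gpu_usage=0.25, max_concurrent=8)
print(n, "GPU tasks, reserving", round(n * 0.11, 2), "CPU cores")
```

With two GPUs the same settings would allow 8 tasks, which is where the max_concurrent value of 8 would start to act as the cap.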

14) Message boards : News : Users Auto-Aborting Work Units (Message 60009)
Posted 26 Sep 2013 by Profile KeithBriggs
Post:
I also use app_config, but it's easiest to just do it in the preferences.
15) Message boards : News : Users Auto-Aborting Work Units (Message 60006)
Posted 26 Sep 2013 by Profile KeithBriggs
Post:
Here are some major aborters:

http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=104692 2900 aborts
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=529892 8800 aborts
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=322721 4300 aborts
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=520641 15000 aborts
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=529525 3400 aborts
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=366486 2800 aborts
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=485608 5000 aborts
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=484725 1600 aborts
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=452569 3700 aborts
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=532562 3400 aborts
16) Message boards : News : Users Auto-Aborting Work Units (Message 60005)
Posted 26 Sep 2013 by Profile KeithBriggs
Post:
Maybe new users should have to opt into beta projects.

http://milkyway.cs.rpi.edu/milkyway/show_host_detail.php?hostid=437270 has about 5373 aborted WUs and counting.
17) Message boards : News : Separation Modified Fit v1.28 Release (Message 60000)
Posted 26 Sep 2013 by Profile KeithBriggs
Post:
I agree with Conan. When there's a known fault with an app depending on execution source, the least you can do is adjust the max errors. I think I've mentioned it before but aborting by a user should not count toward errors either.

18) Message boards : News : Separation Modified Fit v1.28 Release (Message 59941)
Posted 20 Sep 2013 by Profile KeithBriggs
Post:
Invalids are down considerably. Thanks. I have one CPU 1.26 task still showing in history: 8150 secs, versus about 100 secs more for the new 1.28 (CPU). Intel.
19) Message boards : News : Separation Modified Fit v1.28 Release (Message 59934)
Posted 19 Sep 2013 by Profile KeithBriggs
Post:
For me, 59 invalid out of 5009 tasks (about 1.2%). If you increased the max errors, it would help those. 0% from 1.02.
20) Message boards : News : Separation Modified Fit v1.28 Release (Message 59930)
Posted 19 Sep 2013 by Profile KeithBriggs
Post:
I'm seeing more invalids (~1%), and I noticed that every modfit task for computer 119143 is erroring out, which makes it easier for a valid task to go invalid.

