Welcome to MilkyWay@home

Users Auto-Aborting Work Units

Message boards : News : Users Auto-Aborting Work Units

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist

Joined: 25 Feb 13
Posts: 580
Credit: 94,200,158
RAC: 0
Message 60004 - Posted: 26 Sep 2013, 19:41:43 UTC

Hello all,

It has come to our attention that some users have been setting their BOINC clients to auto-abort work units from specific applications. Doing this sends an error result back to our server, which then prevents some work units from validating. Essentially, it keeps some of our hard-working crunchers from getting the credit they are due.

The proper way to stop receiving work units from a specific application, such as our beta applications N-Body or Modified Fit, is to go to your account page on our website (http://milkyway.cs.rpi.edu/milkyway/home.php). Under the Preferences section, select the link for your preferences for this project. That page has a link to edit these preferences. Halfway down your preferences, there are check boxes in the "Run only the selected applications" section. You will only receive work units for the applications you have check marks next to.

For reference:
Milkyway@home is our flagship application and is considered stable and in its final released state.
Milkyway@home N-body Simulation is our beta version N-body simulation and orbit fit program.
Milkyway@home Separation is, as of now, an unused application.
Milkyway@home Separation (Modified Fit) is our beta version separation code, testing new models for both streams and background in the Milky Way halo.

As usual, if you have any issues with this method or questions about it, please post them here. We appreciate your cooperation and understanding.

Thank you,

Jake W.

TL;DR: If you are auto-aborting work units, please stop and use the method above instead. This keeps other users from losing credit and prevents problems in our algorithms.

KeithBriggs
Joined: 28 Apr 11
Posts: 29
Credit: 257,238,237
RAC: 52,119
Message 60005 - Posted: 26 Sep 2013, 21:16:01 UTC - in response to Message 60004.  

Maybe new users should have to opt into beta projects.

http://milkyway.cs.rpi.edu/milkyway/show_host_detail.php?hostid=437270 has about 5373 aborted WUs and counting.

KeithBriggs
Joined: 28 Apr 11
Posts: 29
Credit: 257,238,237
RAC: 52,119
Message 60006 - Posted: 26 Sep 2013, 21:29:26 UTC - in response to Message 60005.  

Here are some major aborters:

http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=104692 2900 aborts
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=529892 8800 aborts
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=322721 4300 aborts
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=520641 15000 aborts
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=529525 3400 aborts
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=366486 2800 aborts
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=485608 5000 aborts
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=484725 1600 aborts
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=452569 3700 aborts
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=532562 3400 aborts

Tomahawk4196
Joined: 17 Aug 13
Posts: 3
Credit: 336,920,753
RAC: 0
Message 60007 - Posted: 26 Sep 2013, 22:47:03 UTC

Folks:

I would like to just do GPU work units for Milkyway@home, and to that end I have been using an app_info.xml to make my FirePro do two workunits at a time.

However, I do often get messages that state

Message from server: Your app_info.xml file doesn't have a usable version of Milkyway@Home Separation (Modified Fit).

I sure hope I'm not causing any problems.

Which check boxes should I clear if I only want to do GPU processing? I didn't even know this 'Preferences' page existed for this project - good news for me.

Thanks

swiftmallard
Joined: 18 Jul 09
Posts: 300
Credit: 303,562,776
RAC: 0
Message 60008 - Posted: 26 Sep 2013, 23:05:19 UTC

Check the MilkyWay@home and MilkyWay@home Separation (Modified Fit) boxes. Stop using the app_info file and use an app_config file instead. This one works well for me:

<app_config>
   <app>
      <name>milkyway</name>
      <gpu_versions>
         <gpu_usage>0.5</gpu_usage>
         <cpu_usage>0.05</cpu_usage>
      </gpu_versions>
   </app>
</app_config>
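
A rough sketch of the arithmetic behind gpu_usage, in Python. It assumes that each task claims that fraction of one GPU and that tasks keep starting while the claimed fractions still sum to at most 1, which matches the "two at a time" result for 0.5. The names tasks_per_gpu and gpu_tasks_by_app are made up for this illustration and are not part of BOINC.

```python
from fractions import Fraction
import xml.etree.ElementTree as ET

# The app_config from this post, embedded so the sketch is self-contained.
APP_CONFIG = """
<app_config>
   <app>
      <name>milkyway</name>
      <gpu_versions>
         <gpu_usage>0.5</gpu_usage>
         <cpu_usage>0.05</cpu_usage>
      </gpu_versions>
   </app>
</app_config>
"""


def tasks_per_gpu(gpu_usage: str) -> int:
    """Hypothetical helper: number of tasks that fit on one GPU.

    Assumption: every task claims `gpu_usage` of a GPU and the claims
    may add up to at most 1. Exact fractions avoid float rounding.
    """
    share = Fraction(gpu_usage.strip())
    if not 0 < share <= 1:
        raise ValueError("this sketch only handles 0 < gpu_usage <= 1")
    return 1 // share


def gpu_tasks_by_app(xml_text: str) -> dict:
    """Map each <app> name in an app_config to its tasks-per-GPU count."""
    root = ET.fromstring(xml_text)
    result = {}
    for app in root.findall("app"):
        name = app.findtext("name")
        usage = app.findtext("gpu_versions/gpu_usage")
        if name and usage:
            result[name.strip()] = tasks_per_gpu(usage)
    return result


print(gpu_tasks_by_app(APP_CONFIG))  # {'milkyway': 2}
```

Under the same assumption, a gpu_usage of .25 gives 4 tasks per GPU and .15 gives 6.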

KeithBriggs
Joined: 28 Apr 11
Posts: 29
Credit: 257,238,237
RAC: 52,119
Message 60009 - Posted: 26 Sep 2013, 23:19:04 UTC - in response to Message 60008.  

I also use app_config, but it's easiest to just do it in the preferences.

swiftmallard
Joined: 18 Jul 09
Posts: 300
Credit: 303,562,776
RAC: 0
Message 60010 - Posted: 26 Sep 2013, 23:53:04 UTC

He wants to crunch two at a time; he'll need the config file for that.

KeithBriggs
Joined: 28 Apr 11
Posts: 29
Credit: 257,238,237
RAC: 52,119
Message 60013 - Posted: 27 Sep 2013, 2:13:40 UTC - in response to Message 60010.  

Yes, that's right. Good point.

Here's my app_config

<app_config>
   <app>
      <name>milkyway_nbody</name>
      <max_concurrent>0</max_concurrent>
      <gpu_versions>
         <gpu_usage>.1</gpu_usage>
         <cpu_usage>1</cpu_usage>
      </gpu_versions>
   </app>
   <app>
      <name>milkyway</name>
      <max_concurrent>8</max_concurrent>
      <gpu_versions>
         <gpu_usage>.25</gpu_usage>
         <cpu_usage>.11</cpu_usage>
      </gpu_versions>
   </app>
   <app>
      <name>milkyway_separation__modified_fit</name>
      <max_concurrent>8</max_concurrent>
      <gpu_versions>
         <gpu_usage>.25</gpu_usage>
         <cpu_usage>.12</cpu_usage>
      </gpu_versions>
   </app>
</app_config>

and here's my cc_config

<cc_config>
   <log_flags>
   </log_flags>
   <options>
      <ncpus>4</ncpus>
      <max_file_xfers>30</max_file_xfers>
      <max_file_xfers_per_project>30</max_file_xfers_per_project>
      <http_transfer_timeout>30</http_transfer_timeout>
      <rec_half_life_days>10</rec_half_life_days>
      <report_results_immediately>0</report_results_immediately>
   </options>
</cc_config>

So one GPU is running 4 WUs at a time, with no downtime. This particular machine has two CPU cores, but I have 4 virtual cores, so again there is no CPU downtime. That makes 4 CPU WUs and 4 GPU WUs, which is about 10% more work than letting them cycle down. I also get constant fan speeds and more stable temperatures.

Holler if anyone wants my 2-GPU XML files.

greg_be
Joined: 18 Aug 09
Posts: 122
Credit: 20,716,927
RAC: 1,665
Message 60022 - Posted: 27 Sep 2013, 11:45:17 UTC

Any news as to when 1.38 will be out?
I have opted out of modfit 1.28 because it just crashes on my system.

KeithBriggs
Joined: 28 Apr 11
Posts: 29
Credit: 257,238,237
RAC: 52,119
Message 60024 - Posted: 27 Sep 2013, 13:09:03 UTC - in response to Message 60022.  

First go to your account, then under Milkyway Preferences you'll see:

Use CPU (enforced by version 6.10+): yes
Use ATI GPU (enforced by version 6.10+): yes
Use NVIDIA GPU (enforced by version 6.10+): yes

A few more lines down you'll see:

Run only the selected applications
MilkyWay@Home: yes
MilkyWay@Home N-Body Simulation: no
Milkyway@Home Separation: no
Milkyway@Home Separation (Modified Fit): no

AMueller91
Joined: 6 Jul 12
Posts: 4
Credit: 12,385,544
RAC: 0
Message 60025 - Posted: 27 Sep 2013, 13:29:41 UTC - in response to Message 60013.  

Hey, this is exactly what I want, but it doesn't work. I have an i7-3930K CPU with 6 cores and 12 threads, and two GTX Titans.
If I let MW@Home run without any app_config or anything else, all 12 threads and both GPUs are working.
If I add the app_config, only the GPUs work correctly (2 WUs per GPU), but the CPU does nothing... BUT if I drag and drop the app_config.xml file out of my MW@home folder and restart BOINC, everything works fine (12 CPU WUs and 2 GPU WUs on each card) for around 10 minutes! After those 10 minutes, the GPUs automatically stop the additional WUs and keep processing one WU on each card.
How can I get all, or even 8 to 10, threads working while using the multiple-WU app_config?

Sorry for my bad English, I'm from Fondue-Switzerland :)
PS: My app_config.xml:
<app_config>
   <app>
      <name>milkyway</name>
      <max_concurrent>4</max_concurrent>
      <gpu_versions>
          <gpu_usage>.500</gpu_usage>
          <cpu_usage>0.25</cpu_usage>
      </gpu_versions>
   </app>
</app_config>

AMueller91
Joined: 6 Jul 12
Posts: 4
Credit: 12,385,544
RAC: 0
Message 60026 - Posted: 27 Sep 2013, 14:10:09 UTC - in response to Message 60025.  
Last modified: 27 Sep 2013, 14:38:11 UTC

OK... after searching for a solution for 5 hours, it works now.

After changing <max_concurrent>4</max_concurrent> to <max_concurrent>14</max_concurrent> in app_config.xml and <ncpus>4</ncpus> to <ncpus>14</ncpus> in cc_config.xml, it works fine.
4 GPU WUs (2 WUs/GPU @ 0.25 CPU/GPU-WU) and 10 CPU WUs are active. Hope it will hold longer than 10 minutes :D

Edit: With an optimized app_config and cc_config, I can run 23 WUs at the same time: 12 on the GPUs (6 per card with double precision enabled) and 11 on the CPU. Each GPU WU takes around 2 minutes to complete; CPU WUs run for 1-2 hours.
<app_config>
   <app>
      <name>milkyway</name>
      <max_concurrent>23</max_concurrent>
      <gpu_versions>
          <gpu_usage>.15</gpu_usage>
          <cpu_usage>0.05</cpu_usage>
      </gpu_versions>
   </app>
</app_config>

<cc_config>
   <log_flags>
   </log_flags>
   <options>
      <ncpus>23</ncpus>
      <max_file_xfers>30</max_file_xfers>
      <max_file_xfers_per_project>30</max_file_xfers_per_project>
      <http_transfer_timeout>30</http_transfer_timeout>
      <rec_half_life_days>10</rec_half_life_days>
      <report_results_immediately>0</report_results_immediately>
   </options>
</cc_config>
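
The numbers in this post are consistent with max_concurrent capping all running tasks of the app, GPU and CPU alike: with a cap of 4, the four GPU tasks use up the whole allowance and no CPU task can start. Here is a minimal Python sketch of that bookkeeping, assuming this reading of max_concurrent is right. The function split_tasks and its parameters are hypothetical names, and the sketch ignores the ncpus and cpu_usage limits that the real client also applies.

```python
from fractions import Fraction


def split_tasks(max_concurrent: int, gpus: int, gpu_usage: str):
    """Return (gpu_tasks, cpu_tasks) under an app-wide max_concurrent cap.

    Assumptions (not confirmed BOINC behaviour): each GPU runs as many
    tasks as fit into 1 / gpu_usage, GPU tasks are counted first, and
    CPU tasks of the same app get whatever is left of the cap.
    """
    per_gpu = 1 // Fraction(gpu_usage)          # e.g. ".500" -> 2, ".15" -> 6
    gpu_tasks = min(gpus * per_gpu, max_concurrent)
    cpu_tasks = max_concurrent - gpu_tasks
    return gpu_tasks, cpu_tasks


# The three configurations described in this thread, on a two-GPU machine.
for cap, usage in [(4, ".500"), (14, ".500"), (23, ".15")]:
    print(cap, usage, split_tasks(cap, gpus=2, gpu_usage=usage))
```

With these inputs the sketch gives 4 GPU and 0 CPU tasks for a cap of 4, 4 and 10 for a cap of 14, and 12 and 11 for a cap of 23, which lines up with what was observed.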

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 580
Credit: 94,200,158
RAC: 0
Message 60027 - Posted: 27 Sep 2013, 14:51:48 UTC

Hey all,

Thank you for posting examples of good configuration options.

Jake W

mikey
Joined: 8 May 09
Posts: 3319
Credit: 520,302,243
RAC: 20,411
Message 60028 - Posted: 27 Sep 2013, 14:54:51 UTC

I know how to edit my preferences but how do I stop the "de_separation_DR_8_rev_3_1_2" units!! It is NOT labeled that way on my list and EVERY SINGLE ONE is failing!!!

These are my choices:
Run only the selected applications
MilkyWay@Home: yes
MilkyWay@Home N-Body Simulation: yes
Milkyway@Home Separation: yes
Milkyway@Home Separation (Modified Fit): no

It seems to me the project has a problem and we users are being blamed for it, and the project is NOT helping to solve the problem!! Label the choices as to the units you are sending out and I WILL uncheck them!!! Until then deal with the problem, just like I am!!!

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 580
Credit: 94,200,158
RAC: 0
Message 60031 - Posted: 27 Sep 2013, 15:39:15 UTC
Last modified: 27 Sep 2013, 15:40:05 UTC

Hey there,

Any runs named _separation_ are coming from Milkyway@Home and runs named _modfit_ are coming from Milkyway@home Separation (Modified Fit). This run may have a slightly more complicated data set, so the work units might actually just take longer to run. Those are Jeff's runs, and I am meeting with him in 10 minutes. I will let him know about your problem and see what he thinks is going on.

Sorry,

Jake W
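
The naming rule in this post can be written down as a small lookup. This is a Python sketch only: app_for_task is a hypothetical helper, and it covers just the two markers mentioned here, so any other name comes back as unknown (None).

```python
from typing import Optional

# Marker inside the work unit name -> application it belongs to.
# Only the two markers named in this post are included.
MARKERS = {
    "_separation_": "Milkyway@Home",
    "_modfit_": "Milkyway@Home Separation (Modified Fit)",
}


def app_for_task(task_name: str) -> Optional[str]:
    """Return the application for a work unit name, or None if unknown."""
    for marker, app in MARKERS.items():
        if marker in task_name:
            return app
    return None


print(app_for_task("de_separation_DR_8_rev_3_1_2"))  # Milkyway@Home
```

By this rule, the failing de_separation_DR_8_rev_3_1_2 units belong to the MilkyWay@Home check box rather than to either Separation entry.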

KeithBriggs
Joined: 28 Apr 11
Posts: 29
Credit: 257,238,237
RAC: 52,119
Message 60032 - Posted: 27 Sep 2013, 15:55:13 UTC - in response to Message 60028.  

The BOINC Manager won't delete WUs you've already received. Watch the newly downloaded ones and see if it is working correctly.

If your computer is listed as "school" or "home", you'll have to change the acceptable apps for each class of computers you have.

KeithBriggs
Joined: 28 Apr 11
Posts: 29
Credit: 257,238,237
RAC: 52,119
Message 60035 - Posted: 27 Sep 2013, 17:43:50 UTC - in response to Message 60026.  

Hey AMueller91,
glad you figured it out. I have not seen any benefit beyond 4 tasks per GPU. The key is no downtime, and the chance that 4 tasks finish at the same time is minimal.

If they are running in tandem, just pause one and then start it back up. For the CPUs, all you need is to set logical cores = physical cores plus 1.

AMueller91
Joined: 6 Jul 12
Posts: 4
Credit: 12,385,544
RAC: 0
Message 60036 - Posted: 27 Sep 2013, 18:47:43 UTC - in response to Message 60035.  

Exactly :)

Once it was working fine with 6 tasks per GPU, I tested the maximum number of WUs my Titan cards can take. Without double precision, they can only handle a maximum of 3 WUs per card, at a GPU load of 99%. With double precision enabled, I get a maximum of 8 WUs per card (16 GPU tasks simultaneously) at a 99% GPU load. I let it run for around 5 minutes and finished nearly 30 tasks, but the cards also heated up to 90°C.

So I'm fine with 6 WUs per card. It runs stable, without errors, and with temps around 85°C.

mikey
Joined: 8 May 09
Posts: 3319
Credit: 520,302,243
RAC: 20,411
Message 60039 - Posted: 28 Sep 2013, 11:12:56 UTC - in response to Message 60031.  

I really don't see the difference, they BOTH say Milkyway@home!! Are you trying to say you are getting units from a 3rd party supplier, putting the MilkyWay@home name on them, and are still not responsible if they are bad or don't work?

Today I got a message from MW saying the driver I am using, the AMD 13.10 Beta, is not supported here. Okay that's fine, but I can't find a list of which ones ARE supported here? Is this a trial and error thing until I stop getting the message, or am I just not seeing the list of approved drivers somewhere?

KeithBriggs
Joined: 28 Apr 11
Posts: 29
Credit: 257,238,237
RAC: 52,119
Message 60040 - Posted: 28 Sep 2013, 13:40:31 UTC - in response to Message 60039.  

That's probably the CAL driver message; just disregard it. The main page has a Statistics section, and under that is the GPU list: http://milkyway.cs.rpi.edu/milkyway/gpu_list.php

©2024 Astroinformatics Group