Users Auto-Aborting Work Units

Author	Message
Jake Weiss Volunteer moderator Project developer Project tester Project scientist Joined: 25 Feb 13 Posts: 580 Credit: 94,200,158 RAC: 0	Message 60004 - Posted: 26 Sep 2013, 19:41:43 UTC
Hello all,
It has come to our attention that some users have been setting their BOINC clients to auto-abort work units from specific applications. Doing this sends error results back to our server, which then leaves some work units unable to validate. Essentially, it prevents some of our hard-working crunchers from getting their due credits.
The proper way to avoid getting work units from specific applications, such as our beta applications N-Body or Modified Fit, is to go to your account page on our website (http://milkyway.cs.rpi.edu/milkyway/home.php). Under the Preferences section, select the link for your preferences for this project. That page has a link to edit these preferences. Halfway down your preferences, there are check boxes in the "Run only the selected applications" section. You will only receive work units for the applications you have check marks next to.
For reference:
Milkyway@home is our flagship application and is considered stable and in its final released state;
Milkyway@home N-body Simulation is our beta version N-body simulation and orbit fit program;
Milkyway@home Separation is an, as of now, unused application;
Milkyway@home Separation (Modified Fit) is our beta version separation code testing new models for both streams and background in the Milky Way Halo.
As usual, if you have any issues with this method or questions about it, please post them here. We appreciate your cooperation and understanding in this.
Thank you,
Jake W.
TL;DR: If you are auto-aborting work units, please stop and use the method above to prevent users from losing credits and to prevent problems in our algorithms.

KeithBriggs Joined: 28 Apr 11 Posts: 29 Credit: 257,238,237 RAC: 52,119	Message 60005 - Posted: 26 Sep 2013, 21:16:01 UTC - in response to Message 60004.
Maybe new users should have to opt into beta projects. http://milkyway.cs.rpi.edu/milkyway/show_host_detail.php?hostid=437270 has about 5373 aborted WUs and counting.

KeithBriggs Joined: 28 Apr 11 Posts: 29 Credit: 257,238,237 RAC: 52,119	Message 60006 - Posted: 26 Sep 2013, 21:29:26 UTC - in response to Message 60005.
Here are some major aborters:
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=104692 2900 aborts
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=529892 8800 aborts
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=322721 4300 aborts
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=520641 15000 aborts
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=529525 3400 aborts
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=366486 2800 aborts
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=485608 5000 aborts
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=484725 1600 aborts
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=452569 3700 aborts
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=532562 3400 aborts

Tomahawk4196 Joined: 17 Aug 13 Posts: 3 Credit: 336,920,753 RAC: 0	Message 60007 - Posted: 26 Sep 2013, 22:47:03 UTC
Folks: I would like to do only GPU work units for Milkyway@home, and to that end I have been using an app_info.xml to make my FirePro do two work units at a time. However, I often get messages that state:
Message from server: Your app_info.xml file doesn't have a usable version of Milkyway@Home Separation (Modified Fit).
I sure hope I'm not causing any problems. Which check boxes should I clear if I only want to do GPU processing? I didn't even know this 'Preferences' page existed for this project - good news for me. Thanks

swiftmallard Joined: 18 Jul 09 Posts: 300 Credit: 303,562,776 RAC: 0	Message 60008 - Posted: 26 Sep 2013, 23:05:19 UTC
Check the MilkyWay@home and MilkyWay@home Separation (Modified Fit) boxes. Stop using the app_info file and use an app_config file instead. This one works well for me:
<app_config>
  <app>
    <name>milkyway</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>0.05</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
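
The `gpu_usage` value in the config above is what lets two tasks share one card: each task claims half a GPU, so two fit. Below is a minimal Python sketch of that arithmetic, assuming the client keeps starting tasks while their combined `gpu_usage` stays at or below 1.0 per GPU. `build_app_config` is a hypothetical helper written for illustration, not part of BOINC; it only builds the XML text and leaves saving it into the project folder to you.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, tasks_per_gpu, cpu_per_task):
    """Return app_config.xml text that lets `tasks_per_gpu` tasks share one GPU.

    Hypothetical helper for illustration; it is not part of BOINC.
    """
    if tasks_per_gpu < 1:
        raise ValueError("tasks_per_gpu must be at least 1")

    # Each task claims 1/N of a GPU. Round *down* to three decimals so that
    # N tasks never add up to more than one whole GPU.
    gpu_usage = (1000 // tasks_per_gpu) / 1000

    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    versions = ET.SubElement(app, "gpu_versions")
    ET.SubElement(versions, "gpu_usage").text = f"{gpu_usage:.3f}"
    ET.SubElement(versions, "cpu_usage").text = str(cpu_per_task)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Two tasks per GPU, as in the post above.
    print(build_app_config("milkyway", 2, 0.05))
```

Rounding down matters for counts such as 6 per GPU: 0.167 times 6 is slightly more than 1.0, which under the assumption above would leave room for only 5 tasks, while 0.166 fits all 6.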

KeithBriggs Joined: 28 Apr 11 Posts: 29 Credit: 257,238,237 RAC: 52,119	Message 60009 - Posted: 26 Sep 2013, 23:19:04 UTC - in response to Message 60008.
I also use app_config, but it's easiest to just do it in the preferences.

swiftmallard Joined: 18 Jul 09 Posts: 300 Credit: 303,562,776 RAC: 0	Message 60010 - Posted: 26 Sep 2013, 23:53:04 UTC
He wants to crunch two at a time, he'll need the config file for that.

KeithBriggs Joined: 28 Apr 11 Posts: 29 Credit: 257,238,237 RAC: 52,119	Message 60013 - Posted: 27 Sep 2013, 2:13:40 UTC - in response to Message 60010.
Yes, that's right. Good point. Here's my app_config:
<app_config>
  <app>
    <name>milkyway_nbody</name>
    <max_concurrent>0</max_concurrent>
    <gpu_versions>
      <gpu_usage>.1</gpu_usage>
      <cpu_usage>1</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>milkyway</name>
    <max_concurrent>8</max_concurrent>
    <gpu_versions>
      <gpu_usage>.25</gpu_usage>
      <cpu_usage>.11</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>milkyway_separation__modified_fit</name>
    <max_concurrent>8</max_concurrent>
    <gpu_versions>
      <gpu_usage>.25</gpu_usage>
      <cpu_usage>.12</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
and here's my cc_config:
<cc_config>
  <log_flags>
  </log_flags>
  <options>
    <ncpus>4</ncpus>
    <max_file_xfers>30</max_file_xfers>
    <max_file_xfers_per_project>30</max_file_xfers_per_project>
    <http_transfer_timeout>30</http_transfer_timeout>
    <rec_half_life_days>10</rec_half_life_days>
    <report_results_immediately>0</report_results_immediately>
  </options>
</cc_config>
So one GPU is running 4 WUs at a time, with no down time. This particular machine has two CPU cores but 4 virtual cores, so again there is no CPU down time: 4 CPU WUs and 4 GPU WUs, which is about 10% more work than letting them cycle down. I also get constant fan speeds and more stable temperatures. Holler if anyone wants my 2-GPU XML files.
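
To double-check what a config like the one above asks for, here is a rough Python sketch that reads an app_config and reports how many tasks of each app fit on one GPU. It assumes the count is simply 1 divided by `gpu_usage`, rounded down; `tasks_per_gpu` and `SAMPLE_APP_CONFIG` are names made up for this illustration, and the sample reuses two of the `<app>` entries from the post.

```python
from fractions import Fraction
import xml.etree.ElementTree as ET

# Two of the <app> entries from the post above, copied as a sample input.
SAMPLE_APP_CONFIG = """
<app_config>
  <app>
    <name>milkyway</name>
    <max_concurrent>8</max_concurrent>
    <gpu_versions>
      <gpu_usage>.25</gpu_usage>
      <cpu_usage>.11</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>milkyway_separation__modified_fit</name>
    <max_concurrent>8</max_concurrent>
    <gpu_versions>
      <gpu_usage>.25</gpu_usage>
      <cpu_usage>.12</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
"""


def tasks_per_gpu(app_config_xml):
    """Map each app name to how many of its tasks fit on one GPU."""
    result = {}
    for app in ET.fromstring(app_config_xml.strip()).findall("app"):
        usage = app.findtext("gpu_versions/gpu_usage")
        if usage is None:
            continue  # this app has no GPU settings
        share = Fraction(usage.strip())  # parses ".25" exactly, no float rounding
        if share <= 0:
            continue  # a zero or negative share has no meaningful task count
        result[app.findtext("name")] = int(1 / share)
    return result


if __name__ == "__main__":
    for name, count in tasks_per_gpu(SAMPLE_APP_CONFIG).items():
        print(f"{name}: {count} task(s) per GPU")
```

With `gpu_usage` at .25 both apps come out at 4 tasks per GPU, which matches the "4 WUs at a time" described in the post.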

greg_be Joined: 18 Aug 09 Posts: 122 Credit: 20,716,927 RAC: 1,665	Message 60022 - Posted: 27 Sep 2013, 11:45:17 UTC
Any news as to when 1.38 will be out? I have opted out of modfit 1.28 because it just crashes on my system.

KeithBriggs Joined: 28 Apr 11 Posts: 29 Credit: 257,238,237 RAC: 52,119	Message 60024 - Posted: 27 Sep 2013, 13:09:03 UTC - in response to Message 60022.
First go to your account, then under the Milkyway preferences you'll see:
Use CPU (enforced by version 6.10+): yes
Use ATI GPU (enforced by version 6.10+): yes
Use NVIDIA GPU (enforced by version 6.10+): yes
A few more lines down you'll see:
Run only the selected applications
MilkyWay@Home: yes
MilkyWay@Home N-Body Simulation: no
Milkyway@Home Separation: no
Milkyway@Home Separation (Modified Fit): no

AMueller91 Joined: 6 Jul 12 Posts: 4 Credit: 12,385,544 RAC: 0	Message 60025 - Posted: 27 Sep 2013, 13:29:41 UTC - in response to Message 60013.
Hey, this is exactly what I want, but it doesn't work. I have an i7-3930K CPU with 6 cores and 12 threads and two GTX Titans. If I let MW@Home run without any app_config or anything, all 12 threads and both GPUs are working. If I add the app_config, only the GPUs work correctly (2 WUs per GPU), but the CPU does nothing... BUT if I drag and drop the app_config.xml file out of my MW@home folder and restart BOINC, everything works fine (12 CPU WUs and 2 GPU WUs on each card) for around 10 minutes! After those 10 minutes, the GPUs automatically stop the additional WUs and keep processing one WU on each card. How can I get all, or even 8 to 10, threads working while using the multiple-WU app_config? Sorry for my bad English, I'm from Fondue-Switzerland :)
PS: My app_config.xml:
<app_config>
  <app>
    <name>milkyway</name>
    <max_concurrent>4</max_concurrent>
    <gpu_versions>
      <gpu_usage>.500</gpu_usage>
      <cpu_usage>0.25</cpu_usage>
    </gpu_versions>
  </app>
</app_config>

AMueller91 Joined: 6 Jul 12 Posts: 4 Credit: 12,385,544 RAC: 0	Message 60026 - Posted: 27 Sep 2013, 14:10:09 UTC - in response to Message 60025. Last modified: 27 Sep 2013, 14:38:11 UTC
OK... after searching for a solution for 5 hours, it works now. After changing <max_concurrent>4</max_concurrent> to <max_concurrent>14</max_concurrent> in app_config.xml and <ncpus>4</ncpus> to <ncpus>14</ncpus> in cc_config.xml, it works fine. 4 GPU WUs (2 WUs per GPU at 0.25 CPU per GPU WU) and 10 CPU WUs are active. Hope it will hold longer than 10 minutes :D
Edit: With optimized app_config and cc_config, I can run 23 WUs at the same time: 12 on GPU (6 per card with double precision enabled) and 11 on CPU. Each GPU WU takes around 2 minutes to complete; CPU WUs run between 1 and 2 hours.
<app_config>
  <app>
    <name>milkyway</name>
    <max_concurrent>23</max_concurrent>
    <gpu_versions>
      <gpu_usage>.15</gpu_usage>
      <cpu_usage>0.05</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
<cc_config>
  <log_flags>
  </log_flags>
  <options>
    <ncpus>23</ncpus>
    <max_file_xfers>30</max_file_xfers>
    <max_file_xfers_per_project>30</max_file_xfers_per_project>
    <http_transfer_timeout>30</http_transfer_timeout>
    <rec_half_life_days>10</rec_half_life_days>
    <report_results_immediately>0</report_results_immediately>
  </options>
</cc_config>
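
The task counts reported in this post and the previous one are consistent with a simple reading of the settings, sketched below in Python. This is a simplified model and not BOINC's actual scheduler: it assumes `max_concurrent` caps the combined number of CPU and GPU tasks for the app, that GPU slots are filled first, and it ignores `ncpus` entirely. `split_tasks` is a hypothetical name used only for this illustration.

```python
from fractions import Fraction


def split_tasks(gpu_usage, num_gpus, max_concurrent):
    """Estimate (tasks per GPU, GPU tasks, CPU tasks) for one app.

    Simplified model, not BOINC's real scheduler: GPU slots are filled
    first, and whatever is left under max_concurrent goes to CPU tasks.
    """
    # Fraction parses strings such as ".15" exactly; int() rounds down.
    per_gpu = int(1 / Fraction(gpu_usage))
    gpu_tasks = min(per_gpu * num_gpus, max_concurrent)
    cpu_tasks = max_concurrent - gpu_tasks
    return per_gpu, gpu_tasks, cpu_tasks


if __name__ == "__main__":
    # (gpu_usage, max_concurrent) for the three configs described in the posts,
    # all on a machine with two GPUs.
    for gpu_usage, cap in [(".500", 4), (".500", 14), (".15", 23)]:
        print(gpu_usage, cap, split_tasks(gpu_usage, num_gpus=2, max_concurrent=cap))
```

Under these assumptions the first config (cap of 4) leaves 0 CPU tasks once the 4 GPU tasks are running, the second (cap of 14) gives 4 GPU and 10 CPU tasks, and the third (cap of 23) gives 12 GPU and 11 CPU tasks, the same numbers the posts report.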

Jake Weiss Volunteer moderator Project developer Project tester Project scientist Joined: 25 Feb 13 Posts: 580 Credit: 94,200,158 RAC: 0	Message 60027 - Posted: 27 Sep 2013, 14:51:48 UTC
Hey all,
Thank you for posting examples of good configuration options.
Jake W

mikey Joined: 8 May 09 Posts: 3319 Credit: 520,302,243 RAC: 20,411	Message 60028 - Posted: 27 Sep 2013, 14:54:51 UTC
I know how to edit my preferences, but how do I stop the "de_separation_DR_8_rev_3_1_2" units!! It is NOT labeled that way on my list and EVERY SINGLE ONE is failing!!! These are my choices:
Run only the selected applications
MilkyWay@Home: yes
MilkyWay@Home N-Body Simulation: yes
Milkyway@Home Separation: yes
Milkyway@Home Separation (Modified Fit): no
It seems to me the project has a problem and we users are being blamed for it, and the project is NOT helping to solve the problem!! Label the choices as to the units you are sending out and I WILL uncheck them!!! Until then deal with the problem, just like I am!!!

Jake Weiss Volunteer moderator Project developer Project tester Project scientist Joined: 25 Feb 13 Posts: 580 Credit: 94,200,158 RAC: 0	Message 60031 - Posted: 27 Sep 2013, 15:39:15 UTC Last modified: 27 Sep 2013, 15:40:05 UTC
Hey there,
Any runs named _separation_ are coming from Milkyway@Home and runs named _modfit_ are coming from Milkyway@home Separation (Modified Fit). This run may have a slightly more complicated data set, so it might actually just take longer to run them. Those are Jeff's runs and I am meeting with him in 10 minutes. I will let him know about your problem and see what he thinks is going on.
Sorry,
Jake W

KeithBriggs Joined: 28 Apr 11 Posts: 29 Credit: 257,238,237 RAC: 52,119	Message 60032 - Posted: 27 Sep 2013, 15:55:13 UTC - in response to Message 60028.
The BOINC Manager won't delete WUs you've already received. Watch the newly downloaded ones and see if it is working correctly. If your computer is listed as "school" or "home", you'll have to change the acceptable apps for each class of computers you have.

KeithBriggs Joined: 28 Apr 11 Posts: 29 Credit: 257,238,237 RAC: 52,119	Message 60035 - Posted: 27 Sep 2013, 17:43:50 UTC - in response to Message 60026.
Hey AMueller91, glad you figured it out. I have not seen any benefit beyond 4 tasks per GPU. The key is no down time, and the chance that 4 tasks finish at the same time is minimal. If they are running in tandem, just pause one, then start it back up. All you need for CPUs is to set logical cores = physical cores plus 1.

AMueller91 Joined: 6 Jul 12 Posts: 4 Credit: 12,385,544 RAC: 0	Message 60036 - Posted: 27 Sep 2013, 18:47:43 UTC - in response to Message 60035.
Exactly :) Once it was working fine with 6 tasks per GPU, I tested the maximum number of WUs on my Titan cards. Without double precision, they can only handle a maximum of 3 WUs per card to reach a GPU load of 99%. But with double precision enabled, I get a maximum of 8 WUs per card (16 GPU tasks simultaneously) at a 99% GPU load. I let it run for around 5 minutes and it finished nearly 30 tasks, but the cards also heated up to 90°C. So I'm fine with 6 WUs per card. It runs stable, without errors, and with temps around 85°C.

mikey Joined: 8 May 09 Posts: 3319 Credit: 520,302,243 RAC: 20,411	Message 60039 - Posted: 28 Sep 2013, 11:12:56 UTC - in response to Message 60031.
I really don't see the difference, they BOTH say Milkyway@home!! Are you trying to say you are getting units from a 3rd party supplier, putting the MilkyWay@home name on them, and are still not responsible if they are bad or don't work?
Today I got a message from MW saying the driver I am using, the AMD 13.10 Beta, is not supported here. Okay that's fine, but I can't find a list of which ones ARE supported here? Is this a trial and error thing until I stop getting the message, or am I just not seeing the list of approved drivers somewhere?

KeithBriggs Joined: 28 Apr 11 Posts: 29 Credit: 257,238,237 RAC: 52,119	Message 60040 - Posted: 28 Sep 2013, 13:40:31 UTC - in response to Message 60039.
That is probably the CAL driver message. Just disregard it. On the main page there is a Statistics section, and under that is the GPU list: http://milkyway.cs.rpi.edu/milkyway/gpu_list.php