Nbody WU Flush
Joined: 5 Jul 11 · Posts: 990 · Credit: 376,143,149 · RAC: 0
... 7.16.20 is the latest official version. To get the one I have, 7.19.0, which fixes some annoying bugs, you have to know people. It's not downloadable anywhere unless you compile it yourself. Send me a private message with your email address if you want me to send you the exe files for Windows; you just replace what you have. I can't do that for Linux/Mac.
Joined: 13 Apr 17 · Posts: 256 · Credit: 604,411,638 · RAC: 0
> ... I think the ingrained default is 0. The default that's in the file which should always be there and appears when you install it is 1 (or has been in recent times - perhaps they changed the default by adding a 1 to the file placed there when you install it). The only computer I have with no file is one that got corrupted somehow. Now that uses all 4 GPUs, but they're identical GPUs. I might remember to come here and tell you what happens when I move a different one over to it later.

Strange. When I installed BM, no cc_config was automatically installed in the ProgramData folder on any of my rigs. I had to create one in order to add the following 5 lines:

    <cc_config>
      <options>
        <use_all_gpus>1</use_all_gpus>
      </options>
    </cc_config>

As I understand it, the system merges these lines with the standard cc_config. I had to add use_all_gpus = 1 to get more than one GPU working in each rig, and I have never had any trouble with this setup. Thanks for the info.
Joined: 5 Jul 11 · Posts: 990 · Credit: 376,143,149 · RAC: 0
> ... Strange.

One of mine was empty and I assumed I'd broken the file. But this is an example of what the other 6 look like:

    <cc_config>
      <log_flags>
        <file_xfer>1</file_xfer>
        <sched_ops>1</sched_ops>
        <task>1</task>
        <app_msg_receive>0</app_msg_receive>
        <app_msg_send>0</app_msg_send>
        <async_file_debug>0</async_file_debug>
        <benchmark_debug>0</benchmark_debug>
        <checkpoint_debug>0</checkpoint_debug>
        <coproc_debug>0</coproc_debug>
        <cpu_sched>0</cpu_sched>
        <cpu_sched_debug>0</cpu_sched_debug>
        <cpu_sched_status>0</cpu_sched_status>
        <dcf_debug>0</dcf_debug>
        <disk_usage_debug>0</disk_usage_debug>
        <file_xfer_debug>0</file_xfer_debug>
        <gui_rpc_debug>0</gui_rpc_debug>
        <heartbeat_debug>0</heartbeat_debug>
        <http_debug>0</http_debug>
        <http_xfer_debug>0</http_xfer_debug>
        <idle_detection_debug>0</idle_detection_debug>
        <mem_usage_debug>0</mem_usage_debug>
        <network_status_debug>0</network_status_debug>
        <notice_debug>0</notice_debug>
        <poll_debug>0</poll_debug>
        <priority_debug>0</priority_debug>
        <proxy_debug>0</proxy_debug>
        <rr_simulation>0</rr_simulation>
        <rrsim_detail>0</rrsim_detail>
        <sched_op_debug>0</sched_op_debug>
        <scrsave_debug>0</scrsave_debug>
        <slot_debug>0</slot_debug>
        <state_debug>0</state_debug>
        <statefile_debug>0</statefile_debug>
        <suspend_debug>0</suspend_debug>
        <task_debug>0</task_debug>
        <time_debug>0</time_debug>
        <trickle_debug>0</trickle_debug>
        <unparsed_xml>0</unparsed_xml>
        <work_fetch_debug>0</work_fetch_debug>
      </log_flags>
      <options>
        <abort_jobs_on_exit>0</abort_jobs_on_exit>
        <allow_gui_rpc_get>0</allow_gui_rpc_get>
        <allow_multiple_clients>0</allow_multiple_clients>
        <allow_remote_gui_rpc>0</allow_remote_gui_rpc>
        <disallow_attach>0</disallow_attach>
        <dont_check_file_sizes>0</dont_check_file_sizes>
        <dont_contact_ref_site>0</dont_contact_ref_site>
        <lower_client_priority>0</lower_client_priority>
        <dont_suspend_nci>0</dont_suspend_nci>
        <dont_use_vbox>0</dont_use_vbox>
        <dont_use_wsl>0</dont_use_wsl>
        <exclude_gpu>
          <url>https://milkyway.cs.rpi.edu/milkyway/</url>
          <device_num>0</device_num>
        </exclude_gpu>
        <exclude_gpu>
          <url>http://boincvm.proxyma.ru:30080/test4vm/</url>
          <device_num>1</device_num>
        </exclude_gpu>
        <exclude_gpu>
          <url>http://numberfields.asu.edu/NumberFields/</url>
          <device_num>1</device_num>
        </exclude_gpu>
        <exclude_gpu>
          <url>http://einstein.phys.uwm.edu/</url>
          <device_num>1</device_num>
          <app>einstein_O3AS</app>
        </exclude_gpu>
        <exclusive_app>Fallout4.exe</exclusive_app>
        <exit_after_finish>0</exit_after_finish>
        <exit_before_start>0</exit_before_start>
        <exit_when_idle>0</exit_when_idle>
        <fetch_minimal_work>0</fetch_minimal_work>
        <fetch_on_update>0</fetch_on_update>
        <force_auth>default</force_auth>
        <http_1_0>0</http_1_0>
        <http_transfer_timeout>300</http_transfer_timeout>
        <http_transfer_timeout_bps>10</http_transfer_timeout_bps>
        <ignore_ati_dev>10</ignore_ati_dev>
        <max_event_log_lines>2000</max_event_log_lines>
        <max_file_xfers>8</max_file_xfers>
        <max_file_xfers_per_project>2</max_file_xfers_per_project>
        <max_stderr_file_size>0.000000</max_stderr_file_size>
        <max_stdout_file_size>0.000000</max_stdout_file_size>
        <max_tasks_reported>0</max_tasks_reported>
        <ncpus>-1</ncpus>
        <no_alt_platform>0</no_alt_platform>
        <no_gpus>0</no_gpus>
        <no_info_fetch>0</no_info_fetch>
        <no_opencl>0</no_opencl>
        <no_priority_change>0</no_priority_change>
        <os_random_only>0</os_random_only>
        <process_priority>-1</process_priority>
        <process_priority_special>-1</process_priority_special>
        <proxy_info>
          <socks_server_name></socks_server_name>
          <socks_server_port>80</socks_server_port>
          <http_server_name></http_server_name>
          <http_server_port>80</http_server_port>
          <socks5_user_name></socks5_user_name>
          <socks5_user_passwd></socks5_user_passwd>
          <socks5_remote_dns>0</socks5_remote_dns>
          <http_user_name></http_user_name>
          <http_user_passwd></http_user_passwd>
          <no_proxy></no_proxy>
          <no_autodetect>0</no_autodetect>
        </proxy_info>
        <rec_half_life_days>10.000000</rec_half_life_days>
        <report_results_immediately>0</report_results_immediately>
        <run_apps_manually>0</run_apps_manually>
        <save_stats_days>30</save_stats_days>
        <skip_cpu_benchmarks>0</skip_cpu_benchmarks>
        <simple_gui_only>0</simple_gui_only>
        <start_delay>0.000000</start_delay>
        <stderr_head>0</stderr_head>
        <suppress_net_info>0</suppress_net_info>
        <unsigned_apps_ok>0</unsigned_apps_ok>
        <use_all_gpus>1</use_all_gpus>
        <use_certs>0</use_certs>
        <use_certs_only>0</use_certs_only>
        <vbox_window>0</vbox_window>
      </options>
    </cc_config>

I have no idea what most of that means, so I can't have put it there. We're talking about cc_config.xml in c:\programdata\boinc, right?
Joined: 13 Apr 17 · Posts: 256 · Credit: 604,411,638 · RAC: 0
@Peter: ... even stranger - maybe a new feature of your version of BM?
Joined: 5 Jul 11 · Posts: 990 · Credit: 376,143,149 · RAC: 0
> @Peter: ...

No, it was the same before that one.
Joined: 13 Apr 17 · Posts: 256 · Credit: 604,411,638 · RAC: 0
@Peter: I wonder how the "exclude_gpu" got in there. As far as I understand it, you are excluding certain GPUs on certain projects. Maybe someone else can help explain...
Joined: 24 Jan 11 · Posts: 715 · Credit: 557,033,228 · RAC: 42,334
No BOINC version shipped before 7.16.20 included a cc_config.xml file, but it is extremely easy to have one created automatically. In the Manager, go to Options >> Event Log options and toggle on any logging flag besides the default ones; a new, fully populated cc_config.xml is created automatically. Then untoggle whatever you switched on temporarily and you are back to default logging. I do recommend leaving sched_op_debug on, though: it tells you how many seconds of work you are requesting at every scheduler connection, which is good to know for work fetch debugging.
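If you would rather write the file by hand, here is a minimal sketch of a cc_config.xml that keeps the three log flags that are on by default (file_xfer, sched_ops, task, as in the generated file earlier in this thread) and also turns on sched_op_debug. It assumes that options left out of the file keep their built-in defaults, as described on the BOINC client configuration page. Save it in the BOINC data directory. The client should pick it up on restart or via Options >> Read config files in the Manager.

```xml
<cc_config>
  <log_flags>
    <!-- on by default; listed so the intent is explicit -->
    <file_xfer>1</file_xfer>
    <sched_ops>1</sched_ops>
    <task>1</task>
    <!-- extra flag: logs how many seconds of work each scheduler request asks for -->
    <sched_op_debug>1</sched_op_debug>
  </log_flags>
</cc_config>
```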
Joined: 5 Jul 11 · Posts: 990 · Credit: 376,143,149 · RAC: 0
> I wonder how the "exclude_gpu" got in there.

If you mean this:

    <exclude_gpu>
      <url>https://milkyway.cs.rpi.edu/milkyway/</url>
      <device_num>0</device_num>
    </exclude_gpu>
    <exclude_gpu>
      <url>http://boincvm.proxyma.ru:30080/test4vm/</url>
      <device_num>1</device_num>
    </exclude_gpu>
    <exclude_gpu>
      <url>http://numberfields.asu.edu/NumberFields/</url>
      <device_num>1</device_num>
    </exclude_gpu>
    <exclude_gpu>
      <url>http://einstein.phys.uwm.edu/</url>
      <device_num>1</device_num>
      <app>einstein_O3AS</app>
    </exclude_gpu>
    <exclusive_app>Fallout4.exe</exclusive_app>

then I put that bit in there. The last line pauses BOINC while I'm shooting people after the nuclear war, and the rest put projects on specific GPUs. But things like this are not my doing:

    <rec_half_life_days>10.000000</rec_half_life_days>
    <report_results_immediately>0</report_results_immediately>
    <run_apps_manually>0</run_apps_manually>
    <save_stats_days>30</save_stats_days>
    <skip_cpu_benchmarks>0</skip_cpu_benchmarks>
    <simple_gui_only>0</simple_gui_only>
    <start_delay>0.000000</start_delay>
    <stderr_head>0</stderr_head>
    <suppress_net_info>0</suppress_net_info>
    <unsigned_apps_ok>0</unsigned_apps_ok>
    <use_all_gpus>1</use_all_gpus>
    <use_certs>0</use_certs>
    <use_certs_only>0</use_certs_only>
    <vbox_window>0</vbox_window>
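For anyone puzzled by these entries, here is an annotated sketch reusing two of the exclude_gpu entries and the exclusive_app line from the post above. The comments follow the BOINC client configuration documentation as I understand it:

- Each exclude_gpu block stops the project at the given url from using GPU device_num.
- The optional app element narrows the exclusion to that one application.
- exclusive_app suspends BOINC computing while the named program is running.

Device numbers are assumed to be the ones the client lists for each GPU in its startup Event Log messages.

```xml
<cc_config>
  <options>
    <!-- MilkyWay@home may not use GPU 0, so its GPU work goes to the other card -->
    <exclude_gpu>
      <url>https://milkyway.cs.rpi.edu/milkyway/</url>
      <device_num>0</device_num>
    </exclude_gpu>
    <!-- Einstein@Home may not run the einstein_O3AS app on GPU 1; its other apps still may -->
    <exclude_gpu>
      <url>http://einstein.phys.uwm.edu/</url>
      <device_num>1</device_num>
      <app>einstein_O3AS</app>
    </exclude_gpu>
    <!-- suspend BOINC computing while this program is running -->
    <exclusive_app>Fallout4.exe</exclusive_app>
  </options>
</cc_config>
```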
Joined: 8 May 09 · Posts: 3339 · Credit: 524,010,781 · RAC: 0
... Yes, it means BOINC Manager. Most of us have just upgraded over the years, installing newer versions right over the top of the older ones already on our PCs. A few people, though, are 'newbies', so when they install BOINC for the first time there are no remains of older versions hanging around. It doesn't matter to me one way or the other with this setting; it's the default in the cc_config file I put on every new OS install anyway.
Joined: 8 May 09 · Posts: 3339 · Credit: 524,010,781 · RAC: 0
> If you mean this: ...

Yes, the last part is some of BOINC's default settings, which can be changed if someone thinks it will make BOINC work better for them. I did not know the use_all_gpus line was in there, thanks!
Joined: 5 Jul 11 · Posts: 990 · Credit: 376,143,149 · RAC: 0
> ... a few people though are 'newbies' so when they install Boinc for the first time there is no remains of the older versions hanging around. ...

Or it's an extra machine they got after the last version came out. Or the disk exploded and was reinstalled from scratch. Or somebody stole the computer and it got returned.
Joined: 5 Jul 11 · Posts: 990 · Credit: 376,143,149 · RAC: 0
> ... I did not know that the use_all_gpus line was in there, thanks!

There are tonnes of options here for anyone who didn't know: https://boinc.berkeley.edu/wiki/Client_configuration. It's easy to find on Google; just search for "boinc configuration" or "boinc config".
Joined: 10 Apr 19 · Posts: 408 · Credit: 120,203,200 · RAC: 0
Can someone check on the leftover WUs that still need to be validated, and look at a couple of things for me? I'd like to know the following values for these WUs:
1) server_state
2) outcome
3) client_state
4) validate_state
5) need_validate
These should be found in a combination of the "result" and the "WU" information. I'm not sure what things look like on the client. Links to a few of these WUs might be the most helpful, since I can then directly compare these stuck WUs with the freely flowing ones. Then, hopefully, I can manually force these WUs to generate new validation WUs, in batches of ~10k WUs or some other reasonable number so that we don't blow up the server.
Joined: 10 Apr 19 · Posts: 408 · Credit: 120,203,200 · RAC: 0
I'll basically be running this re-validation script that BOINC provides, but being more selective about which WUs we ask to be re-validated: https://gitlab.camras.nl/matthieu/boinc/-/blob/wrapper/26014/html/ops/revalidate.php
Joined: 16 Mar 10 · Posts: 213 · Credit: 109,115,063 · RAC: 29,767
> Can someone check on the leftover WUs that still need to be validated, and look at a couple things for me? I'd like to know what the following values are for these WUs:

Tom, I presume you're referring to the work units that are stalled because of a "Didn't need" result. In the simplest case (first result returned, second result marked "Didn't need"), I see the following:
Server state: Over
Outcome: Success
Client state: Done
Validate state: Checked, but no consensus yet
Those are the only fields I can see as an end user; I'm unsure what the "need_validate" value you mentioned is :-) ...
The "Didn't need" results have the following:
Server state: Over
Outcome: Didn't need
Client state: New
Validate state: Initial
Those settings look good for what you are trying to do. The PHP script you referenced in your follow-up post will ignore the "Didn't need" result (validate_state=0) and give the work unit a kick, and the transitioner should then do what you want! Trying to clear out some or all of these before the transitioner hits the "safety net" time on these work units (and kicks them all back to life at once?) is probably a good idea :-) So, good luck finding a search that works without confusing the database!
Cheers - Al.
P.S. I've just read your latest comments over in the Number crunching forum... I think that script may re-validate results for work units that already have a canonical result if your query simply selects work units with a "Didn't need" result. Even with my small total of valid NBody tasks, I've helped validate previously stalled tasks that unstalled when the original result returned something after the deadline, so I guess there'll be a lot more in the same situation!
Joined: 16 Mar 10 · Posts: 213 · Credit: 109,115,063 · RAC: 29,767
Tom, if you want a small sample of work units to try with that PHP script, the following consecutive work units are a random selection from those where I sent in the initial result and then got blocked. These are 4 of 47, so I'm not exactly inconvenienced by them being stuck, but I can offer feedback on what happens next... :-)
415099432 to 415099435
I've no doubt other users might chip in with much larger sets to sample from; I'd just like to help get this moving if I can!
Cheers - Al.
Joined: 8 May 09 · Posts: 3339 · Credit: 524,010,781 · RAC: 0
> Can someone check on the leftover WUs that still need to be validated, and look at a couple things for me? I'd like to know what the following values are for these WUs:

ALL of my tasks from the last 5 days that are waiting to be validated (not counting the ones from February last year that are just stuck forever) are _0 tasks sent out in the last 5 days. I have: In progress (1143) · Validation pending (10) · Validation inconclusive (1263) · Valid (5563). Most of the In progress tasks are _0 tasks as well, though I do have a smattering of _1 and _2 tasks and the rare _3. One thing I've wondered is whether there is a way to prioritize the _1 etc. tasks, say by giving them a shorter deadline to encourage them to be crunched more quickly, thus reducing the Validation inconclusive tasks others have. I think over time it would even out between who's first and who gets the validator task, especially since not every task needs one. Since there's no benefit to being first here, it really doesn't matter, except for getting the tasks into and back out of the database as quickly as possible while maintaining their integrity.
Joined: 13 Oct 21 · Posts: 44 · Credit: 227,500,762 · RAC: 21,057
Looks like we can finally expect to start seeing progress with all those N-Body inconclusives. I just got a batch of wingman tasks that were initially processed way back in mid-March but flagged as inconclusive; after the purge, the wingman tasks were labeled "Didn't need". Example: https://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=390676029. I've even validated others' tasks from back in March that were "Didn't need", for example https://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=390754804. The total number of N-Body tasks ready to send has jumped to almost 12 million, but that's OK: if the user base stays as large as it is now, it should take weeks, not months, to get through. It seems the only way to get everything validated is probably to process all of the original 14 million tasks that were created after the disk reconstruction. I suspected there might be consequences from cancelling all of those tasks. We're back to the original solution: Gotta Crunch 'Em All!
Joined: 5 Jul 11 · Posts: 990 · Credit: 376,143,149 · RAC: 0
If it's in the BOINC Pentathlon, they'll be gone in a week. I only know Universe is in it so far; I'm not sure what the other 4 disciplines are. Milkyway hasn't been in it since 2010, so I'm not hopeful.
Joined: 5 Oct 13 · Posts: 9 · Credit: 1,010,442,552 · RAC: 9,488
> Can someone check on the leftover WUs that still need to be validated, and look at a couple things for me? I'd like to know what the following values are for these WUs:

https://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=412567034
https://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=413057030
https://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=413674439
https://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=414290323
©2024 Astroinformatics Group