New Separation Runs
Author | Message |
---|---|
Joined: 10 Apr 19 Posts: 408 Credit: 120,203,200 RAC: 0 |
Hey Everyone,

It's finally that time! The southern sky runs from before have converged, and it's time to start new ones while we dig through all that data.

I have taken down the following runs:

de_modfit_84_bundle4_4s_south4s_gapfix
de_modfit_84_bundle4_4s_south4s_gapfix_bgset2
de_modfit_84_bundle4_4s_south4s_gapfix_bgset3
de_modfit_85_bundle4_4s_south4s_gapfix
de_modfit_85_bundle4_4s_south4s_gapfix_bgset2
de_modfit_85_bundle4_4s_south4s_gapfix_bgset3

And I have put up the following runs:

de_modfit_70_bundle5_3s_south_pt2
de_modfit_71_bundle5_3s_south_pt2
de_modfit_72_bundle5_3s_south_pt2
de_modfit_73_bundle5_3s_south_pt2
de_modfit_74_bundle5_3s_south_pt2
de_modfit_75_bundle5_3s_south_pt2
de_modfit_76_bundle5_3s_south_pt2
de_modfit_77_bundle5_3s_south_pt2
de_modfit_78_bundle5_3s_south_pt2
de_modfit_79_bundle5_3s_south_pt2
de_modfit_80_bundle5_3s_south_pt2
de_modfit_81_bundle5_3s_south_pt2
de_modfit_82_bundle5_3s_south_pt2
de_modfit_83_bundle5_3s_south_pt2
de_modfit_84_bundle5_3s_south_pt2
de_modfit_85_bundle5_3s_south_pt2
de_modfit_86_bundle5_3s_south_pt2

The workunits from the old runs will continue to go out for a little while as results and validations trickle in to the server. You should see workunits from the old runs go away within a week or so.

Please let me know if you have any issues with these runs. With any luck the 7-size workunit problem should disappear.

Thanks for the continued support, and happy crunching!

-Tom

EDIT: The above runs were taken down and replaced with these:

de_modfit_70_bundle5_3s_south_pt2_2
de_modfit_71_bundle5_3s_south_pt2_2
de_modfit_72_bundle5_3s_south_pt2_2
de_modfit_73_bundle5_3s_south_pt2_2
de_modfit_74_bundle5_3s_south_pt2_2
de_modfit_75_bundle5_3s_south_pt2_2
de_modfit_76_bundle5_3s_south_pt2_2
de_modfit_77_bundle5_3s_south_pt2_2
de_modfit_78_bundle5_3s_south_pt2_2
de_modfit_79_bundle5_3s_south_pt2_2
de_modfit_80_bundle5_3s_south_pt2_2
de_modfit_81_bundle5_3s_south_pt2_2
de_modfit_82_bundle5_3s_south_pt2_2
de_modfit_83_bundle5_3s_south_pt2_2
de_modfit_84_bundle5_3s_south_pt2_2
de_modfit_85_bundle5_3s_south_pt2_2
de_modfit_86_bundle5_3s_south_pt2_2 |
Joined: 10 Apr 19 Posts: 408 Credit: 120,203,200 RAC: 0 |
The new part of the sky that we're fitting looks like this:

[image]

In this picture, the background stars are shown in blue/purple, and the black region is where we are fitting data. Bluer regions have more dust, and the purple regions have very little dust; dust is bad for fitting. We had to make some cuts in the stripes in order to avoid the dustiest regions, which is why there are some gaps in the data. We are not fitting the bottom left part of the picture, because that's where we've already fit! (We are fitting a little bit of overlap as a sanity check, though.) |
Joined: 10 Apr 19 Posts: 408 Credit: 120,203,200 RAC: 0 |
I've also tightened the validator tolerance after the loosened restrictions that we had from the last runs. Let me know if we start getting a lot of invalid workunits in case I need to loosen the validator tolerance again. |
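To illustrate what tightening a validator tolerance means in practice, here is a minimal sketch that compares two reported results under a loose and a tight relative tolerance. The function name, the sample values, and both tolerance values are hypothetical; the thread does not say how the project's validator actually compares results or which tolerances it uses.

```python
import math


def results_agree(a: float, b: float, rel_tol: float) -> bool:
    """Return True when two reported values match within a relative tolerance."""
    return math.isclose(a, b, rel_tol=rel_tol)


# Hypothetical likelihoods reported by two hosts for the same workunit.
host_a = -3.123456789012
host_b = -3.123456789512

# Hypothetical tolerances: a loosened one and a tightened one.
LOOSE_TOL = 1e-6
TIGHT_TOL = 1e-12

print(results_agree(host_a, host_b, LOOSE_TOL))  # True
print(results_agree(host_a, host_b, TIGHT_TOL))  # False
```

The same pair of results validates under the loose setting and is rejected under the tight one, which is why a tighter tolerance can raise the number of invalid workunits across hosts with slightly different floating-point behavior.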
Joined: 12 Mar 21 Posts: 1 Credit: 128,187,907 RAC: 0 |
Hi, glad to see some update on progress. If possible I would love to see them more often. Now I've started getting errors, all from the new bundle. I am running a Radeon HD 7970. I will wait for a bit to see if the restrictions are loosened. Otherwise I will have to stop for a while. I don't feel like buying a new graphics card at these prices. |
Joined: 10 Apr 19 Posts: 408 Credit: 120,203,200 RAC: 0 |
Looks like there are some problems with the runs. I'm checking things out. |
Joined: 5 Jul 11 Posts: 990 Credit: 376,143,149 RAC: 0 |
"Hi, glad to see some update on progress. If possible I would love to see them more often. Now I've started getting errors, all from the new bundle. I am running a Radeon HD 7970. I will wait for a bit to see if the restrictions are loosened. Otherwise I will have to stop for a while. I don't feel like buying a new graphics card at these prices."

I have the same card (4 of them). It's not you, it's them. |
Joined: 24 Jan 11 Posts: 715 Credit: 555,459,757 RAC: 38,748 |
Yes, all the new stripe tasks are failing. Looks like invalid parameter sets cause the compiler to barf instantly. All my hosts forced into 3 hour backoffs. Probably should abort all the current improperly formatted tasks. |
Joined: 10 Apr 19 Posts: 408 Credit: 120,203,200 RAC: 0 |
For whatever reason, all of the initial workunits were generated with a list of zeros as their initial parameters. This is not typical, and I'm looking into what caused it and how to fix it. |
Joined: 5 Jul 11 Posts: 990 Credit: 376,143,149 RAC: 0 |
"Yes, all the new stripe tasks are failing. Looks like invalid parameter sets cause the compiler to barf instantly."

I keep nudging mine to get more. Should be cleared soon. |
Joined: 5 Mar 14 Posts: 24 Credit: 501,232,884 RAC: 0 |
WU are erroring out 2 seconds in

Driver version: 460.32.03
Version: OpenCL 1.2 CUDA
Compute capability: 6.0
Max compute units: 56
Clock frequency: 1328 Mhz
Global mem size: 17071734784
Local mem size: 49152
Max const buf size: 65536
Double extension: cl_khr_fp64
Build log:

--------------------------------------------------------------------------------

<kernel>:183:72: warning: unknown attribute 'max_constant_size' ignored
__constant real* _ap_consts __attribute__((max_constant_size(18 * sizeof(real)))),
^
<kernel>:185:62: warning: unknown attribute 'max_constant_size' ignored
__constant SC* sc __attribute__((max_constant_size(NSTREAM * sizeof(SC)))),
^
<kernel>:186:67: warning: unknown attribute 'max_constant_size' ignored
__constant real* sg_dx __attribute__((max_constant_size(256 * sizeof(real)))),
^
<kernel>:235:26: error: use of undeclared identifier 'inf'
tmp = mad((real) Q_INV_SQR, z * z, tmp); /* (q_invsqr * z^2) + (x^2 + y^2) */
^
<built-in>:35:19: note: expanded from here
#define Q_INV_SQR inf
^

--------------------------------------------------------------------------------

clBuildProgram: Build failure (-11): CL_BUILD_PROGRAM_FAILURE
Error building program from source (-11): CL_BUILD_PROGRAM_FAILURE
Error creating integral program from source
Failed to calculate likelihood
Background Epsilon (22.750900) must be >= 0, <= 1
00:24:19 (3571): called boinc_finish(1)

</stderr_txt>
]]> |
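The build log above fails on `#define Q_INV_SQR inf`. One plausible reading is that a value of q equal to zero made 1/q² infinite, and that the infinity was then pasted as text into the kernel's preprocessor definitions, where `inf` is not a known identifier. The sketch below reproduces that failure mode in Python. Only the name `Q_INV_SQR` comes from the log; the helper functions are hypothetical and are not the application's real code.

```python
import math


def inv_sqr(q: float) -> float:
    """Return 1/q**2, yielding inf for q == 0 as IEEE-754 division in C would."""
    return math.inf if q == 0.0 else 1.0 / (q * q)


def define_line(name: str, value: float) -> str:
    """Naively format a preprocessor definition from a float (hypothetical helper)."""
    return f"#define {name} {value}"


def checked_define_line(name: str, value: float) -> str:
    """Refuse to emit a definition whose value is not a finite number."""
    if not math.isfinite(value):
        raise ValueError(f"{name} is not finite ({value}); check the input parameters")
    return define_line(name, value)


print(define_line("Q_INV_SQR", inv_sqr(0.0)))  # prints: #define Q_INV_SQR inf
print(define_line("Q_INV_SQR", inv_sqr(0.5)))  # prints: #define Q_INV_SQR 4.0
```

A check such as `checked_define_line` would surface the bad parameter before the OpenCL compiler does, which gives a clearer error than an undeclared-identifier message.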
Joined: 10 Apr 19 Posts: 408 Credit: 120,203,200 RAC: 0 |
Thanks for the error messages. I've been digging around in workunits and from what I can tell it looks like the errors are caused by a parameter list of all zeros. I have no idea why the parameter sets initialized that way. I'm going to look into things on our end. In the meantime, I have taken down these runs. It might be a day or so before I can get functional runs up, in which case it will be CPU-only crunching for a little while. Thanks for your patience! Things never go smoothly with BOINC it seems :) |
Joined: 10 Apr 19 Posts: 408 Credit: 120,203,200 RAC: 0 |
Although, judging by the actual workunit xml_docs, the parameter sets aren't actually filled with all zeros... The plot thickens. |
Joined: 5 Jul 11 Posts: 990 Credit: 376,143,149 RAC: 0 |
"Thanks for the error messages. I've been digging around in workunits and from what I can tell it looks like the errors are caused by a parameter list of all zeros. I have no idea why the parameter sets initialized that way. I'm going to look into things on our end."

No problem Tom, we will process whatever you can manage to throw at us. We don't expect miracles or smooth sailing all the time, nothing is like that in life! |
Joined: 5 Jul 11 Posts: 990 Credit: 376,143,149 RAC: 0 |
"In the meantime, I have taken down these runs. It might be a day or so before I can get functional runs up, in which case it will be CPU-only crunching for a little while."

The 84 and 85 ones are ok, are those from the last run? |
Joined: 24 Jan 11 Posts: 715 Credit: 555,459,757 RAC: 38,748 |
Yes, those are from the last runs, which he said have completed and converged. So there's no reason to repeat any of that work. Plus, those include the never-fixed 7-bundle tasks that error out. I would just set NNT (No New Tasks) and wait for Tom to give us the go-ahead with the new work once the bad data problem is figured out. |
Joined: 12 Nov 21 Posts: 236 Credit: 575,038,236 RAC: 0 |
Yep. A bunch errored out in 2 seconds. There seems to be some vinegar over the Background Epsilon parameter.

Background Epsilon (32.469300) must be >= 0, <= 1

Name de_modfit_76_bundle5_3s_south_pt2_1643910122_7995307_0
Workunit 351119902
Created 10 Feb 2022, 0:45:45 UTC
Sent 10 Feb 2022, 0:55:58 UTC
Report deadline 22 Feb 2022, 0:55:58 UTC
Received 10 Feb 2022, 0:59:41 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status 1 (0x00000001) Unknown error code
Computer ID 906873
Run time 2 sec
CPU time
Validate state Invalid
Credit 0.00
Device peak FLOPS 5.13 GFLOPS
Application version Milkyway@home Separation v1.46 windows_x86_64
Peak disk usage 0.01 MB

+++++++

Stderr output

<core_client_version>7.16.20</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)</message>
<stderr_txt>
<search_application> milkyway_separation 1.46 Windows x86_64 double </search_application>
Reading preferences ended prematurely
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Switching to Parameter File 'astronomy_parameters.txt'
<number_WUs> 5 </number_WUs>
<number_params_per_WU> 20 </number_params_per_WU>
Using SSE4.1 path
q is 0.0
Integral 0 time = 0.000023 s
Failed to calculate integral 0
Failed to calculate likelihood
Background Epsilon (32.469300) must be >= 0, <= 1
18:58:10 (12228): called boinc_finish(1)

</stderr_txt>
]]> |
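When many tasks fail with the same message, it can help to pull the offending value out of each stderr dump automatically. The following is a rough sketch that matches the "Background Epsilon" message exactly as it appears in the logs in this thread and repeats the range check the message implies. The function names are hypothetical, and the range check only mirrors the wording of the error; it is not the application's actual validation code.

```python
import re

# Matches the message format seen in the stderr output quoted in this thread.
EPSILON_ERROR = re.compile(r"Background Epsilon \(([-+0-9.eE]+)\) must be >= 0, <= 1")


def epsilon_errors(stderr_text: str) -> list:
    """Return every out-of-range Background Epsilon value reported in a stderr dump."""
    return [float(value) for value in EPSILON_ERROR.findall(stderr_text)]


def epsilon_in_range(value: float) -> bool:
    """Repeat the check implied by the error message: 0 <= epsilon <= 1."""
    return 0.0 <= value <= 1.0


sample = (
    "Failed to calculate integral 0\n"
    "Failed to calculate likelihood\n"
    "Background Epsilon (32.469300) must be >= 0, <= 1\n"
)

for value in epsilon_errors(sample):
    print(value, epsilon_in_range(value))  # prints: 32.4693 False
```

Collecting these values across many failed tasks shows whether every task carries the same bad value or a different one each time, which narrows down where the parameters went wrong.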
Joined: 5 Jul 11 Posts: 990 Credit: 376,143,149 RAC: 0 |
"Yes, those are from the last runs, which he said have completed and converged. So there's no reason to repeat any of that work."

I was helping empty the buffer, which server status now shows is nbody only. So you can set your CPUs off on that. |
Joined: 16 Mar 10 Posts: 213 Credit: 108,363,048 RAC: 4,419 |
Tom,

I don't know why, but the <command_line> entry for these new tasks, as it appears in client_state.xml, has double quotes around the -f parameter, and they appear to be not the ASCII double quote but a matched pair of opening and closing double quotes (almost as if someone edited a script with a word processor instead of a "raw" text editor...)

Now, if the program tries to parse that as a command-line argument it could have a problem :-) -- might it then fail to absorb any of the command-line parameters at all?

I spotted this because my little script that looks for strange -np values (so I could abort 7-job tasks) couldn't parse the new command line. I modified it to look for ASCII double quotes and it still couldn't parse it!

Hope this helps.

Cheers - Al |
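The quoting problem described above is easy to reproduce. This is a minimal sketch, not Al's actual script: it detects typographic ("curly") quotes in a command line, shows how a shell-style tokenizer leaves them glued to the argument, and extracts the `-np` value after normalizing them. The command-line string and the file name in it are invented for illustration; only the `-f` and `-np` flags are mentioned in the thread.

```python
import shlex
from typing import Optional

# Typographic quotes that word processors substitute for ASCII quotes.
CURLY_QUOTES = {"\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'"}


def has_curly_quotes(command_line: str) -> bool:
    """Return True if the command line contains any typographic quote."""
    return any(ch in command_line for ch in CURLY_QUOTES)


def normalize_quotes(command_line: str) -> str:
    """Replace typographic quotes with their ASCII equivalents."""
    return command_line.translate(str.maketrans(CURLY_QUOTES))


def np_value(command_line: str) -> Optional[int]:
    """Return the integer following -np, or None if the flag is absent."""
    tokens = shlex.split(normalize_quotes(command_line))
    for flag, value in zip(tokens, tokens[1:]):
        if flag == "-np":
            return int(value)
    return None


# Hypothetical command line with curly quotes around the -f argument.
bad = "-f \u201cexample_params.txt\u201d -np 20"

print(has_curly_quotes(bad))                 # True
print(shlex.split(bad)[1])                   # quotes stay part of the file name
print(shlex.split(normalize_quotes(bad))[1])  # example_params.txt
print(np_value(bad))                         # 20
```

Because curly quotes are ordinary characters to a tokenizer, the `-f` argument keeps them as part of the file name, so a program looking for that file would not find it. That is consistent with the symptoms reported in this thread, although the thread itself only confirms the quotes, not the exact parsing path.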
Joined: 10 Apr 19 Posts: 408 Credit: 120,203,200 RAC: 0 |
Thanks Al, I stepped away for a bit but that's almost definitely it. To save time, I copy-pasted the command from a text file I had saved, and I bet that's exactly what happened. I'll try putting a new set of runs up to see if that was the issue. |
Joined: 10 Apr 19 Posts: 408 Credit: 120,203,200 RAC: 0 |
I've put up one run: de_modfit_70_bundle5_3s_south_pt2_2. Once we know for sure that it works I'll put the other stripes up too. |
©2024 Astroinformatics Group