Welcome to MilkyWay@home

New Separation Runs

Tom Donlon
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 10 Apr 19
Posts: 408
Credit: 120,203,200
RAC: 0
Message 71697 - Posted: 9 Feb 2022, 22:10:18 UTC
Last modified: 10 Feb 2022, 4:28:35 UTC

Hey Everyone,

It's finally that time! The southern sky runs from before have converged, and it's time to start new ones while we dig through all that data.

I have taken down the following runs:

de_modfit_84_bundle4_4s_south4s_gapfix
de_modfit_84_bundle4_4s_south4s_gapfix_bgset2
de_modfit_84_bundle4_4s_south4s_gapfix_bgset3
de_modfit_85_bundle4_4s_south4s_gapfix
de_modfit_85_bundle4_4s_south4s_gapfix_bgset2
de_modfit_85_bundle4_4s_south4s_gapfix_bgset3


And I have put up the following runs:

de_modfit_70_bundle5_3s_south_pt2
de_modfit_71_bundle5_3s_south_pt2
de_modfit_72_bundle5_3s_south_pt2
de_modfit_73_bundle5_3s_south_pt2
de_modfit_74_bundle5_3s_south_pt2
de_modfit_75_bundle5_3s_south_pt2
de_modfit_76_bundle5_3s_south_pt2
de_modfit_77_bundle5_3s_south_pt2
de_modfit_78_bundle5_3s_south_pt2
de_modfit_79_bundle5_3s_south_pt2
de_modfit_80_bundle5_3s_south_pt2
de_modfit_81_bundle5_3s_south_pt2
de_modfit_82_bundle5_3s_south_pt2
de_modfit_83_bundle5_3s_south_pt2
de_modfit_84_bundle5_3s_south_pt2
de_modfit_85_bundle5_3s_south_pt2
de_modfit_86_bundle5_3s_south_pt2


The workunits from the old runs will continue to go out for a little while as results and validations trickle in to the server. You should see workunits from the old runs go away within a week or so. Please let me know if you have any issues with these runs. With any luck, the 7-size workunit problem should disappear.

Thanks for the continued support, and happy crunching!

-Tom

EDIT:

The above runs were taken down and replaced with these:

de_modfit_70_bundle5_3s_south_pt2_2
de_modfit_71_bundle5_3s_south_pt2_2
de_modfit_72_bundle5_3s_south_pt2_2
de_modfit_73_bundle5_3s_south_pt2_2
de_modfit_74_bundle5_3s_south_pt2_2
de_modfit_75_bundle5_3s_south_pt2_2
de_modfit_76_bundle5_3s_south_pt2_2
de_modfit_77_bundle5_3s_south_pt2_2
de_modfit_78_bundle5_3s_south_pt2_2
de_modfit_79_bundle5_3s_south_pt2_2
de_modfit_80_bundle5_3s_south_pt2_2
de_modfit_81_bundle5_3s_south_pt2_2
de_modfit_82_bundle5_3s_south_pt2_2
de_modfit_83_bundle5_3s_south_pt2_2
de_modfit_84_bundle5_3s_south_pt2_2
de_modfit_85_bundle5_3s_south_pt2_2
de_modfit_86_bundle5_3s_south_pt2_2

Tom Donlon
Message 71698 - Posted: 9 Feb 2022, 22:11:27 UTC
Last modified: 9 Feb 2022, 22:11:42 UTC

The new part of the sky that we're fitting looks like this:

[Image: map of the new region of sky being fit]

In this picture, the background stars are shown in blue/purple, and the black region is where we are fitting data. Bluer regions have more dust, and the purple regions have very little dust; dust is bad for fitting. We had to make some cuts in the stripes in order to avoid the dustiest regions, which is why there are some gaps in the data.

We are not fitting the bottom left part of the picture, because that's where we've already fit! (We are fitting a little bit of overlap as a sanity check, though.)

Tom Donlon
Message 71699 - Posted: 9 Feb 2022, 22:16:01 UTC
Last modified: 9 Feb 2022, 22:19:29 UTC

I've also tightened the validator tolerance after the loosened restrictions that we had from the last runs. Let me know if we start getting a lot of invalid workunits in case I need to loosen the validator tolerance again.
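
As a rough illustration of what tightening a validator tolerance means, here is a minimal sketch. It assumes the validator compares a numeric result from two hosts and accepts them when they agree within a relative tolerance; the function name `results_agree`, the sample values, and the tolerance values are all hypothetical and are not taken from the project's actual validator.

```python
import math

def results_agree(a: float, b: float, rel_tol: float) -> bool:
    """Return True when two results match within a relative tolerance."""
    return math.isclose(a, b, rel_tol=rel_tol)

# Two hypothetical likelihood values reported by different hosts.
host_a = -3.1234567890
host_b = -3.1234567999

# A loose tolerance accepts the pair; a tight one rejects it.
print(results_agree(host_a, host_b, rel_tol=1e-6))
print(results_agree(host_a, host_b, rel_tol=1e-12))
```

With a tighter tolerance, small floating-point differences between hosts (different GPUs, drivers, or CPU code paths) are more likely to be flagged as invalid, which is why a rise in invalid workunits would be the signal to loosen it again.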

Kator
Joined: 12 Mar 21
Posts: 1
Credit: 128,187,907
RAC: 0
Message 71700 - Posted: 9 Feb 2022, 23:15:10 UTC - in response to Message 71699.  

Hi, glad to see some update on progress. If possible, I would love to see them more often. Now I have started getting errors, all from the new bundle. I am running a Radeon HD 7970. I will wait a bit to see whether the restrictions are loosened. Otherwise I will have to stop for a while. I don't feel like buying a new graphics card at these prices.

Tom Donlon
Message 71701 - Posted: 9 Feb 2022, 23:56:37 UTC

Looks like there are some problems with the runs. I'm checking things out.

Mr P Hucker
Joined: 5 Jul 11
Posts: 990
Credit: 376,143,149
RAC: 0
Message 71702 - Posted: 9 Feb 2022, 23:59:35 UTC - in response to Message 71700.  

Hi, glad to see some update on progress. If possible, I would love to see them more often. Now I have started getting errors, all from the new bundle. I am running a Radeon HD 7970. I will wait a bit to see whether the restrictions are loosened. Otherwise I will have to stop for a while. I don't feel like buying a new graphics card at these prices.
I have the same card (4 of them); it's not you, it's them.

Keith Myers
Joined: 24 Jan 11
Posts: 715
Credit: 555,458,678
RAC: 38,779
Message 71703 - Posted: 10 Feb 2022, 0:15:23 UTC - in response to Message 71701.  

Yes, all the new stripe tasks are failing. It looks like invalid parameter sets cause the compiler to barf instantly.
All my hosts were forced into 3-hour backoffs. You should probably abort all the current improperly formatted tasks.

Tom Donlon
Message 71704 - Posted: 10 Feb 2022, 0:21:46 UTC

For whatever reason, all of the initial workunits were generated with a list of zeros as their initial parameters. This is not typical, and I'm looking into what caused it and how to fix it.

Mr P Hucker
Message 71705 - Posted: 10 Feb 2022, 0:48:13 UTC - in response to Message 71703.  

Yes, all the new stripe tasks are failing. It looks like invalid parameter sets cause the compiler to barf instantly.
All my hosts were forced into 3-hour backoffs. You should probably abort all the current improperly formatted tasks.
I keep nudging mine to get more. Should be cleared soon.

MindCrime
Joined: 5 Mar 14
Posts: 24
Credit: 501,232,884
RAC: 0
Message 71706 - Posted: 10 Feb 2022, 0:51:45 UTC

WU are erroring out 2 seconds in

Driver version: 460.32.03
Version: OpenCL 1.2 CUDA
Compute capability: 6.0
Max compute units: 56
Clock frequency: 1328 Mhz
Global mem size: 17071734784
Local mem size: 49152
Max const buf size: 65536
Double extension: cl_khr_fp64
Build log:
--------------------------------------------------------------------------------
<kernel>:183:72: warning: unknown attribute 'max_constant_size' ignored
__constant real* _ap_consts __attribute__((max_constant_size(18 * sizeof(real)))),
^
<kernel>:185:62: warning: unknown attribute 'max_constant_size' ignored
__constant SC* sc __attribute__((max_constant_size(NSTREAM * sizeof(SC)))),
^
<kernel>:186:67: warning: unknown attribute 'max_constant_size' ignored
__constant real* sg_dx __attribute__((max_constant_size(256 * sizeof(real)))),
^
<kernel>:235:26: error: use of undeclared identifier 'inf'
tmp = mad((real) Q_INV_SQR, z * z, tmp); /* (q_invsqr * z^2) + (x^2 + y^2) */
^
<built-in>:35:19: note: expanded from here
#define Q_INV_SQR inf
^

--------------------------------------------------------------------------------
clBuildProgram: Build failure (-11): CL_BUILD_PROGRAM_FAILURE
Error building program from source (-11): CL_BUILD_PROGRAM_FAILURE
Error creating integral program from source
Failed to calculate likelihood
Background Epsilon (22.750900) must be >= 0, <= 1
00:24:19 (3571): called boinc_finish(1)

</stderr_txt>
]]>
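
The `#define Q_INV_SQR inf` line in the build log is consistent with the constant being computed as 1/q² for q = 0 and then written into the kernel source as text (the CPU log later in this thread prints `q is 0.0`). The sketch below only illustrates that reading. It assumes Q_INV_SQR means 1/q² and that the host formats the value into a preprocessor definition; the helper names `q_inv_sqr` and `define_line` are hypothetical and not from the application's code.

```python
import math

def q_inv_sqr(q: float) -> float:
    """Return 1/q^2, giving infinity for q == 0 as IEEE-754 division would in C."""
    return math.inf if q == 0.0 else 1.0 / (q * q)

def define_line(name: str, value: float) -> str:
    """Hypothetical helper: format a constant as a preprocessor definition."""
    return f"#define {name} {value}"

# A zeroed parameter set yields a definition the OpenCL compiler cannot parse,
# because "inf" is not a numeric literal and is read as an identifier.
print(define_line("Q_INV_SQR", q_inv_sqr(0.0)))

# A nonzero q yields an ordinary floating-point literal.
print(define_line("Q_INV_SQR", q_inv_sqr(0.5)))
```

Under that assumption, the compile failure is a symptom of the zeroed parameters rather than a driver or hardware problem, which matches the failures appearing on every card at once.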

Tom Donlon
Message 71707 - Posted: 10 Feb 2022, 0:59:06 UTC

Thanks for the error messages. I've been digging around in workunits and from what I can tell it looks like the errors are caused by a parameter list of all zeros. I have no idea why the parameter sets initialized that way. I'm going to look into things on our end.

In the meantime, I have taken down these runs. It might be a day or so before I can get functional runs up, in which case it will be CPU-only crunching for a little while.

Thanks for your patience! Things never go smoothly with BOINC it seems :)

Tom Donlon
Message 71708 - Posted: 10 Feb 2022, 1:05:21 UTC

Although, based on the actual workunit xml_docs, the parameter sets aren't actually filled with all zeros... The plot thickens.

Mr P Hucker
Message 71709 - Posted: 10 Feb 2022, 1:05:58 UTC - in response to Message 71707.  

Thanks for the error messages. I've been digging around in workunits and from what I can tell it looks like the errors are caused by a parameter list of all zeros. I have no idea why the parameter sets initialized that way. I'm going to look into things on our end.

In the meantime, I have taken down these runs. It might be a day or so before I can get functional runs up, in which case it will be CPU-only crunching for a little while.

Thanks for your patience! Things never go smoothly with BOINC it seems :)
No problem Tom, we will process whatever you can manage to throw at us. We don't expect miracles or smooth sailing all the time, nothing is like that in life!

Mr P Hucker
Message 71710 - Posted: 10 Feb 2022, 1:08:06 UTC - in response to Message 71707.  
Last modified: 10 Feb 2022, 1:08:26 UTC

In the meantime, I have taken down these runs. It might be a day or so before I can get functional runs up, in which case it will be CPU-only crunching for a little while.
The 84 and 85 ones are OK; are those from the last run?

Keith Myers
Message 71711 - Posted: 10 Feb 2022, 1:55:08 UTC - in response to Message 71710.  

Yes, those are from the last runs, which he said have finished and converged, so there is no reason to repeat any of that work.
Plus, those have the never-fixed 7-bundle tasks that error out. I would just set NNT (No New Tasks) and wait for Tom to give us the go-ahead on the new work once the bad-data problem is figured out.

HRFMguy
Joined: 12 Nov 21
Posts: 236
Credit: 575,038,236
RAC: 0
Message 71712 - Posted: 10 Feb 2022, 2:02:58 UTC - in response to Message 71711.  

Yep, a bunch errored out in 2 seconds. There seems to be some vinegar over the Background Epsilon parameter: Background Epsilon (32.469300) must be >= 0, <= 1


Name: de_modfit_76_bundle5_3s_south_pt2_1643910122_7995307_0
Workunit: 351119902
Created: 10 Feb 2022, 0:45:45 UTC
Sent: 10 Feb 2022, 0:55:58 UTC
Report deadline: 22 Feb 2022, 0:55:58 UTC
Received: 10 Feb 2022, 0:59:41 UTC
Server state: Over
Outcome: Computation error
Client state: Compute error
Exit status: 1 (0x00000001) Unknown error code
Computer ID: 906873
Run time: 2 sec
CPU time:
Validate state: Invalid
Credit: 0.00
Device peak FLOPS: 5.13 GFLOPS
Application version: Milkyway@home Separation v1.46 windows_x86_64
Peak disk usage: 0.01 MB

+++++++
Stderr output

<core_client_version>7.16.20</core_client_version>
<![CDATA[
<message>
Incorrect function.
(0x1) - exit code 1 (0x1)</message>
<stderr_txt>
<search_application> milkyway_separation 1.46 Windows x86_64 double </search_application>
Reading preferences ended prematurely
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Switching to Parameter File 'astronomy_parameters.txt'
<number_WUs> 5 </number_WUs>
<number_params_per_WU> 20 </number_params_per_WU>
Using SSE4.1 path
q is 0.0
Integral 0 time = 0.000023 s
Failed to calculate integral 0
Failed to calculate likelihood
Background Epsilon (32.469300) must be >= 0, <= 1
18:58:10 (12228): called boinc_finish(1)

</stderr_txt>
]]>

Mr P Hucker
Message 71713 - Posted: 10 Feb 2022, 2:09:22 UTC - in response to Message 71711.  

Yes, those are from the last runs, which he said have finished and converged, so there is no reason to repeat any of that work.
Plus, those have the never-fixed 7-bundle tasks that error out. I would just set NNT (No New Tasks) and wait for Tom to give us the go-ahead on the new work once the bad-data problem is figured out.
I was helping empty the buffer, which server status now shows is nbody only. So you can set your CPUs off on that.

alanb1951
Joined: 16 Mar 10
Posts: 213
Credit: 108,363,048
RAC: 4,419
Message 71714 - Posted: 10 Feb 2022, 3:22:01 UTC

Tom,

I don't know why, but the <command_line> entry for these new tasks, as it appears in client_state.xml, has double quotes around the -f parameter, and they appear to be not ASCII double quotes but a matched pair of opening and closing (typographic) double quotes, almost as if someone edited a script with a word processor instead of a "raw" text editor.

Now, if the program tries to parse that as a command-line argument it could have a problem :-) -- might it then fail to absorb any of the command-line parameters at all?

I spotted this because my little script that looks for strange -np values (so I could abort 7-job tasks) couldn't parse the new command line - I modified it to look for ASCII double-quotes and it still couldn't parse it!

Hope this helps.

Cheers - Al
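
A minimal sketch of the kind of check described in the post above: find typographic double quotes in a command line and show how they change tokenization. The command line string and file name here are made up for illustration (only the `-f` and `-np` flag names come from the post), and this is not the script Al actually used.

```python
import shlex

# Typographic (curly) double quotes mapped to the ASCII double quote.
CURLY_QUOTES = {"\u201c": '"', "\u201d": '"'}

def find_curly_quotes(command_line: str) -> list[int]:
    """Return the positions of any curly double quotes in the string."""
    return [i for i, ch in enumerate(command_line) if ch in CURLY_QUOTES]

def normalize_quotes(command_line: str) -> str:
    """Replace curly double quotes with ASCII double quotes."""
    return command_line.translate(str.maketrans(CURLY_QUOTES))

# Hypothetical command line, not copied from a real workunit.
bad = "-f \u201cparameters.txt\u201d -np 20"

print(find_curly_quotes(bad))             # positions of the curly quotes
print(shlex.split(bad))                   # curly quotes stay inside the token
print(shlex.split(normalize_quotes(bad))) # ASCII quotes are consumed by the parser
```

A shell-style tokenizer treats only ASCII quotes as quoting characters, so the curly quotes end up as part of the argument value itself. A program that then tries to open or parse that value would likely fail, which fits the symptoms reported earlier in the thread.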

Tom Donlon
Message 71715 - Posted: 10 Feb 2022, 4:06:52 UTC

Thanks Al, I stepped away for a bit but that's almost definitely it. I copy-pasted the command from a text file I had saved to save time and I bet that that's exactly what happened.

I'll try putting a new set of runs up to see if that was the issue.

Tom Donlon
Message 71716 - Posted: 10 Feb 2022, 4:10:25 UTC

I've put up one run: de_modfit_70_bundle5_3s_south_pt2_2. Once we know for sure that it works I'll put the other stripes up too.
©2024 Astroinformatics Group