Posts by alanb1951

1) Message boards : Number crunching : Monitoring of Invalid results on separation run de_modfit_84_bundle4_4s_south4s_1 (Message 68855)
Posted 13 days ago by alanb1951
Post:
From my ever-growing list of invalids, it seems that Linux-based clients are failing, including Mac clients. Only Windows-based clients seem to be succeeding.

This matches what I saw with the south4s_0 tasks that were failing towards the end of that batch, though I also observed some NVIDIA GPU jobs flagging invalid on Windows (and as I don't run a huge number of tasks per day I had a fairly small sample to look at...)

(And on that note, I've not seen any invalid south4s_1 jobs yet - as I said, small sample...)

As I don't run any MilkyWay CPU jobs, I don't know whether it applies across both CPU and GPU tasks! It would be interesting to know if Linux/Mac CPU jobs get flagged invalid too - it would constitute another diagnostic point!

If Tom is going to suss this out, he will need to know things like "CPU versus GPU", operating system, and GPU type (and driver version). It could be a compilation issue (different compilers for different platforms, code generated with different rounding options, different execution sequences causing rounding differences, et cetera), and in the case of GPUs it could be a matter of whether the GPUs are capable of particular rounding modes and, if so, whether all GPU kernels use the same control parameters... Also, of course, if there are any random numbers used in the processing, it could simply be the butterfly effect!
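
(For anyone wondering how "the same" calculation can differ at all, here's a toy Python sketch of my own - nothing to do with the actual MilkyWay code. IEEE-754 addition isn't associative, so a build that accumulates the same numbers in a different order - a different compiler, a reordered reduction, a GPU work-group doing partial sums - can legitimately produce a slightly different total.)

import random

random.seed(42)
values = [random.uniform(-1.0, 1.0) * 10.0 ** random.randint(-8, 8)
          for _ in range(100000)]

forward = sum(values)                    # accumulate left to right
backward = sum(reversed(values))         # same numbers, opposite order
chunked = sum(sum(values[i:i + 256])     # 256-element "work-group style" partial sums
              for i in range(0, len(values), 256))

print("forward :", repr(forward))
print("backward:", repr(backward))
print("chunked :", repr(chunked))
# The three totals typically differ in their trailing digits - the same kind
# of last-decimal disagreement being discussed here.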
2) Message boards : Number crunching : Large surge of Invalid results and Validate errors on ALL machines (Message 68800)
Posted 27 days ago by alanb1951
Post:
Al,

Thank you for your reply, it helps a lot. From what others had been saying it sounded like it was just one subtask (de_modfit_84_xxx) that was being consistently invalidated, so I thought maybe there was a problem with that specific task. From your data it looks like there are mismatches across all workunits and machines.

I think you might be correct in your analysis that we may just be reaching the optimization point. We know that the likelihood surface is rather volatile and sensitive to small changes when close to optimization. Perhaps this is a computer precision issue, which I thought had been resolved before I joined the project, but maybe not. I'll look into this some more.

- Tom


Tom,

Firstly, to eliminate any possible confusion: I've only seen errors for de_modfit_84 tasks, but the mismatches don't seem to have a pattern (e.g. the same GPU type on the same O/S might produce discrepancies, though usually pairs of "identical" Windows jobs seem to validate if anything does...)

I think I can safely eliminate the command-line length now, as I've been watching all the tasks I've processed over the last 24 hours and I've got validated de_modfit_81/2/3 tasks with more characters (over 890) in their parameter lists than some of the de_modfit_84 ones that have been failing out (as few as 880).

Also, I'd only been looking at the parts of the invalid tasks that showed big differences (always that third stream-only likelihood and the search likelihood) and hadn't been paying an awful lot of attention to the actual integrals. When I paid a bit more attention to the stream integral values, I noticed that every sub-task I had looked at that showed this problem had a third stream integral of zero (or very close to zero) - if that is a characteristic of de_modfit_84 reaching the optimization point, perhaps there are precision problems down in the small numbers.
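
(A back-of-the-envelope illustration of why that would matter - a toy Python sketch, not the real separation code. I'm assuming a natural log of the stream integral feeds the stream-only likelihood, which the -227-ish values rather suggest, since that's roughly where ln() of a number around 1e-98 or 1e-99 lands.)

import math

tiny_a = 1.0e-99            # a "nearly zero" stream integral from one host
tiny_b = tiny_a * 1.2       # the "same" integral from another host, 20% higher

print(math.log(tiny_a))     # about -227.96
print(math.log(tiny_b))     # about -227.77 - an absolute shift of ln(1.2), roughly 0.18

try:
    print(math.log(0.0))    # an integral of exactly zero can't even be logged
except ValueError as err:
    print("log(0.0):", err)

# A normal-sized integral carries a relative rounding error of around 1e-15, so
# its log agrees to many decimal places across hosts; an integral that is the
# near-cancelling sum of much larger contributions can easily carry a
# percent-level relative error, and the log turns that into likelihood
# differences of the size being reported here.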

None of the non-84 tasks seem to have such small integral values or such large stream-only likelihood values, and none of those seem to be failing to validate.

I seem to recall there were a lot of validation errors around the time of the server upgrade too, and I've just found a few results I scraped from that time. Where there was a discrepancy then, it was typically in the fourth integral and likelihood, not the third; I've just been looking at one where four out of five sub-tasks matched nicely but the last one had fourth stream integral zero and my result showed a -227.xxxx value whilst another reported Not-A-Number... Another work-unit from back then had all five sub-tasks showing that fourth integral as zero, and all the likelihoods were around -227.xxx (with two Win ATI tasks validating, and a Win NVIDIA task and my Linux NVIDIA task going invalid.)

So it has happened before. Unfortunately, I can't tell what the task names were, as it doesn't seem to get written to the log, though I do have the task numbers if they're of any use.

I'm not sure how you might be able to resolve this (and I have to confess I've only ever looked at the source code to try to work out why it was having parameter issues [that problem with old clients]). That said, if there's anything I can do to help (even if it's just looking at results like I have been doing!) let me know; what's more, I'm sure there are others here who will be equally willing to pitch in (and some of them process far more work units a day than I do!)

Good luck - Al.

[Edited to fix the typos I've spotted :-) and to include (limited) information about earlier failures]
3) Message boards : Number crunching : Large surge of Invalid results and Validate errors on ALL machines (Message 68795)
Posted 28 days ago by alanb1951
Post:
There's a possibility that the command line is being barely overflowed by the de_modfit_84_xxxx workunits. When we release runs we estimate the number of characters that the program will use in a typical command, then divide the total number of characters that can go in a command line by that estimate. This is why when we bundled 5 workunits it invalidated many workunits, but nobody had problems with 4 bundled workunits for these runs (until now). We might have reached some strange point in the optimization where the command line is being just barely overflowed for the 84th stripe (why results are off by only a couple decimal places).

I will be taking these runs down soon (they're fairly optimized by this point), which will solve any problems we are having at the moment. In the future I will bundle fewer workunits together (expect quicker runtimes and a corresponding drop in credits per bundle) and see if that resolves the issue.

My goal is to be as quick and transparent with these issues as possible. Thank you for your help debugging and your continued support.

- Tom


Tom,

I'm confused... I thought the command-line parameters were in sets (of 26 in this case) per sub-task (first 26 for sub-task "0", next 26 for "1", and so on). If so, a command-line issue doesn't seem to explain why it isn't always the last sub-task that has the result mismatches.
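
(To spell out the assumption behind that - and this is only my mental model of the argument handling, not taken from the actual code - here's a toy Python sketch of why a truncated parameter list should only ever damage the final sub-task.)

PARAMS_PER_SUBTASK = 26

def split_params(params):
    # consume the flat parameter list in consecutive 26-value groups
    return [params[i:i + PARAMS_PER_SUBTASK]
            for i in range(0, len(params), PARAMS_PER_SUBTASK)]

full = ["p%d" % i for i in range(4 * PARAMS_PER_SUBTASK)]   # 4 bundled sub-tasks
cut = full[:-3]                                             # pretend the tail was truncated

print([len(group) for group in split_params(full)])   # [26, 26, 26, 26]
print([len(group) for group in split_params(cut)])    # [26, 26, 26, 23] - only the last group suffers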

As an example, consider this work unit (1762004863) which I found in my "Validation Inconclusive" group. I noted the values that were subject to significant variation in the table below:

Task name de_modfit_84_bundle4_4s_south4s_0_1556550902_9326578
==============================================================

Results for almost every field were in agreement to almost every digit, except the third item in the <stream_only_likelihood> section and (as a result) the <search_likelihood> value.

Third items in stream_only_likelihood for workunit 1762004863

 Task #                   sub-task 0           sub-task 1           sub-task 2           sub-task 3
228606523 (linux_nv) -227.210679809332987 -226.058958276256107 -226.637965999408493 -225.448975231683107
228734898 (win_ati)  -227.031087670723000 -225.861406271732960 -226.378959825809260 -224.653666491325540
228769453 (win_ati)  -227.031087670723000 -225.861406271732960 -226.378959825809260 -225.219521513209800
228880978 (mac_cpu)  -227.366355524707785 -226.327009414887982 -226.773215523370538 -225.661657335596914
229879322 (win_nv)   -227.031087670723000 -225.861406271732960 -226.378959825809260 -224.906795326118360
230050421 (win_cpu)  -227.210679809332990 -226.149044739582390 -226.378959825809260 -225.331140445405400

Search likelihoods
			  sub-task 0           sub-task 1           sub-task 2           sub-task 3
228606523 (linux_nv)   -2.701811909377543   -2.697248226571820   -2.698127277409402   -2.699543526455799
228734898 (win_ati)    -2.700467703517479   -2.696294728154231   -2.696355810233853   -2.693811168003535
228769453 (win_ati)    -2.700467703517479   -2.696294728154231   -2.696355810233853   -2.698124073672830
228880978 (mac_cpu)    -2.702729910785179   -2.699130448070281   -2.699309267449726   -2.700545829750069
229879322 (win_nv)     -2.700467703517479   -2.696294728154231   -2.696355810233853   -2.695800028583053
230050421 (win_cpu)    -2.701811909377544   -2.697998681670587   -2.696355810233853   -2.698904228937406

All of the above were on client 7.14.2 except the Windows CPU one (7.12.1). Both Windows 7 and Windows 10 were in evidence.

Note how the Linux NVIDIA one (mine!) doesn't agree with ANY of the others except the Windows CPU one, where it agrees on sub-task 0 only.
The Mac CPU job doesn't agree with any of the others at all.
The three Windows GPU jobs agree on all but sub-task 3 (on which nobody agrees!)
The Windows CPU job agrees with mine on sub-task 0 and with the Windows GPU jobs on sub-task 2. No agreement with anyone on sub-task 1 or 3.
There's another Windows CPU job out there but IT isn't likely to resolve anything unless it agrees wholeheartedly with the other Windows CPU job...

As I said, I'd've thought that if this was a minor command-line issue the errors would only manifest on the last sub-task, but maybe it doesn't use the parameters the way I think it does, so I'm willing to be put right about that!

I have several tasks from the offending group on my machine at the moment, and their command line parameter lists have between 880 and 920 characters (so shouldn't cause any problems, I'd've thought) - I'll keep an eye on these when they run and see how they do...

If it isn't a command-line issue causing the problems, it would be a shame to shorten the parameter lists and hence increase the number of workunits - it rather defeats the original purpose of batching the work, after all :-). And I do wonder if these errors have only started to show up when the batch in question is getting close to a finish, in which case perhaps it's just a part of getting near the boundaries of what's computable (the butterfly effect?)

Hoping the above helps in some way, and thanking you for your efforts - Al.

[Edited for typos]
4) Message boards : Number crunching : "New" validate errors? (Message 68771)
Posted 22 May 2019 by alanb1951
Post:
Uli, this is also being discussed in the thread "Large surge of Invalid results and Validate errors on ALL machines" above, where there are one or two observations about the nature of the invalid results.

If your invalid results show a different pattern to the one I described there, post about it - it might help if/when someone at the other end looks into this!

Cheers - Al.
5) Message boards : Number crunching : Large surge of Invalid results and Validate errors on ALL machines (Message 68770)
Posted 22 May 2019 by alanb1951
Post:
=================IN CONCLUSION FOR WHAT IT IS WORTH===========
Results 3 & 4 above are identical to all decimal digits, but only the last one is valid.
Results 1 & 2 differ only at the 12th or 13th decimal digit, but only the first one is valid.

Since there seem to be 4 "work units" in each "work unit", maybe there is additional testing at the server end when the result arrives.



Not quite, I'm afraid... If you look a little earlier in the two invalid results, you'll discover significant discrepancies in the third stream_only_likelihood values as indicated below (I've only cited one of the validated results...)

task 224802410, nvidia 1080TI VALID ===================

<stream_only_likelihood> -4.108396938608555 -3.238645743416023 -224.702400078412690 -59.273665947044705 </stream_only_likelihood>
<search_likelihood> -2.699214438485444 </search_likelihood>
...
<stream_only_likelihood1> -3.569118173756952 -3.226581525684845 -1.#IND00000000000 -88.910516593562207 </stream_only_likelihood1>
<search_likelihood1> -999.000000000000000 </search_likelihood1>

task 224932122 ATI RX560 INVALID ====================

<stream_only_likelihood1> -3.569118173756952 -3.226581525684845 -223.633156025982856 -88.910516593562207 </stream_only_likelihood1>
<search_likelihood1> -2.696904747021611 </search_likelihood1>

task 224990755 ATI S9000 INVALID ======================

<stream_only_likelihood> -4.108396938608555 -3.238645743416023 -224.852854109725940 -59.273665947044705 </stream_only_likelihood>
<search_likelihood> -2.700466143836502 </search_likelihood>

That is consistent with what I've been finding in my recent cluster of invalid results -- it always seems to be a significant difference in that third stream_only_likelihood value; in some cases (as in one of the above), one result has been recognized as non-finite and another hasn't, whilst in others that value is typically up around the -227 to -230 level and differs significantly.
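
(I have no idea what the project's validator actually does, but as a generic illustration of why that matters: once one copy of a value has gone non-finite, no amount of tolerance will let it match a finite wing-man, whereas ordinary last-digit noise still can. A quick Python sketch, using numbers from the results quoted above:)

import math

def roughly_equal(a, b, rel_tol=1e-6):
    # a stand-in comparison, NOT MilkyWay's real validation rule
    if not (math.isfinite(a) and math.isfinite(b)):
        return False                         # NaN / inf can never match anything
    return math.isclose(a, b, rel_tol=rel_tol)

print(roughly_equal(-2.699214438485444, -2.699214438485445))       # True  - last-digit noise
print(roughly_equal(-224.702400078412690, -224.852854109725940))   # False - a real divergence
print(roughly_equal(float("nan"), -223.633156025982856))           # False - the -1.#IND case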

I suspect we have data and parameters that are producing results on the edge of what can be calculated; some of them diverge and not all GPUs diverge at quite the same rate, possibly because of different chunk sizes in use on cards with different amounts of available memory(?) If that is the case, I have no idea whether anything can be done about it :-(

Cheers - Al.
6) Message boards : Number crunching : NoContraintsWithDisk200 vs. south4s (or whatever) (Message 68545)
Posted 16 Apr 2019 by alanb1951
Post:
Just a thought here: I suspect most (if not all) of the people having the "instant fail" trouble with the new work units might be running older versions of BOINC, which have a maximum command-line length of about 1024 characters. There have been other tasks in the past that had lots of parameters and overflowed that limit, though I'm not sure it produced this error.

In particular, it appears the OP is using 7.2.42 (which certainly truncated command lines for one of the earlier groups of tasks with extra parameters!)

So if nothing else seems to work it might be worth trying a newer BOINC client if one is available for your distribution. I think 7.8 didn't have this problem, and I know 7.14.2 doesn't because that's what I use and it runs these new tasks quite happily.

As I said, just a thought...

Good luck - Al.
7) Message boards : Number crunching : New Linux system trashes all tasks (Message 67964)
Posted 26 Dec 2018 by alanb1951
Post:
Keith,

The major difference between the two job types is that one does 6 tasks with 14 parameters and the other does 4 tasks with 26 parameters.

I'm guessing here, but I suspect the reason you're having problems with the older BOINC client might be the extra parameters! The command line it has to construct contains the path to the executable (a 96-character relative path on my system) and the parameters as given by the <command_line> element of the <workunit> section in client_state.xml (which is 861 characters for one of the 26-parameter tasks I've just looked at!). The total command-line length for that job would be nearly 960 characters, and that may not be the longest it could be...

The parameters seem to be free-format (in that they don't have a fixed number of decimal places) but typically have 6 or 7 digits and a decimal point. Some also have a minus sign. There are a few with fewer than 6 digits, but not many in the examples I looked at.

So I'm wondering if there's an issue with the maximum command line length that the older client can handle, and perhaps these jobs trigger that problem?
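
(If anyone wants to check their own machine, something along these lines will do - a rough Python sketch, not a polished tool; the 96-character path length is just what I happen to see here, so adjust to taste. It shows how close each <command_line> gets to a 1024-character limit once the executable path is added:)

import re

APP_PATH_LEN = 96     # length of the relative path to the MW executable on my host

with open("client_state.xml", encoding="utf-8") as f:
    state = f.read()

for cmd in re.findall(r"<command_line>(.*?)</command_line>", state, re.S):
    total = APP_PATH_LEN + 1 + len(cmd.strip())    # +1 for the space between path and arguments
    marker = "   <-- uncomfortably close to 1024" if total > 1000 else ""
    print("%5d characters%s" % (total, marker))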

As I said, just guessing!

Cheers - Al.
8) Message boards : News : Database Maintenance 9-4-2014 (Message 67802)
Posted 8 Sep 2018 by alanb1951
Post:
Hey Everyone,

Just wanted to explain the validation inconclusive rates after this maintenance, and most maintenance. This is not an issue with how quickly our database can handle the workunits, but with how quickly our users can cross-validate workunits. As many of you know, we can require up to 4 other users to cross-validate workunits before they are considered valid.

At any given time, we have 300,000+ workunits being calculated by volunteers. When we take the server down for maintenance, many of these are completed while we are down. They are then all sent back to us at once to be validated, so we end up with a queue of 300,000 workunits or more that have to be validated by other volunteers to catch up. This doesn't require much work for our server or database, but it does take a long time for users to work through them all.

Sorry that there is such a backlog for validation.

Jake


So you are saying you don't send out the same WU to two computers at the same time, but instead wait for someone to return it first before sending it out for validation by a second PC? That's contrary to what I was seeing prior to the maintenance, as on many days I never saw an unsent WU listed in my list of workunits waiting for validation.


I've been seeing this "We've sent one out, it's replied, now we'll send another one" behaviour for a long time. If you check the "Sent" time of the second task sent, you'll see that it will be anything from a few minutes to a lot longer from when your task was returned. It's been like that for quite a while now (or, at least, it has for my tasks...)

This is not just a recent occurrence. I think it dates back to a time when it was possible to look at a result and decide it was within a given range (possibly a near-exact match for a prior from the same start-point?) -- if it was, no wing-man would be called upon.

I'm unsure whether this stopped when we started getting batched work-units or earlier, and it could be that it's no longer possible. If it will always need at least one wing-man now, then (as you imply) the server should probably be re-configured to send out two at the beginning!...

Perhaps Jake can clarify this for us??? Inquiring minds want to know!

Cheers - Al.
9) Message boards : Number crunching : About 75% of my GPU finishes in "Validation inconclusive " !!! (Message 67705)
Posted 10 Aug 2018 by alanb1951
Post:
There's nothing wrong, or, at least, nothing wrong the way I see it :-) ...

Your tasks will sit at "Validation Inconclusive" until a wing-man's task completes and matches your result; as at 03:15 UTC on 10th August, you had 30 tasks in that category and they were all awaiting a wing-man!

It can be a bit frustrating if your wing-man is using a CPU rather than a GPU, especially if that machine has a large queue of work to do; however, as you also seem to have 177 recent Valid results and no Invalid or Error results, I don't think you really have anything to worry about!

(And bear in mind that some other users might be waiting for you to return a result before their work gets validated!)

Cheers - Al.
10) Message boards : Number crunching : Compiling MilkyWay@home (Message 67643)
Posted 2 Jul 2018 by alanb1951
Post:
As no-one else has offered anything, a thought or two. (If you've already been through all this, my apologies.)

Whilst I've never tried building a MilkyWay program, especially not a multi-threaded one(!), I can't help wondering whether you're having problems because you've built an MT version and it's only getting to use one core...

Now, I don't know whether that's a build issue, a problem with the way your version is being fired up by BOINC or something else, but if I'd been doing the builds that's what I'd be looking at.

By the way, there is also a non-MT version; if its source is also available it might be worth fetching and building that, then seeing if it also gives Invalids -- that might help you discover whether it's a build issue or something else...

Unfortunately, I have no idea as to what you might do to sort out a build issue; as I said, I've never tried building the NBody application...

Good luck getting it sorted.

P.S. Nice system!
11) Message boards : Number crunching : New Linux system trashes all tasks (Message 67524)
Posted 21 May 2018 by alanb1951
Post:
Thanks for the reply. Good to know where the problem lies. Now to determine how to fix it. It would be best to know just where in the BOINC code the problem occurs.

Do you know the bug# in the BOINC codebase by chance? If I could determine which module contains the bug, I could have the 7.4.44 developer pull the updated module into the build list so he could build a newer version of 7.4.44 which does not have the error.

I do not have the skills to do it myself. My attempt to build BOINC 7.8.3 ended in failure after many weeks, so I just resigned myself to using someone else's 7.4.44 build.

The snippet of code you provided is, I believe, from the MW application and not from BOINC, so I can't use the code tracker function on the BOINC GitHub to determine the problem module.


Keith,

Yes, the code snippet was indeed from the MW code!

As for where the error was in 7.4.44, I've no idea - I will, however, observe that some of the people reporting that problem here were on 7.2.something rather than 7.4.something, so it wasn't just happening for one version. As I had never been bitten by this problem myself, I didn't research it any further than working out what the crash was, especially as people seemed to find that moving up to a 7.6 BOINC fixed it...

(It acted as if it was a timing or file pick-up issue, with the application trying to access a data file that hadn't actually been put in place yet [but which should have been put in place by BOINC!] - if I had been going code-diving that's where I would've started, anyway...)

I'm not sure what version of Linux you're using (though I note it's a fairly current [Ryzen-friendly?] kernel.) I presume it's not based on Debian, because if it were you'd be able to get a viable BOINC from the repositories to see if it fixed the issue (though you would probably have to put up with it installing where it wants to and getting used to using sudo if you wanted to tune the configuration for your SETI usage!)

I don't know what the situation is for RedHat-based systems and I guess it's "not good" for some of the others - one often sees "I have to run this version of the client because it's all that's available for my distro" messages :-(

Sorry I can't be of more help; the BOINC code is above my C++ competence level, I suspect! I just hope you can find a pre-built client that resolves this particular issue. (I noticed you've a post in another thread about [different?] issues with 7.8.3 on Windows - perhaps your machines are just too powerful :-) ...)

Good luck finding a fix - Al

P.S. FWIW, I'm happily running a GTX 1050Ti for MW, Einstein and SETI on XUbuntu 16.04 with client 7.6.31 on an i7-7700K; I get the occasional Invalid at Einstein but I don't think I've ever noticed an OpenCL build error that wasn't a product of a corrupt data file...
12) Questions and Answers : Unix/Linux : Yet another computation-error problem (Message 67512)
Posted 20 May 2018 by alanb1951
Post:
Keith,

I popped something in your "New Linux system trashes all tasks" thread in the Number Crunching forum which may or may not help...

http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=4288

Cheers - Al.
13) Questions and Answers : Unix/Linux : MilyWay OpenCL applications fails to build wisdom file and all tasks error in Ubuntu 18.04 LTS (Message 67511)
Posted 20 May 2018 by alanb1951
Post:
Keith (or anyone else who sees this thread...),

I popped something in your "New Linux system trashes all tasks" thread in the Number Crunching forum which may or may not help...

http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=4288

Cheers - Al.
14) Message boards : Number crunching : New Linux system trashes all tasks (Message 67510)
Posted 20 May 2018 by alanb1951
Post:
Found another errored task and this one has a lot more information in the stderr.txt output. It looks like the application had a problem compiling the OpenCL wisdom file. I have not had any issues with either Seti or Einstein compiling their OpenCL applications' wisdom files.

Has anyone else had issues with Linux compiling the OpenCL wisdom files before?

Stderr output
<core_client_version>7.4.44</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)
</message>
<stderr_txt>
<search_application> milkyway_separation 1.46 Linux x86_64 double OpenCL </search_application>
[...snip...]
Build log:
--------------------------------------------------------------------------------
<kernel>:183:72: warning: unknown attribute 'max_constant_size' ignored
__constant real* _ap_consts __attribute__((max_constant_size(18 * sizeof(real)))),
^
<kernel>:185:62: warning: unknown attribute 'max_constant_size' ignored
__constant SC* sc __attribute__((max_constant_size(NSTREAM * sizeof(SC)))),
^
<kernel>:186:67: warning: unknown attribute 'max_constant_size' ignored
__constant real* sg_dx __attribute__((max_constant_size(256 * sizeof(real)))),
^
<kernel>:235:26: error: use of undeclared identifier 'inf'
tmp = mad((real) Q_INV_SQR, z * z, tmp); /* (q_invsqr * z^2) + (x^2 + y^2) */
^
<built-in>:35:19: note: expanded from here
#define Q_INV_SQR inf
^

--------------------------------------------------------------------------------
clBuildProgram: Build failure (-11): CL_BUILD_PROGRAM_FAILURE
Error building program from source (-11): CL_BUILD_PROGRAM_FAILURE
Error creating integral program from source
Failed to calculate likelihood
Background Epsilon (61.817300) must be >= 0, <= 1
18:13:51 (10595): called boinc_finish(1)

</stderr_txt>
]]>


Keith,

I recognized that Q_INV_SQR error message! Rather than duplicating stuff posted back in early 2017, I'll refer you to a thread in the Linux forum, titled "Consistent "Validate error" status", in which I mentioned some research I'd done into why the client was apparently building bad GPU kernels:

http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=4091

Also, another thread (in the Science board) called "Fix it or I'm gone":

http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=4093

In précis, it looks as if the parameter reading can get out of sync in some versions of boinclib; folks seeing that error back then cleared it by moving to newer clients (usually the 7.6 family...)

Now I know you're an enthusiastic Seti@Home person, so it might be you have a good reason for running 7.4.44 (which was reputed to have some singularities of its own!) - I'll just say that I've never had any problems with Seti, Einstein or MilkyWay using client 7.6.32 or .33 (using NVidia GPUs).

Don't know whether this will have helped any in your case, but at least it explains what causes the error message!

Cheers - Al.

P.S. I think it's actually trying to build a GPU kernel, not a wisdom file (but I'm not a MilkyWay developer so I could be wrong...)
15) Message boards : News : New Separation Runs 7 May (Message 67452)
Posted 12 May 2018 by alanb1951
Post:
Any particular reason these runs are bundled in 4s rather than 5s?


They appear to have 26 parameters instead of the 20 in the previous tasks. And, judging by the elapsed time for these new tasks on my GPU, it looks as if 4*26 parameters take about the same time as 5*20 (!)

Obviously, this consistency of run time would be more significant for folks running CPU tasks - an increase of about 25% CPU time (likely with 5 per batch) might not go down quite so well there!...
16) Message boards : Number crunching : Account Problems (Message 67423)
Posted 5 May 2018 by alanb1951
Post:
I notice that if I examine your computers via your visible profile it says the last time your Threadripper accessed the Milkyway site was in mid-April!!!

Obviously, if you're fetching, doing and reporting MW work it has to be going somewhere, but where?

There's a bit of diagnostic work you can do that might help work this out. Look at client_state.xml in your BOINC data directory and find the project data for Milkyway, which should show the following at its head...

<project>
<master_url>http://milkyway.cs.rpi.edu/milkyway/</master_url>

Scan down to the <userid> and <hostid> numbers in that section -- if the userid doesn't match what shows up on the top of your user profile, it probably means you've managed to set up a second profile somehow at some time; the URL

http://milkyway.cs.rpi.edu/milkyway/show_user.php?userid=xxxxxx

(with the userid number instead of xxxxxx, of course!) should lead you to the profile for that alternate account.

Also, of course, check the listed hostid against the Computer ID on "Your computers" -- if you haven't got a second account, I'm unsure as to how that could be a mismatch, but at least it would help Jake sort it out...
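
(For anyone who'd rather not eyeball the XML by hand, here's a quick Python sketch of the same check - assuming client_state.xml parses as ordinary XML, which I'd expect but haven't verified on every client version:)

import xml.etree.ElementTree as ET

root = ET.parse("client_state.xml").getroot()
for project in root.iter("project"):
    url = project.findtext("master_url", default="")
    if "milkyway.cs.rpi.edu" in url:
        print("master_url:", url)
        print("userid    :", project.findtext("userid"))   # compare with your web profile
        print("hostid    :", project.findtext("hostid"))   # compare with "Your computers"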

Good luck getting it sorted! - Al.
17) Message boards : Number crunching : Restricting CPUs per Work Unit (Message 67234)
Posted 10 Mar 2018 by alanb1951
Post:
Have a look at https://boinc.berkeley.edu/wiki/Client_configuration

It doesn't provide lots of examples(!) but there's a lot of information there.
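
(If the aim is to stop the multi-threaded N-body tasks from grabbing every core, the usual BOINC mechanism is an app_config.xml file in the MilkyWay project directory. The sketch below is only an illustration - the app name and the --nthreads switch are my best guess at what the N-body application expects, so check the names in your own client_state.xml and the BOINC documentation before copying anything.)

<app_config>
    <app_version>
        <app_name>milkyway_nbody</app_name>     <!-- check the exact app name in client_state.xml -->
        <plan_class>mt</plan_class>
        <avg_ncpus>4</avg_ncpus>                <!-- how many CPUs BOINC should budget per task -->
        <cmdline>--nthreads 4</cmdline>         <!-- ask the application itself to use 4 threads -->
    </app_version>
</app_config>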

Cheers - Al.
18) Message boards : Application Code Discussion : name and location of data file win10? (Message 66994)
Posted 23 Jan 2018 by alanb1951
Post:
Regarding not getting CPU tasks... I only run MW GPU jobs(!), and that is done by finding out which "Location" a computer is associated with (e.g. Home or Default), then going to "Preferences for this project" on the "Your account" page at the MilkyWay web site, finding that location and de-selecting "Use CPU".

Now, I don't know if the grcpool manager can interfere with that somehow, but it's worth a try...

Cheers - Al.
19) Message boards : Number crunching : Validate error with only ONE WU? (Message 66759)
Posted 3 Nov 2017 by alanb1951
Post:
If you look at the result report for the task you identified, you'll see that the fourth of the five sub-tasks failed to compute the initial integral; the result was, indeed, invalid! It's not that easy to spot in the middle of the other (probably valid) output...

It'll be interesting to see if the other attempt fares any better! (I suspect not...)

Cheers - Al.
20) Message boards : Number crunching : Thousands of validation errors, no good work? (Message 66242)
Posted 24 Mar 2017 by alanb1951
Post:
Donald,

I notice that all your Invalid tasks report the application as "MilkyWay@Home Anonymous platform (CPU)".

I would expect a Linux system running CPU tasks to report as "MilkyWay@Home 1.40" (GPU tasks would have the GPU plan class listed in brackets after that...)

So the obvious question is: why is your application reporting as Anonymous? It might be worth resetting MilkyWay on that machine to see if kick-starting it again fetches the current application from the server - you won't be any worse off than you are now!

Good luck sorting it out.

Cheers - Al.

