Welcome to MilkyWay@home

Posts by alanb1951

1) Message boards : Number crunching : NoContraintsWithDisk200 vs. south4s (or whatever) (Message 68545)
Posted 3 days ago by alanb1951
Post:
Just a thought here: I suspect most (if not all) of the people having the "instant fail" trouble with the new work units might be running older versions of BOINC which have a maximum command line capability of about 1024 characters. There have been other tasks in the past that had lots of parameters and overflowed that limit, though I'm not sure it produced this error.

In particular, it appears the OP is using 7.2.42 (which certainly truncated command lines for one of the earlier groups of tasks with extra parameters!)

So if nothing else seems to work it might be worth trying a newer BOINC client if one is available for your distribution. I think 7.8 didn't have this problem, and I know 7.14.2 doesn't because that's what I use and it runs these new tasks quite happily.

As I said, just a thought...

Good luck - Al.
2) Message boards : Number crunching : New Linux system trashes all tasks (Message 67964)
Posted 26 Dec 2018 by alanb1951
Post:
Keith,

The major difference between the two job types is that one does 6 tasks with 14 parameters and the other does 4 tasks with 26 parameters.

I'm guessing here, but I suspect the reason you're having problems with the older BOINC client might be because of the extra parameters! The command line it has to construct contains the path to the executable (a 96-character relative path on my system) and the parameters as given by the <command_line> element of the <workunit> section in client_state.xml (which is 861 characters for one of the 26-parameter tasks I've just looked at!) The total command line length for that job would be nearly 960 characters, and that may not be the longest it could be...
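
As a rough check of that arithmetic, here is a minimal Python sketch. The 96 and 861 character counts are the ones quoted above; the 1024-character cap is only an assumed figure for older clients, and the function name is made up for illustration.

```python
# Rough estimate of the command line BOINC has to build for one task.
# ASSUMPTION: older clients cap the command line at about 1024 characters.
ASSUMED_LIMIT = 1024

def command_line_length(exe_path_len, params_len):
    """Length of '<executable path> <parameters>', counting the separating space."""
    return exe_path_len + 1 + params_len

total = command_line_length(96, 861)   # figures quoted in the post above
headroom = ASSUMED_LIMIT - total
print(f"total = {total}, headroom under the assumed limit = {headroom}")
```

With only a few dozen characters to spare, a longer install path or a few extra digits per parameter could be enough to cross a limit of that size.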

The parameters seem to be free-format (in that they don't have a fixed number of decimal places) but typically have 6 or 7 digits and a decimal point. Some also have a minus-sign. There are a few with fewer than 6 digits, but not many in the examples I looked at.

So I'm wondering if there's an issue with the maximum command line length that the older client can handle, and perhaps these jobs trigger that problem?

As I said, just guessing!

Cheers - Al.
3) Message boards : News : Database Maintenance 9-4-2014 (Message 67802)
Posted 8 Sep 2018 by alanb1951
Post:
Hey Everyone,

Just wanted to explain the validation inconclusive rates after this maintenance, and most maintenance. This is not an issue with how quickly our database can handle the workunits, but how quickly our users can cross validate workunits. As many of you know, we can require up to 4 other users to cross validate workunits before they are considered valid.

At any given time, we have 300,000+ workunits being calculated by volunteers. When we take the server down for maintenance, many of these are completed while we are down. They are then all sent back to us at once to be validated, so we end up with a queue of 300,000 workunits or more that have to be validated by other volunteers to catch up. This doesn't require much work for our server or database, but it does take a long time for users to work through them all.

Sorry that there is such a backlog for validation.

Jake


So you are saying you don't send out the same wu to two computers at the same time but instead wait for someone to return it first before sending it out for validation by a 2nd pc? That's contrary to what I was seeing prior to the maintenance as many days I never saw an unsent wu listed in my list of workunits waiting for validation.


I've been seeing this "We've sent one out, it's replied, now we'll send another one" behaviour for a long time. If you check the "Sent" time of the second task sent, you'll see that it will be anything from a few minutes to a lot longer from when your task was returned. It's been like that for quite a while now (or, at least, it has for my tasks...)

This is not just a recent occurrence. I think it dates back to a time when it was possible to look at a result and decide it was within a given range (possibly a near-exact match for a prior from the same start-point?) -- if it was, no wing-man would be called upon.

I'm unsure whether this stopped when we started getting batched work-units or earlier, and it could be that it's no longer possible. If it will always need at least one wing-man now, then (as you imply) the server should probably be re-configured to send out two at the beginning!...

Perhaps Jake can clarify this for us??? Inquiring minds want to know!

Cheers - Al.
4) Message boards : Number crunching : About 75% of my GPU finishes in "Validation inconclusive " !!! (Message 67705)
Posted 10 Aug 2018 by alanb1951
Post:
There's nothing wrong, or, at least, nothing wrong the way I see it :-) ...

Your tasks will sit at "Validation Inconclusive" until a wing-man's task completes and matches your result; as at 03:15 UTC on 10th August, you had 30 tasks in that category and they were all awaiting a wing-man!

It can be a bit frustrating if your wing-man is using a CPU rather than a GPU, especially if that machine has a large queue of work to do; however, as you also seem to have 177 recent Valid results and no Invalid or Error results, I don't think you really have anything to worry about!

(And bear in mind that some other users might be waiting for you to return a result before their work gets validated!)

Cheers - Al.
5) Message boards : Number crunching : Compiling MilkyWay@home (Message 67643)
Posted 2 Jul 2018 by alanb1951
Post:
As no-one else has offered anything, a thought or two. (If you've already been through all this, my apologies.)

Whilst I've never tried building a MilkyWay program, especially not a multi-threaded one(!), I can't help wondering whether you're having problems because you've built an MT version and it's only getting to use one core...

Now, I don't know whether that's a build issue, a problem with the way your version is being fired up by BOINC or something else, but if I'd been doing the builds that's what I'd be looking at.

By the way, there is also a non-MT version, and if the source of that is also available it might be worth fetching that, building it and seeing if that also gives Invalids -- that might help you discover whether it's a build issue or something else...

Unfortunately, I have no idea as to what you might do to sort out a build issue; as I said, I've never tried building the NBody application...

Good luck getting it sorted.

P.S. Nice system!
6) Message boards : Number crunching : New Linux system trashes all tasks (Message 67524)
Posted 21 May 2018 by alanb1951
Post:
Thanks for the reply. Good to know where the problem lies. Now to determine how to fix it. It would be best to know just where in the BOINC code the problem occurs.

Do you know the bug# in the BOINC codebase by chance? If I could determine which module contains the bug, I could have the 7.4.44 developer pull the updated module into the build list so he could build a newer version of 7.4.44 which does not have the error.

I do not have the skills to do it myself. My attempt to build 7.8.3 BOINC ended in failure after many weeks so I just resigned to use someone else's 7.4.44 build.

The snippet of code you provided I believe is from the MW application and not from BOINC so I can't use the code tracker function at BOINC github to determine the problem module.


Keith,

Yes, the code snippet was indeed from the MW code!

As for where the error was in 7.4.44, I've no idea - I will, however, observe that some of the people reporting that problem here were on 7.2.something rather than 7.4.something, so it wasn't just happening for one version. As I had never been bitten by this problem myself, and as people found that moving up to a 7.6 BOINC seemed to fix it, I didn't research it any further than working out what the crash was...

(It acted as if it was a timing or file pick-up issue, with the application trying to access a data file that hadn't actually been put in place yet [but which should have been put in place by BOINC!] - if I had been going code-diving that's where I would've started, anyway...)

I'm not sure what version of Linux you're using (though I note it's a fairly current [Ryzen-friendly?] kernel.) I presume it's not based on Debian, because if it were you'd be able to get a viable BOINC from the repositories to see if it fixed the issue (though you would probably have to put up with it installing where it wants to and getting used to using sudo if you wanted to tune the configuration for your SETI usage!)

I don't know what the situation is for RedHat-based systems and I guess it's "not good" for some of the others - one often sees "I have to run this version of the client because it's all that's available for my distro" messages :-(

Sorry I can't be of more help; the BOINC code is above my C++ competence level, I suspect! I just hope you can find a pre-built client that resolves this particular issue. (I noticed you've a post in another thread about [different?] issues with 7.8.3 on Windows - perhaps your machines are just too powerful :-) ...)

Good luck finding a fix - Al

P.S. FWIW, I'm happily running a GTX 1050Ti for MW, Einstein and SETI on XUbuntu 16.04 with client 7.6.31 on an i7-7700K; I get the occasional Invalid at Einstein but I don't think I've ever noticed an OpenCL build error that wasn't a product of a corrupt data file...
7) Questions and Answers : Unix/Linux : Yet another computation-error problem (Message 67512)
Posted 20 May 2018 by alanb1951
Post:
Keith,

I popped something in your "New Linux system trashes all tasks" thread in the Number Crunching forum which may or may not help...

http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=4288

Cheers - Al.
8) Questions and Answers : Unix/Linux : MilyWay OpenCL applications fails to build wisdom file and all tasks error in Ubuntu 18.04 LTS (Message 67511)
Posted 20 May 2018 by alanb1951
Post:
Keith (or anyone else who sees this thread...),

I popped something in your "New Linux system trashes all tasks" thread in the Number Crunching forum which may or may not help...

http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=4288

Cheers - Al.
9) Message boards : Number crunching : New Linux system trashes all tasks (Message 67510)
Posted 20 May 2018 by alanb1951
Post:
Found another errored task and this one has a lot more information in the stderr.txt output. It looks like the application had a problem compiling the OpenCL wisdom file. I have not had any issues with either Seti or Einstein compiling their OpenCL applications wisdom files.

Has anyone else had issues with Linux compiling the OpenCL wisdom files before?

Stderr output
<core_client_version>7.4.44</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)
</message>
<stderr_txt>
<search_application> milkyway_separation 1.46 Linux x86_64 double OpenCL </search_application>
[...snip...]
Build log:
--------------------------------------------------------------------------------
<kernel>:183:72: warning: unknown attribute 'max_constant_size' ignored
__constant real* _ap_consts __attribute__((max_constant_size(18 * sizeof(real)))),
^
<kernel>:185:62: warning: unknown attribute 'max_constant_size' ignored
__constant SC* sc __attribute__((max_constant_size(NSTREAM * sizeof(SC)))),
^
<kernel>:186:67: warning: unknown attribute 'max_constant_size' ignored
__constant real* sg_dx __attribute__((max_constant_size(256 * sizeof(real)))),
^
<kernel>:235:26: error: use of undeclared identifier 'inf'
tmp = mad((real) Q_INV_SQR, z * z, tmp); /* (q_invsqr * z^2) + (x^2 + y^2) */
^
<built-in>:35:19: note: expanded from here
#define Q_INV_SQR inf
^

--------------------------------------------------------------------------------
clBuildProgram: Build failure (-11): CL_BUILD_PROGRAM_FAILURE
Error building program from source (-11): CL_BUILD_PROGRAM_FAILURE
Error creating integral program from source
Failed to calculate likelihood
Background Epsilon (61.817300) must be >= 0, <= 1
18:13:51 (10595): called boinc_finish(1)

</stderr_txt>
]]>


Keith,

I recognized that Q_INV_SQR error message! Rather than duplicating stuff posted back in early 2017, I'll refer you to a thread in the Linux forum, titled "Consistent "Validate error" status", in which I mentioned some research I'd done into why the client was apparently building bad GPU kernels:

http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=4091

Also, another thread (in the Science board) called "Fix it or I'm gone":

http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=4093

In precis, it looks as if the parameter reading can get out of sync in some versions of boinclib; folks seeing that error back then cleared it by moving to newer clients (usually 7.6 family...)

Now I know you're an enthusiastic Seti@Home person, so it might be you have a good reason for running 7.4.44 (which was reputed to have some singularities of its own!) - I'll just say that I've never had any problems with Seti, Einstein or MilkyWay using client 7.6.32 or .33 (using NVidia GPUs).

Don't know whether this will have helped any in your case, but at least it explains what causes the error message!

Cheers - Al.

P.S. I think it's actually trying to build a GPU kernel, not a wisdom file (but I'm not a MilkyWay developer so I could be wrong...)
10) Message boards : News : New Separation Runs 7 May (Message 67452)
Posted 12 May 2018 by alanb1951
Post:
Any particular reason these runs are bundled in 4s rather than 5s?


They appear to have 26 parameters instead of 20 in the previous tasks. And, judging by the elapsed time for these new tasks on my GPU it looks as if 4*26 parameters takes about the same time as 5*20 (!)

Obviously, this consistency of run time would be more significant for folks running CPU tasks - an increase of about 25% CPU time (likely with 5 per batch) might not go down quite so well there!...
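
To put rough numbers on that, a small Python sketch, assuming (and it is only an assumption) that run time scales with the total number of parameters in a bundle. The function name is made up for illustration.

```python
# ASSUMPTION: run time is roughly proportional to
# (tasks per bundle) * (parameters per task).
def bundle_cost(tasks, params):
    return tasks * params

old = bundle_cost(5, 20)            # previous runs: 5 tasks of 20 parameters
new = bundle_cost(4, 26)            # new runs: 4 tasks of 26 parameters
hypothetical = bundle_cost(5, 26)   # new parameters, but still bundled in 5s

print(new / old)            # 1.04, so about the same as before
print(hypothetical / new)   # 1.25, so about 25% more than the 4-task bundle
```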
11) Message boards : Number crunching : Account Problems (Message 67423)
Posted 5 May 2018 by alanb1951
Post:
I notice that if I examine your computers via your visible profile it says the last time your Threadripper accessed the Milkyway site was in mid-April!!!

Obviously, if you're fetching, doing and reporting MW work it has to be going somewhere, but where?

There's a bit of diagnostic work you can do that might help work this out. Look at client_state.xml in your BOINC data directory and find the project data for Milkyway, which should show the following at its head...

<project>
<master_url>http://milkyway.cs.rpi.edu/milkyway/</master_url>

Scan down to the <userid> and <hostid> numbers in that section -- if the userid doesn't match what shows up on the top of your user profile, it probably means you've managed to set up a second profile somehow at some time; the URL

http://milkyway.cs.rpi.edu/milkyway/show_user.php?userid=xxxxxx

(with the userid number instead of xxxxxx, of course!) should lead you to the profile for that alternate account.

Also, of course, check the listed hostid against the Computer ID on "Your computers" -- if you haven't got a second account, I'm unsure as to how that could be a mismatch, but at least it would help Jake sort it out...
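
For anyone who would rather not read the XML by eye, something like the following Python sketch could pull those two numbers out. The element names are the ones mentioned above; the sample document and its ID values are invented for illustration, and a real client_state.xml contains far more than this.

```python
import xml.etree.ElementTree as ET

# Invented sample, cut down to the elements discussed above.
SAMPLE = """<client_state>
<project>
    <master_url>http://milkyway.cs.rpi.edu/milkyway/</master_url>
    <userid>123456</userid>
    <hostid>654321</hostid>
</project>
</client_state>"""

def project_ids(xml_text, master_url):
    """Return (userid, hostid) for the project with this master_url, or None."""
    root = ET.fromstring(xml_text)
    for project in root.iter("project"):
        if (project.findtext("master_url") or "").strip() == master_url:
            return (project.findtext("userid", "").strip(),
                    project.findtext("hostid", "").strip())
    return None

print(project_ids(SAMPLE, "http://milkyway.cs.rpi.edu/milkyway/"))
```

To try it on a real system, read the contents of client_state.xml from the BOINC data directory and pass that text in instead of SAMPLE; the two values can then be compared with the profile and "Your computers" pages.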

Good luck getting it sorted! - Al.
12) Message boards : Number crunching : Restricting CPUs per Work Unit (Message 67234)
Posted 10 Mar 2018 by alanb1951
Post:
Have a look at https://boinc.berkeley.edu/wiki/Client_configuration

It doesn't provide lots of examples(!) but there's a lot of information there.

Cheers - Al.
13) Message boards : Application Code Discussion : name and location of data file win10? (Message 66994)
Posted 23 Jan 2018 by alanb1951
Post:
Regarding not getting CPU tasks... I only run MW GPU jobs(!) and that is done by finding out which "Location" a computer is associated with (e.g. Home or Default) then going to "Preferences for this project" on the "Your account" page at the Milkyway web site, finding that location and de-selecting "Use CPU"

Now, I don't know if the grcpool manager can interfere with that somehow, but it's worth a try...

Cheers - Al.
14) Message boards : Number crunching : Validate error with only ONE WU? (Message 66759)
Posted 3 Nov 2017 by alanb1951
Post:
If you look at the result report for the task you identified, you'll see that the fourth of the five sub-tasks failed to compute the initial integral; the result was, indeed, invalid! It's not that easy to spot in the middle of the other (probably valid) output...

It'll be interesting to see if the other attempt fares any better! (I suspect not...)

Cheers - Al.
15) Message boards : Number crunching : Thousands of validation errors, no good work? (Message 66242)
Posted 24 Mar 2017 by alanb1951
Post:
Donald,

I notice that all your Invalid tasks report the application as "MilkyWay@Home Anonymous platform (CPU)"

I would expect a Linux system running CPU tasks to report as "MilkyWay@Home 1.40" (GPU tasks would have the GPU plan class listed in brackets after that...)

So the obvious question is why is your application reporting as Anonymous? It might be worth resetting Milkyway on that machine to see if kick-starting it again fetches the current application from the server - you won't be any worse off than you are now!

Good luck sorting it out.

Cheers - Al.
16) Message boards : Number crunching : After switching to XFCE desktop, no Milkyway tasks showing (Message 66192)
Posted 14 Feb 2017 by alanb1951
Post:
I'm glad you had something more efficient than a rescue disk to help you get sorted out. At least you're "on the way", even if there's still a bump in the road...

There's a known issue with certain versions of BOINC client which can result in completed tasks not freeing up their resources completely. Each job runs in a slot directory, and if the slot directory isn't flushed out properly on completion it can't be re-used.

Looks like you've hit that!

I gather from some stuff in the BOINC developer's forum at Berkeley, and other places (Google is my friend here!) that if the higher-numbered directories happen to be empty and you delete them it'll re-create them and use them (until they get broken again...) However, if there are directories in use above some empty ones, those empty ones won't be re-used. And deleting a chunk of directories out of the middle isn't a good idea. Apparently, sometimes completely shutting down BOINC and restarting it gets things moving again, but it's a pain if this recurs, especially if it's aborting the tasks it can't start, as the only way to stop the cycle in that case is to stop all projects from getting new tasks until you've got an empty system to shut down!
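
To see which slot directories are "empty and above everything in use" before touching anything, a rough Python sketch follows. It only reports and never deletes. It assumes the slots are numbered sub-directories of a single slots directory; the function name and the demo layout are invented for illustration.

```python
import os
import tempfile

def trailing_empty_slots(slots_dir):
    """Return numbers of slot directories that are empty AND above every slot in use."""
    names = sorted(
        (n for n in os.listdir(slots_dir)
         if n.isdigit() and os.path.isdir(os.path.join(slots_dir, n))),
        key=int,
    )
    trailing = []
    # Walk down from the highest number; stop at the first slot with content.
    for name in reversed(names):
        if os.listdir(os.path.join(slots_dir, name)):
            break
        trailing.append(int(name))
    return sorted(trailing)

# Demo on a throw-away directory: slots 0 and 2 are in use, 1, 3 and 4 are empty.
with tempfile.TemporaryDirectory() as demo:
    for n, busy in enumerate([True, False, True, False, False]):
        os.mkdir(os.path.join(demo, str(n)))
        if busy:
            open(os.path.join(demo, str(n), "placeholder.txt"), "w").close()
    demo_result = trailing_empty_slots(demo)

print(demo_result)   # slot 1 is empty too, but it sits "in the middle" so it is not listed
```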

Unfortunately, the issue that caused that to happen only affected users on UNIX-family machines (Mac, Linux, FreeBSD &c), and it didn't hit everyone, so it took a while to be recognized as such and solved. The fix didn't happen until version 7.6.11.

So it would definitely pay you to get that repository I mentioned and load a 7.6 client if there is one.

(Einstein, SETI and MilkyWay all work with 7.6.32, which is the version I've got on both XUbuntu 14.04 and XUbuntu 16.04 systems...)

Hope you finish getting it sorted soon.

Once again, good luck - Al.
17) Message boards : Number crunching : After switching to XFCE desktop, no Milkyway tasks showing (Message 66187)
Posted 13 Feb 2017 by alanb1951
Post:
The below is quite long(!) and there may be information towards the end that you need to consider before acting on information in the middle! Given that I'm not sat at your computer, able to get the fine details of what's where, I can't give you a single, simple(?) script to solve your problems. Sorry about that...

Here's the command output you requested:
 df -k
Filesystem     1K-blocks      Used Available Use% Mounted on
udev             4065428        12   4065416   1% /dev
tmpfs             817480      1592    815888   1% /run
/dev/sda5       20510716  19962556         0 100% /
...

That is one very full root file-store! That needs sorting out, seriously!
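
If you want to check this sort of thing regularly, a minimal Python sketch like the one below can flag nearly-full file systems from df -k style output. The sample text is the output quoted above; the 90% threshold and the function name are arbitrary choices made for illustration.

```python
DF_OUTPUT = """Filesystem     1K-blocks      Used Available Use% Mounted on
udev             4065428        12   4065416   1% /dev
tmpfs             817480      1592    815888   1% /run
/dev/sda5       20510716  19962556         0 100% /
...
"""

def nearly_full(df_text, threshold=90):
    """Return (mount point, use%) pairs at or above threshold."""
    hits = []
    for line in df_text.splitlines()[1:]:          # skip the header row
        fields = line.split()
        if len(fields) < 6 or not fields[4].endswith("%"):
            continue                               # ignore elided or wrapped lines
        use = int(fields[4].rstrip("%"))
        if use >= threshold:
            hits.append((" ".join(fields[5:]), use))
    return hits

full = nearly_full(DF_OUTPUT)
print(full)
```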

You may have ways of recovering some space. Check how much [very] old stuff you have in /var/log, for example.

If not, I'd be seriously tempted to use a recovery disk to move the partitions around, as although it's time-consuming it's probably less error-prone than juggling parts of your root file-store onto other devices (which is what a lot of the rest of this post is about.)

Otherwise, some content will have to move, and I'd repeat my suggestion that you move /var to another partition. You could do this on /dev/sda7 where you tried to do a symlink for /var/lib/boinc-client... (And as a side-effect you'll see how much space /var was really using!)

Note: if you mount /var on a new partition and haven't cleared out the original /var before you reboot you will not have recovered the space! There are lots of ways of sorting out new partitions, of which the easiest is probably a re-install with custom configuration to get it to build a new /var (which won't contain BOINC, of course) -- it effectively trashes your system partitions and starts from scratch -- however, if you've installed lots of other stuff that won't be there after the re-install, and you may lose /home if you aren't careful.

I would tend to use either a recovery disk (as mentioned in my other post) or, perhaps, a "Live CD/DVD" (and some cautious use of root) to do this. You will need to be happy mounting devices (modern recovery disks will provide a tool to do this for you, usually to /media.)

If you don't have anything you would like to keep in your BOINC directories at present, sorting this out is easy: all you need to do is remove the boinc-client stuff from sda7, then copy everything under /var to that partition as root (example commands below...).

If you want to try to retain any data in your existing boinc-client directory, and you don't have another copy, either move it away (to /home, for instance) or make sure that if you're going to leave it on /dev/sda7 it's called something like boinc-client-saved to make sure it doesn't get displaced by the copy operation... Again, you need to be root (and cp -a is your friend).

If you are doing the copying using your normal booted system, that might look like

sudo cp -a /var/* /mnt/sda7/

and if you're using a recovery disk that might look like

cp -a /media/sda5/var/* /media/sda7/

If, after doing that, you find you've got a /var directory in your mount-point you've copied the directory and its contents, not just the contents, and you'll need to do

mv /media/sda7/var/* /media/sda7/

(or equivalent) to get rid of the extra level of directory structure! That's why my cp commands put /* on the end of the source path.

You might now want to remove the symlink /media/sda7/lib/boinc-client (if that's the level you linked at) - it'll be meaningless once you reboot - and if you still had your boinc-client stuff on the partition (see above) you could now shift it using mv.

Once you've done that you can look at /etc/fstab to see what syntax has been used to mount /home and make something equivalent for /var. Note that if it uses UUID syntax you can find out the ID by doing

blkid /dev/sda7

Once you've done the above, you will need to use a recovery/Live-DVD method to remove the contents of the original /var on /dev/sda5. Things won't go well if you try to empty /var whilst you're running your live system - there are techniques for working round this, but the KISS principle should probably be applied here!

After that, if you exit the recovery/Live-DVD (having edited /etc/fstab to mount the new /var) a re-boot should bring up your system using the moved /var content and having space back on your root partition.

----

Whoops, one error above: After moving/linking the /var/lib/boinc-client folder to the empty partition, BOINC Manager is now reporting zero space used, zero space free. It's not seeing those entries at all -- even after doing a package manager reinstall of the BOINC packages. It also still won't let me connect to a project, though I no longer have a notification about "no internet connection".

If you are symlinking to a mount-point (/var/lib/boinc-client -> /mnt/sda7) that might not work anyway(!)

Otherwise, did you check to see if the symlink you made survived the package re-install? Just a thought...

If you can get the symlink method to work, fine; if not, the comments above about moving all of /var might still apply, or you could try making a new version of /etc/init.d/boinc-client, changing the BOINC_DIR option to point at your alternative directory. The reason I don't like editing startup files is that one has to keep track of that if/when you update BOINC or your O/S...

(Of course, if you edit a file in /etc/init.d, you should probably make a copy of the old version first; you'll need to be root to edit the file - what I do when changing files in /etc is typically:

sudo cp -p /etc/init.d/file_I_want_to_edit ~/my_etc/file_I_want_to_edit.orig
sudo vi /etc/init.d/file_I_want_to_edit

Obviously, that's a "template"...)

----

Regarding your earlier question about package versions, there's a developer's PPA for BOINC 7.6.32 which works fine on my XUbuntu 14.04 laptop and worked fine on my workstation (which has a GPU as well...) on 14.04 and still works on 16.04.

To collect that:

sudo add-apt-repository ppa:costamagnagianfranco/boinc
sudo apt-get update

after which a newer version of BOINC should show up in Synaptic (or whatever else is the package manager for KUbuntu)

----

Whatever happens next, good luck; I tackle tasks like that with some trepidation (in case I get fat-fingered...) and expect to spend lots of time on them, and (in theory, at least) I'm supposed to know what I'm doing in enough detail to spot problems before they become crises. I repeat - good luck!

Cheers - Al.
18) Message boards : Number crunching : After switching to XFCE desktop, no Milkyway tasks showing (Message 66181)
Posted 12 Feb 2017 by alanb1951
Post:
As you still seem to be using BOINC 7.2.42 I'm not sure whether that's a repository-based BOINC install that you've never updated or not.

If you have installed BOINC from the Ubuntu repositories, it should have put the vast bulk of the BOINC-related stuff in /var/lib/boinc-client. /var/log can eat a lot of space too, so it might be worth checking that out.

If you definitely need more space in /var, your easiest solutions in that case are to either use a recovery disk of some sort to move the contents of /var to a free, formatted partition or (as you suggested) modify your partitions to make more room for whichever one contains your BOINC data.

If you installed manually, you could possibly try moving the contents of the BOINC directory to a free, formatted partition and mounting the partition to the BOINC directory. This solution is not appropriate if the data lives in /var as if you do a system upgrade you may well lose the lot (depending on how the upgrade is done - I've just had what purported to be an upgrade do the equivalent of a "nuke and pave" on one of my machines; fortunately I'd made allowances for the possibility...)

Any safe method you use to re-organize is probably going to require doing it using a recovery disk of some form (I use PartedMagic - it's usually got a fairly up-to-date kernel and hardware handling, so I reckon it's worth the small donation required); and if you are careful about moving stuff to a new partition it is going to take a while (voice of experience here!) so if you are happy with partition-juggling it might be nearly as quick!

The usual caveats apply - make sure anything you're worried about is backed up before messing with your file store (and yes, I know you know that, but...), and if you aren't sure how to do something, ask (and my apologies if you're a Linux Guru already!...)

I hope the above makes some sort of sense, and that you can move forwards now. If you do need to ask for more help here, please give a bit more detail about where things live, how big your partition(s) are, and perhaps the output of df -k too so actual usage can be assessed.

Good luck - Al.

P.S. If you did a repository install of BOINC in the first place, you should be able to get a newer version, which is probably a good idea even though it won't solve space problems!
19) Message boards : Application Code Discussion : Tons of failed jobs (Message 66161)
Posted 5 Feb 2017 by alanb1951
Post:
The most reasonable explanation is a TYPO in the project files. Someone fatfingered a "t" into a "f" - which is not too hard to do since they are next to each other on the keyboard.


How careless they are! A number of people are contributing their computational power to MilkyWay@Home without any charge and MilkyWay@Home just works so casually. They clearly understand that if they lose you, they will have other people. So they don't even care about your feelings. I have already quit, and until they solve the problem, I will not be back.


Actually, it's not fat-fingering at all, as I discovered by getting the application source from github and trying to remember my C/C++ programming from 20+ years ago!...

For more information, I refer you to my response to Chris's post entitled "Fix it or I'm gone" in the MilkyWay@home Science board. Whilst it doesn't make the issue go away, it does try to explain it and point out a possible solution...

http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=4093

Cheers - Al.

P.S. I'm just a user, like you folks...

P.P.S. How many (full time or otherwise) programmers and technical staff do you think this project has?
20) Message boards : MilkyWay@home Science : Fix it or I'm gone (Message 66160)
Posted 5 Feb 2017 by alanb1951
Post:
It's been months that one particular Milky Way program started failing. I have lost millions of points! If Milkyway@home is interested in keeping my considerable computing resources working for them - you have until the end of February to fix YOUR problem (hint, there is no "inf" variable type), or I'm going somewhere else.

https://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=4087


You might be interested in another thread in the Linux forum, titled "Consistent "Validate error" status", in which I mentioned some research I'd done into why the client was apparently building bad GPU kernels:

http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=4091

To summarize for your information, inf is actually what some C/C++ run-time print statements produce for infinity, and the user in this other thread was doing CPU tasks that reported "q is 0" for the first task of a batch of 5. It appears that the kernel constructor computes 1/q and its square, and passes the latter value into the kernel compiler with the label Q_INV_SQR. If, for some reason, q is zero (which it should not be, of course!) it will pass in Q_INV_SQR=inf. Problem explained, but not solved.
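
The mechanism can be mimicked in a few lines of Python. This is only an illustration: the real code is C, the function names here are invented, and it assumes Q_INV_SQR is meant to be 1/q^2, which is what the name suggests.

```python
import math

def inverse_square(q):
    """1/q^2, following IEEE-754 behaviour (as C would) instead of raising on q == 0."""
    return math.inf if q == 0.0 else 1.0 / (q * q)

def define_line(q):
    # C-style "%f" formatting; Python also renders infinity as "inf" here.
    return "#define Q_INV_SQR %f" % inverse_square(q)

print(define_line(0.5))   # a usable numeric constant
print(define_line(0.0))   # the constant becomes the bare word inf
```

A kernel compiler handed the second line sees inf as an identifier rather than a number, which would account for a "use of undeclared identifier 'inf'" message.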

Now, I found it interesting that folks who were seeing "q is 0" or this Q_INV_SQR problem seemed to be using old versions of the BOINC client, so I wondered if perhaps there's something in the old BOINC client libraries that messes up reading the parameter files on occasion - certainly, it would seem that a change to a much more recent client resolved the issue for other users.

And if that is the cause, I don't think there's anything much that the project programmer can realistically do about it. After all, if you know the data files are valid and you are reading them properly, putting in lots of defensive code to allow for system errors is not really productive!

I believe you're using a non-Debian based Linux, so I'm afraid I don't know how you can resolve the client issue (unless you're willing to build your own and, perhaps, feed it back to your community?) I do realize this doesn't help you much, but I hope you are now at least informed as to the likely cause.

By the way, I have no association with this project other than that of user - I just got so irritated at seeing "it doesn't work" posts that I decided to go to github and grab the source to find out what was going on!

I hope you do eventually manage to find a newer client anyway - amongst other things, the later clients are better at detecting GPUs and offer more logging options.

Cheers - Al.



©2019 Astroinformatics Group