Welcome to MilkyWay@home

Posts by Richard Haselgrove

81) Message boards : Number crunching : Computation errors. (Message 60301)
Posted 3 Nov 2013 by Richard Haselgrove
Post:
Please don't download masses of tasks, only to abort them later - that really is unkind to the project and other participants.

Use the tools provided in your MilkyWay@Home preferences so that you only download tasks for the application(s) you're interested in.
82) Message boards : Number crunching : Computation errors. (Message 60281)
Posted 2 Nov 2013 by Richard Haselgrove
Post:
OK. The best I can do.

I've never run BOINC on an AMD/ATI GPU, but I read a lot - and from what I read, AMD/ATI driver support is all over the place. Components come and go, sometimes with an announcement and sometimes silently. Some projects can write applications that run under practically any driver, some projects write applications which are fussy and only run under one driver (or group of drivers), other projects write applications which run under other drivers. And sometimes - for some reason the numbers 13.4 stick in my mind - there's a driver which is on the 'must have' list for one project, and on the 'can't get it to run at all' list for another project.

And that's before we even get onto the subject of operating systems...

One thing I can state for certain: for Windows XP, OpenCL was removed from driver 12.2 (February 2012 - that's what the numbers mean) and later, although the download page at AMD claimed it was still included. I did the original research for that one, and the data is still on this hard disk:

 Directory of C:\AMD\Support\12-1_xp32_dd_ccc\Packages\Apps

17/05/2012  16:51    <DIR>          .
17/05/2012  16:51    <DIR>          ..
17/05/2012  16:50    <DIR>          ATIPCE
17/05/2012  16:50    <DIR>          CCC
17/05/2012  16:51    <DIR>          CIM
17/05/2012  16:51    <DIR>          dotnetfx
17/05/2012  16:51    <DIR>          OpenCL
               0 File(s)              0 bytes
               7 Dir(s)  43,120,893,952 bytes free

 Directory of C:\AMD\Support\12-2_xp32_dd_ccc\Packages\Apps

29/06/2012  09:36    <DIR>          .
29/06/2012  09:36    <DIR>          ..
29/06/2012  09:36    <DIR>          ATIPCE
29/06/2012  09:36    <DIR>          CCC
29/06/2012  09:36    <DIR>          CIM
29/06/2012  09:36    <DIR>          dotnetfx
               0 File(s)              0 bytes
               6 Dir(s)  43,120,893,952 bytes free

Spot the difference ;)

Any machine that you've ever run an AMD driver installation on will have a similar "C:\AMD\Support\..." folder tree (IIRC, they don't let you unpack the download to any disk other than C:), and you can search it for 'OpenCL.msi'.
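If you'd rather not search by hand, the same check can be scripted. This is only a minimal sketch: the function name find_installer is made up for illustration, and the root folder is the default unpack location mentioned above, which may differ on your machine.

```python
import os

def find_installer(root, filename="OpenCL.msi"):
    """Return every path under root whose file name matches filename (case-insensitive)."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower() == filename.lower():
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)

if __name__ == "__main__":
    # Assumed location of the unpacked driver packages; adjust if yours differs.
    for path in find_installer(r"C:\AMD\Support"):
        print(path)
```

A driver package that lists no match is one that was unpacked without the OpenCL installer.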

Many people in BOINC-land advise that you remove all traces of all AMD drivers before you change to a different one. AMD seem to have heard them, and produced a driver removal tool. The only official-looking link Google could find for me is http://sites.amd.com/us/game/downloads/Pages/catalyst-uninstall-utility.aspx, but that's asking me for a user/password login, which I don't have. Maybe somebody else here does?

Another useful resource, especially for matching up the various version numbers of the various internal components and tying them back to the simple year.month Catalyst identifier, is HAL 9000's ATI Driver Version Cheat Sheet.

One extremely common and simple question which you posted over at Anandtech is 'the card is detecting but is listed as "not used"'. With two dissimilar GPUs in the same computer (from the same 'family'), BOINC will by default only use the 'better' card, and will 'not use' the lesser card - that's been policy for a long, long time. You can override it by setting the <use_all_gpus> option described in client configuration.
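For anyone who hasn't written a cc_config.xml before, here is a minimal sketch that generates one with just that option set. The helper name build_cc_config is made up for illustration. Treat the output as a starting point only: if you already have a cc_config.xml, add the option to it by hand rather than overwriting the file, and check the client configuration page for where the file belongs and how to make the client re-read it.

```python
import xml.etree.ElementTree as ET

def build_cc_config(use_all_gpus=True):
    """Build a minimal cc_config.xml document with the <use_all_gpus> option set."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    flag = ET.SubElement(options, "use_all_gpus")
    flag.text = "1" if use_all_gpus else "0"
    return ET.tostring(root, encoding="unicode")

# Prints the document on one line; save it as cc_config.xml if it suits you.
print(build_cc_config())
```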
83) Message boards : Number crunching : Computation errors. (Message 60276)
Posted 2 Nov 2013 by Richard Haselgrove
Post:
I'll try this if a fix is found for the errors. Right now, updating the client to that version is pointless, because doing so made my 6970 instantly fail all GPU work units. At least with the old version I can run both cards concurrently and run some of the GPU work units. And the card isn't technically disabled: it's showing up as enabled in Device Manager and in the drivers, the same with both versions. The version switch was also done without a reboot (I rebooted to see if the card would start getting work units, but when I switched back to the old version it instantly started working, so I didn't bother rebooting again), telling me the card was being detected fine to begin with; BOINC was just disabling it as a compute card.

Of course they failed: they were downloaded under an older version of BOINC, and since you are now using a newer version, any existing units WILL ALWAYS fail. This is just one of the many security checks BOINC has. Most projects just resend you the same units again under the new version, though. That's one reason you either just accept that it's going to happen, or set the PC to 'no new tasks' prior to updating.

Sorry, that's one thing you need to UN-learn today.

It shouldn't matter one jot which version of BOINC you use, either to download or to run tasks. And for the vast majority of projects, it doesn't. I test pretty much every version of BOINC that's compiled, and change them at will - I cycled through v7.2.23/v7.2.25/v7.2.26 several times in one day, earlier this week, to diagnose a BOINC installer problem. No tasks were hurt in the course of that test, and no security violations were triggered.

Now, that's a generalisation, and a statement of policy. There are certain boundaries that shouldn't be crossed, because there were major design changes which introduced incompatibilities - downgrading from BOINC v7 to BOINC v6 (any versions) is a case in point. But that would show itself by abandoning tasks on restart, not erroring them when they attempt to run.

Now, it's perfectly possible that some applications or tasks may have difficulties with an upgrade or downgrade - though I don't, in all honesty, know how that could come about. If so, the project concerned has problems, either with its programming, or with its interfacing with the BOINC infrastructure. That may be the case here - it certainly happened in the case of the early testing of N-Body, which is what drew me here - but I'm not in a position to test on ATI hardware.
84) Message boards : Number crunching : Computation errors. (Message 60270)
Posted 1 Nov 2013 by Richard Haselgrove
Post:
No, I am using a 6970 and a 5870 in the other rig. I just updated to the latest version of BOINC and am still getting errors, and additionally the 5870 is no longer being issued work units. I ran Driver Sweeper and fresh-installed a few different drivers as well, and no dice. The rig will soon have a second 5870 added to it as well, once I retrieve it from a friend's.

Edit: upon a quick check, it seems that updating my client has actually stopped me from running my 6970 and 5870 concurrently. The updated client seems to have arbitrarily disabled the card for computing, without any recourse for me to re-enable my 5870. Reverting to the old version, from the installer I still had, enabled me to start using the cards simultaneously again.


Next time, try rebooting the PC with a 2nd monitor plugged into the 2nd card. Windows has a bad habit of 'saving resources' by turning off things it thinks you don't need, such as a GPU with no monitor attached. You can also use a 'dummy plug', which you can easily make yourself with the directions here:
http://www.overclock.net/t/384733/the-30-second-dummy-plug

The 2nd monitor only has to be connected during the bootup process, not full time, so an extension cable from a 2nd monitor will work just fine.

Even a second cable from the same monitor - most good LCDs these days will have at least two of VGA, DVI, HDMI - and quite possibly a spare VGA (blue plugs) and DVI (white plugs) cable as well. Video cards often come with DVI-to-VGA adapters - with a pocketful of spares like that, you can fake up any number of connections.
85) Message boards : Number crunching : Computation errors. (Message 60252)
Posted 30 Oct 2013 by Richard Haselgrove
Post:
Thank you for the thread redirect! I'll make a new thread there if I have further issues.

I'm getting glibc 2.14 from here, all versions to 2.18 are currently available. Hopefully this gets the modfit tasks completing successfully!


Modified Fit work units still require the use of a Double Precision capable card. Neither of your cards meets that requirement. The separation and n-body work units compute on your CPU.

At least some of the Modified Fit units run on CPUs: I have my GPUs elsewhere right now and am still finishing Modified Fit units just fine using just my CPUs.

And the OP's log shows:

Sat 26 Oct 2013 16:13:41 WST | | No usable GPUs found

He's a CPU-only cruncher.
86) Message boards : Number crunching : Computation errors. (Message 60234)
Posted 28 Oct 2013 by Richard Haselgrove
Post:
The erroring tasks are all of type "Milkyway@Home Separation (Modified Fit) v1.28", and they have the error message:

../../projects/milkyway.cs.rpi.edu_milkyway/milkyway_separation__modified_fit_1.28_x86_64-pc-linux-gnu: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.14' not found (required by ../../projects/milkyway.cs.rpi.edu_milkyway/milkyway_separation__modified_fit_1.28_x86_64-pc-linux-gnu)

As a Windows user, I'm not familiar with the mantra for obtaining and installing a missing Linux library, but I'm sure somebody here is.
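For anyone reading a message like that for the first time, the library version it asks for can be pulled out and compared mechanically. A minimal sketch, with made-up helper names, and assuming the message always has the "GLIBC_x.y" form shown above:

```python
import re

def required_glibc(error_line):
    """Extract the GLIBC version named in a "version `GLIBC_x.y' not found" message."""
    match = re.search(r"GLIBC_(\d+(?:\.\d+)+)", error_line)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))

def is_new_enough(installed, required):
    """Compare dotted versions numerically, so that 2.9 sorts below 2.14."""
    return installed >= required

error = "/lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.14' not found"
needed = required_glibc(error)
print(needed)                          # (2, 14)
print(is_new_enough((2, 13), needed))  # False
print(is_new_enough((2, 18), needed))  # True
```

The versions are compared as tuples of integers rather than as strings or decimals, because 2.14 is newer than 2.9. On a Linux host, Python's platform.libc_ver() should report the installed version to compare against.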
87) Message boards : Number crunching : AMD Radeon R9 290X (Message 60220)
Posted 26 Oct 2013 by Richard Haselgrove
Post:
2013-10-25 22:50:16 | | Starting BOINC client version 7.2.23 for windows_x86_64

You obviously know how to keep up with test versions of BOINC. There's a v7.2.26 available now, with additional AMD detection. You should test that quickly, and make sure the detection is complete - the developers want to finalise this version soon.
88) Message boards : Number crunching : Extremely long units (Message 60193)
Posted 21 Oct 2013 by Richard Haselgrove
Post:
Far from a record. During testing, I got this one (message 58603):

100k_chisq_alt which is estimated at 35062:56:00 - over FOUR years this time.

What's more, it is set to run MT, so it actually wants 16 CPU-years. Within a 12-day deadline, that's a tough ask.
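The arithmetic behind those figures, as a minimal sketch. The thread count of 4 is an assumption, inferred from "16 CPU-years" rather than stated anywhere, and the estimate is read as hours:minutes:seconds.

```python
HOURS_PER_YEAR = 24 * 365  # ignoring leap years

def estimate_to_hours(estimate):
    """Convert an 'H:MM:SS' estimate into hours."""
    hours, minutes, seconds = (int(part) for part in estimate.split(":"))
    return hours + minutes / 60 + seconds / 3600

wall_years = estimate_to_hours("35062:56:00") / HOURS_PER_YEAR
threads = 4  # assumption: inferred from "16 CPU-years", not stated in the post
cpu_years = wall_years * threads
print(round(wall_years, 2), round(cpu_years, 1))  # 4.0 16.0
```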
89) Message boards : Number crunching : Computation Errors in home 1.20 (opencl_amd_ati) (Message 60184)
Posted 20 Oct 2013 by Richard Haselgrove
Post:
There were similar problems reported over the summer for the Einstein OpenCL apps.

They were resolved by building against a newer set of BOINC API libraries.

See the Einstein discussion (and testing process) at BRP5 Version 1.36 not running with Boinc 7.0.64
90) Message boards : Number crunching : stuck job (Message 60135)
Posted 9 Oct 2013 by Richard Haselgrove
Post:
Milkyway nbody jobs (both ps_nbody and de_nbody) are usually issued as MT - multi-threaded - jobs. They are set - at download time - to use every CPU core available to BOINC *at that time*.

If you subsequently change your preferences and reduce the number of CPU cores available to BOINC, the local client won't be able to schedule the task - it'll never find enough spare resources. If you've done that, you might be able to shift the task by temporarily going back to allowing BOINC to use 100% of everything - but then you'd risk the same thing happening with the next task you are allocated.

Unless you really enjoy the thrill of the chase, I think it would be best to abort the task. Subsequent tasks issued with your current preferences in place should run with those preferences.
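To make the mismatch concrete, here is a toy model of the situation. It is not BOINC's actual scheduling code, just an illustration of the rule described above; the function name and the 8-core host are made up.

```python
def can_schedule(task_cpus, total_cores, percent_allowed):
    """Toy model: a task fits only if it needs no more cores than BOINC may use."""
    usable = max(1, total_cores * percent_allowed // 100)
    return task_cpus <= usable

# Hypothetical 8-core host: the task was sized for all 8 cores at download time.
task_cpus = 8
print(can_schedule(task_cpus, total_cores=8, percent_allowed=100))  # True
print(can_schedule(task_cpus, total_cores=8, percent_allowed=50))   # False
```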
91) Message boards : News : Users Auto-Aborting Work Units (Message 60120)
Posted 6 Oct 2013 by Richard Haselgrove
Post:
I've followed these instructions and am still auto-aborting work units. I have a Radeon HD 4850. I haven't been active in a year, but I have used this GPU to accumulate over 5,000,000 points.

Should I do something like reinstall BOINC or detach from the program?

Your tasks are actually quitting with the message:

Exit status	201 (0xc9) EXIT_MISSING_COPROC

Whatever caused that, it shouldn't be classed as a user abort. I'll leave it to the ATI GPU specialists, but it sounds more likely to be a driver problem. And why is the server issuing tasks to a host with a missing co-processor?

92) Message boards : News : N-Body 1.38 (Message 60106)
Posted 4 Oct 2013 by Richard Haselgrove
Post:
and all I get is 1.28 which I have to abort because all they do is crash.

The ones you have been aborting are "Milkyway@Home Separation (Modified Fit) v1.28"

Not N-Body. Not this thread.
93) Message boards : Number crunching : Computation errors. (Message 60100)
Posted 4 Oct 2013 by Richard Haselgrove
Post:
I'm running BOINC version 7.0.64 (x64). I can't find an upgrade on the home page. Is this my problem?

Unlikely. v7.0.64 may not be perfect for running MT applications, but it's OK.

Your tasks (which hadn't been reported when I last posted) are now all showing exit code -1073741515 (0xc0000135): "The application failed to initialise properly." This usually indicates missing or damaged DLLs - sometimes anti-virus programs block the download of new executable files, or simple network congestion when everybody is trying to download the same file at the same time can have the same effect.
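The two numbers in that exit code are the same 32-bit value, shown once as a signed decimal and once as unsigned hex. A minimal sketch of the conversion (the helper names are made up):

```python
def to_unsigned32(exit_code):
    """Reinterpret a signed 32-bit exit code as an unsigned value."""
    return exit_code & 0xFFFFFFFF

def to_signed32(value):
    """Reinterpret an unsigned 32-bit value as a signed exit code."""
    return value - 0x100000000 if value & 0x80000000 else value

print(hex(to_unsigned32(-1073741515)))  # 0xc0000135
print(to_signed32(0xC0000135))          # -1073741515
```

The hex form is the one worth searching for: 0xC0000135 is the Windows status code STATUS_DLL_NOT_FOUND.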

You could try downloading the two N-Body DLLs manually, and replacing the possibly damaged copies in your Milkyway project directory:

http://milkyway.cs.rpi.edu/milkyway/download/libgomp_64-1_nbody_1.38.dll (49,152 bytes)
http://milkyway.cs.rpi.edu/milkyway/download/pthreadGC2_64_nbody_1.38.dll (49,152 bytes)
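A quick way to spot a truncated or missing download is to compare the files on disk against the sizes quoted above. This is a minimal sketch: check_sizes is a made-up helper, the project directory is whatever your installation uses, and the names assume the files sit in that directory under their download names. A matching size doesn't prove the file is intact, but a mismatch is a strong hint that it isn't.

```python
import os

# File names and sizes as quoted in the post.
EXPECTED_SIZES = {
    "libgomp_64-1_nbody_1.38.dll": 49152,
    "pthreadGC2_64_nbody_1.38.dll": 49152,
}

def check_sizes(project_dir, expected=EXPECTED_SIZES):
    """Return {filename: problem} for every file that is missing or the wrong size."""
    problems = {}
    for name, size in expected.items():
        path = os.path.join(project_dir, name)
        if not os.path.isfile(path):
            problems[name] = "missing"
        elif os.path.getsize(path) != size:
            problems[name] = f"size {os.path.getsize(path)}, expected {size}"
    return problems
```

Call it with the path to your Milkyway project directory; an empty result means both files are present at the expected size.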
94) Message boards : Number crunching : Computation errors. (Message 60098)
Posted 3 Oct 2013 by Richard Haselgrove
Post:
I'm getting computation errors on the n-body tasks again.

Mine are running fine. Windows 7/64, BOINC v7.2.16

Looks like I got the new app, DLLs, isotropic LUA, and isotropic histogram all in one go this morning.
95) Message boards : News : Users Auto-Aborting Work Units (Message 60087)
Posted 2 Oct 2013 by Richard Haselgrove
Post:
I think the root problem lies in the points awarded for data crunching. If the same credit is given for a chunk of data that takes 50 minutes to crunch as a chunk of data that takes 24 hours to crunch, people looking for max points will do whatever they need to do in order to maximize their point production.

Sure, they may employ inelegant solutions to stop certain types of data, but not everyone knows the ins and outs of configuration. I'm also willing to bet a lot of people don't even know how to access the message boards to find out that they're doing something wrong.

They have done studies that show that fewer than 10% of users ever even visit the forums! And of the ones that do visit, NOT all post!

But I would have expected - OK, maybe just hoped - that people who are so involved and interested that they "will do whatever they need to do in order to maximize their point production" would, in the process, learn a little bit about this mysterious game they're playing, and how best to micro-manage it.
96) Message boards : News : Users Auto-Aborting Work Units (Message 60051)
Posted 29 Sep 2013 by Richard Haselgrove
Post:
Hey Jake,

I'm a BOINC beta-tester, and was curious -- how is it even possible to "auto-abort" a work unit? Can you explain the procedure/settings necessary to pull that off? Does it require using a non-standard BOINC Manager/program?

Once I know more details, if it's something that can be controlled with the standard BOINC Manager/client, I might get in contact with the BOINC developers to try to prevent it.

Let me know,
Thanks,
Jacob

I believe it is thru an app_info.xml file, but it has been a long time since I messed with that part.

There is certainly no documented 'setting' for a BOINC client which would have this effect.

The app_info theory is plausible - most of us who design or test app_info files have at one time or another dumped a bunch of tasks through a typo or other silly mistake. The difference is that we go back, correct our own errors, and get it right next time. (or maybe the time after next...)

Jacob, please don't ask the developers to remove a feature which when used responsibly can be very useful. Wait until we have a better idea of what exactly is going on.

Jake seemed to give the impression that he thought this was a deliberate action by a significant number of people. It seems to be an absurd amount of trouble to go to, if so, when the controls to opt out of particular application types via the website are so easy to use. But if it is widespread and deliberate, two ideas come to mind:

1) People who deliberately break the rules of courteous co-operation tend to do so anonymously. If Jake could link us to an anonymous host exhibiting this symptom (which he could do without breaking any privacy rules), we might be able to help him diagnose what is being done.

2) Suggesting that it is widespread implies there is a network of communication between users - presumably not via this message board. If anybody who has in the past received this sort of information now wishes to pass it on to help the project root out this problem, they can do so privately via a PM, or even in public without necessarily revealing their source.

But there are innocent explanations as well. On message boards like this, people often post examples of app_info files - either asking for help with one which isn't working, or to be helpful to other people. These example files can hang around for ages after the need in question has passed: users can install them, and forget to update them: some people might even install an app_info without realising that they have to download application and support files manually too. Most responsible users will notice if their attempts to manage BOINC result in failure, and ask for help - but some don't: maybe embarrassment, lack of language skills, or (like my own initial attempt to post information here) the fact that you can't post on the boards until you've already solved the problem yourself and earned some RAC.
97) Message boards : News : Separation Modified Fit v1.28 Release (Message 59956)
Posted 22 Sep 2013 by Richard Haselgrove
Post:
Stopped running 1.28, not worth it; switched to MilkyWay@home 1.01. Looks like no one else seems to care about the crazy long run times. No comparable projects out there offer reasonable work value, so the result is I might just drop out of distributed computing altogether.

What is this "value" of which you speak?

Knowledge, discovery, improvement in the human condition? Or just gollum-points you can't even exchange for a toaster?

Make your own decision, my friend - whether you contribute to distributed computing or not is entirely a personal choice. But there are other things to weigh in the balance, besides the credits.
98) Message boards : News : Separation Modified Fit v1.28 Release (Message 59917)
Posted 17 Sep 2013 by Richard Haselgrove
Post:
I don't run MT tasks because they don't play well with multi-gpu hosts. Best, KB

Separation Modified Fit isn't an MT task - see Applications.

Wrong thread.
99) Message boards : Number crunching : "ATI-GPU not found" problem solved! (Message 59865)
Posted 10 Sep 2013 by Richard Haselgrove
Post:
The cc_config file is still MIA...

cc_config.xml is an optional file which you can create for yourself, or not, entirely according to personal preference.

Details are in Client configuration.
100) Message boards : News : N-Body 1.36 (Message 59681)
Posted 26 Aug 2013 by Richard Haselgrove
Post:
First, the other Milkyway projects have a pretty long list of DLLs they use - at least as reported by Process Explorer. Such as advapi32, KernelBase, lpk, ntdll, sechost, and many more. The n-body craters so fast I haven't been able to catch it - perhaps if you run it in a debugger.

Most of those will be standard Windows system DLLs.

If you run Dependency Walker against the main N-Body executable, it will tell you which DLLs are linked and whether they are present on your computer. It will probably flag up libgomp_64-1.dll and pthreadGC2_64.dll (because the distribution names are different), and you usually get warnings about late-loading dependencies too - they can be ignored. But it helps to narrow the list of suspects down.



©2024 Astroinformatics Group