Welcome to MilkyWay@home

Posts by floyd

1) Message boards : Number crunching : Memory Leak (Message 62183)
Posted 16 Aug 2014 by floyd
Post:
Next question, is there a bug in the driver or in the application? Just in case anyone cares, this is for milkyway_separation_1.02_x86_64-pc-linux-gnu__opencl_nvidia and driver version 304.117. No, I'm not installing a later driver, I have no relevant issues with this one.


Sorry I cannot help you there.


Thanks mikey. Actually I don't need help with this as I'm currently not running Milkyway anyway. Just wanted to share some information.
2) Message boards : Number crunching : Memory Leak (Message 62179)
Posted 16 Aug 2014 by floyd
Post:
Some quick tests show only NVIDIA WUs affected in my case, not CPU or ATI, so this is not by design. Next question, is there a bug in the driver or in the application? Just in case anyone cares, this is for milkyway_separation_1.02_x86_64-pc-linux-gnu__opencl_nvidia and driver version 304.117. No, I'm not installing a later driver, I have no relevant issues with this one.
3) Message boards : Number crunching : Memory Leak (Message 62170)
Posted 15 Aug 2014 by floyd
Post:
I'm not running Milkyway now but I'm sure there was a memory leak when I last ran Milkyway Separation (not Modified Fit) OpenCL tasks, to the extent where a single task used the whole 2GB I allowed for BOINC and other tasks were suspended for lack of memory. When restarted from a checkpoint, the same task would run with only 90MB, but increasing again. The rate of memory loss depended on the work chunk frequency setting in the Milkyway preferences.
4) Message boards : News : N-Body 1.08 (Message 57623)
Posted 23 Mar 2013 by floyd
Post:
I am seeing the Glibc errors coming across, but on older versions of the BOINC client.

I am guessing you are running BOINC 6.10 or 6.12 if it matches the pattern I have been observing.


That's just because older clients will be more likely to run on older systems with older glibc. AFAICS the real reason is that the application is dynamically linked - BTW the i686 binary is not - and it specifically depends on GLIBC_2.14 just as stated in the error message. To be more precise, it's just memcpy from that version. Whether this requirement is really necessary is up to the developers. If in doubt I'd suggest a static binary.
5) Message boards : Number crunching : MilkyWay tasks always on high priority (Message 54177)
Posted 26 Apr 2012 by floyd
Post:
do still show a DCF number when clicking properties in the Boinc Manager, are you saying that number is just for show?

I think it is unused in projects like Milkyway and just hasn't been removed (yet?) for simplicity. Does it change? I noticed the estimated run times of queued tasks no longer do.

6) Message boards : Number crunching : MilkyWay tasks always on high priority (Message 54146)
Posted 25 Apr 2012 by floyd
Post:
mikey, thanks for your explanations. As far as I can tell that's all correct except the assumption in the last sentence. Unfortunately this whole DCF thing does not apply here. BOINC is moving away from the DCF concept and Milkyway seems to be an early adopter. That's why we got bitten by this non-dcf bug and other projects did not.

The problem is that the boinc client doesn't even consider normal priority if the project doesn't have a DCF below 90. But in BOINC 7.0.24 and later, Milkyway does not have a DCF at all and BOINC gets the hiccups from that. In future releases this will be "DCF below 90 or no DCF", so non-dcf projects' tasks will be handled normally then.
7) Message boards : Number crunching : MilkyWay tasks always on high priority (Message 54134)
Posted 24 Apr 2012 by floyd
Post:
Please see this thread for information and possible solutions, but let's keep the discussion here.
8) Message boards : News : New Nbody Run (Message 54133)
Posted 24 Apr 2012 by floyd
Post:
As I wrote, the bug was introduced with a change in 7.0.24, so previous releases are safe from it. Of course older 7.0 versions can't be recommended, so a possible solution would be to switch back to BOINC 6 as you did. Now we all know that you can't simply downgrade from v7 to v6 (we have read the release notes, haven't we?) so here are instructions on how to do it.

Meanwhile the fix has been added to the source code so it will be in the next BOINC release, whenever that may be. If you can wait for that, that's another possible solution.
9) Message boards : News : New Nbody Run (Message 54103)
Posted 22 Apr 2012 by floyd
Post:
For those of us not experienced in editing or removing sections of the Boinc Mgr program, how do you change whatever is causing this problem.

You can't. The problem is in your local Boinc installation and you'll have to recompile the code or find someone to do it for you. This is the patch I used:

diff -Naur boinc_core_release_7_0_25/client/cpu_sched.cpp boinc_core_release_7_0_25_patched/client/cpu_sched.cpp
--- boinc_core_release_7_0_25/client/cpu_sched.cpp      2012-04-14 10:02:12.000000000 +0200
+++ boinc_core_release_7_0_25_patched/client/cpu_sched.cpp      2012-04-22 02:14:30.000000000 +0200
@@ -470,7 +470,7 @@
 
         // treat projects with DCF>90 as if they had deadline misses
         //
-        if (!p->dont_use_dcf && p->duration_correction_factor < 90.0) {
+        if (p->dont_use_dcf || p->duration_correction_factor < 90.0) {
             if (p->rsc_pwf[rsc_type].deadlines_missed_copy <= 0) {
                 continue;
             }


This is part of the code where Boinc looks for tasks to run in high priority. Basically it means, if a project uses DCF (Duration Correction Factor) and the value is somewhat reasonable, consider leaving the task alone. Otherwise it's a candidate for special treatment. Unfortunately, if a project does not use DCF, the "leave it alone" part is never reached.
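To make the effect of the one-line change easier to see, here is a minimal sketch of the two conditions side by side. Only the field names dont_use_dcf and duration_correction_factor come from the patch; the struct, the function names and the flattened deadlines_missed field are hypothetical stand-ins, and this is not the actual BOINC scheduler loop.

```cpp
// Hypothetical stand-in for the per-project data the patch touches.
// Only dont_use_dcf and duration_correction_factor are names from the diff.
struct ProjectSketch {
    bool dont_use_dcf;                  // project opted out of DCF
    double duration_correction_factor;  // only meaningful when DCF is used
    int deadlines_missed;               // stand-in for deadlines_missed_copy
};

// Condition as in the unpatched 7.0.25 code: a project that sets
// dont_use_dcf never reaches the inner "leave it alone" check.
bool left_alone_before_patch(const ProjectSketch& p) {
    if (!p.dont_use_dcf && p.duration_correction_factor < 90.0) {
        if (p.deadlines_missed <= 0) {
            return true;  // corresponds to the "continue" in the diff
        }
    }
    return false;  // falls through: candidate for high priority
}

// Condition after the patch: "DCF below 90 or no DCF".
bool left_alone_after_patch(const ProjectSketch& p) {
    if (p.dont_use_dcf || p.duration_correction_factor < 90.0) {
        if (p.deadlines_missed <= 0) {
            return true;
        }
    }
    return false;
}
```

For a project with dont_use_dcf set and no missed deadlines, the first function returns false (high priority candidate) and the second returns true (left alone), which is the behaviour described above.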

On the other hand 3 other projects do not have this problem and the PS tasks also do not have this problem. So why do the Nbody tasks have this issue?

If a project uses DCF, which I think most still do, you won't notice any of this. Look in your client_state.xml file: if you find a line reading "<dont_use_dcf/>", the project it belongs to should be affected. The whole project. And I say "should" because Milkyway is the only one for me, too, so I can't verify this. But the PS tasks do have the problem, that's why I looked into this in the first place. I noticed both of my GPUs running nothing but Milkyway for half a day.
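The check in client_state.xml can also be done mechanically. The following is a rough sketch, assuming each project sits in its own <project>...</project> block and carries a <project_name> element; that layout is an assumption here, and the function names are made up for illustration. It is a plain substring scan, not a real XML parser.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns the text between <tag> and </tag> inside block, or "" if absent.
static std::string tag_text(const std::string& block, const std::string& tag) {
    const std::string open = "<" + tag + ">";
    const std::string close = "</" + tag + ">";
    const std::size_t b = block.find(open);
    if (b == std::string::npos) return "";
    const std::size_t start = b + open.size();
    const std::size_t e = block.find(close, start);
    if (e == std::string::npos) return "";
    return block.substr(start, e - start);
}

// Collects the names of all projects whose block contains <dont_use_dcf/>.
// Assumes (not verified here) one <project>...</project> block per project.
std::vector<std::string> projects_without_dcf(const std::string& xml) {
    std::vector<std::string> names;
    const std::string open = "<project>";
    const std::string close = "</project>";
    std::size_t pos = 0;
    while ((pos = xml.find(open, pos)) != std::string::npos) {
        const std::size_t end = xml.find(close, pos);
        if (end == std::string::npos) break;  // unterminated block: stop
        const std::string block = xml.substr(pos, end - pos);
        if (block.find("<dont_use_dcf/>") != std::string::npos) {
            names.push_back(tag_text(block, "project_name"));
        }
        pos = end + close.size();
    }
    return names;
}
```

To use it, read the whole file into a std::string and pass it in; any project name it returns should be one that is affected by the bug.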

Why not modify what ever the offending code is in the task rather than have us novices go monkying about in the programs behind boinc mgr?

The offending code is not in the task, nor is it with the project. It is at your end and you can't change it yourself. The project could probably work around it, but you would have to do that for every single project separately. It's more efficient to fix the problem where it is.

I'll look over at the Boinc site for a contact address, if I find one I'll suggest changing this in the next release. But if somebody knows whom to inform, please feel free to point them here.
10) Message boards : News : New Nbody Run (Message 54099)
Posted 22 Apr 2012 by floyd
Post:
If your 4 apps continues to hog all 4 CPUs on this machine, you'll find the MIPS I donate no longer available to you. Simple guys. Back it off and quickly. You have 16 elapsed hours from the time of this post.


First of all, CALM DOWN!

I think this is caused by a bug that was introduced with the new dont_use_dcf command in Boinc 7.0.24. It makes the client assume that all tasks are about to miss the deadline and thus run them in high priority mode. This should affect all tasks of all projects that set dont_use_dcf.

By the way, if I understand correctly how dont_use_dcf works, one should try to avoid using an app_info.xml on such projects, bug or not.
11) Message boards : Number crunching : New BOINC Manager = no MWAH WUs (Message 54033)
Posted 15 Apr 2012 by floyd
Post:
This change didn't make sense to me which is why I didn't try it before you suggested it. It seems like there is a limit to how many tasks you can have locally in buffer/queue and I was already there.


Now that you describe it in more detail it doesn't make sense to me either. There is a limit of 40 MilkyWay GPU tasks per GPU, but you should have been able to get a new task every time one is returned. Perhaps the whole refetch mechanism isn't activated if you don't get the buffer filled to minimum in the first place? That would be "unexpected behaviour" at least. But glad it works now.
12) Message boards : Number crunching : New BOINC Manager = no MWAH WUs (Message 54031)
Posted 15 Apr 2012 by floyd
Post:
Thanks for that info Floyd! That made the difference.


Are you sure you are talking to me? There seems to be some other Floyd around.

What I wrote was meant for Mike S. Your issue seems to be something different and I don't think it is caused by those settings or the Boinc version. In fact I noticed "no work" or "work for CPU available" messages back with Boinc 6 and they went away without any change on my part. Must be some server side thing.


floyd (the one, not the only)
13) Message boards : Number crunching : New BOINC Manager = no MWAH WUs (Message 54028)
Posted 15 Apr 2012 by floyd
Post:
Upgraded to 7.0.25.

It used to auto-upload each WU before in the 1 minute interval before the next one (WU) completed.


Check your settings for minimum work buffer and additional work buffer. Boinc 7 is supposed to fetch work when it runs below minimum, then try to fill up to minimum+additional. That's a change from Boinc 6 which had different settings in the same place, so your values may not be as you want them for Boinc 7.
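The fetch rule described above can be written down as a small sketch. All names are hypothetical and the unit (days of buffered work) is an assumption; this only illustrates the "fetch below minimum, fill to minimum+additional" idea, not the real BOINC 7 work fetch code.

```cpp
// Hypothetical settings holder; values are in days of buffered work.
struct WorkBufferSettings {
    double min_days;         // "minimum work buffer"
    double additional_days;  // "additional work buffer"
};

// How much work to ask for, given how much is currently buffered.
double days_to_request(const WorkBufferSettings& cfg, double buffered_days) {
    if (buffered_days >= cfg.min_days) {
        return 0.0;  // still at or above the minimum: no fetch
    }
    // Below the minimum: try to fill up to minimum + additional.
    return (cfg.min_days + cfg.additional_days) - buffered_days;
}
```

With a minimum of 1.0 and an additional buffer of 0.5, a client holding 1.2 days of work would request nothing, while one holding 0.5 days would request 1.0 day. Values carried over from Boinc 6 may give a very different pattern than intended under this rule.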

There's a lot of confusion about this change, I'm surprised you haven't noticed all the questions like yours popping up everywhere.
14) Message boards : Number crunching : Waiting for GPU memory (Message 53271)
Posted 18 Feb 2012 by floyd
Post:
Waiting to run (waiting for GPU memory)

I haven't experienced this myself but I've read you need to do a system reboot to clear the GPU memory.
15) Message boards : Number crunching : Not getting work (Message 53043)
Posted 10 Feb 2012 by floyd
Post:
I'll try Nvidia's stable 275 driver though, no doubt that will work, too.

Well, it does work, but with 40% load. Back to 290 then.
16) Message boards : Number crunching : Not getting work (Message 53042)
Posted 10 Feb 2012 by floyd
Post:
I can't afford to burn a full CPU core just to keep the GPU busy so I'd like to revert to the 260.19.44 driver
The new version has a workaround for it. By default it isn't particularly aggressive about trying to reduce it, but for me it keeps around 50%.

Right, BOINC doesn't seem to recognize the driver:
NVIDIA GPU 0: GeForce GTX 260 (driver version unknown, CUDA version 4010, compute capability 1.3, 895MB, 560 GFLOPS peak)
I added the fallback version check against the CUDA version which is still reported.


Thanks Matt, I got a batch of 1.02 work units and right now I am watching the first ones run. The numbers look excellent. With the 290.10 driver CPU load is about 2%, just as good as the 0.90 application with the 260 driver. So there's no need to go back to that. I'll try Nvidia's stable 275 driver though, no doubt that will work, too.
17) Message boards : Number crunching : Not getting work (Message 53024)
Posted 10 Feb 2012 by floyd
Post:
Until recently I was crunching with this setup:
    Intel Core2Duo running Debian Squeeze
    Nvidia GTX260 with driver 260.19.44, specifically selected for Milkyway because of the excessive CPU usage in later versions
    BOINC 6.12.34

Everything was running fine, but then I started to get MilkyWay@Home v1.00 jobs. Yes, I got those without any tricks, but they all crashed with what I think were OpenCL errors. So I decided to try a later GPU driver, first 275.43, now 290.10. But I don't get work any more. The log says

Rejecting newer opencl_nvidia application due to older Nvidia drivers

Right, BOINC doesn't seem to recognize the driver:
NVIDIA GPU 0: GeForce GTX 260 (driver version unknown, CUDA version 4010, compute capability 1.3, 895MB, 560 GFLOPS peak)

And sched_request_milkyway.cs.rpi.edu_milkyway.xml contains this line in the coproc_cuda section:
<drvVersion>0</drvVersion>

So how do I get GPU work from Milkyway again? If you'd like to make any suggestions, please keep this in mind:
    I'm not running alpha software (BOINC 7)
    I don't do babysitting (app_info.xml)
    I can't afford to burn a full CPU core just to keep the GPU busy so I'd like to revert to the 260.19.44 driver

Perhaps you could drop the version check? Oh, and Nvidia's recommended Linux driver is still a v275. You shouldn't require more than that if you really need a minimum version, but from the announcement I got the impression that the old drivers should still work.
Sorry if my first post here sounds like a rant, but I am a bit disappointed because everything is messed up again after it started to work flawlessly with the new server. I was just about to order a second GPU, will have to cancel that for now. Hopefully I will be back in business soon.
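For illustration, here is a minimal sketch of why a reported driver version of 0 fails a minimum-version check, and how a fallback on the CUDA version (the approach mentioned in a later reply in this thread) would get around it. The thresholds, the integer encoding of the driver version and the function names are all hypothetical; the posts do not state the real server-side values.

```cpp
// Hypothetical thresholds, NOT the real server-side minimums.
constexpr int kMinDriverVersion = 27500;  // hypothetical encoding and value
constexpr int kMinCudaVersion = 4000;     // hypothetical value

// Strict check: an unknown driver (reported as 0) is always rejected.
bool passes_strict_check(int drv_version) {
    return drv_version >= kMinDriverVersion;
}

// Check with fallback: if the driver version is unknown (0),
// decide on the CUDA version instead, which is still reported.
bool passes_with_fallback(int drv_version, int cuda_version) {
    if (drv_version > 0) {
        return drv_version >= kMinDriverVersion;
    }
    return cuda_version >= kMinCudaVersion;
}
```

With drvVersion 0 and CUDA version 4010 as in the log above, the strict check rejects the host while the fallback variant accepts it under these made-up thresholds.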





©2024 Astroinformatics Group