Welcome to MilkyWay@home

Possible Memory Leak with Nvidia GPU

SunnyKona

Joined: 4 Apr 16
Posts: 3
Credit: 31,943
RAC: 0
Message 64456 - Posted: 7 Apr 2016, 1:58:52 UTC

My system:
Ubuntu 14.04 server running kernel 3.16.0-69
BOINC client version 7.2.42 for x86_64-pc-linux-gnu
OpenCL: NVIDIA GPU 0: GeForce GTX 560 Ti (driver version 361.42, device version OpenCL 1.1 CUDA, 1023MB, 990MB available, 1306 GFLOPS peak)

I've configured the project to only run GPU work units and have configured the client to not use more than 25% of RAM. The problem that I see on this system:

Every time a work unit is started, it uses up another 10M to 30M of RAM. I can monitor this by issuing a "free -h" command every time a work unit starts (a work unit only takes about 3 minutes to complete). This slowly fills up my RAM, and after a couple of hours the system has to start using swap space. I have tried to clear the RAM by issuing an "echo 3 > /proc/sys/vm/drop_caches" at regular intervals, but this does not keep the RAM from getting completely filled by the work units. Once the RAM is filled, I have to reboot the server.
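
The manual "free -h" check described above could also be scripted. What follows is only a minimal sketch, assuming a Linux /proc/meminfo that exposes the usual MemTotal, MemFree, Buffers and Cached fields. The function names (parse_meminfo, used_kb, sample_used_kb) are made up for illustration and are not part of BOINC or MilkyWay@home.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> integer value.

    Most fields are reported in kB; the unit suffix is ignored here.
    """
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def used_kb(info):
    """Memory in use, excluding buffers and page cache, in kB.

    Assumption: this mirrors the classic "used minus buffers/cache" figure,
    so memory that drop_caches can reclaim is not counted.
    """
    return (info["MemTotal"] - info["MemFree"]
            - info.get("Buffers", 0) - info.get("Cached", 0))


def sample_used_kb(path="/proc/meminfo"):
    """Take one sample from the running system (Linux only)."""
    with open(path) as f:
        return used_kb(parse_meminfo(f.read()))
```

Calling sample_used_kb() once per completed work unit and logging the difference between consecutive samples would show whether the growth remains after caches are excluded.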

For the time being, I have switched to Einstein@home and my machine is humming along fine with no problems. Looks like a memory leak issue with the MilkyWay@home work units (i.e. the work unit does not release all the RAM it used after it completes). It's also annoying that the client doesn't seem to pay any attention to that 25% limit.
Burstaholic

Joined: 15 Jun 16
Posts: 5
Credit: 199,739
RAC: 0
Message 64688 - Posted: 20 Jun 2016, 14:20:00 UTC - in response to Message 64456.  

Looks like I'm also seeing this issue - my desktop is using quite a bit of swap space, and I don't think MilkyWay@Home is meant to be using 7.7 GB of RAM? I'm running only this project, with default settings (Ubuntu 16.04):

 
-> sudo service boinc-client status
● boinc-client.service - Berkeley Open Infrastructure Network Computing Client
   Loaded: loaded (/lib/systemd/system/boinc-client.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2016-06-16 09:11:06 CDT; 4 days ago
  Process: 14284 ExecStopPost=/bin/rm -f /var/lib/boinc-client/lockfile (code=exited, status=0/SUCCESS)
  Process: 10403 ExecStartPre=/bin/chown boinc:boinc /var/log/boinc.log /var/log/boincerr.log (code=exited, status=0/SUCCESS)
  Process: 10400 ExecStartPre=/usr/bin/touch /var/log/boinc.log /var/log/boincerr.log (code=exited, status=0/SUCCESS)
 Main PID: 10409 (sh)
    Tasks: 3
   Memory: 7.7G
      CPU: 2w 3h 33min 57.426s
   CGroup: /system.slice/boinc-client.service
           ├─10409 /bin/sh -c /usr/bin/boinc --dir /var/lib/boinc-client >/var/log/boinc.log 2>/var/log/boincerr.log
           └─10414 /usr/bin/boinc --dir /var/lib/boinc-client
Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist

Joined: 25 Feb 13
Posts: 580
Credit: 94,200,158
RAC: 0
Message 64689 - Posted: 20 Jun 2016, 16:51:23 UTC

Hey Guys,

Any idea if it is the Nbody application or the separation application that you are seeing the memory issues with?

Jake
Burstaholic

Joined: 15 Jun 16
Posts: 5
Credit: 199,739
RAC: 0
Message 64697 - Posted: 20 Jun 2016, 19:56:12 UTC

Not sure how to tell. I can easily disable GPU jobs overnight and see what happens; is there a way to pick only one of the CPU applications to isolate those?
Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist

Joined: 25 Feb 13
Posts: 580
Credit: 94,200,158
RAC: 0
Message 64704 - Posted: 21 Jun 2016, 12:57:19 UTC

Yes, you can stop receiving work units from specific applications by changing the project preferences in your MilkyWay@home account settings.

Thanks for your willingness to help test this.

Jake
Burstaholic

Joined: 15 Jun 16
Posts: 5
Credit: 199,739
RAC: 0
Message 64710 - Posted: 21 Jun 2016, 14:58:29 UTC

Well, that was easy. It's definitely the GPU jobs. With GPU suspended the service's RAM usage stays below 40 MB, but with GPU enabled it climbs to ~600 MB within a couple of hours.

My GPU is an Nvidia Quadro 2000; I can give you more system details if they would be helpful.

Killing boinc-client and simply restarting X (via logout/login) seems to clear it up, which is what led me to suspect the GPU app initially. Maybe a bug in the code that interacts with Xorg?
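
One way to check whether the growth sits in the BOINC processes themselves or elsewhere (for example in the X server) would be to compare per-process resident memory over time. The following is a rough sketch, assuming Linux's /proc/<pid>/status format with a VmRSS line; parse_vmrss_kb and read_rss_kb are hypothetical helper names, not BOINC tooling.

```python
import re

# Matches a line such as "VmRSS:     41100 kB" in /proc/<pid>/status.
_VMRSS = re.compile(r"^VmRSS:\s+(\d+)\s+kB", re.MULTILINE)


def parse_vmrss_kb(status_text):
    """Return the resident set size in kB, or None if no VmRSS line exists
    (kernel threads, for instance, do not report one)."""
    match = _VMRSS.search(status_text)
    return int(match.group(1)) if match else None


def read_rss_kb(pid):
    """Read the current resident set size of a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vmrss_kb(f.read())
```

Sampling read_rss_kb() for the boinc client PID and for the Xorg PID before and after a few GPU tasks would indicate which process is actually holding the memory.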
Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist

Joined: 25 Feb 13
Posts: 580
Credit: 94,200,158
RAC: 0
Message 64713 - Posted: 21 Jun 2016, 15:36:31 UTC
Last modified: 21 Jun 2016, 15:37:21 UTC

It looks like you are using BOINC client version 7.6.31, which is not currently available for download on the BOINC website (maybe an old beta version they have since updated). Can you try downloading their newest stable version, 7.6.22, and see if the issue persists? Also, which Linux OS are you using?

Jake
Burstaholic

Joined: 15 Jun 16
Posts: 5
Credit: 199,739
RAC: 0
Message 64717 - Posted: 21 Jun 2016, 18:02:30 UTC
Last modified: 21 Jun 2016, 18:03:04 UTC

Additional information: I should have mentioned that I determined those numbers during the day, not actually by leaving it overnight. It looks like suspending and resuming GPU jobs may be the culprit: when I suspend a job, memory usage drops by 25 MB, but when I resume, it goes up by 50 MB. So having dynamic 'Suspend when computer is busy' will cause some serious memory growth if it cycles very often.
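
The arithmetic behind that observation can be spelled out. This is only a back-of-the-envelope sketch: it assumes every suspend/resume cycle behaves like the one measured (25 MB released, 50 MB taken) and that growth is linear, neither of which has been confirmed. The function names are illustrative.

```python
import math


def net_growth_mb(cycles, drop_mb=25, rise_mb=50):
    """Total growth after a number of suspend/resume cycles, in MB."""
    return cycles * (rise_mb - drop_mb)


def cycles_to_reach(target_mb, drop_mb=25, rise_mb=50):
    """Smallest number of cycles whose net growth reaches target_mb."""
    per_cycle = rise_mb - drop_mb
    if per_cycle <= 0:
        raise ValueError("no net growth per cycle")
    return math.ceil(target_mb / per_cycle)
```

Under those assumptions each cycle leaks a net 25 MB, so about 24 cycles would be enough to account for the ~600 MB seen earlier.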

I can try using the older client, though I'm not sure why the download on the site is four minor versions behind.

7.6.31 is the version in the Ubuntu 16.04 repos, and was released Mar 3 (https://github.com/BOINC/boinc/releases/tag/client_release%2F7.6%2F7.6.31). 7.6.33 was released June 5, in fact. So that line is definitely not old, though it may not be considered stable in some way?

I don't understand their versioning scheme.
Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist

Joined: 25 Feb 13
Posts: 580
Credit: 94,200,158
RAC: 0
Message 64722 - Posted: 22 Jun 2016, 0:16:58 UTC

Sounds like whoever makes the packages for the package manager picked a strange version. I would try one of the other versions; please let me know what you see.

Jake
Burstaholic

Joined: 15 Jun 16
Posts: 5
Credit: 199,739
RAC: 0
Message 64807 - Posted: 5 Jul 2016, 13:38:06 UTC - in response to Message 64722.  

Haven't tried the older client, but A/B testing confirms the problem is with the separation application. I left N-body running for the full long weekend and it's still at 41.1M RAM usage. Running just 'Separation (Modified Fit)' eats RAM fast.

(I didn't know about the 'Reset project' button, so switching tasks to confirm things took a while.)
Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist

Joined: 25 Feb 13
Posts: 580
Credit: 94,200,158
RAC: 0
Message 64810 - Posted: 5 Jul 2016, 15:07:54 UTC
Last modified: 5 Jul 2016, 15:08:12 UTC

N-body is a CPU application, and Separation is both a GPU and a CPU application. It may be that the client version you are running has a memory leak when dealing with GPU runs. Since you are the first user to bring up this problem and we are unable to reproduce it, I find it most likely to be related to the BOINC version you are running. If you can test with the most recent stable version of the BOINC client and still see the problem, I will be able to help you more.

Jake

