GPU computation errors on one host but none on the other
Author | Message |
---|---|
Joined: 19 Nov 12 Posts: 3 Credit: 330,132,224 RAC: 0 |
Hi all, for the past couple of days one of my hosts has been producing OpenCL computation errors. It runs Linux (Debian 7, 64-bit) and uses an HD5970 (crunching 3 WUs on each GPU). Another Linux host (same OS version) with two HD6950s (crunching 4 WUs on each GPU) doesn't show any computation errors. Good host: http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=484075 Bad host: http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=494095 The "bad" host shows CL_OUT_OF_RESOURCES and CL_MAP_FAILURE errors. What's happening here? Best regards, Rene |
Joined: 24 Feb 09 Posts: 620 Credit: 100,587,625 RAC: 0 |
The "bad" host shows CL_OUT_OF_RESOURCES and CL_MAP_FAILURE errors. That's the clue: basically BOINC ran out of resources on the hardware and gave up. 5XXX cards are very different beasts from 6XXX cards in their technical architecture and abilities. No matter which capability variant is used, there is always a finite capacity unique to that variant. Crudely speaking, and as a generalisation, 6XXX cards are faster and better than 5XXX cards and have greater capacity and flexibility, which is why the 5XXX bombed out first. As a general rule, don't run more than two WUs of one type on a GPU. Sometimes, rarely, three will run successfully. When more than two or three run concurrently, BOINC starts to run out of resources, and in any case the work is time-shared between the WUs once full capacity has been reached, usually after two concurrent WUs, so little or no time is saved, especially once it crashes from having too much thrown at it. Any minuscule time saved by running more than two (sometimes three) is far outweighed by the time lost while the machine is down until you get it going again. A good rule of thumb for BOINC is a maximum of two WUs per GPU when the WUs are short, running for seconds or a minute or so. Beyond that, don't bother; you will eventually lose time by running too many at once. |
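
For anyone wanting to try the two-WUs-per-GPU rule of thumb from the reply above, here is a minimal sketch that generates an `app_config.xml` for the BOINC client. It assumes a client version that supports `app_config.xml`; the app name `milkyway` and the `cpu_usage` value of 0.05 are illustrative assumptions, not values taken from this thread, so check the app name your client actually reports before using the output.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, tasks_per_gpu: int, cpu_per_task: float = 0.05) -> str:
    """Return app_config.xml text that lets `tasks_per_gpu` tasks share one GPU."""
    if tasks_per_gpu < 1:
        raise ValueError("tasks_per_gpu must be at least 1")

    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name  # assumed app name, verify locally

    gpu_versions = ET.SubElement(app, "gpu_versions")
    # Each task claims this fraction of a GPU, so 0.5 means two tasks at once.
    ET.SubElement(gpu_versions, "gpu_usage").text = f"{1 / tasks_per_gpu:.3g}"
    # Fraction of a CPU core reserved per GPU task (illustrative default).
    ET.SubElement(gpu_versions, "cpu_usage").text = f"{cpu_per_task:.3g}"

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Two concurrent WUs per GPU, as suggested in the reply.
    print(build_app_config("milkyway", tasks_per_gpu=2))
```

The generated text would go in a file named `app_config.xml` in the project's directory under the BOINC data directory, after which the client needs to re-read its config files. Lowering `tasks_per_gpu` on the HD5970 host from 3 to 2 is the change the reply recommends; whether that clears the CL_OUT_OF_RESOURCES errors on this particular host is not confirmed in the thread.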