GPU app teaser

Author	Message
Exar Kun [HoloNet] Joined: 12 Nov 08 Posts: 26 Credit: 1,542,686 RAC: 0	Message 12073 - Posted: 21 Feb 2009, 13:21:49 UTC Same message again: not reporting or requesting tasks... I can't crunch today :( Note for Vista-64 users: I had to install a hotfix from ATI to get the CCC working. The driver is 8.2. PS: when GPU activity is 0%, I have an idle temp of 80°C and fan speed at 25%. If I use the CCC to set the fan speed to 50%, the temp decreases to 60°C :) But I have a question: if I set the fan speed manually, could I have a problem when running the GPU at 100%? If anyone can help with this "not reporting or requesting tasks" thing ... thank you ^^ Star Wars BOINC Team

bobgoblin Joined: 8 Dec 07 Posts: 60 Credit: 67,028,931 RAC: 0	Message 12106 - Posted: 21 Feb 2009, 15:42:08 UTC - in response to Message 12067. i left it at 9.1 but took your advice of upping the fan speed. that's brought it down from ~83c to ~77c. and it made it through the night without locking up. So it could be that it was overheating. But I had also upgraded to version .19. So, was there a change in there that corrected the problem? either way, v.19 is working fine on an i7 and hd4870 with 512m. Besides the CPU detection nothing changed between 0.17 and 0.19. So maybe really a temperature problem. But I have seen your crunch times are slightly on the high side. This could be caused by running too many WUs concurrently on the GPU. At a certain point the RAM on the graphics card is not sufficient for the number of WUs taking space there. Before it errors out (when even more WUs would be crunched), it slows down (probably some swapping over PCI-Express happens). And with 16 WUs it is already getting a bit crowded on a 512MB card. Another reason for the higher times could be that the card runs downclocked in a power saving mode. Maybe you should check the clock speed of the card. Furthermore you may think about attaching to a second BOINC project with that i7. This will reduce the number of MW WUs that are running at the same time, but not the throughput. You will still finish the same number of WUs per hour even with fewer concurrently running WUs. In fact, it could even rise in your case. Furthermore your CPU cores wouldn't be idling that much ;) off the top of my head, i remembered reading that you basically just changed the version number - but couldn't remember if that was just the opti app or the gpu. also, i've noticed the temp has dropped even further overnight, so I may reset the fan speed to 40%. and i have a climate prediction model sitting @ 50% done, so I'll resume that one.

Slicker [TopGun] Joined: 20 Mar 08 Posts: 46 Credit: 69,382,802 RAC: 0	Message 12128 - Posted: 21 Feb 2009, 18:50:26 UTC I've noticed that if I enable both MW and another project, that it will run both but that the MW gpu app will SHARE a cpu with the other project. e.g. Q9450 runs 4 ABC apps and 1 or 2 MW. 3 of the ABC run on their own cpu. The 4th ABC runs on the same cpu as the MW app(s). When the CUDA apps do this, they set their priority to "Below Normal" instead of "Low" (a.k.a. Idle). Any chance the gpu app could be modified to do the same? Then, whatever processing power is left will go to the other app since it will be set to Low.

Brickhead Joined: 20 Mar 08 Posts: 108 Credit: 2,607,924,860 RAC: 0	Message 12131 - Posted: 21 Feb 2009, 18:53:33 UTC - in response to Message 11982. But could you please test that it runs at all? Oh yes, the app works just as intended, I guess, chewing through one WU every 8-12 seconds (depending on the WU). I forgot to mention that the CPU usage so far hasn't exceeded 23% of one core (still 4.0 GHz Yorkfield). Much less most of the time. Kudos to everyone involved!

Cluster Physik Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0	Message 12158 - Posted: 21 Feb 2009, 20:51:17 UTC - in response to Message 12128. I've noticed that if I enable both MW and another project, that it will run both but that the MW gpu app will SHARE a cpu with the other project. e.g. Q9450 runs 4 ABC apps and 1 or 2 MW. 3 of the ABC run on their own cpu. The 4th ABC runs on the same cpu as the MW app(s). When the CUDA apps do this, they set their priority to "Below Normal" instead of "Low" (a.k.a. Idle). Any chance the gpu app could be modified to do the same? Then, whatever processing power is left will go to the other app since it will be set to Low. That scheduling problem is hard to solve as long as there is no ATI support in BOINC. Maybe it will come with 6.7. But as the GPU app uses very little CPU resources (the core used at the moment is mainly for polling the GPU; less than 2 seconds of CPU time per WU are really needed), there is a chance I can free up some of it. That would reduce the problem, I guess.
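For readers wondering what the priority change requested above would involve, here is a minimal sketch, assuming a Windows host and Python's `ctypes`. It is not code from the MW GPU app or from BOINC; the helper name `set_own_priority` is made up for illustration. It only shows a process moving itself from the "Low" (Idle) class to "Below Normal" through the Win32 `SetPriorityClass` call.

```python
import sys

# Win32 priority-class constants as documented for SetPriorityClass.
IDLE_PRIORITY_CLASS = 0x00000040          # shown as "Low" in Task Manager
BELOW_NORMAL_PRIORITY_CLASS = 0x00004000  # shown as "Below Normal"


def set_own_priority(priority_class: int) -> bool:
    """Set the calling process's priority class; return True on success.

    Hypothetical helper for illustration. Returns False on non-Windows
    systems, where these priority classes do not exist.
    """
    if sys.platform != "win32":
        return False

    import ctypes
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetCurrentProcess.restype = wintypes.HANDLE
    kernel32.SetPriorityClass.argtypes = [wintypes.HANDLE, wintypes.DWORD]
    kernel32.SetPriorityClass.restype = wintypes.BOOL

    handle = kernel32.GetCurrentProcess()  # pseudo-handle, never closed
    return bool(kernel32.SetPriorityClass(handle, priority_class))


if __name__ == "__main__":
    ok = set_own_priority(BELOW_NORMAL_PRIORITY_CLASS)
    print("priority raised to Below Normal:", ok)
```

With the GPU feeder one class above the Idle-class CPU apps, the scheduler should serve it first and hand whatever is left of that core to the other project, which is the behaviour described for the CUDA apps.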

GalaxyIce Joined: 6 Apr 08 Posts: 2018 Credit: 100,142,856 RAC: 0	Message 12159 - Posted: 21 Feb 2009, 20:54:28 UTC Last modified: 21 Feb 2009, 21:00:11 UTC Ha ha, I love it. Claimed credit 0.01 Granted credit 8.44 Cluster Physik you're a genius :) [edit] 7 to 12 seconds a WU. Amazing! Sorry, another edit - that's 4 WU at a time, which makes it an average of 3 seconds a WU or below. I can't get my head around this. Hey, and I'm not even using the 4870

Cluster Physik Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0	Message 12162 - Posted: 21 Feb 2009, 21:03:44 UTC - in response to Message 12131. Kudos to everyone involved! Basically, that's just me ;) But I guess you also have to thank the two guys from my team Planet3DNow! who didn't hesitate to test the very first incarnations of the app, when it crashed on virtually every WU and didn't deliver any results, let alone credits. Thank you HiRN and L@MiR/Emploi! You have to know I do the GPU coding in some kind of a vacuum. I don't have a compatible card yet to test it myself. That will be a problem for the multi GPU stuff. We will see how this works out.

GalaxyIce Joined: 6 Apr 08 Posts: 2018 Credit: 100,142,856 RAC: 0	Message 12166 - Posted: 21 Feb 2009, 21:06:55 UTC - in response to Message 12162. Kudos to everyone involved! Basically, that's just me ;) But I guess you also have to thank the two guys from my team Planet3DNow! who didn't hesitate to test the very first incarnations of the app, when it crashed on virtually every WU and didn't deliver any results, let alone credits. Thank you HiRN and L@MiR/Emploi! You have to know I do the GPU coding in some kind of a vacuum. I don't have a compatible card yet to test it myself. That will be a problem for the multi GPU stuff. We will see how this works out. L@Mir? Fantastic! (Hello again :) and HiRN also - many thanks! :)

Cluster Physik Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0	Message 12167 - Posted: 21 Feb 2009, 21:07:49 UTC - in response to Message 12159. 7 to 12 seconds a WU. Amazing! Sorry, another edit - that's 4 WU at a time, which makes it an average of 3 seconds a WU or below. I can't get my head around this. Hey, and I'm not even using the 4870 Sorry, but it isn't that fast ;) The CPU time gives a good indication of the throughput at the moment; in your case it means a WU finishes every 7 to 12 seconds, but not every 3. Take a stopwatch if you don't believe it ;)

GalaxyIce Joined: 6 Apr 08 Posts: 2018 Credit: 100,142,856 RAC: 0	Message 12168 - Posted: 21 Feb 2009, 21:11:27 UTC - in response to Message 12167. Last modified: 21 Feb 2009, 21:15:39 UTC 7 to 12 seconds a WU. Amazing! Sorry, another edit - that's 4 WU at a time, which makes it an average of 3 seconds a WU or below. I can't get my head around this. Hey, and I'm not even using the 4870 Sorry, but it isn't that fast ;) The CPU time gives a good indication of the throughput at the moment; in your case it means a WU finishes every 7 to 12 seconds, but not every 3. Take a stopwatch if you don't believe it ;) Ah, I see, they took less than a minute so somewhere more around 25 seconds - without doing a precise test (I wish I had a stop watch....) [edit] 4 ran in 64 seconds - that's 16 secs a WU. Is that about right?

Cluster Physik Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0	Message 12184 - Posted: 21 Feb 2009, 21:32:45 UTC - in response to Message 12168. Last modified: 21 Feb 2009, 21:34:28 UTC Ah, I see, they took less than a minute so somewhere more around 25 seconds - without doing a precise test (I wish I had a stop watch....) [edit] 4 ran in 64 seconds - that's 16 secs a WU. Is that about right? For the longer dual stream WUs (12 to 13 credits) it is the right time for a HD4850 (a 4870 would be 20% faster). The shorter single stream WUs (~8 credits) should take 10 to 11 seconds or so on your card.
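The arithmetic in this exchange can be written out as a small sketch. The function names are made up for illustration, and reading "20% faster" as 1.2 times the throughput of the HD4850 is an assumption, not something the thread spells out.

```python
def seconds_per_wu(batch_wall_seconds: float, concurrent_wus: int) -> float:
    """Effective wall-clock seconds per finished WU for a concurrent batch."""
    if concurrent_wus <= 0:
        raise ValueError("concurrent_wus must be positive")
    return batch_wall_seconds / concurrent_wus


def wus_per_hour(sec_per_wu: float) -> float:
    """Throughput implied by an effective time per WU."""
    return 3600.0 / sec_per_wu


# Figures from the thread: 4 WUs running at once finished in 64 seconds.
hd4850 = seconds_per_wu(64, 4)

# Assumption: "20% faster" means 1.2x the throughput of the HD4850.
hd4870_estimate = hd4850 / 1.2

if __name__ == "__main__":
    print(f"HD4850: {hd4850:.1f} s/WU, {wus_per_hour(hd4850):.0f} WU/h")
    print(f"HD4870 (estimate): {hd4870_estimate:.1f} s/WU")
```

The point of the correction above is that the four WUs share the GPU, so the per-WU run time shown by BOINC is not divided by four a second time; only the wall-clock time of the whole batch is.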

Brickhead Joined: 20 Mar 08 Posts: 108 Credit: 2,607,924,860 RAC: 0	Message 12186 - Posted: 21 Feb 2009, 21:36:48 UTC BoincView looked like a scene from "The Shining" after I installed Catalyst 9.2, and I also noticed 0% load on the GPU. Reverted to 9.1, and the WUs again finished without errors. Has anyone else tried the latest incarnation of the driver?

GalaxyIce Joined: 6 Apr 08 Posts: 2018 Credit: 100,142,856 RAC: 0	Message 12191 - Posted: 21 Feb 2009, 21:41:02 UTC - in response to Message 12184. Ah, I see, they took less than a minute so somewhere more around 25 seconds - without doing a precise test (I wish I had a stop watch....) [edit] 4 ran in 64 seconds - that's 16 secs a WU. Is that about right? For the longer dual stream WUs (12 to 13 credits) it is the right time for a HD4850 (a 4870 would be 20% faster). The shorter single stream WUs (~8 credits) should take 10 to 11 seconds or so on your card. Aha, you've worked out which card I have ;)

GalaxyIce Joined: 6 Apr 08 Posts: 2018 Credit: 100,142,856 RAC: 0	Message 12194 - Posted: 21 Feb 2009, 21:45:54 UTC - in response to Message 12186. BoincView looked like a scene from "The Shining" after I installed Catalyst 9.2, and I also noticed 0% load on the GPU. Reverted to 9.1, and the WUs again finished without errors. Has anyone else tried the latest incarnation of the driver? I got the HD4850 today, which came with the Catalyst 8.5 driver. That didn't work. I then tried a 9.1 which wouldn't work at all, but I think it was for Vista (I have XP). I tried the 9.1 for XP and it worked, but it seemed like the 4 WUs were going to take forever. So finally I located the 8.12 driver for XP and it works a treat.

Cluster Physik Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0	Message 12205 - Posted: 21 Feb 2009, 22:04:15 UTC - in response to Message 12186. Last modified: 21 Feb 2009, 22:10:36 UTC BoincView looked like a scene from "The Shining" after I installed Catalyst 9.2, and I also noticed 0% load on the GPU. Reverted to 9.1, and the WUs again finished without errors. Has anyone else tried the latest incarnation of the driver? Just look here. If you really want to use the Cat 9.2 driver, it should be possible to manually rename the three atical.dll files in the Windows/system32 folder to amdcal.dll. Someone in my team tried and it worked. @Ice: Could you add a note to zslip, that the Cat 9.2 is not working with the GPU application? And the older 0.17 GPU app is only available for Win64, not Win32/64 as stated there.
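The workaround above could be scripted roughly as follows. This is a sketch under stated assumptions: the pattern `atical*.dll` is one reading of "the three atical.dll files", the function name `plan_cal_dll_copies` is invented, and the script copies instead of renaming so the driver's original files stay in place. That is a deliberate, more cautious swap for the rename described in the post. It defaults to a dry run that changes nothing.

```python
import shutil
from pathlib import Path
from typing import List, Tuple


def plan_cal_dll_copies(system_dir: Path, dry_run: bool = True) -> List[Tuple[Path, Path]]:
    """Pair each atical*.dll with an amdcal*.dll name; copy unless dry_run.

    Assumption: the files to handle are exactly those matching
    'atical*.dll' in system_dir. Existing targets are left untouched.
    """
    prefix_len = len("atical")
    planned = []
    for src in sorted(system_dir.glob("atical*.dll")):
        dst = src.with_name("amdcal" + src.name[prefix_len:])
        if dst.exists():
            continue  # never overwrite a file the driver installed itself
        planned.append((src, dst))
        if not dry_run:
            shutil.copy2(src, dst)
    return planned


if __name__ == "__main__":
    # Hypothetical location; pass your own system32 path and run elevated
    # only after checking the dry-run output.
    for src, dst in plan_cal_dll_copies(Path(r"C:\Windows\System32")):
        print(f"would copy {src.name} -> {dst.name}")
```

Running it once with the default `dry_run=True` prints the planned pairs, which makes it easy to confirm that only the expected three files are picked up before anything is written.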

GalaxyIce Joined: 6 Apr 08 Posts: 2018 Credit: 100,142,856 RAC: 0	Message 12208 - Posted: 21 Feb 2009, 22:11:23 UTC - in response to Message 12205. Last modified: 21 Feb 2009, 22:30:29 UTC BoincView looked like a scene from "The Shining" after I installed Catalyst 9.2, and I also noticed 0% load on the GPU. Reverted to 9.1, and the WUs again finished without errors. Has anyone else tried the latest incarnation of the driver? Just look here. If you really want to use the Cat 9.2 driver, it should be possible to manually rename the three atical.dll files in the Windows/system32 folder to amdcal.dll. Someone in my team tried and it worked. @Ice: Could you add a note to zslip, that the Cat 9.2 is not working with the GPU application? And the older 0.17 GPU app is only available for Win64, not Win32/64 as stated there. Sure, I'll do that [edit] done

Paul D. Buck Joined: 12 Apr 08 Posts: 621 Credit: 161,934,067 RAC: 0	Message 12231 - Posted: 21 Feb 2009, 23:13:11 UTC - in response to Message 12158. But as the GPU app uses very little CPU resources (the core used at the moment is mainly for polling the GPU; less than 2 seconds of CPU time per WU are really needed), there is a chance I can free up some of it. That would reduce the problem, I guess. The "ideal" is to use the IRQ so that there is zero load on the CPU unless it is needed. Though they have not shared technical details with the community, it looks like that is what GPU Grid did... last month they were using idle polling and were consuming up to a whole core per GPU core. Then they made a change and the CPU time dropped to less than 1% ... Small hit to speed, where the application takes about 8-10% longer to run over 4-20 hours ... with 9-14 second run times I am not sure that I would even notice the change ... but it would put less load on the CPUs ... If they are using the standard Windows API to get the GPU id, adding the ATI cards should be a no-brainer; all it is is a different look-up table ... As I said in the other thread, I sent them the notes on that but got no reply, so I don't know if they even looked at it or not ... since I can't write C code there is no point in me trying ... but it does not look that hard to modify the BOINC software to pick up an ATI card ...

Cluster Physik Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0	Message 12242 - Posted: 21 Feb 2009, 23:54:30 UTC - in response to Message 12231. The "ideal" is to use the IRQ so that there is zero load on the CPU unless it is needed. Though they have not shared technical details with the community, it looks like that is what GPU Grid did... last month they were using idle polling and were consuming up to a whole core per GPU core. Then they made a change and the CPU time dropped to less than 1% ... I thought about a slightly different, higher-level solution. But it should also be quite effective. 1% CPU load is not possible with the current split of the work between GPU and CPU. I don't plan to change anything there because the effort of moving the remaining 0.1% of the calculations from the CPU to the GPU appears to be too much. Once all the more urgent issues are solved, maybe one can think about it again. But I doubt the conclusion will be much different. One needs the CPU for about half a second at the beginning and slightly more (about a second) at the end of a WU (this scales with CPU speed, of course). In between, a CPU load of about 10% of a core or maybe even less should be doable.
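The trade-off discussed here (busy polling versus waiting without burning a core) can be illustrated with a toy sketch. It is plain Python with a timer standing in for the GPU; it does not use the ATI CAL API and is not how the MW app or GPU Grid are implemented. All names are hypothetical, and `time.thread_time()` requires Python 3.7 or later.

```python
import threading
import time


def start_fake_gpu_job(duration_s: float) -> threading.Event:
    """Stand-in for a GPU kernel: sets the returned event after duration_s."""
    done = threading.Event()
    threading.Timer(duration_s, done.set).start()
    return done


def cpu_time_busy_poll(done: threading.Event) -> float:
    """Spin on the flag; keeps one core busy for the whole wait."""
    start = time.thread_time()
    while not done.is_set():
        pass
    return time.thread_time() - start


def cpu_time_sleep_poll(done: threading.Event, interval_s: float = 0.005) -> float:
    """Check the flag, then sleep; adds at most interval_s of latency."""
    start = time.thread_time()
    while not done.is_set():
        time.sleep(interval_s)
    return time.thread_time() - start


def cpu_time_blocking_wait(done: threading.Event) -> float:
    """Block until signalled, the closest analogue to an interrupt here."""
    start = time.thread_time()
    done.wait()
    return time.thread_time() - start


if __name__ == "__main__":
    for name, waiter in [("busy poll", cpu_time_busy_poll),
                         ("sleep poll", cpu_time_sleep_poll),
                         ("blocking wait", cpu_time_blocking_wait)]:
        used = waiter(start_fake_gpu_job(0.2))
        print(f"{name:>13}: {used * 1000:.1f} ms CPU for a 0.2 s job")
```

The sleeping variant shows where the "small hit to speed" comes from: each completion can go unnoticed for up to one polling interval, which matters little for runs of several hours but is a larger fraction of a WU that finishes in about ten seconds.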

Paul D. Buck Joined: 12 Apr 08 Posts: 621 Credit: 161,934,067 RAC: 0	Message 12298 - Posted: 22 Feb 2009, 4:20:27 UTC - in response to Message 12242. The "ideal" is to use the IRQ so that there is zero load on the CPU unless it is needed. Though they have not shared technical details with the community, it looks like that is what GPU Grid did... last month they were using idle polling and were consuming up to a whole core per GPU core. Then they made a change and the CPU time dropped to less than 1% ... I thought about a slightly different, higher-level solution. But it should also be quite effective. 1% CPU load is not possible with the current split of the work between GPU and CPU. I don't plan to change anything there because the effort of moving the remaining 0.1% of the calculations from the CPU to the GPU appears to be too much. Once all the more urgent issues are solved, maybe one can think about it again. But I doubt the conclusion will be much different. One needs the CPU for about half a second at the beginning and slightly more (about a second) at the end of a WU (this scales with CPU speed, of course). In between, a CPU load of about 10% of a core or maybe even less should be doable. Then perhaps the next challenge is to make the ATI GPU recognized and managed by BOINC?

Cluster Physik Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0	Message 12301 - Posted: 22 Feb 2009, 4:45:07 UTC - in response to Message 12298. Then perhaps the next challenge is to make the ATI GPU recognized and managed by BOINC? I've heard somewhere it will be in 6.7.