CUDA Application Updated


Anthony Waters
Joined: 16 Jun 09
Posts: 85
Credit: 172,476
RAC: 0
Message 31548 - Posted: 26 Sep 2009, 3:33:54 UTC

The CUDA application for 32-bit Windows has been updated with speed improvements; users should notice a 2x increase in performance.

Thank you to Brent from NVIDIA for helping make the application run faster on NVIDIA's hardware, and thanks also to Cluster Physik for providing methods that further increase the performance.

Cluster Physik
Joined: 26 Jul 08
Posts: 627
Credit: 94,940,203
RAC: 0
Message 31553 - Posted: 26 Sep 2009, 4:00:10 UTC - in response to Message 31548.
Last modified: 26 Sep 2009, 4:41:33 UTC

The CUDA application for 32-bit Windows has been updated with speed improvements; users should notice a 2x increase in performance.

Thank you to Brent from NVIDIA for helping make the application run faster on NVIDIA's hardware, and thanks also to Cluster Physik for providing methods that further increase the performance.

So how long do those 53 credit WUs you just crunched really take? The two seconds one sees in your task list are just the CPU time.

As a HD4870 at stock clocks needs about 48 seconds, and the nvidia guy has probably done what was possible, I guess the CUDA app is now approaching its theoretical ceiling. My guess is something between 2:30 and 3:00 minutes for a GTX285 at stock clocks for a 53 credit WU.

Paul D. Buck
Joined: 12 Apr 08
Posts: 621
Credit: 161,934,067
RAC: 0
Message 31554 - Posted: 26 Sep 2009, 4:21:48 UTC - in response to Message 31548.

The CUDA application for 32-bit Windows has been updated with speed improvements; users should notice a 2x increase in performance.

Thank you to Brent from NVIDIA for helping make the application run faster on NVIDIA's hardware, and thanks also to Cluster Physik for providing methods that further increase the performance.

Um, did the version number get an update? I just got 0.20 cuda23 ...

Whatever the version, the ones I just downloaded ran on a GTX260 in about 202 seconds (3:02), which should be about the run time on a GTX295 core, if my experience on GPU Grid transfers (I have not tried it there yet) ...

Changed version to 6.10.7 just in case, but I will note that the Windows interface becomes almost unusable because of lag ... so ... not sure "we" have the balance quite right yet ... note I have no special settings set to control things; I pretty much run stock ... YMMV

Edboard
Joined: 22 Feb 09
Posts: 20
Credit: 105,156,399
RAC: 0
Message 31564 - Posted: 26 Sep 2009, 11:03:17 UTC
Last modified: 26 Sep 2009, 11:03:56 UTC

I have crunched some units with the new app, and they take 3:20 (200 seconds) on a stock GTX280. This graphics card took 6:30 (390 seconds) with the previous one.

Impressive: 95% faster than before.

The Gas Giant
Joined: 24 Dec 07
Posts: 1947
Credit: 240,884,648
RAC: 0
Message 31591 - Posted: 26 Sep 2009, 20:27:42 UTC - in response to Message 31564.

I have crunched some units with the new app, and they take 3:20 (200 seconds) on a stock GTX280. This graphics card took 6:30 (390 seconds) with the previous one.

Impressive: 95% faster than before.

Not quite 95% faster...you may want to redo your maths.

KWSN Checklist
Joined: 12 Aug 08
Posts: 253
Credit: 275,593,872
RAC: 0
Message 31594 - Posted: 26 Sep 2009, 20:51:03 UTC - in response to Message 31591.
Last modified: 26 Sep 2009, 20:51:29 UTC

I have crunched some units with the new app, and they take 3:20 (200 seconds) on a stock GTX280. This graphics card took 6:30 (390 seconds) with the previous one.

Impressive: 95% faster than before.

Not quite 95% faster...you may want to redo your maths.

Edboard and I must have went to the same school then.

    GalaxyIce
    Joined: 6 Apr 08
    Posts: 2018
    Credit: 100,142,856
    RAC: 0
    Message 31596 - Posted: 26 Sep 2009, 20:57:14 UTC - in response to Message 31594.

    I have crunched some units with the new app, and they take 3:20 (200 seconds) on a stock GTX280. This graphics card took 6:30 (390 seconds) with the previous one.

    Impressive: 95% faster than before.

    Not quite 95% faster...you may want to redo your maths.

    Edboard and I must have went to the same school then.

    I think you're both right - we should all be given 95% more credit immediately.



    The Gas Giant
    Joined: 24 Dec 07
    Posts: 1947
    Credit: 240,884,648
    RAC: 0
    Message 31599 - Posted: 26 Sep 2009, 22:17:02 UTC - in response to Message 31594.

    I have crunched some units with the new app, and they take 3:20 (200 seconds) on a stock GTX280. This graphics card took 6:30 (390 seconds) with the previous one.

    Impressive: 95% faster than before.

    Not quite 95% faster...you may want to redo your maths.

    Edboard and I must have went to the same school then.

    The reference point is the 390 seconds (previous), not the 200 seconds (current).

    But I agree with Ice.

    Edboard
    Joined: 22 Feb 09
    Posts: 20
    Credit: 105,156,399
    RAC: 0
    Message 31610 - Posted: 27 Sep 2009, 7:13:38 UTC - in response to Message 31599.

    I said "faster", so I'm comparing "speeds" not "durations". In other words, I'm comparing inverses of time (1/t) not times (t).
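
To make the two readings of "95% faster" concrete, here is a minimal Python sketch using the run times quoted in this thread (390 seconds before, 200 seconds after). The function names are made up for illustration. It shows that the same pair of run times gives about a 95% increase in speed (1/t) but only about a 49% reduction in run time (t).

```python
def speed_increase_pct(old_seconds: float, new_seconds: float) -> float:
    """Percent increase in speed, where speed is work per unit time (1/t)."""
    return (old_seconds / new_seconds - 1.0) * 100.0


def time_reduction_pct(old_seconds: float, new_seconds: float) -> float:
    """Percent reduction in run time, measured against the old run time."""
    return (1.0 - new_seconds / old_seconds) * 100.0


# GTX280 run times quoted in this thread: previous app vs. new app.
old_t, new_t = 390.0, 200.0

print(f"speed increase: {speed_increase_pct(old_t, new_t):.1f}%")
print(f"time reduction: {time_reduction_pct(old_t, new_t):.1f}%")
```

Both numbers describe the same improvement; which one applies depends on whether the reference quantity is speed or duration.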

    Paul D. Buck
    Joined: 12 Apr 08
    Posts: 621
    Credit: 161,934,067
    RAC: 0
    Message 31612 - Posted: 27 Sep 2009, 8:03:06 UTC - in response to Message 31610.

    I said "faster", so I'm comparing "speeds" not "durations". In other words, I'm comparing inverses of time (1/t) not times (t).

    Quibble all they want, I still think it is about twice as fast ... which is an increase of about 100% ...

    Of course the down side is that I have seen more lag on the screen for some updates. Changing the pane/tab in BM ... but once the tab is up it seems to refresh ok... so I don't quite understand all I know about that ...

    Still, as most of my machines are dedicated to BOINC, it is a little bit of who cares most of the time. I am still waiting for it to settle in so I can see if BOINC will allow MW on CUDA to play nice with GPU Grid ... or not ... so far it has been a little bit of not ... sigh ...

    Edboard
    Joined: 22 Feb 09
    Posts: 20
    Credit: 105,156,399
    RAC: 0
    Message 31613 - Posted: 27 Sep 2009, 8:57:17 UTC - in response to Message 31612.
    Last modified: 27 Sep 2009, 9:01:35 UTC

    These all mean exactly the same thing:

    100% faster
    Twice as fast
    100% increase in speed

    I chose the first form because it was not exactly 100%. Maybe it would have been clearer if I had chosen the third one, "95% increase in speed", or had said "almost twice as fast".

    ExtraTerrestrial Apes
    Joined: 1 Sep 08
    Posts: 204
    Credit: 219,354,537
    RAC: 0
    Message 31621 - Posted: 27 Sep 2009, 11:00:08 UTC - in response to Message 31596.


    Scanning for our furry friends since Jan 2002

    seigell
    Joined: 5 Aug 09
    Posts: 9
    Credit: 4,472,571
    RAC: 3
    Message 31664 - Posted: 28 Sep 2009, 16:44:57 UTC - in response to Message 31612.

    down side is that I have seen more lag on the screen ... as most of my machines are dedicated to BOINC ...


    What about setting the CUDA App "priority" down just a bit, to improve the GUI responsiveness for the other 95% of us who want to contribute to MW Science but need to perform work in the foreground to pay for our BOINC "contributions" ??

    Can this new CUDA App be "detuned" sufficiently to improve GUI Responsiveness ??

    |MatMan|
    Joined: 12 Dec 07
    Posts: 3
    Credit: 15,796,608
    RAC: 0
    Message 31665 - Posted: 28 Sep 2009, 17:16:58 UTC - in response to Message 31621.
    Last modified: 28 Sep 2009, 17:18:10 UTC

    Anyway, the new CUDA app looks quite nice: 200s for a GTX280 is only 4 times slower than a 110€ ATI. That's better than expected ;)

    If I'm not wrong the theoretical DP performance of a 4870 (is this the card you meant?) vs a GTX280 is 240 GFLOPS vs 78 GFLOPS = ~3 : 1.
    So a factor of 4 is nice but we should get to a factor of 3... :P

    I know it's just a comparison of theoretical numbers...
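
As a rough check of those ratios, the sketch below compares the theoretical double-precision ratio (240 vs 78 GFLOPS) with the run-time ratio seen in this thread. It assumes the 48 seconds quoted earlier for a stock HD4870 and the 200 seconds reported for a stock GTX280 refer to the same kind of work unit, which the thread does not confirm.

```python
# Theoretical peak double-precision throughput quoted in this thread (GFLOPS).
hd4870_peak_dp = 240.0
gtx280_peak_dp = 78.0

# Run times quoted in this thread (seconds). Assumption: same work unit type.
hd4870_runtime_s = 48.0
gtx280_runtime_s = 200.0

theoretical_ratio = hd4870_peak_dp / gtx280_peak_dp   # how far apart the peaks are
observed_ratio = gtx280_runtime_s / hd4870_runtime_s  # how far apart the run times are
gap = observed_ratio / theoretical_ratio              # 1.0 would mean equal efficiency

print(f"theoretical ratio: {theoretical_ratio:.2f} : 1")
print(f"observed ratio:    {observed_ratio:.2f} : 1")
print(f"observed / theoretical: {gap:.2f}")
```

Under that assumption the observed ratio is a little over 4 : 1 against a theoretical ratio of about 3 : 1, so the CUDA app would need to be roughly a third faster again to match the ATI app's efficiency relative to peak.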

    Paul D. Buck
    Joined: 12 Apr 08
    Posts: 621
    Credit: 161,934,067
    RAC: 0
    Message 31668 - Posted: 28 Sep 2009, 19:43:36 UTC - in response to Message 31665.

    Anyway, the new CUDA app looks quite nice: 200s for a GTX280 is only 4 times slower than a 110€ ATI. That's better than expected ;)

    If I'm not wrong the theoretical DP performance of a 4870 (is this the card you meant?) vs a GTX280 is 240 GFLOPS vs 78 GFLOPS = ~3 : 1.
    So a factor of 4 is nice but we should get to a factor of 3... :P

    I know it's just a comparison of theoretical numbers...

    Because those are theoretical numbers, a 4:1 ratio is not so bad.

    It is another case of AMD vs. Intel and which is better or faster for a particular project.

    ExtraTerrestrial Apes
    Joined: 1 Sep 08
    Posts: 204
    Credit: 219,354,537
    RAC: 0
    Message 31706 - Posted: 29 Sep 2009, 19:20:13 UTC - in response to Message 31665.

    You're right, I was talking about a 4870, and its maximum DP performance is indeed 240 GFlops at 750 MHz. Mine runs at 800 MHz (256 GFlops) and achieves ~190 GFlops at MW. That's a really really good optimization done by CP, so even achieving something close to these numbers is challenging.
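
The arithmetic behind those figures can be sketched in a few lines of Python. It assumes peak throughput scales linearly with core clock, which is what the 240 and 256 GFlops figures imply; the function name is made up for illustration.

```python
def scaled_peak_gflops(base_gflops: float, base_mhz: float, new_mhz: float) -> float:
    """Scale a peak throughput figure linearly with core clock (an assumption)."""
    return base_gflops * new_mhz / base_mhz


# Figures quoted in this thread for an HD4870.
peak_at_800 = scaled_peak_gflops(240.0, 750.0, 800.0)
achieved = 190.0  # approximate GFlops achieved at MW, as quoted

print(f"peak at 800 MHz: {peak_at_800:.0f} GFlops")
print(f"fraction of peak achieved: {achieved / peak_at_800:.0%}")
```

With the quoted numbers the card reaches roughly three quarters of its theoretical peak.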

    @Seigell: there's still the option to choose "don't run CUDA when user is active". It's not ideal, but achieving good performance on the GPU while keeping the UI responsive is also rather challenging.
    Ideally the app would switch behaviour depending on what the user is doing (idle, normal work, graphics-intensive work / game). In the non-idle cases the GPU wouldn't have to stop completely, just crunch a little less intensively.
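
One way to picture that idea is a duty cycle: after each batch of GPU work the app pauses for a while, and the pause grows as the user's activity gets more demanding. The sketch below is purely hypothetical. The state names, the duty-cycle values, and the function are invented for illustration, and nothing here reflects how BOINC or the MilkyWay app actually behaves.

```python
from enum import Enum


class UserState(Enum):
    IDLE = "idle"
    NORMAL_WORK = "normal work"
    GRAPHICS_INTENSIVE = "graphics-intensive work / game"


# Hypothetical fraction of time the GPU spends crunching in each state.
DUTY_CYCLE = {
    UserState.IDLE: 1.0,
    UserState.NORMAL_WORK: 0.6,
    UserState.GRAPHICS_INTENSIVE: 0.2,
}


def pause_after_batch(batch_seconds: float, state: UserState) -> float:
    """Seconds to idle after a batch so that busy / (busy + pause) equals the duty cycle."""
    duty = DUTY_CYCLE[state]
    return batch_seconds * (1.0 - duty) / duty


for state in UserState:
    pause = pause_after_batch(0.03, state)
    print(f"{state.value}: pause {pause * 1000:.0f} ms after a 30 ms batch")
```

The GPU never stops completely in this scheme; it just leaves progressively larger gaps for the display to use.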

    MrS
    Scanning for our furry friends since Jan 2002

    Thamir Ghaslan
    Joined: 31 Mar 08
    Posts: 61
    Credit: 18,325,284
    RAC: 0
    Message 31707 - Posted: 29 Sep 2009, 19:42:50 UTC - in response to Message 31706.

    You're right, I was talking about a 4870, and its maximum DP performance is indeed 240 GFlops at 750 MHz. Mine runs at 800 MHz (256 GFlops) and achieves ~190 GFlops at MW. That's a really really good optimization done by CP, so even achieving something close to these numbers is challenging.


    Which makes me disappointed when other projects whine about overpay in MilkyWay!

    I know it's already done, credit lowering and all, and optimization will offset or surpass the lowering, but BOINC and its projects are decentralized, so the MW admins should not bow to pressure from other projects!

    CUDA is not equal to Brook+. ATI is not equal to NVIDIA. Intel is not equal to AMD.

    verstapp
    Joined: 26 Jan 09
    Posts: 589
    Credit: 497,834,261
    RAC: 0
    Message 31715 - Posted: 29 Sep 2009, 21:00:54 UTC

    'All processors are equal, but some are more equal than others.' :D
    Cheers,

    PeterV

    .

    ExtraTerrestrial Apes
    Joined: 1 Sep 08
    Posts: 204
    Credit: 219,354,537
    RAC: 0
    Message 31718 - Posted: 29 Sep 2009, 21:38:22 UTC - in response to Message 31707.


    Scanning for our furry friends since Jan 2002

    Cluster Physik
    Joined: 26 Jul 08
    Posts: 627
    Credit: 94,940,203
    RAC: 0
    Message 31720 - Posted: 29 Sep 2009, 22:16:21 UTC - in response to Message 31718.
    Last modified: 29 Sep 2009, 22:17:18 UTC

    That's why other projects are "whining" about the overpay at MW. For most of them it's just impossible to utilize the hardware to this extent. It's not just about being lazy programmers - the algorithm and the problem itself don't allow it. They could never achieve the same Flop/s even if they made an ATI app.

    MrS

    That is very true.
    The MW algorithm is really perfect for a GPU. Take vast amounts of parallelism (millions of threads), no branching (unless you want to call a loop with a counter checked each iteration a branch), a very compute-intensive algorithm with only a few memory accesses, and minimal communication between the threads (the values are just added in the end), and what you get is virtually the peak performance of a given GPU for the instruction mix of the algorithm. It's not all about multiply-adds, so you won't get exactly peak performance. But v0.20 has cut all the overhead down to a minimum, so you really arrive within 10% of what is theoretically possible with the algorithm's instruction mix. That's better than any current CPU achieves, even relative to its peak performance.
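
The shape of such a workload (many independent, arithmetic-heavy evaluations whose results are only added together at the end) can be sketched in plain Python. This is not the MilkyWay algorithm. The integrand is a stand-in chosen only because its answer is easy to check, and on a GPU each call to `per_point_work` would correspond to one thread.

```python
import math


def per_point_work(i: int, n: int) -> float:
    """Independent work for one sample point: no branching, no shared state."""
    x = (i + 0.5) / n            # midpoint of the i-th slice of [0, 1]
    return math.exp(-x * x) / n  # stand-in arithmetic, not the MW likelihood


def integrate(n: int) -> float:
    # "Map" step: every point is computed without reading any other point's result.
    partials = [per_point_work(i, n) for i in range(n)]
    # "Reduce" step: the only communication is the final sum.
    return math.fsum(partials)


print(f"{integrate(100_000):.6f}")
```

Because no point depends on another, the map step can be split across as many threads as the hardware offers, which is the property described above.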

    And that will continue to scale: the new ATI HD5870 should easily double the performance of a HD4890 at Milkyway. And when the next nvidia generation arrives, I'm quite sure it will do much more than double the DP performance of a GTX285 ;)
    ©2020 Astroinformatics Group