Welcome to MilkyWay@home

v0.18/v0.19 issues here



Message boards : Number crunching : v0.18/v0.19 issues here

Hydropower
Joined: 2 Apr 09
Posts: 16
Credit: 1,517,499
RAC: 0
Message 31897 - Posted: 3 Oct 2009, 0:11:11 UTC - in response to Message 30816.  

I am happy to hear it is now working. I just bought an ATI card today (incredible, I know) and have the drivers working with Folding@Home as a test. However, I don't seem to get any work from MilkyWay. I suspect it has to do with the fact that I run an AGP HD3850 in an old machine with a Duron CPU, which does not support SSE2, only SSE. My questions:

1. Do I need to manually download the GPU app?
2. If so, where is it?!
3. Is there a GPU app for ATI that does NOT require SSE2?

Thanks!
Hydro.
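Question 3 comes down to which instruction-set extensions the CPU reports. As a rough sketch of how a client could pick an app variant, the code below decodes the SSE, SSE2 and SSE3 flags from CPUID leaf 1 (EDX bits 25 and 26, ECX bit 0) and returns the most capable build the CPU can execute. The build names mirror the variants discussed in this thread, but `BUILDS`, `supported_features` and `pick_build` are hypothetical and are not how BOINC actually assigns apps; the register values are passed in because reading CPUID directly needs native code.

```python
# CPUID leaf 1 feature flags (bit positions from the Intel/AMD manuals).
EDX_SSE = 1 << 25
EDX_SSE2 = 1 << 26
ECX_SSE3 = 1 << 0

# Hypothetical build list, most capable first, with the features each needs.
BUILDS = [
    ("SSE3", {"SSE", "SSE2", "SSE3"}),
    ("SSE2", {"SSE", "SSE2"}),
    ("SSE", {"SSE"}),
]


def supported_features(edx, ecx):
    """Decode the SSE-family flags from CPUID leaf 1 EDX/ECX values."""
    features = set()
    if edx & EDX_SSE:
        features.add("SSE")
    if edx & EDX_SSE2:
        features.add("SSE2")
    if ecx & ECX_SSE3:
        features.add("SSE3")
    return features


def pick_build(edx, ecx):
    """Return the most capable build the CPU can run; fall back to plain x87."""
    features = supported_features(edx, ecx)
    for name, required in BUILDS:
        if required <= features:
            return name
    return "x87"


# An SSE-only CPU (no SSE2 bit set) can only use the SSE or x87 builds.
print(pick_build(EDX_SSE, 0))  # SSE
```

Checking the full requirement set for each build, rather than a single flag, keeps the fallback order explicit if a build ever needs more than one extension.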
Hydropower
Joined: 2 Apr 09
Posts: 16
Credit: 1,517,499
RAC: 0
Message 31915 - Posted: 3 Oct 2009, 8:49:32 UTC - in response to Message 31897.  

I'm up and running.
steveplanetary
Joined: 19 May 09
Posts: 16
Credit: 20,273
RAC: 0
Message 31936 - Posted: 4 Oct 2009, 4:02:44 UTC - in response to Message 11261.  

On a P3 1233 MHz with Win2000, the new stock app v0.19 (55-57 min) is faster than the previous optimized win32 app v0.16SSE (1:15 h).
Nice work Travis - Thanks ;)

As stated before, there are no improvements in the codebase of the relabeled v0.19 opti app besides the additional logging. So the SSE-optimized apps v0.16 and v0.19 for Windows should give the same runtimes.
Does this mean the stockapp is better for old PCs with only SSE?


The x87 and SSE versions are created using different compilers. So it is perfectly possible that the x87 version is faster than the SSE one, as SSE cannot be used for the time-consuming stuff here at MW, which requires double precision.
Also be aware that there are WUs of different lengths floating around. One should compare only the times of similar ones.
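For scale, taking the P3 runtimes quoted above at face value and assuming the WUs compared were of similar length (the caveat just raised), the stock v0.19 app's advantage over v0.16SSE works out as:

$$\frac{75\ \text{min}}{57\ \text{min}} \approx 1.32 \qquad\text{to}\qquad \frac{75\ \text{min}}{55\ \text{min}} \approx 1.36,$$

i.e. a per-WU runtime reduction of $1 - 57/75 = 24\%$ to $1 - 55/75 \approx 27\%$.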


Now I'm really confused, since I recently installed a Windows v0.20 Microsoft-compiled opti app for SSE from zslip. I haven't been using it long enough to judge the results for myself, but I would be interested in hearing from people 'in the know' about using this app with a mobile Athlon-XP 2400+ under Windows XP Pro. For one thing, the discussion seems to be centered on the v0.16 and v0.19 opti apps. What about v0.20?

I'm especially curious because it has been stated that SSE doesn't support double precision: how does an app that's optimized for SSE handle that problem? Does the CPU switch between SSE and x86 instructions to achieve double-precision FP results, or does it return single-precision results? I don't see how single-precision results could be acceptable.

I just want to increase my throughput so as to make as great a contribution as possible. Ultimately I will have to judge for myself, but that will be difficult, since my DCF isn't quite where it should be. Credits don't mean anything to me; I just like to see a constant, steep slope on the graph. Answers to my questions and advice, especially from you developers/programmers, would be greatly appreciated.

Steveplanetary
Cluster Physik
Send message
Joined: 26 Jul 08
Posts: 627
Credit: 94,940,203
RAC: 0
Message 31951 - Posted: 4 Oct 2009, 10:43:44 UTC - in response to Message 31936.  

As I said already, the x87 and the SSE versions are built using different compilers. As SSE isn't useful for MW (the calculations in the SSE version are also done with the x87 FPU), what you see is just how well the two compilers play with different CPUs. The code of the x87 and the SSE versions is identical; only the compiler differs.
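Some background on the precision question: the first SSE generation only provides single-precision (32-bit) floating-point arithmetic; packed double-precision instructions arrived with SSE2. That is why, as explained above, the SSE build still does its double-precision math on the x87 FPU and does not return single-precision results. To show why single precision would not be acceptable for long calculations, the sketch below emulates float32 rounding in Python via `struct` and repeatedly adds 0.1. The `accumulate` helper is purely illustrative and has nothing to do with MilkyWay's actual computation.

```python
import struct


def to_float32(x):
    """Round a Python float (an IEEE double) to the nearest IEEE single value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def accumulate(step, n, single_precision):
    """Add `step` to a running total `n` times, in double or emulated single precision."""
    total = 0.0
    if single_precision:
        step = to_float32(step)
    for _ in range(n):
        total += step
        if single_precision:
            # For the magnitudes used here, the double sum of two singles is exact,
            # so rounding it back gives the correctly rounded single-precision sum.
            total = to_float32(total)
    return total


exact = 10_000.0  # 100,000 steps of 0.1
err_double = abs(accumulate(0.1, 100_000, single_precision=False) - exact)
err_single = abs(accumulate(0.1, 100_000, single_precision=True) - exact)
print(f"double error: {err_double:.2e}, single error: {err_single:.2e}")
```

The double-precision total stays extremely close to 10000, while the single-precision total drifts by a visible amount, because each addition near 10000 is rounded to a grid only about 0.001 wide.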

The x87 version is built using the Intel compiler and is obviously quite slow on Athlon XPs (but not on Athlon64 and up). The SSE version, built with MSVC2005, more or less uses the SSE flag just as a hint to the compiler that it should optimize for P3- and Athlon XP-generation CPUs and nothing older. It probably shows more balanced performance across different CPUs (without the huge fluctuations one sees with the Intel compiler).

So it is possible that the x87 version is faster on some CPUs while the SSE version wins on other systems. I don't know what Travis uses to compile the stock app, but it probably shows yet another behaviour across different CPUs.

That the stock app is able to beat my versions on some configurations is a testament that I was not talking complete crap when I claimed that most improvements had been implemented in it. So the stock app isn't performing badly. Only the vectorized SSE2/3 versions are, of course, better, as they are able to fully use the resources of the CPU. And I have mentioned several times already that this vectorization is done by the compiler, not by hand. So there is not much I could share with Travis to improve the stock app further, as the differences between all the versions come down mostly to the compiler and some switches.
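To illustrate what "vectorization done by the compiler" means: with SSE2, one 128-bit register holds two doubles, so a compiler can restructure an element-by-element loop to process two elements per packed instruction (such as SSE2's `addpd` or `mulpd`), plus a short remainder loop for leftovers. The Python below only models that restructuring (Python itself does not emit SIMD code), and the `axpy` kernel is a generic stand-in chosen for illustration, not MilkyWay's actual code.

```python
LANES = 2  # one 128-bit SSE2 register holds two 64-bit doubles


def axpy_scalar(a, x, y):
    """Reference loop: out[i] = a * x[i] + y[i], one element at a time."""
    return [a * xi + yi for xi, yi in zip(x, y)]


def axpy_vectorized(a, x, y, lanes=LANES):
    """The same loop, restructured the way an auto-vectorizer would (conceptually)."""
    n = len(x)
    out = [0.0] * n
    main_end = n - n % lanes
    # Main loop: each iteration handles `lanes` elements at once,
    # like one packed instruction working on a full register.
    for i in range(0, main_end, lanes):
        chunk = zip(x[i:i + lanes], y[i:i + lanes])
        out[i:i + lanes] = [a * xi + yi for xi, yi in chunk]
    # Remainder loop: leftover elements that don't fill a register.
    for i in range(main_end, n):
        out[i] = a * x[i] + y[i]
    return out


x = [float(i) for i in range(7)]
y = [1.0] * 7
print(axpy_vectorized(2.0, x, y) == axpy_scalar(2.0, x, y))  # True
```

Because each output element is computed independently, the restructured loop gives bit-identical results; the speedup on real hardware comes from doing two of those computations per instruction, which first-generation SSE cannot do for doubles.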

I hope that all is said about this issue now.

steveplanetary
Joined: 19 May 09
Posts: 16
Credit: 20,273
RAC: 0
Message 31962 - Posted: 4 Oct 2009, 17:48:20 UTC - in response to Message 31951.  

Cluster, I really appreciate the in-depth answer to my question. I wasn't hoping for a simple yes or no; I've been a computer nerd since the '90s, when I built my own computers, and I always like to learn. It was also nice to hear it all in one post. Thanks again. You add a lot to this message board.

Steve
Paul D. Buck
Joined: 12 Apr 08
Posts: 621
Credit: 161,934,067
RAC: 0
Message 31979 - Posted: 5 Oct 2009, 7:25:16 UTC - in response to Message 31962.  

For those interested in the subject of optimization, there is a fascinating HP project called "Dynamo" (a list of papers), with the best maybe being "Transparent Dynamic Optimization: The Design and Implementation of Dynamo".

What fascinated me about this concept was that HP was getting faster run times while running some software under a software emulator than they got when they ran that software on the hardware natively ...

I grant that this probably would not help much in the BOINC universe, but, still, software running faster on a software emulator of the base hardware?
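Roughly, the Dynamo idea is that the emulator watches which code paths are hot, builds optimized versions of just those paths, caches them, and then runs the cached versions directly; once the interpretation overhead is amortized, the optimized hot paths can outrun the original code. The toy model below captures only the counting and caching step. `TraceCache`, its callbacks and the threshold of 50 are hypothetical and chosen for illustration; real trace recording and optimization are replaced by a caller-supplied `optimize` function.

```python
from collections import defaultdict

# Hypothetical hotness threshold, chosen only for illustration.
HOT_THRESHOLD = 50


class TraceCache:
    """Toy model of Dynamo-style hot-path detection (not HP's implementation)."""

    def __init__(self, threshold=HOT_THRESHOLD):
        self.threshold = threshold
        self.counters = defaultdict(int)  # executions seen per trace head
        self.fragments = {}               # trace head -> optimized callable
        self.interpreted = 0              # runs on the slow path
        self.cached = 0                   # runs served from the fragment cache

    def execute(self, head, interpret, optimize, *args):
        """Run the code at `head`, switching to a cached fragment once it is hot.

        `interpret(*args)` is the slow path; `optimize()` returns a callable
        used for all later executions of `head`.
        """
        fragment = self.fragments.get(head)
        if fragment is not None:
            self.cached += 1
            return fragment(*args)
        self.interpreted += 1
        self.counters[head] += 1
        if self.counters[head] >= self.threshold:
            self.fragments[head] = optimize()
        return interpret(*args)


def slow_square(x):
    """Stands in for interpreting a loop body."""
    return x * x


def build_fast_square():
    """Stands in for recording and optimizing a trace."""
    return lambda x: x * x


cache = TraceCache()
results = [cache.execute("loop_head", slow_square, build_fast_square, i) for i in range(1000)]
print(cache.interpreted, cache.cached)  # 50 950
```

Only the first 50 iterations pay the interpretation cost; everything after that runs from the cache, which is where any speed advantage over plain execution would have to come from.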
Odd-Rod
Joined: 7 Sep 07
Posts: 442
Credit: 1,387,450
RAC: 41
Message 31988 - Posted: 5 Oct 2009, 19:43:25 UTC - in response to Message 31979.  

but, still, software running faster on a software emulator of the base hardware?


Hmm, and if that software is another instance of the emulator it could be faster still. Run enough levels of emulations within emulations and you could have instant computing! :D :D


Terry Stick
Joined: 5 Sep 09
Posts: 3
Credit: 267,349
RAC: 0
Message 32853 - Posted: 28 Oct 2009, 22:06:15 UTC
Last modified: 28 Oct 2009, 22:10:44 UTC

I have a computer with an ATI 4890 OC and an AMD Phenom II X4 920 processor. I am using BOINC 6.10.16 with ATI driver 9.9 on WinXP32.

I have been trying the optimized applications for the ATI GPU. I have tried different versions of BOINC Manager, the v0.18 and v0.19 optimized applications, and even Folding@home alone. All of those applications, Folding@home included, crash the computer when they use the 4890 OC.

The only optimized application I can use is the SSE3 version of 0.19 for the CPU, and it works fine. Any application that uses the GPU of the 4890 OC crashes the computer. The power supply I am using is 750 watts, so I am thinking the problem is that the video card is overclocked and I need to underclock it. In addition, I tried the 8.12 and 9.1 video drivers.


©2019 Astroinformatics Group