Welcome to MilkyWay@home

Latest Stock Apps - Optimized or Not

Profile Sean Arrowsmith

Joined: 30 Mar 09
Posts: 15
Credit: 15,856,582
RAC: 0
Message 49529 - Posted: 24 Jun 2011, 11:03:07 UTC

I have just restarted MW, after the last optimized apps were dropped around April.

I have just reattached to MW and am using the stock apps on Win7 64-bit with a GTX460.

Are the stock apps optimized versions, or are NEW optimized versions coming soon, especially for CUDA?


I want to run 2 WUs on my GTX460, but the current stock app only runs one WU at a time on it. Can someone provide an app_info.xml file for the stock apps so I can modify the CUDA count to allow 2 WUs?

Thanks
Profile banditwolf
Joined: 12 Nov 07
Posts: 2425
Credit: 524,164
RAC: 0
Message 49530 - Posted: 24 Jun 2011, 12:26:02 UTC - in response to Message 49529.  
Last modified: 24 Jun 2011, 12:26:47 UTC

I have just restarted MW, after the last optimized apps were dropped around April.

I have just reattached to MW and am using the stock apps on Win7 64-bit with a GTX460.

Are the stock apps optimized versions, or are NEW optimized versions coming soon, especially for CUDA?

Just this past week an 'optimized' stock app was released. I am down to 9-10 hours on my P4 XP machine for the de_separation_13_3s tasks, a large improvement over the previous app. It still seems slightly slower than the old opti apps.



I want to run 2 WUs on my GTX460, but the current stock app only runs one WU at a time on it. Can someone provide an app_info.xml file for the stock apps so I can modify the CUDA count to allow 2 WUs?

Thanks
I know it was posted somewhere.
Doesn't expecting the unexpected make the unexpected the expected?
If it makes sense, DON'T do it.
Profile arkayn
Joined: 14 Feb 09
Posts: 999
Credit: 74,932,619
RAC: 0
Message 49534 - Posted: 24 Jun 2011, 15:42:10 UTC

<app_info>
  <app>
    <name>milkyway</name>
  </app>
  <file_info>
    <name>milkyway_separation_0.82_windows_intelx86__cuda_opencl.exe</name>
    <executable />
  </file_info>
  <app_version>
    <app_name>milkyway</app_name>
    <version_num>82</version_num>
    <flops>1.0e11</flops>
    <avg_ncpus>0.05</avg_ncpus>
    <max_ncpus>1</max_ncpus>
    <plan_class>cuda</plan_class>
    <coproc>
      <type>CUDA</type>
      <count>1</count>
    </coproc>
    <cmdline></cmdline>
    <file_ref>
      <file_name>milkyway_separation_0.82_windows_intelx86__cuda_opencl.exe</file_name>
      <main_program/>
    </file_ref>
  </app_version>
</app_info>
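
For the original question about running two WUs at once on the GTX460, here is a minimal sketch. It assumes the 0.82 CUDA/OpenCL binary named above and a BOINC client that accepts fractional coprocessor counts in app_info.xml. The only change from the file above is the <count> value: 0.5 asks the client to schedule two tasks per GPU. Whether that actually improves throughput on a given card is not something this thread establishes.

```xml
<app_info>
  <app>
    <name>milkyway</name>
  </app>
  <file_info>
    <name>milkyway_separation_0.82_windows_intelx86__cuda_opencl.exe</name>
    <executable />
  </file_info>
  <app_version>
    <app_name>milkyway</app_name>
    <version_num>82</version_num>
    <flops>1.0e11</flops>
    <avg_ncpus>0.05</avg_ncpus>
    <max_ncpus>1</max_ncpus>
    <plan_class>cuda</plan_class>
    <coproc>
      <type>CUDA</type>
      <!-- Assumption: each task claims half a GPU, so two run side by side -->
      <count>0.5</count>
    </coproc>
    <cmdline></cmdline>
    <file_ref>
      <file_name>milkyway_separation_0.82_windows_intelx86__cuda_opencl.exe</file_name>
      <main_program/>
    </file_ref>
  </app_version>
</app_info>
```

Restart the BOINC client after editing the file so it is re-read. If your client's parser objects to the comment line, delete it.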


New features:
- Faster CPU calculations because of SSE2 intrinsics from Crunch3r. This should get a bit faster later when I get to some other stuff.

- The binaries now include the different SSE levels for the critical function (e.g. x87/SSE2/SSE3) all in one, and will use the appropriate one for the detected CPU's capabilities, so you don't need any special __sse2 version or anything. The problems on systems without SSE2 which happened sometimes for the GPU applications should also be fixed.

- Checkpointing for GPUs for both OpenCL/Nvidia and ATI/AMD for CAL. It will checkpoint at most once per 10% of progress, so it might not checkpoint as frequently as your settings specify if you have a particularly slow GPU. GPU checkpointing is a bit slow when it does happen, so if you don't want it, you can disable it with the flag --gpu-disable-checkpointing.

- The Nvidia OpenCL platform should now always be used, avoiding issues if you had both AMD's and Nvidia's installed.

- More reliable chunking for different GPUs. I didn't play with the time estimates so much, but I think the actual run times should now be closer to the estimates. There should also be fewer cases where a GPU will end up using the slowest possible work size option. I'll probably have to fiddle with this some more if people still aren't happy with the lag.

- Workaround for a Catalyst driver problem where the GPU was sometimes reported as 0 MHz, resulting in much more lag, which I think was some people's problem.

- New flag that some people requested: --process-priority (-b). On Windows this is 0 (lowest) to 4 (highest) for overriding the process priority. On Linux this is the nice value.

- CAL specific: Removed --responsiveness-factor flag. Use --gpu-target-frequency or --non-responsive instead depending on what you want to do.

- Actually updated the 32-bit OS X application

- In the event of a crash on Windows, you should no longer be bothered with useless crash dialogs
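
A sketch of how one of the flags above could be passed through the anonymous-platform file. It assumes the app picks up its flags from the <cmdline> element, which the app_info.xml above leaves empty. Only the <app_version> block is shown; the rest of the file stays as it is.

```xml
<app_version>
  <app_name>milkyway</app_name>
  <version_num>82</version_num>
  <flops>1.0e11</flops>
  <avg_ncpus>0.05</avg_ncpus>
  <max_ncpus>1</max_ncpus>
  <plan_class>cuda</plan_class>
  <coproc>
    <type>CUDA</type>
    <count>1</count>
  </coproc>
  <!-- Assumption: flags placed here are handed to the app when it starts -->
  <cmdline>--gpu-disable-checkpointing</cmdline>
  <file_ref>
    <file_name>milkyway_separation_0.82_windows_intelx86__cuda_opencl.exe</file_name>
    <main_program/>
  </file_ref>
</app_version>
```

Other flags from the list, such as --process-priority, would presumably go in the same element, separated by spaces. The exact argument syntax is not given in the notes above, so check the app's own help output before relying on it.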

Profile Sean Arrowsmith
Joined: 30 Mar 09
Posts: 15
Credit: 15,856,582
RAC: 0
Message 49556 - Posted: 25 Jun 2011, 3:19:26 UTC - in response to Message 49534.  

Thanks. After a bit of searching I found these opti apps on your site.

I did notice a mistake in the downloaded Win 64-bit CPU files: in the app_info for the CPU version, the closing name tag for the first opti app is missing its <, as shown below. When I ran BOINC, it complained about the missing app and deleted it. Adding the < to the file makes it work.

<app_info>
<app>
<name>milkyway</name>
</app>
<file_info>
<name>milkyway_separation_0.88_windows_x86_64.exe/name>
<executable />
</file_info>
<app_version>
<app_name>milkyway</app_name>
<version_num>88</version_num>
<cmdline></cmdline>
<file_ref>
<file_name>milkyway_separation_0.88_windows_x86_64.exe</file_name>
<main_program/>
</file_ref>
</app_version>
</app_info>
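
For reference, this is the <file_info> block with the closing tag restored. It is the only part of the file above that needs to change.

```xml
<file_info>
<!-- The closing tag previously read "/name>"; the leading "<" is restored here -->
<name>milkyway_separation_0.88_windows_x86_64.exe</name>
<executable />
</file_info>
```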

Profile arkayn
Joined: 14 Feb 09
Posts: 999
Credit: 74,932,619
RAC: 0
Message 49557 - Posted: 25 Jun 2011, 4:36:27 UTC

That is what happens when you are copying and pasting names quickly.
Profile Crunch3r
Volunteer developer
Joined: 17 Feb 08
Posts: 363
Credit: 258,227,990
RAC: 0
Message 49576 - Posted: 25 Jun 2011, 18:40:24 UTC - in response to Message 49530.  
Last modified: 25 Jun 2011, 18:42:45 UTC


Just this past week an 'optimized' stock app was released. I am down to 9-10 hours on my P4 XP machine for the de_separation_13_3s tasks, a large improvement over the previous app. It still seems slightly slower than the old opti apps.


Yes, the new stock opti apps are slower. The stock app uses a dispatcher that chooses a code path (SSE level) supported by your CPU, but the rest of the code is not optimized at all.

(The whole new build system is also preventing me from releasing some more tuned binaries using the intel compiler etc... it's a real pain in the ass and i hate that cmake crap!!)

A major part of the optimizations is still missing; hopefully Matt will find the time to integrate it into the stock app. That should boost performance again and outperform the old optimized CPU apps by a few percent.

Join Support science! Joinc Team BOINC United now!
Profile Crunch3r
Volunteer developer
Joined: 17 Feb 08
Posts: 363
Credit: 258,227,990
RAC: 0
Message 49595 - Posted: 26 Jun 2011, 17:07:36 UTC - in response to Message 49576.  
Last modified: 26 Jun 2011, 17:13:51 UTC

Here's an 'optimized' CPU app, compiled with the Intel(R) C++ Compiler XE 12.0.4.196 for Windows.

An SSE2-compatible CPU is required (AMD & Intel)!
The difference is that it uses Intel's LibM, especially the exp (e^x) function, which is faster than the 'stock' SSE2 polynomial evaluation.

download -> MilkyWay Separation SSE2 Intel&AMD

FruehwF
Joined: 28 Feb 10
Posts: 120
Credit: 109,840,492
RAC: 0
Message 49599 - Posted: 26 Jun 2011, 18:01:00 UTC

When I use the app_info.xml, must there also be an entry for the N-body in it?
Profile Crunch3r
Volunteer developer
Joined: 17 Feb 08
Posts: 363
Credit: 258,227,990
RAC: 0
Message 49600 - Posted: 26 Jun 2011, 18:06:08 UTC - in response to Message 49599.  
Last modified: 26 Jun 2011, 18:08:47 UTC

When I use the app_info.xml, must there also be an entry for the N-body in it?


You don't have to run n-body at all (why isn't it possible in the user prefs to disable n-body ???).

The included app_info.xml doesn't have an entry for n-body, so you're only going to run separation WUs.

FruehwF
Joined: 28 Feb 10
Posts: 120
Credit: 109,840,492
RAC: 0
Message 49602 - Posted: 26 Jun 2011, 18:44:37 UTC

You don't have to run n-body at all (why isn't it possible in the user prefs to disable n-body ???).


Um, I'm a little confused about that. Why shouldn't I do n-body WUs?
If it's because they pay less, that's not a reason; somebody has to do them.
Profile Senilix
Joined: 8 Aug 08
Posts: 30
Credit: 74,566,409
RAC: 0
Message 49613 - Posted: 26 Jun 2011, 23:14:45 UTC - in response to Message 49595.  

Here's an 'optimized' CPU app, compiled with the Intel(R) C++ Compiler XE 12.0.4.196 for Windows.

An SSE2-compatible CPU is required (AMD & Intel)!
The difference is that it uses Intel's LibM, especially the exp (e^x) function, which is faster than the 'stock' SSE2 polynomial evaluation.

download -> MilkyWay Separation SSE2 Intel&AMD


I am getting a 404 when I try to download that opti app.
Len LE/GE
Joined: 8 Feb 08
Posts: 261
Credit: 104,050,322
RAC: 0
Message 49615 - Posted: 27 Jun 2011, 1:06:45 UTC - in response to Message 49595.  

Here's an 'optimized' CPU app, compiled with the Intel(R) C++ Compiler XE 12.0.4.196 for Windows.

An SSE2-compatible CPU is required (AMD & Intel)!
The difference is that it uses Intel's LibM, especially the exp (e^x) function, which is faster than the 'stock' SSE2 polynomial evaluation.

download -> MilkyWay Separation SSE2 Intel&AMD


Can you do a Win 32-bit ATI version from the current code too? It would be great to have the working initial wait, which has been in the code since v0.88.
FruehwF
Joined: 28 Feb 10
Posts: 120
Credit: 109,840,492
RAC: 0
Message 49626 - Posted: 27 Jun 2011, 14:02:23 UTC
Last modified: 27 Jun 2011, 14:02:55 UTC

The new opti app (0.91) performs well. I saw a performance increase of up to 8% on my old Xeons.

Thanks a lot.
But today the download link is empty. Why?

greetings Franz
Profile banditwolf
Joined: 12 Nov 07
Posts: 2425
Credit: 524,164
RAC: 0
Message 49628 - Posted: 27 Jun 2011, 14:58:42 UTC - in response to Message 49626.  

The new opti app (0.91) performs well. I saw a performance increase of up to 8% on my old Xeons.

Thanks a lot.
But today the download link is empty. Why?

greetings Franz

Downloaded too many times?
FruehwF
Joined: 28 Feb 10
Posts: 120
Credit: 109,840,492
RAC: 0
Message 49631 - Posted: 27 Jun 2011, 15:27:56 UTC

I think I know!!

They all come back invalid :-(

I'm going back to the stock app.
Matt Arsenault
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 8 May 10
Posts: 576
Credit: 15,979,383
RAC: 0
Message 49726 - Posted: 28 Jun 2011, 22:07:43 UTC - in response to Message 49576.  

(The whole new build system is also preventing me from releasing some more tuned binaries using the intel compiler etc... it's a real pain in the ass and i hate that cmake crap!!)
It shouldn't be. If you have a problem building with ICC I'll fix it.
Profile Crunch3r
Volunteer developer
Joined: 17 Feb 08
Posts: 363
Credit: 258,227,990
RAC: 0
Message 50045 - Posted: 10 Jul 2011, 22:45:10 UTC - in response to Message 49631.  
Last modified: 10 Jul 2011, 23:38:37 UTC

I think I know!!

They all come back invalid :-(

I'm going back to the stock app.


Yes, that's why I pulled the download (and sorry for taking so long to reply). Without a proper solution file for MS VS, I had to dig through useless cmake txt files that can't even generate proper makefiles without crashing or erroring out with the latest cmake downloads, so it was a guessing game and I missed some vital config parameters (TG Math: yes, we have that with ICC; and MW_SINCOS computes crap when built with ICC).

Anyway, after digging through this unnecessary cmake txt file BS, I finally got it working, linking and validating.

(If anyone wants to compile the code using a proper VS solution project (2005 or 2010) without the hassle of digging through cmake txt files, let me know. I'll upload them to ease the pain I went through to get everything compiled and linked.)

So, now that it works, we have a few new apps supporting Intel SSE4.1, Intel SSE3 and Intel/AMD SSE2 (PENTIUM4_SSE2 and AMD SSE2).

For those who know code, take a look at http://board.mpits.net/viewtopic.php?f=32&t=77. That post includes source code that replaces the stock 0.91 source from GitHub (it should hopefully be integrated into the next stock source release; Matt, that's your part :p).

Anyway, all the new optimized apps are linked and downloadable at http://www.mpits.net/opt_mw.php
(do not hotlink the zip files or modify them without permission!)

Optimized apps for Pentium4 (SSE2/SSE3) and for AMD CPUs using AMD_SSE2 tuned ops (K8, K10) will be added tomorrow. Stay tuned and check http://www.mpits.net/opt_mw.php or http://board.mpits.net/viewtopic.php?f=32&t=77 for updates!

Changelog:

- NEW using GROMACS exp_pd function for SSE2 and SSE4.1 (an additional 5% faster) (see code)

- NEW using _mm_fsqrt_pd (SSE approx., converting to SSE1 (RCP_SRQT) plus SSE2 Newton-Raphson steps, up to 52-bit precision) (see code)

- NEW using PENTIUM4 _mm_div_pd replacement function (see code)

- NEW, faster AMD_SSE (K8, K9, K10) _mm_div_pd replacement (see code)
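
As a rough illustration of the Newton-Raphson items above: these are the textbook iterations, not formulas taken from the linked source, and the starting accuracy of about 12 bits is an assumption about the SSE1 estimate instructions.

```latex
% Reciprocal square root of a: refine an estimate y_n \approx 1/\sqrt{a}
y_{n+1} = \tfrac{1}{2}\, y_n \left(3 - a\, y_n^{2}\right), \qquad \sqrt{a} \approx a\, y_n

% Reciprocal of d, so that a division x/d can be replaced by x \cdot r_n
r_{n+1} = r_n \left(2 - d\, r_n\right)

% The relative error is roughly squared at each step, so the number of
% correct bits roughly doubles: about 12 \to 24 \to 48 \to 96.
% Three steps are therefore enough to pass the 52-bit double mantissa.
```

The appeal is that each step uses only multiplies and subtracts, which is why it can beat a full-precision divide or square root on CPUs where those instructions are slow.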

JOIN BOINC United to get exclusive access to new prerelease optimized GPU & CPU apps!

Profile Crunch3r
Volunteer developer
Joined: 17 Feb 08
Posts: 363
Credit: 258,227,990
RAC: 0
Message 50047 - Posted: 11 Jul 2011, 0:33:40 UTC - in response to Message 50045.  
Last modified: 11 Jul 2011, 0:36:16 UTC

My own machines:

Dual Quad Xeon 5365 ES (8 cores)-> SSE3 app -> http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=62008

Core i3 @ 2.13 GHz / HT enabled (4 threads) SSE4.1 app -> http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=281287

Profile Martin Chartrand
Joined: 25 Mar 09
Posts: 65
Credit: 53,099,671
RAC: 0
Message 50048 - Posted: 11 Jul 2011, 1:35:58 UTC
Last modified: 11 Jul 2011, 2:10:29 UTC

I too get a 404 error on the link for the optimized app for SSE4.1- and SSE4.2-compatible CPUs.
I downloaded the SSE3 version for now until it is fixed.
Or should I just keep the optimized v0.88 for now?


Well, it rendered about 8 WUs unusable. I reverted to 0.88 for now until it finishes the rest.
I have an NVidia GTX 285 and a Q9550 processor.

I also plan on reusing my 2nd computer, which has an ATI 4870 and ran the old 0.19 apps. It did quite well, but I disconnected it a while ago for a basement reno. Can I just restart it and run it again, or is this all obsolete now?

Martin
FruehwF
Joined: 28 Feb 10
Posts: 120
Credit: 109,840,492
RAC: 0
Message 50052 - Posted: 11 Jul 2011, 9:04:29 UTC

I'll test it on one of my machines and then I'll give feedback in a few hours.

greetings

Franz


©2024 Astroinformatics Group