Welcome to MilkyWay@home

GTX 480

Message boards : Number crunching : GTX 480

[B^S] thierry@home
Joined: 28 Aug 07
Posts: 20
Credit: 5,558,437
RAC: 0
Message 43075 - Posted: 21 Oct 2010, 21:21:34 UTC

I have just installed a GTX 480, and all the WUs die immediately with a calculation error.
Do I need to install something?
Thanks
Tired of crunching alone? Join BOINC Synergy, the most exciting team in the galaxy.

Join now!

[B^S] thierry@home
Joined: 28 Aug 07
Posts: 20
Credit: 5,558,437
RAC: 0
Message 43077 - Posted: 21 Oct 2010, 21:56:37 UTC

Found the opti app. It works fine :-)

kepan
Joined: 28 Oct 10
Posts: 8
Credit: 80,345,093
RAC: 0
Message 43471 - Posted: 4 Nov 2010, 18:27:28 UTC - in response to Message 43077.  

What is 'the opti app'? I have just installed an ASUS GTX 480, and all work ends in a calculation error.
BOINC shows: NVIDIA GPU 0: GeForce GTX 480 (driver version 26099, CUDA version 3020, compute capability 2.0, 1503MB, 778 GFLOPS peak)

Sutaru Tsureku
Joined: 30 Apr 09
Posts: 101
Credit: 29,871,931
RAC: 386
Message 43476 - Posted: 4 Nov 2010, 20:14:23 UTC - in response to Message 43471.  

arkayn
Joined: 14 Feb 09
Posts: 999
Credit: 74,932,619
RAC: 0
Message 43483 - Posted: 4 Nov 2010, 22:51:35 UTC
Last modified: 4 Nov 2010, 22:51:58 UTC

Crunch3r
Volunteer developer
Joined: 17 Feb 08
Posts: 363
Credit: 258,227,990
RAC: 0
Message 43531 - Posted: 5 Nov 2010, 22:46:32 UTC - in response to Message 43471.  
Last modified: 5 Nov 2010, 22:56:30 UTC

[quote]What is 'the opti app'? I have just installed an ASUS GTX 480, and all work ends in a calculation error.
BOINC shows: NVIDIA GPU 0: GeForce GTX 480 (driver version 26099, CUDA version 3020, compute capability 2.0, 1503MB, 778 GFLOPS peak)[/quote]


FWIW, it's not an "optimized" app. It's just a recompile of the stock 0.24 CUDA app code with a minor change to allow processing on GT4xx cards.

I only did this because the project "devs" were unable or unwilling to recompile the app to add support for GT4xx cards...

It's just a few lines of code that need to be modded, mainly the device check, so that compute capability 2.x (GT4xx) cards are accepted alongside 1.3 cards:



#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <cuda_runtime.h>
#include <cutil_inline.h>   /* cutilSafeCall() from the CUDA 3.x SDK */

/* Pick the fastest CUDA card with double precision support:
   compute capability 1.3 (GT200) or 2.x (GT4xx). */
int choose_cuda_13()
{
  int device_count;
  cutilSafeCall(cudaGetDeviceCount(&device_count));
  fprintf(stderr, "Found %d CUDA cards\n", device_count);
  if (device_count < 1)
    {
      fprintf(stderr, "No CUDA cards found, you cannot run the GPU version\n");
      exit(1);
    }
  int eligible_count = 0;
  int max_gflops = 0;
  int device = -1;
  char chosen_device[256] = "";   /* same size as cudaDeviceProp.name */
  for (int idx = 0; idx < device_count; ++idx)
    {
      cudaDeviceProp deviceProp;
      cutilSafeCall(cudaGetDeviceProperties(&deviceProp, idx));
      fprintf(stderr, "Found a %s\n", deviceProp.name);
      /* The GT4xx fix: accept compute capability 2.x as well as 1.3 */
      if ((deviceProp.major == 1 && deviceProp.minor == 3) || deviceProp.major == 2)
        {
          ++eligible_count;
          fprintf(stderr, "Device can be used, it has double precision support\n");
          /* Rough speed score: multiprocessors x clock (kHz), not real GFLOPS */
          int gflops = deviceProp.multiProcessorCount * deviceProp.clockRate;
          if (gflops >= max_gflops)
            {
              max_gflops = gflops;
              device = idx;
              strncpy(chosen_device, deviceProp.name, sizeof(chosen_device) - 1);
              chosen_device[sizeof(chosen_device) - 1] = '\0';
            }
        }
      else
        {
          fprintf(stderr, "Device cannot be used, it does not have double precision support\n");
        }
    }
  if (eligible_count < 1)
    {
      fprintf(stderr, "No compute capability 1.3 or 2.x cards have been found, exiting...\n");
      return -1;
    }
  fprintf(stderr, "Chose device %s\n", chosen_device);
  cutilSafeCall(cudaSetDevice(device));
  return device;
}


Anyway, from looking at the code of the 0.24 app (and especially its somewhat faulty successor, v0.25), there's still room to improve the code dramatically... The approximation of the exp function in 0.25 is a good start, but it's quite slow as implemented...

One option is to approximate exp with a polynomial fitted by a modified Remez algorithm...
For SSE2-capable CPUs, it looks like this:

#include <emmintrin.h>  /* SSE2 intrinsics */

/* Defined elsewhere (not shown in this post): log2e, c1, c2 and the Remez
   coefficients w0..w11 as 16-byte aligned double[2] arrays (c1 + c2 = ln 2),
   and DONE, an __m128d holding 1.0 in both lanes. */
static inline __m128d _mm_fexp_poly_pd(__m128d x1) // precise, but slower than _mm_exp_pd
{
    /* remez11_0_log2_sse */
    __m128i k1;
    __m128d p1, a1;
    __m128d xmm0, xmm1;

    const __m128i offset = _mm_setr_epi32(1023, 1023, 0, 0);
    x1 = _mm_min_pd(x1, _mm_set1_pd( 129.00000));
    x1 = _mm_max_pd(x1, _mm_set1_pd(-126.99999));

    xmm0 = _mm_load_pd(log2e);
    xmm1 = _mm_setzero_pd();
    a1 = _mm_mul_pd(x1, xmm0);
    /* k = (int)floor(a); p = (double)k; */
    p1 = _mm_cmplt_pd(a1, xmm1);      /* all-ones mask where a < 0 */
    p1 = _mm_and_pd(p1, DONE);        /* 1.0 where a < 0, else 0.0 */
    a1 = _mm_sub_pd(a1, p1);
    k1 = _mm_cvttpd_epi32(a1);        /* truncate: integer part */
    p1 = _mm_cvtepi32_pd(k1);
    /* x -= p * ln 2; in two steps (c1 + c2 = ln 2) */
    xmm0 = _mm_load_pd(c1);
    xmm1 = _mm_load_pd(c2);
    a1 = _mm_mul_pd(p1, xmm0);
    x1 = _mm_sub_pd(x1, a1);
    a1 = _mm_mul_pd(p1, xmm1);
    x1 = _mm_sub_pd(x1, a1);

    /* Compute e^x using a degree-11 polynomial (Horner's scheme). */
    xmm0 = _mm_load_pd(w11);
    xmm1 = _mm_load_pd(w10);
    a1 = _mm_mul_pd(x1, xmm0);
    a1 = _mm_add_pd(a1, xmm1);
    xmm0 = _mm_load_pd(w9);
    xmm1 = _mm_load_pd(w8);
    a1 = _mm_mul_pd(a1, x1);
    a1 = _mm_add_pd(a1, xmm0);
    a1 = _mm_mul_pd(a1, x1);
    a1 = _mm_add_pd(a1, xmm1);

    xmm0 = _mm_load_pd(w7);
    xmm1 = _mm_load_pd(w6);
    a1 = _mm_mul_pd(a1, x1);
    a1 = _mm_add_pd(a1, xmm0);
    a1 = _mm_mul_pd(a1, x1);
    a1 = _mm_add_pd(a1, xmm1);

    xmm0 = _mm_load_pd(w5);
    xmm1 = _mm_load_pd(w4);
    a1 = _mm_mul_pd(a1, x1);
    a1 = _mm_add_pd(a1, xmm0);
    a1 = _mm_mul_pd(a1, x1);
    a1 = _mm_add_pd(a1, xmm1);

    xmm0 = _mm_load_pd(w3);
    xmm1 = _mm_load_pd(w2);
    a1 = _mm_mul_pd(a1, x1);
    a1 = _mm_add_pd(a1, xmm0);
    a1 = _mm_mul_pd(a1, x1);
    a1 = _mm_add_pd(a1, xmm1);

    xmm0 = _mm_load_pd(w1);
    xmm1 = _mm_load_pd(w0);
    a1 = _mm_mul_pd(a1, x1);
    a1 = _mm_add_pd(a1, xmm0);
    a1 = _mm_mul_pd(a1, x1);
    a1 = _mm_add_pd(a1, xmm1);

    /* p = 2^k: (k + 1023) << 20 lands in the exponent bits of a high dword */
    k1 = _mm_add_epi32(k1, offset);
    k1 = _mm_slli_epi32(k1, 20);
    /* move the two exponent dwords into the high halves of the two doubles */
    k1 = _mm_shuffle_epi32(k1, _MM_SHUFFLE(1,3,0,2));
    p1 = _mm_castsi128_pd(k1);
    /* a *= 2^k. */
    return _mm_mul_pd(a1, p1);
}
  


This can be adapted for CUDA as well...
Anyway... I might do that when I have the time, or modify Gipsel's approach using a lookup table (LUT), which is considerably faster. My implementation of it beats Gipsel's CPU SSE3 version by 20% in performance, but it's not precise enough... Something I might work on when I have the time...

Basically, it's just fun beating the crap out of the stock apps, having a good laugh at the stock code and, of course, waiting for the OpenCL ATI app... That'll be the icing on the cake, and history will repeat itself...

Support science! Join Team BOINC United now!
[AF>EDLS] Polynesia
Joined: 5 Apr 09
Posts: 71
Credit: 6,120,786
RAC: 0
Message 43532 - Posted: 5 Nov 2010, 23:27:28 UTC - in response to Message 43483.  

[quote]Also on my site.

http://www.arkayn.us/milkyway[/quote]


The application there is for 0.21, and the project is at 0.45...
Team Alliance francophone, BOINC 7.0.18

GA-P55-UD5, i7 860, Win 7 64-bit, 8 GB DDR3, GTX 470
arkayn
Joined: 14 Feb 09
Posts: 999
Credit: 74,932,619
RAC: 0
Message 43535 - Posted: 6 Nov 2010, 2:28:50 UTC - in response to Message 43532.  

I do not have the CPU apps linked right now, only the GPU apps.
[AF>EDLS] Polynesia
Joined: 5 Apr 09
Posts: 71
Credit: 6,120,786
RAC: 0
Message 43546 - Posted: 6 Nov 2010, 11:18:41 UTC - in response to Message 43535.  

[quote]I do not have the CPU apps linked right now, only the GPU apps.[/quote]

Hello,

When will there be an optimized application for NVIDIA? Thank you


©2024 Astroinformatics Group