Welcome to MilkyWay@home

OpenCL for Nvidia available for testing


CTAPbIi
Message 44622 - Posted: 3 Dec 2010, 17:46:54 UTC - in response to Message 44620.  
Last modified: 3 Dec 2010, 17:47:24 UTC

> The WUs are running fine now. :-)

How many seconds, and on which card?

arkayn
Message 44623 - Posted: 3 Dec 2010, 17:53:12 UTC

Here is the stderr that came up on my 460.

Run time	614.917938
CPU time	31.6875
stderr out	

<core_client_version>6.12.6</core_client_version>
<![CDATA[
<stderr_txt>
<search_application> milkywayathome separation 0.48 Windows x86 double OpenCL </search_application>
Found 1 platforms
Platform 0 information:
  Platform name:       NVIDIA CUDA
  Platform version:    OpenCL 1.0 CUDA 3.2.1
  Platform vendor:     
  Platform profile:    
  Platform extensions: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll 
Using device 0 on platform 0
Found 1 CL devices
Device GeForce GTX 460 (NVIDIA Corporation:0x10de)
Type:                CL_DEVICE_TYPE_GPU
Driver version:      263.06
Version:             OpenCL 1.0 CUDA
Compute capability:  2.1
Little endian:       CL_TRUE
Error correction:    CL_FALSE
Image support:       CL_TRUE
Address bits:        32
Max compute units:   7
Clock frequency:     1600 Mhz
Global mem size:     804847616
Max mem alloc:       201211904
Global mem cache:    114688
Cacheline size:      128
Local mem type:      CL_LOCAL
Local mem size:      49152
Max const args:      9
Max const buf size:  65536
Max parameter size:  4352
Max work group size: 1024
Max work item dim:   3
Max work item sizes: { 1024, 1024, 64 }
Mem base addr align: 4096
Min type align size: 128
Timer resolution:    1000 ns
Double extension:    MW_CL_KHR_FP64
Extensions:          cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll  cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 

Compiler flags:
-cl-mad-enable -cl-no-signed-zeros -cl-strict-aliasing -cl-finite-math-only -DUSE_CL_MATH_TYPES=0 -DUSE_MAD=0 -DUSE_FMA=0 -cl-nv-verbose   -DDOUBLEPREC=1 -DMILKYWAY_MATH_COMPILATION -DNSTREAM=3 -DFAST_H_PROB=1 -DAUX_BG_PROFILE=0 -DUSE_IMAGES=1 -DI_DONT_KNOW_WHY_THIS_DOESNT_WORK_HERE=1  

Build status: CL_BUILD_SUCCESS
Build log: 

: Considering profile 'compute_20' for gpu='sm_21' in 'cuModuleLoadDataEx_4'
Kernel work group info:
  Work group size = 512
  Kernel local mem size = 0
  Compile work group size = { 0, 0, 0 }
Lower n solution: n = 40, x = 0
Higher n solution: n = 40, x = 0
Using solution: n = 40, x = 0
Range:          { nu_steps = 640, mu_steps = 1600, r_steps = 1400 }
Iteration area: 2240000
Chunk estimate: 40
Num chunks:     40
Added area:     0
Effective area: 2240000
Integration time: 584.577879 s. Average time per iteration = 913.402936 ms
<background_integral> 0.00057622614096211838 </background_integral>
<stream_integrals> 105.68890505238272000000 172.60556751830708000000 161.27537590087846000000 </stream_integrals>
<background_only_likelihood> -3.29958249045773530000 </background_only_likelihood>
<stream_only_likelihood> -37.34692154949922100000 -4.75605645991618700000 -3.79966623105932080000 </stream_only_likelihood>
<search_likelihood> -3.00484287881831240000 </search_likelihood>
10:44:59 (2892): called boinc_finish

</stderr_txt>
]]>

Validate state	Valid
Claimed credit	0.177164871378224
Granted credit	213.760359413782
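
As a rough cross-check of the figures in this log, here is a minimal sketch (not part of the application). It assumes that the reported iteration area is mu_steps × r_steps and that one "iteration" corresponds to one nu step; both are inferences from the numbers above, not something stated in the thread.

```python
# Values copied from the stderr output above.
nu_steps, mu_steps, r_steps = 640, 1600, 1400
integration_time_s = 584.577879

# Assumption: one "iteration" is one nu step over the full mu x r area.
iteration_area = mu_steps * r_steps
avg_ms_per_iteration = integration_time_s / nu_steps * 1000.0

print(iteration_area)                  # 2240000, matches "Iteration area"
print(round(avg_ms_per_iteration, 3))  # 913.403, matches the logged average
```

Under those assumptions both derived values agree with the log, which suggests the 40 chunks only split each iteration's work and do not change the totals.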


CTAPbIi
Message 44624 - Posted: 3 Dec 2010, 18:02:28 UTC
Last modified: 3 Dec 2010, 18:04:44 UTC

615 secs??? Wow... even if you run 2 WUs concurrently, that's too much. My 4870 runs it in 325 secs and my 4890 in 312 secs. But taking into consideration that Nvidia cut back DP on Fermi cards...

But anyway, I'll try it on a GTX 275.

arkayn
Message 44625 - Posted: 3 Dec 2010, 18:06:01 UTC

Yes I know, my 5830 runs the same type of unit in 130 seconds.

CTAPbIi
Message 44628 - Posted: 3 Dec 2010, 18:19:45 UTC
Last modified: 3 Dec 2010, 18:22:57 UTC

So, there is room for improvement :-)

Matt Arsenault
Volunteer moderator
Project developer
Project tester
Project scientist
Message 44630 - Posted: 3 Dec 2010, 18:24:45 UTC - in response to Message 44628.  

> So, there is some room for improvement :-)

The theoretical performance of doubles on ATI hardware is much higher than on Nvidia, so the Nvidia application isn't expected to match those times. It's expected to be about the same as, or marginally faster than, the old CUDA application.

arkayn
Message 44631 - Posted: 3 Dec 2010, 18:30:19 UTC - in response to Message 44630.  

> > So, there is some room for improvement :-)
> The theoretical performance of doubles on ATI hardware is much higher than on Nvidia, so it's not expected to match that. It's expected to be about the same or marginally faster than the old CUDA application.

With the old CUDA app that Crunch3r fixed for Fermi cards, I was running around 11 minutes for the 213-point units.

CTAPbIi
Message 44632 - Posted: 3 Dec 2010, 18:32:30 UTC

Yep, agreed. DP is 1/8 of SP instead of 1/2... so there's no change in my plans to get a pair of 6970s :-)

When do you plan to release the OpenCL app for ATI? :-)

Werkstatt
Message 44636 - Posted: 3 Dec 2010, 21:27:07 UTC

I now have a couple of WUs finished.
11 min 34 sec is a typical time for de_separation_16_3s_fix_1_1137162_...
GTX 460 @ 715 MHz, Win7-64, E8400, BM 6.10.58, Forceware 260.99
GPU usage ~98%, memory usage 255 MB (Afterburner)
No invalids or errors logged (but they don't stay long...)

One World, One Dream
Message 44638 - Posted: 3 Dec 2010, 22:39:42 UTC

I have tested the new OpenCL app with a Geforce GT 420m notebook GPU.

When I used Crunch3r's optimized app in the past, my work units took either 49 or 73 minutes to complete. With the new OpenCL app, work units need either 51 or 75 minutes. I do not know why the new app is actually a bit slower; maybe it is because I was surfing the web and displaying some images while running the work units? (Though simple tasks like this never seemed to affect the work units of the old app by Crunch3r.)

Furthermore, there is a significant difference in system responsiveness and a slight difference in GPU temperature between the two apps.

While Crunch3r's app did not noticeably affect system responsiveness, the new OpenCL app makes the system react very sluggishly, so that comfortably writing something or surfing the web is no longer possible.

Lastly, GPU temperature with the OpenCL app is about 3 degrees Celsius higher than with Crunch3r's app. That is why I have switched back to the old app for the time being.

I hope my information gathered about the performance of the new app is helpful.

europa
Message 44644 - Posted: 4 Dec 2010, 1:00:55 UTC - in response to Message 44630.  

Matt,

I followed the link above and extracted the tar file into the MW sub-dir under boinc-client on /var; however, I am still getting the exact same error message about the app not finding a double-precision card, even though it accurately identifies the Fermi. I'm on 64-bit Ubuntu.

In addition, all of the WUs are identified as cuda23 units; there is no mention of OpenCL in the error message or in the WU log.

I notice that in the app_info.xml it refers to:

<file_info>
<name>milkyway_separation_0.48_x86_64-pc-linux-gnu__cuda_opencl</name>
<executable/>
</file_info>

However, there is no such file. The only executable that it unpacked was:
milkyway_0.24_x86_64-pc-linux-gnu__cuda23

Also, does:
<coproc>
<type>CUDA</type>
<count>1</count>
</coproc>

refer to the number of processors or to its ID number? Mine always comes up as GPU0 which is why I ask.

Thanks,
Steve

Matt Arsenault
Volunteer moderator
Project developer
Project tester
Project scientist
Message 44645 - Posted: 4 Dec 2010, 1:11:20 UTC - in response to Message 44644.  

> Matt,
>
> I followed the link above and extracted the tar file into the MW sub-dir under boinc-client on /var; however, I am still getting the exact same error message about the app not finding a double-precision card, even though it accurately identifies the Fermi. I'm on 64-bit Ubuntu.
>
> In addition, all of the WUs are identified as cuda23 units; there is no mention of OpenCL in the error message or in the WU log.
>
> I notice that in the app_info.xml it refers to:
>
> <file_info>
> <name>milkyway_separation_0.48_x86_64-pc-linux-gnu__cuda_opencl</name>
> <executable/>
> </file_info>
>
> However, there is no such file. The only executable that it unpacked was:
> milkyway_0.24_x86_64-pc-linux-gnu__cuda23

Well, BOINC is rather eager to delete anything that isn't mentioned in any of the XML files. It looks like something else was wrong, and then this got deleted and it attempted to download and use the CUDA one. You might need to chown what you extract to boinc:boinc for it to work. It seems to be unhappy when the boinc user doesn't own the files.

> Also, does:
> <coproc>
> <type>CUDA</type>
> <count>1</count>
> </coproc>
>
> refer to the number of processors or to its ID number?

It's the count of GPUs that will be used. The application only uses 1, so it should always be 1.

Matt Arsenault
Volunteer moderator
Project developer
Project tester
Project scientist
Message 44646 - Posted: 4 Dec 2010, 2:07:15 UTC - in response to Message 44645.  

> Well, BOINC is rather eager to delete anything that isn't mentioned in any of the XML files. It looks like something else was wrong, and then this got deleted and it attempted to download and use the CUDA one. You might need to chown what you extract to boinc:boinc for it to work. It seems to be unhappy when the boinc user doesn't own the files.

Actually, I just checked this. The files don't need to be owned by boinc, but if they aren't, you need to be in the boinc group and the files need to be group-readable and group-executable.
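
A minimal sketch of how one might check the permission half of that requirement on Linux. The helper name `group_can_read_and_execute` is hypothetical, and the script only inspects the mode bits; it does not verify that the file's group is boinc or that your user is a member of it.

```python
import os
import stat
import sys


def group_can_read_and_execute(path):
    """Return True if the file's mode grants its group both read and execute."""
    mode = os.stat(path).st_mode
    needed = stat.S_IRGRP | stat.S_IXGRP
    return mode & needed == needed


if __name__ == "__main__":
    # Pass the extracted files as arguments, e.g. the application binary.
    for path in sys.argv[1:]:
        print(path, group_can_read_and_execute(path))
```

A file that prints False here would need its group bits changed (for example with chmod g+rx) before the check passes.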

tolafoph
Message 44654 - Posted: 4 Dec 2010, 10:14:19 UTC - in response to Message 44646.  

europa
Message 44657 - Posted: 4 Dec 2010, 12:38:46 UTC - in response to Message 44645.  

Matt,

Thanks for the feedback. I see where I wasn't the owner for some of the files.

I think that I've caught all of them.

It sounds like I should purge the WU's in progress and re-extract the tar?

Thanks,
Steve

Evil Penguin
Message 44660 - Posted: 4 Dec 2010, 14:43:30 UTC

Sorry to go a bit off topic, but will the ATi OpenCL version come out soon?

Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Message 44664 - Posted: 4 Dec 2010, 16:22:00 UTC - in response to Message 44624.  

> 615 secs??? Wow... even if you run 2 WUs concurrently, that's too much. My 4870 runs it in 325 secs and my 4890 in 312 secs. But taking into consideration that Nvidia cut back DP on Fermi cards...
>
> But anyway, I'll try it on a GTX 275.

Nvidia GPUs aren't nearly as fast as the ATI GPUs for double-precision calculations, so that's really not too bad.

Zeddicus
Message 44675 - Posted: 4 Dec 2010, 18:13:46 UTC - in response to Message 44664.  

I'm taking part in some CPU-based projects (like climateprediction.net and yoyo@home) and was looking for another project to run on my GPU (besides SETI). In the past, MilkyWay told me that my GPU was lacking memory, so I thought that maybe the OpenCL version would run. After installing the package and updating my NVIDIA driver to 260.99, I was happy to get some WUs, but they all ended up with "calculating error". Okay, let's do it step by step... So first I updated BOINC to 6.10.58. Now MilkyWay says at start-up: "Message from server: Your app_info.xml file doesn't have a version of MilkyWay@Home N-Body Simulation."

Hmm, what does that tell me? Did I make a mistake? Or do I need the previously mentioned 3.2 cudatoolkit from that guy "Crunch3r"? Any help is appreciated!

By the way: GeForce 8800 GTS (driver 26099, CUDA version 3020, compute capability 1.0).

Greetings from Germany,
Axel

P.S.: Bad English? Maybe that's because I've left school 25 years ago...

Matt Arsenault
Volunteer moderator
Project developer
Project tester
Project scientist
Message 44676 - Posted: 4 Dec 2010, 18:44:15 UTC - in response to Message 44675.  

> By the way: GeForce 8800 GTS (driver 26099, CUDA version 3020, compute capability 1.0).

That GPU doesn't support doubles and won't work. The application needs at least compute capability 1.3.
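
To make the 1.3 threshold concrete, here is a small sketch; `supports_doubles` is a hypothetical helper and not part of the application. It compares the capability as a (major, minor) pair rather than as a float, so that a value such as "1.10" would not be misread as smaller than "1.3".

```python
MIN_DOUBLE_CAPABILITY = (1, 3)  # from the reply above: doubles need at least 1.3


def supports_doubles(compute_capability):
    """compute_capability is a 'major.minor' string such as '2.1'."""
    major, minor = (int(part) for part in compute_capability.split("."))
    return (major, minor) >= MIN_DOUBLE_CAPABILITY


print(supports_doubles("1.0"))  # GeForce 8800 GTS from this post: False
print(supports_doubles("2.1"))  # GTX 460 from the earlier log: True
```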

[AF>EDLS] Polynesia
Message 44677 - Posted: 4 Dec 2010, 18:45:41 UTC
Last modified: 4 Dec 2010, 18:49:39 UTC

Try this app_info file:

<app_info>
<app>
<name>milkyway</name>
</app>
<file_info>
<name>milkyway_0.45_windows_x86_64.exe</name>
<executable/>
</file_info>
<app_version>
<app_name>milkyway</app_name>
<version_num>45</version_num>
<file_ref>
<file_name>milkyway_0.45_windows_x86_64.exe</file_name>
<main_program/>
</file_ref>
</app_version>

<app>
<name>milkyway_nbody</name>
<user_friendly_name>MilkyWay@Home nbody Simulation</user_friendly_name>
</app>
<file_info>
<name>milkyway_nbody_0.21_windows_x86_64__sse2.exe</name>
<executable/>
</file_info>
<app_version>
<app_name>milkyway_nbody</app_name>
<version_num>21</version_num>
<file_ref>
<file_name>milkyway_nbody_0.21_windows_x86_64__sse2.exe</file_name>
<main_program/>
</file_ref>
</app_version>

<app>
<name>milkyway</name>
<user_friendly_name>Milkyway@home Separation</user_friendly_name>
</app>
<file_info>
<name>milkyway_separation_0.48_windows_intelx86__cuda_opencl.exe</name>
<executable/>
</file_info>

<app_version>
<app_name>milkyway</app_name>
<version_num>47</version_num>
<plan_class>cuda_opencl</plan_class>
<avg_ncpus>0.05</avg_ncpus>
<max_ncpus>0.05</max_ncpus>
<flops>1.0e11</flops>
<coproc>
<type>CUDA</type>
<count>1</count>
</coproc>
<file_ref>
<file_name>milkyway_separation_0.48_windows_intelx86__cuda_opencl.exe</file_name>
<main_program/>
</file_ref>
</app_version>

<app>
<name>milkyway</name>
</app>
<file_info>
<name>milkyway_windows_intelx86__cuda23.exe</name>
<executable/>
</file_info>
<file_info>
<name>cudart.dll</name>
<executable/>
</file_info>
<file_info>
<name>cutil32.dll</name>
<executable/>
</file_info>

<app_version>
<app_name>milkyway</app_name>
<version_num>24</version_num>
<plan_class>cuda23</plan_class>
<flops>1.0e11</flops>
<avg_ncpus>0.1</avg_ncpus>
<max_ncpus>0.1</max_ncpus>
<coproc>
<type>CUDA</type>
<count>1.0</count>
</coproc>
<cmdline></cmdline>
<file_ref>
<file_name>milkyway_windows_intelx86__cuda23.exe</file_name>
<main_program/>
</file_ref>
<file_ref>
<file_name>cudart.dll</file_name>
</file_ref>
<file_ref>
<file_name>cutil32.dll</file_name>
</file_ref>
</app_version>

</app_info>
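
Since a mismatch between app_info.xml and the files actually present came up earlier in the thread, here is a rough sanity-check sketch. The function names and the shortened `SAMPLE` document are hypothetical, and the two checks are only consistency tests of my own choosing, not a description of how BOINC itself validates the file.

```python
import os
import xml.etree.ElementTree as ET


def undeclared_file_refs(app_info_xml):
    """Names used in <file_ref> that have no matching <file_info> entry."""
    root = ET.fromstring(app_info_xml)
    declared = {el.findtext("name") for el in root.iter("file_info")}
    referenced = {el.findtext("file_name") for el in root.iter("file_ref")}
    return sorted(referenced - declared)


def missing_on_disk(app_info_xml, project_dir):
    """Names declared in <file_info> that are not present in project_dir."""
    root = ET.fromstring(app_info_xml)
    names = [el.findtext("name") for el in root.iter("file_info")]
    return [n for n in names if not os.path.isfile(os.path.join(project_dir, n))]


# Hypothetical, shortened app_info used only to exercise the checks.
SAMPLE = """
<app_info>
  <app><name>milkyway</name></app>
  <file_info><name>app.exe</name><executable/></file_info>
  <app_version>
    <app_name>milkyway</app_name>
    <version_num>48</version_num>
    <file_ref><file_name>app.exe</file_name><main_program/></file_ref>
    <file_ref><file_name>helper.dll</file_name></file_ref>
  </app_version>
</app_info>
"""

print(undeclared_file_refs(SAMPLE))  # ['helper.dll']
```

To run it against a real setup, read the app_info.xml text from the project directory and pass that directory to `missing_on_disk`.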
Team Alliance francophone, boinc: 7.0.18

GA-P55-UD5, i7 860, Win 7 64 bits, 8g DDR3, GTX 470

©2021 Astroinformatics Group