Welcome to MilkyWay@home

OpenCL for Nvidia available for testing

Zeddicus

Joined: 30 May 10
Posts: 2
Credit: 2,351
RAC: 0
Message 44678 - Posted: 4 Dec 2010, 18:56:24 UTC - in response to Message 44676.  
Last modified: 4 Dec 2010, 18:56:53 UTC

By the way: GeForce 8800 GTS (driver 26099, CUDA version 3020, compute capability 1.0).
That GPU doesn't have doubles and won't work. It needs at least compute capability 1.3.

Thanks for the information! But why did MilkyWay send me any WUs then?
Matt Arsenault
Volunteer moderator
Project developer
Project tester
Project scientist

Joined: 8 May 10
Posts: 576
Credit: 15,979,383
RAC: 0
Message 44681 - Posted: 4 Dec 2010, 20:31:43 UTC - in response to Message 44678.  

By the way: GeForce 8800 GTS (driver 26099, CUDA version 3020, compute capability 1.0).
That GPU doesn't have doubles and won't work. It needs at least compute capability 1.3.

Thanks for the information! But why did MilkyWay send me any WUs then?
If you tried manually installing this, it will try sending the workunits to it. You shouldn't get sent the Nvidia applications since you don't have doubles.
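For illustration, here is a minimal Python sketch of the check implied above. The 1.3 threshold comes from this thread; the function names are hypothetical and this is not the project's actual code. Comparing (major, minor) tuples avoids the pitfalls of treating a version string as a float.

```python
from typing import Tuple

# Threshold quoted in this thread: doubles need compute capability >= 1.3.
MIN_DOUBLE_CAPABILITY: Tuple[int, int] = (1, 3)


def parse_compute_capability(text: str) -> Tuple[int, int]:
    """Parse a string such as '1.0' or '2.1' into (major, minor)."""
    major, minor = text.strip().split(".")
    return int(major), int(minor)


def supports_doubles(capability: str) -> bool:
    """Return True if the reported compute capability is at least 1.3."""
    return parse_compute_capability(capability) >= MIN_DOUBLE_CAPABILITY
```

With the two cards mentioned in this thread, `supports_doubles("1.0")` (GeForce 8800 GTS) is false and `supports_doubles("2.1")` (GTX 460) is true.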
Profile Werkstatt

Joined: 19 Feb 08
Posts: 350
Credit: 141,284,369
RAC: 0
Message 44682 - Posted: 4 Dec 2010, 22:19:57 UTC - in response to Message 44617.  

Is anyone running it in a system with both ATI and Nvidia drivers installed? I just realized a hypothetical problem that might happen.


I now have a system running with a GTX460 and an HD4850, running both MW and Collatz on both GPUs. No problems seen in the last 12 hours.
http://milkyway.cs.rpi.edu/milkyway/show_host_detail.php?hostid=168415
Profile cenit

Joined: 16 Mar 09
Posts: 58
Credit: 1,129,612
RAC: 0
Message 44683 - Posted: 4 Dec 2010, 22:29:31 UTC - in response to Message 44681.  
Last modified: 4 Dec 2010, 22:29:53 UTC

By the way: GeForce 8800 GTS (driver 26099, CUDA version 3020, compute capability 1.0).
That GPU doesn't have doubles and won't work. It needs at least compute capability 1.3.

Thanks for the information! But why did MilkyWay send me any WUs then?
If you tried manually installing this, it will try sending the workunits to it. You shouldn't get sent the Nvidia applications since you don't have doubles.

To be honest, this is not the way BOINC should be set up to work.
On your server, you should analyze the host requesting work, and even if the user has installed the app manually, the server should not send work when the host doesn't satisfy all the requirements.
If you leave it this way, it is really easy to trick your server into doing really bad things!
Matt Arsenault
Volunteer moderator
Project developer
Project tester
Project scientist

Joined: 8 May 10
Posts: 576
Credit: 15,979,383
RAC: 0
Message 44684 - Posted: 4 Dec 2010, 23:12:14 UTC - in response to Message 44683.  

To be honest, this is not the way BOINC should be set up to work.
On your server, you should analyze the host requesting work, and even if the user has installed the app manually, the server should not send work when the host doesn't satisfy all the requirements.
If you leave it this way, it is really easy to trick your server into doing really bad things!
This is related to the problem I think is most annoying in BOINC. I think everything involving the version and capability management in BOINC should be better; pretty much everything about app_info.xml and the scheduling isn't user friendly and is inflexible. The plan class system is inflexible; adding versions with different system requirements involves modifying the server code and it isn't composable for different features.

app_info.xml doesn't handle updates and requires far too much manual intervention for what most people want. You might only want to run GPU workunits, or only N-body on the CPU and separation on the GPU, but right now you can't do that. You can say no CPU or no GPU, but not on a per application basis. You have to manually download files, put in the same information in several places in app_info.xml, and then you're stuck with whatever version you happened to install unless you go out of your way to update it.

There should be a finer grain way of telling the server what capabilities different applications and possibly workunits require, and on the client there should be a way of specifying capabilities you want to use with an actual interface of some sort beyond just stating that manually installed application X should be used in an XML file, and it should handle updates automatically. I kind of want to work on a replacement that's more intelligent, but I don't really have the time.
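As a rough illustration of the finer-grained matching described above, the Python sketch below filters application versions by the capabilities a host reports and by a per-application user preference. Every name here (`AppVersion`, `eligible_versions`, the capability tags) is hypothetical; none of it is BOINC's real scheduler or plan class API.

```python
from dataclasses import dataclass
from typing import AbstractSet, FrozenSet, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class AppVersion:
    app: str                  # e.g. "separation" or "nbody"
    resource: str             # "cpu" or "gpu"
    requires: FrozenSet[str]  # capability tags the host must report


def eligible_versions(
    host_caps: AbstractSet[str],
    versions: Sequence[AppVersion],
    user_allows: Optional[AbstractSet[Tuple[str, str]]] = None,
) -> List[AppVersion]:
    """Keep versions whose requirements the host meets and the user allows."""
    result = []
    for version in versions:
        # Server-side check: every required capability must be present.
        if not version.requires <= host_caps:
            continue
        # Client-side preference, expressed per (application, resource) pair.
        if user_allows is not None and (version.app, version.resource) not in user_allows:
            continue
        result.append(version)
    return result
```

Because requirements are plain sets of tags, a new version with different needs would only add data rather than server code, which is the composability argued for above. A host without an "fp64" tag would simply never match a version that requires it, manual install or not.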
bill

Joined: 15 Jul 09
Posts: 12
Credit: 45,145,989
RAC: 0
Message 44686 - Posted: 5 Dec 2010, 1:28:43 UTC - in response to Message 44588.  

OK Matt, if you're still interested, your new app takes a little over 17 minutes to complete 1 opencl wu. This is with Seti@Home working on both cpus (e6600@2.4GHz) with Folding@home working also. Windows XP32, Nvidia driver 206.63.

Palit GeForce GTS 450 (Fermi) Sonic Platinum 1GB
Matt Arsenault
Volunteer moderator
Project developer
Project tester
Project scientist

Joined: 8 May 10
Posts: 576
Credit: 15,979,383
RAC: 0
Message 44690 - Posted: 5 Dec 2010, 3:10:01 UTC - in response to Message 44588.  

The update I've posted should help with better errors when the GPU doesn't have doubles, and should help with system responsiveness on lower end GPUs.
Profile [AF>EDLS] Polynesia
Joined: 5 Apr 09
Posts: 71
Credit: 6,120,786
RAC: 0
Message 44716 - Posted: 5 Dec 2010, 22:35:48 UTC
Last modified: 5 Dec 2010, 22:51:30 UTC

What updates? An updated application?
Team Alliance francophone, boinc: 7.0.18

GA-P55-UD5, i7 860, Win 7 64 bits, 8g DDR3, GTX 470
Matt Arsenault
Volunteer moderator
Project developer
Project tester
Project scientist

Joined: 8 May 10
Posts: 576
Credit: 15,979,383
RAC: 0
Message 44717 - Posted: 5 Dec 2010, 22:50:35 UTC - in response to Message 44716.  

What updates? An updated application?
The second set of links I added to the original post.
Profile [AF>EDLS] Polynesia
Joined: 5 Apr 09
Posts: 71
Credit: 6,120,786
RAC: 0
Message 44718 - Posted: 5 Dec 2010, 22:56:44 UTC

OK, thank you, but I did not understand: what are the minor updates to the application?
Team Alliance francophone, boinc: 7.0.18

GA-P55-UD5, i7 860, Win 7 64 bits, 8g DDR3, GTX 470
Astromancer.
Joined: 21 Nov 09
Posts: 49
Credit: 20,942,758
RAC: 0
Message 44749 - Posted: 6 Dec 2010, 22:34:27 UTC
Last modified: 6 Dec 2010, 22:38:15 UTC

With 48.0 I found that the runtimes are over 2m longer than with the CUDA app on my GTX260. 15:40 - 16:00 for OpenCL and 13:22 - 13:31 for CUDA.

The memory bandwidth used on the card is up to about 50% as well, which is a HUGE jump from CUDA (0%). It also has a larger system memory footprint. I just found that interesting; it doesn't particularly concern me.

And it also uses more CPU time than the CUDA app, though I didn't notice the cpu being used while I was watching the task manager. Does it use most / all of the cycles at the start or end of the WU or something?
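To put the run times quoted above on a common scale, a small Python sketch converts the mm:ss values to seconds and computes the gap. The four times are taken from this post; comparing midpoints is just one reasonable choice and says nothing about whether the workunits were the same size.

```python
def to_seconds(mmss: str) -> int:
    """Convert a 'minutes:seconds' string such as '15:40' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


opencl = [to_seconds("15:40"), to_seconds("16:00")]  # 940 s, 960 s
cuda = [to_seconds("13:22"), to_seconds("13:31")]    # 802 s, 811 s

# Smallest and largest possible gap between the two ranges, in seconds.
smallest_gap = min(opencl) - max(cuda)
largest_gap = max(opencl) - min(cuda)

# Relative slowdown using the midpoint of each range.
opencl_mid = sum(opencl) / len(opencl)
cuda_mid = sum(cuda) / len(cuda)
relative_slowdown = opencl_mid / cuda_mid - 1.0
```

The gap works out to between 129 s and 158 s, consistent with "over 2m longer", and the midpoint slowdown is a little under 18%.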
Matt Arsenault
Volunteer moderator
Project developer
Project tester
Project scientist

Joined: 8 May 10
Posts: 576
Credit: 15,979,383
RAC: 0
Message 44754 - Posted: 6 Dec 2010, 23:38:54 UTC - in response to Message 44749.  

With 48.0 I found that the runtimes are over 2m longer than with the CUDA app on my GTX260. 15:40 - 16:00 for OpenCL and 13:22 - 13:31 for CUDA.
Are you sure you're comparing the same things? Some of the newer workunits are quite a bit larger than they have been in the past. If it's not that, there might be some problem I haven't quite figured out, where performance drops at certain points depending on how I break the problem up to keep the system responsive. There seemed to be strange peaks in run time at some of the points I tried, and I haven't quite figured out a good rule for different GPUs. I think I had something, but I haven't actually played with it on slower GPUs. For me on the 285, it seems to be about 2% faster than the CUDA one.
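The general idea of breaking the work up can be sketched in Python as follows. This is only a generic even split of a range into chunks, with a hypothetical function name; it is not the rule the application uses, which is not spelled out in this thread.

```python
from typing import Iterator, Tuple


def chunk_ranges(total_items: int, num_chunks: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) half-open ranges that cover total_items evenly."""
    if total_items < 0 or num_chunks <= 0:
        raise ValueError("total_items must be >= 0 and num_chunks > 0")
    base, extra = divmod(total_items, num_chunks)
    start = 0
    for index in range(num_chunks):
        # The first `extra` chunks take one additional item each.
        size = base + (1 if index < extra else 0)
        yield start, start + size
        start += size
```

Each chunk would correspond to one short kernel launch, so the display driver gets a chance to run between launches. More chunks means better responsiveness but more launch overhead, which is the trade-off that needs tuning per GPU.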

The memory bandwidth used on the card is up to about 50% as well, which is a HUGE jump from CUDA (0%). It also has a larger system memory footprint. I just found that interesting; it doesn't particularly concern me.
I don't see why that would happen. There's basically no transfer done except at the beginning / end.

And it also uses more CPU time than the CUDA app, though I didn't notice the cpu being used while I was watching the task manager. Does it use most / all of the cycles at the start or end of the WU or something?
Quite likely. I think the old CUDA one did the final likelihood calculation on the GPU. On the CPU that calculation takes a few seconds at the end, and it isn't actually worth the effort to do it on the GPU.
Astromancer.
Joined: 21 Nov 09
Posts: 49
Credit: 20,942,758
RAC: 0
Message 44755 - Posted: 6 Dec 2010, 23:46:48 UTC - in response to Message 44754.  

Matt,

I downloaded the WU's all today and within about an hour of each other. I did 4 OpenCL then 3 CUDA before posting and another few CUDA after posting (With the same type of run time seen).

The memory bandwidth usage struck me as odd as well, which is why I posted it. I was running GPU-Z to watch what was going on, to make sure it wasn't, say, using only 50% of the GPU core, and noticed that the "Memory Controller Load" was reading at 50% or over. I've only ever seen that with SETI before.

I'll give a try with 48.1 and see if anything different happens. If there is some kind of test WU I can run through the command line or the like to help you out any, I'd be more than willing to do it.
Matt Arsenault
Volunteer moderator
Project developer
Project tester
Project scientist

Joined: 8 May 10
Posts: 576
Credit: 15,979,383
RAC: 0
Message 44757 - Posted: 6 Dec 2010, 23:59:31 UTC - in response to Message 44755.  

I downloaded the WU's all today and within about an hour of each other. I did 4 OpenCL then 3 CUDA before posting and another few CUDA after posting (With the same type of run time seen).
The current workunits aren't uniformly larger. There is a mixture of different sizes out right now, so it's hard to tell if this means anything without knowing which workunits you ran on each.
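One way to make such a comparison meaningful is to compare run times only within the same kind of workunit. The Python sketch below assumes each result has already been labelled with a group (for example a shared workunit name prefix, which is an assumption about the naming scheme) and reports a ratio only where both applications have samples.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

# (workunit group, application, run time in seconds)
Sample = Tuple[str, str, float]


def runtime_ratio_by_group(
    samples: Iterable[Sample], app: str, baseline: str
) -> Dict[str, float]:
    """Return mean(app) / mean(baseline) per group that has both kinds of sample."""
    times: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for group, application, seconds in samples:
        times[(group, application)].append(seconds)

    groups = sorted({group for group, _ in times})
    ratios: Dict[str, float] = {}
    for group in groups:
        app_times = times.get((group, app))
        base_times = times.get((group, baseline))
        if app_times and base_times:
            ratios[group] = mean(app_times) / mean(base_times)
    return ratios
```

A ratio above 1.0 for a group would mean the first application is slower on that kind of workunit. Groups seen with only one application are skipped rather than guessed at.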
Astromancer.
Joined: 21 Nov 09
Posts: 49
Credit: 20,942,758
RAC: 0
Message 44761 - Posted: 7 Dec 2010, 2:36:23 UTC - in response to Message 44757.  

After I posted the last one I figured you would need more details since the deleter is deleting things right away. (Looks like I was right) So I went about getting them for you.

One other thing I noticed on my system is that the OpenCL tasks sit at 100% with the clock still going for about 30s.

I'll PM you with all the info so I don't make a huge post full of data useless to anyone but you and Travis.
europa

Joined: 29 Oct 10
Posts: 89
Credit: 39,246,947
RAC: 0
Message 44805 - Posted: 7 Dec 2010, 22:39:51 UTC - in response to Message 44676.  

Matt,

I wanted to give you an update on my use of the OpenCL app.

To re-cap, I'm running 64-bit Ubuntu 10.10 on an AMD quad-core with 8GB of DDR2 RAM and a GTX460 Fermi card with the latest Nvidia driver from their website.

1. Simply extracting the tar to the top-level MW folder did not work, since the executable that it put there was for cuda23, not OpenCL. The WU's continued to be retrieved as cuda23 WU's and the completed ones continued to fail validation.

2. Having noticed the app.xml and correct executable had been extracted into a sub-folder in the MW main folder, I first moved them to another non-MW folder and then copied the contents into the main MW folder with the rest of the extracted files.

3. I then deleted the cuda23 executable.

4. Based on discussion at Collatz, I copied libcudart32_23.so into the MW folder.

5. I suspended the WU's and exited Boinc Manager.

6. I then opened a terminal window as root and typed "service boinc-client restart" [Enter] and closed the terminal window.

7. I re-started Boinc Manager and the apps and quickly saw work units identified as "open_cl" work units, along with the notation that they were using 0.5 CPU and 1.0 GPU. The WU's so far have taken about 15-20 min. to process vs. a few hours before.

8. However, at first, the WU's were coming back with the message "completed, validation inconclusive". Since then, it has changed to "Successful", so I guess I'm OK.
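The file-handling part of the steps above can be sketched as a dry-run planner in Python. The directory arguments and the "*cuda23*" pattern are assumptions for illustration only; the function just lists what would be done and changes nothing on disk. The extra library copy in step 4 is left out.

```python
from pathlib import Path
from typing import List, Tuple


def plan_manual_install(project_dir: Path, extracted_dir: Path) -> List[Tuple[str, str]]:
    """Return (action, detail) pairs describing the manual install; nothing is modified."""
    plan: List[Tuple[str, str]] = []

    # Step 2: bring the files from the extracted sub-folder into the project folder.
    for item in sorted(extracted_dir.iterdir()):
        if item.is_file():
            plan.append(("copy", f"{item} -> {project_dir / item.name}"))

    # Step 3: remove leftover cuda23 executables (the pattern is a guess at their names).
    for stale in sorted(project_dir.glob("*cuda23*")):
        plan.append(("delete", str(stale)))

    # Step 6: restart the client, using the command quoted in the post.
    plan.append(("run", "service boinc-client restart"))
    return plan
```

Printing the plan before acting on it makes it easier to spot a wrong folder or an unexpected leftover executable.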

Here is the stderr_text for one of them.
Task 263496737

Name de_separation_17_3s_fix_1_1719797_1291478684_1
Workunit 198514068
Created 4 Dec 2010 16:09:29 UTC
Sent 4 Dec 2010 16:10:24 UTC
Received 4 Dec 2010 19:06:06 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x0)
Computer ID 228451
Report deadline 12 Dec 2010 16:10:24 UTC
Run time 1519.988412
CPU time 8.83
stderr out

<core_client_version>6.10.58</core_client_version>
<![CDATA[
<stderr_txt>
<search_application> milkywayathome separation 0.48 Linux x86_64 double OpenCL </search_application>
Found 1 platforms
Platform 0 information:
Platform name: NVIDIA CUDA
Platform version: OpenCL 1.0 CUDA 3.2.1
Platform vendor:
Platform profile:
Platform extensions: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll
Using device 0 on platform 0
Found 1 CL devices
Device GeForce GTX 460 (NVIDIA Corporation:0x10de)
Type: CL_DEVICE_TYPE_GPU
Driver version: 260.19.21
Version: OpenCL 1.0 CUDA
Compute capability: 2.1
Little endian: CL_TRUE
Error correction: CL_FALSE
Image support: CL_TRUE
Address bits: 32
Max compute units: 7
Clock frequency: 1502 Mhz
Global mem size: 804454400
Max mem alloc: 201113600
Global mem cache: 114688
Cacheline size: 128
Local mem type: CL_LOCAL
Local mem size: 49152
Max const args: 9
Max const buf size: 65536
Max parameter size: 4352
Max work group size: 1024
Max work item dim: 3
Max work item sizes: { 1024, 1024, 64 }
Mem base addr align: 4096
Min type align size: 128
Timer resolution: 1000 ns
Double extension: MW_CL_KHR_FP64
Extensions: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64

Compiler flags:
-cl-mad-enable -cl-no-signed-zeros -cl-strict-aliasing -cl-finite-math-only -DUSE_CL_MATH_TYPES=0 -DUSE_MAD=0 -DUSE_FMA=0 -cl-nv-verbose -DDOUBLEPREC=1 -DMILKYWAY_MATH_COMPILATION -DNSTREAM=3 -DFAST_H_PROB=1 -DAUX_BG_PROFILE=0 -DUSE_IMAGES=1 -DI_DONT_KNOW_WHY_THIS_DOESNT_WORK_HERE=1

Build status: CL_BUILD_SUCCESS
Build log:

: Considering profile 'compute_20' for gpu='sm_21' in 'cuModuleLoadDataEx_4'
Kernel work group info:
Work group size = 512
Kernel local mem size = 0
Compile work group size = { 0, 0, 0 }
Lower n solution: n = 40, x = 0
Higher n solution: n = 40, x = 0
Using solution: n = 40, x = 0
Range: { nu_steps = 640, mu_steps = 1600, r_steps = 1400 }
Iteration area: 2240000
Chunk estimate: 40
Num chunks: 40
Added area: 0
Effective area: 2240000
Integration time: 795.240523 s. Average time per iteration = 1242.563318 ms
Kernel work group info:
Work group size = 512
Kernel local mem size = 0
Compile work group size = { 0, 0, 0 }
Lower n solution: n = 38, x = 1792
Higher n solution: n = 50, x = 0
Using solution: n = 38, x = 1792
Range: { nu_steps = 640, mu_steps = 400, r_steps = 1400 }
Iteration area: 560000
Chunk estimate: 40
Num chunks: 38
Added area: 1792
Effective area: 561792
Integration time: 343.485032 s. Average time per iteration = 536.695363 ms
Kernel work group info:
Work group size = 512
Kernel local mem size = 0
Compile work group size = { 0, 0, 0 }
Lower n solution: n = 38, x = 1792
Higher n solution: n = 50, x = 0
Using solution: n = 38, x = 1792
Range: { nu_steps = 640, mu_steps = 400, r_steps = 1400 }
Iteration area: 560000
Chunk estimate: 40
Num chunks: 38
Added area: 1792
Effective area: 561792
Integration time: 369.818321 s. Average time per iteration = 577.841126 ms
<background_integral> 0.00049448328945061429 </background_integral>
<stream_integrals> 98.00489805211894633885 736.88664944943548107403 0.00661912358812184829 </stream_integrals>
<background_only_likelihood> -3.28428541304979715321 </background_only_likelihood>
<stream_only_likelihood> -35.67127293699287093887 -4.01220139598113423318 -231.34968260617702640047 </stream_only_likelihood>
<search_likelihood> -3.04403874542270713732 </search_likelihood>
12:51:02 (3471): called boinc_finish

</stderr_txt>
]]>

Validate state Checked, but no consensus yet
Claimed credit 0.0784035175711632
Granted credit 0
application version Anonymous platform
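The numbers in the stderr output above can be cross-checked with a short Python sketch. The relationships tested here (area = mu_steps × r_steps, effective area = area + added area, average time = integration time / nu_steps) are inferred from the printed values, not taken from the application's source.

```python
import math

# (nu_steps, mu_steps, r_steps, iteration_area, num_chunks, added_area,
#  effective_area, integration_seconds, avg_ms_per_iteration), copied from the log.
INTEGRALS = [
    (640, 1600, 1400, 2240000, 40, 0, 2240000, 795.240523, 1242.563318),
    (640, 400, 1400, 560000, 38, 1792, 561792, 343.485032, 536.695363),
    (640, 400, 1400, 560000, 38, 1792, 561792, 369.818321, 577.841126),
]

REPORTED_RUN_TIME = 1519.988412  # "Run time" field of the task, in seconds


def row_is_consistent(row) -> bool:
    """Check the inferred relationships for one integral's log entries."""
    nu, mu, r, area, chunks, added, effective, seconds, avg_ms = row
    return (
        area == mu * r
        and effective == area + added
        and effective % chunks == 0  # padded area splits evenly into the chunks
        and math.isclose(seconds / nu * 1000.0, avg_ms, rel_tol=1e-6)
    )


total_integration = sum(row[7] for row in INTEGRALS)
```

All three integrals pass, and the integration times add up to about 1508.5 s of the 1520 s reported run time, so almost all of the wall time is spent in the GPU integration.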

I assume since it's not failing, that it is using the Fermi at the double-precision level.
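That assumption can be checked against the stderr text rather than inferred from the task not failing. The Python sketch below looks for three signals that appear in the log above; the function name is hypothetical, and the signals only suggest a double-precision build rather than prove it.

```python
from typing import Dict


def double_precision_evidence(stderr_text: str) -> Dict[str, bool]:
    """Look for signs in the task's stderr that doubles were available and used."""
    tokens = stderr_text.split()
    return {
        # The device advertises the OpenCL double-precision extension.
        "fp64_extension_listed": "cl_khr_fp64" in tokens,
        # The kernel was compiled with the double-precision define.
        "doubleprec_flag_set": "-DDOUBLEPREC=1" in tokens,
        # The application banner describes itself as a double build.
        "app_reports_double": any(
            "<search_application>" in line and " double " in line
            for line in stderr_text.splitlines()
        ),
    }


# Shortened excerpt of the log above (most extensions and flags omitted).
EXCERPT = """<search_application> milkywayathome separation 0.48 Linux x86_64 double OpenCL </search_application>
Extensions: cl_khr_byte_addressable_store cl_khr_icd cl_khr_fp64
-cl-mad-enable -DDOUBLEPREC=1 -DNSTREAM=3
"""
```

For the excerpt, all three entries come out true, which matches the "Double extension: MW_CL_KHR_FP64" line in the full output.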

Sorry for the delay in posting this; the computer quit and it took me a couple of days to fix it. Thanks for getting me back in the game.

Regards,
Steve
Matt Arsenault
Volunteer moderator
Project developer
Project tester
Project scientist

Joined: 8 May 10
Posts: 576
Credit: 15,979,383
RAC: 0
Message 44812 - Posted: 7 Dec 2010, 23:48:17 UTC - in response to Message 44805.  

Thanks to everyone who posted information. I've looked at the pieces, and I think I've worked out why I made the slower GPUs slower; I half missed something obvious.
Matt Arsenault
Volunteer moderator
Project developer
Project tester
Project scientist

Joined: 8 May 10
Posts: 576
Credit: 15,979,383
RAC: 0
Message 44820 - Posted: 8 Dec 2010, 7:21:52 UTC - in response to Message 44588.  

I've posted another minor update which should hopefully fix being slower on some GPUs.
Profile [AF>EDLS] Polynesia
Joined: 5 Apr 09
Posts: 71
Credit: 6,120,786
RAC: 0
Message 44853 - Posted: 9 Dec 2010, 18:40:21 UTC
Last modified: 9 Dec 2010, 18:55:06 UTC

Big trouble!!

I'm now trying 0.48.2 and I am getting slowdowns on my PC that I did not have with 0.48.1 ... In addition, the unit does not and this calculation is boosting the temperature of my card ...

GPU load is yet to 99% ... but 0% CPU

I do not think so in this case recalculate these units 0.48.2 ...

I still do not think that just my version of boinc: 6.12.8?
Team Alliance francophone, boinc: 7.0.18

GA-P55-UD5, i7 860, Win 7 64 bits, 8g DDR3, GTX 470
europa

Joined: 29 Oct 10
Posts: 89
Credit: 39,246,947
RAC: 0
Message 44859 - Posted: 9 Dec 2010, 22:57:17 UTC - in response to Message 44820.  

Matt,

The OpenCL setup continues to work like a charm on the first machine.

I've assembled a second machine that is virtually identical. Aside from being an AMD 6 core vs. AMD quad-core and having DDR3 RAM vs. DDR2, they are the same. I even made a point of getting the same model graphics card (MSI Twin-Frozr GTX-460). Both are 64-bit Ubuntu 10.10 and also have the backward compatibility 32-bit libraries. I updated the Nvidia driver and matched the permissions with those on the working machine, BUT no matter what I do, I cannot get milkyway_separation_0.48_x86_64-pc-linux-gnu__cuda_opencl to run on this machine. The cuda23 variant keeps re-appearing in the folder and executing. I've tried multiple bulk deletes of the MW folder contents and re-extractions of the OpenCL tar, but I always end up with the cuda23 executable reappearing in the folder and taking over.

When I start up Boinc it sees the Fermi card.

As I said earlier, things continue to run like a charm on the first machine and this one is virtually identical. I can't figure out the problem.

Any suggestions? Thanks for any help that you can provide.

Regards,
Steve


©2024 Astroinformatics Group