PPC64 Optimizations
Author | Message |
---|---|
Joined: 11 Mar 09 Posts: 6 Credit: 352,439 RAC: 0 |
Has anyone fooled around with optimizations for the PPC64 platform, as used by Linux distributions for the Power Mac G5, the Cell BE platform, etc.? I haven't seen any posts on this subject, but perhaps it's buried somewhere. I may fool around with this myself if no one else has tried anything. |
Joined: 8 Nov 08 Posts: 178 Credit: 6,140,854 RAC: 0 |
"Has anyone fooled around with optimizations for the PPC64 platform, as used by Linux distributions for the Power Mac G5, the Cell BE platform, etc.? I haven't seen any posts on this subject, but perhaps it's buried somewhere. I may fool around with this myself if no one else has tried anything." BOINC doesn't support the PowerPC 64 platform anymore. You could always do your own optimizations and compile, though, and then run the application through the anonymous platform mechanism. |
Joined: 11 Mar 09 Posts: 6 Credit: 352,439 RAC: 0 |
"BOINC doesn't support the PowerPC 64 platform anymore. You could always do your own optimizations and compile, though, and then run the application through the anonymous platform mechanism." A PPC64 version of BOINC and a PS3-specific PPC64 port are maintained, somewhat officially, at http://www.dotsch.de/boinc/BOINC%20Clients.html. I've had success with these ports, though only a few projects, such as SETI, have working builds. |
Joined: 8 Nov 08 Posts: 178 Credit: 6,140,854 RAC: 0 |
"BOINC doesn't support the PowerPC 64 platform anymore. You could always do your own optimizations and compile, though, and then run the application through the anonymous platform mechanism." Oops, sorry... I thought you were referring to the PowerPC 64 platform on OS X, which isn't officially supported. As for whether any PPC64 optimization has been done, presumably not. At this point, I think PPC64 hosts represent a small minority, so most of the work has gone into optimizing the applications for the x86 platform and its derivatives. |
Joined: 11 Mar 09 Posts: 6 Credit: 352,439 RAC: 0 |
"BOINC doesn't support the PowerPC 64 platform anymore. You could always do your own optimizations and compile, though, and then run the application through the anonymous platform mechanism." Yeah, understandably. Well, I might mess around with it, and I'll let you all know if I have any success. |
Joined: 18 Feb 09 Posts: 8 Credit: 2,424,453 RAC: 0 |
I've built a BOINC client for Cell/BE (PS3) Linux and compiled a working, Cell-optimized MilkyWay client with ppu-g++ (you can see it under the computers on my account). However, it's quite slow, as it only uses the PPU at this time. I plan to add SPE support to the client whenever I have time. |
Joined: 12 Nov 07 Posts: 2425 Credit: 524,164 RAC: 0 |
"I've built a BOINC client for Cell/BE (PS3) Linux and compiled a working, Cell-optimized MilkyWay client with ppu-g++ (you can see it under the computers on my account). However, it's quite slow, as it only uses the PPU at this time. I plan to add SPE support to the client whenever I have time." When it runs faster, I'd give it a shot if/when I get Linux going on my PS3. Doesn't expecting the unexpected make the unexpected the expected? If it makes sense, DON'T do it. |
Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0 |
"I've built a BOINC client for Cell/BE (PS3) Linux and compiled a working, Cell-optimized MilkyWay client with ppu-g++ (you can see it under the computers on my account). However, it's quite slow, as it only uses the PPU at this time. I plan to add SPE support to the client whenever I have time." In principle, the SPEs could be quite fast for single-precision tasks. But the last time I checked, the SPEs of the Cell version used in the PS3 were not that well suited to double precision, were they? It works, but performance isn't great (maybe twice that of the PPE). One would like to use a PowerXCell 8i for MW; it would be faster than a GTX280. Has anyone tried to mod their PS3 with such a beast? Just kidding. |
Joined: 2 Jan 08 Posts: 23 Credit: 495,882,464 RAC: 0 |
"I plan to add SPE support to the client whenever I have time." I'm also trying to do that. Perhaps we can work together? |
Joined: 2 Jan 08 Posts: 23 Credit: 495,882,464 RAC: 0 |
SPEs have 128-bit registers. Each operation can be executed on one quad-precision, two double-precision, or four single-precision values. So, if the code can be vectorized, double precision is only two times slower than single. Perhaps it could be optimized more quickly if you could send me the CPU source parts of your ATI optimization. Thanks in advance. Thierry. |
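The lane arithmetic in this post can be sketched in portable C. This is only an illustrative model, not SPU code: `REGISTER_BYTES`, `lanes_per_register`, and `vector_ops_needed` are hypothetical names, and the sketch only counts vector operations under the assumption that every operation costs the same, which may not hold for double precision on real hardware.

```c
#include <assert.h>
#include <stddef.h>

/* Width of one SPE register as stated in the post: 128 bits = 16 bytes. */
#define REGISTER_BYTES ((size_t)16)

/* How many elements of the given size fit into one 128-bit register. */
size_t lanes_per_register(size_t element_bytes)
{
    assert(element_bytes > 0 && element_bytes <= REGISTER_BYTES);
    return REGISTER_BYTES / element_bytes;
}

/* Vector operations needed to touch n elements once, rounding up so a
   partially filled final register still counts as one operation. */
size_t vector_ops_needed(size_t n, size_t element_bytes)
{
    size_t lanes = lanes_per_register(element_bytes);
    return (n + lanes - 1) / lanes;
}
```

With 4-byte floats and 8-byte doubles, `vector_ops_needed(1000, sizeof(float))` gives 250 and `vector_ops_needed(1000, sizeof(double))` gives 500, which is the factor of two the post refers to.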
Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0 |
"In principle the SPEs could be quite fast for single precision tasks. But last time I checked the SPEs of the Cell version used in the PS3 are not that well suited to double precision, isn't it? It works, but performance isn't great" I don't think so. To cite the Wikipedia article on that topic: "For double-precision floating point operations, as sometimes used in personal computers and often used in scientific computing, Cell performance drops by an order of magnitude, but still reaches 14 GFLOPS (the PowerXCell 8i variant, which was specifically designed for double-precision, reaches 102.4 GFLOPS in double-precision calculations)." For double precision, the single PPE (6.4 GFlops peak) should already be about half as fast as all 6 usable SPEs together. "Perhaps it could be more quicly optimized if you could send me cpu sources part of your ATI optimization. Thanks in advance." I only replaced the calls to calculate_probabilities from calculate_integral with GPU versions. I doubt that the setup of some arrays and the transfer to the RAM of the graphics card needed for that would be helpful; I guess it works a bit differently with Cell. But as a general recommendation, don't try to port too much of the calculation to the SPEs. One really needs only the core of calculate_integral, as this represents ~99.9% of the total operations. So probably the best approach would be to divide the integral calculation into 6 equally sized parts (one for each SPE) and schedule them simultaneously. If you also want to use the PPE (as I mentioned, quite powerful at double precision compared to the SPEs), you could divide it into parts sized 3:1:1:1:1:1:1 or so. But wasn't the PPE multithreaded (2 threads)? How does this work with the BOINC client? Two WUs in parallel, or are you using only one thread?
@ mfl0p: I've just seen the result on your PS3. 20,000 seconds for less than 4 TFlop (a 29-credit unit). That hurts badly. It's less than 200 MFlop/s (4 TFlop / 20,000 s) on the PPE. Something has to be wrong there. Have you used the latest 0.18d sources? If yes, it appears the binary does not use the hardware resources (AltiVec) of the PPE but some kind of emulation mode. The PPE of the Cell also runs at 3.2 GHz, doesn't it? |
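The suggested 3:1:1:1:1:1:1 division of the integral can be sketched as a small weighted partitioning routine. This is a minimal sketch under stated assumptions: `split_by_weight` is a hypothetical helper, not part of the MilkyWay sources, the weights are assumed to be non-negative, and the code only computes how many iterations each unit would get; it does not start any threads or SPE contexts.

```c
#include <assert.h>
#include <stddef.h>

/* Split `total` iterations across `n` workers in proportion to `weights`
   (hypothetical helper; weights[0] could stand for the PPE and the rest
   for the SPEs). Results are written to `counts`. */
void split_by_weight(long total, const int *weights, size_t n, long *counts)
{
    long weight_sum = 0;
    for (size_t i = 0; i < n; i++)
        weight_sum += weights[i];
    assert(weight_sum > 0);

    /* Proportional share, rounded down. */
    long assigned = 0;
    for (size_t i = 0; i < n; i++) {
        counts[i] = total * weights[i] / weight_sum;
        assigned += counts[i];
    }

    /* Hand out what rounding left over, one iteration at a time,
       skipping workers whose weight is zero. */
    for (size_t i = 0; assigned < total; i = (i + 1) % n) {
        if (weights[i] > 0) {
            counts[i]++;
            assigned++;
        }
    }
}
```

For example, weights `{3, 1, 1, 1, 1, 1, 1}` and 900 iterations give 300 for the first worker and 100 for each of the other six; equal weights for six workers reproduce the "6 equally sized parts" variant.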
Joined: 18 Feb 09 Posts: 8 Credit: 2,424,453 RAC: 0 |
That PS3 host with the ugly times is not running just MilkyWay. It's loaded pretty heavily, and BOINC might be getting only half of the PPU to work with, so it's going to have some really ugly times. The app was compiled from the 0.18d sources with a Cell-specific GCC compiler (using Cell-optimized compiler flags). I did this quick PPU compile as a proof of concept to see if it would run. I do have another dedicated PS3 that I can run some real speed tests on in the future: pretty much bare-bones Linux running in runlevel 3 that I ssh into. With respect to the PPU, the Cell is similar to Hyper-Threading: the OS sees 2 processors (clocked at 3.2 GHz). There are six SPEs available under Linux. I have already found what you noted about calculate_integral by running some tests to see which loops the app spends most of its time in. |
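The throughput estimate discussed in this exchange (about 4 TFlop of work in 20,000 seconds, against the 6.4 GFlops double-precision peak quoted for the PPE) can be checked with a few lines of C. The function names are hypothetical, and the inputs are the rough figures quoted in the thread, not measurements.

```c
#include <assert.h>

/* Average throughput in MFlop/s for `flop` floating-point operations
   completed in `seconds` of run time. */
double mflops(double flop, double seconds)
{
    assert(seconds > 0.0);
    return flop / seconds / 1e6;
}

/* Fraction of a peak rate (given in GFlop/s) that a measured
   MFlop/s figure represents. */
double fraction_of_peak(double measured_mflops, double peak_gflops)
{
    assert(peak_gflops > 0.0);
    return measured_mflops / (peak_gflops * 1000.0);
}
```

`mflops(4e12, 20000.0)` comes out at 200 MFlop/s, and `fraction_of_peak(200.0, 6.4)` is a little over 3% of the quoted peak. Even if BOINC only gets half of the PPU, that leaves a large gap to explain.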
Joined: 20 Jan 09 Posts: 9 Credit: 17,289,621 RAC: 0 |
Unless you write SPE-specific code and plug it into the app, the performance you are going to see is that of 2 Power5 PPC chips using AltiVec, which is dated (but not bad). The trick will be taking some function in the MW code (a large FFT would be good) and using the SPUs to solve it. I am not sure of the state of open-source Cell libraries, but there must be something. Also, just using gcc on the Linux command line will only generate PPE code (Power5 code); the SPU code must be compiled separately and then linked to the PPE code. |
Joined: 18 Feb 09 Posts: 8 Credit: 2,424,453 RAC: 0 |
Yes, just finding something in the MilkyWay app to offload to the SPEs that they can calculate quickly, without at the same time slowing the PPE app down due to DMA transfers, is challenging. |
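The trade-off described here can be put into a rough break-even model. This is a deliberately simple sketch with hypothetical names: it assumes DMA transfers and SPE computation do not overlap at all, which is pessimistic if transfers can be hidden behind computation, and the timings would have to come from real measurements.

```c
#include <assert.h>

/* Speedup from offloading one kernel: time on the PPE divided by the
   time on the SPEs plus the DMA time spent moving its data.
   Assumes transfer and computation do not overlap. */
double offload_speedup(double ppe_seconds, double spe_seconds,
                       double dma_seconds)
{
    assert(spe_seconds + dma_seconds > 0.0);
    return ppe_seconds / (spe_seconds + dma_seconds);
}

/* Offloading only pays off when the speedup exceeds 1. */
int offload_pays_off(double ppe_seconds, double spe_seconds,
                     double dma_seconds)
{
    return offload_speedup(ppe_seconds, spe_seconds, dma_seconds) > 1.0;
}
```

A kernel that takes 10 s on the PPE, 2 s on the SPEs, and 3 s of transfers comes out at a speedup of 2, while one that takes 1 s on the PPE but needs 1.5 s of transfers for 0.5 s of SPE work is slower than not offloading at all.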
©2024 Astroinformatics Group