Welcome to MilkyWay@home

PPC64 Optimizations

Message boards : Application Code Discussion : PPC64 Optimizations

Lucas
Joined: 11 Mar 09
Posts: 6
Credit: 352,439
RAC: 0
Message 14855 - Posted: 11 Mar 2009, 2:07:10 UTC

Has anyone fooled around with optimizations for the PPC64 platform, as used by Linux distributions for the Power Mac G5, the Cell BE, etc.? I haven't seen any posts on this subject, but perhaps it's buried somewhere. I may fool around with this myself if no one else has tried anything.
ID: 14855 · Rating: 0
jedirock
Joined: 8 Nov 08
Posts: 178
Credit: 6,140,854
RAC: 0
Message 14868 - Posted: 11 Mar 2009, 2:46:11 UTC - in response to Message 14855.  

BOINC doesn't support the PowerPC 64 platform anymore. You could always do your own optimizations and compile, though, and then run the application through the anonymous platform mechanism.
ID: 14868 · Rating: 0
Lucas
Joined: 11 Mar 09
Posts: 6
Credit: 352,439
RAC: 0
Message 14869 - Posted: 11 Mar 2009, 2:54:43 UTC - in response to Message 14868.  

A PPC64 version of BOINC and a PS3-specific PPC64 port are maintained, somewhat officially, at http://www.dotsch.de/boinc/BOINC%20Clients.html. I've had success with these ports, though only a few projects, such as SETI, have working builds.
ID: 14869 · Rating: 0
jedirock
Joined: 8 Nov 08
Posts: 178
Credit: 6,140,854
RAC: 0
Message 14871 - Posted: 11 Mar 2009, 3:16:19 UTC - in response to Message 14869.  

Oops, sorry... I thought you were referring to the PowerPC 64 platform on OS X, which isn't officially supported. As for whether any PPC64 optimization has been done, presumably not. At this point, I think PPC64 hosts are a clear minority, so most of the work has gone into optimizing the application for the x86 platform and its derivatives.
ID: 14871 · Rating: 0
Lucas
Joined: 11 Mar 09
Posts: 6
Credit: 352,439
RAC: 0
Message 14872 - Posted: 11 Mar 2009, 3:19:39 UTC - in response to Message 14871.  

Yeah, understandable. I might mess around with it, and I'll let you all know if I have any success.
ID: 14872 · Rating: 0
mfl0p
Joined: 18 Feb 09
Posts: 8
Credit: 2,424,453
RAC: 0
Message 15087 - Posted: 12 Mar 2009, 22:30:43 UTC
Last modified: 12 Mar 2009, 22:31:10 UTC

I've built a BOINC client for Cell/BE (PS3) Linux and compiled a working Cell-optimized MilkyWay client with ppu-g++ (you can look at the computers on my account). However, it's quite slow, as it only uses the PPU at this time. I plan to work on the client with SPE support whenever I have time.
ID: 15087 · Rating: 0
banditwolf
Joined: 12 Nov 07
Posts: 2425
Credit: 524,164
RAC: 0
Message 15088 - Posted: 12 Mar 2009, 22:36:54 UTC - in response to Message 15087.  

Once it runs faster, I'd give it a shot, if and when I get Linux going on my PS3.
Doesn't expecting the unexpected make the unexpected the expected?
If it makes sense, DON'T do it.
ID: 15088 · Rating: 0
Cluster Physik
Joined: 26 Jul 08
Posts: 627
Credit: 94,940,203
RAC: 0
Message 15105 - Posted: 12 Mar 2009, 23:39:51 UTC - in response to Message 15087.  

In principle, the SPEs could be quite fast for single-precision tasks. But last time I checked, the SPEs of the Cell version used in the PS3 were not that well suited to double precision, were they? It works, but performance isn't great (maybe twice that of the PPE).

One would like to use a PowerXCell 8i for MW. It would be faster than a GTX 280. Has anyone tried to mod their PS3 with such a beast? Just kidding.
ID: 15105 · Rating: 0
[AF>HFR>RR] ThierryH
Joined: 2 Jan 08
Posts: 23
Credit: 495,882,464
RAC: 0
Message 15180 - Posted: 13 Mar 2009, 10:52:04 UTC - in response to Message 15087.  

I plan to work on the client with SPE support whenever I have time.


I'm also trying to do this. Perhaps we can work together?

ID: 15180 · Rating: 0
[AF>HFR>RR] ThierryH
Joined: 2 Jan 08
Posts: 23
Credit: 495,882,464
RAC: 0
Message 15182 - Posted: 13 Mar 2009, 10:59:52 UTC - in response to Message 15105.  


SPEs have 128-bit registers. Each operation can act on one quad-precision, two double-precision, or four single-precision values. So, if the code can be vectorized, double precision is only two times slower than single.

Perhaps it could be optimized more quickly if you could send me the CPU source portion of your ATI optimization. Thanks in advance.

Thierry.
ID: 15182 · Rating: 0
Cluster Physik
Joined: 26 Jul 08
Posts: 627
Credit: 94,940,203
RAC: 0
Message 15215 - Posted: 13 Mar 2009, 15:44:44 UTC - in response to Message 15182.  

In principle, the SPEs could be quite fast for single-precision tasks. But last time I checked, the SPEs of the Cell version used in the PS3 were not that well suited to double precision, were they? It works, but performance isn't great

SPEs have 128-bit registers. Each operation can act on one quad-precision, two double-precision, or four single-precision values. So, if the code can be vectorized, double precision is only two times slower than single.

I don't think so. To cite the Wikipedia article on that topic:

For double-precision floating point operations, as sometimes used in personal computers and often used in scientific computing, Cell performance drops by an order of magnitude, but still reaches 14 GFLOPS (the PowerXCell 8i variant, which was specifically designed for double-precision, reaches 102.4 GFLOPS in double-precision calculations).

For double precision, the single PPE (6.4 GFlops peak) should already be about half as fast as all 6 usable SPEs together.
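
The peak figures quoted above can be roughly reproduced with back-of-the-envelope arithmetic. The sketch below is only an estimate under stated assumptions: a 3.2 GHz clock, a fused multiply-add counted as 2 flops, one scalar double-precision FMA per cycle on the PPE, and one 2-wide double-precision FMA issued only every 7 cycles on each PS3-generation SPE. The 7-cycle issue interval is an assumption that does not come from this thread, and the function names are illustrative.

```python
# Back-of-the-envelope peak double-precision (DP) throughput estimate.
# Assumptions (NOT stated in the thread): an FMA counts as 2 flops, the PPE
# issues one scalar DP FMA per cycle, and each PS3-generation SPE issues one
# 2-wide DP FMA every 7 cycles.
CLOCK_GHZ = 3.2
FLOPS_PER_FMA = 2


def ppe_peak_dp_gflops():
    """Peak DP GFlops of the PPE under the assumptions above."""
    return CLOCK_GHZ * FLOPS_PER_FMA


def spe_peak_dp_gflops(lanes=2, issue_interval_cycles=7):
    """Peak DP GFlops of a single SPE under the assumptions above."""
    return CLOCK_GHZ * lanes * FLOPS_PER_FMA / issue_interval_cycles


if __name__ == "__main__":
    ppe = ppe_peak_dp_gflops()
    six_spes = 6 * spe_peak_dp_gflops()    # SPEs usable under Linux on a PS3
    eight_spes = 8 * spe_peak_dp_gflops()  # full Cell BE
    print(f"PPE:    {ppe:.1f} GFlops")
    print(f"6 SPEs: {six_spes:.1f} GFlops")
    print(f"8 SPEs: {eight_spes:.1f} GFlops")
    print(f"PPE / 6 SPEs: {ppe / six_spes:.2f}")
```

Under these assumptions the PPE comes out at 6.4 GFlops against roughly 11 GFlops for six SPEs, a bit over half, and eight SPEs land near the ~14 GFlops figure in the quoted article.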

Perhaps it could be optimized more quickly if you could send me the CPU source portion of your ATI optimization. Thanks in advance.

I only replaced the calls to calculate_probabilities made from calculate_integral with GPU versions. I doubt that the array setup and the transfer to the graphics card's RAM needed for that would be helpful to you. I guess it works a bit differently on Cell.
But as a general recommendation, don't try to port too much of the calculation to the SPEs. One really needs only the core of calculate_integral, as it accounts for ~99.9% of the total operations.
So probably the best approach would be to divide the integral calculation into 6 equally sized parts (one for each SPE) and schedule them simultaneously. If you also want to use the PPE (which, as I mentioned, is quite powerful in double precision compared to the SPEs), you could divide it into parts sized 3:1:1:1:1:1:1 or so. But wasn't the PPE multithreaded (2 threads)? How does this work with the BOINC client? Two WUs in parallel, or are you using only one thread?
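
As a rough illustration of the 3:1:1:1:1:1:1 split suggested above, the sketch below divides a range of loop iterations into contiguous chunks proportional to a list of weights. It is not MilkyWay code: `split_range`, `n_items`, and the example count of 900 iterations are hypothetical stand-ins for however the outer loop of calculate_integral is actually indexed.

```python
def split_range(n_items, weights):
    """Split range(n_items) into contiguous (start, end) chunks whose
    sizes are proportional to the given integer weights."""
    total = sum(weights)
    bounds = []
    start = 0
    cumulative = 0
    for w in weights:
        cumulative += w
        # Integer arithmetic on the cumulative weight guarantees that the
        # last chunk ends exactly at n_items, with no gaps or overlaps.
        end = (n_items * cumulative) // total
        bounds.append((start, end))
        start = end
    return bounds


if __name__ == "__main__":
    # Hypothetical example: the PPE gets weight 3, each of six SPEs weight 1.
    for unit, (lo, hi) in zip(["PPE"] + [f"SPE{i}" for i in range(6)],
                              split_range(900, [3, 1, 1, 1, 1, 1, 1])):
        print(f"{unit}: iterations {lo}..{hi - 1} ({hi - lo} total)")
```

With 900 iterations the PPE would take 300 and each SPE 100. The weights would need tuning against measured speeds, since the 3:1 ratio is only the rough guess given above.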

@ mfl0p:
I just saw the result from your PS3: 20,000 seconds for less than 4 TFlop (a 29-credit unit). That hurts badly. It's less than 200 MFlop/s on the PPE. Something has to be wrong there.
Have you used the latest 0.18d sources? If yes, it appears the binary does not use the hardware resources (AltiVec) of the PPE but some kind of emulation mode. The PPE of the Cell also runs at 3.2 GHz, doesn't it?
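
The throughput estimate above is simple division, sketched here for reference. The inputs are the figures from this post (at most 4 TFlop of work in 20,000 seconds, and the 6.4 GFlops PPE peak mentioned earlier); the function name is illustrative.

```python
def sustained_mflops(total_flop, seconds):
    """Average throughput in MFlop/s for a work unit."""
    return total_flop / seconds / 1e6


if __name__ == "__main__":
    rate = sustained_mflops(4e12, 20_000)  # upper bound: "less than 4 TFlop"
    ppe_peak_mflops = 6.4e3                # 6.4 GFlops peak, from the post above
    print(f"sustained: at most {rate:.0f} MFlop/s")
    print(f"fraction of PPE peak: at most {rate / ppe_peak_mflops:.1%}")
```

That works out to at most 200 MFlop/s, or about 3% of the quoted PPE peak.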
ID: 15215 · Rating: 0
mfl0p
Joined: 18 Feb 09
Posts: 8
Credit: 2,424,453
RAC: 0
Message 15273 - Posted: 13 Mar 2009, 23:05:45 UTC
Last modified: 13 Mar 2009, 23:10:15 UTC

That PS3 host with the ugly times is not running just MilkyWay; it's loaded pretty heavily, and BOINC might be getting only half of the PPU to work with. So it's going to have some really ugly times.

The app was compiled from the 0.18d sources with the Cell-specific GCC compiler (using Cell-optimized compiler flags). I did this quick PPU compile as a proof of concept to see if it would run.

I do have another, dedicated PS3 that I can run some real speed tests on in the future; it's pretty much bare-bones Linux in runlevel 3 that I SSH into.

With respect to the PPU, the Cell is similar to hyperthreading: the OS sees 2 processors (clocked at 3.2 GHz). There are six SPEs available under Linux.

I have already found what you noted about calculate_integral while running some tests to see which loops the app spends most of its time in.
ID: 15273 · Rating: 0
Otter
Joined: 20 Jan 09
Posts: 9
Credit: 17,289,621
RAC: 0
Message 16158 - Posted: 20 Mar 2009, 4:06:41 UTC

Unless you write SPE-specific code and plug it into the app, the performance you are going to see is that of 2 Power5 PPC chips using AltiVec, which is dated (but not bad).

The trick will be taking some function in the MW code (a large FFT would be good) and then using the SPUs to solve it. I am not sure of the state of open-source Cell libraries, but there must be something.

Also, just using gcc on the Linux command line will only generate PPE code (Power5 code); the SPU code must be compiled separately and then linked to the PPE code.
ID: 16158 · Rating: 0
mfl0p
Joined: 18 Feb 09
Posts: 8
Credit: 2,424,453
RAC: 0
Message 16176 - Posted: 20 Mar 2009, 10:24:44 UTC

Yes. Finding something in the MilkyWay app to offload to the SPEs that they can calculate quickly, without slowing the PPE app down through DMA transfers, is the challenging part.
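
The trade-off described above can be put as a crude break-even check: offloading only helps if the SPE compute time plus the DMA transfer time is shorter than doing the work on the PPE. The sketch below is a deliberately simplified model with hypothetical names and numbers. It assumes transfers are not overlapped with computation, which real SPE code would typically try to do.

```python
def offload_pays_off(ppe_seconds, spe_seconds, bytes_moved,
                     dma_bytes_per_second, dma_setup_seconds=0.0):
    """Return True if running a kernel on the SPEs, including the DMA
    transfers it needs, is faster than running it on the PPE.

    Simplified model: transfer time is added to SPE compute time, with no
    overlap between the two.
    """
    transfer_seconds = dma_setup_seconds + bytes_moved / dma_bytes_per_second
    return spe_seconds + transfer_seconds < ppe_seconds


if __name__ == "__main__":
    # Hypothetical numbers, purely for illustration.
    print(offload_pays_off(1.0, 0.5, 1e9, 1e10))   # small transfer
    print(offload_pays_off(1.0, 0.5, 1e10, 1e10))  # transfer dominates
```

In the first call the transfer costs 0.1 s and the offload wins; in the second it costs 1.0 s and the offload loses, even though the SPE kernel itself is twice as fast.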
ID: 16176 · Rating: 0

©2024 Astroinformatics Group