Welcome to MilkyWay@home

Posts by Milksop at try

1) Message boards : Number crunching : Are we doing science yet? (Message 7888)
Posted 20 Dec 2008 by Milksop at try
Post:
Optimised app. The WUs are different too, but I don't know about shorter.

The improvement from the new app is larger than the shorter crunch times alone suggest. Keep in mind that the new WUs are in fact approximately four times as "long" as the old ones, so each one does roughly four times the work of one of the old 270-credit WUs.
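A rough way to express the point above is as effective speedup in work per unit time. This sketch assumes that each new WU carries about four times the work W of an old one, and that T is the crunch time per WU on the same machine:

```latex
\[
S \;=\; \frac{W_{\text{new}} / T_{\text{new}}}{W_{\text{old}} / T_{\text{old}}}
  \;\approx\; 4\,\frac{T_{\text{old}}}{T_{\text{new}}},
\qquad W_{\text{new}} \approx 4\,W_{\text{old}}
\]
```

For example, a new WU that finished in half the old crunch time would correspond to roughly an eightfold gain, not a twofold one.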
2) Message boards : Number crunching : credit comparison to other projects (Message 7318)
Posted 3 Dec 2008 by Milksop at try
Post:

Might want to take this to the code discussion, but we've been using the following for Linux:
i686: -O2 -ftree-vectorize -funroll-loops
x86_64: -O2 -msse2 -ftree-vectorize -funroll-loops

and the following for OS X:

ppc: -O2 -maltivec -mabi=altivec -mcpu=7400 -funroll-loops
i686: -O2 -msse2 -mfpmath=sse -mtune=prescott -ftree-vectorize -funroll-loops
x86_64: -O2 -mfpmath=sse -mtune=nocona -msse2 -ftree-vectorize -funroll-loops

Does that mean that the 32-bit machines running Linux do not use SSE2, but the 64-bit machines do?

And does the same apply for Win systems?

Yes, I would think so.
That was actually the reason I only published 32-bit versions: they did not use SSE/SSE2 on any system. Distributing 64-bit binaries (which automatically use SSE2) is only fair if the project also builds 32-bit binaries that use SSE2. In the case of MW, the speedup comes solely from SSE2, not from the 64 bits.
3) Message boards : Number crunching : credit comparison to other projects (Message 7312)
Posted 3 Dec 2008 by Milksop at try
Post:
The application as it is right now has all of the optimizations that were suggested to us by milksop, and a couple of my own. If people find ways to improve the performance further, we'll implement them if they're suggested in the forums.


Travis

What you say is true insofar as the current MW stock client v0.6 has optimised the standard code.

However, optimised code in the SETI sense (and, to a lesser extent, in the Einstein sense) is code specifically optimised for a CPU's extensions, such as MMX, SSE, SSE2, SSE3, SSSE3 and/or SSE4.1.

Your code is not optimised code in that SETI/Einstein sense.


Actually, we've compiled milkyway with architecture-specific flags that were suggested to us in the code discussion forum. In our case, SSE2 was said to give the best performance improvement, so we've used it on architectures that support it.

I'm sure there are some more changes to the code that could make the app more efficient, and I'm looking forward to what people come up with. Once I have the rest of the server-side stuff working the way I like it, I'll probably take a deeper look myself.

However, I'm sure if people use architecture specific code optimizations, such as hand-coded vectorizations and things of that nature, even more performance could be squeezed out of the app -- I'm not sure if we'll be able to put those in the stock app since they're more specific than different compiler flags.

Yes, that's right. Hand-coded vectorizations normally buy you a factor of two or so. That is of course an average: sometimes it's more, sometimes less. But you can leave this kind of low-level optimization in the hands of gifted people like Crunch3r. It is often too specific to put into the standard app unless you create switcher apps like the ones employed over at Einstein.
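A switcher app of the kind mentioned above picks a code path at run time based on the CPU's extensions. The following Python sketch shows only that selection step, assuming Linux-style /proc/cpuinfo text; the binary names are hypothetical and are not the actual file names used by this project or by Einstein. Note that Linux reports SSE3 under the flag name `pni`.

```python
# Hypothetical build names, ordered from most to least specific.
VARIANTS = [
    ("pni", "milkyway_sse3"),   # Linux lists SSE3 as "pni" in /proc/cpuinfo
    ("sse2", "milkyway_sse2"),
]
GENERIC = "milkyway_generic"    # fallback build without SSE


def cpu_flags(cpuinfo_text: str) -> set:
    """Collect the feature flags from every 'flags' line of /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def pick_variant(cpuinfo_text: str) -> str:
    """Return the first (best) build whose required flag the CPU reports."""
    flags = cpu_flags(cpuinfo_text)
    for flag, binary in VARIANTS:
        if flag in flags:
            return binary
    return GENERIC
```

On a Linux machine one might call `pick_variant(open("/proc/cpuinfo").read())`. Because the list runs from most to least specific, the first match is the best build the CPU supports.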

As I said, I haven't had time to look at the new code yet. Nevertheless, I think there are still some high-level optimizations left. Maybe I will find some time to look into it. But as you have now released the code very transparently (not somehow hidden like before), I guess there are more capable people out there who can say something about it. Either way, my team and I are very pleased that the efficiency of the computations at this project has been greatly improved. I will be happy to lose my first place at this project soon ;)
4) Message boards : Application Code Discussion : compiler optimization flags (Message 6886)
Posted 27 Nov 2008 by Milksop at try
Post:
You may be right that in the general case the output may change (most probably only very slightly and unnoticeably; one really needs some special cases to see real changes), but I would regard such an algorithm as close to numerically unstable. And believe me, Milkyway isn't at that point. Hell, the bug in the number of integration points didn't change the (decimal) output!

What you guys don't seem to understand is that an error, even on the 15th decimal in one operation can spread (especially during multiplications) till it becomes quite significant (on the 3rd decimal of the final result, for example).

And what you don't seem to understand is that an algorithm can actually work around this problem. If major deviations occur I would say the algorithm may have a problem.
I was talking about the official app using a wrong number of integration points (82) and still generating the same output file as when using the correct number (30). If that is the case, don't tell me the output will change if the compiler rearranges the calculations a bit (into a mathematically identical expression). I have actually done such things by hand already, and MW appears to be quite stable against them.
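To illustrate what rearranging a calculation into a mathematically identical expression does in double precision, here is a minimal Python sketch (Python floats are IEEE 754 doubles). Regrouping a sum, which GCC's -ffast-math permits through -fassociative-math, can change the last bit, while the value printed to a fixed number of decimals stays the same.

```python
import math

a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c    # summed in the order written
right = a + (b + c)   # the regrouping that reassociation would allow

print(left == right)                              # False: the results differ in the last bit
print(math.isclose(left, right, rel_tol=1e-15))   # True: relative difference is about 2e-16
print(f"{left:.10f} {right:.10f}")                # 0.6000000000 0.6000000000
```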

If you still don't want to understand and insist on using -ffast-math, then my guess is that the project admins will end up resorting to validation by result comparison (like SETI does), meaning the throughput of the project will be divided by three (as they will need to have each WU calculated by at least three different computers and then compare the results, only keeping the ones that are close enough to each other to indicate a non-crippled result).
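The result-comparison scheme described above (replication with a quorum) can be sketched as follows. This is a simplified illustration, not BOINC's actual validator interface: the function names, the quorum of three and the relative tolerance are assumptions. With a quorum of q, every WU is crunched q times, so the useful throughput drops to 1/q of what single validation gives.

```python
import math
from typing import Optional, Sequence


def find_canonical(results: Sequence[float], quorum: int = 3,
                   rel_tol: float = 1e-9) -> Optional[float]:
    """Hypothetical helper: return a result that at least `quorum` results
    (itself included) agree with, or None if there is no such agreement yet."""
    for candidate in results:
        agreeing = sum(
            1 for r in results if math.isclose(r, candidate, rel_tol=rel_tol)
        )
        if agreeing >= quorum:
            return candidate
    return None


def useful_fraction(quorum: int) -> float:
    """Fraction of the raw computing power left for distinct WUs when each is replicated."""
    return 1.0 / quorum
```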

I'm not insisting on using it; in fact I haven't used this option in my published versions either (just -O2, that was all). The reason is quite simple: you only turn to compiler flags once you have done the high-level work. The effort of testing all combinations of compiler options, and maybe different WU types, is higher than the effort needed to get a five or ten percent improvement from other changes to the code.

All I was saying is that it should be safe to use here. But of course one has to check the results for deviations offline first (by comparing the results of the official app and a self-compiled one on the exact same WU). Everyone who compiles their own version should do that anyway. If it gives the same result, you can use it. It is really simple.
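The offline check described above comes down to comparing the official app's output with the self-compiled build's output for the same WU. Here is a minimal Python sketch. It assumes, hypothetically, that both outputs are whitespace-separated tokens where numbers must agree to a relative tolerance and all other tokens must match exactly. The real Milkyway output format and a suitable tolerance are not specified here. Passing `rel_tol=0.0` demands bit-identical numbers.

```python
import math


def outputs_match(official: str, custom: str, rel_tol: float = 1e-12,
                  abs_tol: float = 0.0) -> bool:
    """Compare two outputs token by token: numbers within tolerance, text exactly."""
    a = official.split()
    b = custom.split()
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        try:
            fx, fy = float(x), float(y)
        except ValueError:
            # At least one token is not a number, so require an exact match.
            if x != y:
                return False
            continue
        if not math.isclose(fx, fy, rel_tol=rel_tol, abs_tol=abs_tol):
            return False
    return True
```

For example: `outputs_match(open("official.txt").read(), open("custom.txt").read())`, where both file names are placeholders.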
5) Message boards : Number crunching : New App status (Message 6842)
Posted 27 Nov 2008 by Milksop at try
Post:
A new app costs money, which means funds from somewhere, so how, and if, are acknowledgments needed?

I want an acknowledgement, too! *LOL*
6) Message boards : Application Code Discussion : compiler optimization flags (Message 6841)
Posted 27 Nov 2008 by Milksop at try
Post:
In regards to all the -ffast-math discussion, I have to ask a question based on what I noticed with the difference between a K6 and my K8 (Athlon64 3700+). The K6 has an enormously inferior FPU, yet using Milksop's app it was able to come close to or hit the 108 cr/hr limit. This suggests, at least to me, that the application is not very FPU-intensive.

It uses quite a lot of double-precision math, but you have to understand that the K6 FPU isn't that bad for some tasks. Its theoretical throughput is only half that of a Pentium II if I remember right, but its latencies are very low, which may compensate for the lower throughput in some cases. Still, a K6 reaches only roughly 60% of the performance of a P2 at the same clock here.

Regarding the -ffast-math discussion, I'm definitely with Augustine. Thierry, you have to see that very few computational problems require the precision you are proposing here. If an algorithm required such measures, it would also be sensitive to the arrangement of the operands in the code. Furthermore, it would be hard to get the same results with x87 math as on a PPC, simply because of the longer internal mantissa of the x87 FPU. You would need to flush every intermediate result to memory after each operation to be really sure. Nobody does that (it is simply too slow).
You may be right that in the general case the output may change (most probably only very slightly and unnoticeably; one really needs some special cases to see real changes), but I would regard such an algorithm as close to numerically unstable. And believe me, Milkyway isn't at that point. Hell, the bug in the number of integration points didn't change the (decimal) output!
7) Message boards : Application Code Discussion : milkyway code releases (Message 6725)
Posted 25 Nov 2008 by Milksop at try
Post:
Milksop's version takes about 5 min 7 sec on a 3 GHz quad under Vista 64.
Cheers Sabroe SMC

That is the number for the published 32-bit build without any SSE. I used VC98 to slow it down to the performance of the Linux build (using gcc3.04) ;)

Compiled with a newer Intel compiler with SSE2 enabled, it would have been roughly a factor of two faster on newer CPUs, but compatibility would have been worse.
8) Message boards : Number crunching : New App status (Message 6409)
Posted 22 Nov 2008 by Milksop at try
Post:
Actually to correct some misinformation here. The RPI computer science department as a whole does not run this project. There is myself, Nate, dave, and our 4 advisors (1 professor in the physics dept, and 3 in the cs dept). Our advisors have quite a few other projects they're working on (and I myself have another one or two). Either way, the bulk of the work on this project is done by myself, dave and nate. 3 people, not 440. We also have classes and our degree requirements to attend to. This is the nature of graduate level research.

Right now we're operating off an NSF grant which pays for our hardware, Nate, myself and a couple undergrad researchers. During the summer things are even slower due to the fact that it's summer and not everyone is around.

Either way, in the recent version of the application I've made an attempt to clean up all the legacy code and implement the optimizations that have been suggested, and I went a bit further with a couple of my own.

I'll try to release the code sometime this weekend for people to look at, because I don't think there will be many more changes apart from fixing whatever bugs turn up. I'm actually looking forward to people looking at the new code and giving some optimization suggestions, because right now I'm not quite sure how much more performance we can squeeze out of the integral loop.

Yeah, of course.
Btw, I'm not running the whole physics department I work at, either. I'm just a PhD student paid from a DFG grant (the German equivalent of the NSF). At the moment I'm quite busy compiling the funding application for the next four-year period (600-something pages; 18 individual projects at our institute are applying together for about 10 million euros). Unfortunately, I'm one of the few people responsible for that. So we are in the same boat in some sense ;)

After the deadline for this (next Friday), I will be busy with some measurements at FLASH (Free electron LASer in Hamburg). But after that, maybe I will find some time to look at the new code. My professor is really starting to urge me to finish my thesis, though, so I cannot promise anything.

Either way, as I said, I have a very capable partner on this account who would also like to look at the new code. So if you want someone to double-check the improvements you implemented, or just to chat about them and further possibilities, feel free to contact me either by email (Dave has my address) or by PM.

Up to now, my partner and I have not shared code (we have two independent versions). But with the new application this may change, and we could arrive at a jointly developed version. If we are lucky, we can even convince Crunch3r to contribute (I admit this will be hard), as his latest version (that I know of) was still some 20% faster than my fastest one (my mate's latest version was roughly the same speed as mine but scaled a bit differently). From the algorithmic point of view, Crunch3r and my partner would be the ideal contacts, as they are both fairly experienced programmers. My advantage may be understanding what is actually being computed (I'm a physicist after all) rather than the implementation, but that requires time on my side that I may not have. We will see. I'm definitely looking forward to hearing from you.
9) Message boards : Number crunching : How to Switch to the New Official App? (Message 6272)
Posted 18 Nov 2008 by Milksop at try
Post:
Simply delete the app_info.xml file in the project folder and restart BOINC... however, I'm not sure why you would want to do that...

It's highly unlikely that the new official app will be faster.

Maybe it will be incompatible with some new WUs?

But I really hope the factor 2 to 3 speedup Travis reported is not a comparison to the old official one. As you certainly know, one would get that by changing 5 lines of code or so. Or for new CPUs even just by using another compiler ;)
10) Message boards : Number crunching : @MW staff: credit limit and credit level (Message 6198)
Posted 15 Nov 2008 by Milksop at try
Post:
And regarding the question of what to compare with what, this is simple too. Milksop's application is not an optimized one; his would be the standard level. The former application is an incomparable heap of unnecessary calculations, not to be compared to anything.
Optimized would mean further integrating various instruction sets and so on. But according to what milksop wrote about what he did, his app is a great achievement compared to the former ..., but in no way "optimized" compared to the degree of optimization in, for example, the SETI app.

That is the exact reason why I've put "optimization" in quotation marks in my profile. I guess with some more effort one can make the app still a lot faster. And not only by incorporating some SSEx stuff. There are still some high level things left.
11) Message boards : Number crunching : No further Support for Milkyway at this time! (Message 6196)
Posted 15 Nov 2008 by Milksop at try
Post:
But it's strange that he didn't answer that more precisely.
It now looks to me as if he doesn't find his own app worth defending.
In my eyes all this babble is a bit against him also.

I don't think so. To cite the first post in this thread from aendgraend:
The development of the optimised application is highly appreciated - it gets the project's work done a lot faster than before.


What should I defend?

Furthermore I explicitly stated in my post that I don't want to discuss cross project comparisons of credits, just the fact that the current credits are unfair for the owners of fast machines compared to slower ones. I only referred to the comparison to make clear that the current credit limit does not help that issue either.
12) Message boards : Number crunching : @MW staff: credit limit and credit level (Message 6194)
Posted 15 Nov 2008 by Milksop at try
Post:
This is true. However, the new assimilator just got finished yesterday so we will be releasing the new app within a week and we will have to see how it compares.

Are you not sure the optimised app gives valid results? Then why grant them credits at all? Credits are only for proven valid results, not just some data sent back.

I trust Milksop on his app [..]

The faster app delivers valid results. This was checked not only by me but also by the project itself.
I guess Dave is just curious how the performance of the new app compares to the old as well as the faster one.
13) Message boards : Number crunching : No further Support for Milkyway at this time! (Message 6163)
Posted 14 Nov 2008 by Milksop at try
Post:
The point is that it may be fair for the crunchers who are using the modified app, but it is really unfair to all other projects on the BOINC platform.

It is not even fair among the crunchers at this project who all run the optimized app, as credits are awarded per runtime, not per work done. A slower computer gets more credits per WU than a faster one. I would call that credit approach a strange one, to say the least.
14) Message boards : Number crunching : @MW staff: credit limit and credit level (Message 6162)
Posted 14 Nov 2008 by Milksop at try
Post:
Approximately two weeks ago I made the faster apps public. At the same time I suggested a credit adjustment to keep things fair. Now I think it is necessary to reinforce that suggestion. I've just seen that SETI.Germany (among others) has advised against crunching MW any longer. I'm sorry that some well-respected members of the BOINC community find it necessary to take such measures, but in fact they are right.

The current situation is simply unbearable. I won't talk about cross project parity here, as this would just be another heated debate. I talk about the intra project fairness between the crunchers here at Milkyway. There is no correlation between the work done and the credit awarded.
Just an example: a very fast Core2 comes close to the 300 WU/core limit. It will be awarded slightly more than 2,500 credits per core per day. If it is faster than 288 s per WU (reaching the WU limit), it will actually get fewer credits per day than a slower box. At the opposite end of the spectrum, take a really ancient Pentium Pro at 180 MHz or so. It will take more than two hours for the same WU, but will get close to 260 credits for it (instead of just 8). It will only be able to calculate 10 or so of them a day, but will still get the same 2,500-something credits per day. I don't think it is fair that the owner of the fast CPU calculates 30 times as many WUs a day and does not get a single credit more.

That is the result of that extremely stupid credit limit. Owners of fast boxes don't get more credits than were already possible with the old app (and Linux64), despite calculating vastly more WUs. But now even extremely slow CPUs get the same! That is what needs to be corrected immediately.

And finally, I would like to draw the project staff's attention to the table David Anderson referred to about two weeks ago. He claimed MW awards too much credit, as its average level was about a factor of 1.9 higher than SETI's according to that table. Without arguing about whether this "calibration" to SETI makes sense, the project's reaction was to lower the credit limit to the current value of 0.03 credits/s rather than to adjust the credits per WU. What is the result today? That factor is now at 2.4, hardly a success.
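The arithmetic behind the Core2 versus Pentium Pro example can be checked with a small model. This Python sketch makes a simplifying assumption: each WU is granted min(claimed, rate × runtime) credits, with the 0.03 credits/s limit and at most 300 WUs per core per day. The claimed value of 270 credits is a placeholder borrowed from the old WUs, not a documented constant. The slow machine's 8,700 s per WU (about 2.4 h) is likewise a hypothetical runtime, chosen so that the capped credit comes out near 260. The key figures follow directly: 0.03 × 86,400 s gives 2,592 credits per day, and 86,400 s / 300 WUs gives the 288 s per WU threshold.

```python
SECONDS_PER_DAY = 86_400


def credits_per_day(seconds_per_wu: float, rate: float = 0.03,
                    wu_limit: int = 300, claimed: float = 270.0) -> float:
    """Daily credit for one core under a runtime-based cap and a daily WU quota.

    Simplified model: each WU earns at most rate * runtime credits,
    and no more than wu_limit WUs count per day.
    """
    wus_per_day = min(SECONDS_PER_DAY / seconds_per_wu, wu_limit)
    credit_per_wu = min(claimed, rate * seconds_per_wu)
    return wus_per_day * credit_per_wu


fast = credits_per_day(288)      # fast Core2, right at the 300 WU/day quota
slow = credits_per_day(8_700)    # hypothetical old CPU, about 2.4 h per WU
faster = credits_per_day(200)    # even faster box, now held back by the quota

print(round(fast), round(slow), round(faster))  # 2592 2592 1800
```

Under this model, any machine slower than the quota threshold earns the same daily credit regardless of how much work it does. Any machine faster than the threshold earns less.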

To summarize, the only sensible solution is to abandon the credit limit to make things fair again between the crunchers at this project. At the same time you can think about adjusting the general credit level to stay more in line with other BOINC projects if you feel this is important (personally I would). As my app is generally a factor of 50 to 60 faster than the old one, I already suggested 2 weeks ago dividing by that factor. That means about 4 to 5 credits per WU. Only users of the 64-bit Linux app (v1.24 supplied by Crunch3r) would see a significant decrease in awarded credits. As about 98% of the crunchers here at MW use either Windows or Linux on x86 CPUs, you could also distribute the faster app to all participants. The remaining 2% or so would have one or two weeks to crunch for another project until you release the new app that was already announced for last week. I think this should be doable.
15) Message boards : Number crunching : No Work ? (Message 6080)
Posted 12 Nov 2008 by Milksop at try
Post:
I thought crunch3r did an optimised Mac app a long time ago - but I could be wrong.


I do remember that he(crunch3r I think) optimized it to the current specs. I don't know if he had a faster one.

Of course he has a faster one. When compiled for x86 Macs (including SSE2/3) it is even a factor of two faster than the published app.
16) Message boards : Number crunching : Process got signal 11 (Message 5821)
Posted 2 Nov 2008 by Milksop at try
Post:
I decided to try the optimized app proposed in this thread
http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=433
and it seems to work. Should be a bit faster, too. Hopefully all those flames about credit cheating and whatever found in that thread don't mean this is an app which will gain me unfair credit, cause that's not what I want. It's simply the only app I tried that doesn't crash on my system, for whatever reason.

Good to hear it's working!
17) Message boards : Number crunching : Faster application (links inside) (Message 5791)
Posted 1 Nov 2008 by Milksop at try
Post:
Just a minor observation.

Hmmm....Is it the old Lada joke in your profile? LoL
You can choose between 2 colors when you buy a Lada: red or red.

Profile quote

The faster app comes in two flavors
application for Windows 32 and 64Bit versions and
application for Windows 32 and 64Bit versions



Thx again for the optimized app.

Mac-Nic

Either it was somehow cached, or you spent 30 minutes writing those few lines ;)
I changed the profile (and corrected that error) before I referred to it here in the thread (the message just above yours).
18) Message boards : Number crunching : Faster application (links inside) (Message 5789)
Posted 1 Nov 2008 by Milksop at try
Post:
Btw. as it is now possible again, I've updated my profile.
The files linked there contain a fixed app_info.xml.

Or as a short cut:
application for Windows 32 and 64Bit versions
application for Linux 32 and 64Bit versions
19) Message boards : Number crunching : Faster application (links inside) (Message 5788)
Posted 1 Nov 2008 by Milksop at try
Post:
The obvious question that comes to mind is, why not just use your app as the official project app?

That is what I proposed to the project.
20) Message boards : Number crunching : Faster application (links inside) (Message 5785)
Posted 1 Nov 2008 by Milksop at try
Post:
This next comment will probably earn me a lot of scorn, but I refuse to download the unofficial app. I'm just uneasy about the way the whole thing's been handled. Seems like 2 or 3 people have held the project to ransom.
Fair enough if you think an app is inefficient: you can design a better one and submit it to the project, or make suggestions on how they could improve it. Then it's up to the project whether they use it or not. They may have reasons for using the old inefficient app. If you're not happy with their response you could post in the forum & let other people know, then everyone could vote with their computers. If people aren't happy with the way a project is run, there are always plenty of other projects to choose from.
Just my two cents worth :-)

Actually, what you propose is what was done :o

I (and independently some others too) have designed a better app, I made suggestions to the project how to improve the app. And it is of course up to the project to decide what of these suggestions they implement. And I clearly communicated here in the forum what I think about the decisions of the project. But actually I was not *that* unhappy with the *latest* decisions.

But just to quote one of the PMs I got before the release of the faster app (I gave the apps to them before, so they were able to evaluate it):
greetings. if you want to release your binary to the public. give the people what they want! we'll be releasing a new version within the next week but if you want to release yours in the midterm be my guest. ive checked it over and it looks in order.

Another one even sounded somehow enthusiastic ;)
Your applications are indeed working great!


So don't tell me I held the project to ransom!



©2024 Astroinformatics Group