Welcome to MilkyWay@home

Posts by Paul D. Buck

21) Message boards : Number crunching : Can't stop CPU based WU on ATi system. (Message 39833)
Posted 19 May 2010 by Profile Paul D. Buck
Post:
That is why his direction was to use the web page. You are instructing MW not to send you CPU work.
22) Message boards : Number crunching : Bittersweet Milestone (Message 39830)
Posted 19 May 2010 by Profile Paul D. Buck
Post:
*Sigh* The abacus was such a wonderful beast in its day ....

It still is ... :)

In the early days of computers there was a "race" between one of the early machines and a room full of abaci, and the machine lost ... I can't find the right reference on Google and am too lazy to search my file cabinets ...

But I do get your point about the future ... heck, I never imagined that I would be on the verge of 1M a day production on a regular basis ... I mean, that was years' worth of work before ... years ...

Now, were my wife to win the lottery ... i7 980s all around and a 16 CPU Power Mac (or whatever they are calling them this week) ... a few more 5870s and ... hooo boy ... :)
23) Message boards : Number crunching : Errors on new 5970 (Message 39829)
Posted 19 May 2010 by Profile Paul D. Buck
Post:
Eeek! Getting quite a few invalid wu's again.

Aqua is using the MP feature of BOINC, and I note that there was a change-set to change the way MP scheduling is handled. So, Kashi's note on Aqua is well taken; you may be seeing a collision between the MP tasks and the GPU tasks ... or simply running out of main memory ...

That is the problem with the bleeding edge ... Ow! :)
24) Message boards : Number crunching : Aaargh! Server out of new work! (Message 39804)
Posted 18 May 2010 by Profile Paul D. Buck
Post:
Can't be avoided. Lots of people will run out of work with hosts like this, http://milkyway.cs.rpi.edu/milkyway/show_host_detail.php?hostid=171705.

That host has over 2000 tasks in its cache while my 3850 is limited to only 6 tasks in progress. *sigh*

Don't know how that happened! I sure hope Travis takes a look at that one ... no way he should be getting more than 24 tasks at a time ... something bad is happening ... and it does not seem like he is returning any at all either ...
25) Message boards : Number crunching : Bittersweet Milestone (Message 39800)
Posted 18 May 2010 by Profile Paul D. Buck
Post:
My preferred project of Malaria Control doesn't have a GPU app, rarely has work these days, and gives some of the lowest credit going, so my CPUs are being kept busy with other, better-paying work in between times. Still, my GPUs produce in about 5 minutes the same amount my CPUs do in a day.

My 5 computers have a total of 36 processing elements, and all combined they do about 20K per day on the various projects (24,012 yesterday; 803,508-188,293-82,746-18,735-489,722). So I feel your pain ...

Though it takes me longer to do the equivalent ... as I work it out, it takes about 45 minutes of work by my GPUs to equal what all of my CPUs do in a day ...

779,486 stones / 86,400 seconds a day = 9.02194 CS/second
24,012 / 9.02194 CS/sec = 2661.5118 seconds
2661.5118 sec / 60 = 44 min 21 seconds
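
(For anyone who wants to redo that back-of-the-envelope math, a quick sketch in Python, just re-running the figures above; nothing official, the numbers are the ones quoted in this post:)

gpu_credits_per_day = 779_486   # daily GPU output, in cobblestones
cpu_credits_per_day = 24_012    # daily CPU output, in cobblestones

gpu_rate = gpu_credits_per_day / 86_400           # ~9.02 CS per second
seconds_needed = cpu_credits_per_day / gpu_rate   # ~2661.5 seconds
minutes, seconds = divmod(int(seconds_needed), 60)
print(minutes, "min", seconds, "s")               # -> 44 min 21 s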

Of course, all this proves is that someone has WAY too much time on his hands ... :)

I think I had reached about 3M total when I got my first GPU application going ... and that was for about 4 years' worth of work on up to a total of 10 computers ... now I do that much (and more) in a week... A couple more upgrades and I am likely to be doing 1M a day on a regular basis ...
26) Message boards : Number crunching : Bittersweet Milestone (Message 39783)
Posted 17 May 2010 by Profile Paul D. Buck
Post:
Like the Cray super-computers of days gone by, the GPU is king only for some specific classes of problems ... for others, not so much; for others, not at all ... look at the difficulties EaH is having getting the GPU to give them a significant increase in speed ... or SaH with VLAR tasks running at 1/10 to 1/100th the speed of the more "common"/"normal" tasks...

I too was a long-time advocate of the GPU as a candidate to improve our ability to solve problems ... and it has ... if we can continue to use the GPU where it helps, that frees the CPUs to be used on other problems where you cannot get the speed increases because the problem is not amenable to vectorizing ...

Sadly we are still being held back by the primitive nature of the support tools ... for all the improvements we are still in very early days ... it has only been about 2 years, I think, since we really saw the first applications ...

But I do know how you feel, I just had my second 1M+ day of production ... something that used to take years I am now doing in a single day (productivity-wise), though it is still only for a limited palette of projects ... :(
27) Message boards : Number crunching : GPU Requirements [OLD] (Message 39778)
Posted 17 May 2010 by Profile Paul D. Buck
Post:
To all you people with ATI cards: I really hate you right now. I just built a new el cheapo BOINC cruncher with an Nvidia GTX260 because there is CUDA support for the majority of the astrophysics and other BOINC projects I like to participate in. But the times you guys are posting for your ATI cards are just ridiculous. I'm assuming the WUs themselves are the same quantity of data as the ones for CUDA? Or are there differences in size that would make them process so much faster?

The main reason for the speed difference in MW task processing is that the 48xx and 58xx series cards are double-precision capable across all processing elements. Nvidia made a design choice that limits the number of double-precision processing elements on the card... so, fewer elements, slower calculations... in effect only 1/3 or less of the card is actually working here ...
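
(A purely illustrative sketch in Python; the element count below is a made-up placeholder, not a real card spec. It just shows how restricting double precision to a fraction of the elements scales the DP rate down.)

def effective_dp_rate(total_elements, dp_capable_fraction, ops_per_element=1.0):
    # Double-precision throughput when only part of the card can do DP.
    return total_elements * dp_capable_fraction * ops_per_element

all_dp   = effective_dp_rate(1000, 1.0)      # every element DP-capable
third_dp = effective_dp_rate(1000, 1 / 3)    # the "1/3 of the card" case above
print(all_dp / third_dp)                     # -> 3.0x difference in DP rate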

On Collatz and DNETC the playing field is a bit more level; there, when you see speed differences in processing rates, it just comes down to which design is better. To make you hate us more, the cards typically draw less power as well ...

On the positive side, you have recognized that there are more CUDA implementations out there than ATI Stream ones, though that is slowly changing ... and if/when OpenCL gets reasonably stable we may start to see some leveling there as well ...

In my case, the economics say migrate to ATI now, as Nvidia lost this round to ATI, and wait for the ATI applications to come ... we are up to 3 projects now, with SaH optimizers working on a beta version of an ATI application as we speak ...
28) Message boards : Number crunching : Limit of 48 workunits at once (Message 39758)
Posted 16 May 2010 by Profile Paul D. Buck
Post:
Well, I have also been on him about the idiotic Strict FIFO rule ... which compounds things ...

Too true! That one really doesn't make a lot of sense.

It made sense at the time. There was an instability issue and this was an attempt to solve it. Sadly it didn't ... we still had instabilities. There was a later fix that solved them, or to put it another way, the real issue was located and changed... but the rule lingers... since UCB never explains anything, I have no idea why they cling to it...
29) Message boards : Number crunching : Limit of 48 workunits at once (Message 39749)
Posted 15 May 2010 by Profile Paul D. Buck
Post:
I've been trying to get ol' Doc Anderson to improve RRI (report results immediately) so that we can be more selective ... I want RRI on GPU Grid because you get paid better for faster results, but I don't want it necessarily for MW because it hammers the server ... but so far to no avail ... it is all or nothing ... and sadly, that means that with my configuration, project load, and other considerations I have to beat on MW even though I don't want to do so ...

I think all GPU users beat the hell out of the MW server, and all due to the 6 WU/core limitation. I have my cache set to connect every 0.01 days and keep 0.1 days buffered. This, plus a pretty high resource share, means that every time the MW backoff completes BOINC asks for more MW work - hammer time. This is irrespective of RRI.

Well, I have also been on him about the idiotic Strict FIFO rule ... which compounds things ...
30) Message boards : Number crunching : Limit of 48 workunits at once (Message 39739)
Posted 15 May 2010 by Profile Paul D. Buck
Post:
Paul pretty much hit the nail on the head.

We really don't want to increase the caches any more than they are right now. Part of the issue is that new work is being generated based on the results of the old work. So the larger your caches are, the older the work they're crunching (i.e., it's less up to date, so our searches progress more slowly).

Another issue is that having large caches makes our poor server cry for mommy. If we doubled the cache size, for example, the database would have to deal with 2x as many workunits out there at any given time; and things are slow enough as it is :P

Glad I was close enough ... :)

I've been trying to get ol' Doc Anderson to improve RRI so that we can be more selective ... I want RRI on GPU Grid because you get paid better for faster results, but I don't want it necessarily for MW because it hammers the server ... but so far to no avail ... it is all or nothing ... and sadly, that means that with my configuration, project load, and other considerations I have to beat on MW even though I don't want to do so ...
31) Message boards : Number crunching : BOINC.exe Memory Leak! (Message 39699)
Posted 14 May 2010 by Profile Paul D. Buck
Post:
Still hovering around 7MB. No idle GPUs either. Haven't tried any of the 'new' features though, but at least the basics appear to be working.

Yeah ...

I lost 15 hours of GPU compute time on one system because I got hit with the GPU idle bug and missed checking on the system soon enough ... that machine was running 6.10.45, which is relatively immune though not completely ... On the one Win7 machine I have it running on, it is 12 hours in and still going ... usually I would be toast by now ...

I noticed a couple of minor issues with debug messages, which I posted ... but .55 looks like it will be my fallback to replace .45 unless something more subtle and evil comes up ...
32) Message boards : Number crunching : Limit of 48 workunits at once (Message 39698)
Posted 14 May 2010 by Profile Paul D. Buck
Post:
Aaron,

There is no intent to penalize ... it is a consequence of the "flow" of the project ... (an ignorant summarization of how the project works, but for illustrative purposes only I will set forth this example) think of it this way ... we calculate x and return it ... now, you and I and the rest of the project participants are calculating x_1 through x_1,000,000 ... now, to generate x'_1 the project has to get a valid answer for x_1 FIRST ... and so on through 1,000,000 ... OK, easy so far ... right?

Well, to get x''_z we need both x_z and x'_z (the same index) ... for x''' we need all of the prior results plus x''_z ... also easy ...

Now the fly in the ointment ... you and I with our super-fast cards are plowing through the tasks at light speed ... and yet ... in my simplified example I ignored the fact that there are about 2 orders of magnitude difference in processing speeds among the machines ... were I to run the tasks on the CPU side I would be taking hours to do each task, and not the 90 seconds most of my GPUs take ...

So, in the stream of tasks coming and going, you and I are generations out from x while the CPU guys are still back there crunching their hearts out ... so, instead of letting us GPU snobs hog the project and excluding those who are not as fortunate in being able to buy high-end GPUs, they made some compromises so everyone who wants to can be part of the work ... personally, I wish it could be otherwise and we could all store up more tasks, but that is just not in the cards ...
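
(A toy model of that x / x' chain, in Python; purely illustrative, not the project's actual server code. The point is simply that generation n+1 cannot exist until every result from generation n has come back.)

def crunch(task):
    # Stand-in for a volunteer computing one task.
    return task * 2

def next_generation(previous_results):
    # Stand-in for the search step: new parameters depend on the old answers.
    return [r + 1 for r in previous_results]

work = list(range(1, 6))                         # generation 0: x_1 .. x_5
for generation in range(3):                      # x, x', x'', ...
    results = [crunch(task) for task in work]    # ALL results must be returned ...
    work = next_generation(results)              # ... before the next batch can be made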

All projects are delicate balancing acts, and it is easy for us with our hot cards to forget that we are fewer in number than those who just have a small home computer ... were this not so I would not be in the world-ranking position I am with only what I think of as 5 inadequate computers ... yet, most who look at what I have might contemplate dark alleys and baseball bats ... :)

At any rate, the limits are on the server side, and were he able, Travis would have already increased the number we can have on hand through some other measurement ... but the issue is how many tasks he can have "live" in the wild at any given moment ... and to keep that lid on we have this current system ...

As to the last point, the application / platform segregation is both a server-side and a client-side change, needed for several projects as it turns out ... and as far as I know not on the horizon anytime soon ... we need this change for proper DCF (duration correction factor) calculation more than anything else ... like PG with its multitude of applications with varying efficiencies ... and GPU / CPU projects where there can be wide variations in the "real" DCF for the GPU and CPU sides ... SaH has this issue as well ...
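
(A toy illustration of the DCF problem, in Python; this is my own simplified sketch, not BOINC's actual code. DCF is a single per-project correction factor on runtime estimates, so a stream of fast GPU tasks drags it down and the slow CPU tasks of the same project end up wildly under-estimated.)

dcf = 1.0                      # shared per-project duration correction factor
raw_estimate = 3600.0          # server's raw runtime estimate, in seconds

def task_finished(actual_runtime):
    # Simplified update: nudge the shared DCF toward actual/estimated runtime.
    global dcf
    dcf = 0.9 * dcf + 0.1 * (actual_runtime / raw_estimate)

for _ in range(50):
    task_finished(180.0)       # a long run of fast GPU tasks (~3 minutes each)

print(raw_estimate * dcf)      # estimate is now only a few minutes ...
# ... so the next CPU task (which really takes ~an hour) looks tiny to the
# scheduler, and estimates and work-fetch for the CPU side go badly wrong.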
33) Message boards : Number crunching : BOINC.exe Memory Leak! (Message 39694)
Posted 14 May 2010 by Profile Paul D. Buck
Post:
So, maybe they did find it... :)

I am glad that this build seems to have fixed your issue ...

Won't know for sure till tomorrow, but it looks like my idle-GPU issue may also be fixed... though I am having an upload problem to SaH at the moment ... other projects seem to be fine ...

Just installed it (6.10.55) on the Mac as well, so 2 out of 5 machines over to the latest ...

34) Message boards : Number crunching : Limit of 48 workunits at once (Message 39693)
Posted 14 May 2010 by Profile Paul D. Buck
Post:
Well, Travis is very much active, and very much regrets (if I may put words in his mouth based on his actions over the last umpteen months) the short queue sizes ... there are actually other issues involved as well, such as the fact that some of the tasks are generated based on what we return (a la GPU Grid), so there is a "building" process where new work depends on old work being returned...

Think of it like a bricklayer... we have to finish story 1 ... before we can work on story 2 ... saying it would go faster with more of us is like saying we could put 9 women on it and get the baby in a month ... much more efficient ...

So, as I said, this is the reason for the upcoming changes in the applications to eliminate data files ... perhaps, and I am speculating here, if the server load goes down enough maybe we can see an increase in task issue rates ... the problem is that some people would like to have 10,000 tasks on hand ... that can't happen with the server he has ... so we all make do ...

I know, he should be independently wealthy and go out and buy a server farm like SaH so we can have as much work as we want ... sadly we don't live in that perfect world ... and even if he could buy a bigger farm to support us, it is possible that the sequential nature of the beast would still get in the way ...

As to playing nice, I have been arguing for months now that the Strict FIFO rule causes more problems than it is worth for GPU-oriented systems ... particularly with MW's issues in the mix ... so far to no avail ...
35) Message boards : Number crunching : Limit of 48 workunits at once (Message 39665)
Posted 13 May 2010 by Profile Paul D. Buck
Post:
The real issue here is that, yes, the fast cards do more work faster, but we also have a lot of people who run these tasks on the CPU side ... so, assume the worst case: all of your tasks and mine are run on a fast GPU (as fast as 90 seconds each) and we get "married" to wingmen who are all running on the CPU side ... it is easy to see that we pump work out so fast that we are a very heavy burden on the server side, both in how fast we pull tasks and how fast we send them back ... and yet they may not be paired and validated that fast ... so the server side builds up huge lists in the database ...

By limiting the tasks we each have on hand, the project keeps a bit of a lid on how many tasks each of us has pending at any one time ...

The good news is that there is work being done to ease this somewhat with the elimination of generated data files ... if Travis can eliminate those, then he can eliminate the burden of the file deleter on the server ... with that done, well, no promises, but it may be possible THEN to petition to see if the limit can be raised...

Right now it is limited to 6 per CPU core ... or, for an 8-CPU machine, 48 tasks ... which, as you noted, takes just over an hour to run through ... faster with dual GPUs of this class ...
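
(Putting rough numbers on that, a quick Python sketch; the 90 seconds is the per-task figure mentioned above for a fast card:)

tasks_per_core = 6
cores = 8
seconds_per_task = 90                            # roughly, on a fast ATI GPU

cache_size = tasks_per_core * cores              # 48 tasks on hand
minutes_to_drain = cache_size * seconds_per_task / 60
print(cache_size, "tasks,", minutes_to_drain, "minutes")   # -> 48 tasks, 72.0 minutes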
36) Message boards : Number crunching : Crash! (Message 39664)
Posted 13 May 2010 by Profile Paul D. Buck
Post:
Perhaps, I am also seeing issues while running my older 6.10.45 as well ...

The problem now is that I am not sure which is killing which ... my CUDA card also seems to be throwing errors, also new, also while running 6.10.45 ... so I am not sure where the problem lies at this time ...
37) Message boards : Number crunching : Crash! (Message 39654)
Posted 12 May 2010 by Profile Paul D. Buck
Post:
This task, I think, died and took out my video card at the same time... the screen went dark ... I had one like this earlier today as well ...

Std err:

<core_client_version>6.10.54</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
Running Milkyway@home ATI GPU application version 0.23 (Win64, CAL 1.4) by Gipsel
ignoring unknown input argument in app_info.xml: -np
ignoring unknown input argument in app_info.xml: 20
ignoring unknown input argument in app_info.xml: -p
ignoring unknown input argument in app_info.xml: 0.8613169434420760000000000
ignoring unknown input argument in app_info.xml: 5.6377800191647910000000000
ignoring unknown input argument in app_info.xml: -1.3117317813933993000000000
ignoring unknown input argument in app_info.xml: 171.5995238733617600000000000
ignoring unknown input argument in app_info.xml: 22.0347537480075530000000000
ignoring unknown input argument in app_info.xml: 3.4228343050365915000000000
ignoring unknown input argument in app_info.xml: 0.0000671210988065000000000
ignoring unknown input argument in app_info.xml: 4.2839572782687570000000000
ignoring unknown input argument in app_info.xml: -4.1401974419966820000000000
ignoring unknown input argument in app_info.xml: 173.5069169847805300000000000
ignoring unknown input argument in app_info.xml: 0.7071268104636536000000000
ignoring unknown input argument in app_info.xml: 1.6979099014740920000000000
ignoring unknown input argument in app_info.xml: 2.0226133566859690000000000
ignoring unknown input argument in app_info.xml: 1.4347124521912300000000000
ignoring unknown input argument in app_info.xml: -2.0764539675261860000000000
ignoring unknown input argument in app_info.xml: 145.0000000000000000000000000
ignoring unknown input argument in app_info.xml: 21.5322226725128500000000000
ignoring unknown input argument in app_info.xml: 3.5072257558777840000000000
ignoring unknown input argument in app_info.xml: 5.7347333072474170000000000
ignoring unknown input argument in app_info.xml: 19.4858681420524960000000000
instructed by BOINC client to use device 0
Couldn't find input file [astronomy_parameters.txt] to read astronomy parameters.
APP: error reading astronomy parameters: 1

</stderr_txt>
]]>
38) Message boards : Number crunching : BOINC.exe Memory Leak! (Message 39646)
Posted 12 May 2010 by Profile Paul D. Buck
Post:
Rom says he thinks he has found the leak and the next drop should contain the fix for it ... which would be 6.10.54 ...

It may well have been fixed... but .54 itself is bad ... does not seem to start GPU work and freezes the system when a GPU is installed... so, don't try .54 ... or only try it on systems you don't want BOINC to run on ...
39) Message boards : Number crunching : BOINC.exe Memory Leak! (Message 39634)
Posted 11 May 2010 by Profile Paul D. Buck
Post:
Rom says he thinks he has found the leak and the next drop should contain the fix for it ... which would be 6.10.54 ...
40) Message boards : Number crunching : GPU Stops working 6.10.45 or later versions (Message 39633)
Posted 11 May 2010 by Profile Paul D. Buck
Post:
If you have had a situation while running 6.10.45 or later where BOINC stops using the GPU, Rom is looking for details. *I* have had one instance of this happening with 6.10.45, though it is more common with 6.10.46 and later (I have personally seen it with 6.10.46 and .48, and I have seen another report on 6.10.47) ...

Anyway, Rom's message:

A number of people have reported that the CC (core client) starts failing to assign work to GPUs after a period of time.

Evidence of this can be found in your log file. For Nvidia GPUs it looks like:

[coproc] cuCtxCreate(0) returned 999

I'm not sure what is logged for an ATI GPU.

For those experiencing the issue, could you email me:

What OS are you using?

Number of GPUs?

What GPU driver version?

What GPU model version?

Amount of RAM for the computer?

Amount of RAM for the GPU?


For an ATI GPU the log looks like:
04-May-2010 22:44:52 [Milkyway@home] Can't get available GPU RAM: 2

