Welcome to MilkyWay@home

Posts by Brian Silvers

1) Message boards : News : How the new validator works (Message 38889)
Posted 19 Apr 2010 by Brian Silvers
Post:
Is the error rate tracked per user or per computer? Is it possible to have this metric added in the appropriate section so that it is visible to us? With the quick purge rate, specific task errors can disappear from our sight very fast. (Probably a change request for the BOINC dev team, but worthwhile, since this figure is an important part of our contributions and of how we manage our systems.)

Um, it should really be per computer, per GPU class... Sad to say, some of us have mixed ATI/Nvidia systems working and may see different error rates on the different GPU classes (see the 58xx issues from a while ago)...

Of course, we have been asking for DCF to be handled at the application level for what seems like forever, and it is still not there yet...


...and people keep ignoring my idea of a Homogeneous Redundancy-like thing for GPUs... That would keep the various classes of GPUs validating only against each other, which is how I understand it works when applied to CPUs... :shrug:
2) Message boards : News : server issues (Message 38551)
Posted 10 Apr 2010 by Brian Silvers
Post:
We're having some database issues which is why the server isn't sending out any work. I've contacted labstaff and elevated the ticket to emergency so hopefully we'll have things fixed shortly. I think we might have to move to yesterday's backup.


Are you not doing incremental backups?
3) Message boards : Number crunching : Feeder & validator need kicking (Message 38429)
Posted 8 Apr 2010 by Brian Silvers
Post:
I've got an Intel i7-based machine that has been getting only 4 CPU work units to crunch at a time (doing it on the CPU only). My safety stock is empty. Obviously the machine would like 8 or more units. I was running 6.10.18 but upgraded today to 6.10.43, as it is now the recommended version, hoping that it would solve the problem. Nope! I also detached/reattached, but that didn't help either. Yesterday I reset my preferences to 5 days of cache and updated the preferences. No help there either. My laptop has since received a nice cache, but my main cruncher is at a loss for enough work. Any ideas?


Check these two settings under computing preferences:

On multiprocessors, use at most N processors
On multiprocessors, use at most xxx% of the processors (enforced by version 6.1+)
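
If the web preferences look right and the client still refuses to use all cores, the same limits can also be set locally. Below is a minimal sketch of a global_prefs_override.xml file (placed in the BOINC data directory) using the standard BOINC preference tags; the values are examples only, not a recommendation:

<!-- global_prefs_override.xml (BOINC data directory): local override of web preferences.
     Standard BOINC global-preference tags; the values here are examples only. -->
<global_preferences>
   <!-- allow BOINC to use all logical processors; lower this to reserve cores -->
   <max_ncpus_pct>100</max_ncpus_pct>
   <!-- keep roughly one day of work buffered, plus up to four additional days -->
   <work_buf_min_days>1</work_buf_min_days>
   <work_buf_additional_days>4</work_buf_additional_days>
</global_preferences>

After saving it, "Read local prefs file" in the BOINC Manager's Advanced menu (or a client restart) should pick up the override.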
4) Message boards : News : quorum down to 2 (Message 38230)
Posted 6 Apr 2010 by Brian Silvers
Post:

If things are not better soon, could you consider figuring out a way to utilize "Homogeneous Redundancy"-like capability to group GPUs away from CPUs? I know I'm not adding much in comparison to people with GPUs anymore, but now that you're grouping me with GPUs that are having problems validating, I'm burning electricity for nothing sometimes.


That won't fix the problem (at least on our end). We want the results to be accurate. If we're getting certain results from certain OS/architectures and different results from others, how do we know which ones are the right ones?

I'm really hoping we have this sorted out in the next couple days. I know having invalid results is really frustrating.


I can appreciate your issue. It, however, is not my issue. I do hope you understand that. I know I'm not adding much very often compared to GPUs, but it has been stated in other threads that the CPU results are definitely more accurate at this point than those from the ATI 5800 series cards.

As Crunch3r has said, you could restrict 5800 series participation until the issue is sorted with them.

In the meantime, I'm setting to no new tasks...

5) Message boards : News : quorum down to 2 (Message 38197)
Posted 6 Apr 2010 by Brian Silvers
Post:
The database is having a bit of trouble keeping up with all the new results due to a quorum of 3, so for the time being I'm dropping it to a quorum of 2.

On another note, we should have source code for the new application available tomorrow.


If things are not better soon, could you consider figuring out a way to utilize "Homogeneous Redundancy"-like capability to group GPUs away from CPUs? I know I'm not adding much in comparison to people with GPUs anymore, but now that you're grouping me with GPUs that are having problems validating, I'm burning electricity for nothing sometimes.

Thanks...

Edit: Oh, and my Pentium 4 got grouped with three 5800 series GPUs, which in another thread you state aren't matching up to other architectures, so my result, which is probably the one you should've accepted, got dumped as invalid simply because they formed a quorum by matching each other...

WU 90248204
6) Message boards : News : testing new validator (Message 38100)
Posted 5 Apr 2010 by Brian Silvers
Post:
Since I don't know whether purges are running quickly, and I don't know how much noise has been added to this thread since I last read it, I'm going ahead and posting this. It will likely be formatted badly, and may already be covered by the numerous postings, but I just wanted to state that it's quite unfair to me to have an app that is known to be working fine and to spend 4.5 hours on a task for zip, zap, zero... Oh, and in case the 4.5 hours didn't tell you which system is mine, it's the non-GPU system, the first one in the quorum...

name de_s11_3s_free_6_1544383_1270347650
application MilkyWay@Home
created 4 Apr 2010 2:20:50 UTC
minimum quorum 3
initial replication 4
max # of error/total/success tasks 3, 6, 1
errors Too many success results
This is displayed on the workunit page:

Task ID    Computer  Sent                    Time reported or deadline  Status                     Run time (sec)  CPU time (sec)  Claimed credit  Granted credit  Application
95656877   26452     4 Apr 2010 2:22:25 UTC  5 Apr 2010 5:49:34 UTC     Completed, can't validate  0.00            16,347.27       70.47           0.00            Anonymous platform
96480761   26133     5 Apr 2010 5:50:54 UTC  5 Apr 2010 7:17:50 UTC     Completed, can't validate  216.20          212.47          1.11            0.00            MilkyWay@Home v0.21 (ati13ati)
96480762   141414    5 Apr 2010 5:50:38 UTC  5 Apr 2010 6:27:01 UTC     Completed, can't validate  89.52           87.25           0.61            0.00            MilkyWay@Home v0.21 (ati13ati)
7) Message boards : News : Server outage (Message 37994)
Posted 4 Apr 2010 by Brian Silvers
Post:
It's good to hear you're on top of things - I hope BOINC gives you enough capabilities to deal with the scammers. Will these changes affect the distribution of the anonymous platform apps? For instance, will new versions need to be validated before being allowed, assuming you have the capability to enforce validation?


Once the new validator gets up and going validation will work as follows:

Any result that could potentially improve one of our searches will be validated (with a quorum of 2 or 3).

Previously, we ignored any result that wouldn't improve our searches. I'll be validating 50-100% of these for the next couple weeks so everyone's error rate gets updated correctly.

After everyone's error rate has leveled out, I'll drop the validation done on these workunits to whatever the error rate of the host returning the result is (with a minimum of 10%).

So we'll still be validating every potentially good result, but we'll be using BOINC's adaptive validation for everything else.
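
A rough illustration of the scheme described above (my own sketch, not the project's actual validator code); host_error_rate is assumed to be a fraction between 0 and 1:

import random

MIN_VALIDATION_RATE = 0.10  # floor described above: spot-check at least 10%

def should_validate(improves_search: bool, host_error_rate: float) -> bool:
    """Illustrative only: results that could improve a search are always
    validated; everything else is spot-checked with a probability tied to
    the host's observed error rate, never below the 10% floor."""
    if improves_search:
        return True
    return random.random() < max(MIN_VALIDATION_RATE, host_error_rate)

# A host with a 2% error rate still gets ~10% of its non-improving results
# checked; a host with a 35% error rate gets ~35% of them checked.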


How does this deal with CPCW (Cherry-Picking Credit Whoring)? I just got two very different credit per hour rates. The const_v2 searches are considerably faster than const_v3, but yield the same credit. Are you going to be holding a second set of statistics for people who abort longer-running tasks?
8) Message boards : Number crunching : de_14_3s_free_5... shows no progress on 0.20 CPU client (Message 37760)
Posted 26 Mar 2010 by Brian Silvers
Post:

Btw., I will try to compile CPU versions with a fixed progress bar during the weekend.


Any progress on this?

Thanks...

Sorry, was busy with other stuff. What about next weekend?

The fix is really simple, I just need a free one hour time slot to compile the stuff for the five targets currently supported, do a short test on each, and to adapt the app_info.xml files.


"Next weekend" as in 2 days from now, or as in 9 days from now?

Without the progress percentage, those of us with CPU apps are somewhat "flying blind", hoping that we do not have a bad task. So far everything has been ok, but will it continue that way?


The ps_13_3s tasks count fine for me.


I'll check that out then... I've been doing Cosmo on this system for the past few days... The other computer is running in service mode and I don't feel like looking at it right now... :/

Edit: Yeah, ps_13_3s_const_v3_5087818_1269576799_0 is working ok...

Still, if the project sends out whatever type of task caused it not to work properly, that case needs to be fixed, unless those types of tasks were not supposed to come from the project. :shrug: All I know is I pulled my AMD off and switched to Cosmo while I waited, and semi-monitored my Intel...
9) Message boards : Number crunching : de_14_3s_free_5... shows no progress on 0.20 CPU client (Message 37754)
Posted 25 Mar 2010 by Brian Silvers
Post:

Btw., I will try to compile CPU versions with a fixed progress bar during the weekend.


Any progress on this?

Thanks...

Sorry, was busy with other stuff. What about next weekend?

The fix is really simple, I just need a free one hour time slot to compile the stuff for the five targets currently supported, do a short test on each, and to adapt the app_info.xml files.


"Next weekend" as in 2 days from now, or as in 9 days from now?

Without the progress percentage, those of us with CPU apps are somewhat "flying blind", hoping that we do not have a bad task. So far everything has been ok, but will it continue that way?
10) Message boards : Number crunching : de_14_3s_free_5... shows no progress on 0.20 CPU client (Message 37731)
Posted 24 Mar 2010 by Brian Silvers
Post:

Btw., I will try to compile CPU versions with a fixed progress bar during the weekend.


Any progress on this?

Thanks...
11) Message boards : Number crunching : de_14_3s_free_5... shows no progress on 0.20 CPU client (Message 37377)
Posted 15 Mar 2010 by Brian Silvers
Post:
OK, so are these new tasks good, bad, or hit-or-miss?

Also, this inquiry is in regard to the *CPU* app only. I do not have a GPU-capable system. I came looking at this thread because of, well, the subject title... :-)
12) Message boards : Number crunching : Now that we have native ATI GPU support, how about longer tasks? (Message 35760)
Posted 17 Jan 2010 by Brian Silvers
Post:

In my opinion, you should consider looking at whether or not there is a way to use Homogeneous Redundancy classes to separate GPU from CPU and give GPUs the longer tasks and leave the shorter tasks to CPUs.


HR isn't a factor here. With a quorum of 1, you don't use multiple replications for validation.


Perhaps I'm getting ahead of the curve with trying to segregate tasks, regardless of quorum. Not sure if there's already a way to do that, but the whole point is that GPU users need to be placed in a different classification category than CPU users. You folks can exclusively have the 3-stream (longer-running) tasks, and leave CPU users with the 1-stream, 2-stream, or other shorter-running tasks.

Perhaps I am phrasing the BOINC equivalent wrong, and there is something there already, but if the planned "2 to 4 times increase" in runtime happens again, then that will undo the increase in deadline and will cause people with CPUs to start howling again...

I'm advocating making everyone happier, not just a few. Same as I've been doing all along... I think if something like what I'm suggesting is done, it will improve total project throughput and maybe, just maybe, allow you all to have a larger cache. Might not, but it is certainly worth a try if there is a way to do that already or if it is a minimal change.
13) Message boards : Number crunching : Now that we have native ATI GPU support, how about longer tasks? (Message 35721)
Posted 17 Jan 2010 by Brian Silvers
Post:
Thanks for implementing native ATI support!

And now that we have it, how about issuing tasks exclusively for ATI GPUs that run (say) for an hour (on a 4870)?

No change in credits/hour. That way we can fill up a normal queue of work lasting (say) a day or two.

In addition to letting us weather downtime or network issues, it would DRAMATICALLY drop the load on the project server and the network.


I think that might really take some new science from the astronomers. We do have a change in the works that should increase the compute time by (hopefully) another factor of 2 - 4. Once we get the server side GPU issues settled, we'll be releasing that.


In my opinion, you should consider looking at whether or not there is a way to use Homogeneous Redundancy classes to separate GPU from CPU and give GPUs the longer tasks and leave the shorter tasks to CPUs.
14) Message boards : Number crunching : Testing ATI Application Availability (Message 35241)
Posted 9 Jan 2010 by Brian Silvers
Post:
Except that we should all by now be using the wonderful new automagically-downloaded-from-the-server apps. :)


Said by Anthony Waters only around 17-18 hours ago:

I can confirm that Catalyst 9.12 works; I was able to complete several work units in a row. Thanks to The Gas Giant for bringing this to my attention. Currently only the Windows 64-bit version will be available, and I think it relies on Catalyst 9.7+ for Windows 7/Vista and 9.12 for Windows XP users. If all goes well, the Windows 32-bit version will also be deployed on BOINC.
15) Message boards : Number crunching : Milky Way, Project unfriendly..... (Message 35186)
Posted 8 Jan 2010 by Brian Silvers
Post:
I'd be happy with an hour of work cached for MW so that when a project maintenance back-off occurs I can keep crunching or lose very little time. But my hour is different to your hour or his hour, as we have different setups. For example, I have a 4850 and a 4870 in my Q9450 box. The longer WUs take approx 3.5 min and 3 min respectively. So that would mean I'd need to have 38 WUs cached; currently I get 24, which is about 39 minutes. With a few shorties of 55 seconds, that drops down to less than 30 minutes.

In my old P4/HT with the 3850 in it, the long WUs take 10 minutes, so I get closer to 2 hrs of cache, or less with a few shorties.

Imagine a box with 3 or 4 5870's in it....holy cow!


The previously mentioned 5 minute -> 20 minute cache thing was on a 4850 or a 4870, can't remember which. At any rate, the same logistical problem is in place - these tasks were originally designed to be processed by CPUs, not GPUs. Giving you all 100% of the 3-stream tasks is just a band-aid. It will not address the root cause, which is that the tasks just aren't complex enough. Allegedly the MW_GPU project was going to provide tasks perhaps 100 times the complexity. For whatever reason, that idea was tossed out. It needs to be brought to the front burner again...
16) Message boards : Number crunching : Milky Way, Project unfriendly..... (Message 35178)
Posted 8 Jan 2010 by Brian Silvers
Post:
Why not limit the short WUs to CPU clients? That will at least help a little. Allowing GPU clients to cache more WUs would also solve the problem.

I would think / hope that the server would be able to differentiate between a GPU and a CPU, so once all is in place, in theory the 3-stream tasks could go to those of you with GPUs and perhaps increase your caches as well, up to perhaps double what they are now. After that, the 1-stream and 2-stream tasks can go to CPU participants, again with double the cache (from 6 to 12).

That would certainly help the situation, and I agree, with the exception that the cache size should be more than doubled for GPUs. On the fastest GPUs the long WUs take 80 seconds, so even with only one of those GPUs on a dual-core processor the cache amounts to 16 minutes. That's just not enough to allow BOINC to do any kind of management. The end result is that many get frustrated and dump the project altogether. It's only 1 in 100 dissatisfied users that post here or even understand why it's not working correctly. We need to get the queue size up to a minimum of several hours in order for the BOINC manager to schedule properly.



Same problem exists as before. Runtimes were increased by a factor of 4. To handle you all with GPUs effectively without totally crushing the server, runtime needs to be increased by another factor of 10, perhaps 20, particularly if you're wanting a "minimum of several hours". Allegedly 6 tasks was about 5 minutes back then, so 6 tasks are only 20 minutes now. To get to 3 hours, that means 9 times... or 54 tasks, but again, it's only 3 hours. I have the feeling most of you won't be happy unless it goes up to 8 hours, so that means an increase in cache of 24 times, or 144 tasks. Bumping your caches up by factors of that much will only cause problems. I do not know if the current type of tasks has that much room for expansion in their scientific value. That's why, this whole time, I've said that the real long-term fix is a separate GPU project or a separate type of work.
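
For anyone who wants to plug in their own numbers, here is a quick sketch of the cache arithmetic above; the per-task time is this post's estimate (6 tasks being roughly 20 minutes on a fast GPU), not a project figure:

# Cache-size arithmetic from the post: 6 tasks take about 20 minutes,
# i.e. roughly 3.33 minutes per task on a fast GPU (estimate, not a measurement).
MINUTES_PER_TASK = 20 / 6

def tasks_needed(target_hours: float, minutes_per_task: float = MINUTES_PER_TASK) -> int:
    """How many cached tasks cover target_hours of crunching."""
    return round(target_hours * 60 / minutes_per_task)

print(tasks_needed(3))   # 54 tasks for a 3-hour buffer
print(tasks_needed(8))   # 144 tasks for an 8-hour buffer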
17) Message boards : Number crunching : Milky Way, Project unfriendly..... (Message 35153)
Posted 7 Jan 2010 by Brian Silvers
Post:
Why not limit the short WUs to CPU clients? That will at least help a little. Allowing GPU clients to cache more WUs would also solve the problem.


That's what I had been trying to say for a very long time, but I got a bunch of static from you folks with GPUs. What I was proposing was longer tasks for GPUs only, either by segregating the tasks here in this project or with a separate project. Unfortunately, the project's decision to lengthen the runtime for all users helped the situation some, but at the expense of those on the lower end of the spectrum (see the rise in complaints about the short deadlines).

I would think / hope that the server would be able to differentiate between a GPU and a CPU, so once all is in place, in theory the 3-stream tasks could go to those of you with GPUs and perhaps increase your caches as well, up to perhaps double what they are now. After that, the 1-stream and 2-stream tasks can go to CPU participants, again with double the cache (from 6 to 12).

18) Message boards : Number crunching : Deadline problem (Message 34999)
Posted 1 Jan 2010 by Brian Silvers
Post:
Well, I aborted everything and have it set not to get new WUs from Milky Way. If they want it set up so that only rich people, who have advanced computers and/or can afford an extra computer just to run BOINC, can participate, then the hell with them for being so elitist. I'll keep my spare computing power for Cosmology, Climate Prediction, World Community Grid and the LHC project (if LHC ever gives any WUs), or maybe join a new one. If they decide to let normal people participate again, maybe I'll come back.


Don't know what your problem is, except wanting to complain. My P4 is only 2.66 GHz, so it is older than yours. I have no problem doing these tasks. Heck, at times early on they were over 10 hours.


These complaints arise any time a project has a short/tight deadline and the person does not have the computer on enough and/or is running several other projects.

On the one hand, we had the people with fast GPUs howling about things. On the other, we now have people on the other end of the spectrum howling about things. Unless the 1-stream and 2-stream work is of no use to the project anymore, they could address some of the complaining by sending the 3-stream tasks to GPU users and the 1-stream and 2-stream tasks to CPU users. I would think that if the server code is updated for native GPU support, that sort of "homogeneous redundancy"-ish setup should be doable...
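
Purely to illustrate the proposal (hypothetical pseudocode, not how the BOINC scheduler is actually implemented), the split could be as simple as keying off the stream type embedded in the task name:

def eligible_stream_types(host_has_gpu: bool) -> set:
    """Which task families a host would be offered under the proposed split."""
    if host_has_gpu:
        return {"3s"}           # longer 3-stream tasks go to GPU hosts
    return {"1s", "2s"}         # shorter 1- and 2-stream searches go to CPU hosts

def can_send(task_name: str, host_has_gpu: bool) -> bool:
    """Task names embed their stream type, e.g. 'de_s11_3s_free_6_...'."""
    return any("_" + t + "_" in task_name for t in eligible_stream_types(host_has_gpu))

print(can_send("de_s11_3s_free_6_1544383_1270347650", host_has_gpu=False))  # False
print(can_send("de_s11_3s_free_6_1544383_1270347650", host_has_gpu=True))   # True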
19) Message boards : Number crunching : Deadline problem (Message 34745)
Posted 21 Dec 2009 by Brian Silvers
Post:
Another small thing that could be done: if the 5-10% performance improvement from the recently changed CUDA code makes any difference at all to CPU processing times, a new stock and a 3rd-party optimized application could be made for CPU processing.

Those changes are only applicable to the GPU versions. It should be the last two in the set of optimizations mentioned in the PPAM publication linked on the front page. They are already in the ATI version starting with version 0.20.


OK... I didn't think there was going to be any benefit for CPU processing, but didn't know for sure...

At any rate, GPUs need longer-running tasks, but CPUs do not. It would be best if they could distribute the longer tasks to GPUs and restart the shorter-running searches that the scientists were told not to run, sending those to CPU users. If not, then the project will probably need to do some PSAs and/or find other ways to educate users on why the deadlines are what they are and why they really cannot be changed.
20) Message boards : Number crunching : Deadline problem (Message 34735)
Posted 20 Dec 2009 by Brian Silvers
Post:
I was and am a strong advocate of making the tasks longer, but primarily for GPU users. As best as I understand things, the scientists working on this project had to intentionally not run tasks of the 1-stream or 2-stream variety: they would not run long enough, so the systems out here pounded on the server, causing many problems such as slow website performance and, ironically, work outages.

As I said somewhere over the past few weeks, I knew that when task runtimes were increased for everyone, this kind of complaint would crop up. The way to deal with this, and to please more people, is to find a way to send these longer-running 3-stream tasks (tasks that have "3s" in their name) to GPU users, and then let the scientists generate 1- and 2-stream tasks again and direct those to CPU users.

Another small thing that could be done: if the 5-10% performance improvement from the recently changed CUDA code makes any difference at all to CPU processing times, a new stock and a 3rd-party optimized application could be made for CPU processing.

The best thing though is to try to segregate the tasks...


