Scheduled Maintenance Concluded
Author | Message |
---|---|
Joined: 28 Nov 14 Posts: 51 Credit: 86,696,721 RAC: 0 |
Hi, I just found the drawback of the new bundles of 5 :-/ If the app processes all of the bundle BUT fails on the last of the 5, with a computational error, ALL 5 are lost, not just the one that actually failed, since the entire bundle is labelled as a computational error. But it's survivable :-) I'd rather lose the odd bundle that way than not get ANY work at all. Regards, Cliff. -- Been there Done That, still no Damn T-Shirt |
Joined: 22 Jan 08 Posts: 29 Credit: 242,730,423 RAC: 0 |
if the app processes all of the bundle BUT fails on the last of the 5, with Well, if that is correct, then Jake has to go back to the bench and improve the server logic with respect to the validation code. Michael. President of Rechenkraft.net e.V. - This planet's first and largest distributed computing organization. |
Joined: 30 Apr 14 Posts: 67 Credit: 160,674,488 RAC: 0 |
if the app processes all of the bundle BUT fails on the last of the 5, with Actually that was logical. It wouldn't be easy to do task validation per chunk. There is such a validation method at ClimatePrediction.net, but I believe it's rather custom, and only because each part takes 2 days or so :) |
Joined: 30 Apr 14 Posts: 67 Credit: 160,674,488 RAC: 0 |
if the app processes all of the bundle BUT fails on the last of the 5, with Actually that was logical. It wouldn't be easy to do task validation per chunk. There is such a validation method at ClimatePrediction.net, but I believe it's rather custom, and only because each part takes 2 days or so :) IMHO, fixing the hosts that generate only invalid results will go a long way towards decreasing invalid results as a whole. |
Joined: 13 Oct 16 Posts: 112 Credit: 1,174,293,644 RAC: 0 |
if the app processes all of the bundle BUT fails on the last of the 5, with +1000 for those damn Hosts to get blocked! |
Joined: 2 Oct 16 Posts: 167 Credit: 1,008,060,949 RAC: 9,927 |
I run a few projects (Einstein, SETI, Milkyway, and soon Universe if they get GPU working). I've just been going by the floating point performance on GPUboss.com. This is why I single out my 280x for MW and I only want to run MW on it. The 7970 is the best single GPU besides Titans. There are better cards; the 7990 is 2x 7970 GPUs on a single card. http://www.geeks3d.com/20140305/amd-radeon-and-nvidia-geforce-fp32-fp64-gflops-table-computing/ The page is old, but no newer consumer card comes close, as NV and AMD have been increasing the ratio between FP32 and FP64. They are forcing those who want top FP64 performance to pay for it with workstation cards. Strength for strength, MW gets my 280x and other projects get the better SP cards. Universe, once released, will be DP as well, so cards that perform well on MW will be good there too. |
Joined: 5 Jul 11 Posts: 990 Credit: 376,143,149 RAC: 0 |
I run a few projects (Einstein, SETI, Milkyway, and soon Universe if they get GPU working). I've just been going by the floating point performance on GPUboss.com. Well, I'm not sure what to get now. I want good games performance (which is mostly SP?), and I want it to be good at SETI and Einstein too, which are SP (why are they SP when the smaller projects are DP?). Compared to my 290, the Fury X would be twice as fast at games and half my projects, and the same at the other half of the projects, while using the same electricity. I don't want to allocate certain cards to certain projects; I want to use everything for whatever project I want done at the moment. |
Joined: 4 Nov 12 Posts: 96 Credit: 251,528,484 RAC: 0 |
I updated my Nvidia drivers to the newest version and gave my PC a reboot for some maintenance. I often have it up for weeks or even sometimes months at a time. When resetting and restarting the project, it is kind of strange that so many WUs would all error out at the same time as each other, but once the work loads disperse and even out, errors become very rare. The top several errors in the picture below are all from the very first run after restarting the project. https://i.imgur.com/rKzVjcN.png |
Joined: 28 Nov 14 Posts: 51 Credit: 86,696,721 RAC: 0 |
Hi Arivald,
Having a bundle end up as a computational error isn't that important; it will be resent to others, so anyone winging it will still get a valid result [if it completes ok and IS valid]. It's only the person[s] who generate an error that don't get credited for it. Regards, Cliff. -- Been there Done That, still no Damn T-Shirt |
Joined: 30 Apr 14 Posts: 67 Credit: 160,674,488 RAC: 0 |
Hi Arivald, That's not really true. ALL the people who took part in such a bundle DON'T get credits. If not, then please explain why I have "Can't validate" WUs? I have 10 invalids per day, NOT due to problems on my end. The same problem applies to every R280X. That work is wasted, I don't get any credits, so in essence we have wasted electricity = money. A mechanism to prevent hosts from trashing a project IS part of BOINC, so the BOINC creators knew that this is a real problem. It's just not working in MW@H. When I see a host that still has "Tasks per day" at 10k (I process 2-5-3k valid tasks per day) and creates 20k invalids per day, I know something is wrong. "Tasks per day" should start at 100 at this moment (that still makes 500 old WU), and grow linearly as more valid results are returned. Right now it is NOT lower than 10k. And some hosts still get WUs even if they go above this limit. Such a host: - makes a (D)DoS attack on the DB server - constantly requests new work; no work really gets done - trashes 6-7 times more tasks than a Top20 host like mine. Assuming that there are 2 like that (and there are when we sum them up), he and his buddies can invalidate the work of 3 R280X. That's like 1m credits per day wasted (and up to 9k WU per day). Would you really like to have a single host with 3 R280X (that takes almost 1kW from the wall) have all invalid results because some other host wants that? I DON'T! This needs to be fixed! |
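The quota behaviour asked for in the post above could look roughly like the sketch below. It is only an illustration: `DailyQuota` is a hypothetical class, not BOINC's actual scheduler code, and the growth step of 1 per valid result is an assumption (the post only says "grow linearly").

```python
class DailyQuota:
    """Hypothetical per-host "Tasks per day" limit (not BOINC's real code)."""

    def __init__(self, start: int = 100, step: int = 1) -> None:
        self.limit = start   # starting limit of 100, as suggested in the post
        self.step = step     # assumed growth per valid result

    def report(self, valid: bool) -> int:
        """Record one returned result and return the updated daily limit."""
        if valid:
            self.limit += self.step  # linear growth with valid results
        # Invalid results never raise the limit.
        return self.limit


# A host returning 3000 valid results ends up with a limit of 3100.
good_host = DailyQuota()
for _ in range(3000):
    good_host.report(valid=True)

# A host returning only invalid results stays at the starting limit.
bad_host = DailyQuota()
for _ in range(100):
    bad_host.report(valid=False)

print(good_host.limit, bad_host.limit)
```

Under these assumptions, a host that only produces invalid results would be held to 100 tasks per day rather than 10k.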
Joined: 28 Nov 14 Posts: 51 Credit: 86,696,721 RAC: 0 |
Hi Arivald, Well, [1] I don't use AMD GPUs, mine are NVidia. [2] As to BOINC's anti-trashing code, I wasn't aware it even existed. [3] I've not checked on who trashes what on MW@H; I agree that if someone repeatedly trashes WUs they shouldn't get any until they stop doing so. [4] Unfortunately projects using BOINC still persist in allowing anon users who cannot be contacted. I believe there is NO valid reason to allow anon crunchers; it's a license for abuse, of both the project and those who are contactable and do crunch with more valid than otherwise tasks. [5] There are always going to be tasks that fail to validate for some reason, computer malfunction, driver problems, power failure or whatever, but if you can't even contact such a user, there is no way of stopping them carrying on trashing tasks, other than possibly the facility you mention, if it can be implemented server side. However, given the recent problems with MW@H servers, and the number of folks complaining about lack of WUs and server errors, I don't know if implementing that facility might not be as great a problem as the trashers :-/ Regards, Cliff. -- Been there Done That, still no Damn T-Shirt |
Joined: 5 Jul 11 Posts: 990 Credit: 376,143,149 RAC: 0 |
if the app processes all of the bundle BUT fails on the last of the 5, with Are these hosts behind a proxy? I used to work in a school and set up loads of machines to run BOINC. I was emailed by a project admin (can't remember which one) saying the proxy was making a mess of the returns. |
Joined: 30 Apr 14 Posts: 67 Credit: 160,674,488 RAC: 0 |
AMD R280X is just the most powerful for this project. It doesn't really matter which GPU you have; a WU is distributed to ALL applications. As for 4) It's once again not true. You log in with an e-mail, and it's possible to write a message to the user. I have sent messages to most of the users I pointed out. It's not that they can't be contacted. They don't care, or they don't know that there is a problem. Some people want to use their AMD GPUs, but aren't aware that they do not have DP capability. I think that it's malice in very few instances. BOINC and projects are supposed to work "out of the box". It seems that projects that require DP do not (since probably "Use AMD GPU" is enabled by default?) 5) It's implemented. But there seems to be a bug that allows sending tasks to certain hosts. I saw that for some hosts the mechanism DOES work OK. I'm not saying that this is THE MOST IMPORTANT PROBLEM. Previous problems were definitely more critical, but if we increase the bundle size to 100, my PC will do 20 times fewer WUs, but it will still lose 10 daily from those problems. Right now I do 3k WU per day. I'll do 150 WU per day if the bundle size is increased to 100. Almost 10% of my work will be wasted. But sure, let's take another user who only has a CPU. Currently he's doing like... 100 WU? If the bundle size goes up to 100, he'll be doing 4! What if 1 of those fails due to such hosts - 25% of his work will be wasted! I can live with the current <1% "can't validate" WUs, somehow. I can accept that some work will be wasted. But I don't want it to be wasted. And people will be very angry (and will rage quit) if 25% of their host's work is "cannot validate". Trust me. And this issue will only get bigger as the bundle size (or overall WU duration) increases. |
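The arithmetic in the post above can be checked with a short sketch. The function names are made up for illustration, and it assumes (as the post does) a current bundle size of 5 and a fixed number of tasks lost per day regardless of bundle size.

```python
def scaled_tasks_per_day(tasks_per_day: float, old_bundle: int, new_bundle: int) -> float:
    """Tasks per day after changing bundle size, for the same amount of work."""
    return tasks_per_day * old_bundle / new_bundle


def lost_fraction(lost_per_day: float, tasks_per_day: float) -> float:
    """Share of a host's daily tasks that end up as "can't validate"."""
    return lost_per_day / tasks_per_day


# GPU host from the post: 3k WU/day now, 10 lost per day.
gpu_now = 3000
gpu_after = scaled_tasks_per_day(gpu_now, old_bundle=5, new_bundle=100)

print(f"now:   {lost_fraction(10, gpu_now):.2%} lost")
print(f"after: {gpu_after:.0f} WU/day, {lost_fraction(10, gpu_after):.2%} lost")

# CPU host from the post: about 4 WU/day after the change, 1 lost.
print(f"cpu:   {lost_fraction(1, 4):.0%} lost")
```

With these inputs the GPU host drops to 150 WU per day and loses roughly 6.7% of them, the same order as the "almost 10%" in the post, while a CPU host losing 1 task out of 4 loses 25%.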
Joined: 5 Jul 11 Posts: 990 Credit: 376,143,149 RAC: 0 |
It's not that they can't be contacted. They don't care, or they don't know that there is a problem. Some people want to use their AMD GPUs, but aren't aware that they do not have DP capability. That part caught my eye. I have an older Nvidia card, and SETI keeps sending a message through BOINC saying "upgrade your driver" to run SETI on your graphics card, but there isn't a newer driver, and it runs tasks anyway. Presumably this lacks DP? Funny thing is, they're showing as completed and validated. |
Joined: 30 Apr 14 Posts: 67 Credit: 160,674,488 RAC: 0 |
It's not that they can't be contacted. They don't care, or they don't know that there is a problem. Some people want to use their AMD GPUs, but aren't aware that they do not have DP capability. Well... there are newer drivers. Probably not for your GPU, but there are. The newest are 375.95; you have "NVIDIA GeForce 9500 GS (512MB) driver: 341.95". BOINC isn't that "smart". It can't predict everything. Your GPU does have DP capability. If it didn't, you'd only get "computation errors". |
Joined: 14 Nov 14 Posts: 9 Credit: 214,644,261 RAC: 0 |
I was just thinking that maybe, just maybe that some of these posts would be better suited in another thread. Something like the crunching area. Would help others if they have a problem with a GPU card or want to talk about performance of their system. To me it isn't news about the new bundling of work units or new updates being released. Then others can read the title of the thread and see what people are talking about instead of wading through every thread to find something. Thanks, Rich |
Joined: 4 Nov 12 Posts: 96 Credit: 251,528,484 RAC: 0 |
I was just thinking that maybe, just maybe that some of these posts would be better suited in another thread. Something like the crunching area. Would help others if they have a problem with a GPU card or want to talk about performance of their system. To me it isn't news about the new bundling of work units or new updates being released. Then others can read the title of the thread and see what people are talking about instead of wading through every thread to find something. With the way the new WUs are bundled, it may have some unexpected impacts on performance and reliability. Likewise with being bundled, if a task fails, then the failure is more significant as it takes up to 4 additional tasks with it, depending on the bundled task's failure point. I get computational errors almost exclusively on the new WUs when I first start crunching for them, running 12 at the same time, 6 on each card, then almost none at all once the tasks have dispersed their start and stop times. It's a little disconcerting. Nvidia cards that are DP optimized (such as my Titan Black cards) have to crunch in parallel like this if they're to be significantly loaded and utilized. With the issues my setup seems to be having, it seems like the new WUs favor AMD cards a bit more, since they don't need to crunch as many tasks in parallel to load up their GPUs. But yeah, in general, I suppose people could make posts in other threads in other sections of the forum as well, if they felt like it. On the plus side, at least this provides feedback and related discussion in one easily accessible location. If you only want to follow Jake's posts, try here. → http://milkyway.cs.rpi.edu/milkyway/forum_user_posts.php?userid=792007 Cheers. |
Joined: 5 Jul 11 Posts: 990 Credit: 376,143,149 RAC: 0 |
There aren't newer drivers for my card, the newer ones don't work with my ancient card. So the message is just a generic one to make sure I'm using the latest and best drivers? MW doesn't run on it, so I assumed DP was missing from the card/driver. Oh well, if the results aren't erroring, it must be doing something useful. |
Joined: 6 Oct 14 Posts: 46 Credit: 20,017,425 RAC: 0 |
Hi all, I noticed something interesting. Today I got new AIO water cooling for my old 2500K and overclocked it from the default BIOS 3.2 to 4.2 GHz. When I look at my Milkyway tasks, the percentage of "Completed, validation inconclusive" has become higher than before. My R9 270x is the same: not overclocked and on the same driver. Any thoughts about this? |
Joined: 5 Jul 11 Posts: 990 Credit: 376,143,149 RAC: 0 |
Hi all, Odd, as I thought if a processor was too overclocked it crashed, as the OS is running on it. I know that GPUs can create errors without crashing if slightly overclocked (you get artefacts on the display). |
©2024 Astroinformatics Group