Welcome to MilkyWay@home

Scheduled Maintenance Concluded

Message boards : News : Scheduled Maintenance Concluded

Cliff
Joined: 28 Nov 14
Posts: 51
Credit: 86,696,721
RAC: 0
Message 65896 - Posted: 18 Nov 2016, 2:49:56 UTC

Hi,
Just found the drawback of the new bundles of 5 :-/
If the app processes the whole bundle BUT fails on the last of the 5 with a computational error, ALL 5 are lost, not just the one that actually failed,
since the entire bundle is labelled as a computational error.

But it's survivable :-) I'd rather lose the odd bundle that way than not get ANY work at all.
Regards,
Cliff.
--
Been there Done That, still no Damn T-Shirt

Michael H.W. Weber
Joined: 22 Jan 08
Posts: 29
Credit: 242,730,423
RAC: 0
Message 65898 - Posted: 18 Nov 2016, 8:17:25 UTC - in response to Message 65896.  

Well, if that is correct, then Jake has to go back to the bench and improve the server logic with respect to the validation code.

Michael.
President of Rechenkraft.net e.V. - This planet's first and largest distributed computing organization.

Arivald Ha'gel
Joined: 30 Apr 14
Posts: 67
Credit: 160,674,488
RAC: 0
Message 65899 - Posted: 18 Nov 2016, 8:36:59 UTC - in response to Message 65898.  

Actually, that was logical. It wouldn't be easy to make task validation per chunk. There is such a validation method in ClimatePrediction.net, but I believe it's rather custom, and it's used only because each part takes 2 days or so :)
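To make the all-or-nothing behaviour concrete, here is a minimal Python sketch, assuming a bundle of 5 sub-tasks is reported and validated as ONE result. The function names and the credit figure of 10 per chunk are hypothetical illustrations, not MilkyWay@home's actual validator; `per_chunk_credit` stands in for the kind of per-chunk scheme mentioned for ClimatePrediction.net.

```python
from typing import Sequence

# Hypothetical model: each entry is True if that sub-task of the bundle
# finished OK and False if it ended in a computational error.

def all_or_nothing_credit(chunk_ok: Sequence[bool], credit_per_chunk: float) -> float:
    """Whole-bundle validation: one failed chunk marks the entire result as an error."""
    return credit_per_chunk * len(chunk_ok) if all(chunk_ok) else 0.0

def per_chunk_credit(chunk_ok: Sequence[bool], credit_per_chunk: float) -> float:
    """Per-chunk validation: every chunk that finished OK still earns its share."""
    return credit_per_chunk * sum(chunk_ok)

# Cliff's case: the first 4 chunks succeed, the 5th fails.
bundle = [True, True, True, True, False]
print(all_or_nothing_credit(bundle, 10.0))  # 0.0
print(per_chunk_credit(bundle, 10.0))       # 40.0
```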

Arivald Ha'gel
Joined: 30 Apr 14
Posts: 67
Credit: 160,674,488
RAC: 0
Message 65900 - Posted: 18 Nov 2016, 8:37:51 UTC - in response to Message 65899.  

IMHO, fixing the issues with hosts that generate only invalid results will go a long way towards decreasing invalid results as a whole.

bluestang
Joined: 13 Oct 16
Posts: 112
Credit: 1,174,293,644
RAC: 0
Message 65905 - Posted: 18 Nov 2016, 14:51:10 UTC - in response to Message 65900.  

+1000 for those damn Hosts to get blocked!

mmonnin
Joined: 2 Oct 16
Posts: 167
Credit: 1,008,060,949
RAC: 9,927
Message 65906 - Posted: 18 Nov 2016, 17:58:56 UTC - in response to Message 65894.  

I run a few projects (Einstein, SETI, Milkyway, and soon Universe if they get GPU working). I've just been going by the floating point performance on GPUboss.com.

Oh and it's for playing games too.


This is why I single out my 280x for MW and only want to run MW on it. The 7970 is the best single GPU for DP besides the Titans. There are better cards: the 7990 is two 7970 GPUs on a single card.

http://www.geeks3d.com/20140305/amd-radeon-and-nvidia-geforce-fp32-fp64-gflops-table-computing/

The page is old, but no newer consumer card comes close, as NV and AMD have been increasing the ratio between FP32 and FP64 performance. They are forcing those who want top FP64 performance to pay for it with workstation cards.

Strength for strength: MW gets my 280x and other projects get the better SP cards. Universe, once released, will be DP as well, so cards that perform well on MW will be good there too.

Mr P Hucker
Joined: 5 Jul 11
Posts: 990
Credit: 376,143,149
RAC: 0
Message 65908 - Posted: 18 Nov 2016, 18:36:37 UTC - in response to Message 65906.  

Well, I'm not sure what to get now. I want good gaming performance (which is mostly SP?), and I want it to be good at SETI and Einstein too, which are SP. (Why are they SP when the smaller projects are DP?)

Compared to my 290, the Fury X would be twice as fast at games and at half my projects, and the same at the other half, while using the same electricity.

I don't want to allocate certain cards to certain projects, I want to use everything for whatever project I want done at the moment.
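The "faster at SP projects, the same at DP projects" comparison comes down to peak FP32 versus FP64 throughput. Below is a rough Python sketch, assuming the usual theoretical-peak formula (2 FLOPs per shader per clock for a fused multiply-add) and approximate reference specs that should be checked against vendor data: R9 290 with 2560 shaders at 947 MHz and FP64 at 1/8 of FP32, Fury X with 4096 shaders at 1050 MHz and FP64 at 1/16. Real task throughput also depends on memory bandwidth and the application, so treat these as upper bounds.

```python
# Theoretical peak throughput: 2 FLOPs (one fused multiply-add) per shader per clock.
def peak_fp32_gflops(shaders: int, clock_mhz: float) -> float:
    return 2 * shaders * clock_mhz / 1000.0

def peak_fp64_gflops(shaders: int, clock_mhz: float, fp64_ratio: float) -> float:
    # fp64_ratio is the FP64:FP32 rate, e.g. 1/8 means DP runs at 1/8 of the SP rate.
    return peak_fp32_gflops(shaders, clock_mhz) * fp64_ratio

# Assumed approximate reference specs: (shaders, boost clock in MHz, FP64:FP32 ratio).
CARDS = {
    "R9 290": (2560, 947, 1 / 8),
    "R9 Fury X": (4096, 1050, 1 / 16),
}

for name, (shaders, clock, ratio) in CARDS.items():
    sp = peak_fp32_gflops(shaders, clock)
    dp = peak_fp64_gflops(shaders, clock, ratio)
    print(f"{name}: {sp:.0f} GFLOPS FP32, {dp:.0f} GFLOPS FP64")
```

Under these assumptions the Fury X comes out at roughly 1.8x the FP32 peak of the 290 but only about 0.9x its FP64 peak.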

Wrend
Joined: 4 Nov 12
Posts: 96
Credit: 251,528,484
RAC: 0
Message 65910 - Posted: 18 Nov 2016, 22:09:03 UTC
Last modified: 18 Nov 2016, 22:23:50 UTC

I updated my Nvidia drivers to the newest version and gave my PC a reboot for some maintenance. I often have it up for weeks or even sometimes months at a time.

When resetting and restarting the project, it is kind of strange that so many WUs error out at the same time, but once the workloads disperse and even out, errors become very rare.

The top several errors in the picture below are all from the very first run after restarting the project.

https://i.imgur.com/rKzVjcN.png

Cliff
Joined: 28 Nov 14
Posts: 51
Credit: 86,696,721
RAC: 0
Message 65911 - Posted: 19 Nov 2016, 0:43:13 UTC - in response to Message 65900.  

Hi Arivald,


Having a bundle end up as a computational error isn't that important: it will be resent to others, so whoever is the wingman on it will still get a valid result [if it completes OK and IS valid].

It's only the person[s] who generate the error who don't get credited for it.
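How resends and "Can't validate" interact can be sketched with a simplified model of one workunit. This is a hypothetical Python illustration, not BOINC's actual server code: the parameter names `min_quorum` and `max_error_results` echo BOINC workunit settings, but the default values and the logic are assumptions, and every successful result is assumed to agree with the others. Errors trigger resends until either enough successes arrive or too many errors pile up, at which point the workunit is abandoned.

```python
from typing import Iterable

def run_workunit(outcomes: Iterable[str], min_quorum: int = 2,
                 max_error_results: int = 3) -> dict:
    """Hypothetical, simplified life of one workunit.

    `outcomes` lists what successive hosts return, each "success" or "error".
    Every error causes a resend to another host, until either `min_quorum`
    successes are in (validated, everyone who succeeded is credited) or
    `max_error_results` errors pile up (workunit abandoned).
    """
    successes = 0
    errors = 0
    for outcome in outcomes:
        if outcome == "success":
            successes += 1
        else:
            errors += 1
        if successes >= min_quorum:
            return {"state": "validated", "credited": successes, "cant_validate": 0}
        if errors >= max_error_results:
            # Good results already returned are never checked, so they earn nothing.
            return {"state": "abandoned", "credited": 0, "cant_validate": successes}
    # Not enough results yet: the workunit is still waiting for more hosts.
    return {"state": "pending", "credited": 0, "cant_validate": 0}

# One error, then two wingmen succeed: the error is simply replaced.
print(run_workunit(["error", "success", "success"]))
# One good result, then three errors from other hosts: the good result is wasted.
print(run_workunit(["success", "error", "error", "error"]))
```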
Regards,
Cliff.
--
Been there Done That, still no Damn T-Shirt

Arivald Ha'gel
Joined: 30 Apr 14
Posts: 67
Credit: 160,674,488
RAC: 0
Message 65912 - Posted: 19 Nov 2016, 9:30:20 UTC - in response to Message 65911.  
Last modified: 19 Nov 2016, 9:44:03 UTC

It's not really true that only the person[s] who generate the error miss out. ALL people who took part in such a bundle DON'T get credit. If that's not so, then please explain why I have "Can't validate" WUs. I get 10 invalids per day, NOT due to problems on my end. The same problem applies to every R280X. That work is wasted and I don't get any credit, so in essence we have wasted electricity, which means money.
A mechanism to prevent some hosts from trashing a project IS part of BOINC; it's just not working in MW@H. So the BOINC creators knew this was a real problem. When I see a host that still has "Tasks per day" at 10k (I process 2.5-3k valid tasks per day) and it creates 20k invalids per day, I know something is wrong. "Tasks per day" should start at 100 at this point (that still makes 500 old WUs) and grow linearly as more valid results are returned. Right now it is NOT lower than 10k. And some hosts still get WUs even when they go above this limit.
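The quota rule proposed here (start at 100 tasks per day and grow linearly with valid results) can be sketched as follows. This is a hypothetical Python illustration of the proposal, not BOINC's built-in mechanism; the increment of 1 per valid result and the 10k cap are assumptions chosen to match the figures in the post.

```python
BUNDLE_SIZE = 5  # old-style WUs per bundled task

def daily_quota(valid_results: int, start: int = 100, per_valid: int = 1,
                cap: int = 10_000) -> int:
    """Proposed rule: begin at `start` tasks/day and add `per_valid` for every
    valid result returned, never exceeding `cap`."""
    return min(cap, start + per_valid * valid_results)

def may_send_work(tasks_sent_today: int, valid_results: int) -> bool:
    """Hypothetical scheduler-side check: refuse work once today's quota is used up."""
    return tasks_sent_today < daily_quota(valid_results)

# A new or untrusted host: 100 bundles/day, i.e. 500 old-style WUs.
print(daily_quota(0), daily_quota(0) * BUNDLE_SIZE)  # 100 500
# A host that has returned no valid results cannot pull a 101st task today.
print(may_send_work(100, 0))  # False
```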

Such a host(s):
- makes a (D)DoS attack on the DB server: it constantly requests new work, and no work really gets done.
- trashes 6-7 times more tasks than a Top-20 host like mine. Assuming there are 2 like that (and there are, when we sum them up), he and his buddies can invalidate the work of 3 R280Xs. That's about 1M credits per day wasted (and up to 9k WUs per day).

Would you really like a single host with 3 R280Xs (drawing almost 1 kW from the wall) to have all of its results invalidated because of some other host?
I DON'T!

This needs to be fixed!

Cliff
Joined: 28 Nov 14
Posts: 51
Credit: 86,696,721
RAC: 0
Message 65913 - Posted: 19 Nov 2016, 11:14:27 UTC - in response to Message 65912.  

Hi Arivald,


Well,
[1] I don't use AMD GPUs; mine are NVidia.
[2] As to BOINC's anti-trashing code, I wasn't aware it even existed.
[3] I haven't checked who trashes what on MW@H, but I agree that if someone repeatedly trashes WUs they shouldn't get any until they stop doing so.

[4] Unfortunately, projects using BOINC still persist in allowing anonymous users who cannot be contacted. I believe there is NO valid reason to allow anon crunchers; it's a licence for abuse, both of the project and of those who are contactable and return more valid tasks than otherwise.

[5] There are always going to be tasks that fail to validate for some reason (computer malfunction, driver problems, power failure, whatever), but if you can't even contact such a user, there is no way of stopping them from carrying on trashing tasks, other than possibly the facility you mention, if it can be implemented server side.

However, given the recent problems with the MW@H servers, and the number of folks complaining about lack of WUs and server errors, I wonder whether implementing that facility might not be as great a problem as the trashers :-/
Regards,
Cliff.
--
Been there Done That, still no Damn T-Shirt

Mr P Hucker
Joined: 5 Jul 11
Posts: 990
Credit: 376,143,149
RAC: 0
Message 65915 - Posted: 19 Nov 2016, 12:32:17 UTC - in response to Message 65900.  

Are the hosts that return only invalid results behind a proxy? I used to work in a school and set up loads of machines to run BOINC. I was emailed by a project admin (can't remember which one) saying the proxy was making a mess of the returns.

Arivald Ha'gel
Joined: 30 Apr 14
Posts: 67
Credit: 160,674,488
RAC: 0
Message 65916 - Posted: 19 Nov 2016, 12:35:44 UTC - in response to Message 65913.  
Last modified: 19 Nov 2016, 12:36:48 UTC


The AMD R280X is just the most powerful card for this project. It doesn't really matter which GPU you have; WUs are distributed to ALL applications.

As for [4], once again that's not true. You log in with an e-mail address, and it's possible to send a message to the user. I have sent messages to most of the users I pointed out. It's not that they can't be contacted: they don't care, or they don't know that there is a problem. Some people want to use their AMD GPUs but aren't aware that they don't have DP capability. I think it's malice in very few instances. BOINC and the projects are supposed to work "out of the box". It seems that projects requiring DP don't (probably because "Use AMD GPU" is enabled by default?).
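One way a client or project could catch GPUs without double precision is to inspect what the device reports before sending DP work. A minimal Python sketch, assuming you already have the OpenCL extension string (as returned by `clGetDeviceInfo` with `CL_DEVICE_EXTENSIONS`) or the CUDA compute capability; whether MilkyWay@home's app accepts AMD's partial `cl_amd_fp64` extension is an assumption here.

```python
def opencl_supports_fp64(extensions: str) -> bool:
    """True if an OpenCL device's extension string advertises double precision.

    cl_khr_fp64 is the standard Khronos extension; cl_amd_fp64 is AMD's older,
    partial double-precision extension (counted here as an assumption).
    """
    exts = set(extensions.split())
    return "cl_khr_fp64" in exts or "cl_amd_fp64" in exts

def cuda_supports_fp64(major: int, minor: int) -> bool:
    """NVIDIA GPUs support double precision from CUDA compute capability 1.3 onward."""
    return (major, minor) >= (1, 3)

print(opencl_supports_fp64("cl_khr_global_int32_base_atomics cl_khr_fp64"))  # True
print(opencl_supports_fp64("cl_khr_global_int32_base_atomics"))              # False
print(cuda_supports_fp64(1, 1), cuda_supports_fp64(3, 5))                    # False True
```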

As for [5], it's implemented, but there seems to be a bug that still allows tasks to be sent to certain hosts. I saw that for some hosts the mechanism DOES work OK.

I'm not saying that this is THE MOST IMPORTANT PROBLEM. Previous problems were definitely more critical, but if we increase the bundle size to 100, my PC will do 20 times fewer WUs, yet it will still lose 10 daily to those problems. Right now I do 3k WUs per day; I'll do 150 WUs per day if the bundle size is increased to 100. Nearly 7% of my work (10 of 150 WUs) will be wasted.

But sure, let's take another user who only has a CPU. Currently he's doing, like... 100 WUs? If the bundle size goes up to 100, he'll be doing 4! If 1 of those fails due to such hosts, 25% of his work will be wasted!
I can live with the current <1% "can't validate" WUs, somehow. I can accept that some work will be wasted, though I don't want it to be. But people will be very angry (and will rage-quit) if 25% of their host's work is "can't validate". Trust me. And this issue will only get bigger as bundle size (or overall WU duration) increases.
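The bundle-size arithmetic in this post can be checked with a few lines of Python. This sketch takes the post's own figures as assumptions (3k WUs per day at a bundle size of 5, 10 lost per day regardless of bundle size, and 4 bundles per day for the CPU-only user) and simply divides losses by throughput.

```python
def wasted_fraction(lost_per_day: int, tasks_per_day: int) -> float:
    """Share of a host's daily tasks that end up as "can't validate"."""
    return lost_per_day / tasks_per_day

current = 3000                       # WUs/day at the current bundle size of 5
at_bundle_100 = current * 5 // 100   # same science in 20x larger bundles

print(at_bundle_100)                                # 150
print(f"{wasted_fraction(10, current):.1%}")        # 0.3%
print(f"{wasted_fraction(10, at_bundle_100):.1%}")  # 6.7%
print(f"{wasted_fraction(1, 4):.1%}")               # 25.0% (CPU-only user)
```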

Mr P Hucker
Joined: 5 Jul 11
Posts: 990
Credit: 376,143,149
RAC: 0
Message 65917 - Posted: 19 Nov 2016, 12:40:33 UTC - in response to Message 65916.  

The part about AMD GPUs lacking DP capability caught my eye. I have an older Nvidia card for which SETI keeps sending a message through BOINC saying "upgrade your driver" to run SETI on your graphics card, but there isn't a newer driver, and it runs tasks anyway. Presumably it lacks DP? Funny thing is, its tasks are showing as completed and validated.

Arivald Ha'gel
Joined: 30 Apr 14
Posts: 67
Credit: 160,674,488
RAC: 0
Message 65918 - Posted: 19 Nov 2016, 13:00:49 UTC - in response to Message 65917.  

Well... there are newer drivers. Probably not for your GPU, but there are. The newest is 375.95; you have "NVIDIA GeForce 9500 GS (512MB) driver: 341.95". BOINC isn't that "smart"; it can't predict everything.

Your GPU does have DP capability. If it didn't, you'd only get "computation errors".

Rich
Joined: 14 Nov 14
Posts: 9
Credit: 214,644,261
RAC: 0
Message 65919 - Posted: 19 Nov 2016, 13:20:15 UTC

I was just thinking that maybe, just maybe, some of these posts would be better suited to another thread, something like the Crunching area. That would help others who have a problem with a GPU card or want to talk about the performance of their system. To me, it isn't news about the new bundling of work units or new updates being released. Then others could read the thread title and see what people are talking about instead of wading through every thread to find something.

Thanks,

Rich

Wrend
Joined: 4 Nov 12
Posts: 96
Credit: 251,528,484
RAC: 0
Message 65920 - Posted: 19 Nov 2016, 14:02:46 UTC - in response to Message 65919.  
Last modified: 19 Nov 2016, 14:33:48 UTC

The way the new WUs are bundled may have some unexpected impacts on performance and reliability. Also, because tasks are bundled, a failure is more significant: it takes up to 4 additional tasks with it, depending on where in the bundle it occurs.

I get computational errors almost exclusively on the new WUs when I first start crunching them, running 12 at the same time, 6 on each card, and then almost none at all once the tasks' start and stop times have spread out. It's a little disconcerting.

Nvidia cards that are DP optimized (such as my Titan Black cards) have to crunch in parallel like this if they're to be significantly loaded and utilized.

Given the issues my setup seems to be having, it looks like the new WUs favor AMD cards a bit more, since they don't need to crunch as many tasks in parallel to load up their GPUs.

But yeah, in general, I suppose people could make posts in other threads in other sections of the forum as well, if they felt like it.

On the plus side, at least this provides feedback and related discussion in one easily accessible location.

If you only want to follow Jake's posts, try here. → http://milkyway.cs.rpi.edu/milkyway/forum_user_posts.php?userid=792007

Cheers.

Mr P Hucker
Joined: 5 Jul 11
Posts: 990
Credit: 376,143,149
RAC: 0
Message 65921 - Posted: 19 Nov 2016, 15:19:12 UTC - in response to Message 65918.  

There aren't newer drivers for my card; the newer ones don't work with my ancient card. So the message is just a generic one to make sure I'm using the latest and best drivers?

MW doesn't run on it, so I assumed DP was missing from the card/driver.

Oh well, if the results aren't erroring, it must be doing something useful.

Rymorea
Joined: 6 Oct 14
Posts: 46
Credit: 20,017,425
RAC: 0
Message 65922 - Posted: 19 Nov 2016, 17:53:37 UTC

Hi all,

I noticed something interesting. Today I got a new AIO water cooler for my old 2500K and overclocked it from the default BIOS 3.2 GHz to 4.2 GHz. Looking at my MilkyWay tasks, the percentage of "Completed, validation inconclusive" is higher than before. My R9 270X is unchanged: not overclocked, same driver. Any thoughts about this?

Mr P Hucker
Joined: 5 Jul 11
Posts: 990
Credit: 376,143,149
RAC: 0
Message 65923 - Posted: 19 Nov 2016, 18:23:18 UTC - in response to Message 65922.  

Odd, as I thought that if a processor was overclocked too far it would crash, since the OS is running on it. I know that GPUs can produce errors without crashing if slightly overclocked (you get artefacts on the display).

©2024 Astroinformatics Group