Welcome to MilkyWay@home

Posts by Arivald Ha'gel

1) Message boards : News : No More Work From Modfit Project (Message 66072)
Posted 2 Jan 2017 by Arivald Ha'gel
Post:
Am I right in saying that there is no more mod fit app, & the mod fit program is now the main milkyway app? With its only WUs being the 133.66 credit WUs?


That's most likely true.
Current WU name is:
de_modfit_fast_*_*_*_bundle5_ModfitConstraints*_*_*_*_*

So I don't think we will ever go back to the old WUs, since the new ones are "fast" and most likely compute the same results using less GPU time.

Currently my AMD R280X (7970), when computing 4 bundled (*5) WUs at the same time, finishes each bundle in 120s on average. This gives us:
4*Bundle = 120s
1 Bundle = 120s/4 = 30s.
1 WU = 30s/5 = 6s.
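
The same arithmetic as a minimal Python sketch. The function and variable names are hypothetical, and the 133.66 credits per bundle is the figure quoted at the top of this post; it assumes the GPU runs around the clock at the measured pace.

```python
SECONDS_PER_DAY = 86_400


def seconds_per_wu(wall_seconds: float, concurrent_bundles: int, wus_per_bundle: int) -> float:
    """Effective GPU time per single WU when several bundles run side by side."""
    seconds_per_bundle = wall_seconds / concurrent_bundles
    return seconds_per_bundle / wus_per_bundle


# Figures from the post: 4 bundles at once, 5 WUs per bundle, 120 s wall time.
per_wu = seconds_per_wu(wall_seconds=120, concurrent_bundles=4, wus_per_bundle=5)
bundles_per_day = SECONDS_PER_DAY / (per_wu * 5)

# Assumption: every bundle validates and pays the quoted 133.66 credits.
credits_per_day = bundles_per_day * 133.66

print(f"{per_wu:.1f} s per WU")
print(f"{bundles_per_day:.0f} bundles per day")
print(f"{credits_per_day:,.0f} credits per day")
```

Under these assumptions the result lands inside the 380k-400k credits/day range mentioned below.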

Credit throughput is similar to the old WUs (currently between 380k and 400k credits/day). CPU usage is higher. Bundles did increase throughput and stabilize the server. I'd hope for Bundle20 or Bundle100; these would increase stability even further, allow my PC to contact the server even less often, and give me a bigger work buffer, but... :)
2) Message boards : News : Server Update (Message 65980)
Posted 30 Nov 2016 by Arivald Ha'gel
Post:
Arivald,

It looks like you are running a homebrew application on that host. I would recommend recompiling from the latest version of the code on GitHub or running it with the application provided from the server. If there are still issues, let me know.

Jake


What? I'm running?!?

It's not me. It's "Ingmar Hensler" (http://milkyway.cs.rpi.edu/milkyway/show_host_detail.php?hostid=606779)
He is running a homebrew application (Anonymous Platform). But this is not the real problem. The problem is that "Max tasks per day" should be lower than 10 000, and should decrease by 1 (or more) for each "invalid" task. Or at least the server should not allow MORE than this number of tasks per day.
But it is 10k, it's not being decreased, and the server does allow more than 10k tasks per day for this host (around 20k+, as I remember). This causes many invalid WUs, and also serves as a DoS attack on the server (and on MW@H processing).

I have already shown this problem here:
https://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=4052&postid=65875
(16 Nov, 2016 - around 2 weeks ago)
and here:
http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=3990
(3 Aug 2016 - 4 MONTHS ago)
a topic that was ignored by the whole community, and by the MW@H team as well.

The problem is that NOT JUST ANY HOST can process 10k WUs per day! Barely any can, really. My single R280X processes ~2750 per day, so even with 4 of those it's barely above 10k. So the limit should simply start at 1000 (still a high but reasonable number), and go up by 1 for each good task and down by 1 for each bad task.
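
To make the proposal concrete, here is a minimal sketch of such a quota rule in Python. It is NOT the actual BOINC scheduler code; the function names, the floor of 1 and the ceiling of 10 000 are my assumptions for illustration only.

```python
def update_quota(quota: int, valid: bool, floor: int = 1, ceiling: int = 10_000) -> int:
    """Raise the daily quota by 1 for a valid result, lower it by 1 for an invalid one.

    The floor and ceiling are hypothetical limits, not values taken from BOINC.
    """
    step = 1 if valid else -1
    return max(floor, min(ceiling, quota + step))


def may_send_task(sent_today: int, quota: int) -> bool:
    """Refuse new work once the host has received its quota for the day."""
    return sent_today < quota


# A host that returns only invalid results, starting from the proposed 1000.
quota = 1000
for _ in range(500):
    quota = update_quota(quota, valid=False)

print(quota)                      # 500
print(may_send_task(499, quota))  # True
print(may_send_task(500, quota))  # False
```

With a rule like this, a host that only produces invalid results keeps shrinking its own allowance instead of sitting at 10k forever.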
3) Message boards : News : Server Update (Message 65977)
Posted 30 Nov 2016 by Arivald Ha'gel
Post:
Jake,

Sweet.

I see that "Workunits waiting for validation" came down also.
However I still see my favorite Hosts churning through a lot of WUs... all of them "validate errors"...

http://milkyway.cs.rpi.edu/milkyway/host_app_versions.php?hostid=606779
4) Message boards : News : Server Update (Message 65975)
Posted 30 Nov 2016 by Arivald Ha'gel
Post:
Jake,

shouldn't those WUs then be cancelled by the server? Wouldn't that be much better?

I also see that the number of "Workunits waiting for validation" is climbing. Currently up to: 26,841. It seems a bundle of 5 will not be enough to quench the thirst for WUs.
Any update on "Max tasks per day" not working correctly?
5) Message boards : News : Scheduled Maintenance Concluded (Message 65918)
Posted 19 Nov 2016 by Arivald Ha'gel
Post:
It's not that they can't be contacted. They don't care, or they don't know that there is a problem. Some people want to use their AMD GPUs, but aren't aware that they do not have DP capability.


That part caught my eye. I have an older Nvidia card for which SETI keeps sending a message through BOINC saying "upgrade your driver" to run SETI on your graphics card, but there isn't a newer driver, and it runs tasks anyway. Presumably this lacks DP? Funny thing is, they're showing as completed and validated.


Well... there are newer drivers. Probably not for your GPU, but there are. The newest is 375.95; you have "NVIDIA GeForce 9500 GS (512MB) driver: 341.95". BOINC isn't that "smart". It can't predict everything.

Your GPU does have DP capability. If it didn't, you'd only get "computation errors".
6) Message boards : News : Scheduled Maintenance Concluded (Message 65916)
Posted 19 Nov 2016 by Arivald Ha'gel
Post:

[1] I don't use AMD GPU's, mine are NVidia.
[2] As to BOINC's anti trashing code, I wasn't aware it even existed
[3] I've not checked on who trashes what on MW@H, I agree if someone repeatedly
trashes WU they shouldn't get any until they stop doing so.

[4] Unfortunately projects using BOINC still persist in allowing anon users who cannot be contacted. I believe there is NO valid reason to allow anon crunchers,
it's a license for abuse, of both the project and those who are contactable and do crunch with more valid than otherwise tasks.

[5] There are always going to be tasks that fail to validate, for some reason, computer malfunction, driver problems or power failure whatever, but if you can't
even contact such a user, there is no way of stopping them carrying on trashing tasks. Other than possibly the facility you mention if it can be implemented server side.

However given the recent problems with MW@H servers, and the number of folks complaining about lack of WU and server errors I don't know if implementing that facility might not be as great a problem as the trashers :-/


The AMD R280X is just the most powerful GPU for this project. It doesn't really matter which GPU you have; a WU is distributed to ALL applications.

As for 4) It's once again not true. You log in with an e-mail address, and it's possible to write a message to the user. I have sent a message to most of the users I pointed out. It's not that they can't be contacted. They don't care, or they don't know that there is a problem. Some people want to use their AMD GPUs, but aren't aware that they do not have DP capability. I think that it's malice in only a very small number of cases. BOINC and projects are supposed to work "out of the box". It seems that projects that require DP do not (since probably "Use AMD GPU" is enabled by default?)

5) It's implemented. But there seems to be a bug that still allows tasks to be sent to certain Hosts. I saw that for some hosts the mechanism DOES work OK.

I'm not saying that this is THE MOST IMPORTANT PROBLEM. Previous problems were definitely more critical, but if we increase the bundle size to 100, my PC will do 20 times fewer WUs, yet it will still lose 10 daily to those problems. Right now I do 3k WUs per day. I'll do 150 WUs per day if the bundle size is increased to 100. About 7% of my work (10 of 150) will be wasted.

But sure, let's take another user who only has a CPU. Currently he's doing like... 100 WUs? If the bundle size goes up to 100, he'll be doing 4! What if 1 of those fails due to such hosts? 25% of his work will be wasted!
I can live with the current <1% "can't validate" WUs, somehow. I can accept that some work will be wasted, even though I don't want it to be. But people will be very angry (and will rage quit) if 25% of their Host's work is "cannot validate". Trust me. And this issue will only grow as the bundle size (or overall WU duration) increases.
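
A small sketch of why bigger bundles make each lost task hurt more. It assumes, as I do above, that the number of tasks lost per day stays the same while the number of tasks done per day shrinks. The function name is hypothetical and the inputs are the rough figures from this post.

```python
def wasted_fraction(tasks_per_day: float, lost_per_day: float) -> float:
    """Share of a host's daily work that ends up as "can't validate"."""
    return lost_per_day / tasks_per_day


# Going from bundles of 5 to bundles of 100 means 20 times fewer tasks per day.
gpu_now = wasted_fraction(tasks_per_day=3000, lost_per_day=10)
gpu_bundle100 = wasted_fraction(tasks_per_day=3000 / 20, lost_per_day=10)

# CPU-only user from the example: 4 tasks per day, 1 of them lost.
cpu_bundle100 = wasted_fraction(tasks_per_day=4, lost_per_day=1)

print(f"GPU today:      {gpu_now:.1%}")
print(f"GPU, Bundle100: {gpu_bundle100:.1%}")
print(f"CPU, Bundle100: {cpu_bundle100:.1%}")
```

The absolute loss is unchanged, but as a share of the host's work it goes from well under 1% to several percent for a fast GPU, and to a quarter for a slow CPU host.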
7) Message boards : News : Scheduled Maintenance Concluded (Message 65912)
Posted 19 Nov 2016 by Arivald Ha'gel
Post:
Hi Arivald,


Actually that was logical. It wouldn't be easy to make task validation per chunk. There is such a validation method in ClimatePrediction.net, but I believe it's rather custom, and only because each part takes 2 days or so :)

IMHO, fixing issues with Hosts that generate only invalid results will go a long way towards decreasing invalid results as a whole.


Having a bundle end up as a computational error isn't that important, it will be resent to others so anyone winging it will still get a valid result [if it completes ok and IS valid].

It's only the person[s] that generate an error that don't get credited for it.


That's not really true. ALL people that took part in such a bundle DON'T get credits. If not, then please explain why I have "Can't validate" WUs? I have 10 invalids per day, NOT due to problems on my end. The same problem applies to every R280X. That work is wasted, I don't get any credits, so in essence we have wasted electricity = money.
A mechanism to prevent hosts from trashing a project IS a part of BOINC, so the BOINC creators knew that this is a real problem. It's just not working in MW@H. When I see a host that still has "Tasks per day" at 10k (I process 2.5-3k valid Tasks per day) and creates 20k invalids per day, I know something is wrong. "Tasks per day" should start at 100 at this moment (that still makes 500 old WUs), and grow linearly as more valid results are returned. Right now it is NOT lower than 10k. And some hosts still get WUs even if they go above this limit.

Such a host:
- mounts a (D)DoS attack on the DB server by constantly requesting new work. No work really gets done.
- trashes 6-7 times more Tasks than a Top20 Host like mine processes. Assuming that there are 2 like that (and there are, when we sum them up), he and his buddies can invalidate the work of 3 R280X. That's like 1m credits per day wasted (and up to 9k WUs per day).

Would you really like to have a single Host with 3 R280X (which draws almost 1kW from the wall) get all invalid results because some other Host wants that?
I DON'T!

This needs to be fixed!
8) Message boards : News : Scheduled Maintenance Concluded (Message 65900)
Posted 18 Nov 2016 by Arivald Ha'gel
Post:
if the app processes all of the bundle BUT fails on the last of the 5, with
a computational error.. ALL 5 are lost, not just the one that actually failed.

Since the entire bundle is labelled as computational error..

Well, if that is correct, then Jake has to go back to the bench and improve the server logic with respect to the validation code.

Michael.


Actually that was logical. It wouldn't be easy to make task validation per chunk. There is such a validation method in ClimatePrediction.net, but I believe it's rather custom, and only because each part takes 2 days or so :)

IMHO, fixing issues with Hosts that generate only invalid results will go a long way towards decreasing invalid results as a whole.
9) Message boards : News : Scheduled Maintenance Concluded (Message 65899)
Posted 18 Nov 2016 by Arivald Ha'gel
Post:
if the app processes all of the bundle BUT fails on the last of the 5, with
a computational error.. ALL 5 are lost, not just the one that actually failed.

Since the entire bundle is labelled as computational error..

Well, if that is correct, then Jake has to go back to the bench and improve the server logic with respect to the validation code.

Michael.


Actually that was logical. It wouldn't be easy to make task validation per chunk. There is such a validation method in ClimatePrediction.net, but I believe it's rather custom, and only because each part takes 2 days or so :)
10) Message boards : News : Scheduled Maintenance Concluded (Message 65893)
Posted 17 Nov 2016 by Arivald Ha'gel
Post:
Are you sure? I've compared all the cards and worked out the cost of buying them plus the electricity they use, and an R9 Fury X is the best by far. Everything else uses as much electricity but does less work. The electricity is the main cost, not the card itself.


Radeon Fury X: double-precision performance 537.6 GFLOPS, 275 W.
Source: https://en.wikipedia.org/wiki/AMD_Radeon_Rx_300_series

Radeon 280X: double-precision performance 870.4-1024 GFLOPS, 250 W.
Source: https://en.wikipedia.org/wiki/AMD_Radeon_Rx_200_series

The winner is clear. For single precision, the Fury X is better.
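
Since the argument is about work done per unit of electricity, here is the same comparison per watt as a minimal sketch. It assumes the double-precision figures above are in GFLOPS and that the listed wattage is what the card really draws under load, which in practice will differ. The names are hypothetical.

```python
def gflops_per_watt(gflops: float, watts: float) -> float:
    """Double-precision throughput per watt of listed board power."""
    return gflops / watts


# (low DP figure, high DP figure, watts) as quoted from the Wikipedia tables.
cards = {
    "Radeon Fury X": (537.6, 537.6, 275),
    "Radeon 280X": (870.4, 1024.0, 250),
}

for name, (dp_low, dp_high, watts) in cards.items():
    low = gflops_per_watt(dp_low, watts)
    high = gflops_per_watt(dp_high, watts)
    print(f"{name}: {low:.2f}-{high:.2f} GFLOPS/W")
```

On paper the 280X delivers roughly twice the double-precision work per watt, so the electricity argument also favors it.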
11) Message boards : News : Scheduled Maintenance Concluded (Message 65883)
Posted 17 Nov 2016 by Arivald Ha'gel
Post:
Next Hosts generating only Errors on opencl_ati_101 tasks:

http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=642157&offset=0&show_names=0&state=6&appid=
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=539028&offset=0&show_names=0&state=6&appid=

Today's favorite:
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=587750&offset=0&show_names=0&state=6&appid=
(4k errors on page)
12) Message boards : News : Scheduled Maintenance Concluded (Message 65876)
Posted 16 Nov 2016 by Arivald Ha'gel
Post:
Hello everyone,

Is anyone running a 390 or 390X (290 or 290x may have the same problem)

I still have the problem: when I run several WUs at once, then after some time (from minutes to one hour or so) some WUs start to hang and go on forever, while one or two crunch on.

I have tested drivers since 15.9, always the same problem, win 7 or win10 does not matter either. Tried different hardware setups, new installations of windows or old ones no difference.

I hope someone can confirm the problem, so we can start searching for the root cause and maybe even a fix.

PS. Running on the 280X or 7970 doesn't give me the error. Also running one WU at a time is fine.
Running 2 Einstein@home WUs at the same time causes calculation error (invalid tasks)


I'm running 4 WUs on an R280X. No problems at all. I believe this might be strictly a 390/390X issue. I might have had such an issue a long, long time ago, but it was also happening when I was computing single WUs.
13) Message boards : News : Scheduled Maintenance Concluded (Message 65875)
Posted 16 Nov 2016 by Arivald Ha'gel
Post:

Discriminating against failed WU or similar does make more sense to me though as that might gunk up the works a bit, even though I might technically fall under this category for the time being. I can assure you that isn't my intent. Hopefully my comments in this thread have helped shed some light on these issues as well.

Pointing fingers doesn't solve the problem. My computer has done a lot of good work for this project, and as mentioned, has been within the top 5 performing hosts in the not too distant past. Off the top of my head, I think it got up to 4th place, but had been within the top 10 for a few months. I hope to get it back up there again after I get these issues sorted out.


Lol. :) I wasn't talking about you at all.
I was talking about (100% or almost 100% invalid or errored tasks):
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=606779&offset=0&show_names=0&state=5&appid=
(50k invalid tasks in 2 days)
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=23432&offset=0&show_names=0&state=6&appid=
(2k invalid tasks)
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=698232&offset=0&show_names=0&state=5&appid=
(3k+ invalid tasks)
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=586986&offset=0&show_names=0&state=5&appid=
(3k+ invalid tasks)
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=694659&offset=0&show_names=0&state=5&appid=
(2k+ invalid tasks)
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=630056&offset=0&show_names=0&state=6&appid=
(almost 2k invalid tasks)
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=259168&offset=0&show_names=0&state=6&appid=
(almost 2k invalid tasks)
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=706643&offset=0&show_names=0&state=6&appid=
(almost 2k invalid tasks)
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=709083&offset=0&show_names=0&state=6&appid=
(almost 1k invalid tasks)
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=637080&offset=0&show_names=0&state=5&appid=
(500 invalid tasks)

Those Hosts don't really generate ANY credits.
And those are just from 15 of my "can't validate" WUs. So there are many more of those...
My personal favorite is the Host that has a limit of 10k Tasks per day and somehow receives more than 20k. After a single day, "Max tasks per day" should drop to 100, unless SOME WUs are correctly returned.
Some time ago I created a thread:
http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=3990
which was "ignored" by the community, so "we" weren't interested in server performance. And I suspect that this time it will be similar.

Bonus points since it was the SAME HOST that currently gives us 20k invalid results per day.
14) Message boards : News : Scheduled Maintenance Concluded (Message 65865)
Posted 16 Nov 2016 by Arivald Ha'gel
Post:
Hey Everyone,

Sorry for the silence yesterday, I decided to take a day off yesterday to recharge a bit after the huge push I've been doing for the last two weeks. I will be working on the Linux GPU apps today, the Mac applications, and if I have time I will look into fixing the cosmetic issues with the progress bar.

Jake


Jake,

I hope that fixing the issues with "Max tasks per day" will come next.
There are at least a dozen Hosts that spam the server with invalid tasks, and process over 10000-20000 Tasks every day. They also cause some WUs to end up Invalid. Thus, some of our work is wasted.
IMHO this parameter can start at a lower number and grow linearly as valid Tasks are generated.
15) Message boards : News : Scheduled Maintenance Concluded (Message 65859)
Posted 16 Nov 2016 by Arivald Ha'gel
Post:

I seem to be getting some computation errors now though, so I'm going to try dialing it back down to 5 tasks per GPU to see if that fixes it. We'll see. I've typically been running 4 per GPU in the more recent past (last few months).

Still getting some errors. It may be on the CPU end as the tasks seem to occasionally error out at the same time when they switch between the bundled tasks and load the CPU.


My setup has a 0% failure rate. I think it's more likely a GPU issue than a CPU one.
16) Message boards : News : Scheduled Maintenance Concluded (Message 65856)
Posted 16 Nov 2016 by Arivald Ha'gel
Post:

See my previous post for Titan Black performance. If left running, this will likely place my host PC within the top 5 performing PCs crunching for MW@H.


Cool. My single Radeon 280X can crunch up to 380k-400k per day.
Crunching 4WU at the same time. Times are around 110-130s. So 22-26s per WU.

Updated to 6 MW@H 1.43 WU bundles running per GPU, 12 total. Also allocated more CPU headroom so WUs are finishing about 3:28 (88s) total. GPUs are loaded up to around 93%, VRAM up to around 3668MB (59% on SLIed Titan Black cards, so the memory usage is double from being mirrored between cards – expect roughly half this on independent cards).


3:28 is 208s (not 88s).

208 / 6 (single card performance) = ~35s.
So it would appear that the Titan Black gets around 70% of the R280X's performance in MW@H.
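
The conversion and the comparison as a minimal sketch. It takes the figures quoted in this post at face value (22-26s for the R280X, 3:28 with 6 tasks per card for the Titan Black); the names are hypothetical.

```python
def mmss_to_seconds(text: str) -> int:
    """Convert a "minutes:seconds" string such as "3:28" to seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + int(seconds)


titan_total = mmss_to_seconds("3:28")  # 208, not 88
titan_per_task = titan_total / 6       # 6 tasks running on one card

# R280X range quoted above, in seconds.
r280x_low, r280x_high = 22, 26

# Relative speed: shorter time per task means higher performance.
ratio_low = r280x_low / titan_per_task
ratio_high = r280x_high / titan_per_task

print(f"Titan Black: {titan_per_task:.1f} s per task")
print(f"Relative to R280X: {ratio_low:.0%}-{ratio_high:.0%}")
```

That gives a range of roughly 63% to 75%, with "around 70%" sitting in the middle.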
17) Message boards : News : Scheduled Maintenance Concluded (Message 65853)
Posted 16 Nov 2016 by Arivald Ha'gel
Post:
Like what? I have a 600W PSU on the XP machine but limited cooling. The Win10 machine is an HP slimline requiring a half-height board and only has a 350W PSU.


For AMD/ATI, the Radeon 280/280X is still the best for DP. They're also quite cheap right now (around 150$? each)

As for NVidia, the options are the GeForce GTX Titan and GeForce GTX Titan Black (from the GeForce 700 Series), but they're extra costly - I believe more than 500$ each. Where I live they're really unavailable. I saw some on UK eBay - 700 Pounds each... oh my eyes.

The Titan should be almost twice as efficient per W in DP, but in old MW@H it performed a little worse than the R280X in WUs per second (so I assume that per W it's the same).
See benchmark thread:
https://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=3551&postid=64162#64162

In this thread it can be seen that really ANYTHING AMD/ATI is better than NVidia in terms of DP.

For both, a 600W PSU should be enough.
18) Message boards : News : Scheduled Maintenance Concluded (Message 65838)
Posted 15 Nov 2016 by Arivald Ha'gel
Post:
Please observe these work units.

This is expected.
https://milkyway.cs.rpi.edu/milkyway/result.php?resultid=1887793313
Name: de_modfit_fast_19_3s_136_bundle5_ModfitConstraints3
Run time: 20 min 41 sec
Credit: 133.66

This is not expected.
https://milkyway.cs.rpi.edu/milkyway/result.php?resultid=1887790878
Name: de_modfit_fast_19_3s_136_ModfitConstraints3
Run time: 4 min 5 sec
Credit: 26.73

It appears there are some work units getting thru that are not bundled, but run 5x as long as an old single work unit and pay 1/5 as much. I have a handful of these.


They're not bundled since they were first computed when bundling was not yet available. I believe it is expected that sometimes (1%-2%) a WU will still be unbundled. But this will disappear totally in a few days.

Check: https://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=1377085803
This is the WU for this Task. It was first sent to a MW@H 1.38 client.
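
A quick sanity check on the two results quoted above, as a minimal sketch with hypothetical names: the unbundled task pays one fifth of the bundle and also runs about one fifth as long, so the credit per second of run time is practically the same.

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Run time as shown on the result page, converted to seconds."""
    return minutes * 60 + seconds


# Values copied from the two result pages quoted above.
bundle_credit, bundle_runtime = 133.66, to_seconds(20, 41)
single_credit, single_runtime = 26.73, to_seconds(4, 5)

# One fifth of a bundle's credit, rounded the way the site displays it.
credit_per_wu = round(bundle_credit / 5, 2)

bundle_rate = bundle_credit / bundle_runtime
single_rate = single_credit / single_runtime

print(f"credit per WU in a bundle: {credit_per_wu}")
print(f"bundle:    {bundle_rate:.4f} credits/s")
print(f"unbundled: {single_rate:.4f} credits/s")
```

So on that host nothing is lost on the leftover unbundled tasks; they are simply one fifth of the work for one fifth of the credit.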
19) Message boards : News : Scheduled Maintenance Concluded (Message 65831)
Posted 15 Nov 2016 by Arivald Ha'gel
Post:
On the XP machine, Milkyway WUs taking about 20 min of run time and using about 40% CPU.

On the Win10 machine, Milkyway WUs taking about 30 min of run time and typically using less than 10% CPU time (fairly large variation in CPU time).


This is probably due to the CPU/GPU combination. The AMD 64 X2 is a very old 90nm CPU (introduced in 2005/2006, with newer 65nm versions from 2007/2008). The AMD 7750 Dual Core is from the same family, a 65nm CPU from 2008.

I wrote about CPU/GPU profile analysis a few posts earlier.
Also, neither the GTX-750Ti nor the GT-730 is an especially efficient GPU for double precision work. Either WILL be much more efficient in single precision work in Einstein@Home or Seti@Home.

I don't consider that running the current Milkyway WUs on my XP machine is an efficient use of the machine due to the high CPU usage, so I have discontinued running them until such time as (or if...) a new version of the app is released that requires less CPU time.


Using those GPUs, MilkyWay@Home will probably never be more efficient than Einstein@Home or Seti@Home credit-wise.

If you wish to contribute more to MW@H, I can suggest some DP GPUs that will be efficient.
20) Message boards : News : Scheduled Maintenance Concluded (Message 65829)
Posted 15 Nov 2016 by Arivald Ha'gel
Post:
I see we have some rogue Hosts that wasted some of my GPU's work. Here's one example:
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=606779

About 20k invalid WU in a single day. I thought that BOINC was supposed to curb that one with "Number of tasks today", but it seems to be over 20k per day (?) :/

I have already PMed the owner.

Jake,
Could you look at this particular problem? With increased Bundle size this will become increasingly problematic.



©2020 Astroinformatics Group