Welcome to MilkyWay@home

Posts by Wrend

21) Message boards : Number crunching : Validation inconclusive (Message 72459)
Posted 2 Apr 2022 by Profile Wrend
Post:
I just turned them off for a moment to transition some WUs. They will be back up shortly.

Good to know. Thanks. :)
22) Message boards : Number crunching : Validation inconclusive (Message 72456)
Posted 2 Apr 2022 by Profile Wrend
Post:
At the moment it seems several of the servers are offline. https://milkyway.cs.rpi.edu/milkyway/server_status.php
23) Message boards : Number crunching : Run Multiple WU's on Your GPU (Message 72357)
Posted 30 Mar 2022 by Profile Wrend
Post:
Bearing that in mind, I'm quite pleased that this project can make good use of the DP/FP64 capabilities of my Titan Black GPUs, whereas other projects can't. Yes, the project is more niche for it, but then, aptly, so are my GPUs. If anyone is to be held responsible, I think the blame is best placed at Nvidia's feet for generally limiting the capabilities of their cards to profit more from segmenting their market.

As an update to my previous post, I have applied new thermal paste to my GPUs, and it made quite a significant difference, dropping full-load temperatures by over 10°C; I say "over" since I was hitting thermal throttling levels before. It would seem the stock thermal paste had an effective lifespan of about 5 years on cards in active use.

Currently I'm back to running two tasks per GPU to help keep room temperature, fan speeds, and noise down. It's nice to have my cards doing some good work in the background again.

Best regards.
24) Message boards : Number crunching : Run Multiple WU's on Your GPU (Message 71100)
Posted 5 Sep 2021 by Profile Wrend
Post:
Just curious to see if anyone has really studied whether there's an appreciable gain in running >1 WU/GPU?
New here at MW@H, but in previous tests at other projects I never saw any significant gain.
Regards, Jim ...

Sorry for the late reply, but yes, for my Titan Black cards with DP optimization set in the Nvidia Control Panel it makes a huge difference. Running just one task loads one of my GPUs to only about 1/6. If I want to fully load both GPUs (which I usually don't, actually), I have to run 5 to 6 tasks simultaneously per GPU, for a total of up to 12.

At certain parts of the different tasks it seems the GPUs will sometimes briefly spike up to near 100% though.

Currently I'm running 2 tasks per GPU, since the cards are running hot (they probably need new thermal paste) and to reduce power draw. With 2 tasks running per GPU (4 total), each card is loaded to about 34%.

CPU usage is so low for this project on this computer that it is generally negligible, so I'm running 4 tasks (using 4 of 12 threads) for Einstein@Home as well currently to load up my CPU about 1/3 too.
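
As a rough sanity check on those numbers, here's a toy model of my own (not anything from BOINC or the project) that assumes per-task load simply adds linearly, which the figures above roughly bear out: one task at about 17% load, two tasks at about 34%, and six tasks saturating the card.

```python
def approx_gpu_load(tasks: int, per_task_load: float) -> float:
    """Estimate combined GPU load from `tasks` identical concurrent tasks.

    Assumes loads add linearly and cap at 100%; a simplification, since
    real tasks briefly spike toward full load at certain points.
    """
    return min(1.0, tasks * per_task_load)

# With ~17% load per task on a Titan Black:
print(approx_gpu_load(2, 0.17))  # ~0.34, matching the two-task figure above
print(approx_gpu_load(6, 0.17))  # 1.0: six tasks fully load the card
```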
25) Message boards : Number crunching : New Nvidia Driver 378.49 Causing Computation Errors (Message 66275)
Posted 5 Apr 2017 by Profile Wrend
Post:
Boinc does NOT benefit from using SLI, take it off if you are not a gamer, it's better to treat each gpu separately.

Yes, I am aware of this and we've probably had this conversation several years ago now too back when I was running SLIed 680s. :)

It's still good to mention it though for other users in general as it is a waste of VRAM capacity and could be an issue for people with insufficient VRAM.

I was mentioning the specifics of my setup there in case it helps troubleshoot what the issues with these newer drivers are. The older drivers 376.33 are still running fine for me and crunching right along with the same settings.
26) Message boards : Number crunching : New Nvidia Driver 378.49 Causing Computation Errors (Message 66257)
Posted 31 Mar 2017 by Profile Wrend
Post:
I tried the newer driver 378.92 and it seemed to be running fine for at least an hour before all the work units started erroring out yet again. Definitely something unfortunate going on with these newer drivers.

I'm using two Titan Black cards in SLI, the prefer maximum performance setting, 2x, 3x, and 4x DSR, double precision optimization, and the Nvidia patch to force enable PCIe gen 3 on an i7-3930K CPU.

I'll be rolling the drivers back to 376.33 yet again and will let you know if I have any issues with it.
27) Message boards : Number crunching : New Nvidia Driver 378.49 Causing Computation Errors (Message 66194)
Posted 15 Feb 2017 by Profile Wrend
Post:
Sorry, it looks like I spoke too soon – and it's too late to edit my previous post. All the tasks are now instantly erroring out again for some reason with the newer drivers that came out today now too. I'm going to have to roll the driver back again.

I'm assuming the work units will run fine again with the older driver mentioned in the first post, but I will let you know if not.
28) Message boards : Number crunching : New Nvidia Driver 378.49 Causing Computation Errors (Message 66193)
Posted 15 Feb 2017 by Profile Wrend
Post:
I'm now using the newer drivers that came out today – 378.66 – and everything seems to be running fine with them.
29) Message boards : Number crunching : New Nvidia Driver 378.49 Causing Computation Errors (Message 66185)
Posted 12 Feb 2017 by Profile Wrend
Post:
Just a heads up that this driver is causing the MilkyWay@Home 1.43 (opencl_nvidia_101) work units to instantly fail – on my system at any rate.

I went back to the 376.33 Driver and the work units are crunching fine again.

Is anyone else having this issue?

Thanks.
30) Message boards : News : Scheduled Maintenance Concluded (Message 65926)
Posted 19 Nov 2016 by Profile Wrend
Post:
Hi all,

I noticed something interesting. Today I got a new AIO water cooler for my old 2500K and overclocked it from the default BIOS 3.2 GHz to 4.2 GHz. When I look at my MilkyWay tasks, the percentage of "Completed, validation inconclusive" results is much higher than before. My R9 270x is not overclocked and is on the same driver. Any thoughts about this?


Odd, as I thought if a processor was too overclocked it crashed, as the OS is running on it. I know that GPUs can create errors without crashing if slightly overclocked (you get artefacts on the display).


I tested with PrimeGrid CPU tasks with no errors. Windows 10 is running smoothly, with no display glitches or lag. CPU temp is 55-60 °C. GPU-Z shows a GPU temp of 58 °C at 96% load. I don't know what causes this.

On that note, my PC is also Prime95 stable and error free, even while crunching for MW@H. It is kind of odd that WUs would get computation errors like this.

i7-3930K @ 4.2GHZ
31) Message boards : News : Scheduled Maintenance Concluded (Message 65920)
Posted 19 Nov 2016 by Profile Wrend
Post:
I was just thinking that maybe, just maybe that some of these posts would be better suited in another thread. Something like the crunching area. Would help others if they have a problem with a GPU card or want to talk about performance of their system. To me it isn't news about the new bundling of work units or new updates being released. Then others can read the title of the thread and see what people are talking about instead of wading through every thread to find something.

Thanks,

Rich


With the way the new WUs are bundled, it may have some unexpected impacts on performance and reliability. Likewise, since tasks are bundled, a failure is more significant: it can take up to 4 additional tasks down with it, depending on where in the bundle the failure occurs.
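
To put rough numbers on that, here's a small sketch of my own (assuming, as discussed later in this thread, that a bundle holds 5 sub-tasks and that an error anywhere invalidates the whole WU): the failure point determines how the 4 lost sub-tasks split between finished work that gets discarded and work that never runs.

```python
def bundle_failure_impact(bundle_size: int, failing_index: int) -> tuple[int, int]:
    """Given a bundled WU that errors out on sub-task `failing_index`
    (1-based), return (discarded, never_run):
      discarded - sub-tasks that finished but whose results are lost
      never_run - sub-tasks that never get attempted
    Their sum is always bundle_size - 1: the "up to 4 additional tasks".
    """
    if not 1 <= failing_index <= bundle_size:
        raise ValueError("failing_index out of range")
    return failing_index - 1, bundle_size - failing_index

print(bundle_failure_impact(5, 1))  # (0, 4): fails immediately, little work wasted
print(bundle_failure_impact(5, 5))  # (4, 0): fails at the end, 4 results discarded
```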

I get computation errors almost exclusively on the new WUs when I first start crunching them, running 12 at the same time (6 on each card), then almost none at all once the tasks have dispersed their start and stop times. It's a little disconcerting.

Nvidia cards that are DP optimized (such as my Titan Black cards) have to crunch in parallel like this if they're to be significantly loaded and utilized.

With the issues my setup seems to be having, it seems like the new WUs favor AMD cards a bit more, since they don't need to crunch as many tasks in parallel to load up their GPUs.

But yeah, in general, I suppose people could make posts in other threads in other sections of the forum as well, if they felt like it.

On the plus side, at least this provides feedback and related discussion in one easily accessible location.

If you only want to follow Jake's posts, try here. → http://milkyway.cs.rpi.edu/milkyway/forum_user_posts.php?userid=792007

Cheers.
32) Message boards : News : Scheduled Maintenance Concluded (Message 65910)
Posted 18 Nov 2016 by Profile Wrend
Post:
I updated my Nvidia drivers to the newest version and gave my PC a reboot for some maintenance. I often have it up for weeks or even sometimes months at a time.

When resetting and restarting the project, it is kind of strange that so many WUs would all error out at the same time, but once the workloads disperse and even out, errors become very rare.

The top several errors in the picture below are all from the very first run after restarting the project.

https://i.imgur.com/rKzVjcN.png
33) Message boards : News : Scheduled Maintenance Concluded (Message 65888)
Posted 17 Nov 2016 by Profile Wrend
Post:
The 1.43 work units seem to be running great now on my end (knock on wood). I haven't really changed anything from what I previously tested. I did give the work units some time to settle and disperse, evening out CPU and GPU loads a bit more consistently as they complete at different times.

I'm getting better and more consistent communication with the server now too.

So... Who knows for sure the cause of these issues, but work units seem to be working well for me now at least.

YMMV and best of luck!
34) Message boards : News : Scheduled Maintenance Concluded (Message 65886)
Posted 17 Nov 2016 by Profile Wrend
Post:
Yeah sucks about the 390 cards not being able to run more than 1 Einstein task.
My 280 was great until it died.

I haven't tried M@H yet because I'll have to try & remember how to change the xml file.


The AMD cards do a better job of getting through the double precision workloads without having to run multiple tasks at the same time, compared to double-precision-optimized Nvidia cards. For example, I have to run 6 MW@H 1.43 WUs per GPU just to load them up, and more if I want to keep them fully loaded, which I don't.

Here's my config file. (The CPU values are currently set to these since CPU usage is set to 67% in BOINC and I run 6 other CPU tasks from Einstein@Home on the 12 threads of my CPU.)

C:\ProgramData\BOINC\projects\milkyway.cs.rpi.edu_milkyway\app_config.xml

<app_config>

    <app>
        <name>milkyway</name>
        <max_concurrent>0</max_concurrent>
        <gpu_versions>
            <gpu_usage>0.16</gpu_usage>
            <cpu_usage>0.08</cpu_usage>
        </gpu_versions>
    </app>

    <app>
        <name>milkyway_nbody</name>
        <max_concurrent>0</max_concurrent>
        <gpu_versions>
            <gpu_usage>0.16</gpu_usage>
            <cpu_usage>0.08</cpu_usage>
        </gpu_versions>
    </app>

    <app>
        <name>milkyway_separation__modified_fit</name>
        <max_concurrent>0</max_concurrent>
        <gpu_versions>
            <gpu_usage>0.19</gpu_usage>
            <cpu_usage>0.08</cpu_usage>
        </gpu_versions>
    </app>

</app_config>


You'll want to change these values to suit your own needs.
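
For anyone adapting this, the key relationship (per BOINC's app_config.xml documentation) is that <gpu_usage> is the fraction of a GPU one task reserves, so BOINC will run roughly floor(1 / gpu_usage) tasks per GPU. Here's a small helper of my own (not part of BOINC) to pick a value for a desired task count, truncated slightly low the way the 0.16 above is for 6 tasks:

```python
import math

def gpu_usage_for(tasks_per_gpu: int) -> float:
    """Return a <gpu_usage> value that lets BOINC run `tasks_per_gpu`
    concurrent tasks on one GPU. Truncating 1/N down to two decimal
    places (e.g. 1/6 -> 0.16 rather than 0.1667) keeps floating-point
    rounding from silently limiting you to N - 1 tasks.
    """
    if tasks_per_gpu < 1:
        raise ValueError("need at least one task per GPU")
    return math.floor(100 / tasks_per_gpu) / 100

print(gpu_usage_for(6))  # 0.16, as in the config above
print(gpu_usage_for(2))  # 0.5
```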
35) Message boards : News : Scheduled Maintenance Concluded (Message 65882)
Posted 17 Nov 2016 by Profile Wrend
Post:

To be fair, it's easier watching multiple things at once with the progress bar resetting. The only reason I see for changing this is for aesthetics, not serving any practical purpose that I'm aware of. Maybe to help avoid some confusion, though on the other hand it might also help illustrate how the WUs are running to people who aren't aware of them being bundled in this way.

I understand some people prefer form over function though... ;)

I'll make do either way.

Cheers, guys.


I prefer function over form actually, and to have a progress bar that moves back and forth is illogical.

It doesn't exactly follow the rules of expectations, if that's what you mean, but I'm not sure that it necessarily should, given that it isn't actually just running one WU, but running several one at a time separately from each other.


But as it's only 1 bar trying to show 5 things, it just makes a mess. When it says 35%, that could be any number of positions.

I agree that that aspect of it doesn't make much sense if you're trying to see at a glance when the whole WU will end.


It doesn't make sense in any way. If you see it reading 5%, where is it? You have no idea which of the bundle it's working on. If it only passes once, then you know it's a quarter way through the first one.


Yeah, proportionally in total toward being done.

It's easy to see when individual tasks start and stop as it is now, though, as the work moves between loading the GPU and CPU. That has been helpful for me in troubleshooting: watching 12 WU progress bars, 12 CPU thread loads, and two GPU loads.

Perhaps it's more useful to know at a glance where the whole bundle is at; I'm just not sure that it actually is, though.

Not up to me either way, of course. Just providing some feedback.
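
For what it's worth, the two readings of the bar can be reconciled with a little arithmetic. Assuming equal-sized sub-tasks in a bundle of 5 (an assumption; real sub-tasks vary), a monotonic overall percentage could be mapped back to a position in the bundle like this (my own sketch, not anything BOINC implements):

```python
def locate_in_bundle(overall_pct: float, bundle_size: int = 5) -> tuple[int, float]:
    """Map a monotonically increasing overall progress percentage to a
    (1-based sub-task index, percent complete within that sub-task) pair.

    Assumes all sub-tasks are the same size, which real bundles may not be.
    """
    if not 0 <= overall_pct <= 100:
        raise ValueError("percentage out of range")
    share = 100 / bundle_size  # each sub-task owns 20% of the bar
    index = min(int(overall_pct // share), bundle_size - 1)
    return index + 1, (overall_pct - index * share) / share * 100

print(locate_in_bundle(35))   # (2, 75.0): 75% through the second sub-task
print(locate_in_bundle(100))  # (5, 100.0)
```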
36) Message boards : News : Scheduled Maintenance Concluded (Message 65880)
Posted 16 Nov 2016 by Profile Wrend
Post:

To be fair, it's easier watching multiple things at once with the progress bar resetting. The only reason I see for changing this is for aesthetics, not serving any practical purpose that I'm aware of. Maybe to help avoid some confusion, though on the other hand it might also help illustrate how the WUs are running to people who aren't aware of them being bundled in this way.

I understand some people prefer form over function though... ;)

I'll make do either way.

Cheers, guys.


I prefer function over form actually, and to have a progress bar that moves back and forth is illogical.

It doesn't exactly follow the rules of expectations, if that's what you mean, but I'm not sure that it necessarily should, given that it isn't actually just running one WU, but running several one at a time separately from each other.


But as it's only 1 bar trying to show 5 things, it just makes a mess. When it says 35%, that could be any number of positions.

I agree that that aspect of it doesn't make much sense if you're trying to see at a glance when the whole WU will end.
37) Message boards : News : Scheduled Maintenance Concluded (Message 65879)
Posted 16 Nov 2016 by Profile Wrend
Post:

Discriminating against failed WU or similar does make more sense to me though as that might gunk up the works a bit, even though I might technically fall under this category for the time being. I can assure you that isn't my intent. Hopefully my comments in this thread have helped shed some light on these issues as well.

Pointing fingers doesn't solve the problem. My computer has done a lot of good work for this project, and as mentioned, has been within the top 5 performing hosts in the not too distant past. Off the top of my head, I think it got up to 4th place, but had been within the top 10 for a few months. I hope to get it back up there again after I get these issues sorted out.


Lol. :) I wasn't talking about you at all.
I was talking about (100% or almost 100% invalid or errored tasks):
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=606779&offset=0&show_names=0&state=5&appid=
(50k invalid tasks in 2 days)
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=23432&offset=0&show_names=0&state=6&appid=
(2k invalid tasks)
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=698232&offset=0&show_names=0&state=5&appid=
(3k+ invalid tasks)
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=586986&offset=0&show_names=0&state=5&appid=
(3k+ invalid tasks)
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=694659&offset=0&show_names=0&state=5&appid=
(2k+ invalid tasks)
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=630056&offset=0&show_names=0&state=6&appid=
(almost 2k invalid tasks)
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=259168&offset=0&show_names=0&state=6&appid=
(almost 2k invalid tasks)
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=706643&offset=0&show_names=0&state=6&appid=
(almost 2k invalid tasks)
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=709083&offset=0&show_names=0&state=6&appid=
(almost 1k invalid tasks)
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=637080&offset=0&show_names=0&state=5&appid=
(500 invalid tasks)

Those Hosts don't really generate ANY credits.
And those are just from 15 of my "can't validate" WUs. So there are many more of those...
My personal favorite is hosts that have a limit of 10k tasks per day and somehow receive more than 20k. After a single day, "Max tasks per day" should drop to 100 unless SOME WUs are correctly returned.
Some time ago I created a thread:
http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=3990
which was "ignored" by the community, so apparently "we" weren't interested in server performance. I suspect this time it will be similar.

Bonus points since it was the SAME HOST that currently gives us 20k invalid results per day.


Apologies. I guess I'm just being a little defensive from the frustration of trying to sort out the issues my PC seems to be having with these new bundled WUs. I do like them overall, though I wish they behaved a little better for me.

Best regards.
38) Message boards : News : Scheduled Maintenance Concluded (Message 65873)
Posted 16 Nov 2016 by Profile Wrend
Post:

To be fair, it's easier watching multiple things at once with the progress bar resetting. The only reason I see for changing this is for aesthetics, not serving any practical purpose that I'm aware of. Maybe to help avoid some confusion, though on the other hand it might also help illustrate how the WUs are running to people who aren't aware of them being bundled in this way.

I understand some people prefer form over function though... ;)

I'll make do either way.

Cheers, guys.


I prefer function over form actually, and to have a progress bar that moves back and forth is illogical.

It doesn't exactly follow the rules of expectations, if that's what you mean, but I'm not sure that it necessarily should, given that it isn't actually just running one WU, but running several one at a time separately from each other.
39) Message boards : News : Scheduled Maintenance Concluded (Message 65872)
Posted 16 Nov 2016 by Profile Wrend
Post:
Hey Everyone,

Sorry for the silence yesterday, I decided to take a day off yesterday to recharge a bit after the huge push I've been doing for the last two weeks. I will be working on the Linux GPU apps today, the Mac applications, and if I have time I will look into fixing the cosmetic issues with the progress bar.

Jake


Jake,

I hope that fixing issues with "Max tasks per day" will come next.
There are at least a dozen hosts that spam the server with invalid tasks and process over 10,000-20,000 tasks every day. They also cause problems by creating invalid WUs, so some of our work is wasted.
IMHO this parameter could start at a lower number and grow linearly as valid tasks are returned.


On that note, I've done a bunch of testing on my end, and I'm still not sure what the exact issue is. In these tests I've loaded up the CPU and GPUs much more than BOINC does: Prime95 for a few hours, plus other GPU load and compute tests, looking for artifacts or similar problems. Everything seems to check out 100% so far.

I haven't started crunching for MW@H again yet, as I still want to do some more tests. But so far it was looking like the first several WUs would get computation errors while they "finished" bundled tasks and changed CPU and GPU loads at the same time as each other; once the tasks start deviating from each other, they seem to stop erroring out.

I'm not wanting to give out bad work, waste server resources, nor anything similar.

Unless there is truly a shortage of WUs and it is determined that more people should get them, as opposed to more capable computers which can turn them around faster, I'm not sure that discriminating by work per unit of time should be a thing. Theoretically, the good of the project is served by getting the most work done in the shortest time overall.

Discriminating against failed WU or similar does make more sense to me though as that might gunk up the works a bit, even though I might technically fall under this category for the time being. I can assure you that isn't my intent. Hopefully my comments in this thread have helped shed some light on these issues as well.

Pointing fingers doesn't solve the problem. My computer has done a lot of good work for this project, and as mentioned, has been within the top 5 performing hosts in the not too distant past. Off the top of my head, I think it got up to 4th place, but had been within the top 10 for a few months. I hope to get it back up there again after I get these issues sorted out.

Cheers.
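
Sketching the quota policy from the quoted post (my own toy model; BOINC's real "Max tasks per day" logic differs): start each host at a low daily limit, grow it linearly with valid results, and collapse it back when a host returns bad work.

```python
class DailyQuota:
    """Toy model of the proposed per-host daily task quota (not BOINC's
    actual implementation): grows linearly with valid results, resets to
    the floor on invalid ones, and never exceeds a hard ceiling.
    """

    def __init__(self, floor: int = 100, ceiling: int = 10_000):
        self.floor = floor
        self.ceiling = ceiling
        self.limit = floor  # new hosts start at the low end

    def on_valid(self, step: int = 1) -> None:
        self.limit = min(self.limit + step, self.ceiling)

    def on_invalid(self) -> None:
        self.limit = self.floor  # bad results collapse the quota

quota = DailyQuota()
for _ in range(50):
    quota.on_valid()
print(quota.limit)  # 150
quota.on_invalid()
print(quota.limit)  # 100
```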
40) Message boards : News : Scheduled Maintenance Concluded (Message 65864)
Posted 16 Nov 2016 by Profile Wrend
Post:

As for the cosmetic changes, I actually rather like being able to easily see on the progress bar when one of the bundled tasks ends as it's helping me to troubleshoot some issues I seem to be having on my end (see my last couple posts), being able to monitor bundled task progress, CPU load, and GPU load in real time. But hey, that's just me.


Yes it is just you :-P
I'd like to see the progress of the whole WU.
Since there are 5 in the bundle, you can just watch for 20/40/60/80/100% :-)


To be fair, it's easier watching multiple things at once with the progress bar resetting. The only reason I see for changing this is for aesthetics, not serving any practical purpose that I'm aware of. Maybe to help avoid some confusion, though on the other hand it might also help illustrate how the WUs are running to people who aren't aware of them being bundled in this way.

I understand some people prefer form over function though... ;)

I'll make do either way.

Cheers, guys.



©2024 Astroinformatics Group