21)
Message boards :
Number crunching :
Validation inconclusive
(Message 72459)
Posted 2 Apr 2022 by Wrend Post: I just turned them off for a moment to transition some WUs. They will be back up shortly. Good to know. Thanks. :) |
22)
Message boards :
Number crunching :
Validation inconclusive
(Message 72456)
Posted 2 Apr 2022 by Wrend Post: At the moment it seems several of the servers are offline. https://milkyway.cs.rpi.edu/milkyway/server_status.php |
23)
Message boards :
Number crunching :
Run Multiple WU's on Your GPU
(Message 72357)
Posted 30 Mar 2022 by Wrend Post: Bearing that in mind, I'm quite pleased that this project can make good use of the DP/FP64 capabilities of my Titan Black GPUs, whereas other projects can't. Yes, the project is more niche for it, but then so, aptly, are my GPUs. If anyone is to be held responsible for that, I think the blame is best placed at Nvidia's feet for generally limiting the capabilities of their cards to profit more from segmenting their market. As an update to my previous post, I have applied new thermal paste to my GPUs and it made quite a significant difference, dropping full-load temperatures by over 10°C; and I say "over" since I was hitting thermal throttling levels before. It would seem the stock thermal paste had an effective lifespan of up to about 5 years for cards in active use. Currently I'm back to running two tasks per GPU to help keep room temperature, fan speeds, and noise down. It's nice to have my cards doing some good work in the background again. Best regards. |
24)
Message boards :
Number crunching :
Run Multiple WU's on Your GPU
(Message 71100)
Posted 5 Sep 2021 by Wrend Post: Just curious to see if anyone has really studied whether there's an appreciable gain in running >1 WU/GPU? Sorry for the late reply, but yes, for my Titan Black cards with DP optimization set in the Nvidia Control Panel it makes a huge difference. Running just one task loads one of my GPUs to only about 1/6. If I wanted to fully load both GPUs (which I usually don't) I would have to run 5 to 6 tasks simultaneously per GPU, for a total of up to 12. At certain parts of the different tasks the GPUs will sometimes briefly spike up to near 100%, though. Currently I'm running 2 tasks per GPU since they are running hot (they probably need new thermal paste) and to not draw so much power. With 2 tasks running per GPU (4 total), each card is loaded to about 34%. CPU usage is so low for this project on this computer that it is generally negligible, so I'm currently running 4 tasks (using 4 of 12 threads) for Einstein@Home as well, to load up my CPU about 1/3 too. |
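The load figures in the post above follow simple arithmetic: if one task occupies roughly 1/6 of a card, concurrent tasks scale that up until the card saturates. A minimal sketch of that estimate (the ~1/6-per-task figure comes from the post; linear scaling up to saturation is an assumption for illustration, not a measurement):

```python
# Back-of-envelope model of GPU utilization when running several
# MilkyWay@Home tasks concurrently on one card. The ~1/6-per-task
# figure comes from the post above; linear scaling up to saturation
# is an assumption for illustration, not a measurement.

def gpu_load(tasks_per_gpu, per_task_load=1 / 6):
    """Estimated fractional load on one GPU, capped at 100%."""
    return min(1.0, tasks_per_gpu * per_task_load)

for n in (1, 2, 6):
    print(f"{n} task(s) per GPU -> ~{gpu_load(n):.0%} load")
```

Two tasks per card lands near the ~34% per card the poster observed, and 5–6 tasks approaches saturation, matching the "5 to 6 tasks per GPU" estimate.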
25)
Message boards :
Number crunching :
New Nvidia Driver 378.49 Causing Computation Errors
(Message 66275)
Posted 5 Apr 2017 by Wrend Post: Boinc does NOT benefit from using SLI, take it off if you are not a gamer, it's better to treat each gpu separately. Yes, I am aware of this, and we probably had this conversation several years ago, back when I was running SLIed 680s. :) It's still good to mention for other users in general, though, as it is a waste of VRAM capacity and could be an issue for people with insufficient VRAM. I was mentioning the specifics of my setup there in case it helps troubleshoot what the issues with these newer drivers are. The older 376.33 drivers are still running fine for me and crunching right along with the same settings. |
26)
Message boards :
Number crunching :
New Nvidia Driver 378.49 Causing Computation Errors
(Message 66257)
Posted 31 Mar 2017 by Wrend Post: I tried the newer driver 378.92 and it seemed to be running fine for at least an hour before all the work units started erroring out yet again. Definitely something unfortunate going on with these newer drivers. I'm using two Titan Black cards in SLI, the prefer maximum performance setting, 2x, 3x, and 4x DSR, double precision optimization, and the Nvidia patch to force enable PCIe gen 3 on an i7-3930K CPU. I'll be rolling the drivers back to 376.33 yet again and will let you know if I have any issues with it. |
27)
Message boards :
Number crunching :
New Nvidia Driver 378.49 Causing Computation Errors
(Message 66194)
Posted 15 Feb 2017 by Wrend Post: Sorry, it looks like I spoke too soon – and it's too late to edit my previous post. All the tasks are now instantly erroring out again for some reason with the newer drivers that came out today as well. I'm going to have to roll the driver back again. I'm assuming the work units will run fine again with the older driver mentioned in the first post, but I will let you know if not. |
28)
Message boards :
Number crunching :
New Nvidia Driver 378.49 Causing Computation Errors
(Message 66193)
Posted 15 Feb 2017 by Wrend Post: I'm now using the newer drivers that came out today – 378.66 – and everything seems to be running fine with them. |
29)
Message boards :
Number crunching :
New Nvidia Driver 378.49 Causing Computation Errors
(Message 66185)
Posted 12 Feb 2017 by Wrend Post: Just a heads up that this driver is causing the MilkyWay@Home 1.43 (opencl_nvidia_101) work units to instantly fail – on my system at any rate. I went back to the 376.33 Driver and the work units are crunching fine again. Is anyone else having this issue? Thanks. |
30)
Message boards :
News :
Scheduled Maintenance Concluded
(Message 65926)
Posted 19 Nov 2016 by Wrend Post: Hi all, On that note, my PC is also Prime95 stable and error-free, even while crunching for MW@H. It is kind of odd that WUs would get computation errors like this. i7-3930K @ 4.2 GHz |
31)
Message boards :
News :
Scheduled Maintenance Concluded
(Message 65920)
Posted 19 Nov 2016 by Wrend Post: I was just thinking that maybe, just maybe, some of these posts would be better suited in another thread. Something like the crunching area. That would help others if they have a problem with a GPU card or want to talk about the performance of their system. To me it isn't news about the new bundling of work units or new updates being released. Then others can read the title of the thread and see what people are talking about instead of wading through every thread to find something. With the way the new WUs are bundled, it may have some unexpected impacts on performance and reliability. Likewise, being bundled, if a task fails, the failure is more significant, as it takes up to 4 additional tasks with it, depending on the bundled task's failure point. I get computation errors almost exclusively on the new WUs when I first start crunching for them, running 12 at the same time, 6 on each card, then almost none at all once the tasks have dispersed their start and stop times. It's a little disconcerting. Nvidia cards that are DP optimized (such as my Titan Black cards) have to crunch in parallel like this if they're to be significantly loaded and utilized. With the issues my setup seems to be having, it seems like the new WUs favor AMD cards a bit more, since they don't need to crunch as many tasks in parallel to load up their GPUs. But yeah, in general, I suppose people could make posts in other threads in other sections of the forum as well, if they felt like it. On the plus side, at least this provides feedback and related discussion in one easily accessible location. If you only want to follow Jake's posts, try here. → http://milkyway.cs.rpi.edu/milkyway/forum_user_posts.php?userid=792007 Cheers. |
32)
Message boards :
News :
Scheduled Maintenance Concluded
(Message 65910)
Posted 18 Nov 2016 by Wrend Post: I updated my Nvidia drivers to the newest version and gave my PC a reboot for some maintenance. I often have it up for weeks or even sometimes months at a time. When resetting and restarting the project, it is kind of strange that so many WUs would all error out at the same time, but once the workloads disperse and even out, errors become very rare. The top several errors in the picture below are all from the very first run after restarting the project. https://i.imgur.com/rKzVjcN.png |
33)
Message boards :
News :
Scheduled Maintenance Concluded
(Message 65888)
Posted 17 Nov 2016 by Wrend Post: The 1.43 work units seem to be running great now on my end (knock on wood). I haven't really changed anything from what I've previously tested recently. I did give the work units some time to settle and disperse, evening out CPU and GPU loads a bit more consistently as they complete at different times. I'm getting better and more consistent communication with the server now too. So... Who knows for sure the cause of these issues, but work units seem to be working well for me now at least. YMMV and best of luck! |
34)
Message boards :
News :
Scheduled Maintenance Concluded
(Message 65886)
Posted 17 Nov 2016 by Wrend Post: Yeah, it sucks about the 390 cards not being able to run more than 1 Einstein task. The AMD cards do a better job going through the double precision workloads without having to run multiple at the same time, compared to the double precision optimized Nvidia cards, where, for example, I have to run 6 MW@H 1.43 WUs per GPU just to load them up, more if I want to keep them more fully loaded, which I don't. Here's my config file. (The CPU values are currently set to these since CPU usage is set to 67% in BOINC and I run 6 other CPU tasks from Einstein@Home on the 12 threads of my CPU.) C:\ProgramData\BOINC\projects\milkyway.cs.rpi.edu_milkyway\app_config.xml <app_config> You'll want to change these values to suit your own needs. |
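The body of the config file appears to have been lost in this archive; only the opening <app_config> tag survives. For reference, a BOINC app_config.xml that runs multiple MilkyWay tasks per GPU typically looks something like the sketch below (the app name and the fractional values are illustrative assumptions, not the poster's actual settings):

```xml
<!-- Illustrative BOINC app_config.xml sketch, not the poster's file.      -->
<!-- gpu_usage is the fraction of a GPU each task reserves: 0.16 allows   -->
<!-- roughly 6 concurrent tasks per card. cpu_usage is the CPU fraction   -->
<!-- budgeted per task. The app name "milkyway" is an assumption here.    -->
<app_config>
  <app>
    <name>milkyway</name>
    <gpu_versions>
      <gpu_usage>0.16</gpu_usage>
      <cpu_usage>0.05</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```

The file lives in the project's data directory (the path quoted in the post) and is picked up after telling the BOINC client to reread config files or restarting it.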
35)
Message boards :
News :
Scheduled Maintenance Concluded
(Message 65882)
Posted 17 Nov 2016 by Wrend Post:
Yeah, proportionally in total toward being done. It's easy to see when individual tasks start and stop as it is now though, moving between loading the GPU and CPU, which has been helpful for me in troubleshooting, watching 12 WU progress bars, 12 CPU thread loads, and two GPU loads. Perhaps it's more useful to know at a glance where the whole bundle is at; I'm just not sure that it actually is, though. Not up to me either way, of course. Just providing some feedback. |
36)
Message boards :
News :
Scheduled Maintenance Concluded
(Message 65880)
Posted 16 Nov 2016 by Wrend Post:
I agree that that aspect of it doesn't make much sense if you're trying to see at a glance when the whole WU will end. |
37)
Message boards :
News :
Scheduled Maintenance Concluded
(Message 65879)
Posted 16 Nov 2016 by Wrend Post:
Apologies. I guess I'm just being a little defensive from the frustration of trying to sort out the issues my PC seems to be having with these new bundled WUs. I do like them overall, though wish they behaved a little better for me. Best regards. |
38)
Message boards :
News :
Scheduled Maintenance Concluded
(Message 65873)
Posted 16 Nov 2016 by Wrend Post:
It doesn't exactly follow expectations, if that's what you mean, but I'm not sure that it necessarily should, given that it isn't actually running just one WU, but several, one at a time, separately from each other. |
39)
Message boards :
News :
Scheduled Maintenance Concluded
(Message 65872)
Posted 16 Nov 2016 by Wrend Post: Hey Everyone, On that note, I've done a bunch of testing on my end, and am still not sure what the exact issue is. I've loaded up the CPU and GPUs much more than BOINC does during these tests, Prime95 for a few hours and other GPU load and compute tests looking for any artifacts or similar. Everything seems to check out 100% so far. I haven't started crunching for MW@H again yet as I still want to do some more tests, but so far it was looking like the first several WUs would have computation errors as they would "finish" bundled tasks and change CPU and GPU loads at the same time as each other for a while. Once they start deviating from each other, they seem to stop erroring out. I'm not wanting to give out bad work, waste server resources, nor anything similar. Unless there is truly a shortage of WUs and it is determined that more people should get them as opposed to more capable computers which can turn them around faster, I'm not sure that discriminating against work per time should be a thing. Theoretically the good of the project would be to get the most amount of work done in the shortest amount of time overall. Discriminating against failed WUs or similar does make more sense to me though, as that might gunk up the works a bit, even though I might technically fall under this category for the time being. I can assure you that isn't my intent. Hopefully my comments in this thread have helped shed some light on these issues as well. Pointing fingers doesn't solve the problem. My computer has done a lot of good work for this project, and as mentioned, has been within the top 5 performing hosts in the not too distant past. Off the top of my head, I think it got up to 4th place, but had been within the top 10 for a few months. I hope to get it back up there again after I get these issues sorted out. Cheers. |
40)
Message boards :
News :
Scheduled Maintenance Concluded
(Message 65864)
Posted 16 Nov 2016 by Wrend Post:
To be fair, it's easier watching multiple things at once with the progress bar resetting. The only reason I see for changing this is for aesthetics, not serving any practical purpose that I'm aware of. Maybe to help avoid some confusion, though on the other hand it might also help illustrate how the WUs are running to people who aren't aware of them being bundled in this way. I understand some people prefer form over function though... ;) I'll make do either way. Cheers, guys. |
©2024 Astroinformatics Group