Out Of Work?
Author | Message |
---|---|
Send message Joined: 22 Apr 09 Posts: 95 Credit: 4,808,181,963 RAC: 0 |
Yes, it got really bad again. I am not getting ANY work, neither CPU nor GPU. Only N-body tasks are available, apparently. |
Send message Joined: 22 Apr 09 Posts: 95 Credit: 4,808,181,963 RAC: 0 |
Getting plenty of work now. I hope it's fixed for good. |
Send message Joined: 25 Feb 13 Posts: 580 Credit: 94,200,158 RAC: 0 |
Hey Everyone, I did some work tuning the database yesterday to improve insert query times for the work unit generator, after determining that this query was the bottleneck in work unit generation. It seems to have vastly improved work unit availability. If you guys are still running out of work units, please let me know. Jake |
Send message Joined: 8 Apr 13 Posts: 89 Credit: 517,085,245 RAC: 0 |
Today it works great. WUs are properly supplied, so all my GPUs are fully utilized. |
Send message Joined: 19 Feb 08 Posts: 350 Credit: 141,284,369 RAC: 0 |
Hi Jake, thanks for the work, it looks good now. Please let me remind you of the HD4850 issue: these cards have not received work for about 2 weeks, and they worked fine before. BOINC reports these cards as: 04.10.2016 21:35:06 | | OpenCL: AMD/ATI GPU 0: ATI Radeon HD 4700/4800 (RV740/RV770) (driver version CAL 1.4.1734, device version OpenCL 1.0 AMD-APP (937.2), 512MB, 480MB available, 2080 GFLOPS peak) |
Send message Joined: 21 Feb 14 Posts: 1 Credit: 16,850,808 RAC: 19 |
Update from me: a stupid MS update kicked out OpenCL. Reinstalling the driver fixed it. Now my GPU is getting work units again! --------------------- Still no work units provided for my NVIDIA GPU. Or do I have a problem with the driver? 05.10.2016 20:35:26 | | CUDA: NVIDIA GPU 0: GeForce GT 630 (driver version 372.90, CUDA version 8.0, compute capability 3.0, 2048MB, 1708MB available, 336 GFLOPS peak) |
Send message Joined: 8 Apr 13 Posts: 89 Credit: 517,085,245 RAC: 0 |
It was working great, but now I'm sometimes getting: Milkyway@Home 06-Oct-16 9:10:45 Server can't open database That causes the client to back off and run out of tasks again. Can you please check this? |
Send message Joined: 22 Apr 09 Posts: 95 Credit: 4,808,181,963 RAC: 0 |
Yes, it happens on my machines too: 06/10/2016 10:12:34 | Milkyway@Home | Requesting new tasks for CPU and AMD/ATI GPU 06/10/2016 10:12:45 | Milkyway@Home | Scheduler request completed: got 0 new tasks 06/10/2016 10:12:45 | Milkyway@Home | Server can't open database After that error, communication with the server is deferred for 60 minutes. But when I force communication manually (even after only a minute), I get no errors and receive plenty of tasks. However, if the machine were unattended, the queue would empty within 5 minutes and the GPUs would then go idle for the next 55 minutes (or contact a secondary BOINC project, depending on the configuration). So a lot of computing cycles are lost on unattended machines due to this error. |
Send message Joined: 18 Jul 10 Posts: 76 Credit: 636,452,055 RAC: 26,116 |
Vortac has it 100% correct. I have the same. |
Send message Joined: 22 Apr 09 Posts: 95 Credit: 4,808,181,963 RAC: 0 |
Getting no new workunits now 06/10/2016 16:39:09 | Milkyway@Home | update requested by user 06/10/2016 16:39:13 | Milkyway@Home | Sending scheduler request: Requested by user. 06/10/2016 16:39:13 | Milkyway@Home | Requesting new tasks for AMD/ATI GPU 06/10/2016 16:39:36 | Milkyway@Home | Scheduler request completed: got 0 new tasks |
Send message Joined: 19 Feb 08 Posts: 350 Credit: 141,284,369 RAC: 0 |
Indeed, the behaviour has changed in the last few days. I've increased the work buffer to 0,2 / 0,2, but that did not help. Could it be that the work buffer is not estimated correctly because more than one WU is running at a time and longer runtimes are reported? 0,4 days should be some hundred WUs, and this buffer was never filled up. Is it possible that the scheduler calculates the number of required WUs from the last request of the account, not the machine? This could explain why a mixed account with fast and slow machines does not work well. |
Send message Joined: 2 Jul 14 Posts: 15 Credit: 20,991,384 RAC: 0 |
Considering "Milkyway@home" only has 82 tasks ready to send right now, and I grab 60 tasks every time I request more work, I'd say the work generator is either not keeping up, or something else is wrong. Personally I think we're overloading the project. Trying to log in just to post here gave me an SQL error saying there were too many connections, and that no account with my e-mail address existed. |
Send message Joined: 6 Oct 14 Posts: 46 Credit: 20,017,425 RAC: 0 |
I think that after the shutdown news from Poem, A@H stopping its GPU project, and GPUGrid no longer supporting old and low-end GPUs, a lot of BOINC users came here to crunch GPU WUs. So the system has become overloaded and the servers cannot manage this many requests. I think the managers are trying to rearrange the WU lengths, and we are getting slightly longer WUs. My old GPU finished 1 WU in nearly 33 seconds. Maybe more complex WUs with longer calculation times would relieve the servers, especially the SQL bottlenecks. I don't like it, but I need to change project priorities and start crunching prime numbers, Collatz and Asteroids as secondary projects. Don't say SETI; I gave up crunching it, which I had done since 1999. |
Send message Joined: 25 Feb 13 Posts: 580 Credit: 94,200,158 RAC: 0 |
Hey Everyone, I am working on improving work unit generation. It's a matter of tuning the database to improve insert query times on the work unit generator. I had it running really fast yesterday, but then there were some connection issues later in the day. I am working on it though. Jake |
Send message Joined: 22 Apr 09 Posts: 95 Credit: 4,808,181,963 RAC: 0 |
Feedback for today: varying between database errors, no tasks available and getting plenty of tasks. |
Send message Joined: 6 Oct 14 Posts: 46 Credit: 20,017,425 RAC: 0 |
"Feedback for today: varying between database errors, no tasks available and getting plenty of tasks." Same for me too. I get some, and one minute later the 60-minute countdown starts again. BOINC switches to the secondary projects and gets more units from them than from Milkyway, so after 60 minutes BOINC does not fetch new WUs from Milkyway because it already has a lot of WUs from other projects. @Jake Weiss You should really think about WU length. Maybe merge 5-10 WUs into one WU. I know it raises real programming problems, but it would relieve SQL. Also, when I look at the server statistics, a lot of WUs are waiting for validation. Maybe SQL can't respond because of the validation queue. I hope you will find a solution soon. |
Send message Joined: 25 Feb 13 Posts: 580 Credit: 94,200,158 RAC: 0 |
Rymorae, There was an idea a little over a year ago that would have done something similar to what you suggest, but I do not believe Travis ever ended up finishing the implementation. If I could simply merge work units I would, but like you said it would take a lot of programming to get it working (and probably a restructuring of the database tables). I think this is a problem I can solve by tuning the database; it will just take a couple of days of testing different settings. Jake |
Send message Joined: 22 Apr 09 Posts: 95 Credit: 4,808,181,963 RAC: 0 |
Feedback for today: smoothest sailing ever. Plenty of work all the time, no database errors. Even browsing this website feels snappier. Turned off my BOINC backup project and raised the clocks on my GPUs. Full steam ahead. |
Send message Joined: 22 Apr 09 Posts: 95 Credit: 4,808,181,963 RAC: 0 |
Feedback for today: a couple of database errors. Forcing a manual update fetches plenty of work immediately. |
Send message Joined: 25 Feb 13 Posts: 580 Credit: 94,200,158 RAC: 0 |
Vortac, Glad you're getting plenty of work units. The database errors seem to occur when too many people request work at the same time; the server can't handle that many connections. I tried increasing its ability to handle more connections last week, but it slowed down individual queries too much (not sure why, still looking into that). For now, I am leaving it running faster with the occasional error instead of terribly slow. The errors are nothing catastrophic and should just result in you having to wait a minute to refill on work units (which it always seems to have enough of now). Thank you for your feedback. Jake |
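
To put rough numbers on the back-off problem discussed in this thread: Vortac reports a queue that drains in about 5 minutes and a 60-minute deferral after a "Server can't open database" reply, while a manual retry after about a minute usually succeeds. The Python sketch below only works through that arithmetic. The function names are made up for illustration, and it assumes the GPU sits idle for the whole gap with no backup project configured; nothing here is taken from the BOINC client itself.

```python
def idle_minutes(deferral_min: float, queue_min: float) -> float:
    """Minutes a GPU sits idle after one failed scheduler request.

    Assumption: the local queue holds queue_min minutes of work when the
    error occurs, and no backup project picks up the slack.
    """
    return max(0.0, deferral_min - queue_min)


def idle_fraction(deferral_min: float, queue_min: float) -> float:
    """Share of the deferral window that is spent idle."""
    if deferral_min <= 0:
        return 0.0
    return idle_minutes(deferral_min, queue_min) / deferral_min


if __name__ == "__main__":
    # Figures from the thread: 5-minute queue, 60-minute deferral,
    # compared with a retry after roughly one minute.
    for deferral in (60, 1):
        lost = idle_minutes(deferral, 5)
        share = idle_fraction(deferral, 5)
        print(f"deferral {deferral:>2} min: idle {lost:.0f} min ({share:.0%})")
```

With the thread's figures, a 60-minute deferral leaves 55 idle minutes per error, while a retry after one minute leaves none, which matches the difference between unattended machines and machines where the update is forced by hand.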