Welcome to MilkyWay@home

Out Of Work?

Vortac

Joined: 22 Apr 09
Posts: 95
Credit: 4,808,181,963
RAC: 0
Message 65369 - Posted: 2 Oct 2016, 18:52:00 UTC
Last modified: 2 Oct 2016, 19:02:43 UTC

Yes, it got really bad again - I am not getting ANY work, neither CPU nor GPU. Only N-body tasks are available, apparently.
Vortac

Joined: 22 Apr 09
Posts: 95
Credit: 4,808,181,963
RAC: 0
Message 65377 - Posted: 4 Oct 2016, 13:33:51 UTC
Last modified: 4 Oct 2016, 13:34:15 UTC

Getting plenty of work now. I hope it's fixed for good.
Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist

Joined: 25 Feb 13
Posts: 580
Credit: 94,200,158
RAC: 0
Message 65378 - Posted: 4 Oct 2016, 15:42:25 UTC

Hey Everyone,

I did some work tuning the database yesterday to improve insert query times for the work unit generator, after determining that this query was the bottleneck in work unit generation. It seems to have vastly improved work unit availability. If you guys are still running out of work units, please let me know.

Jake
Mumak

Joined: 8 Apr 13
Posts: 89
Credit: 517,085,245
RAC: 0
Message 65382 - Posted: 4 Oct 2016, 17:57:27 UTC

Today it works great. WUs are properly supplied, so all my GPUs are fully utilized.
Werkstatt

Joined: 19 Feb 08
Posts: 350
Credit: 141,284,369
RAC: 0
Message 65383 - Posted: 4 Oct 2016, 19:41:35 UTC - in response to Message 65378.  

Hey Everyone,

I did some work tuning the database yesterday to improve insert query times for the work unit generator, after determining that this query was the bottleneck in work unit generation. It seems to have vastly improved work unit availability. If you guys are still running out of work units, please let me know.

Jake


Hi Jake,

Thanks for the work, it looks good now.
Please let me remind you of the HD 4850 issue: these cards have not received work for about 2 weeks, although they worked fine before.
BOINC reports these cards as:
04.10.2016 21:35:06 | | OpenCL: AMD/ATI GPU 0: ATI Radeon HD 4700/4800 (RV740/RV770) (driver version CAL 1.4.1734, device version OpenCL 1.0 AMD-APP (937.2), 512MB, 480MB available, 2080 GFLOPS peak)
Peter Kistler

Joined: 21 Feb 14
Posts: 1
Credit: 16,269,486
RAC: 2,796
Message 65388 - Posted: 5 Oct 2016, 18:31:08 UTC
Last modified: 5 Oct 2016, 19:16:49 UTC

Update from me: a stupid MS update kicked out OpenCL. Reinstalling the driver fixed it. Now my GPU is getting work units again!
---------------------

Still no work units provided for NVIDIA GPU.

Or do I have a problem with the driver?
05.10.2016 20:35:26 | | CUDA: NVIDIA GPU 0: GeForce GT 630 (driver version 372.90, CUDA version 8.0, compute capability 3.0, 2048MB, 1708MB available, 336 GFLOPS peak)
Mumak

Joined: 8 Apr 13
Posts: 89
Credit: 517,085,245
RAC: 0
Message 65394 - Posted: 6 Oct 2016, 7:34:33 UTC

It was working great, but now I'm sometimes getting:
Milkyway@Home 06-Oct-16 9:10:45 Server can't open database


That causes the client to back off and run out of tasks again.
Can you please check this?
Vortac

Joined: 22 Apr 09
Posts: 95
Credit: 4,808,181,963
RAC: 0
Message 65395 - Posted: 6 Oct 2016, 8:27:50 UTC

Yes, it happens on my machines too:

06/10/2016 10:12:34 | Milkyway@Home | Requesting new tasks for CPU and AMD/ATI GPU
06/10/2016 10:12:45 | Milkyway@Home | Scheduler request completed: got 0 new tasks
06/10/2016 10:12:45 | Milkyway@Home | Server can't open database

After that error, communication with the server is deferred for 60 minutes. But when I force communication manually (even after only a minute), I get no errors and receive plenty of tasks. However, if the machine is unattended, the queue empties within 5 minutes and the GPUs then go idle for the next 55 minutes (or contact a secondary BOINC project, depending on the configuration). So a lot of computing cycles are lost on unattended machines due to this error.
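
To put rough numbers on the loss described above, here is a minimal sketch in Python. It only restates the arithmetic from the post (a 60-minute deferral and a queue that drains in about 5 minutes). The function name `idle_fraction` is made up for illustration and is not part of BOINC, and the sketch assumes no backup project picks up the slack.

```python
def idle_fraction(backoff_min: float, queue_min: float) -> float:
    """Fraction of the back-off window during which the GPU has no work.

    backoff_min: how long the client defers communication after the error.
    queue_min:   how long the tasks already on the host keep the GPU busy.
    """
    if backoff_min <= 0:
        raise ValueError("backoff_min must be positive")
    # Once the local queue is empty, the rest of the window is idle time.
    idle_min = max(0.0, backoff_min - queue_min)
    return idle_min / backoff_min


# Figures from the post: 60-minute deferral, queue drained in about 5 minutes.
print(f"{idle_fraction(60, 5):.0%} of the back-off window is idle")
```

With those figures, 55 of the 60 minutes are idle, which is why a manual update after one minute makes such a difference on an attended machine.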
wb8ili

Joined: 18 Jul 10
Posts: 76
Credit: 635,998,708
RAC: 0
Message 65396 - Posted: 6 Oct 2016, 14:12:55 UTC

Vortac has it 100% correct. I have the same.
Vortac

Joined: 22 Apr 09
Posts: 95
Credit: 4,808,181,963
RAC: 0
Message 65397 - Posted: 6 Oct 2016, 14:40:45 UTC
Last modified: 6 Oct 2016, 14:51:06 UTC

Getting no new workunits now

06/10/2016 16:39:09 | Milkyway@Home | update requested by user
06/10/2016 16:39:13 | Milkyway@Home | Sending scheduler request: Requested by user.
06/10/2016 16:39:13 | Milkyway@Home | Requesting new tasks for AMD/ATI GPU
06/10/2016 16:39:36 | Milkyway@Home | Scheduler request completed: got 0 new tasks
Werkstatt

Joined: 19 Feb 08
Posts: 350
Credit: 141,284,369
RAC: 0
Message 65398 - Posted: 6 Oct 2016, 16:01:00 UTC

Indeed, the behaviour has changed in the last few days.
I've increased the work buffer to 0.2 / 0.2, but that did not help.

Might it be possible that the work buffer is not estimated correctly because I run more than one WU at a time, so longer runtimes are reported?
0.4 days should be several hundred WUs, but this buffer was never filled. Is it possible that the scheduler calculates the number of required WUs from the last request of the account, not of the machine? That could explain why a mixed account with fast and slow machines does not work well.
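
As a rough check of the "several hundred WUs" figure, the sketch below works out how many tasks a 0.4-day buffer corresponds to. The runtimes (60 s for a task running alone, 120 s each when two run together) are hypothetical values chosen for illustration, and `tasks_to_fill_buffer` is a made-up helper. This is not a description of how the BOINC scheduler actually estimates work, only of the effect the question above is asking about.

```python
SECONDS_PER_DAY = 86_400


def tasks_to_fill_buffer(buffer_days, task_runtime_s, concurrent_tasks=1):
    """Number of tasks needed to keep one GPU busy for buffer_days.

    task_runtime_s is the wall-clock runtime of a single task, and
    concurrent_tasks is how many tasks run on the GPU at the same time.
    """
    if task_runtime_s <= 0 or concurrent_tasks < 1:
        raise ValueError("runtime must be positive and concurrency at least 1")
    buffer_s = round(buffer_days * SECONDS_PER_DAY)
    return int(buffer_s * concurrent_tasks // task_runtime_s)


# Hypothetical runtimes: 60 s per task alone, 120 s each when two run together.
alone = tasks_to_fill_buffer(0.4, 60)
paired = tasks_to_fill_buffer(0.4, 120, concurrent_tasks=2)
# What an estimate would give if it saw the longer runtime but ignored concurrency.
ignoring_concurrency = tasks_to_fill_buffer(0.4, 120)
print(alone, paired, ignoring_concurrency)
```

Under these assumed runtimes the buffer is 576 tasks either way, but an estimate that only sees the longer per-task runtime would request half as many.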
SuperSluether

Joined: 2 Jul 14
Posts: 15
Credit: 20,991,384
RAC: 46
Message 65399 - Posted: 6 Oct 2016, 16:05:25 UTC - in response to Message 65398.  

Considering "Milkyway@home" only has 82 tasks ready to send right now, and I grab 60 tasks every time I request more work, I'd say the work generator is either not keeping up, or something else is wrong.

Personally I think we're overloading the project. Trying to log in just to post here gave me an SQL error saying there were too many connections, and that no account with my e-mail address existed.
Rymorea

Joined: 6 Oct 14
Posts: 46
Credit: 20,017,425
RAC: 0
Message 65400 - Posted: 6 Oct 2016, 18:01:52 UTC
Last modified: 6 Oct 2016, 18:08:30 UTC

I think that after the news of Poem and A@H stopping their GPU projects, and with GPUgrid not supporting old and low-end GPUs, a lot of BOINC users came here to crunch GPU WUs. So the system is overloaded and the servers cannot handle this many requests.

I think the managers should try to rearrange the WU lengths and make the WUs a bit longer. My old GPU finishes one WU in about 33 seconds. Maybe more complex WUs with longer calculation times would relieve the servers, especially the SQL bottlenecks.

I don't like it, but I need to change my project priorities and start crunching prime number, Collatz and Asteroids as secondary projects. Don't say SETI: I have given up on it after crunching it since 1999.
Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist

Joined: 25 Feb 13
Posts: 580
Credit: 94,200,158
RAC: 0
Message 65403 - Posted: 6 Oct 2016, 19:20:24 UTC

Hey Everyone,

I am working on improving work unit generation. It's a matter of tuning the database to improve insert query times for the work unit generator. I had it running really fast yesterday, but then there were some connection issues later in the day. I am working on it though.

Jake
Vortac

Joined: 22 Apr 09
Posts: 95
Credit: 4,808,181,963
RAC: 0
Message 65404 - Posted: 7 Oct 2016, 15:41:28 UTC

Feedback for today: varying between database errors, no tasks available and getting plenty of tasks.
Rymorea

Joined: 6 Oct 14
Posts: 46
Credit: 20,017,425
RAC: 0
Message 65405 - Posted: 7 Oct 2016, 15:56:15 UTC - in response to Message 65404.  
Last modified: 7 Oct 2016, 16:05:25 UTC

Feedback for today: varying between database errors, no tasks available and getting plenty of tasks.


Same for me. I get some work, and a minute later the 60-minute countdown starts again. BOINC switches to the secondary projects and fetches more units from them than from MilkyWay, so after the 60 minutes BOINC does not get new WUs from MilkyWay because it already has a lot of WUs from other projects.

@Jake Weiss
You should really think about WU length. Maybe merge 5-10 WUs into one WU. I know it would take real programming work, but it would relieve SQL. Also, when I look at the server statistics, a lot of WUs are waiting for validation. Maybe SQL can't respond because of the validation queue.

I hope you will find a solution soon.
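
For a sense of scale, the sketch below estimates how many results a single GPU would report per day if WUs were merged as suggested. It uses the roughly 33 seconds per WU mentioned earlier in the thread and assumes, hypothetically, that a merged WU takes proportionally longer and that the GPU runs one WU at a time around the clock. `results_per_day` is a made-up helper, not a BOINC function.

```python
SECONDS_PER_DAY = 86_400


def results_per_day(task_runtime_s, merge_factor=1):
    """Results one GPU reports per day when merge_factor WUs are bundled.

    Assumes a merged WU runs merge_factor times as long as a single WU.
    """
    if task_runtime_s <= 0 or merge_factor < 1:
        raise ValueError("runtime must be positive and merge_factor at least 1")
    return SECONDS_PER_DAY / (task_runtime_s * merge_factor)


# 33 s per WU is the figure quoted in the thread; 5 and 10 are the
# merge factors proposed above.
for merge_factor in (1, 5, 10):
    print(merge_factor, round(results_per_day(33, merge_factor)))
```

If each reported result costs the database a similar amount of work, merging 5-10 WUs would cut the per-host load on it by the same factor.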
Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist

Joined: 25 Feb 13
Posts: 580
Credit: 94,200,158
RAC: 0
Message 65406 - Posted: 7 Oct 2016, 16:28:46 UTC

Rymorea,

There was an idea a little over a year ago that would have done something similar to what you suggest, but I do not believe Travis ever ended up finishing the implementation. If I could simply merge work units I would, but like you said, it would take a lot of programming to get it working (and probably a restructuring of the database tables). I think this is a problem I can solve by tuning the database; it will just take a couple of days of testing different settings.

Jake
Vortac

Joined: 22 Apr 09
Posts: 95
Credit: 4,808,181,963
RAC: 0
Message 65407 - Posted: 8 Oct 2016, 14:56:02 UTC

Feedback for today: smoothest sailing ever. Plenty of work all the time, no database errors. Even browsing this website feels snappier. Turned off my BOINC backup project and raised the clocks on my GPUs. Full steam ahead.
Vortac

Joined: 22 Apr 09
Posts: 95
Credit: 4,808,181,963
RAC: 0
Message 65411 - Posted: 9 Oct 2016, 15:53:01 UTC

Feedback for today: couple of database errors. Forcing a manual update fetches plenty of work immediately.
Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist

Joined: 25 Feb 13
Posts: 580
Credit: 94,200,158
RAC: 0
Message 65412 - Posted: 10 Oct 2016, 4:03:00 UTC

Vortac,

Glad you're getting plenty of work units. The database errors seem to occur when too many people request work at the same time and the database can't handle that many connections. I tried increasing the number of connections it can handle last week, but that slowed down individual queries too much (not sure why yet, still looking into it). For now, I am leaving it running faster with the occasional error instead of terribly slow. The errors are nothing catastrophic; they should just result in you having to wait a minute to refill on work units (of which there now always seem to be enough).

Thank you for your feedback.

Jake

©2024 Astroinformatics Group