New Separation Runs


Message boards : News : New Separation Runs

Profile Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Send message
Joined: 25 Feb 13
Posts: 511
Credit: 39,934,674
RAC: 60,750

Message 68015 - Posted: 16 Jan 2019, 15:48:44 UTC

Hey Everyone,

Just wanted you all to know I put up some new separation runs. These runs are back to fitting 3 streams and are bundled in groups of 5. The names of these runs are:

de_modfit_80_bundle5_3s_NoContraintsWithDisk200_1
de_modfit_81_bundle5_3s_NoContraintsWithDisk200_1
de_modfit_82_bundle5_3s_NoContraintsWithDisk200_1
de_modfit_83_bundle5_3s_NoContraintsWithDisk200_1
de_modfit_84_bundle5_3s_NoContraintsWithDisk200_1
de_modfit_85_bundle5_3s_NoContraintsWithDisk200_2
de_modfit_86_bundle5_3s_NoContraintsWithDisk200_1

If you have any trouble with these runs, please let me know.

Thank you all for your continued support.

Jake

Rantanplan
Send message
Joined: 19 Aug 11
Posts: 31
Credit: 25,146,681
RAC: 24,511

Message 68019 - Posted: 16 Jan 2019, 22:21:23 UTC - in response to Message 68015.

OK, I got screwed. Watch my results.

Profile Finn the Human
Avatar
Send message
Joined: 23 Dec 18
Posts: 6
Credit: 1,411,400
RAC: 26,520

Message 68020 - Posted: 17 Jan 2019, 6:33:58 UTC - in response to Message 68019.

Hmm. I see 16 compute errors (all GPU WUs) on your computer with the GTX 670, and half of them failed within 3 seconds of starting. I have yet to see any task fail on your RX 560 or on my GPUs. It might just be your GTX 670 acting up.
____________
"Will happen, happening happened, [and] will happen again and again 'cause you and I will always be back then." - BMO

Rantanplan
Send message
Joined: 19 Aug 11
Posts: 31
Credit: 25,146,681
RAC: 24,511

Message 68021 - Posted: 17 Jan 2019, 10:11:04 UTC - in response to Message 68020.
Last modified: 17 Jan 2019, 10:11:17 UTC

I updated the driver to the newest 417.71 release. That worked well.

Max_Pirx
Send message
Joined: 13 Dec 17
Posts: 7
Credit: 210,681,943
RAC: 696,312

Message 68023 - Posted: 17 Jan 2019, 11:51:07 UTC

Hello,

For a while I've been noticing that my GPUs pause between the individual tasks in a bundle. I'm running 6 Radeon HD 5850/5870s on Win7. On some of the GPUs these pauses are longer (up to a minute sometimes), on others shorter. But now, with the new 5-bundles, a couple of my GPUs make enormous pauses at 20%, 40%, etc., and that more than doubles the completion time. Effectively, the GPUs spend more time idling between the tasks in the bundle than actually computing, and the idle time is still counted as computation when calculating the credit.
Is there anything I can do to minimise that?

xtatuk
Send message
Joined: 16 Jul 18
Posts: 2
Credit: 1,611,723
RAC: 509

Message 68024 - Posted: 17 Jan 2019, 12:05:55 UTC

Too many problems as a BOINC user with the MilkyWay database, homepage, and BOINC, so I have stopped contributing completely. No answers from the admins to support questions about the database problems, and no news on the message boards about the database situation and when it will be solved.

Profile Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Send message
Joined: 25 Feb 13
Posts: 511
Credit: 39,934,674
RAC: 60,750

Message 68025 - Posted: 17 Jan 2019, 15:09:09 UTC

Hi Max,

There are necessary bookkeeping tasks that have to be performed between workunit runs. These include doing the final parts of our likelihood calculation on all of the stars (after the GPU computes our large integral), recording the results of the current run, cleaning up data from the current run, and formatting data for the next run. Unfortunately, this does take some time to complete. If it makes you feel any better, these same tasks would also be performed during a normal single workunit, so overall runtime shouldn't differ between 5 single workunits and 5 bundled workunits. The different run times you are seeing for these final parts likely have to do with the different number of stars in different jobs, not with your GPUs.

xtatuk,

I understand your frustration with the database; please understand that we are frustrated with it too. Unfortunately, it is not easy to fix. We are working on making the error messages a little more graceful and on improving the caching of our data-driven web pages. This should make our website run more smoothly when the database is under heavy load from work requests, and it should allow the database to prioritize handling workunits under that load. I doubt this will solve the problem entirely, but we are slowly improving the stability of the database through incremental changes.

Thank you all for your continued support.

Jake

mmonnin
Send message
Joined: 2 Oct 16
Posts: 110
Credit: 100,001,105
RAC: 504,531

Message 68027 - Posted: 17 Jan 2019, 15:52:37 UTC - in response to Message 68023.

Hello,

For a while I've been noticing that my GPUs pause between the individual tasks in a bundle. I'm running 6 Radeon HD 5850/5870s on Win7. On some of the GPUs these pauses are longer (up to a minute sometimes), on others shorter. But now, with the new 5-bundles, a couple of my GPUs make enormous pauses at 20%, 40%, etc., and that more than doubles the completion time. Effectively, the GPUs spend more time idling between the tasks in the bundle than actually computing, and the idle time is still counted as computation when calculating the credit.
Is there anything I can do to minimise that?


Run multiple tasks at once to keep the GPU load high. I run 4x on my 280X.
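
For anyone who hasn't set this up: running several WUs per GPU is done with an `app_config.xml` in the project's data directory. A minimal sketch (the `milkyway` app name is an assumption; check the actual app name in your `client_state.xml`):

```xml
<!-- app_config.xml — place in the milkyway.cs.rpi.edu project folder -->
<app_config>
  <app>
    <name>milkyway</name>
    <gpu_versions>
      <!-- 0.25 = run 4 tasks per GPU; reserve a quarter CPU core each -->
      <gpu_usage>0.25</gpu_usage>
      <cpu_usage>0.25</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
```

Then use Options → Read config files in BOINC Manager (or restart the client) to apply it.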

Max_Pirx
Send message
Joined: 13 Dec 17
Posts: 7
Credit: 210,681,943
RAC: 696,312

Message 68030 - Posted: 17 Jan 2019, 17:03:54 UTC

Thanks for the replies.
I am not particularly bothered about the pauses, since the WUs are completing OK and without errors. I was rather curious whether something was wrong on my part, but if this is normal then that's OK.

And I am running 4 WUs concurrently per GPU; that helps, but on a couple of the GPUs the pauses are still noticeable.

Profile Corla99 [Lombardia]
Send message
Joined: 9 Jan 17
Posts: 1
Credit: 10,090,996
RAC: 18,133

Message 68031 - Posted: 17 Jan 2019, 18:46:27 UTC

I have 2 PCs active on this project, one with a Vega 56 and one with a GTX 1080 Ti.

Most of today's WUs are in "validation inconclusive".
In total I have 167 with the Nvidia GPU and 266 with the Vega.


I run only one WU per GPU, and I don't have a dedicated core for the WUs.
The other projects that run on those rigs are WCG (cpu) and WUProp (nci)

bluestang
Send message
Joined: 13 Oct 16
Posts: 58
Credit: 165,012,880
RAC: 386,206

Message 68032 - Posted: 17 Jan 2019, 19:52:17 UTC
Last modified: 17 Jan 2019, 19:53:57 UTC

@Corla99 [Lombardia], "validation inconclusive" means nothing; they will validate eventually. Think of them as pending.


@Jake, I think these "bundle5_3s" are a much better approach, especially when running concurrent WUs. The "bundle6_2s" take longer and pay less than the "bundle4_4s", which makes no sense. So these seem about right as far as runtime and credit received.

As far as the database/server issues go: if you increase the number of allowed "In Progress" tasks, that might solve a lot of issues for people running out of work. If the DB is causing WUs not to be issued/downloaded because BOINC can't connect for, let's say, 2 hours, then allow the "In Progress" amount to cover somewhere around 2 hours of work. 80 WUs max is too little; I think 200 is a good amount to start testing with. No???
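
For reference, the per-host limit being discussed is a server-side setting in the BOINC scheduler configuration. A rough sketch of the relevant knobs (the specific values are just the ones suggested above, and MilkyWay's actual config may differ):

```xml
<!-- Fragment of the project's server config.xml (scheduler options) -->
<config>
  <!-- cap on tasks a single host may have in progress -->
  <max_wus_in_progress>200</max_wus_in_progress>
  <!-- separate, higher cap per GPU -->
  <max_wus_in_progress_gpu>600</max_wus_in_progress_gpu>
</config>
```

Only the project admins can change this; volunteers can't raise it from the client side.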

Thanks,
blue

Profile Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Send message
Joined: 25 Feb 13
Posts: 511
Credit: 39,934,674
RAC: 60,750

Message 68033 - Posted: 17 Jan 2019, 21:44:30 UTC

Hey Bluestang,

The number of workunits we bundle is directly related to the number of streams in our model. More streams means more parameters to fit on the command line and a longer integral time, because the model is more complex. These runs are scientifically motivated, and sometimes we have to run them to get the science we need done. We try to compensate for the difference in computation time by changing the number of credits given for each. Unfortunately, our credit algorithm does not always compensate perfectly for the differing computation times on all machines. Since we do not run these types of runs often, we have not worked on refining our credit allocation for them.

Everyone,

I am about to try something to better micromanage GPU vs CPU workunits. Not sure if it will work. I will make a separate thread about this.

Jake

bluestang
Send message
Joined: 13 Oct 16
Posts: 58
Credit: 165,012,880
RAC: 386,206

Message 68037 - Posted: 18 Jan 2019, 1:26:12 UTC - in response to Message 68033.
Last modified: 18 Jan 2019, 1:56:54 UTC

Thank you for explaining more and trying to find a solution. Always nice to see project people listening and working to better the project and helping users/volunteers.

Cheers!

EDIT: Although, I'm not quite sure about the 600 max limit for GPUs. What if someone has a fast multi-GPU setup? Just asking :)

Dakota
Send message
Joined: 7 Jan 16
Posts: 2
Credit: 362,071
RAC: 2,483

Message 68073 - Posted: 27 Jan 2019, 10:00:36 UTC

I'm not sure; for me, completing a task on the GPU has become more CPU-intensive. Fermi is an old architecture, but it generally has good FP64 figures.

bluestang
Send message
Joined: 13 Oct 16
Posts: 58
Credit: 165,012,880
RAC: 386,206

Message 68074 - Posted: 28 Jan 2019, 14:31:52 UTC - in response to Message 68073.

I'm not sure; for me, completing a task on the GPU has become more CPU-intensive. Fermi is an old architecture, but it generally has good FP64 figures.


OpenCL on Nvidia needs more CPU than on AMD, if I'm not mistaken. Nvidia isn't optimized for OpenCL, because they want people to use their proprietary CUDA instead.

Robby1959
Send message
Joined: 1 Feb 13
Posts: 46
Credit: 46,367,336
RAC: 112,833

Message 68141 - Posted: 12 Feb 2019, 2:55:14 UTC

Are these runs way longer?? I went from running 2 GPU units at 3 minutes each to running for hours and barely hitting the GPU.

mmonnin
Send message
Joined: 2 Oct 16
Posts: 110
Credit: 100,001,105
RAC: 504,531

Message 68148 - Posted: 12 Feb 2019, 11:49:33 UTC - in response to Message 68141.

Are these runs way longer?? I went from running 2 GPU units at 3 minutes each to running for hours and barely hitting the GPU.


That's a driver crash, not a task change. Restart your PC.






Copyright © 2019 AstroInformatics Group