Welcome to MilkyWay@home

New Nbody Run

Message boards : News : New Nbody Run
Colin Rice
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 5 Oct 11
Posts: 3
Credit: 485
RAC: 0
Message 53981 - Posted: 11 Apr 2012, 17:46:25 UTC
Last modified: 11 Apr 2012, 17:48:03 UTC

A new NBody run has been posted. We are trying to determine the initial dark matter distribution of some test data. There have been some improvements to our model which should hopefully result in faster convergence times. The run is titled nbody_100K_Plum_EMD. All previous Nbody runs have been taken down.

John Black
Joined: 3 May 10
Posts: 74
Credit: 1,532,760
RAC: 0
Message 53983 - Posted: 11 Apr 2012, 19:49:16 UTC

Hi Colin,

If you look at the Number Crunching MB you will see that greg_be and I have had calculation problems with some of these.

Let me know if these are just the odd glitch or if we have a systemic problem.

Thanks
John

Robert Gammon
Joined: 29 Nov 10
Posts: 4
Credit: 4,783,425
RAC: 0
Message 53985 - Posted: 11 Apr 2012, 21:48:42 UTC - in response to Message 53981.  

I have one of these now. All other runs on this machine take 2.x hours max to run. This one has a forecast run time of 42 hours.

John Black
Joined: 3 May 10
Posts: 74
Credit: 1,532,760
RAC: 0
Message 53997 - Posted: 12 Apr 2012, 21:16:10 UTC - in response to Message 53985.  

Hi Robert,

I am not sure if your problem is the same as mine, as my WUs don't take that long.

If you look at the Number Crunching MB in a thread started by greg_be then you will see the details of the error sequence.

John

YukonMan
Joined: 22 Mar 12
Posts: 1
Credit: 389,408
RAC: 0
Message 54087 - Posted: 21 Apr 2012, 2:11:17 UTC

I haven't received any new downloads for this project in days now. Any idea what the problem is? Is it something I'm doing wrong?
____________
YukonMan.com

Jake on Space Coast
Joined: 18 Feb 12
Posts: 8
Credit: 10,392,722
RAC: 0
Message 54088 - Posted: 21 Apr 2012, 2:36:54 UTC
Last modified: 21 Apr 2012, 2:41:07 UTC

Your deadlines are 11 days out, while apps with deadlines 4 days out aren't getting a single CPU cycle. If your 4 apps continue to hog all 4 CPUs on this machine, you'll find the MIPS I donate no longer available to you. Simple, guys. Back it off, and quickly. You have 16 elapsed hours from the time of this post.

greg_be
Joined: 18 Aug 09
Posts: 122
Credit: 20,669,525
RAC: 7,418
Message 54098 - Posted: 21 Apr 2012, 20:15:10 UTC
Last modified: 21 Apr 2012, 20:18:21 UTC

Your new Nbody_110K_Plum_xxxxxxxx burns through my system in 1:33 (1 minute 33 seconds).
What kind of data are you running that disappears that quickly?

Also, why is it using all 4 cores of my CPU and running in high priority? It's not due until May 3, for crying out loud.

Once again I am disabling Nbody because it is just buggy when it comes to running on my CPU-only system.

floyd
Joined: 13 Sep 11
Posts: 17
Credit: 3,263,835
RAC: 0
Message 54099 - Posted: 22 Apr 2012, 1:39:57 UTC - in response to Message 54088.  

If your 4 apps continue to hog all 4 CPUs on this machine, you'll find the MIPS I donate no longer available to you. Simple, guys. Back it off, and quickly. You have 16 elapsed hours from the time of this post.


First of all, CALM DOWN!

I think this is caused by a bug that was introduced with the new dont_use_dcf option in BOINC 7.0.24. It makes the client assume that all tasks are about to miss their deadline, so it runs them in high-priority mode. This should affect all tasks of all projects that set dont_use_dcf.

By the way, if I understand correctly how dont_use_dcf works, one should try to avoid using an app_info.xml on such projects, bug or not.

greg_be
Joined: 18 Aug 09
Posts: 122
Credit: 20,669,525
RAC: 7,418
Message 54102 - Posted: 22 Apr 2012, 6:33:02 UTC - in response to Message 54099.  

For those of us not experienced in editing or removing sections of the BOINC Mgr program, how do you change whatever is causing this problem? On the other hand 3 other projects do not have this problem and the PS tasks also do not have this problem. So why do the Nbody tasks have this issue? Why not modify whatever the offending code is in the task rather than have us novices go monkeying about in the programs behind BOINC Mgr?

floyd
Joined: 13 Sep 11
Posts: 17
Credit: 3,263,835
RAC: 0
Message 54103 - Posted: 22 Apr 2012, 9:20:24 UTC - in response to Message 54102.  

For those of us not experienced in editing or removing sections of the BOINC Mgr program, how do you change whatever is causing this problem?

You can't. The problem is in your local BOINC installation, and you'll have to recompile the code or find someone to do it for you. This is the patch I used:

diff -Naur boinc_core_release_7_0_25/client/cpu_sched.cpp boinc_core_release_7_0_25_patched/client/cpu_sched.cpp
--- boinc_core_release_7_0_25/client/cpu_sched.cpp      2012-04-14 10:02:12.000000000 +0200
+++ boinc_core_release_7_0_25_patched/client/cpu_sched.cpp      2012-04-22 02:14:30.000000000 +0200
@@ -470,7 +470,7 @@
 
         // treat projects with DCF>90 as if they had deadline misses
         //
-        if (!p->dont_use_dcf && p->duration_correction_factor < 90.0) {
+        if (p->dont_use_dcf || p->duration_correction_factor < 90.0) {
             if (p->rsc_pwf[rsc_type].deadlines_missed_copy <= 0) {
                 continue;
             }


This is part of the code where BOINC looks for tasks to run in high priority. Basically it means: if a project uses DCF (Duration Correction Factor) and the value is somewhat reasonable, consider leaving the task alone. Otherwise it's a candidate for special treatment. Unfortunately, if a project does not use DCF, the "leave it alone" part is never reached.
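
To make the effect of the one-line change concrete, here is a minimal standalone sketch. It is not the BOINC source: the `Project` struct and both function names are made up for illustration, and `deadlines_missed` merely stands in for `rsc_pwf[rsc_type].deadlines_missed_copy`. Only `dont_use_dcf`, the DCF field and the 90.0 threshold are taken from the patch above.

```cpp
// Hypothetical stand-in for the client's project record (illustration only).
struct Project {
    bool dont_use_dcf;                  // project opted out of DCF
    double duration_correction_factor;  // DCF value kept by the client
    int deadlines_missed;               // stand-in for deadlines_missed_copy
};

// Condition before the patch: true means "skip this project when looking
// for tasks to run in high priority".
constexpr bool left_alone_before_patch(const Project& p) {
    return !p.dont_use_dcf && p.duration_correction_factor < 90.0
           && p.deadlines_missed <= 0;
}

// Condition after the patch: projects that do not use DCF can be skipped too.
constexpr bool left_alone_after_patch(const Project& p) {
    return (p.dont_use_dcf || p.duration_correction_factor < 90.0)
           && p.deadlines_missed <= 0;
}

// Three example projects, none of which has missed a deadline.
constexpr Project no_dcf{true, 1.0, 0};      // sets dont_use_dcf
constexpr Project sane_dcf{false, 1.0, 0};   // uses DCF, reasonable value
constexpr Project huge_dcf{false, 120.0, 0}; // uses DCF, value above 90

// Checked at compile time, so the file only needs to compile.
static_assert(!left_alone_before_patch(no_dcf),
              "before the patch a dont_use_dcf project is never left alone");
static_assert(left_alone_after_patch(no_dcf),
              "after the patch it is left alone if no deadline was missed");
static_assert(left_alone_before_patch(sane_dcf) && left_alone_after_patch(sane_dcf),
              "projects with a reasonable DCF behave the same either way");
static_assert(!left_alone_before_patch(huge_dcf) && !left_alone_after_patch(huge_dcf),
              "projects with DCF >= 90 are still treated as deadline misses");
```

The only case that changes is the first one: a project that sets dont_use_dcf and has no missed deadlines, which appears to be the situation described in this thread.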

On the other hand 3 other projects do not have this problem and the PS tasks also do not have this problem. So why do the Nbody tasks have this issue?

If a project uses DCF, which I think most still do, you won't notice any of this. Look in your client_state.xml file: if you find a line reading "<dont_use_dcf/>", the project that it belongs to should be affected. The whole project. And I say "should" because Milkyway is the only one for me, too, so I can't verify this. But the PS tasks do have the problem; that's why I looked into this in the first place. I noticed both of my GPUs running nothing but Milkyway for half a day.
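
For anyone who would rather not read client_state.xml by eye, the check can be sketched as a small line scanner. This is only an illustration under assumptions: it supposes the flag sits on its own line inside a `<project>` block and that a `<project_name>` line comes before it, which may not hold for every client version. `tag_value` and `projects_without_dcf` are names invented here, not part of BOINC.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Returns the text between <tag> and </tag> on a single line, or "" if absent.
std::string tag_value(const std::string& line, const std::string& tag) {
    const std::string open = "<" + tag + ">";
    const std::string close = "</" + tag + ">";
    const std::string::size_type start = line.find(open);
    if (start == std::string::npos) return "";
    const std::string::size_type begin = start + open.size();
    const std::string::size_type end = line.find(close, begin);
    if (end == std::string::npos) return "";
    return line.substr(begin, end - begin);
}

// Scans state-file text line by line and collects the names of projects
// whose block contains the <dont_use_dcf/> flag.
std::vector<std::string> projects_without_dcf(std::istream& state) {
    std::vector<std::string> affected;
    std::string line;
    std::string current = "(unknown project)";
    while (std::getline(state, line)) {
        // A new project block starts: forget the previous name.
        if (line.find("<project>") != std::string::npos) {
            current = "(unknown project)";
        }
        const std::string name = tag_value(line, "project_name");
        if (!name.empty()) {
            current = name;
        }
        if (line.find("<dont_use_dcf/>") != std::string::npos) {
            affected.push_back(current);
        }
    }
    return affected;
}
```

To try it on a real installation, pass a `std::ifstream` opened on client_state.xml in the BOINC data directory; the location of that directory varies by platform.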

Why not modify whatever the offending code is in the task rather than have us novices go monkeying about in the programs behind BOINC Mgr?

The offending code is not in the task, nor is it with the project. It is at your end and you can't change it yourself. The project could probably work around it, but you would have to do that for every single project separately. It's more efficient to fix the problem where it is.

I'll look over at the BOINC site for a contact address; if I find one, I'll suggest changing this in the next release. But if somebody knows whom to inform, please feel free to point them here.

greg_be
Joined: 18 Aug 09
Posts: 122
Credit: 20,669,525
RAC: 7,418
Message 54107 - Posted: 22 Apr 2012, 15:08:09 UTC - in response to Message 54103.  

Interesting that PS tasks go bad on your GPU while Nbody tasks go crazy on my CPU.
PS works great for me.

But I am curious how the Nbody tasks that are out there now burn through the data so quickly. I thought they were supposed to run slower, since I thought they had a lot of data to go through.

greg_be
Joined: 18 Aug 09
Posts: 122
Credit: 20,669,525
RAC: 7,418
Message 54108 - Posted: 22 Apr 2012, 15:11:07 UTC - in response to Message 54103.  

Where is the file that this patch should be applied to? Is it only something for MWAH, or a file in the BOINC Mgr?

greg_be
Joined: 18 Aug 09
Posts: 122
Credit: 20,669,525
RAC: 7,418
Message 54122 - Posted: 24 Apr 2012, 7:10:03 UTC

Reverted back to an older version of BOINC to solve the problem.

Overtonesinger
Joined: 15 Feb 10
Posts: 63
Credit: 1,836,010
RAC: 0
Message 54125 - Posted: 24 Apr 2012, 10:14:31 UTC - in response to Message 54122.  

Reverted back to an older version of BOINC to solve the problem.


oh, really??? I must TRY this solution!

I hate it when GPU tasks are *PAUSED* while CPU-ONLY app "N-Body" is running in high-priority !
This is DEFINITELY a bug. Yes, BOINC Manager bug, probably.
I will downgrade BOINC immediately, NOW! :)

Thanx for this solution, greg_be !
Melwen - Child of the Fangorn Forest
Rig "BRISINGR" [ASUS G73-JH, i7 720QM 1.73, 4x2GB DDR3 1333 CL7, ATi HD5870M 1GB GDDR5],bought on 2011-02-24

Overtonesinger
Joined: 15 Feb 10
Posts: 63
Credit: 1,836,010
RAC: 0
Message 54132 - Posted: 24 Apr 2012, 18:11:12 UTC - in response to Message 54125.  

CONFIRMED - reverting back to BOINC ver. 6.12.34 solved the GPU issue!
(On BOINC ver. 7.xx, GPU tasks were NOT running while NBody went crazy on all 8 CPUs, taking 1 minute 30 seconds per WU while the estimate was 3 hours.)
- I need to find the DCF for this app and change it manually to something reasonable on my mobile CPU, a Core i7 @ 1.73 GHz :)

floyd
Joined: 13 Sep 11
Posts: 17
Credit: 3,263,835
RAC: 0
Message 54133 - Posted: 24 Apr 2012, 18:34:13 UTC

As I wrote, the bug was introduced with a change in 7.0.24, so previous releases are safe from it. Of course, older 7.0 versions can't be recommended, so a possible solution would be to switch back to BOINC 6 as you did. Now we all know that you can't simply downgrade from v7 to v6 (we have read the release notes, haven't we?), so here are instructions on how to do it.

Meanwhile the fix has been added to the source code, so it will be in the next BOINC release, whenever that may be. If you can wait for that, that's another possible solution.

greg_be
Joined: 18 Aug 09
Posts: 122
Credit: 20,669,525
RAC: 7,418
Message 54149 - Posted: 25 Apr 2012, 15:27:00 UTC - in response to Message 54133.  

That is true, you do lose all your work from v7 when you revert to v6. But then that is the price to pay to have control over your system again, unless you did what they recommended. I didn't let the work continue and just reverted back before all the tasks finished. Oh well. Someone else can have that headache for me. Happy with v6 now.

Colin Rice
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 5 Oct 11
Posts: 3
Credit: 485
RAC: 0
Message 54259 - Posted: 30 Apr 2012, 16:53:38 UTC

Sorry for not paying a lot of attention to this thread. I didn't realize people could comment on the news items.

John Black:
A couple of these runs have been bad because I messed up parameters, etc. I test them once they get on the server and pull the run if I messed up and need to republish it.

Robert Gammon:
These runs are unpredictable. It's very hard to predict the run times, so we are very generous to keep them from timing out. We are also using 100,000 particles to deal with the chaotic motion.

YukonMan:
We had an nbody generation bug with the nbody runs. It should be fixed.

Overtonesinger
Joined: 15 Feb 10
Posts: 63
Credit: 1,836,010
RAC: 0
Message 54419 - Posted: 15 May 2012, 13:25:27 UTC

Hello, Colin Rice. All is running fine here.

I am running nbody_100K_Plum_EMD and suddenly there is a limit of only one task for my computer (2.2 GHz P4 HT, model E2200 with 2 logical cores).
Yesterday there were 6 tasks. Did you change something? :)

I need at least two, so the second will start immediately without having to wait for the server to be contacted, respond, accept the completed task and finally send me a new one, plus a few seconds for the download of the new one. As it is, my computer cannot use 100 percent of its free CPU cycles; some of them are wasted. Why does the limit have to be 1 now? :(

P.S.
Finally, after several hundred units, the estimated time seems to be settling near the truth: it is now around 16 minutes while the real runtime is around 6 minutes. That's good. :)

Overtonesinger
Joined: 15 Feb 10
Posts: 63
Credit: 1,836,010
RAC: 0
Message 54420 - Posted: 15 May 2012, 13:36:38 UTC - in response to Message 54419.  

suddenly: there is limit of only one task for my computer

- The same happened on my Core i7 (1.73 GHz, 8 threads) notebook several days ago, also when the estimated time finally got near the truth: the real runtime is 1 minute and the estimate is 3 minutes. My settings still say to download work for 1 day. Same situation; I get this from the server: not sending work, reached limit on tasks in progress for computer. Hmmm, interesting...

This does not seem to be a BOINC bug. It is specific to the Milkyway project or server only. Can you see in the logs what seems to be the cause, please? Because it makes me wonder why a good estimated time causes a thing so bad!
Fascinating, isn't it?
I just can't get it. :)))
Overtonesinger