Welcome to MilkyWay@home

N-Body 1.18

Message boards : News : N-Body 1.18

Jake Bauer
Project developer
Project tester
Project scientist

Joined: 20 Aug 12
Posts: 66
Credit: 406,916
RAC: 0
Message 58530 - Posted: 6 Jun 2013, 18:05:10 UTC

I am releasing what will hopefully be the final update to the N-body code. It will be released as 1.18. You will see 1.14 and 1.16, and these will be outdated by tonight. I apologize for everything that has been happening with all of these updates, but things needed to be fixed. We will try to address multithreading issues with the following update.

Jake
ID: 58530 · Rating: 0
rjs5

Joined: 18 Nov 10
Posts: 18
Credit: 174,155,791
RAC: 46,252
Message 58542 - Posted: 7 Jun 2013, 18:53:12 UTC - in response to Message 58530.  

I have been wondering what happened to the multithreading operation on my machine. I thought my machine was configured incorrectly. Are there a lot of issues, and has someone summarized them somewhere? If you have time, I would be interested in knowing what the problems are.
thanks
ID: 58542 · Rating: 0
Jake Bauer
Project developer
Project tester
Project scientist

Joined: 20 Aug 12
Posts: 66
Credit: 406,916
RAC: 0
Message 58544 - Posted: 7 Jun 2013, 20:51:14 UTC - in response to Message 58542.  

I cannot explain what the cause of this problem is as I am still investigating, but many (so yes, it is widespread) users are having issues where N-body either:

A) does not multithread at all
B) does not multithread as much as it should

Since most of our development team is composed of students, many are not here right now. I am doing my best to address these concerns and will keep everyone updated.

Jake
ID: 58544 · Rating: 0
Richard Haselgrove

Joined: 4 Sep 12
Posts: 219
Credit: 456,474
RAC: 0
Message 58550 - Posted: 7 Jun 2013, 22:37:48 UTC - in response to Message 58544.  

Jake,

Are you aware of the private message conversation which I had with Jeff Thompson on 2nd April this year? If not, please read my side of the conversation, which I re-posted in public at message 58339 a couple of weeks ago.

It was clear from the conversation that Jeff had been misdirected by his supervisors/superiors as to the nature of the BOINC 'plan_class' mechanism under which the application had been deployed on the MilkyWay BOINC server. Please read my explanation, think about it, and read it again. It's clear that no-body (pun intended) at MilkyWay understands how to integrate N-Body into the BOINC infrastructure.

While composing this post, I re-activated host 479865 - Windows 7/64, running the current BOINC client v7.0.64

It has already returned two v1.18 tasks, showing the expected (by me)

Using OpenMP 1 max threads on a system with 4 processors

This is because you have not defined the app_version on the server to be a member of a plan_class with 'mt' in the name, and are not passing an appropriate --nthreads <cmdline> tag.
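For anyone who wants to check their own returned tasks for the same symptom, here is a minimal sketch that parses the thread-count line quoted above. It assumes the line keeps exactly the wording shown in this post; the function names are hypothetical and are not part of BOINC or the N-body application.

```python
import re

# Matches the thread-count line quoted above from a task's output.
# ASSUMPTION: the wording is the same in every application version.
_THREADS_LINE = re.compile(
    r"Using OpenMP (\d+) max threads on a system with (\d+) processors"
)


def parse_thread_report(text):
    """Return (max_threads, processors) from the first matching line, or None."""
    match = _THREADS_LINE.search(text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_underthreaded(text):
    """True when the reported thread count is below the processor count."""
    report = parse_thread_report(text)
    return report is not None and report[0] < report[1]


sample = "Using OpenMP 1 max threads on a system with 4 processors"
print(parse_thread_report(sample))
print(is_underthreaded(sample))
```

A thread count below the processor count is not always a fault, since local preferences can also limit how many CPUs BOINC may use, so treat a positive result as a prompt to look closer rather than as proof.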

If you would care to refer back to the Nbody Release 1.02 thread (which you yourself closed with the words 'Don't worry! N-Body is getting a lot of attention.'), you will see that I posted details of the missing multi-threading commands last November, over 6 months ago.

I despair, I really do. What else, and how much else, can I do to point you in the right direction?

ID: 58550 · Rating: 0
Jeffery M. Thompson
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 23 Sep 12
Posts: 159
Credit: 16,977,106
RAC: 0
Message 58552 - Posted: 7 Jun 2013, 23:17:16 UTC

I think I see the issue with the plan classes.
They were supposed to use the built-in __mt plan class.
The binaries have this, but the apps themselves do not.
So we pulled the GPU plan classes to avoid those issues, but we still need to add the __mt directories.
Working on it now.
ID: 58552 · Rating: 0
Richard Haselgrove

Joined: 4 Sep 12
Posts: 219
Credit: 456,474
RAC: 0
Message 58553 - Posted: 7 Jun 2013, 23:38:57 UTC - in response to Message 58552.  

Bed-time in the UK. I look forward to taking it for a whirl in the morning.
ID: 58553 · Rating: 0
Jeffery M. Thompson
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 23 Sep 12
Posts: 159
Credit: 16,977,106
RAC: 0
Message 58554 - Posted: 8 Jun 2013, 1:46:34 UTC

I have added the mt classes and they are showing in the application lists now.


ID: 58554 · Rating: 0
Richard Haselgrove

Joined: 4 Sep 12
Posts: 219
Credit: 456,474
RAC: 0
Message 58556 - Posted: 8 Jun 2013, 7:03:18 UTC - in response to Message 58554.  

I tried allowing new work again, and got a nice variety of work - de_ and ps_, dark and nodark - but all of it was assigned as single-CPU, none from the new MT plan class. I'm not sure how the scheduler chooses which version to send when both are available - you may need to deprecate the non-MT versions to force a test, or there may be something in project configuration. I'll take a look.
ID: 58556 · Rating: 0
Moonwrist

Joined: 15 Sep 11
Posts: 1
Credit: 2,047,878
RAC: 0
Message 58558 - Posted: 8 Jun 2013, 10:38:15 UTC

I'm not sure if this is a problem of N-Body or Boinc Tasks but anyway.
In the WU "time left" it says 354 days but if you count real time left from progress and elapsed time the WU should be completed in 6 hours.
ID: 58558 · Rating: 0
Richard Haselgrove

Joined: 4 Sep 12
Posts: 219
Credit: 456,474
RAC: 0
Message 58561 - Posted: 8 Jun 2013, 12:17:40 UTC - in response to Message 58558.  

I'm not sure if this is a problem of N-Body or Boinc Tasks but anyway.
In the WU "time left" it says 354 days but if you count real time left from progress and elapsed time the WU should be completed in 6 hours.

It's a two-part problem.

1) there seems to be a very wide variation in the 'size' of the n-body tasks prepared by the server, and little (or no) correlation with the eventual running time. Here are the figures for the six tasks received by my test machine this morning:

-----------------------------------------------------------
Name: de_nbody_06_05_nodark_2_1369931835_143443
Size: 1,841,130,000,000
Time: 10368

Name: ps_nbody_06_06_dark_1370577207_50707
Size: 651,009,000,000
Time: 676

Name: de_nbody_06_06_nodark_3_1370577207_63977
Size: 2,929,460,000,000
Time: 502

Name: ps_nbody_06_06_nodark_3_1370577207_64060
Size: 2,034,840,000,000
Time: 678

Name: de_nbody_06_06_dark_1370577207_17850
Size: 59,430,100,000,000
Time: (18000+ - still running, 70% complete)

Name: de_nbody_100k_chisq_alt_40913_1366886102_615378
Size: 75,392,100,000,000,000
Time: (estimated 18,119:46:12 - slightly over two YEARS)
-----------------------------------------------------------

"Size" is the <rsc_fpops_est> value set by the server for the task, which is converted into a runtime estimate by your BOINC client (using its understanding - accurate or not - of your computer's speed).

"Time" is the final CPU time (in seconds) for completed tasks, or as noted.

2) The 'remaining time' estimation for a running task is calculated by the BOINC core client. (Boinc Tasks simply displays the values it's given - don't shoot the messenger)

For some reason I didn't quite catch at the time, the 'remaining' estimate was changed in v7.0.64: it is now weighted far more strongly by the initial estimate, and only gives a very small weight to the actual running time until the task nears completion. You'll see that 'time left' drop like a stone when the task passes 90%, 95%, 99% done.
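To make the lack of correlation in part 1 concrete, here is a small sketch that divides each completed task's "Size" by its "Time" using the figures listed above. The simple division in `estimated_runtime` is a simplification of what the client does, and the `host_flops` parameter and both function names are hypothetical.

```python
# (task name, rsc_fpops_est, CPU seconds) for the four completed tasks above.
COMPLETED = [
    ("de_nbody_06_05_nodark_2_1369931835_143443", 1_841_130_000_000, 10368),
    ("ps_nbody_06_06_dark_1370577207_50707", 651_009_000_000, 676),
    ("de_nbody_06_06_nodark_3_1370577207_63977", 2_929_460_000_000, 502),
    ("ps_nbody_06_06_nodark_3_1370577207_64060", 2_034_840_000_000, 678),
]


def estimated_runtime(fpops_est, host_flops):
    """Simplified estimate in seconds: operations divided by assumed host speed."""
    return fpops_est / host_flops


def implied_rates(tasks):
    """Estimated operations per CPU second actually observed for each task."""
    return {name: fpops / seconds for name, fpops, seconds in tasks}


rates = implied_rates(COMPLETED)
spread = max(rates.values()) / min(rates.values())
for name, rate in rates.items():
    print(f"{name}: {rate:.3e} estimated ops per CPU second")
print(f"largest / smallest: {spread:.1f}")
```

If the server estimates tracked real running time, every task would show about the same rate on one machine. For these four tasks the rates differ by roughly a factor of 33, so no single speed figure for the host can make all of the initial estimates come out right.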
ID: 58561 · Rating: 0
rjs5

Joined: 18 Nov 10
Posts: 18
Credit: 174,155,791
RAC: 46,252
Message 58571 - Posted: 8 Jun 2013, 18:26:03 UTC - in response to Message 58561.  

Thanks Richard and Jeffery (I think it was you two who put mt back in working order)


Richard,

It looks like I inadvertently played your "straight man". It appears that your information has put "mt" back in play.

It was not automatic, but I am now running (it appears) mt MilkyWay workloads on multiple CPUs.

UPDATING caused the MW mt workload to think it was running in mt mode, but it only used one CPU.
A DETACH and ATTACH seemed to fix it.

The DETACH/ATTACH seemed to work for me. I am not suggesting that it be the general solution for everyone. I leave the general solution to those who know what is going on.

ID: 58571 · Rating: 0
Jeffery M. Thompson
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 23 Sep 12
Posts: 159
Credit: 16,977,106
RAC: 0
Message 58572 - Posted: 8 Jun 2013, 18:36:25 UTC

It was all Richard; he handed us everything we needed to get it done.
I messed it up in the 1.10 release. It was a small mistake and easy to fix, but we had other bugs in the software to sort through at the same time. With those sorted, I was able to look at my mistake and understand it on its own instead of alongside all the other bits.

I have updated our documentation, spelling out in our release instructions how to use the default classes. People come and go in an academic environment, and this should make sure we don't lose the steps in transitions between teams as we did last time.

I do believe there will be some smaller issues to shake out as we progress with this, but I think the major misconfigurations are corrected. If there are other things happening, let us know.


Jeff
ID: 58572 · Rating: 0
Richard Haselgrove

Joined: 4 Sep 12
Posts: 219
Credit: 456,474
RAC: 0
Message 58573 - Posted: 8 Jun 2013, 18:59:46 UTC

Thanks. I doubt I'll be able to look at the MT solution tonight - my "two year" task is still plodding away on a single CPU - but as soon as BOINC will let me fetch new work, I'll grab a bundle and see what I can make of it. As you say, there are always some smaller issues left behind, but now we're on the same page it should be easier to get them sorted.

That two-year task, by the way: it's just reaching 40% at 6 hours, so it should finish in a total of 15 hours or so. But BOINC still thinks it will take a further 15,275 hours (that's BOINC's problem, not mine or yours).
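The arithmetic behind those two numbers can be sketched as follows. The linear extrapolation is the "40% at 6 hours" calculation. The blended estimate is only an illustration of an estimate dominated by the initial figure: weighting the progress-based value by the square of the fraction done is an assumption made for this sketch, not something confirmed from the BOINC client source, and the function names are hypothetical.

```python
def linear_total_hours(elapsed_hours, fraction_done):
    """Total runtime if progress continues at the rate seen so far."""
    return elapsed_hours / fraction_done


def blended_remaining_hours(elapsed_hours, fraction_done, initial_estimate_hours):
    """Remaining time from a weighted mix of the two estimates.

    ASSUMPTION: the progress-based estimate is weighted by fraction_done
    squared and the initial estimate gets the rest. This is illustrative
    only and is not taken from the BOINC source.
    """
    weight = fraction_done ** 2
    total = (weight * linear_total_hours(elapsed_hours, fraction_done)
             + (1.0 - weight) * initial_estimate_hours)
    return max(total - elapsed_hours, 0.0)


# Figures from this thread: 40% done after 6 hours, with an initial
# estimate of about 18,120 hours.
INITIAL_ESTIMATE_HOURS = 18120.0
print(f"linear total: {linear_total_hours(6, 0.40):.1f} h")

# Follow the same task forward, assuming it really does take 15 hours.
for done in (0.40, 0.90, 0.99):
    elapsed = 15.0 * done
    remaining = blended_remaining_hours(elapsed, done, INITIAL_ESTIMATE_HOURS)
    print(f"{done:.0%} done: about {remaining:,.0f} h remaining")
```

With these inputs the sketch reports about 15,200 hours remaining at 40%, the same order as the 15,275 hours BOINC showed, and the figure falls steeply as the task passes 90% and 99%.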
ID: 58573 · Rating: 0
rjs5

Joined: 18 Nov 10
Posts: 18
Credit: 174,155,791
RAC: 46,252
Message 58580 - Posted: 8 Jun 2013, 23:00:09 UTC - in response to Message 58572.  

I am still running but ran into what appears to be a scheduler problem.

I have an 8-core i7 Sandy Bridge and an EVGA GTX 650 Ti Nvidia GPU.

The CPU mt task started up "running 8 cpus".
The mt GPU task started up and takes 0.417 CPU.

I see a pattern. Only one of the two will run under normal scheduling. They ping-pong back and forth: the 8-CPU mt version only runs during the reload of the next GPU task and for a short time following.

If I suspend all GPU work, the 8-CPU mt task starts. If I resume GPU work, both run for a short period of time and then the 8-CPU mt job suspends.

It appears that the CPU mt version wants ALL the CPUs while the GPU task wants a FRACTION of a CPU, so the total of the two exceeds the CPUs available.




6/8/2013 3:07:48 PM | Milkyway@Home | Requesting new tasks for NVIDIA
6/8/2013 3:07:50 PM | Milkyway@Home | Scheduler request completed: got 0 new tasks
6/8/2013 3:07:50 PM | Milkyway@Home | Project has no tasks available
6/8/2013 3:15:28 PM | Milkyway@Home | Computation for task de_separation_79_DR8_rev_2_1370577207_678480_2 finished
6/8/2013 3:15:28 PM | Milkyway@Home | Starting task de_separation_79_DR8_rev_3_1370577207_1062567_0 using milkyway version 102 (opencl_nvidia) in slot 10
6/8/2013 3:15:31 PM | Milkyway@Home | Sending scheduler request: To fetch work.
6/8/2013 3:15:31 PM | Milkyway@Home | Reporting 1 completed tasks
6/8/2013 3:15:31 PM | Milkyway@Home | Requesting new tasks for NVIDIA
6/8/2013 3:15:33 PM | Milkyway@Home | Scheduler request completed: got 0 new tasks
6/8/2013 3:20:29 PM | Milkyway@Home | Restarting task de_nbody_06_06_dark_1370577207_85348_0 using milkyway_nbody version 118 (mt) in slot 9
6/8/2013 3:24:19 PM | Milkyway@Home | Computation for task de_separation_79_DR8_rev_3_1370577207_1062567_0 finished
6/8/2013 3:24:19 PM | Milkyway@Home | Starting task de_separation_79_DR8_rev_3_1370577207_1062563_0 using milkyway version 102 (opencl_nvidia) in slot 10
6/8/2013 3:24:24 PM | Milkyway@Home | Sending scheduler request: To fetch work.
6/8/2013 3:24:24 PM | Milkyway@Home | Reporting 1 completed tasks
6/8/2013 3:24:24 PM | Milkyway@Home | Requesting new tasks for NVIDIA
6/8/2013 3:24:26 PM | Milkyway@Home | Scheduler request completed: got 2 new tasks
6/8/2013 3:25:31 PM | Milkyway@Home | Sending scheduler request: To fetch work.
6/8/2013 3:25:31 PM | Milkyway@Home | Requesting new tasks for NVIDIA
6/8/2013 3:25:33 PM | Milkyway@Home | Scheduler request completed: got 0 new tasks
6/8/2013 3:25:33 PM | Milkyway@Home | Project has no tasks available
6/8/2013 3:33:12 PM | Milkyway@Home | Computation for task de_separation_79_DR8_rev_3_1370577207_1062563_0 finished
6/8/2013 3:33:12 PM | Milkyway@Home | Starting task de_separation_79_DR8_rev_3_1370577207_677538_2 using milkyway version 102 (opencl_nvidia) in slot 10
6/8/2013 3:33:14 PM | Milkyway@Home | Sending scheduler request: To fetch work.
6/8/2013 3:33:14 PM | Milkyway@Home | Reporting 1 completed tasks
6/8/2013 3:33:14 PM | Milkyway@Home | Requesting new tasks for NVIDIA
6/8/2013 3:33:16 PM | Milkyway@Home | Scheduler request completed: got 1 new tasks
6/8/2013 3:34:22 PM | Milkyway@Home | Sending scheduler request: To fetch work.
6/8/2013 3:34:22 PM | Milkyway@Home | Requesting new tasks for NVIDIA
6/8/2013 3:34:24 PM | Milkyway@Home | Scheduler request completed: got 0 new tasks
6/8/2013 3:34:24 PM | Milkyway@Home | Project has no tasks available
6/8/2013 3:41:13 PM | | Reading preferences override file
6/8/2013 3:41:13 PM | | Preferences:
6/8/2013 3:41:13 PM | | max memory usage when active: 8183.22MB
6/8/2013 3:41:13 PM | | max memory usage when idle: 14729.80MB
6/8/2013 3:41:13 PM | | max disk usage: 100.00GB
6/8/2013 3:41:13 PM | | don't use GPU while active
6/8/2013 3:41:13 PM | | suspend work if non-BOINC CPU load exceeds 25 %
6/8/2013 3:41:13 PM | | (to change preferences, visit a project web site or select Preferences in the Manager)
6/8/2013 3:41:28 PM | | Suspending GPU computation - user request
6/8/2013 3:41:28 PM | Milkyway@Home | Restarting task de_nbody_06_06_dark_1370577207_85348_0 using milkyway_nbody version 118 (mt) in slot 9
6/8/2013 3:41:37 PM | | Resuming GPU computation
6/8/2013 3:41:37 PM | Milkyway@Home | Restarting task de_separation_79_DR8_rev_3_1370577207_677538_2 using milkyway version 102 (opencl_nvidia) in slot 10
6/8/2013 3:42:24 PM | Milkyway@Home | Computation for task de_separation_79_DR8_rev_3_1370577207_677538_2 finished
6/8/2013 3:42:24 PM | Milkyway@Home | Starting task de_separation_79_DR8_rev_3_1370577207_1062568_0 using milkyway version 102 (opencl_nvidia) in slot 10
6/8/2013 3:42:26 PM | Milkyway@Home | Sending scheduler request: To fetch work.
6/8/2013 3:42:26 PM | Milkyway@Home | Reporting 1 completed tasks
6/8/2013 3:42:26 PM | Milkyway@Home | Requesting new tasks for NVIDIA
6/8/2013 3:42:29 PM | Milkyway@Home | Scheduler request completed: got 1 new tasks
6/8/2013 3:43:34 PM | Milkyway@Home | Sending scheduler request: To fetch work.
6/8/2013 3:43:34 PM | Milkyway@Home | Requesting new tasks for NVIDIA
6/8/2013 3:43:37 PM | Milkyway@Home | Scheduler request completed: got 0 new tasks
6/8/2013 3:43:37 PM | Milkyway@Home | Project has no tasks available
6/8/2013 3:49:42 PM | Milkyway@Home | Sending scheduler request: To fetch work.
6/8/2013 3:49:42 PM | Milkyway@Home | Requesting new tasks for NVIDIA
6/8/2013 3:49:44 PM | Milkyway@Home | Scheduler request completed: got 0 new tasks
6/8/2013 3:49:44 PM | Milkyway@Home | Project has no tasks available
ID: 58580 · Rating: 0
rjs5

Joined: 18 Nov 10
Posts: 18
Credit: 174,155,791
RAC: 46,252
Message 58581 - Posted: 8 Jun 2013, 23:07:29 UTC - in response to Message 58572.  

I was installing a new compiler and EVERYTHING was operating normally, I think.


Please ignore my last post.
ID: 58581 · Rating: 0
jdzukley

Joined: 26 May 11
Posts: 32
Credit: 43,959,896
RAC: 794
Message 58593 - Posted: 9 Jun 2013, 13:43:03 UTC
Last modified: 9 Jun 2013, 14:02:42 UTC

FYI, with reference to mt and nbody: I have a 12-core computer, and the nodark tasks use all 12 cores just fine. However, I have yet to observe the mt dark tasks utilize more than the equivalent of 1 core, even though the task has taken control of all 12 cores. I am basing these comments on the resource monitor. Dark never gets above 10% CPU utilization, and this is after viewing many tasks. Many of the CPU cores assigned to "dark" tasks are marked "parked" on the resource monitor and are not utilized.

Time to eat my words somewhat, finally got a dark that is utilizing all cores...
ID: 58593 · Rating: 0
Jake Bauer
Project developer
Project tester
Project scientist

Joined: 20 Aug 12
Posts: 66
Credit: 406,916
RAC: 0
Message 58594 - Posted: 9 Jun 2013, 16:07:50 UTC - in response to Message 58593.  

This is a problem. Are you positive it is only the dark work units? If so, I will take the search down.

Jake
ID: 58594 · Rating: 0
jdzukley

Joined: 26 May 11
Posts: 32
Credit: 43,959,896
RAC: 794
Message 58595 - Posted: 9 Jun 2013, 17:00:35 UTC

Continued Observations: So far, AS I HAVE OBSERVED, only 1 dark mt job has utilized all 12 cores. All of the short jobs, estimated at less than 10 minutes, have had many cores "parked". The one dark mt job that used all cores had an estimated time in the 0'000 hours, and took about 45 minutes to run...
ID: 58595 · Rating: 0

©2024 Astroinformatics Group