Nbody 1.04

Miklos M

Joined: 29 Dec 11
Posts: 26
Credit: 1,456,736,094
RAC: 0
Message 56802 - Posted: 9 Jan 2013, 20:22:59 UTC - in response to Message 56801.  

Thank you Richard.
M0CZY
Joined: 26 Jun 09
Posts: 16
Credit: 357,054
RAC: 718
Message 56812 - Posted: 10 Jan 2013, 12:39:51 UTC

My single processor machine isn't getting any Nbody work, although there appears to be plenty available.
Are single processor computers without hyperthreading not supported by this application?
cornishteddyboy
Joined: 29 Nov 10
Posts: 7
Credit: 17,351,897
RAC: 0
Message 56813 - Posted: 10 Jan 2013, 14:42:49 UTC

After watching my laptop spend 62 hours completing one of these new 300-odd-hour tasks without a problem, I saw it reach 100% complete and ready to upload, and then it suddenly came up with a computing error. Strange.
Timothy Dickey
Joined: 11 Jul 10
Posts: 1
Credit: 434,476
RAC: 5
Message 56816 - Posted: 11 Jan 2013, 3:43:04 UTC

I have reconnected a few of my computers to receive n-body units. I was hoping to have them working like in the past, with one WU running across multiple cores. I have received about 3 WUs per computer, but each WU is running on a single core, not on multiple cores like in the past. Back then, a WU with 20 to 30 hours on it would finish in 3 to 10 minutes on a multi-core computer.

For now each WU will finish in about 10 to 12 hours if I am lucky.

My plan is to dedicate one computer to work on N-body units like they were before, but now I have to think about something else till any other bugs get worked out.

The computers I am testing range from AMD-64 to Intel, both 32- and 64-bit. All are still on n-body 1.04, and all are running one WU per core. All are -mt machines.

Good luck to everyone. I will read some more to find out any other news about the project.
Richard Haselgrove
Joined: 4 Sep 12
Posts: 219
Credit: 456,474
RAC: 0
Message 56817 - Posted: 11 Jan 2013, 8:57:22 UTC

WU 292870082 is more like it. 229,870.48 seconds, but completed without error and the result was accepted as valid at the first attempt.
Alinator
Joined: 7 Jun 08
Posts: 464
Credit: 56,639,936
RAC: 0
Message 56818 - Posted: 11 Jan 2013, 10:43:16 UTC - in response to Message 56816.  

Actually, the tasks are still running multi-threaded, but if all you are running is nBody it can be hard to see that. One thing is certain, though: if there is one nBody task running for every core available on the machine, then there isn't going to be a very large per-task runtime reduction compared to the way nBody used to run.

In my case, I have a mix of projects running and when a 1.04 nBody comes up to run I can see it grab CPU time from the other tasks running with it periodically, including any other standard MW CPU tasks. If more than one nBody is running the percentage of time grabbed from the others goes up some more. This effect is also evidenced by a decrease in the reported CPU efficiency for the other projects' tasks as reported by tools like BoincView, etc. when the nBody tasks are running.

Whether this has a significant effect on the overall performance of the host with regard to all projects running on it is hard to say at this point, since we don't have much of a track record with it yet. That being said, my preference would be to have nBody get the machine exclusively when its turn comes up in the run queue, and to limit it to one task at a time so as to complete it as quickly as possible (the whole point of being MT in the first place, IMHO). My second choice, if nBody is allowed to run with other tasks simultaneously, would be to still limit it to one nBody at a time to minimize the impact on the other tasks running with it.


Richard Haselgrove
Joined: 4 Sep 12
Posts: 219
Credit: 456,474
RAC: 0
Message 56820 - Posted: 11 Jan 2013, 11:29:01 UTC - in response to Message 56818.  

Hi Alinator - long time no see.

Check that one carefully. There are reports (maybe in NC) that the Linux app is running hell-for-leather on all available cores, although BOINC is scheduling it as a single-core app, and letting other tasks run alongside. That's a simple deployment problem between the app developer and the server administrator here - the admin hasn't put the MT plan_class in place yet.

I'm only running the Windows version, and that's definitely running single-threaded: check the OpenMP report in std_err, and that's what Windows task manager shows as well.
Alinator
Joined: 7 Jun 08
Posts: 464
Credit: 56,639,936
RAC: 0
Message 56822 - Posted: 11 Jan 2013, 12:03:14 UTC - in response to Message 56820.  
Last modified: 11 Jan 2013, 12:16:45 UTC


Hi Richard,

Yeah, it's been a while... lots of other stuff going on.

One thing I should have mentioned is the host I do most of my testing on is running XPP-64 and BOINC 6.12.34, so that might make a difference here.

I have been double checking my facts here just to make sure I'm seeing what I think I'm seeing.

As a further test, I suspended all the other tasks running on the host and then resumed just the nBody one. I don't generally use Windows Task Manager, but with Process Explorer the task restarted and immediately indicated that it was using 90+ percent of the CPU. A single threaded app never shows significantly above 25 percent for any significant amount of time even if run exclusively, as displayed with PE. Also, the CPU graphic display shows all four cores are being loaded fully. When I resumed all the other tasks, the machine's behavior returned to what I described in my previous post.

Therefore, I came to the conclusion the nBody app was in fact running MT, even if it's not all that apparent when the host is running in 'normal' mode.

One curious thing though, when I look at the stderr file for the nBody task I'm not seeing the info line pertaining to the specifics about what OpenMP intends to do with the task.

Al
Alinator
Joined: 7 Jun 08
Posts: 464
Credit: 56,639,936
RAC: 0
Message 56825 - Posted: 11 Jan 2013, 13:19:52 UTC
Last modified: 11 Jan 2013, 13:27:04 UTC

Just a followup to my previous posts.

I have confirmed the reason I'm getting nBody to run MT at the moment is the host I'm using for testing is running XPP-64.

I do have a host running W7 and BOINC 7.0.28, and all indications are it is running the app in ST mode.

Apparently without the ktm32 DLL (Vista and higher only) the app will run MT regardless of other considerations.

So I guess the upside from this is that a lot of the problems seem to be related to the way Vista and higher differ from an OS security (and/or OpenMP) point of view, rather than to a design flaw in the MW app per se.
Richard Haselgrove
Joined: 4 Sep 12
Posts: 219
Credit: 456,474
RAC: 0
Message 56826 - Posted: 11 Jan 2013, 13:28:01 UTC - in response to Message 56825.  
Last modified: 11 Jan 2013, 13:30:30 UTC

That makes sense - I'm running Win7/64 and BOINC v7.0.42 (then - now .44), and I haven't (knowingly) got that DLL. So there's the difference - another one that will have to wait for an administrator to sort out the plan classes.

Edit - task on the above machine is showing single-threaded on every available measure - 12.5% of 8 cores, according to Process Explorer.
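
As a rough sanity check on the Process Explorer figures in this thread, here is a minimal Python sketch of the arithmetic. The function names are made up for illustration, and it assumes the tool reports a process's CPU use as a percentage of the whole machine (all cores together), which is how the 12.5% and 25% figures above read.

```python
def single_thread_share(cores: int) -> float:
    """Whole-machine CPU percentage one fully busy thread can reach."""
    return 100.0 / cores


def busy_cores(process_cpu_percent: float, cores: int) -> float:
    """Approximate number of cores a process is keeping busy."""
    return process_cpu_percent / 100.0 * cores


# One thread on an 8-core host tops out at 12.5% of the machine.
print(single_thread_share(8))
# One thread on a 4-core host tops out at 25%.
print(single_thread_share(4))
# A process showing about 90% on a 4-core host is using roughly 3.6 cores.
print(busy_cores(90.0, 4))
```

So a task pinned near 100/N percent on an N-core host is almost certainly running single-threaded, while one sitting well above that is using several cores.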
Alinator
Joined: 7 Jun 08
Posts: 464
Credit: 56,639,936
RAC: 0
Message 56828 - Posted: 11 Jan 2013, 13:52:34 UTC - in response to Message 56826.  

<snip>
Edit - task on the above machine is showing single-threaded on every available measure - 12.5% of 8 cores, according to Process Explorer.


Well, there's some progress. At least now we know how to compare apples to oranges WRT these types of issues! ;-)

Another interesting thing about the CPU graphic display in PE is that if you hover over the various areas of a core display, you can actually see that the app is running MT from the data displayed in the balloon popup. Very Cool!
rjs5
Joined: 18 Nov 10
Posts: 18
Credit: 173,817,321
RAC: 66,016
Message 56829 - Posted: 11 Jan 2013, 14:10:09 UTC

I am running Win7 64-bit and Boinc 7.0.28 64-bit (computer 399149 ) and all the MilkyWay@Home N-Body Simulation v1.04 tasks error out in just a few seconds. I looked over this thread and was wondering if there is anything I need to fix to stop the error outs?

I tried removing and reattaching the project with no change in behavior. The output seems to be a short stderr message.

Thoughts?
thanks
rjs


Stderr output
<core_client_version>7.0.28</core_client_version>
<![CDATA[
<message>
- exit code -1073741515 (0xc0000135)
</message>
]]>



http://milkyway.cs.rpi.edu/milkyway/results.php?userid=135958&offset=0&show_names=0&state=5&appid=7
Alinator
Joined: 7 Jun 08
Posts: 464
Credit: 56,639,936
RAC: 0
Message 56830 - Posted: 11 Jan 2013, 14:33:29 UTC - in response to Message 56829.  
Last modified: 11 Jan 2013, 14:35:32 UTC

IIRC, that exit code (0xc0000135) is the Windows "DLL not found" error.
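
For anyone wondering how the negative number in the stderr output maps to the hex value next to it, here is a minimal Python sketch. The helper name and the one-entry lookup table are just for illustration; the table is nowhere near a complete list of Windows status codes.

```python
def to_ntstatus(exit_code: int) -> str:
    """Reinterpret a signed 32-bit exit code as unsigned hex."""
    return f"0x{exit_code & 0xFFFFFFFF:08x}"


# Illustrative lookup only; Windows defines many more status codes.
KNOWN_STATUS = {"0xc0000135": "STATUS_DLL_NOT_FOUND"}

code = to_ntstatus(-1073741515)
print(code, KNOWN_STATUS.get(code, "unknown"))
```

BOINC reports the exit code as a signed 32-bit integer, so masking it to 32 bits recovers the value Windows documents.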

So assuming there isn't something really wonky with your W7 installation, my guess would be there is something else the machine is doing slowing down the initialization of the app enough for it to time out, and thus the app generates the dll error since that was what it was trying to load at that moment.

Possible causes for that could range from AV scanning, to Security/Permissions settings, to not being fully up to date with Windows. So you would have to do some in depth checking and/or testing to narrow down the potential candidates.

HTH
Bill
Joined: 25 Aug 11
Posts: 2
Credit: 250,474
RAC: 0
Message 56831 - Posted: 11 Jan 2013, 14:53:16 UTC

Hello, I run Win7 64-bit and was sent a work unit for de_nbody_105 that has been running for 34:00:32 and is at 19.512% complete. The estimated run time is 2168 hrs. What do I do with this fella? I'm not a computer whiz by any stretch of the imagination, just crunchin' numbers for BOINC. Any suggestions?
Alinator
Joined: 7 Jun 08
Posts: 464
Credit: 56,639,936
RAC: 0
Message 56832 - Posted: 11 Jan 2013, 15:05:01 UTC - in response to Message 56831.  
Last modified: 11 Jan 2013, 15:06:28 UTC

Hmmmm....

Well that's on pace to be the longest running one I've heard of yet (around 174 hours given the current data). ;-)
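
In case it helps, this is roughly how that figure falls out of the numbers reported (34:00:32 elapsed at 19.512% complete). It is only a sketch with made-up helper names, and it assumes progress stays linear, which nBody tasks may not honor.

```python
def parse_hms(text: str) -> int:
    """Convert an H:MM:SS string into a number of seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def projected_total_hours(elapsed: str, percent_done: float) -> float:
    """Extrapolate total runtime, assuming progress is linear in time."""
    return parse_hms(elapsed) / (percent_done / 100.0) / 3600.0


total = projected_total_hours("34:00:32", 19.512)
print(f"projected total: {total:.1f} h")  # about 174 hours
```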

However, since this is a testing/debug run for the project team, I'd say let it continue if the progress indicator is still advancing.

You can always abort it later, if something new develops about really long running tasks.

Al
Ray Murray
Joined: 8 Oct 07
Posts: 24
Credit: 111,325
RAC: 6
Message 56833 - Posted: 11 Jan 2013, 15:18:04 UTC
Last modified: 11 Jan 2013, 15:19:10 UTC

This long one, 112 hrs, surprisingly didn't return an error even after being stopped a number of times, with machine restarts to allow various Windows and driver updates. It does, however, show as validation inconclusive, which isn't uncommon. Just have to wait and see how GLNilsen fares with it.
rjs5
Joined: 18 Nov 10
Posts: 18
Credit: 173,817,321
RAC: 66,016
Message 56834 - Posted: 11 Jan 2013, 15:45:58 UTC - in response to Message 56830.  

I probably have a non-standard installation. I put the ProgramData directory on an SSD drive "K:\". Is it possible that the project makes an assumption that the program data is on the same drive as the binary?

C:\Program Files\BOINC
K:\ProgramData\BOINC\projects\milkyway.cs.rpi.edu_milkyway
rjs5
Joined: 18 Nov 10
Posts: 18
Credit: 173,817,321
RAC: 66,016
Message 56835 - Posted: 11 Jan 2013, 16:13:03 UTC - in response to Message 56834.  

Deleted everything and reinstalled. Same error.

Is it possible to check system calls for error status after the first call so some better diagnostics could be returned? Doesn't seem too hard.
Alinator
Joined: 7 Jun 08
Posts: 464
Credit: 56,639,936
RAC: 0
Message 56836 - Posted: 11 Jan 2013, 16:54:36 UTC - in response to Message 56835.  
Last modified: 11 Jan 2013, 17:33:59 UTC

OK, I agree this sounds like you're going to have to do more in depth sleuthing to figure this one out.

FWIW, changing the default location of the main program and data directories typically doesn't cause problems. However, it's been quite a while since I last made any project changes on the W7 host I have running, and it is at a remote site as well. I do remember there were some initial problems getting BOINC to settle down and run properly, but as I recall they were related to going to a 7.x version of BOINC and affected all the projects, not just MW.

So a couple of things to try would be to double check the program and folder permissions, just to rule that out. You could also give Sysinternals Process Monitor a try to get some more insight into what's going on when an nBody tries to start up.

Be advised though, it takes a little time to get used to what Process Monitor is trying to tell you as the logging is VERY comprehensive. The easiest way is to try it out on one of your hosts which is running properly, so you know ahead of time what to expect and what to look for on the one which is malfunctioning.

HTH
Alinator
Joined: 7 Jun 08
Posts: 464
Credit: 56,639,936
RAC: 0
Message 56837 - Posted: 11 Jan 2013, 17:10:54 UTC - in response to Message 56833.  

Agreed, this one is very interesting, especially since one of the restarts was due to a lost CC heartbeat exit (an 'abnormal' situation by definition). That implies the app had to clean up after itself while it thought it was running as an 'orphaned' process, and still managed to restart and complete successfully.

The other interesting point is it was on 7.0.28 as well. I'm pretty sure Richard will find that interesting since it could indicate the problem is more BOINC related than MW specifically.