Nbody 1.04
Author | Message |
---|---|
Joined: 29 Dec 11 Posts: 26 Credit: 1,462,456,201 RAC: 35,789 |
Thank you Richard. |
Joined: 26 Jun 09 Posts: 18 Credit: 468,560 RAC: 916 |
My single processor machine isn't getting any Nbody work, although there appears to be plenty available. Are single processor computers without hyperthreading not supported by this application? |
Joined: 29 Nov 10 Posts: 7 Credit: 17,351,897 RAC: 0 |
After watching my laptop spend 62 hours completing one of these new 300-odd-hour projects without a problem, I saw it reach 100% complete and ready to upload, and then it suddenly came up with a computing error. Strange. |
Joined: 11 Jul 10 Posts: 1 Credit: 435,987 RAC: 0 |
I have reconnected a few of my computers to receive n-body units again. I was hoping to have them working like in the past, with one WU running across multiple cores. I have received about 3 WUs per computer, but each WU is running on a single core, not on multiple cores like in the past. I know that in the past a WU showing 20 to 30 hours would finish in 3 to 10 minutes on a multi-core computer. For now, each WU will finish in about 10 to 12 hours if I am lucky. My plan was to dedicate one computer to N-body units like before, but now I have to think about something else until any other bugs get worked out. The computers I am testing range from AMD-64 to Intel 32 and 64. All are running n-body 1.04 and all are running one WU per core. All are -mt machines. Good luck to everyone. I will read some more to find out any other news about the project. |
Joined: 4 Sep 12 Posts: 219 Credit: 456,474 RAC: 0 |
WU 292870082 is more like it. 229,870.48 seconds, but completed without error and the result was accepted as valid at the first attempt. |
Joined: 7 Jun 08 Posts: 464 Credit: 56,639,936 RAC: 0 |
I have reconnected a few of my computers to receive n-body units again. I was hoping to have them working like in the past, with one WU running across multiple cores. I have received about 3 WUs per computer, but each WU is running on a single core, not on multiple cores like in the past. I know that in the past a WU showing 20 to 30 hours would finish in 3 to 10 minutes on a multi-core computer. Actually, the tasks are still running multi-threaded, but if all you are running is nBody it can be hard to see that. One thing is certain: if there is one nBody task running for every core available on the machine, then there isn't going to be a very large runtime reduction on a per-task basis compared to the way nBody used to run. In my case, I have a mix of projects running, and when a 1.04 nBody comes up to run I can see it periodically grab CPU time from the other tasks running with it, including any other standard MW CPU tasks. If more than one nBody is running, the percentage of time grabbed from the others goes up some more. This effect also shows up as a decrease in the CPU efficiency of the other projects' tasks, as reported by tools like BoincView, while the nBody tasks are running. Whether this has a significant effect on the overall performance of the host across all the projects running on it is hard to say at this point, since we don't have much of a track record with it yet. That being said, my preference would be to have nBody get the machine exclusively when its turn comes up in the run queue, and to limit it to one task at a time so as to complete it as quickly as possible (the whole point of being MT in the first place, IMHO). My second choice, if it is allowed to run with other tasks simultaneously, would still be to limit it to one nBody at a time to minimize the impact on the other tasks running with it. |
Joined: 4 Sep 12 Posts: 219 Credit: 456,474 RAC: 0 |
I have reconnected a few of my computers to receive n-body units again. I was hoping to have them working like in the past, with one WU running across multiple cores. I have received about 3 WUs per computer, but each WU is running on a single core, not on multiple cores like in the past. I know that in the past a WU showing 20 to 30 hours would finish in 3 to 10 minutes on a multi-core computer. Hi Alinator - long time no see. Check that one carefully. There are reports (maybe in NC) that the Linux app is running hell-for-leather on all available cores, although BOINC is scheduling it as a single-core app and letting other tasks run alongside. That's a simple deployment problem between the app developer and the server administrator here - the admin hasn't put the MT plan_class in place yet. I'm only running the Windows version, and that's definitely running single-threaded: check the OpenMP report in std_err, and that's what Windows Task Manager shows as well. |
Joined: 7 Jun 08 Posts: 464 Credit: 56,639,936 RAC: 0 |
Hi Richard, Yeah, it's been a while... lots of other stuff going on. One thing I should have mentioned is that the host I do most of my testing on is running XPP-64 and BOINC 6.12.34, so that might make a difference here. I have been double checking my facts just to make sure I'm seeing what I think I'm seeing. As a further test, I suspended all the other tasks running on the host and then resumed just the nBody one. I don't generally use Windows Task Manager, but in Process Explorer the task restarted and immediately showed that it was using 90+ percent of the CPU. A single-threaded app never shows significantly above 25 percent for any significant amount of time on this host, even if run exclusively, as displayed in PE. Also, the CPU graphic display shows all four cores being loaded fully. When I resumed all the other tasks, the machine's behavior returned to what I described in my previous post. Therefore, I came to the conclusion that the nBody app was in fact running MT, even if it's not all that apparent when the host is running in 'normal' mode. One curious thing though: when I look at the stderr file for the nBody task, I'm not seeing the info line about what OpenMP intends to do with the task. Al |
Joined: 7 Jun 08 Posts: 464 Credit: 56,639,936 RAC: 0 |
Just a followup to my previous posts. I have confirmed that the reason I'm getting nBody to run MT at the moment is that the host I'm using for testing is running XPP-64. I do have a host running W7 and BOINC 7.0.28, and all indications are that it is running the app in ST mode. Apparently without the ktm32 DLL (Vista and higher only) the app will run MT regardless of other considerations. So I guess the upside is that a lot of the problems seem to be related to the way Vista and higher differ from an OS security (and/or OpenMP) point of view, rather than to a design flaw in the MW app per se. |
Joined: 4 Sep 12 Posts: 219 Credit: 456,474 RAC: 0 |
Just a followup to my previous posts. That makes sense - I'm running Win7/64 and BOINC v7.0.42 (then - now .44), and I haven't (knowingly) got that DLL. So there's the difference - another one that will have to wait for an administrator to sort out the plan classes. Edit - task on the above machine is showing single-threaded on every available measure - 12.5% of 8 cores, according to Process Explorer. |
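The CPU percentages quoted in the posts above (12.5% of 8 cores, 25% and 90+% of 4 cores) can be turned into an approximate count of busy cores. A minimal Python sketch, assuming the monitoring tool reports CPU use as a share of the whole machine; the function name is made up for illustration and is not part of BOINC or Process Explorer:

```python
def estimated_busy_cores(cpu_percent, logical_cores):
    """Convert a whole-machine CPU percentage into an approximate
    number of fully loaded cores (assumes 100% = all cores busy)."""
    return cpu_percent / 100.0 * logical_cores


# Figures quoted in the thread
print(estimated_busy_cores(12.5, 8))  # Win7 host with 8 cores
print(estimated_busy_cores(25.0, 4))  # single-threaded ceiling on a 4-core host
print(estimated_busy_cores(90.0, 4))  # XP x64 host with only nBody running
```

A result close to 1 suggests a single-threaded run; a result close to the core count suggests the task really is running multi-threaded.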
Joined: 7 Jun 08 Posts: 464 Credit: 56,639,936 RAC: 0 |
<snip> Well, there's some progress. At least now we know how to compare apples to oranges WRT these types of issues! ;-) Another interesting thing about the CPU graphic display in PE is that if you hover over the various areas of a core display, you can actually see that the app is running MT from the data displayed in the balloon popup. Very Cool! |
Joined: 18 Nov 10 Posts: 19 Credit: 180,970,856 RAC: 15,824 |
I am running Win7 64-bit and Boinc 7.0.28 64-bit (computer 399149 ) and all the MilkyWay@Home N-Body Simulation v1.04 tasks error out in just a few seconds. I looked over this thread and was wondering if there is anything I need to fix to stop the error outs? I tried removing and reattaching the project with no change in behavior. The output seems to be a short stderr message. Thoughts? thanks rjs Stderr output <core_client_version>7.0.28</core_client_version> <![CDATA[ <message> - exit code -1073741515 (0xc0000135) </message> ]]> http://milkyway.cs.rpi.edu/milkyway/results.php?userid=135958&offset=0&show_names=0&state=5&appid=7 |
Joined: 7 Jun 08 Posts: 464 Credit: 56,639,936 RAC: 0 |
I am running Win7 64-bit and Boinc 7.0.28 64-bit (computer 399149 ) and all the MilkyWay@Home N-Body Simulation v1.04 tasks error out in just a few seconds. I looked over this thread and was wondering if there is anything I need to fix to stop the error outs? IIRC, the error code being generated is a missing dll one. So assuming there isn't something really wonky with your W7 installation, my guess would be there is something else the machine is doing slowing down the initialization of the app enough for it to time out, and thus the app generates the dll error since that was what it was trying to load at that moment. Possible causes for that could range from AV scanning, to Security/Permissions settings, to not being fully up to date with Windows. So you would have to do some in depth checking and/or testing to narrow down the potential candidates. HTH |
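The two numbers in the stderr message are the same value written two ways: BOINC prints the exit code as a signed 32-bit integer, and the hexadecimal form is the unsigned NTSTATUS value, where 0xC0000135 is STATUS_DLL_NOT_FOUND. A small Python sketch of the conversion; the function name and the two-entry lookup table are illustrative only and far from a complete list of status codes:

```python
def exit_code_to_ntstatus(exit_code):
    """Reinterpret a signed 32-bit exit code as the unsigned value
    used in Windows NTSTATUS listings."""
    return exit_code & 0xFFFFFFFF


# Tiny illustrative subset of NTSTATUS values
KNOWN_STATUS = {
    0xC0000135: "STATUS_DLL_NOT_FOUND",
    0xC0000005: "STATUS_ACCESS_VIOLATION",
}

code = exit_code_to_ntstatus(-1073741515)
print(hex(code), KNOWN_STATUS.get(code, "unknown"))
```

This only identifies the class of failure. It does not say which DLL could not be found.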
Joined: 25 Aug 11 Posts: 2 Credit: 250,474 RAC: 0 |
Hello, I run win7 64 bit and was sent a work unit for de_nbody_105 that has been running for 34:00:32 and is at 19.512% complete. The est run time is 2168 hrs. What do I do with this fella?? I'm not a computer wiz in any sense of the imagination, just crunchin' numbers for BOINC. Any suggestions?? |
Joined: 7 Jun 08 Posts: 464 Credit: 56,639,936 RAC: 0 |
Hmmmm.... Well that's on pace to be the longest running one I've heard of yet (around 174 hours given the current data). ;-) However, since this is a testing/debug run for the project team, I'd say let it continue if the progress indicator is still advancing. You can always abort it later, if something new develops about really long running tasks. Al |
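The figure of around 174 hours follows from the elapsed time and progress quoted in the question (34:00:32 at 19.512%), assuming progress stays linear for the rest of the run, which it may not. A rough Python sketch of that arithmetic; the function name is hypothetical:

```python
def projected_total_hours(elapsed_hms, fraction_done):
    """Linear projection of total runtime from elapsed time (H:MM:SS)
    and the fraction of the task completed so far."""
    hours, minutes, seconds = (int(part) for part in elapsed_hms.split(":"))
    elapsed_hours = hours + minutes / 60 + seconds / 3600
    return elapsed_hours / fraction_done


total = projected_total_hours("34:00:32", 0.19512)
print(round(total, 1))          # projected total, in hours
print(round(total - 34.01, 1))  # roughly how many hours remain
```

This is far below the 2168-hour estimate shown by BOINC, which is why letting the task continue is reasonable while the progress indicator keeps advancing.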
Joined: 8 Oct 07 Posts: 24 Credit: 111,325 RAC: 0 |
This long one, 112hrs, surprisingly didn't return an error even after being stopped a number of times, with machine restarts to allow various windows and driver updates. It does, however, show as validation inconclusive, which isn't uncommon. Just have to wait and see how GLNilsen fares with it. |
Joined: 18 Nov 10 Posts: 19 Credit: 180,970,856 RAC: 15,824 |
I probably have a non-standard installation. I put the ProgramData directory on an SSD drive "K:\". Is it possible that the project makes an assumption that the program data is on the same drive as the binary? C:\Program Files\BOINC K:\ProgramData\BOINC\projects\milkyway.cs.rpi.edu_milkyway |
Joined: 18 Nov 10 Posts: 19 Credit: 180,970,856 RAC: 15,824 |
deleted everything and reinstalled. same error. is it possible to check system calls for error status after the first call so some better diagnostics could be returned? doesn't seem too hard. |
Joined: 7 Jun 08 Posts: 464 Credit: 56,639,936 RAC: 0 |
deleted everything and reinstalled. same error. OK, I agree this sounds like you're going to have to do more in-depth sleuthing to figure this one out. FWIW, changing the default location of the main program and data directories typically doesn't cause problems. However, it's been quite a while since I last made any project changes on the W7 host I have running, and it is at a remote site as well. I do remember there were some initial problems getting BOINC to settle down and run properly, but as I recall they were related to going to a 7.x version of BOINC and affected all the projects, not just MW. So a couple of things to try would be to double check the program and folder permissions, just to rule that out. You could also give Sysinternals Process Monitor a try to get some more insight into what's going on when an nBody tries to start up. Be advised though, it takes a little time to get used to what Process Monitor is trying to tell you, as the logging is VERY comprehensive. The easiest way is to try it out on one of your hosts which is running properly, so you know ahead of time what to expect and what to look for on the one which is malfunctioning. HTH |
Joined: 7 Jun 08 Posts: 464 Credit: 56,639,936 RAC: 0 |
This long one, 112hrs, surprisingly didn't return an error even after being stopped a number of times, with machine restarts to allow various windows and driver updates. It does, however, show as validation inconclusive, which isn't uncommon. Just have to wait and see how GLNilsen fares with it. Agreed, this one is very interesting, especially since one of the restarts was due to a lost CC heartbeat exit (an 'abnormal' situation by definition). That implies the app had to clean up after itself while it thought it was running as an 'orphaned' process, and still managed to restart and complete successfully. The other interesting point is that it was on 7.0.28 as well. I'm pretty sure Richard will find that interesting, since it could indicate the problem is more BOINC related than MW specific. |
©2024 Astroinformatics Group