started a new nbody search: de_nbody_model1_1
Author | Message |
---|---|
Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
The workunits should take much longer to complete. Let me know how they are doing here (and I suppose you can complain if the credit is too much/too little). This should hopefully fix the problem with the workunits terminating prematurely as well. |
Joined: 28 Mar 09 Posts: 68 Credit: 1,003,982,681 RAC: 0 |
Ummm... how much longer? I have a few running: one is only 6% complete after 4 hours, another is 15% complete after 4 hours, and a third is 70% complete after 3 hours. |
Joined: 15 Mar 10 Posts: 17 Credit: 1,221,936,867 RAC: 0 |
Most (about 70%) abort within the first second. On my two systems, the few that don't abort immediately take 20-40 minutes. I have also had a couple of 'runaways' that completed less than 1% after 20-25 minutes, with an ever-increasing estimated completion time of well over an hour. I aborted these manually. |
Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
Quote: "ummm...how much longer?" The run times are probably going to vary pretty drastically depending on the input parameters. |
Joined: 20 Feb 10 Posts: 3 Credit: 5,030,476 RAC: 0 |
These are indeed taking quite a bit longer. I have one that has been running for 25 hours and is almost complete, with all 8 cores running at about 90% utilization. Most are estimated at about 10-12 hours, though. |
Joined: 18 Jun 09 Posts: 35 Credit: 11,811,888 RAC: 0 |
Well, for me the new workunits run way faster: a normal unit takes 15-18 hours, and the nbody one just TWO hours. Isn't that weird? |
Joined: 8 Jun 10 Posts: 2 Credit: 109,213 RAC: 0 |
My workunits abort either immediately or after no more than 5 seconds. What's going on? |
Joined: 17 Oct 08 Posts: 36 Credit: 411,744 RAC: 0 |
I got my first workunits for N-Body Simulation v0.04 tonight. The outcome was rather odd: three workunits were from the de_nbody_test_10 series, and they all completed and validated. The other five were from the de_nbody_model1_1 series, and they all crashed after one or two seconds. Looking at the wingmen, I can't see any pattern: sometimes they also crash on a wingman, sometimes they seem to finish without error. Well, I'll try to get some more; maybe I'll catch a good one *g* ... Regards List of Error results |
Joined: 17 Oct 08 Posts: 36 Credit: 411,744 RAC: 0 |
Update: five more workunits, all de_nbody_model1_1, and this time all of them completed; four are already validated. Run times were between 30 and 60 minutes. I still have no clue why some crash and others don't. |
Joined: 8 May 10 Posts: 576 Credit: 15,979,383 RAC: 0 |
Quote: "Still don't have a clue why some crash and others not." The Windows checkpointing is currently broken (it always restarts from the beginning), but I think I've fixed all the problems with it. There were some things I fixed a long time ago in the POSIX version of the checkpointing that I apparently didn't also fix in the Win32 version, as well as a few Windows-specific problems. I think some of the problems come from a temporary-file flag I was using when opening the checkpoint file on Windows, even though the checkpoint file shouldn't be treated as a temporary file. Weird permission problems also sometimes seem to happen on Windows 7; I think some workunits may crash with a permission-related error when they try to open the checkpoint after restarting. There's also the linking problem that causes crashes on OS X 10.5, which I might have fixed (again), but I have no way to test on 10.5, so I'm not sure. I'll try to update the binaries sometime today. |
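For readers wondering what robust checkpointing looks like in general: a common portable pattern is to write the new checkpoint to a separate file in the same directory and then rename it over the old one, so a crash or restart never leaves a half-written checkpoint behind. The sketch below is a hypothetical Python illustration of that pattern only; it is not the application's actual checkpointing code, and the JSON state with `step`/`bodies` fields and the function names are assumptions made for the example.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Hypothetical helper: write state to a temp file, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the same directory so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".ckpt-")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before the rename
        # os.replace overwrites an existing file on POSIX and Windows;
        # on POSIX the rename is atomic.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)  # do not leave stray temp files behind
        raise


def load_checkpoint(path, default):
    """Hypothetical helper: resume from the checkpoint if present, else use default."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return default
```

Usage: call `save_checkpoint` periodically during the run and `load_checkpoint` at startup, so that after a restart the work continues from the last saved state instead of from 0 %.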
Joined: 13 Feb 09 Posts: 51 Credit: 72,827,746 RAC: 2,413 |
Well, I had one yesterday that had run 20-some hours and showed 137 hours remaining! I aborted it. Another has been running for about 8 hours and shows another 23 hours to go. I guess I'll leave that one alone and see what happens. I'm running a dual-core 2.4 GHz AMD CPU. My GPU won't handle MilkyWay WUs. |
Joined: 19 Feb 08 Posts: 350 Credit: 141,284,369 RAC: 0 |
Could someone please post a download link for the current version? Thanks, Alexander |
Joined: 18 Jun 09 Posts: 35 Credit: 11,811,888 RAC: 0 |
Quote: "Run time between 30 and 60 minutes." Ooh, I'm running MilkyWay on Mac OS X 10.6.4, so maybe that, rather than Windows, is the reason for the short nbody run times? |
Joined: 27 Nov 09 Posts: 108 Credit: 430,760,953 RAC: 0 |
Quote: "Well, I had one yesterday that had run 20-some hours and showed 137 hours remaining!" I also had one of these "model1" WUs self-abort with "maximum time exceeded" after 29.6 hours of processing time. I hope this doesn't become a habit. |
Joined: 17 Oct 08 Posts: 36 Credit: 411,744 RAC: 0 |
Hello Matt, is this fix already included in the current version 0.04, or will it be in the upcoming one? I currently have my longest-running workunit so far. It had 7 h of run time done and approx. 8 h still to go when I had to close BOINC. After the restart, progress began again at 0 %, but the run time continued from the approx. 7 h where I had stopped it. So I am now back at 5.4 %, and the expected total run time has risen from 15 h to approx. 22 h. So something is wrong with checkpointing, I guess. Regards Alex |
Joined: 19 Feb 09 Posts: 29 Credit: 5,452,691 RAC: 0 |
Hi, my latest one is a de_12; it has run 2 hours and shows 9.259% done, so it looks like it is going to take over 200 hours. It is due by 21/9, so I will need to run it 24 hours a day to get it done in time. Or should I abort it? Regards, Paul |
Joined: 17 Oct 08 Posts: 36 Credit: 411,744 RAC: 0 |
Quote: "has run 2 hours and is showing 9.259% done" Hi Paul, 10% in 2 hours should be 100% in 20 hours, right? So this should be fine. Brian has also reported here that workunits get terminated with a "max. time exceeded" error at some point (the limit presumably depends on the system they run on). I guess that means they can't really run into the 8-day deadline unless you have a very slow system. |
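The estimate above is a linear extrapolation: if progress grows roughly in proportion to elapsed time, the total run time is the elapsed time divided by the fraction completed. A minimal Python sketch with a hypothetical helper name, assuming linear progress (which, as other posts in this thread show, does not always hold for these workunits):

```python
def estimate_total_hours(elapsed_hours, percent_done):
    """Hypothetical helper: linearly extrapolate total run time from progress so far."""
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be in (0, 100]")
    return elapsed_hours * 100.0 / percent_done


# Paul's numbers: 2 hours elapsed, 9.259% done.
total = estimate_total_hours(2.0, 9.259)
remaining = total - 2.0
print(f"total ~ {total:.1f} h, remaining ~ {remaining:.1f} h")
# prints: total ~ 21.6 h, remaining ~ 19.6 h
```

So about 21.6 hours in total, far below 200 hours and well within an 8-day deadline.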
Joined: 19 Feb 09 Posts: 29 Credit: 5,452,691 RAC: 0 |
Hi, it is nearly midnight in the UK, so my maths have gone up the creek today. Thanks |
Joined: 8 May 10 Posts: 576 Credit: 15,979,383 RAC: 0 |
Quote: "Hello Matt, is this fix already included in the current version 0.04? Or will it be in the upcoming one?" The upcoming one. Quote: "So currently I am at 5.4 % again and the total run time has risen from 15 h to approx. 22 h now." Also, the run times vary widely with the parameters. In the worst possible case for 10,000 bodies, it took about 12.5 hours to run on my Core 2 Q6600 @ 3 GHz, 64-bit. I'm not sure about some of the other sizes. Edit: Removed the comment about the 64-bit version being faster. I'm not sure it's true anymore; it was the last time I checked, months ago. |
Joined: 17 Oct 08 Posts: 36 Credit: 411,744 RAC: 0 |
Just for the record (since we have now moved to a new app version): the workunit mentioned above finished this morning and is now validated. The stderr output has some interesting info about the checkpointing problem; excerpt:
Btw, a claimed credit of 495.43 versus a granted credit of 65.73 is a bit disappointing. Never mind. ;) |
©2024 Astroinformatics Group