Message boards : News : started a new nbody search: de_nbody_model1_1
| Author | Message |
|---|---|
|
The workunits should take much longer to complete. Let me know how they are doing here (and I suppose you can complain if the credit is too much/too little). This should hopefully fix the problem with the workunits terminating prematurely as well. | |
| ID: 42036 | Rating: 0 | rate:
| |
|
ummm...how much longer? | |
| ID: 42045 | Rating: 0 | rate:
| |
|
Most (about 70%) abort in the first second. | |
| ID: 42052 | Rating: 0 | rate:
| |
"ummm...how much longer?" The runtimes are probably going to vary pretty drastically depending on the input parameters. | |
| ID: 42056 | Rating: 0 | rate:
| |
|
These are indeed taking quite a bit longer. I have one that has been running for 25 hours and is just about completed and I have 8 cores running 90% utilized. | |
| ID: 42069 | Rating: 0 | rate:
| |
|
Well, for me the new workunits run a hell of a lot faster. | |
| ID: 42078 | Rating: 0 | rate:
| |
|
My workunits abort either immediately or after no more than 5 seconds. What's going on? | |
| ID: 42083 | Rating: 0 | rate:
| |
|
I got my first workunits tonight for N-Body Simulation v0.04. The outcome was rather odd: three workunits were from the de_nbody_test_10 series and they were all completed and validated. The five others were from the de_nbody_model1_1 series and they all crashed after one or two seconds. Looking at the wingmen, I cannot see any pattern. Sometimes they also crash on a wingman, sometimes they seem to finish without error. | |
| ID: 42085 | Rating: 0 | rate:
| |
|
Update: Five more workunits, all de_nbody_model1_1, and for a change all now completed, four of them already validated. Run time between 30 and 60 minutes. Still don't have a clue why some crash and others not. | |
| ID: 42088 | Rating: 0 | rate:
| |
"Update: Five more workunits, all de_nbody_model1_1, and for a change all now completed, four of them already validated. Run time between 30 and 60 minutes. Still don't have a clue why some crash and others not." The Windows checkpointing is currently broken (it will always restart from the beginning), but I think I've fixed all the problems with it. There were some things I fixed a long time ago in the POSIX version of the checkpointing which I apparently didn't also fix in the Win32 version, as well as a few Windows-specific problems. I think some of the problems are because I was using a temporary-file flag when opening the checkpoint file on Windows, even though the checkpoint file shouldn't count as a temporary file. Also, weird permission problems seem to sometimes happen on Windows 7. I think some workunits might end up crashing with a permission-related error when they attempt to open the checkpoint after restarting. There's also the linking problem which causes it to crash on OS X 10.5, which I might have fixed (again), but I don't have a way to test on 10.5, so I'm not sure. I'll try to update the binaries sometime today. | |
| ID: 42093 | Rating: 0 | rate:
| |
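The checkpoint problems described above (restarting from the beginning, a temporary-file flag on the checkpoint, permission errors when reopening it) are the kind of thing a write-then-rename pattern is meant to avoid. The following is only a minimal Python sketch of that pattern, not the nbody application's actual C code; the file name and the contents of the saved state are hypothetical.

```python
import json
import os
import tempfile


def write_checkpoint(path, state):
    """Write state to a scratch file next to path, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    # The scratch file lives in the same directory so the final rename
    # never has to cross file systems.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # make sure the data is on disk first
        # Replace the old checkpoint in one step; the old file stays
        # intact until the new one is complete.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def read_checkpoint(path):
    """Return the saved state, or None if there is no usable checkpoint."""
    try:
        with open(path) as f:
            return json.load(f)
    except (FileNotFoundError, PermissionError, json.JSONDecodeError):
        # Treat a missing, unreadable or corrupt checkpoint as "start over"
        # instead of crashing the workunit.
        return None
```

With this layout a restart either sees the previous complete checkpoint or the new complete one, and a failed open falls back to starting from the beginning rather than aborting.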
|
Well, I had one yesterday that had run 20-some hours and showed 137 hours remaining! I aborted it. Another has been running about 8 hours and shows another 23 hours to go. Guess I'll leave that one alone and see what happens. I'm running a dual core 2.4 GHz AMD CPU. My GPU won't handle MilkyWay WUs. | |
| ID: 42096 | Rating: 0 | rate:
| |
|
Could someone please post a download link for the current version? | |
| ID: 42097 | Rating: 0 | rate:
| |
"Update: Five more workunits, all de_nbody_model1_1, and for a change all now completed, four of them already validated. Run time between 30 and 60 minutes. Still don't have a clue why some crash and others not." Ooh, I'm running MilkyWay on Mac OS X 10.6.4, so maybe the sprint times of nbody are caused by this instead of Windows? | |
| ID: 42100 | Rating: 0 | rate:
| |
"Well, I had one yesterday that had run 20-some hours and showed 137 hours remaining!" I also had one of these "model1" WUs self-abort with "maximum time exceeded" after 29.6 hours of processing time. I hope this doesn't become a habit. | |
| ID: 42101 | Rating: 0 | rate:
| |
Hello Matt, is this fix already included in the current version 0.04? Or will it be in the upcoming one? I currently have my longest-running workunit so far. 7 h of run time were already done and approx. 8 h were still to go when I had to close BOINC. After the restart, it started again at 0 % progress, but the run time continued from the approx. 7 h where I had stopped it. So currently I am at 5.4 % again and the total run time has risen from 15 h to approx. 22 h now. So something is wrong with checkpointing, I guess. Regards Alex | |
| ID: 42102 | Rating: 0 | rate:
| |
|
Hi, my latest one is a de_12 and has run 2 hours and is showing 9.259% done, so this looks like it is going to take over 200 hours. It is due by 21/9, so I would need to be running 24 hours a day to get it done in time. Or should I abort it? | |
| ID: 42104 | Rating: 0 | rate:
| |
"has run 2 hours and is showing 9.259% done" Hi Paul, 10% in 2 hours should be 100% in 20 hours, right? So this should be fine. Brian has also reported here that the workunits will be terminated with a "max. time exceeded" error at some point (which should depend on the system they run on), so I guess they cannot really run into the 8-day deadline unless you have a very slow system. | |
| ID: 42105 | Rating: 0 | rate:
| |
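The estimate above is a straight-line extrapolation from the progress percentage. A small Python sketch of that arithmetic, assuming progress grows linearly with run time (which does not hold if a checkpoint restart resets the percentage); the function name is made up for illustration:

```python
def estimate_total_hours(elapsed_hours, percent_done):
    """Extrapolate total run time, assuming progress is linear in time."""
    if percent_done <= 0:
        raise ValueError("no progress yet, cannot extrapolate")
    return elapsed_hours * 100.0 / percent_done


elapsed = 2.0
total = estimate_total_hours(elapsed, 9.259)
print(f"{total:.1f} h total, {total - elapsed:.1f} h remaining")
# prints "21.6 h total, 19.6 h remaining"
```

So 9.259% after 2 hours points to roughly 22 hours in total, not 200.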
|
Hi, it is nearly midnight in the UK, so my maths have gone up the creek today. | |
| ID: 42106 | Rating: 0 | rate:
| |
"Hello Matt, is this fix already included in the current version 0.04? Or will it be in the upcoming one?" The upcoming one. "So currently I am at 5.4 % again and the total run time has risen from 15 h to approx. 22 h now." Also, the run times vary widely with the parameters. In the worst possible case for 10,000 bodies, it took about 12.5 hours to run on my Core 2 Q6600 @ 3 GHz, 64 bit. I'm not sure about some of the other sizes. Edit: Removed comment about the 64 bit version being faster. I'm not sure it's true anymore; it was the last time I checked, months ago. | |
| ID: 42107 | Rating: 0 | rate:
| |
Just for the record (because we have now moved to a new app version): the workunit mentioned above was finished this morning and is now validated. The stderr out has some interesting info about the checkpointing problem, excerpt:
Btw, claimed credit 495.43, granted credit 65.73 is a bit disappointing. Never mind. ;) | |
| ID: 42113 | Rating: 0 | rate:
| |
Are you seeing a lot of WUs with granted credit much lower than the claimed credit? I don't think that should be happening. | |
| ID: 42140 | Rating: 0 | rate:
| |
"Are you seeing a lot of WUs with granted credit much lower than the claimed credit?" No. None with such big differences. My guess is that in the above case it was caused somehow by the restart, the checkpoint bug and the run time continuing to count, making the total run time somewhat bigger than it actually was. | |
| ID: 42143 | Rating: 0 | rate:
| |
|
Is this new, drastically longer nbody search in preparation for new GPU apps, or are they just longer to keep less load on the server? | |
| ID: 42148 | Rating: 0 | rate:
| |
"Is this new, drastically longer nbody search in preparation for new GPU apps, or are they just longer to keep less load on the server?" They are longer because they are going to be actual work units. The 4096-body ones have just been for testing the application and the actual search. More bodies are needed for enough resolution. The work unit times also vary drastically depending on the other parameters. For 10,000 bodies the run time ranges from around 12 hours in the worst case to only a few minutes in the best cases. "Yeah, I'm being intentionally annoying about wanting a GPU app because quite frankly, I'd like to be putting my processing power towards something a bit more important than trying to prove/disprove a mathematical conjecture." That will happen eventually. It mostly depends on how much time I have after homework and classes this semester. The O(n log n) tree n-body will be somewhat tricky to get working on the GPU, while the basic O(n^2) one is pretty trivial and seems to be the most commonly used GPGPU example. I'm not sure how long it will take to get working. First I'm trying to get a working OpenCL version of the separation code, which is mostly done. We're also talking about doing the rough phases of the search with single precision, which would allow more GPUs to work on it. | |
| ID: 42150 | Rating: 0 | rate:
| |
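For readers unfamiliar with the "basic O(n^2)" n-body mentioned above: every body sums the gravitational pull of every other body, so 10,000 bodies means 10,000 x 9,999 pair interactions per step. The sketch below is a generic textbook direct-sum in Python, not the project's code; the `softening` parameter and the unit choice `G=1.0` are assumptions for illustration. A tree code replaces the inner loop with a tree walk that approximates distant groups of bodies, which is what makes it harder to port to a GPU.

```python
import math


def direct_accelerations(positions, masses, G=1.0, softening=1e-3):
    """Direct-sum gravitational accelerations: O(n^2) pair interactions."""
    n = len(positions)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    eps2 = softening * softening  # avoids division by zero for close pairs
    for i in range(n):
        xi, yi, zi = positions[i]
        for j in range(n):
            if i == j:
                continue
            dx = positions[j][0] - xi
            dy = positions[j][1] - yi
            dz = positions[j][2] - zi
            r2 = dx * dx + dy * dy + dz * dz + eps2
            scale = G * masses[j] / (r2 * math.sqrt(r2))
            acc[i][0] += scale * dx
            acc[i][1] += scale * dy
            acc[i][2] += scale * dz
    return acc


# Two unit masses two length units apart pull on each other with |a| = 1/4.
print(direct_accelerations([(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
                           [1.0, 1.0], softening=0.0))
```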
"We're also talking about doing the rough phases of the search with single precision which would allow more GPUs to work on it." Would this be an application which does a part of the calculations on the (single precision) GPU and the other part on the CPU? A bit like the current Einstein CUDA application, using the GPU really as a coprocessor? Sounds interesting. | |
| ID: 42151 | Rating: 0 | rate:
| |
"Would this be an application which does a part of the calculations on the (single precision) GPU and the other part on the CPU? A bit like the current Einstein CUDA application, using the GPU really as a coprocessor? Sounds interesting." No. I only know a little bit about the search; this is Travis' area. It would be more like double precision results would only be needed as the likelihoods get closer. The float result is significantly different from the double result, but still close enough to be sort of useful. Lots of float results could be used to do a rough search, and then as the fitnesses get closer, double results would be needed. | |
| ID: 42156 | Rating: 0 | rate:
| |
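The rough-then-fine idea described above can be illustrated with a toy example. This Python sketch is purely hypothetical: the likelihood function is a made-up stand-in, single precision is simulated by rounding through `struct`, and the `keep` cutoff is an arbitrary assumption. It only shows the shape of the idea, ranking many candidates with a cheap single-precision result and re-evaluating the few close ones in double precision.

```python
import struct


def to_f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def likelihood_double(params):
    """Toy stand-in for a likelihood; the real one comes from the simulation."""
    return -sum((p - 0.3) ** 2 for p in params)


def likelihood_single(params):
    """The same toy likelihood with every intermediate rounded to float32."""
    total = 0.0
    for p in params:
        d = to_f32(to_f32(p) - to_f32(0.3))
        total = to_f32(total + to_f32(d * d))
    return -total


def two_phase_search(candidates, keep=3):
    # Rough phase: rank all candidates with the single-precision result.
    rough = sorted(candidates, key=likelihood_single, reverse=True)[:keep]
    # Fine phase: only the closest survivors get a double-precision result.
    return max(rough, key=likelihood_double)


candidates = [(0.0, 0.0), (0.3, 0.31), (0.29, 0.3), (1.0, 1.0), (0.3, 0.3)]
print(two_phase_search(candidates))
```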
|
Hi | |
| ID: 42162 | Rating: 0 | rate:
| |
"No. I only know a little bit about the search; this is Travis' area. It would be more like double precision results would only be needed as the likelihoods get closer. The float result is significantly different from the double result, but still close enough to be sort of useful. Lots of float results could be used to do a rough search, and then as the fitnesses get closer, double results would be needed." That idea was thrown around for the other applications too, but at the time it meant setting up a second project for the single precision work. If you can make it work for the nbody search, I wonder if the other searches can switch over to a similar system? | |
| ID: 42166 | Rating: 0 | rate:
| |
There are SETI WUs for GPUs that you could run -- assuming they ever get their air conditioning problems fixed so their servers can be put back on line. :-( | |
| ID: 42172 | Rating: 0 | rate:
| |
|
Some WUs were not validated due to 'Checked, but no consensus yet'. | |
| ID: 42180 | Rating: 0 | rate:
| |
"some wu's were not validated due to 'Checked, but no consensus yet'" <search_likelihood>-1662.3825647507408</search_likelihood> <search_application>milkywayathome nbody 0.04 Windows x86 double</search_application> <search_likelihood>-50430.548520685144</search_likelihood> <search_application>milkywayathome nbody 0.07 Windows x86 double</search_application> Did something change in the calculations? | |
| ID: 42182 | Rating: 0 | rate:
| |
|
My latest one took 26 hours, so you can see it has hogged one of my CPUs for a whole 24 hours, and I just get 213 credits again. | |
| ID: 42197 | Rating: 0 | rate:
| |
|
Same here: 90 hours for one and 75 for another. | |
| ID: 42198 | Rating: 0 | rate:
| |
|
As an example, I have just got 141 credits on another project for 7 hours, so they are giving 20 credits an hour, not 8. | |
| ID: 42199 | Rating: 0 | rate:
| |
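The "20 credits an hour, not 8" comparison follows directly from the figures quoted in the two posts above (213 credits for 26 hours versus 141 credits for 7 hours). A tiny Python check of that arithmetic; the function name is only for illustration:

```python
def credits_per_hour(credits, hours):
    """Granted credit divided by run time in hours."""
    return credits / hours


print(round(credits_per_hour(213, 26), 1))  # nbody workunit: 8.2
print(round(credits_per_hour(141, 7), 1))   # other project: 20.1
```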
|
BOINC Manager cannot download the apps for uts 64 b 0.06 N-body. Why do you think that is? | |
| ID: 42212 | Rating: 0 | rate:
| |
|
We're at v0.07 for Windows already, please see here: http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=1917 | |
| ID: 42213 | Rating: 0 | rate:
| |
"my latest one took 26 hours so you can see it has hogged one of my cpus for a whole 24 hours and I just get credits of 213 again" That's strange ... I just completed a WU that took 8.15 hrs total run time and 7.53 hrs CPU time on a dual core 2.4 GHz AMD running Windows XP 32 and was granted 213.76 points. | |
| ID: 42237 | Rating: 0 | rate:
| |
"my latest one took 26 hours so you can see it has hogged one of my cpus for a whole 24 hours and I just get credits of 213 again" Never mind ... that was NOT an n_body WU! (Engage brain before activating keyboard... :-) ). | |
| ID: 42238 | Rating: 0 | rate:
| |