New Nbody Version 1.50
Message boards : News : New Nbody Version 1.50

Death
Joined: 8 Oct 08
Posts: 4
Credit: 125,962
RAC: 0

Message 63589 - Posted: 15 May 2015, 16:29:08 UTC

fix this

____________
====
wbr, Me. Dead J. Dona
Jodis | Search Any Info

Jacob Klein
Joined: 22 Jun 11
Posts: 32
Credit: 2,448,134
RAC: 7,070

Message 63590 - Posted: 15 May 2015, 17:32:47 UTC - in response to Message 63589.

Hey Death,
Not sure if you know this, but you are in the News message board, in an application release thread. If you have an unrelated problem, please post it elsewhere, such as the Number Crunching message board.

Thanks,
Jacob

Richard Haselgrove
Joined: 4 Sep 12
Posts: 218
Credit: 448,778
RAC: 0

Message 63591 - Posted: 15 May 2015, 18:29:45 UTC - in response to Message 63580.

I'm also noticing the same behavior, on:
ps_nbody_5_12_15_orphan_sim_1_1431361804_28199_0
http://milkyway.cs.rpi.edu/milkyway/result.php?resultid=1117230409

Admins:
1: Is the behavior (reserving multiple cores, despite using only 1) ... expected behavior?
2: Is the behavior of going to 100%, then still running for hours/days after that ... expected behavior?
3: Will the task ever end?


It would help tremendously, if you could very thoroughly describe the expected behavior for these work units. People are aborting them, because the tasks look odd/broken, and if they're not broken, you need to do a better job of communicating your expectations.

Thanks in advance for your reply,
Jacob

I second Jacob's call for an 'expected' (developer's viewpoint) description of the runtime profile, in terms of CPU usage over time.

But I would also urge users to monitor this new application with additional tools, not just BOINC Manager.

I've just got back home after a few days away, and I won't even attempt to run one of these tasks (it will be under Windows) until later in the weekend. But what I've just read from several different users is a perfect description of what BOINC v7.4.xx is designed to display when no actual work at all is being reported by the science application. That might be because of an error in the progress reporting or checkpointing functions, or it might mean that nothing is being done. That gradual approach, getting closer and closer to 99.999999% done but never quite reaching 100%, is exactly what you should see if an application stalls at startup and goes nowhere.
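To make the "closer and closer but never 100%" pattern concrete, here is a minimal sketch of a progress guess that a client could fall back on when the application reports nothing. The formula elapsed / (elapsed + estimate) is an assumption picked purely for illustration; it is not taken from the BOINC source, and the function name is hypothetical.

```python
def estimated_fraction_done(elapsed_s: float, estimated_total_s: float) -> float:
    """Hypothetical client-side progress guess (illustrative model only).

    Intended for the case where the science application reports no progress
    of its own. The value grows with elapsed time but stays below 1.0.
    """
    if elapsed_s < 0 or estimated_total_s <= 0:
        raise ValueError("elapsed_s must be >= 0 and estimated_total_s > 0")
    return elapsed_s / (elapsed_s + estimated_total_s)


if __name__ == "__main__":
    one_hour = 3600.0  # assumed original runtime estimate, in seconds
    for hours in (1, 10, 100, 1000):
        frac = estimated_fraction_done(hours * 3600.0, one_hour)
        print(f"{hours:>5} h elapsed -> {frac:.4%} shown as done")
```

Under this assumed model the displayed percentage keeps rising however long the task runs, so a creeping progress bar on its own says nothing about whether the application is actually computing.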

Jacob Klein
Joined: 22 Jun 11
Posts: 32
Credit: 2,448,134
RAC: 7,070

Message 63592 - Posted: 15 May 2015, 19:34:46 UTC

My task has now hit 24 hours, at 100%.
Should I let it continue to run?
And why or why not?

Frustrated.

Richard Haselgrove
Joined: 4 Sep 12
Posts: 218
Credit: 448,778
RAC: 0

Message 63593 - Posted: 15 May 2015, 21:46:30 UTC - in response to Message 63592.

My task has now hit 24 hours, at 100%.
Should I let it continue to run?
And why or why not?

Frustrated.

What does Process Explorer say about what it's doing?

Jacob Klein
Joined: 22 Jun 11
Posts: 32
Credit: 2,448,134
RAC: 7,070

Message 63594 - Posted: 15 May 2015, 22:13:48 UTC
Last modified: 15 May 2015, 22:27:15 UTC

Same as it has been, for a long time now --- using 1 core, despite BOINC allocating 4.

Task properties:
Resources: 4 CPUs
CPU time at last checkpoint: --- (never any checkpoint)
CPU time: 25:12:13
Elapsed time: 26:38:51
Estimated time remaining: ---
Fraction done: 100.000%

The times show that it has actually run single-threaded for the majority of the time, thus wasting 3 of my CPUs, not allowing BOINC to allocate them for other tasks.
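The reasoning above can be checked with a little arithmetic. This is a minimal sketch using the CPU time, elapsed time and CPU count from the task properties listed in this post; the function names are hypothetical helpers, not part of BOINC.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an 'H:MM:SS' string, as shown in task properties, to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def cores_kept_busy(cpu_time: str, elapsed: str) -> float:
    """Average number of cores the task actually used (CPU time / wall time)."""
    return hms_to_seconds(cpu_time) / hms_to_seconds(elapsed)


def reserved_core_utilization(cpu_time: str, elapsed: str, reserved_cpus: int) -> float:
    """Fraction of the reserved CPU capacity that did any work."""
    return cores_kept_busy(cpu_time, elapsed) / reserved_cpus


if __name__ == "__main__":
    # Values copied from the task properties in this post.
    cpu, wall, reserved = "25:12:13", "26:38:51", 4
    print(f"cores kept busy: {cores_kept_busy(cpu, wall):.2f}")
    print(f"share of reservation used: {reserved_core_utilization(cpu, wall, reserved):.1%}")
```

With these figures the task kept roughly 0.95 of one core busy, which is about 24% of the four CPUs reserved for it.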

Admins:

Should I let it continue to run?
And why or why not?

Frustrated.

Blurf
Volunteer moderator
Project administrator
Joined: 13 Mar 08
Posts: 798
Credit: 26,380,161
RAC: 0

Message 63596 - Posted: 15 May 2015, 22:45:51 UTC

I am sorry for the delay people are experiencing in getting assistance. I am working with Professor Newberg to improve communications.

Here's something I received from Sidd:

It seems that it is only occurring for people with Windows. The other platforms seem to be running OK. We had some trouble getting the Windows binaries in order for release, and I think something went wrong there, as none of the Windows runs are passing. I have taken down the Windows version for now until I figure it all out.


Jacob Klein
Joined: 22 Jun 11
Posts: 32
Credit: 2,448,134
RAC: 7,070

Message 63600 - Posted: 15 May 2015, 23:52:46 UTC - in response to Message 63596.
Last modified: 15 May 2015, 23:53:28 UTC

Thank you, Blurf.
We look forward to any more details that you or the admins can provide.

I'm sorry that we sound so "complainy". We're frustrated. But in the end, we're here to help solve the problems we're seeing. If there's anything we can do to help, let us know.

Thanks,
Jacob

Sidd
Project developer
Project tester
Project scientist
Joined: 19 May 14
Posts: 60
Credit: 326,016
RAC: 1,029

Message 63607 - Posted: 16 May 2015, 17:05:37 UTC

Hey All,

I apologize for the silence. I am looking into the problem. We had issues with getting the Windows version ready for release, and I believe something went wrong in the process. I ran those binaries on Windows and they seemed to work properly at the time. However, it seems that no Windows runs are working. The other versions of the code are OK so far and I am getting results on my end.

If you have an Nbody task running and stalled, abort it for now. I deprecated the Windows version of the app, but I believe some tasks may still be sent out before it is completely down.

This was an unexpected problem with the binaries as they worked on our end, and I apologize for the issues you are having and the frustration incurred. Please be patient, as we are working on the issues.

Thanks,
Sidd

europa
Joined: 29 Oct 10
Posts: 89
Credit: 39,246,947
RAC: 0

Message 63618 - Posted: 18 May 2015, 13:29:30 UTC

Just to provide some extra information on troubleshooting the N-body WUs....

I'm running Linux Mint 17.1 (Rebecca) on a 5 core AMD with ATI 7850 card.

Unlike previously, the N-body WUs are completing; however, I just checked and they are all "validation inconclusive."

I also notice that it only processes 1 N-body WU at a time and says it is using 4 cores. This seems to apply ONLY to the MW WUs, since the Einstein GPU WUs are processing multiple WUs simultaneously, concurrently with the MW ones, and say they're using the equivalent of 2 cores!

Using 6 cores simultaneously on a 5 core cpu!

Ya gotta love the math!! :)

Regards,
Steve

Odysseus
Joined: 10 Nov 07
Posts: 95
Credit: 5,818,262
RAC: 7,633

Message 63622 - Posted: 21 May 2015, 1:41:54 UTC

I’m seeing similar behaviour on my MacBook Pro as has been reported on other platforms: Nbody claims to be using both CPUs, so my other tasks are Waiting, but Activity Monitor shows much less CPU usage than typical. I actually noticed this from abnormally low temperatures: I keep a close eye on this system because it has a dodgy fan, and it’s reading in the high 50s C instead of the usual low 60s. Progress so far—still pretty early—is consistent with CPU times.

Odysseus
Joined: 10 Nov 07
Posts: 95
Credit: 5,818,262
RAC: 7,633

Message 63623 - Posted: 21 May 2015, 9:30:08 UTC - in response to Message 63622.
Last modified: 21 May 2015, 9:36:57 UTC

The task in question finished much sooner than projected (2.3 h* instead of about 10), and is now awaiting validation. Computer is back to normal, crunching (for other projects) on both CPUs.

* Comparing the reported run time of 2:17 to the CPU time of 2:23:35 seems to imply an efficiency of 52.5%, or that only 5% of the second CPU’s capacity was used.

Jacob Klein
Joined: 22 Jun 11
Posts: 32
Credit: 2,448,134
RAC: 7,070

Message 63625 - Posted: 21 May 2015, 15:20:13 UTC

On one of my machines, an N-Body 1.50 x64 task ran for 145 hours, single-threaded (despite claiming 8 CPUs and wasting them), and without a single checkpoint... before I killed it.

I hope that the server has been set to not resend these N-Body tasks to us Windows users. But, because of the lack of communication and transparency, I will have to turn it off in my web preferences. I'll probably forget to turn it back on. It's a shame that the user has to micromanage these.

I just wanted to get that feedback out there. Frustrating.

Jacob

Ulrich Metzner
Joined: 11 Apr 15
Posts: 42
Credit: 16,135,406
RAC: 32,181

Message 63628 - Posted: 23 May 2015, 14:23:05 UTC - in response to Message 63555.

Doesn't work correctly on my end. It runs fast in the beginning of the simulation and gets slower as it gets to the end and then just stalls at 100% and at which point it has to be aborted because it just hangs. All my other computers have the same issue with this simulation.

Same issue here, starts very sporty and then begins to lag more and more. Additionally, when BOINC is restarted, the application starts again at 0% - what a bummer!
____________
Aloha, Uli

europa
Joined: 29 Oct 10
Posts: 89
Credit: 39,246,947
RAC: 0

Message 63629 - Posted: 24 May 2015, 10:19:32 UTC

I'm running Linux Mint, and after an initial run of WUs that ran but got "validation inconclusive", I started having the old problem of asymptotic WUs (approaching infinitely close but never completing).

I ended up aborting all of the WUs before I noticed that some were 1.50 and some were 1.50 (mt). I don't know if there was any pattern to the ones that just keep running and the ones that run but give the "validation inconclusive" message.

I've delisted the N-body Simulation from my machines for now.

Regards,
Steve

Larry
Joined: 20 Oct 13
Posts: 1
Credit: 225,511
RAC: 254

Message 63630 - Posted: 24 May 2015, 23:40:17 UTC
Last modified: 24 May 2015, 23:46:51 UTC

Strange that I received 12 of these "bad" N-body tasks when I already had the N-body tasks removed from my job preference list. Perhaps there was no other work available from the other applications. In any case, the offending N-body tasks on my Windows 7 (64-bit) have been aborted.

mikey
Joined: 8 May 09
Posts: 2033
Credit: 180,792,604
RAC: 301,220

Message 63631 - Posted: 25 May 2015, 10:42:20 UTC - in response to Message 63630.

Strange that I received 12 of these "bad" N-body tasks when I already had the N-body tasks removed from my job preference list. Perhaps there was no other work available from the other applications. In any case, the offending N-body tasks on my Windows 7 (64-bit) have been aborted.


Yes, that is another check box you must uncheck for right now. It is at the bottom of the same list where you unchecked getting the N-body units.

TomB
Joined: 6 Oct 09
Posts: 3
Credit: 2,266,199
RAC: 2,849

Message 63633 - Posted: 25 May 2015, 21:48:33 UTC

This doesn't seem to be a Windows-only problem. I've currently got one on my Mac Mini that should have finished within an hour or so, but it has now been running for over 10 hours, with the last 3 hours indicating 100% complete. This is the second time it has tried to run: since it isn't saving checkpoints, the work unit restarted from 0% the last time the computer was shut down. It had been running for over 24 hours the first time. Also, looking at Activity Monitor, it doesn't look like it uses more than one processor when it claims it is using all 8.

The ones I've watched that do work also behave strangely: they slow down the closer they get to 100%, then go back near 0% and rapidly count up to completion and finish normally.

I'll see how long I can keep this one going to see if it finishes, but if it goes too long I'll have to abort it since no other work units can run.

TomB
Joined: 6 Oct 09
Posts: 3
Credit: 2,266,199
RAC: 2,849

Message 63641 - Posted: 27 May 2015, 18:37:12 UTC - in response to Message 63633.

The work unit from the previous message finally finished. It took 50.5 hours (about 50 hours longer than the original estimate), but it is finished. It has to be some sort of bug, but it looks like they will finish eventually. With no checkpoints, though, you still run the risk of them starting over if the computer is restarted for any reason.

Jake
Joined: 27 Oct 14
Posts: 2
Credit: 1,305,853
RAC: 4,048

Message 63646 - Posted: 30 May 2015, 17:36:51 UTC

Currently a Milkyway@Home task says "Running (4 CPUs)", but when viewing the system load only one core is running at 100% while the other three are idle. The core at 100% switches every several seconds (load balancing or something, I'm assuming). Why is this process claiming it is using 4 CPUs but only using one? No other tasks can run while this one does, since it claims that it will be using all four cores. The tasks do complete successfully in about 25 minutes, though.

BOINC Client: 7.4.23
BOINC Manager: 7.4.23 (x64)
Operating System: Ubuntu 15.04 64-bit
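For anyone on Linux who wants to confirm the "one core busy, the rest idle" observation with numbers, the following is a minimal sketch that compares two snapshots of /proc/stat. The function names are hypothetical, and it counts idle plus iowait as idle time, which is a common convention rather than the only reasonable one.

```python
import time


def parse_proc_stat(text: str) -> dict:
    """Return {core_name: (busy_jiffies, total_jiffies)} for each 'cpuN' line."""
    cores = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the aggregate 'cpu' line and non-CPU lines such as 'intr'.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        values = [int(v) for v in fields[1:]]
        # Field order: user nice system idle iowait irq softirq steal ...
        idle = values[3] + (values[4] if len(values) > 4 else 0)
        total = sum(values[:8])  # guest time is already included in user
        cores[fields[0]] = (total - idle, total)
    return cores


def per_core_utilization(before: str, after: str) -> dict:
    """Busy fraction of each core between two /proc/stat snapshots."""
    first, second = parse_proc_stat(before), parse_proc_stat(after)
    usage = {}
    for core, (busy2, total2) in second.items():
        busy1, total1 = first.get(core, (0, 0))
        delta_total = total2 - total1
        usage[core] = (busy2 - busy1) / delta_total if delta_total > 0 else 0.0
    return usage


def sample_live(interval_s: float = 1.0) -> dict:
    """Take two live snapshots interval_s apart (Linux only)."""
    with open("/proc/stat") as f:
        before = f.read()
    time.sleep(interval_s)
    with open("/proc/stat") as f:
        after = f.read()
    return per_core_utilization(before, after)
```

Calling `sample_live(5.0)` a few times while the task is running gives a per-core busy fraction for each interval. If only one core is near 1.0 in each sample, and which core it is changes between samples, that matches the behaviour described in this post.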



Copyright © 2017 AstroInformatics Group