Welcome to MilkyWay@home

Posts by Jacob Klein

1) Message boards : News : Windows Users-please abort Nbody tasks (Message 63676)
Posted 5 Jun 2015 by Jacob Klein
Post:
Yes. BOINC and the projects appear to be working just fine with Windows 10, identically to previous versions of Windows.
2) Message boards : News : New Nbody Version 1.50 (Message 63625)
Posted 21 May 2015 by Jacob Klein
Post:
On one of my machines, an N-Body 1.50 x64 task ran for 145 hours, single-threaded (despite reporting 8 CPUs and wasting them), and without a single checkpoint... before I killed it.

I hope that the server has been set to not resend these N-Body tasks to us Windows users. But, because of the lack of communication and transparency, I will have to turn it off in my web preferences. I'll probably forget to turn it back on. It's a shame that the user has to micromanage these.

I just wanted to get that feedback out there. Frustrating.

Jacob
3) Message boards : News : Windows Users-please abort Nbody tasks (Message 63611)
Posted 16 May 2015 by Jacob Klein
Post:
There is no "MilkyWay@Home N-Body Simulation" checkbox in Preferences. There isn't even a Preferences. There is a Tools -> Computing Preferences dialog box. Should I be looking elsewhere?


Yes. You need to look at the web preferences of a project, in order to enable/disable applications for any/all of your venues (default/home/school/work).

So, for MilkyWay, you'd go to:
http://milkyway.cs.rpi.edu/milkyway/prefs.php?subset=project
... and then click "Edit MilkyWay@Home preferences".

Every project you are attached to has project-specific web settings that let you disable certain applications if you want. They can be accessed from your account page on the project's website.

Regards,
Jacob

Edit: Nice trigger finger, Richard. Nice.
4) Message boards : News : Windows Users-please abort Nbody tasks (Message 63601)
Posted 16 May 2015 by Jacob Klein
Post:
Can't you abort them server-side, and then also instruct the server software not to resend them to Windows users?
That'd be the best approach, if you're not doing that already.
Edit: I just checked, and it looks like you are still issuing them to Windows users. I hope you can change that.
5) Message boards : News : New Nbody Version 1.50 (Message 63600)
Posted 15 May 2015 by Jacob Klein
Post:
Thank you, Blurf.
We look forward to any more details that you or the admins can provide.

I'm sorry that we sound so "complainy". We're frustrated. But in the end, we're here to help solve the problems we're seeing. If there's anything we can do to help, let us know.

Thanks,
Jacob
6) Message boards : News : New Nbody Version 1.50 (Message 63594)
Posted 15 May 2015 by Jacob Klein
Post:
Same as it has been, for a long time now --- using 1 core, despite BOINC allocating 4.

Task properties:
Resources: 4 CPUs
CPU time at last checkpoint: --- (never any checkpoint)
CPU time: 25:12:13
Elapsed time: 26:38:51
Estimated time remaining: ---
Fraction done: 100.000%

The times show that it has actually run single-threaded for the majority of the time, thus wasting 3 of my CPUs, not allowing BOINC to allocate them for other tasks.
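
As a rough check, the ratio of CPU time to elapsed time gives the average number of cores the task kept busy. Here is a minimal Python sketch using the two values from the task properties above; the helper names are made up for illustration and are not part of BOINC.

```python
def to_seconds(hms: str) -> int:
    """Convert an 'H:MM:SS' string, as shown in task properties, to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def average_cores_used(cpu_time: str, elapsed_time: str) -> float:
    """Average number of cores kept busy: total CPU time / wall-clock time."""
    return to_seconds(cpu_time) / to_seconds(elapsed_time)


# Values copied from the task properties listed above.
ratio = average_cores_used("25:12:13", "26:38:51")
print(f"average cores busy: {ratio:.2f} of 4 allocated")
```

A ratio close to 1 (here about 0.95) is what a single-threaded run looks like; a task that really used all 4 allocated CPUs would be expected to show a ratio much closer to 4.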

Admins:

Should I let it continue to run?
And why or why not?

Frustrated.
7) Message boards : News : New Nbody Version 1.50 (Message 63592)
Posted 15 May 2015 by Jacob Klein
Post:
My task has now hit 24 hours, at 100%.
Should I let it continue to run?
And why or why not?

Frustrated.
8) Message boards : News : New Nbody Version 1.50 (Message 63590)
Posted 15 May 2015 by Jacob Klein
Post:
Hey Death,
Not sure if you know this, but you are in the News message board, in an application release thread. If you have a problem that is not related, please post it elsewhere, such as the Number Crunching message board.

Thanks,
Jacob
9) Message boards : News : New Nbody Version 1.50 (Message 63580)
Posted 15 May 2015 by Jacob Klein
Post:
I'm also noticing the same behavior, on:
ps_nbody_5_12_15_orphan_sim_1_1431361804_28199_0
http://milkyway.cs.rpi.edu/milkyway/result.php?resultid=1117230409

Admins:
1: Is the behavior (reserving multiple cores, despite using only 1) ... expected behavior?
2: Is the behavior of going to 100%, then still running for hours/days after that ... expected behavior?
3: Will the task ever end?


It would help tremendously, if you could very thoroughly describe the expected behavior for these work units. People are aborting them, because the tasks look odd/broken, and if they're not broken, you need to do a better job of communicating your expectations.

Thanks in advance for your reply,
Jacob
10) Message boards : News : New Nbody release tomorrow (Message 63579)
Posted 15 May 2015 by Jacob Klein
Post:
Moved.
11) Message boards : News : New Nbody version 1.48 (Message 63259)
Posted 25 Mar 2015 by Jacob Klein
Post:
The biggest problem with initialization taking longer and not checkpointing comes when initialization takes longer than 60 minutes. By default BOINC switches tasks every 60 minutes. I just figured out today that for the past few weeks one of my boxes has spent 60 minutes running Nbody initialization, then it switches to a different project for 60 minutes, then it restarts the same initialization from scratch. Basically it has completely wasted half of the processing time since upgrading to 1.48.


That setting "Switch between apps every x minutes" is more sophisticated than you think. I was under the impression that it was supposed to switch away from an app only when it had checkpointed, or was being pre-empted by a task that was in deadline jeopardy. So... the user's setting is basically just a suggestion for BOINC to try to switch if it can, but only when a checkpoint occurs.

So maybe you're hitting on a BOINC bug there? Are you using the latest version of BOINC? Are you using the "Leave application in memory" option? And can you describe the steps necessary to easily reproduce the problem?
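
The quoted scenario can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not how the BOINC client scheduler actually works: the task is preempted after every fixed time slice, is not kept in memory, and resumes from its last checkpoint (or from zero if it never checkpoints). The function and parameter names are hypothetical.

```python
def minutes_until_done(work_minutes, slice_minutes,
                       checkpoint_every=None, max_slices=1000):
    """Toy model of a task that is preempted after every time slice.

    Returns the total minutes spent running before the task finishes,
    or None if it has not finished after max_slices slices.
    Assumption: on preemption the task loses everything since its last
    checkpoint (all progress if it never checkpoints).
    """
    saved = 0   # progress that survives a preemption
    spent = 0   # total minutes of compute time consumed
    for _ in range(max_slices):
        remaining = work_minutes - saved
        if remaining <= slice_minutes:
            return spent + remaining  # finishes within this slice
        spent += slice_minutes
        if checkpoint_every:
            progress = saved + slice_minutes
            # Only progress up to the most recent checkpoint is kept.
            saved = (progress // checkpoint_every) * checkpoint_every
        # Without checkpoints, saved stays at 0 and the work restarts.
    return None


# A 90-minute initialization with a 60-minute switch interval:
print(minutes_until_done(90, 60))                       # never checkpoints
print(minutes_until_done(90, 60, checkpoint_every=10))  # checkpoints every 10 min
```

In this model, a task that needs more uninterrupted time than one slice and never checkpoints makes no lasting progress at all, which matches the "restarts the same initialization from scratch" symptom. With regular checkpoints the same task finishes in its nominal 90 minutes.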
12) Message boards : News : New Nbody version 1.48 (Message 63230)
Posted 14 Mar 2015 by Jacob Klein
Post:
If you think of it as a process that is incapable of determining its own length/progress (which some BOINC tasks are) .... then what should it do?

Approach 100%, but never hit it?
Sit at 100%?
Does it matter?

My point is that it's actually probably NOT a waste of electricity, if it's still running. It's probably just incapable of determining its own progress correctly. That number is just a number. Even when it "started off really cranking", that number was just a number.
13) Message boards : News : New Nbody version 1.48 (Message 63221)
Posted 12 Mar 2015 by Jacob Klein
Post:
Is it possible that your CPU task(s) are in jeopardy of missing their deadline? If BOINC thinks any of them are, they correctly get prioritized ahead of GPU tasks, and GPUs can be left idle. It happens sometimes to me, especially on projects that have tasks with tight (5 days or less) deadlines.
14) Message boards : News : New Nbody version 1.48 (Message 63194)
Posted 3 Mar 2015 by Jacob Klein
Post:
Unless I am mistaken, the question "Can 1.48 tasks get stuck in a loop at 100%?" remains unanswered. Can we get a solid answer on it? And, if yes, can we get instruction as to what to do?

Simple questions, really. Trying to figure out what we can expect from the 1.48 mt app...
15) Message boards : News : New Nbody version 1.48 (Message 63161)
Posted 18 Feb 2015 by Jacob Klein
Post:
Good call, Richard, on asking the correct question.
However, I've already aborted my 2 long-running v1.46 tasks.
16) Message boards : News : New Nbody version 1.48 (Message 63159)
Posted 18 Feb 2015 by Jacob Klein
Post:
If it reads version 1.46, that's the bugged version and those units will never finish. Abandon those units, then force an update for milkyway@home to get v1.48.


Did we ever get absolute confirmation from a project admin, that some of the 1.46 units would never complete?
17) Message boards : News : New Nbody version 1.48 (Message 63147)
Posted 15 Feb 2015 by Jacob Klein
Post:
Thank you very much for the explanation - it helps more than you know. I was really trying to determine if it was the cause of the problems I've been noticing.

For 1.46 nbody mt tasks, I noticed on multiple Windows PCs that they would get to 100%, and then continue to crunch in the "Running" state well beyond 100%, single-threaded, leaving the PC's other CPUs idle. Also, that same task's "Elapsed" value would be over 24 hours, yet "CPU time at last checkpoint" would be blank, indicating that it never checkpointed at all, during those 24+ hours.

Do you know if that behavior (going single-threaded, going past 100%, going without checkpoint)... is expected? And would that task ever complete? And is any of it a possible bug? And might any of it be fixed with the 1.48 version?

I was just a bit shocked to see idle resources on my PCs, all due to an nbody task that wasn't checkpointing, and was in fact restarting entirely every time BOINC was restarted.

Thanks,
Jacob
18) Message boards : News : New Nbody version 1.48 (Message 63142)
Posted 14 Feb 2015 by Jacob Klein
Post:
Please clarify "made the program indeterminate".
Was that a bug in the 1.46 app? What would it look like, to the common user?
19) Message boards : Number crunching : Multithreaded opt out. (Message 63140)
Posted 14 Feb 2015 by Jacob Klein
Post:
I too have been noticing MilkyWay nbody mt (multi-threading) v1.46 tasks... go to 100%, and then continue doing something indefinitely, utilizing only a single core, despite BOINC budgeting multiple CPUs (3, or 4, or 8) to the task. Worse yet, there are no saved checkpoints for the task!

So, yes, it is for sure leading to idle resources.

The questions are:
- Is it expected for a MilkyWay nbody mt task to continue running for a long time, single-threaded, after the 100% mark? This is wasting resources.
- If it is expected, for how long should we wait until we consider it a bug? This is wasting resources.
- Why aren't there any checkpoints? If a user decides to suspend the workunit, and restart it, it'll restart from the very beginning again. This is wasting resources.
- Does the v1.48 app solve any of these problems, and if so, which ones?

Have I made my point? My questions deserve answers.
20) Message boards : Number crunching : Error: Maximum disk usage exceeded - 196 (0xc4) EXIT_DISK_LIMIT_EXCEEDED (Message 62076)
Posted 24 Jul 2014 by Jacob Klein
Post:
A work unit is continuously failing for:
Maximum disk usage exceeded
Exit status: 196 (0xc4) EXIT_DISK_LIMIT_EXCEEDED

See:
http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=593762910
http://milkyway.cs.rpi.edu/milkyway/result.php?resultid=794663027
http://milkyway.cs.rpi.edu/milkyway/result.php?resultid=794879051
http://milkyway.cs.rpi.edu/milkyway/result.php?resultid=795016363

Could an admin please investigate, and potentially correct the rsc_disk_bound value for tasks like this?

Regards,
Jacob

Name	de_nbody_06_11_orphan_sim_0_1405680903_92137_2
Workunit	593762910
Created	24 Jul 2014, 9:42:28 UTC
Sent	24 Jul 2014, 10:06:21 UTC
Received	24 Jul 2014, 12:41:31 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	196 (0xc4) EXIT_DISK_LIMIT_EXCEEDED
Computer ID	523078
Report deadline	5 Aug 2014, 10:06:21 UTC
Run time	484.69
CPU time	1,642.38
Validate state	Invalid
Credit	0.00
Application version	MilkyWay@Home N-Body Simulation v1.40 (mt)
Stderr output

<core_client_version>7.4.8</core_client_version>
<![CDATA[
<message>
Maximum disk usage exceeded
</message>


