Welcome to MilkyWay@home

Posts by Jean-David Beyer

1) Message boards : Number crunching : NBody tasks taking much longer ... (Message 74134)
Posted 9 Sep 2022 by Jean-David Beyer
Post:
On my machine, N-body tasks have usually taken 4 to 8 minutes. This morning one has about 2 1/2 hours on it and still seems to be running normally.

boinc 325569 16183 99 07:33 ? 02:49:19 ../../projects/milkyway.cs.rpi.edu_milkyway/milkyway_nbody_1.82_x86_64-pc-linux-gnu__mt -f nbody_parameters.lua -h histogram.txt --seed 196001641 -np 11 -p 3.58248 1 0.117433 0.148923 56.652 0.301723 1 1 1 1 1

This is not a complaint, but it sure is a surprise.
2) Message boards : Number crunching : Validation inconclusive (Message 73674)
Posted 24 May 2022 by Jean-David Beyer
Post:
I do not know if anything is getting validated. This is my oldest pending work unit. There are 184 others.

Workunit 458806994
name 	de_modfit_86_bundle5_3s_south_pt2_2_1652888500_3384632
application 	Milkyway@home Separation
created 	21 May 2022, 6:48:25 UTC
minimum quorum 	1
initial replication 	1
max # of error/total/success tasks 	2, 9, 6
validation 	Pending
Task 	279025918
Computer 	928280
Sent 	22 May 2022, 8:24:29 UTC
Time reported 	22 May 2022, 23:13:27 UTC
Status 	Completed, waiting for validation
Run time 	2,791.94
CPU time 	2,777.26
Credit 	pending
Application 	Milkyway@home Separation v1.46 x86_64-pc-linux-gnu
3) Message boards : Number crunching : I do not understand waiting for validation (Message 73547)
Posted 16 May 2022 by Jean-David Beyer
Post:

Now, the "Quorum 1, initial replication 1" is just the way a project using adaptive replication should launch new work units; when the initial task is returned and the validator has looked at it, if the result is not obviously invalid and if the system that returned it has a good run of consecutive validated returns there's a significant likelihood that it will validate without needing a second opinion - there's a random factor involved, and it'll allow most tasks to self-validate. If the system doesn't have a run of validated results, or if the random factor decrees that a second opinion is needed, the validator will request a second task (and the quorum and/or replication counts will increment.)
...
Hope this helps...


That is probably the reason, then. I am a new user of Milkyway, and while all my results seem correct to me, perhaps a day or two of results is not enough to qualify for self-validation, even though the stated requirement is a quorum of one.

I notice I now have no work units in this category.
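
As a rough illustration of the adaptive replication described in the quoted explanation above, here is a minimal Python sketch. The threshold, the spot-check probability, and the function name are all made up for this example; the real BOINC validator logic differs in detail, so treat this only as a way to picture why a new host's results can sit in "waiting for validation" even with a quorum of 1.

```python
import random

# Hypothetical tuning values for illustration only; these are not
# the settings used by BOINC or by MilkyWay@home.
TRUST_THRESHOLD = 10      # consecutive valid results before a host is trusted
SPOT_CHECK_CHANCE = 0.1   # chance a trusted host's result is still double-checked


def needs_second_opinion(consecutive_valid, rng=random.random):
    """Return True if the validator should request a second task."""
    if consecutive_valid < TRUST_THRESHOLD:
        # New or unproven host: always ask another host to confirm.
        return True
    # Trusted host: only an occasional random check.
    return rng() < SPOT_CHECK_CHANCE


# A host with only two validated results always gets a second opinion.
print(needs_second_opinion(2))
# A host with a long valid streak usually self-validates.
print(needs_second_opinion(50))
```

Under these assumptions, a machine that has returned only a day or two of work has not yet built up the streak, so its tasks wait until a second result arrives.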
4) Message boards : Number crunching : I do not understand waiting for validation (Message 73539)
Posted 16 May 2022 by Jean-David Beyer
Post:
Workunit 452113339
name 	de_modfit_84_bundle5_3s_south_pt2_2_1651669798_11233861
application 	Milkyway@home Separation
created 	15 May 2022, 9:34:34 UTC
minimum quorum 	1
initial replication 	1
max # of error/total/success tasks 	2, 9, 6
validation 	Pending
Task 	271031951
Computer 	928280
Sent 	15 May 2022, 9:43:24 UTC
Time reported or deadline 	15 May 2022, 23:26:13 UTC
Status 	Completed, waiting for validation
Run time (sec) 	2,773.68
CPU time (sec) 	2,758.37
Credit 	pending
Application 	Milkyway@home Separation v1.46 x86_64-pc-linux-gnu


What validation do they want when the work unit shows the following?
minimum quorum 1
initial replication 1

max # of error/total/success tasks 2, 9, 6
validation Pending
5) Message boards : News : Nbody WU Flush (Message 73535)
Posted 15 May 2022 by Jean-David Beyer
Post:
On my Linux machine, I get similar results: all are 64-bit.
Red Hat Enterprise Linux release 8.6 (Ootpa)
This is a client machine, not a server.
[/usr/bin]$ file boinc*
boinc:        ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=4a444f877bad66d3e3c69815b610c2f44dec51c0, stripped
boinc_client: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=4a444f877bad66d3e3c69815b610c2f44dec51c0, stripped
boinccmd:     ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=004ad0d0739c3b4a05a6afd9e5f716f730d63226, stripped
boincmgr:     ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=30032e4e8cb1a106b666ce63ac8b11e722fccb82, stripped
boincscr:     ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=1b24af4798f6e15af7d5fd66681355a7005d82ad, stripped

[/usr/lib64]$ file ld-linux* ld-2.28*
ld-linux-x86-64.so.2: symbolic link to ld-2.28.so
ld-2.28.so:           ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, BuildID[sha1]=67aa0f1504abcfc4befb2b61a329a30a9984e0db, with debug_info, not stripped

[/usr/lib64]$ file *boinc*
libboinc_api.so.7:             symbolic link to libboinc_api.so.7.16.11
libboinc_api.so.7.16.11:       ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=b534348dcb43689c3eaacc94f317305a5be2a982, stripped
libboinc_graphics2.so.7:       symbolic link to libboinc_graphics2.so.7.16.11
libboinc_graphics2.so.7.16.11: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=900728d6af6e7c77e639e569d657326824973d4d, stripped
libboinc_opencl.so.7:          symbolic link to libboinc_opencl.so.7.16.11
libboinc_opencl.so.7.16.11:    ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=011ba4e89c630388ff1694e4cc56db3d55b53fe7, stripped

6) Questions and Answers : Unix/Linux : How do tasks like de_nbody_08_31_2021_v176_40k__data__13_1647295263_3642347_2 work? (Message 73507)
Posted 14 May 2022 by Jean-David Beyer
Post:
You're guessing correctly, the task is probably using 8 cores. N-Body Simulation project is multicore and can be set up to use 1 to 16 cores per task via app_config.xml. By default each task will use all available cores (up to 16). BOINC will even sometimes pause other running tasks, like you've noticed. I'm currently running this project on Ubuntu 20.04 (via WSL2 on Windows 10) and both top and htop commands also give the same reading as you describe. It seems to be the (strange) way they interpret multicore usage. I found that 3-5 core setup produces highest throughputs. Here's the app_config.xml that I have to run it using 4 cores.


Thank you. I set it to run with only 4 cores and it is now working that way.
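
For anyone wanting to try the same limit, here is a hedged Python sketch that generates an app_config.xml of the kind described in the quote. The element names follow BOINC's app_config.xml format, but the app name `milkyway_nbody`, the `mt` plan class (taken from the executable name), and the `--nthreads` command-line option are assumptions, not values copied from the quoted post; check them against your own client before relying on them.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, plan_class, ncpus):
    """Build app_config.xml text limiting a multithreaded app to ncpus cores."""
    root = ET.Element("app_config")
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    ET.SubElement(version, "plan_class").text = plan_class
    # Tells the BOINC scheduler how many cores each task occupies.
    ET.SubElement(version, "avg_ncpus").text = str(ncpus)
    # Assumed option that tells the N-body application how many threads to start.
    ET.SubElement(version, "cmdline").text = f"--nthreads {ncpus}"
    ET.indent(root)  # pretty-print; requires Python 3.9 or later
    return ET.tostring(root, encoding="unicode")


# Assumed names: "milkyway_nbody" app with the "mt" plan class, 4 cores.
config_xml = build_app_config("milkyway_nbody", "mt", 4)
print(config_xml)
```

The printed text would be saved as app_config.xml in the project directory (projects/milkyway.cs.rpi.edu_milkyway on this machine), after which the client needs to re-read its config files or be restarted.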

I am not sure what the optimum is. I am now running
# perf stat -aB -e cache-references,cache-misses
to see how the processor cache is doing. As the output below shows, the miss rate is about 1.1%, so I am by no means running out of processor cache. This is an unusually good number. When ClimatePrediction is running big (N216) models, the processor cache misses go up to about 50%, but right now milkyway and universe@home are the only Boinc tasks running (and not much else is running).
Performance counter stats for 'system wide':

    10,736,725,787      cache-references                                            
       122,353,406      cache-misses              #    1.140 % of all cache refs    

      60.247759271 seconds time elapsed
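
The percentage perf prints is just cache-misses divided by cache-references. A small Python sketch that reproduces it from the two counter lines above is shown here; the parsing function is a hypothetical helper written for this example, not part of perf.

```python
import re

# The two counter lines from the perf output above.
PERF_OUTPUT = """\
    10,736,725,787      cache-references
       122,353,406      cache-misses
"""


def parse_counters(text):
    """Map each perf event name to its integer count."""
    counters = {}
    for line in text.splitlines():
        match = re.match(r"\s*([\d,]+)\s+([\w-]+)", line)
        if match:
            # Strip the thousands separators before converting.
            counters[match.group(2)] = int(match.group(1).replace(",", ""))
    return counters


counters = parse_counters(PERF_OUTPUT)
miss_rate = 100.0 * counters["cache-misses"] / counters["cache-references"]
print(f"{miss_rate:.3f} % of all cache refs")
```

This gives the same 1.140 % that perf reported for the 60-second sample.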
7) Questions and Answers : Unix/Linux : How do tasks like de_nbody_08_31_2021_v176_40k__data__13_1647295263_3642347_2 work? (Message 73501)
Posted 14 May 2022 by Jean-David Beyer
Post:
I am running a machine using Red Hat Enterprise Linux release 8.5 (Ootpa).

When my machine runs a task like de_nbody_08_31_2021_v176_40k__data__13_1647295263_3642347_2, it seems to run just fine, and the result is sent back to the server, where it is marked valid. But if I run the top command to look at only Boinc tasks, that kind of task seems to be assigned to one CPU, yet that CPU seems to run at 650% (or so) of capacity, and the other Boinc tasks go to sleep. I get the impression that it is using six to eight processors, but top does not say so. So this is not a complaint, but I would like to know what is really going on.
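
One way to read a figure like 650%: in its default mode top shows one line per process and adds up the CPU use of all of that process's threads, so a multithreaded task can exceed 100%. The Python sketch below only illustrates that arithmetic; the per-thread figures are invented for the example, not measured on this machine.

```python
def process_cpu_percent(thread_percents):
    """Sum per-thread CPU percentages into the single per-process figure."""
    return sum(thread_percents)


def approx_busy_cores(process_percent):
    """Convert a per-process %CPU figure into an equivalent number of busy cores."""
    return process_percent / 100.0


# Invented example: eight worker threads, each busy about 81% of the time.
threads = [81.25] * 8
total = process_cpu_percent(threads)
print(f"{total:.0f}% CPU is roughly {approx_busy_cores(total):.1f} cores")
```

Running top with the -H option lists the individual threads instead, which makes the spread across processors visible.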



