
Posts by rjs5

1) Message boards : Number crunching : cpu and gpu (Message 69919)
Posted 13 Jun 2020 by rjs5
Post:
I have a quad-core processor and I am running 4 other work units, for a total of 5 work units.


You "running 4 other work units", but those 4 work units plus the GPU 0.417 CPU need are running on 4 CPU. BOINC marks all the work units as "running" but the OS will schedule the 4 CPU among the 5 work units based on the OS scheduling algorithm. The OS overhead will increase when it has to switch between work units it is trying to run.

Interestingly, I am running the GPU work units on a Nvidia 2080 ti and MW@H says that those work units require 0.997C. It takes almost a whole CPU to keep the GPU fed.
2) Message boards : News : New Server Update (Message 68348)
Posted 22 Mar 2019 by rjs5
Post:
I was able to fix the error with the non-downloadable exe. I downloaded the file from https://milkyway.cs.rpi.edu/milkyway/download/ and copied it manually to the project folder.

Greetings Marcus


Fixed mine too. Thanks much.
3) Questions and Answers : Getting started : Project download error (Message 68336)
Posted 22 Mar 2019 by rjs5
Post:
My only Windows 10 machine has successfully crunched 22 million credits. When the server was changed, the new binaries got stuck in the middle of downloading. This happened with both milkyway_1.46_windows_x86_64.exe and milkyway_1.46_windows_x86_64__opencl_nvidia_101.exe. Nothing changed on the machine: it worked before the server change and has been stuck downloading ever since.

I am unable to make my computers visible, so I think they are hidden. When I try to set the preference "Should MilkyWay@home show your computers on its web site?" to checked, the option will not let me.


Linux machines are running fine.

Host ID: 399149 (sky2066)
Credit: 32,507.57 average, 22,233,868 total | BOINC 7.14.2
CPU: GenuineIntel Intel(R) Core(TM) i9-7920X CPU @ 2.90GHz [Family 6 Model 85 Stepping 4] (24 processors)
GPU: NVIDIA GeForce RTX 2080 Ti (4095MB), driver 419.35, OpenCL 1.2
OS: Microsoft Windows 10 Professional x64 Edition (10.00.17134.00)
Last contact: 22 Mar 2019, 1:49:19 UTC



3/21/2019 7:37:24 PM | Milkyway@Home | Started download of milkyway_1.46_windows_x86_64.exe
3/21/2019 7:37:36 PM | Milkyway@Home | Started download of milkyway_1.46_windows_x86_64__opencl_nvidia_101.exe
3/21/2019 7:37:46 PM | | Project communication failed: attempting access to reference site
3/21/2019 7:37:46 PM | Milkyway@Home | Temporarily failed download of milkyway_1.46_windows_x86_64.exe: connect() failed
3/21/2019 7:37:46 PM | Milkyway@Home | Backing off 01:36:06 on download of milkyway_1.46_windows_x86_64.exe
3/21/2019 7:37:48 PM | | Internet access OK - project servers may be temporarily down.
3/21/2019 7:37:58 PM | | Project communication failed: attempting access to reference site
3/21/2019 7:37:58 PM | Milkyway@Home | Temporarily failed download of milkyway_1.46_windows_x86_64__opencl_nvidia_101.exe: connect() failed
3/21/2019 7:37:58 PM | Milkyway@Home | Backing off 00:53:01 on download of milkyway_1.46_windows_x86_64__opencl_nvidia_101.exe
4) Message boards : News : New Server Update (Message 68320)
Posted 21 Mar 2019 by rjs5
Post:
Hi there,

I have the same problem as Tim

20.03.2019 22:50:25 | Milkyway@Home | Temporarily failed download of milkyway_1.46_windows_x86_64__opencl_nvidia_101.exe: connect() failed
20.03.2019 22:50:25 | Milkyway@Home | Backing off 05:54:55 on download of milkyway_1.46_windows_x86_64__opencl_nvidia_101.exe
20.03.2019 22:50:26 | | Internet access OK - project servers may be temporarily down.

It has downloaded 40 WUs but is not starting them, as the exe is missing.

Greetings Marcus


I am seeing this too, but only on my Windows machine (ID 399149). Linux machines are doing fine. It downloaded 8 WUs, but the Windows Milkyway nvidia binary has been stuck since the new server was put online.

3/20/2019 11:52:05 PM | Milkyway@Home | Temporarily failed download of milkyway_1.46_windows_x86_64__opencl_nvidia_101.exe: connect() failed
3/20/2019 11:52:05 PM | Milkyway@Home | Backing off 04:51:31 on download of milkyway_1.46_windows_x86_64__opencl_nvidia_101.exe
5) Message boards : Number crunching : Maximum number of task (Message 67684)
Posted 23 Jul 2018 by rjs5
Post:
I'm trying to do the same thing, but I keep getting this error message:

Milkyway@Home: Notice from BOINC
Missing <app_config> in app_config.xml


I used NotepadXML to create this file:

<?xml version="1.0" encoding="utf-8"?>
<app_config>
<app>
<name>MilkyWay@Home</name>
<max_concurrent>5</max_concurrent>
</app>
</app_config>


The first line was inserted by NotepadXML and the <name> line is one of my efforts to resolve the error. Doesn't seem to matter whether either line is there... the same error is returned and the file doesn't seem to work.

Obviously, I'm doing something wrong, but I haven't a clue what.

Any suggestions?


I think the actual names of the M@H apps are: 'milkyway' and 'milkyway_nbody'

Try changing "MilkyWay@Home" to the app name you are interested in.
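Something like the sketch below is what I have in mind. The short name 'milkyway' is my best guess (swap in 'milkyway_nbody' for the N-Body app), the file belongs in the milkyway.cs.rpi.edu_milkyway project directory, and the client needs to re-read config files afterwards:

<app_config>
    <app>
        <name>milkyway</name>
        <max_concurrent>5</max_concurrent>
    </app>
</app_config>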
6) Message boards : Number crunching : Setting some apps to GPU and some to CPU (Message 64564)
Posted 21 May 2016 by rjs5
Post:
I suspect that you are going down the right path with the XML configuration ... whether or not a solution exists.

Do you see much difference in the time to complete 2 jobs in parallel versus 2 sequentially? GPU-Z indicates that my GTX 970 is pretty loaded down with MW@H. I have set some other projects to 0.5 GPU, but not MW.
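For what it's worth, the 0.5 GPU setting on other projects is done with an app_config.xml in that project's directory, roughly like this sketch ('appname' is a placeholder for that project's short app name):

<app_config>
    <app>
        <name>appname</name>
        <gpu_versions>
            <gpu_usage>0.5</gpu_usage>   <!-- two tasks share one GPU -->
            <cpu_usage>0.5</cpu_usage>
        </gpu_versions>
    </app>
</app_config>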
7) Message boards : Number crunching : Setting some apps to GPU and some to CPU (Message 64562)
Posted 21 May 2016 by rjs5
Post:
Do you mean you want to split the 4 MW@H apps,
MilkyWay@Home
MilkyWay@Home N-Body Simulation
Milkyway@Home Separation
Milkyway@Home Separation (Modified Fit)



so, these two run on the CPU only,
Milkyway@Home Separation
Milkyway@Home Separation (Modified Fit)


and only this one runs on the GPU
MilkyWay@Home
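One mechanism I know of for that kind of split is the BOINC client's cc_config.xml <exclude_gpu> option, which keeps a named app off the GPU for a given project. A sketch only; the short app names would need to be checked against client_state.xml:

<cc_config>
    <options>
        <!-- Repeat one <exclude_gpu> block per app that should stay CPU-only.
             "app_short_name" is a placeholder for the name found in client_state.xml. -->
        <exclude_gpu>
            <url>http://milkyway.cs.rpi.edu/milkyway/</url>
            <app>app_short_name</app>
        </exclude_gpu>
    </options>
</cc_config>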
8) Message boards : News : New Nbody version 1.46 (Message 62904)
Posted 28 Dec 2014 by rjs5
Post:
ps_nbody_12_20_orphan_sim_2_1413455402_1450477_2

I have one too and it appears to be stuck in a loop where its exit is based on a floating point compare.

It has been running nbody 1.46mt at 100% completion for several hours on Ubuntu Linux.
Only one CPU is active of the 8 CPUs.
perf top indicates that execution is stuck in the pow_rn function on the single CPU that is running.

The only functions measuring non-zero execution time (using perf top) are:
88% pow_rn
11.75% 0x000....9cf72
0.03% pow_exact_rn
0.01% dsfmt_gen_rand_all


http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=676782111

I ran a perf record -a -- sleep 5 to capture what the machine was doing.
perf report shows ....

2.4% of the total 88% time above is being spent in the "subsd" instruction at 497738, set apart below in the objdump. The remaining instructions down to the "jne" loop exit each account for about 1% of execution time.


497716: 66 45 0f 28 cd movapd %xmm13,%xmm9
49771b: f2 45 0f 5c cf subsd %xmm15,%xmm9
497720: 66 45 0f 28 f9 movapd %xmm9,%xmm15
497725: f2 44 0f 10 4c 24 c8 movsd -0x38(%rsp),%xmm9
49772c: f2 45 0f 59 ce mulsd %xmm14,%xmm9
497731: f2 44 0f 59 74 24 e8 mulsd -0x18(%rsp),%xmm14

497738: f2 45 0f 5c cd subsd %xmm13,%xmm9

49773d: f2 44 0f 10 6c 24 f0 movsd -0x10(%rsp),%xmm13
497744: f2 44 0f 59 6c 24 c8 mulsd -0x38(%rsp),%xmm13
49774b: f2 45 0f 58 ce addsd %xmm14,%xmm9
497750: f2 45 0f 58 cd addsd %xmm13,%xmm9
497755: f2 44 0f 10 6c 24 f0 movsd -0x10(%rsp),%xmm13
49775c: f2 44 0f 59 6c 24 e8 mulsd -0x18(%rsp),%xmm13
497763: f2 45 0f 58 cd addsd %xmm13,%xmm9
497768: f2 45 0f 58 cc addsd %xmm12,%xmm9
49776d: f2 45 0f 58 f9 addsd %xmm9,%xmm15
497772: f2 44 0f 10 0d 85 00 movsd 0x30085(%rip),%xmm9 # 4c7800 <scs_sixinv+0x9180>
497779: 03 00
49777b: f2 45 0f 59 cf mulsd %xmm15,%xmm9
497780: f2 44 0f 58 0d 7f 00 addsd 0x3007f(%rip),%xmm9 # 4c7808 <scs_sixinv+0x9188>
497787: 03 00
497789: f2 45 0f 59 cf mulsd %xmm15,%xmm9
49778e: f2 44 0f 58 0d 71 1b addsd 0x21b71(%rip),%xmm9 # 4b9308 <p_n+0x2e28>
497795: 02 00
497797: f2 45 0f 59 cf mulsd %xmm15,%xmm9
49779c: f2 45 0f 59 d1 mulsd %xmm9,%xmm10
4977a1: f2 45 0f 58 d1 addsd %xmm9,%xmm10
4977a6: f2 44 0f 58 54 24 d8 addsd -0x28(%rsp),%xmm10
4977ad: f2 44 0f 59 54 24 d0 mulsd -0x30(%rsp),%xmm10
4977b4: f2 45 0f 58 da addsd %xmm10,%xmm11
4977b9: 66 45 0f 28 cb movapd %xmm11,%xmm9
4977be: f2 44 0f 5c 4c 24 d0 subsd -0x30(%rsp),%xmm9
4977c5: f2 45 0f 5c d1 subsd %xmm9,%xmm10
4977ca: 66 45 0f 28 cb movapd %xmm11,%xmm9
4977cf: f2 44 0f 58 54 24 e0 addsd -0x20(%rsp),%xmm10
4977d6: f2 45 0f 58 ca addsd %xmm10,%xmm9
4977db: 66 45 0f 28 e1 movapd %xmm9,%xmm12
4977e0: f2 45 0f 5c e3 subsd %xmm11,%xmm12
4977e5: f2 45 0f 5c d4 subsd %xmm12,%xmm10
4977ea: 0f 8c f8 00 00 00 jl 4978e8 <pow_rn+0x9c8>
4977f0: f2 44 0f 59 15 67 9c mulsd 0x29c67(%rip),%xmm10 # 4c1460 <scs_sixinv+0x2de0>
4977f7: 02 00
4977f9: f2 45 0f 58 d1 addsd %xmm9,%xmm10
4977fe: 66 45 0f 2e ca ucomisd %xmm10,%xmm9
497803: 0f 85 cf 00 00 00 jne 4978d8 <pow_rn+0x9b8>
497809: 0f 8a c9 00 00 00 jp 4978d8 <pow_rn+0x9b8>

49780f: 81 fe fe 03 00 00 cmp $0x3fe,%esi

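For anyone who wants to reproduce the measurement, the rough sequence was as follows (the binary name is a placeholder for the nbody executable in the project directory):

perf top                        # live view; pow_rn dominated at ~88% of samples
perf record -a -- sleep 5       # system-wide sample for 5 seconds
perf report                     # break the samples down by function and instruction
objdump -d ./milkyway_nbody_binary > nbody.asm    # disassembly to match against the hot addresses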
9) Message boards : News : N-Body 1.18 (Message 58581)
Posted 8 Jun 2013 by rjs5
Post:
I was installing a new compiler at the time, and everything was operating normally, I think.


Please ignore my last post.
10) Message boards : News : N-Body 1.18 (Message 58580)
Posted 8 Jun 2013 by rjs5
Post:
I am still running but ran into what appears to be a scheduler problem.

I have an 8-core i7 Sandy Bridge and an EVGA GTX 650 Ti Nvidia GPU.

The CPU mt task started up "running 8 CPUs".
The GPU task started up and takes 0.417 CPUs.


I see a pattern. Only one of the two will run under normal scheduling. They ping-pong back and forth: the 8-CPU mt version only runs while the next GPU task is being loaded and for a short time afterward.


If I suspend all GPU work, the 8-CPU mt task starts. If I resume the GPU, both run for a short period of time and then the 8-CPU mt job suspends.


It appears that the CPU mt version wants ALL the CPUs, but the GPU task starts and wants a CPU fraction, so the total of the two is greater than the number of CPUs.
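If the client is new enough to support app_config.xml <app_version> overrides (mine here, 7.0.28, may not be), one workaround I can think of is capping how many CPUs the mt app claims so that the GPU task's fraction still fits. A sketch only; the app name and plan class are my assumptions:

<app_config>
    <app_version>
        <app_name>milkyway_nbody</app_name>
        <plan_class>mt</plan_class>
        <avg_ncpus>7</avg_ncpus>   <!-- leave roughly 1 CPU free for the GPU task's 0.417 -->
    </app_version>
</app_config>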




6/8/2013 3:07:48 PM | Milkyway@Home | Requesting new tasks for NVIDIA
6/8/2013 3:07:50 PM | Milkyway@Home | Scheduler request completed: got 0 new tasks
6/8/2013 3:07:50 PM | Milkyway@Home | Project has no tasks available
6/8/2013 3:15:28 PM | Milkyway@Home | Computation for task de_separation_79_DR8_rev_2_1370577207_678480_2 finished
6/8/2013 3:15:28 PM | Milkyway@Home | Starting task de_separation_79_DR8_rev_3_1370577207_1062567_0 using milkyway version 102 (opencl_nvidia) in slot 10
6/8/2013 3:15:31 PM | Milkyway@Home | Sending scheduler request: To fetch work.
6/8/2013 3:15:31 PM | Milkyway@Home | Reporting 1 completed tasks
6/8/2013 3:15:31 PM | Milkyway@Home | Requesting new tasks for NVIDIA
6/8/2013 3:15:33 PM | Milkyway@Home | Scheduler request completed: got 0 new tasks
6/8/2013 3:20:29 PM | Milkyway@Home | Restarting task de_nbody_06_06_dark_1370577207_85348_0 using milkyway_nbody version 118 (mt) in slot 9
6/8/2013 3:24:19 PM | Milkyway@Home | Computation for task de_separation_79_DR8_rev_3_1370577207_1062567_0 finished
6/8/2013 3:24:19 PM | Milkyway@Home | Starting task de_separation_79_DR8_rev_3_1370577207_1062563_0 using milkyway version 102 (opencl_nvidia) in slot 10
6/8/2013 3:24:24 PM | Milkyway@Home | Sending scheduler request: To fetch work.
6/8/2013 3:24:24 PM | Milkyway@Home | Reporting 1 completed tasks
6/8/2013 3:24:24 PM | Milkyway@Home | Requesting new tasks for NVIDIA
6/8/2013 3:24:26 PM | Milkyway@Home | Scheduler request completed: got 2 new tasks
6/8/2013 3:25:31 PM | Milkyway@Home | Sending scheduler request: To fetch work.
6/8/2013 3:25:31 PM | Milkyway@Home | Requesting new tasks for NVIDIA
6/8/2013 3:25:33 PM | Milkyway@Home | Scheduler request completed: got 0 new tasks
6/8/2013 3:25:33 PM | Milkyway@Home | Project has no tasks available
6/8/2013 3:33:12 PM | Milkyway@Home | Computation for task de_separation_79_DR8_rev_3_1370577207_1062563_0 finished
6/8/2013 3:33:12 PM | Milkyway@Home | Starting task de_separation_79_DR8_rev_3_1370577207_677538_2 using milkyway version 102 (opencl_nvidia) in slot 10
6/8/2013 3:33:14 PM | Milkyway@Home | Sending scheduler request: To fetch work.
6/8/2013 3:33:14 PM | Milkyway@Home | Reporting 1 completed tasks
6/8/2013 3:33:14 PM | Milkyway@Home | Requesting new tasks for NVIDIA
6/8/2013 3:33:16 PM | Milkyway@Home | Scheduler request completed: got 1 new tasks
6/8/2013 3:34:22 PM | Milkyway@Home | Sending scheduler request: To fetch work.
6/8/2013 3:34:22 PM | Milkyway@Home | Requesting new tasks for NVIDIA
6/8/2013 3:34:24 PM | Milkyway@Home | Scheduler request completed: got 0 new tasks
6/8/2013 3:34:24 PM | Milkyway@Home | Project has no tasks available
6/8/2013 3:41:13 PM | | Reading preferences override file
6/8/2013 3:41:13 PM | | Preferences:
6/8/2013 3:41:13 PM | | max memory usage when active: 8183.22MB
6/8/2013 3:41:13 PM | | max memory usage when idle: 14729.80MB
6/8/2013 3:41:13 PM | | max disk usage: 100.00GB
6/8/2013 3:41:13 PM | | don't use GPU while active
6/8/2013 3:41:13 PM | | suspend work if non-BOINC CPU load exceeds 25 %
6/8/2013 3:41:13 PM | | (to change preferences, visit a project web site or select Preferences in the Manager)
6/8/2013 3:41:28 PM | | Suspending GPU computation - user request
6/8/2013 3:41:28 PM | Milkyway@Home | Restarting task de_nbody_06_06_dark_1370577207_85348_0 using milkyway_nbody version 118 (mt) in slot 9
6/8/2013 3:41:37 PM | | Resuming GPU computation
6/8/2013 3:41:37 PM | Milkyway@Home | Restarting task de_separation_79_DR8_rev_3_1370577207_677538_2 using milkyway version 102 (opencl_nvidia) in slot 10
6/8/2013 3:42:24 PM | Milkyway@Home | Computation for task de_separation_79_DR8_rev_3_1370577207_677538_2 finished
6/8/2013 3:42:24 PM | Milkyway@Home | Starting task de_separation_79_DR8_rev_3_1370577207_1062568_0 using milkyway version 102 (opencl_nvidia) in slot 10
6/8/2013 3:42:26 PM | Milkyway@Home | Sending scheduler request: To fetch work.
6/8/2013 3:42:26 PM | Milkyway@Home | Reporting 1 completed tasks
6/8/2013 3:42:26 PM | Milkyway@Home | Requesting new tasks for NVIDIA
6/8/2013 3:42:29 PM | Milkyway@Home | Scheduler request completed: got 1 new tasks
6/8/2013 3:43:34 PM | Milkyway@Home | Sending scheduler request: To fetch work.
6/8/2013 3:43:34 PM | Milkyway@Home | Requesting new tasks for NVIDIA
6/8/2013 3:43:37 PM | Milkyway@Home | Scheduler request completed: got 0 new tasks
6/8/2013 3:43:37 PM | Milkyway@Home | Project has no tasks available
6/8/2013 3:49:42 PM | Milkyway@Home | Sending scheduler request: To fetch work.
6/8/2013 3:49:42 PM | Milkyway@Home | Requesting new tasks for NVIDIA
6/8/2013 3:49:44 PM | Milkyway@Home | Scheduler request completed: got 0 new tasks
6/8/2013 3:49:44 PM | Milkyway@Home | Project has no tasks available
11) Message boards : News : N-Body 1.18 (Message 58571)
Posted 8 Jun 2013 by rjs5
Post:
Thanks Richard and Jeffery (I think it was you two who put mt back in working order)


Richard,

It looks like I inadvertently played your "straight man". It appears that your information has put "mt" back in play.

It was not automatic, but I am now running (it appears) mt MilkyWay workloads with multiple CPUs.

UPDATING caused the MW mt workload to think it was running in mt mode, but it only used one CPU.
A DETACH and ATTACH seemed to fix it.

The DETACH/ATTACH seemed to work for me. I am not suggesting that it be the general solution for everyone. I leave the general solution to those who know what is going on.

12) Message boards : News : N-Body 1.18 (Message 58542)
Posted 7 Jun 2013 by rjs5
Post:
I have been wondering what happened to the multithreading operation on my machine. I thought my machine was configured incorrectly. Are there a lot of issues, and has someone summarized them somewhere? If you have time, I would be interested in knowing what the problems are.
thanks
13) Message boards : News : Nbody 1.04 (Message 56848)
Posted 12 Jan 2013 by rjs5
Post:
A SECOND system is ALMOST working. There does seem to be some hyper-dependency on the elements found on the system: it does not work on a pristine system and goes pretty crazy on my working system.

I will continue to try to disassemble what you are doing from the outside and see if I can find anything.


I have a near-pristine Ivy Bridge Core i7. I removed and restored MilkyWay on it with no change in behavior. The Nbody tasks run, but 8 of them run in parallel instead of 1 with 8 threads. The files found in slot 1 of one of the 8 idled milkyway workloads are:

boinc_task_state.xml
---
<active_task>
<project_master_url>http%3
14) Message boards : News : Nbody 1.04 (Message 56835)
Posted 11 Jan 2013 by rjs5
Post:
Deleted everything and reinstalled. Same error.

Is it possible to check system calls for error status after the first call, so some better diagnostics could be returned? It doesn't seem too hard.
15) Message boards : News : Nbody 1.04 (Message 56834)
Posted 11 Jan 2013 by rjs5
Post:
I probably have a non-standard installation. I put the ProgramData directory on an SSD drive "K:\". Is it possible that the project makes an assumption that the program data is on the same drive as the binary?

C:\Program Files\BOINC
K:\ProgramData\BOINC\projects\milkyway.cs.rpi.edu_milkyway
16) Message boards : News : Nbody 1.04 (Message 56829)
Posted 11 Jan 2013 by rjs5
Post:
I am running Win7 64-bit and BOINC 7.0.28 64-bit (computer 399149), and all the MilkyWay@Home N-Body Simulation v1.04 tasks error out in just a few seconds. I looked over this thread and was wondering if there is anything I need to fix to stop the errors?

I tried removing and reattaching the project with no change in behavior. The output seems to be a short stderr message.

Thoughts?
thanks
rjs


Stderr output
<core_client_version>7.0.28</core_client_version>
<![CDATA[
<message>
- exit code -1073741515 (0xc0000135)
</message>
]]>



http://milkyway.cs.rpi.edu/milkyway/results.php?userid=135958&offset=0&show_names=0&state=5&appid=7
17) Message boards : News : issues with workunits crashing might be fixed now and nbody work generation information (Message 55022)
Posted 5 Jul 2012 by rjs5
Post:
It is hard to tell what is happening with this stripped, statically linked program. I don't know how the program manages the different system call interfaces (various versions of Linux) with a single static link. It has been a long time since I have seen anyone statically link anything.

Intel has released a new beta version of their compiler that performs dynamic, runtime pointer checking, which might help locate the bug, but there is nothing MilkyWay@Home users can do to help other than to say .... still failing.
http://software.intel.com/en-us/articles/beta-tech-talks/

They have both a Fortran and a C compiler that should help clean up bogus pointers. They can run a test on the application and locate their corrupted pointer.

An objdump of the application shows that there are AVX instructions in what I guess is the OpenMP code. I have Sandy Bridge and Ivy Bridge systems and ONE Nehalem system. The Nehalem system seems to work. The Sandy/Ivy Bridge systems using AVX (via OpenMP) are failing.
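If anyone wants to check the same two things on their own box, something like this should do it (the binary name is a placeholder for the nbody executable in the project directory, and the grep pattern only catches the common VEX-encoded double-precision arithmetic ops):

# Count AVX double-precision arithmetic instructions in the disassembly.
objdump -d ./milkyway_nbody_binary | grep -cE 'v(add|sub|mul|div)[sp]d'
# Non-zero output means the host CPU advertises AVX; a Nehalem box will report 0.
grep -c avx /proc/cpuinfo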



If you get a computation error, the choices are to (1) turn off work or (2) let the compute errors pile up.

The workloads fail pretty rapidly so I am going to let the compute errors filter back to the system and it will be clear when they have fixed the bug.

18) Message boards : News : issues with workunits crashing might be fixed now and nbody work generation information (Message 55010)
Posted 4 Jul 2012 by rjs5
Post:
I am having similar failures running on Linux 64-bit. I am running an old version of BOINC, which is the one that is easiest to get running on CentOS.

I saw similar problems on Einstein and was able to clear compute errors by doing an "ldd" to see which libraries Einstein could not find. For Einstein, I had to install some 32-bit versions of libraries (GLUT, ...).

Milkyway is statically linked and stripped of symbols, so missing libraries are not the problem for Milkyway.
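For reference, the check I used on Einstein looked like this (the binary name is just a placeholder for whichever app executable is erroring out in the project directory):

# List the shared libraries the app binary needs and show any the loader cannot find.
ldd ./some_app_binary | grep "not found"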


rod


Task 249098203

Stderr output

<core_client_version>6.10.45</core_client_version>
<![CDATA[
<message>
process exited with code 15 (0xf, -241)
</message>
<stderr_txt>
<search_application> milkyway_nbody 0.88 Linux x86_64 double OpenMP, Crlibm </search_application>
Using OpenMP 4 max threads on a system with 4 processors
Error reading histogram line 37: massPerParticle = 0.000100
23:07:55 (8730): called boinc_finish

</stderr_txt>
]]>



