
Posts by rjs5

1) Message boards : Number crunching : cpu and gpu (Message 69919)
Posted 13 Jun 2020 by rjs5
Post:
I have a quad-core processor and I am running 4 other work units, for a total of 5 work units.


You "running 4 other work units", but those 4 work units plus the GPU 0.417 CPU need are running on 4 CPU. BOINC marks all the work units as "running" but the OS will schedule the 4 CPU among the 5 work units based on the OS scheduling algorithm. The OS overhead will increase when it has to switch between work units it is trying to run.

Interestingly, I am running the GPU work units on a Nvidia 2080 ti and MW@H says that those work units require 0.997C. It takes almost a whole CPU to keep the GPU fed.
2) Message boards : News : New Server Update (Message 68348)
Posted 22 Mar 2019 by rjs5
Post:
I was able to fix the error with the non-downloadable exe. I downloaded the file from https://milkyway.cs.rpi.edu/milkyway/download/ and copied it manually to the project folder.

Greetings Marcus


Fixed mine too. Thanks much.
3) Questions and Answers : Getting started : Project download error (Message 68336)
Posted 22 Mar 2019 by rjs5
Post:
My only Windows 10 machine has successfully crunched 22 million credits. When the server was changed, the new binaries got stuck in the middle of downloading. This happened with both milkyway_1.46_windows_x86_64.exe and milkyway_1.46_windows_x86_64__opencl_nvidia_101.exe. Nothing changed on the machine: it worked before the server change and has been stuck downloading ever since.

I am unable to make my computers visible, so I think they are hidden. When I try to set the preference "Should MilkyWay@home show your computers on its web site?" to checked, the option will not let me.


Linux machines are running fine.

Host ID: 399149 (sky2066)
Credit: 32,507.57 average, 22,233,868 total | BOINC 7.14.2
CPU: GenuineIntel Intel(R) Core(TM) i9-7920X CPU @ 2.90GHz [Family 6 Model 85 Stepping 4] (24 processors)
GPU: NVIDIA GeForce RTX 2080 Ti (4095MB), driver 419.35, OpenCL 1.2
OS: Microsoft Windows 10 Professional x64 Edition (10.00.17134.00)
Last contact: 22 Mar 2019, 1:49:19 UTC



3/21/2019 7:37:24 PM | Milkyway@Home | Started download of milkyway_1.46_windows_x86_64.exe
3/21/2019 7:37:36 PM | Milkyway@Home | Started download of milkyway_1.46_windows_x86_64__opencl_nvidia_101.exe
3/21/2019 7:37:46 PM | | Project communication failed: attempting access to reference site
3/21/2019 7:37:46 PM | Milkyway@Home | Temporarily failed download of milkyway_1.46_windows_x86_64.exe: connect() failed
3/21/2019 7:37:46 PM | Milkyway@Home | Backing off 01:36:06 on download of milkyway_1.46_windows_x86_64.exe
3/21/2019 7:37:48 PM | | Internet access OK - project servers may be temporarily down.
3/21/2019 7:37:58 PM | | Project communication failed: attempting access to reference site
3/21/2019 7:37:58 PM | Milkyway@Home | Temporarily failed download of milkyway_1.46_windows_x86_64__opencl_nvidia_101.exe: connect() failed
3/21/2019 7:37:58 PM | Milkyway@Home | Backing off 00:53:01 on download of milkyway_1.46_windows_x86_64__opencl_nvidia_101.exe
4) Message boards : News : New Server Update (Message 68320)
Posted 21 Mar 2019 by rjs5
Post:
Hi there,

I have the same problem as Tim

20.03.2019 22:50:25 | Milkyway@Home | Temporarily failed download of milkyway_1.46_windows_x86_64__opencl_nvidia_101.exe: connect() failed
20.03.2019 22:50:25 | Milkyway@Home | Backing off 05:54:55 on download of milkyway_1.46_windows_x86_64__opencl_nvidia_101.exe
20.03.2019 22:50:26 | | Internet access OK - project servers may be temporarily down.

It has downloaded 40 WUs but is not starting them, as the exe is missing.

Greetings Marcus


I am seeing this too, but only on my Windows machine (ID 399149). Linux machines are doing fine. It downloaded 8 WUs, but the Windows Milkyway nvidia binary has been stuck since the new server was put online.

3/20/2019 11:52:05 PM | Milkyway@Home | Temporarily failed download of milkyway_1.46_windows_x86_64__opencl_nvidia_101.exe: connect() failed
3/20/2019 11:52:05 PM | Milkyway@Home | Backing off 04:51:31 on download of milkyway_1.46_windows_x86_64__opencl_nvidia_101.exe
5) Message boards : Number crunching : Maximum number of task (Message 67684)
Posted 23 Jul 2018 by rjs5
Post:
I'm trying to do the same thing, but I keep getting this error message:

Milkyway@Home: Notice from BOINC
Missing <app_config> in app_config.xml


I used NotepadXML to create this file:

<?xml version="1.0" encoding="utf-8"?>
<app_config>
<app>
<name>MilkyWay@Home</name>
<max_concurrent>5</max_concurrent>
</app>
</app_config>


The first line was inserted by NotepadXML and the <name> line is one of my efforts to resolve the error. Doesn't seem to matter whether either line is there... the same error is returned and the file doesn't seem to work.

Obviously, I'm doing something wrong, but I haven't a clue what.

Any suggestions?


I think the actual names of the M@H apps are: 'milkyway' and 'milkyway_nbody'

Try changing "MilkyWay@Home" to the app name you are interested in.
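Something like the sketch below is what I have in mind. The short name 'milkyway' is my best guess (swap in 'milkyway_nbody' for the N-Body app), the file belongs in the milkyway.cs.rpi.edu_milkyway project directory, and the client needs to re-read config files afterwards:

<app_config>
    <app>
        <name>milkyway</name>
        <max_concurrent>5</max_concurrent>
    </app>
</app_config>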
6) Message boards : Number crunching : Setting some apps to GPU and some to CPU (Message 64564)
Posted 21 May 2016 by rjs5
Post:
I suspect that you are going down the right path with the XML configuration ... whether or not a solution exists.

Do you see much difference in the time to complete 2 jobs in parallel versus 2 sequentially? GPU-Z indicates that my GTX 970 is pretty loaded down with MW@H. I have set some other projects to 0.5 GPU, but not MW.
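For what it's worth, the 0.5 GPU setting on other projects is done with an app_config.xml in that project's directory, roughly like this sketch ('appname' is a placeholder for that project's short app name):

<app_config>
    <app>
        <name>appname</name>
        <gpu_versions>
            <gpu_usage>0.5</gpu_usage>   <!-- two tasks share one GPU -->
            <cpu_usage>0.5</cpu_usage>
        </gpu_versions>
    </app>
</app_config>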
7) Message boards : Number crunching : Setting some apps to GPU and some to CPU (Message 64562)
Posted 21 May 2016 by rjs5
Post:
Do you mean you want to split the 4 MW@H apps,
MilkyWay@Home
MilkyWay@Home N-Body Simulation
Milkyway@Home Separation
Milkyway@Home Separation (Modified Fit)



so, these two run on the CPU only,
Milkyway@Home Separation
Milkyway@Home Separation (Modified Fit)


and only this one runs on the GPU
MilkyWay@Home
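One mechanism I know of for that kind of split is the BOINC client's cc_config.xml <exclude_gpu> option, which keeps a named app off the GPU for a given project. A sketch only; the short app names would need to be checked against client_state.xml:

<cc_config>
    <options>
        <!-- Repeat one <exclude_gpu> block per app that should stay CPU-only.
             "app_short_name" is a placeholder for the name found in client_state.xml. -->
        <exclude_gpu>
            <url>http://milkyway.cs.rpi.edu/milkyway/</url>
            <app>app_short_name</app>
        </exclude_gpu>
    </options>
</cc_config>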
8) Message boards : News : New Nbody version 1.46 (Message 62904)
Posted 28 Dec 2014 by rjs5
Post:
ps_nbody_12_20_orphan_sim_2_1413455402_1450477_2

I have one too and it appears to be stuck in a loop where its exit is based on a floating point compare.

It has been running nbody 1.46mt at 100% completion for several hours on Ubuntu Linux.
Only one CPU is active of the 8 CPUs.
perf top indicates that execution is stuck in the pow_rn function on the single CPU that is running.

The only functions measuring non-zero execution time (using perf top) are:
88% pow_rn
11.75% 0x000....9cf72
0.03% pow_exact_rn
0.01% dsfmt_gen_rand_all


http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=676782111

I ran a perf record -a -- sleep 5 to capture what the machine was doing.
perf report shows ....

2.4% of the total 88% time above is being spent in the "subsd" instruction at 497738, set apart below in the objdump. The remaining instructions down to the "jne" loop exit each account for about 1% of execution time.


497716: 66 45 0f 28 cd movapd %xmm13,%xmm9
49771b: f2 45 0f 5c cf subsd %xmm15,%xmm9
497720: 66 45 0f 28 f9 movapd %xmm9,%xmm15
497725: f2 44 0f 10 4c 24 c8 movsd -0x38(%rsp),%xmm9
49772c: f2 45 0f 59 ce mulsd %xmm14,%xmm9
497731: f2 44 0f 59 74 24 e8 mulsd -0x18(%rsp),%xmm14

497738: f2 45 0f 5c cd subsd %xmm13,%xmm9

49773d: f2 44 0f 10 6c 24 f0 movsd -0x10(%rsp),%xmm13
497744: f2 44 0f 59 6c 24 c8 mulsd -0x38(%rsp),%xmm13
49774b: f2 45 0f 58 ce addsd %xmm14,%xmm9
497750: f2 45 0f 58 cd addsd %xmm13,%xmm9
497755: f2 44 0f 10 6c 24 f0 movsd -0x10(%rsp),%xmm13
49775c: f2 44 0f 59 6c 24 e8 mulsd -0x18(%rsp),%xmm13
497763: f2 45 0f 58 cd addsd %xmm13,%xmm9
497768: f2 45 0f 58 cc addsd %xmm12,%xmm9
49776d: f2 45 0f 58 f9 addsd %xmm9,%xmm15
497772: f2 44 0f 10 0d 85 00 movsd 0x30085(%rip),%xmm9 # 4c7800 <scs_sixinv+0x9180>
497779: 03 00
49777b: f2 45 0f 59 cf mulsd %xmm15,%xmm9
497780: f2 44 0f 58 0d 7f 00 addsd 0x3007f(%rip),%xmm9 # 4c7808 <scs_sixinv+0x9188>
497787: 03 00
497789: f2 45 0f 59 cf mulsd %xmm15,%xmm9
49778e: f2 44 0f 58 0d 71 1b addsd 0x21b71(%rip),%xmm9 # 4b9308 <p_n+0x2e28>
497795: 02 00
497797: f2 45 0f 59 cf mulsd %xmm15,%xmm9
49779c: f2 45 0f 59 d1 mulsd %xmm9,%xmm10
4977a1: f2 45 0f 58 d1 addsd %xmm9,%xmm10
4977a6: f2 44 0f 58 54 24 d8 addsd -0x28(%rsp),%xmm10
4977ad: f2 44 0f 59 54 24 d0 mulsd -0x30(%rsp),%xmm10
4977b4: f2 45 0f 58 da addsd %xmm10,%xmm11
4977b9: 66 45 0f 28 cb movapd %xmm11,%xmm9
4977be: f2 44 0f 5c 4c 24 d0 subsd -0x30(%rsp),%xmm9
4977c5: f2 45 0f 5c d1 subsd %xmm9,%xmm10
4977ca: 66 45 0f 28 cb movapd %xmm11,%xmm9
4977cf: f2 44 0f 58 54 24 e0 addsd -0x20(%rsp),%xmm10
4977d6: f2 45 0f 58 ca addsd %xmm10,%xmm9
4977db: 66 45 0f 28 e1 movapd %xmm9,%xmm12
4977e0: f2 45 0f 5c e3 subsd %xmm11,%xmm12
4977e5: f2 45 0f 5c d4 subsd %xmm12,%xmm10
4977ea: 0f 8c f8 00 00 00 jl 4978e8 <pow_rn+0x9c8>
4977f0: f2 44 0f 59 15 67 9c mulsd 0x29c67(%rip),%xmm10 # 4c1460 <scs_sixinv+0x2de0>
4977f7: 02 00
4977f9: f2 45 0f 58 d1 addsd %xmm9,%xmm10
4977fe: 66 45 0f 2e ca ucomisd %xmm10,%xmm9
497803: 0f 85 cf 00 00 00 jne 4978d8 <pow_rn+0x9b8>
497809: 0f 8a c9 00 00 00 jp 4978d8 <pow_rn+0x9b8>

49780f: 81 fe fe 03 00 00 cmp $0x3fe,%esi

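For anyone who wants to reproduce the measurement, the rough sequence was as follows (the binary name is a placeholder for the nbody executable in the project directory):

perf top                        # live view; pow_rn dominated at ~88% of samples
perf record -a -- sleep 5       # system-wide sample for 5 seconds
perf report                     # break the samples down by function and instruction
objdump -d ./milkyway_nbody_binary > nbody.asm    # disassembly to match against the hot addresses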
9) Message boards : News : N-Body 1.18 (Message 58581)
Posted 8 Jun 2013 by rjs5
Post:
I was installing a new compiler at the time, and everything was operating normally, I think.


Please ignore my last post.
10) Message boards : News : N-Body 1.18 (Message 58580)
Posted 8 Jun 2013 by rjs5
Post:
I am still running but ran into what appears to be a scheduler problem.

I have an 8-core i7 Sandy Bridge and an EVGA GTX 650 Ti Nvidia GPU.

The CPU mt task started up "running 8 CPUs".
The GPU task started up and takes 0.417 CPUs.


I see a pattern. Only one of the two will run under normal scheduling. They ping-pong back and forth: the 8-CPU mt version only runs while the next GPU task is being loaded and for a short time afterward.


If I suspend all GPU work, the 8-CPU mt task starts. If I resume the GPU, both run for a short period of time and then the 8-CPU mt job suspends.


It appears that the CPU mt version wants ALL the CPUs, but the GPU task starts and wants a CPU fraction, so the total of the two is greater than the number of CPUs.
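If the client is new enough to support app_config.xml <app_version> overrides (mine here, 7.0.28, may not be), one workaround I can think of is capping how many CPUs the mt app claims so that the GPU task's fraction still fits. A sketch only; the app name and plan class are my assumptions:

<app_config>
    <app_version>
        <app_name>milkyway_nbody</app_name>
        <plan_class>mt</plan_class>
        <avg_ncpus>7</avg_ncpus>   <!-- leave roughly 1 CPU free for the GPU task's 0.417 -->
    </app_version>
</app_config>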




6/8/2013 3:07:48 PM | Milkyway@Home | Requesting new tasks for NVIDIA
6/8/2013 3:07:50 PM | Milkyway@Home | Scheduler request completed: got 0 new tasks
6/8/2013 3:07:50 PM | Milkyway@Home | Project has no tasks available
6/8/2013 3:15:28 PM | Milkyway@Home | Computation for task de_separation_79_DR8_rev_2_1370577207_678480_2 finished
6/8/2013 3:15:28 PM | Milkyway@Home | Starting task de_separation_79_DR8_rev_3_1370577207_1062567_0 using milkyway version 102 (opencl_nvidia) in slot 10
6/8/2013 3:15:31 PM | Milkyway@Home | Sending scheduler request: To fetch work.
6/8/2013 3:15:31 PM | Milkyway@Home | Reporting 1 completed tasks
6/8/2013 3:15:31 PM | Milkyway@Home | Requesting new tasks for NVIDIA
6/8/2013 3:15:33 PM | Milkyway@Home | Scheduler request completed: got 0 new tasks
6/8/2013 3:20:29 PM | Milkyway@Home | Restarting task de_nbody_06_06_dark_1370577207_85348_0 using milkyway_nbody version 118 (mt) in slot 9
6/8/2013 3:24:19 PM | Milkyway@Home | Computation for task de_separation_79_DR8_rev_3_1370577207_1062567_0 finished
6/8/2013 3:24:19 PM | Milkyway@Home | Starting task de_separation_79_DR8_rev_3_1370577207_1062563_0 using milkyway version 102 (opencl_nvidia) in slot 10
6/8/2013 3:24:24 PM | Milkyway@Home | Sending scheduler request: To fetch work.
6/8/2013 3:24:24 PM | Milkyway@Home | Reporting 1 completed tasks
6/8/2013 3:24:24 PM | Milkyway@Home | Requesting new tasks for NVIDIA
6/8/2013 3:24:26 PM | Milkyway@Home | Scheduler request completed: got 2 new tasks
6/8/2013 3:25:31 PM | Milkyway@Home | Sending scheduler request: To fetch work.
6/8/2013 3:25:31 PM | Milkyway@Home | Requesting new tasks for NVIDIA
6/8/2013 3:25:33 PM | Milkyway@Home | Scheduler request completed: got 0 new tasks
6/8/2013 3:25:33 PM | Milkyway@Home | Project has no tasks available
6/8/2013 3:33:12 PM | Milkyway@Home | Computation for task de_separation_79_DR8_rev_3_1370577207_1062563_0 finished
6/8/2013 3:33:12 PM | Milkyway@Home | Starting task de_separation_79_DR8_rev_3_1370577207_677538_2 using milkyway version 102 (opencl_nvidia) in slot 10
6/8/2013 3:33:14 PM | Milkyway@Home | Sending scheduler request: To fetch work.
6/8/2013 3:33:14 PM | Milkyway@Home | Reporting 1 completed tasks
6/8/2013 3:33:14 PM | Milkyway@Home | Requesting new tasks for NVIDIA
6/8/2013 3:33:16 PM | Milkyway@Home | Scheduler request completed: got 1 new tasks
6/8/2013 3:34:22 PM | Milkyway@Home | Sending scheduler request: To fetch work.
6/8/2013 3:34:22 PM | Milkyway@Home | Requesting new tasks for NVIDIA
6/8/2013 3:34:24 PM | Milkyway@Home | Scheduler request completed: got 0 new tasks
6/8/2013 3:34:24 PM | Milkyway@Home | Project has no tasks available
6/8/2013 3:41:13 PM | | Reading preferences override file
6/8/2013 3:41:13 PM | | Preferences:
6/8/2013 3:41:13 PM | | max memory usage when active: 8183.22MB
6/8/2013 3:41:13 PM | | max memory usage when idle: 14729.80MB
6/8/2013 3:41:13 PM | | max disk usage: 100.00GB
6/8/2013 3:41:13 PM | | don't use GPU while active
6/8/2013 3:41:13 PM | | suspend work if non-BOINC CPU load exceeds 25 %
6/8/2013 3:41:13 PM | | (to change preferences, visit a project web site or select Preferences in the Manager)
6/8/2013 3:41:28 PM | | Suspending GPU computation - user request
6/8/2013 3:41:28 PM | Milkyway@Home | Restarting task de_nbody_06_06_dark_1370577207_85348_0 using milkyway_nbody version 118 (mt) in slot 9
6/8/2013 3:41:37 PM | | Resuming GPU computation
6/8/2013 3:41:37 PM | Milkyway@Home | Restarting task de_separation_79_DR8_rev_3_1370577207_677538_2 using milkyway version 102 (opencl_nvidia) in slot 10
6/8/2013 3:42:24 PM | Milkyway@Home | Computation for task de_separation_79_DR8_rev_3_1370577207_677538_2 finished
6/8/2013 3:42:24 PM | Milkyway@Home | Starting task de_separation_79_DR8_rev_3_1370577207_1062568_0 using milkyway version 102 (opencl_nvidia) in slot 10
6/8/2013 3:42:26 PM | Milkyway@Home | Sending scheduler request: To fetch work.
6/8/2013 3:42:26 PM | Milkyway@Home | Reporting 1 completed tasks
6/8/2013 3:42:26 PM | Milkyway@Home | Requesting new tasks for NVIDIA
6/8/2013 3:42:29 PM | Milkyway@Home | Scheduler request completed: got 1 new tasks
6/8/2013 3:43:34 PM | Milkyway@Home | Sending scheduler request: To fetch work.
6/8/2013 3:43:34 PM | Milkyway@Home | Requesting new tasks for NVIDIA
6/8/2013 3:43:37 PM | Milkyway@Home | Scheduler request completed: got 0 new tasks
6/8/2013 3:43:37 PM | Milkyway@Home | Project has no tasks available
6/8/2013 3:49:42 PM | Milkyway@Home | Sending scheduler request: To fetch work.
6/8/2013 3:49:42 PM | Milkyway@Home | Requesting new tasks for NVIDIA
6/8/2013 3:49:44 PM | Milkyway@Home | Scheduler request completed: got 0 new tasks
6/8/2013 3:49:44 PM | Milkyway@Home | Project has no tasks available
11) Message boards : News : N-Body 1.18 (Message 58571)
Posted 8 Jun 2013 by rjs5
Post:
Thanks Richard and Jeffery (I think it was you two who put mt back in working order)


Richard,

It looks like I inadvertently played your "straight man". It appears that your information has put "mt" back in play.

It was not automatic, but I am now running (it appears) mt MilkyWay workloads with multiple CPUs.

UPDATING caused the MW mt workload to think it was running in mt mode, but it only used one CPU.
A DETACH and ATTACH seemed to fix it.

The DETACH/ATTACH seemed to work for me. I am not suggesting that it be the general solution for everyone. I leave the general solution to those who know what is going on.

12) Message boards : News : N-Body 1.18 (Message 58542)
Posted 7 Jun 2013 by rjs5
Post:
I have been wondering what happened to the multithreading operation on my machine. I thought my machine was configured incorrectly. Are there a lot of issues, and has someone summarized them somewhere? If you have time, I would be interested in knowing what the problems are.
thanks
13) Message boards : News : Nbody 1.04 (Message 56848)
Posted 12 Jan 2013 by rjs5
Post:
A SECOND system is ALMOST working. There does seem to be some hyper-dependency on the elements found on the system: it does not work on a pristine system and goes pretty crazy on my working system.

I will continue to try to disassemble what you are doing from the outside and see if I can find anything.


I have a near-pristine Ivy Bridge Core i7. I removed and restored MilkyWay on it with no change in behavior. The Nbody tasks run, but 8 of them run in parallel instead of 1 with 8 threads. The files found in slot 1 of one of the 8 idled milkyway workloads are:

boinc_task_state.xml
---
<active_task>
<project_master_url>http%3
14) Message boards : News : Nbody 1.04 (Message 56835)
Posted 11 Jan 2013 by rjs5
Post:
Deleted everything and reinstalled. Same error.

Is it possible to check system calls for error status after the first call, so some better diagnostics could be returned? It doesn't seem too hard.
15) Message boards : News : Nbody 1.04 (Message 56834)
Posted 11 Jan 2013 by rjs5
Post:
I probably have a non-standard installation. I put the ProgramData directory on an SSD drive "K:\". Is it possible that the project makes an assumption that the program data is on the same drive as the binary?

C:\Program Files\BOINC
K:\ProgramData\BOINC\projects\milkyway.cs.rpi.edu_milkyway
16) Message boards : News : Nbody 1.04 (Message 56829)
Posted 11 Jan 2013 by rjs5
Post:
I am running Win7 64-bit and BOINC 7.0.28 64-bit (computer 399149), and all the MilkyWay@Home N-Body Simulation v1.04 tasks error out in just a few seconds. I looked over this thread and was wondering if there is anything I need to fix to stop the errors?

I tried removing and reattaching the project with no change in behavior. The output seems to be a short stderr message.

Thoughts?
thanks
rjs


Stderr output
<core_client_version>7.0.28</core_client_version>
<![CDATA[
<message>
- exit code -1073741515 (0xc0000135)
</message>
]]>



http://milkyway.cs.rpi.edu/milkyway/results.php?userid=135958&offset=0&show_names=0&state=5&appid=7
17) Message boards : News : issues with workunits crashing might be fixed now and nbody work generation information (Message 55022)
Posted 5 Jul 2012 by rjs5
Post:
It is hard to tell what is happening with this stripped, statically linked program. I don't know how the program manages the different system call interfaces (various versions of Linux) with a single static link. It has been a long time since I have seen anyone statically link anything.

Intel has released a new beta version of their compiler that performs dynamic, runtime pointer checking, which might help locate the bug, but there is nothing MilkyWay@Home users can do to help other than to say .... still failing.
http://software.intel.com/en-us/articles/beta-tech-talks/

They have both a Fortran and a C compiler that should help clean up bogus pointers. They can run a test on the application and locate their corrupted pointer.

An objdump of the application shows that there are AVX instructions in what I guess is the OpenMP code. I have Sandy Bridge and Ivy Bridge systems and ONE Nehalem system. The Nehalem system seems to work. The Sandy/Ivy Bridge systems using AVX (via OpenMP) are failing.
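If anyone wants to check the same two things on their own box, something like this should do it (the binary name is a placeholder for the nbody executable in the project directory, and the grep pattern only catches the common VEX-encoded double-precision arithmetic ops):

# Count AVX double-precision arithmetic instructions in the disassembly.
objdump -d ./milkyway_nbody_binary | grep -cE 'v(add|sub|mul|div)[sp]d'
# Non-zero output means the host CPU advertises AVX; a Nehalem box will report 0.
grep -c avx /proc/cpuinfo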



If you get a computation error, the choices are to (1) turn off work or (2) let the compute errors pile up.

The workloads fail pretty rapidly so I am going to let the compute errors filter back to the system and it will be clear when they have fixed the bug.

18) Message boards : News : issues with workunits crashing might be fixed now and nbody work generation information (Message 55010)
Posted 4 Jul 2012 by rjs5
Post:
I am having similar failures running on Linux 64-bit. I am running an old version of BOINC, which is the one that is easiest to get running on CentOS.

I saw similar problems on Einstein and was able to clear compute errors by doing an "ldd" to see which libraries Einstein could not find. For Einstein, I had to install some 32-bit versions of libraries (GLUT, ...).

Milkyway is statically linked and stripped of symbols, so missing libraries are not the problem for Milkyway.
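For reference, the check I used on Einstein looked like this (the binary name is just a placeholder for whichever app executable is erroring out in the project directory):

# List the shared libraries the app binary needs and show any the loader cannot find.
ldd ./some_app_binary | grep "not found"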


rod


Task 249098203

Stderr output

<core_client_version>6.10.45</core_client_version>
<![CDATA[
<message>
process exited with code 15 (0xf, -241)
</message>
<stderr_txt>
<search_application> milkyway_nbody 0.88 Linux x86_64 double OpenMP, Crlibm </search_application>
Using OpenMP 4 max threads on a system with 4 processors
Error reading histogram line 37: massPerParticle = 0.000100
23:07:55 (8730): called boinc_finish

</stderr_txt>
]]>



