Posts by Greg Tippitt

1) Message boards : Number crunching : Low Credits for OpenMP N-Body work units? (Message 67233) Posted 9 Mar 2018 by Greg Tippitt Post:
Have any of you noticed that N-Body OpenMP work units take 6 to 8 times as many CPU seconds for the same credit as non-N-Body MilkyWay CPU jobs? I don't mean GPU versus CPU, but CPU versus CPU. In the past I have mainly run GPU work units for MilkyWay, while running CPU tasks for other BOINC projects. I recently started running CPU tasks for MilkyWay and have found that, for a given number of CPU seconds, I get roughly 6 times as much credit for MilkyWay@Home v1.46 as I do for N-Body Simulation tasks. On my machine (details at link below), which has 4 hex-core AMD CPUs (24 real cores), I get 1 credit for every 30 CPU seconds on MW@H v1.46 jobs, but it takes 175 CPU seconds to get 1 credit on N-Body Sim jobs. This disparity persists whether I am running 8-, 10-, or 16-core N-Body jobs.

http://milkyway.cs.rpi.edu/milkyway/show_host_detail.php?hostid=718220

I have triple-checked, and I am not comparing the Radeon GPU jobs with the N-Body OpenMP CPU tasks. This is a CPU-to-CPU comparison.

I run jobs for several BOINC projects, and I am accustomed to some projects giving much more credit than others for the same processing time, whether CPU or GPU. Within a project that has multiple types of jobs, though, the credit is normally fairly comparable across its applications unless there is a reason for doing otherwise. Last year I was running work for another project that temporarily gave us double credit while they were testing a new application, because half of the work units would run for hours and then crash without us getting any credit. Once they debugged the new application and the work units quit crashing, they dropped the extra credit on that application to be comparable to their other applications.

Another project had one application where the work units required 4 times as much RAM as their other applications, so they gave a bit more credit for the RAM-hungry application. The very low credit for the N-Body application seems illogical, unless it's a private joke and "n-body" is a pun for the application "nobody" will want to run.

Greg
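The two rates quoted in the post above can be put on a common scale. The following is a minimal sketch that uses only the two figures from the post (30 and 175 CPU seconds per credit); the function names are made up for illustration. It works out to about 120 versus about 20.6 credits per CPU hour, a ratio of about 5.8.

```python
# CPU seconds needed to earn one credit, as quoted in the post.
SECONDS_PER_CREDIT = {
    "MilkyWay@Home v1.46": 30.0,
    "N-Body Simulation": 175.0,
}


def credits_per_cpu_hour(seconds_per_credit: float) -> float:
    """Convert 'CPU seconds per credit' into 'credits per CPU hour'."""
    return 3600.0 / seconds_per_credit


def credit_ratio(fast_seconds: float, slow_seconds: float) -> float:
    """How many times more credit the faster-paying app earns per CPU second."""
    return slow_seconds / fast_seconds


if __name__ == "__main__":
    for app, seconds in SECONDS_PER_CREDIT.items():
        print(f"{app}: {credits_per_cpu_hour(seconds):.1f} credits per CPU hour")
    ratio = credit_ratio(
        SECONDS_PER_CREDIT["MilkyWay@Home v1.46"],
        SECONDS_PER_CREDIT["N-Body Simulation"],
    )
    print(f"ratio: {ratio:.2f}")
```

Because both figures are in CPU seconds rather than wall-clock time, the number of threads a multi-threaded job uses should not change the comparison.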
2) Message boards : Number crunching : N-Body long processing time (Message 62306) Posted 10 Sep 2014 by Greg Tippitt Post:
Your CPU has 4 full cores; the other 4 are hyper-threads. I have only one machine with an Intel hyper-threaded CPU, and the performance of a hyper-thread can vary widely. For some work it performs just like a dual core, but for other work the second thread seems to help much less. The best explanation I've come up with is that small repetitive chunks of code, like benchmarks, that stay within the L1 cache produce results almost the same as a full dual core. For programs that have a long set of instructions being fetched from outside the cache, the secondary thread helps by getting the next instruction ready while the main core is executing. When two long sets of instructions run on both a main core and its secondary thread, the performance for me was not close to that of two full cores. My experience with this was not with BOINC apps, but with some statistical programming I was doing some time ago. Intel improved hyper-threading in later releases, and compilers have also improved their optimization for hyper-threading, so results can vary. In general, Intel's CPUs are faster at floating-point math than AMD's, but AMD's real hex cores are sometimes faster than Intel's hyper-threaded quads with 8 threads.

My experience with N-Body tasks is that they seldom take more than 3 hours on my machines, which have 4 hex-core AMD Opteron CPUs. The OpenMP multi-threading used in the N-Body tasks is limited to using only 16 of my 24 cores, which is fine since I normally keep BOINC using only 20 of the 24 cores, so that the other 4 cores are available for IO, system overhead, and feeding data to GPU tasks running on the machine as well. With all 24 cores in use, the contention for resources was so high that overall CPU utilization was lower with 24 tasks than with 20 tasks, since many of the 24 tasks were in wait states much of the time.

Similarly, your hyper-threaded tasks may be interrupting tasks on the main cores. For example, I use the cc_config.xml file below. You could try the "ncpus" option and let it run tasks for a day at 8 and compare the run times with 4, or something in between. To compare the run times with different settings, scale the 8-thread run time down to 4 CPUs with the following formula:

RunTime / threads(8) * ncpus(4)

If your run times are something like 8 hours with ncpus = 8 and 4 hours with ncpus = 4, then the scaled time (8 / 8 * 4 = 4 hours) matches the 4-CPU run time, so you lose nothing by using all 8 threads. If your run time with ncpus = 4 is less than half the run time with 8 threads, then you are better off with the lower ncpus setting.

Multi-threaded performance estimation is far from straightforward. If your 8 threads were all full cores at 3.4 GHz, your machine would likely run faster than my 24 cores at 2.4 GHz. For a single task, yours is almost 50% faster, but with 20 separate tasks running, mine would complete more tasks in a given time. For multi-threaded tasks like the N-Body tasks, the results are less clear. For this reason the result validation is sometimes not as clear as with other BOINC tasks, and it takes some time before you get credit on N-Body tasks.

On the other hand, once you get your system configured, N-Body multi-threaded tasks can eat lots of data quickly. It's the same principle that lets GPU tasks run so quickly. Some work is not easily ported to OpenCL on GPUs, which have limited instruction capability compared to CPUs. OpenMP tasks like N-Body allow the programmers to use all of the complex instructions of the CPU, and line up multiple threads knocking over dominoes quickly, like a GPU with its more limited instruction set.

Good luck,
Greg

<cc_config>
  <options>
    <allow_remote_gui_rpc>1</allow_remote_gui_rpc>
    <use_all_gpus>1</use_all_gpus>
    <fetch_on_update>1</fetch_on_update>
    <exclusive_app>synaptic</exclusive_app>
    <exclusive_gpu_app>vlc</exclusive_gpu_app>
    <start_delay>60</start_delay>
    <ncpus>20</ncpus>
  </options>
</cc_config>
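The run-time comparison in the post above can be sketched in a few lines. This is only an illustration under a stated assumption: the machine runs as many tasks at once as the ncpus setting allows, so throughput is proportional to ncpus divided by run time. The function names are hypothetical, and the 8-hour and 4-hour figures are the post's own example.

```python
def scaled_run_time(run_time_hours: float, threads: int, ncpus: int) -> float:
    """The post's formula: RunTime / threads * ncpus.

    Scales the run time measured with all threads in use down to the
    break-even run time for the reduced ncpus setting.
    """
    return run_time_hours / threads * ncpus


def better_setting(run_time_all: float, threads: int,
                   run_time_reduced: float, ncpus: int) -> int:
    """Return the ncpus value that gives the higher throughput.

    Assumption: `threads` tasks run concurrently in the first measurement
    and `ncpus` tasks in the second. A tie goes to using all threads,
    as the post suggests.
    """
    break_even = scaled_run_time(run_time_all, threads, ncpus)
    return ncpus if run_time_reduced < break_even else threads


if __name__ == "__main__":
    # The post's example: 8 hours with 8 threads, 4 hours with ncpus = 4.
    print(scaled_run_time(8.0, 8, 4))
    print(better_setting(8.0, 8, 4.0, 4))
    # A hypothetical case where the reduced setting finishes in 3 hours.
    print(better_setting(8.0, 8, 3.0, 4))
```

For multi-threaded N-Body tasks the assumption may not hold, since one task already spans several threads, so the result is a rough guide at best.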
3) Message boards : Number crunching : Problem with OpenCL tasks on NVIDIA Telsa C1060 (Message 62305) Posted 9 Sep 2014 by Greg Tippitt Post:
I have 5 machines that I'm trying to get to run OpenCL tasks for MilkyWay. They all have the same hardware: 4 hex-core AMD Opteron 8431 CPUs and NVIDIA Tesla C1060 GPUs. These GPUs have 4 GB of memory and support double precision. The GPUs run tasks for SETI, POEM, EINSTEIN, and GPUGRID without problems. On some other MW@H tasks, I have had errors writing finished files, which were due to an unstable NFS disk server, but I've gotten those cleaned up. I don't know where to start with these GPU errors from MilkyWay, since the GPUs are working okay for other projects, and the CPUs are working okay on MilkyWay as well. Any help would be greatly appreciated.

N-Body Simulation v1.42 jobs run fine, using 16 of the 24 cores on the machine to run really fast. Separation tasks on CPU run fine as well. But GPU tasks for NVIDIA OpenCL, for both MilkyWay@Home v1.02 and Separation (Modified Fit) v1.30, end with errors. The links for my hardware and the examples of three tasks are below.

Thanks, and good luck with the fundraiser. I can't wait for my t-shirt to arrive.
Greg

==================================
http://milkyway.cs.rpi.edu/milkyway/show_host_detail.php?hostid=583640

NVIDIA GPU 0: Tesla T10 Processor driver version 340.32 CUDA version 6.5 compute capability 1.3 OpenCL 1.0 4096MB, 4041MB available, 933 GFLOPS peak
Processor: AMD Six-Core AMD Opteron(tm) Processor 8431 [Family 16 Model 8 Stepping 0]
Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt hw_pstate npt lbrv svm_lock nrip_save pausefilter
OS: Linux: 3.13.0-34-generic

======================================
Application version MilkyWay@Home N-Body Simulation v1.42 (mt)
http://milkyway.cs.rpi.edu/milkyway/result.php?resultid=820222153

<core_client_version>7.2.42</core_client_version>
<![CDATA[
<stderr_txt>
<search_application> milkyway_nbody 1.42 Linux x86_64 double OpenMP, Crlibm </search_application>
Using OpenMP 16 max threads on a system with 24 processors
RHO MAX IS 7.17600 7.17600<search_likelihood>-5.740629918799274</search_likelihood>
14:49:03 (21689): called boinc_finish
</stderr_txt>
]]>

=====================================
Application version Milkyway@Home Separation (Modified Fit) v1.30 (opencl_nvidia)
http://milkyway.cs.rpi.edu/milkyway/result.php?resultid=825033456

Stderr output
<core_client_version>7.2.42</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)
</message>
<stderr_txt>
<search_application> milkyway_separation 1.30 Linux x86_64 double OpenCL </search_application>
Reading preferences ended prematurely
BOINC GPU type suggests using OpenCL vendor 'NVIDIA Corporation'
Setting process priority to 0 (13): Permission denied
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Switching to Parameter File
Using SSE3 path
Error getting number of platform (-1001): CL_PLATFORM_NOT_FOUND_KHR
Failed to get information about device
Error getting device and context (1): MW_CL_ERROR
Failed to calculate likelihood
<background_integral> nan </background_integral>
<stream_integral> nan nan nan </stream_integral>
<background_likelihood> nan </background_likelihood>
<stream_only_likelihood> nan nan nan </stream_only_likelihood>
<search_likelihood> nan </search_likelihood>
10:22:39 (3969): called boinc_finish
</stderr_txt>
]]>

---------------------------------------
Application version MilkyWay@Home v1.02 (opencl_nvidia)
http://milkyway.cs.rpi.edu/milkyway/result.php?resultid=824774451

<core_client_version>7.2.42</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)
</message>
<stderr_txt>
<search_application> milkyway_separation 1.02 Linux x86_64 double OpenCL </search_application>
Unrecognized XML in project preferences: max_gfx_cpu_pct
Skipping: 0
Skipping: /max_gfx_cpu_pct
Unrecognized XML in project preferences: allow_non_preferred_apps
Skipping: 1
Skipping: /allow_non_preferred_apps
Unrecognized XML in project preferences: nbody_graphics_poll_period
Skipping: 30
Skipping: /nbody_graphics_poll_period
Unrecognized XML in project preferences: nbody_graphics_float_speed
Skipping: 5
Skipping: /nbody_graphics_float_speed
Unrecognized XML in project preferences: nbody_graphics_textured_point_size
Skipping: 250
Skipping: /nbody_graphics_textured_point_size
Unrecognized XML in project preferences: nbody_graphics_point_point_size
Skipping: 40
Skipping: /nbody_graphics_point_point_size
BOINC GPU type suggests using OpenCL vendor 'NVIDIA Corporation'
Setting process priority to 0 (13): Permission denied
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Error reading astronomy parameters from file 'astronomy_parameters.txt'
Trying old parameters file
Using SSE3 path
Error getting number of platform (-1001): CL_PLATFORM_NOT_FOUND_KHR
Failed to get information about device
Error getting device and context (1): MW_CL_ERROR
Failed to calculate likelihood
<background_integral> nan </background_integral>
<stream_integral> nan nan nan </stream_integral>
<background_likelihood> nan </background_likelihood>
<stream_only_likelihood> nan nan nan </stream_only_likelihood>
<search_likelihood> nan </search_likelihood>
04:43:39 (4471): called boinc_finish
</stderr_txt>
]]>
---------------------------------------------
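Both failing GPU tasks stop at the same message, "Error getting number of platform (-1001): CL_PLATFORM_NOT_FOUND_KHR". That code is returned by the OpenCL ICD loader when it finds no vendor platform to load. The following is a minimal sketch of one thing that could be checked, assuming the common Linux layout in which vendor drivers register themselves through .icd files under /etc/OpenCL/vendors. The function name list_opencl_icds is made up for illustration, and an empty result is only a hint, not a diagnosis of the problem in the post.

```python
from pathlib import Path
from typing import List

# Assumption: the conventional ICD registry directory on Linux.
DEFAULT_VENDORS_DIR = "/etc/OpenCL/vendors"


def list_opencl_icds(vendors_dir: str = DEFAULT_VENDORS_DIR) -> List[str]:
    """Return the names of the .icd files found in vendors_dir.

    An empty list means either the directory is missing or no vendor
    has registered an OpenCL platform there.
    """
    directory = Path(vendors_dir)
    if not directory.is_dir():
        return []
    return sorted(p.name for p in directory.glob("*.icd"))


if __name__ == "__main__":
    icds = list_opencl_icds()
    if icds:
        print("Registered OpenCL vendors:", ", ".join(icds))
    else:
        print("No .icd files found in", DEFAULT_VENDORS_DIR)
```

It is worth running a check like this as the same user account that runs the BOINC client, since what that account can see and open is what matters to the science application.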