Welcome to MilkyWay@home

"Failed to calculate integral 0 Failed to calculate likelihood" errors

Questions and Answers : Unix/Linux : "Failed to calculate integral 0 Failed to calculate likelihood" errors

VictordeHollander

Joined: 9 Nov 10
Posts: 19
Credit: 71,077,081
RAC: 0
Message 67324 - Posted: 11 Apr 2018, 16:31:33 UTC

Hi,

Does anybody know what is causing the
"Failed to calculate integral 0
Failed to calculate likelihood"
errors?


Likelihood time = 2.088974 s
<background_integral3> 0.000135044562893 </background_integral3>
<stream_integral3>  73.177979946439066  191.871483289292200  122.601402164019433 </stream_integral3>
<background_likelihood3> -3.327750849128395 </background_likelihood3>
<stream_only_likelihood3>  -3.332280687700035  -4.324665411311142  -8.682139791419360 </stream_only_likelihood3>
<search_likelihood3> -2.929028725467456 </search_likelihood3>
Using SSE3 path
Found 1 platform
Platform 0 information:
  Name:       AMD Accelerated Parallel Processing
  Version:    OpenCL 2.0 AMD-APP (1912.5)
  Vendor:     Advanced Micro Devices, Inc.
  Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices 
  Profile:    FULL_PROFILE
Using device 0 on platform 0
Found 1 CL device
Device 'Tahiti' (Advanced Micro Devices, Inc.:0x1002) (CL_DEVICE_TYPE_GPU)
Board: AMD Radeon HD 7900 Series 
Driver version:      1912.5 (VM)
Version:             OpenCL 1.2 AMD-APP (1912.5)
Compute capability:  0.0
Max compute units:   28
Clock frequency:     800 Mhz
Global mem size:     2896491072
Local mem size:      32768
Max const buf size:  65536
Double extension:    cl_khr_fp64
Estimated AMD GPU GFLOP/s: 2867 SP GFLOP/s, 717 DP FLOP/s
Using a target frequency of 60.0
Using a block size of 7168 with 78 blocks/chunk
Using clWaitForEvents() for polling (mode -1)
Range:          { nu_steps = 320, mu_steps = 800, r_steps = 700 }
Iteration area: 560000
Chunk estimate: 1
Num chunks:     2
Chunk size:     559104
Added area:     558208
Effective area: 1118208
Initial wait:   20 ms
Integration time: 11.634570 s. Average time per iteration = 36.358030 ms
Integral 0 time = 11.993249 s
Failed to calculate integral 0
Failed to calculate likelihood


For example, in this WU:
http://milkyway.cs.rpi.edu/milkyway/result.php?resultid=2308241462

It calculates many streams successfully and then fails at this point, apparently at random.

The system has 600+ valid tasks, some invalid ones (32 at the moment) and a few with error status (7).

The failed tasks have different de_modfit_XX names, so the failures seem to be random.

Is this a hardware, driver, or BOINC issue?

OS: Ubuntu 14.04
GPU: AMD HD7950
BOINC: 7.9.3
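To check the failure rate, or whether failures cluster on particular de_modfit_XX batches, one option is to save each task's stderr output (shown on its result page) as one file per task and count how many contain the failure line seen in the log above. A minimal sketch; the helper name `tally_failures`, the directory layout and the idea of naming each file after its WU are assumptions, not something BOINC does for you:

```bash
#!/bin/bash
# Hypothetical helper: tally_failures DIR
# Each regular file in DIR is assumed to hold the stderr output of one task.
# Prints "failed/total" on stdout; names of failing files go to stderr.
tally_failures() {
  local dir=$1 total=0 failed=0 f
  for f in "$dir"/*; do
    [ -f "$f" ] || continue            # skip subdirectories and an empty glob
    total=$((total + 1))
    if grep -q 'Failed to calculate likelihood' "$f"; then
      failed=$((failed + 1))
      echo "failed: $f" >&2            # file name can reveal the de_modfit_XX batch
    fi
  done
  echo "$failed/$total"
}
```

For example, `tally_failures ~/mw_stderr` would print something like `1/20` if one of twenty saved logs contains the failure.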
VictordeHollander

Joined: 9 Nov 10
Posts: 19
Credit: 71,077,081
RAC: 0
Message 67334 - Posted: 15 Apr 2018, 16:08:42 UTC

I installed Windows 10 Pro (1709) on the same hardware, and it now runs without errors (700+ valid tasks). Previously, under Ubuntu, it produced the error above in about 1 of every 20 tasks (so 1 "Failed to calculate likelihood" in about 100 WUs/streams).

Now that I know the hardware is fine, I suspect one of these:
1. the AMD graphics card drivers for Linux (I used the .deb package for Ubuntu 14.04.2)
2. the BOINC client (7.9.3 on Ubuntu vs. 7.8.3 on Windows 10)
3. process priority: Ubuntu runs BOINC and its subprocesses at "nice 10", i.e. lower than normal, while Windows runs them at Normal priority (equivalent to nice 0). The lower priority could mean the task waits too long for CPU time and errors out. I can change the nice level of boinc-client to 0 on Ubuntu with superuser commands, but every new OpenCL process/WU starts at nice 10 again.
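Because each new task process starts at nice 10 again, a one-off `renice` is not enough; the level has to be re-applied as tasks start. A minimal sketch, assuming procps `pgrep` and util-linux `renice` are installed; the helper name `set_nice` is hypothetical, and so is the assumption that the MilkyWay app's command line contains the string `milkyway` (check with `ps -eo pid,ni,args`):

```bash
#!/bin/bash
# Hypothetical helper: set_nice PATTERN NICE
# Sets an absolute nice level on every running process whose full command
# line matches the regex PATTERN. Lowering the nice value (raising priority),
# or changing another user's processes (e.g. the boinc user's), needs root.
set_nice() {
  local pattern=$1 nice=$2 pids
  # pgrep -f matches the full command line, so long app names are not truncated
  pids=$(pgrep -f -- "$pattern") || return 0   # no match: nothing to do
  # --priority sets an absolute nice value rather than an increment;
  # $pids is left unquoted on purpose because it may hold several PIDs
  renice --priority "$nice" -p $pids > /dev/null
}
```

For example, from a root shell: `while sleep 5; do set_nice milkyway 0; done`. A broad pattern like `milkyway` would also match any other command line containing that word, so a more specific pattern is safer.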
VictordeHollander

Send message
Joined: 9 Nov 10
Posts: 19
Credit: 71,077,081
RAC: 0
Message 67335 - Posted: 15 Apr 2018, 16:12:45 UTC

Or 4. the MilkyWay OpenCL Linux executable.
Keith Myers

Joined: 24 Jan 11
Posts: 708
Credit: 543,191,675
RAC: 142,277
Message 67507 - Posted: 19 May 2018, 20:49:09 UTC - in response to Message 67334.  

I may have a suggestion for you. I run a bash script at startup that continuously assigns CPU affinity and priority (nice level) to my SETI applications. It uses a tool called schedtool, which is available from the distribution's repositories. I set the nice level of each application that I want to run at a fixed priority: I lower the priority of the CPU apps and raise the priority of the GPU apps.

This is what the file looks like. You can use it as a starting point for a similar script that targets your specific application and raises its scheduling priority.

#!/bin/bash
# CPU Priority and Assignment Script
# Run in a root terminal, NOT sudo

# Enable NVIDIA persistence mode (keeps the driver initialized between tasks)
nvidia-smi -pm 1

# tune NICE CPULIST NAME: apply a nice level and a CPU affinity list to every
# running process called NAME; does nothing if NAME is not running right now
tune() {
  local pids
  pids=$(pidof "$3") || return 0
  schedtool -n "$1" $pids   # $pids unquoted on purpose: it may hold several PIDs
  schedtool -a "$2" $pids
}

while true; do
  # CPU priority: 19 = nice/low priority, 0 = normal, -20 = high priority
  # (this was code Petri gave out)
  # CPU affinity over logical CPUs 0-15 (Brent added this to Petri's code)

  # GPU tasks: high priority, kept on threads 1 3 5 7 9 11 13 15
  tune -20 1,3,5,7,9,11,13,15 setiathome_x41p_zi3v_x86_64-pc-linux-gnu_cuda90
  tune -20 1,3,5,7,9,11,13,15 astropulse_7.08_x86_64-pc-linux-gnu__opencl_nvidia_100

  # CPU tasks: (a little) below normal priority (0 being normal) so they
  # don't choke the OS, kept on threads 0 2 4 6 8 10 12 14
  tune 5 0,2,4,6,8,10,12,14 ap_7.05r2728_sse3_linux64
  tune 5 0,2,4,6,8,10,12,14 MBv8_8.22r3711_sse41_x86_64-pc-linux-gnu

  date
  # lscpu | grep MHz
  sleep 5
  echo "  CPU Priority and Assignment Script (16 threads)"
done


Just run it from a root terminal, then minimize the window and leave it running. It loops every 5 seconds, so it picks up each new task as it starts.

You would have to change the nvidia-smi persistence line to whatever is similar or needed for your AMD cards.

©2024 Astroinformatics Group