Author | Message |
Jeffery M. Thompson Volunteer moderator Project administrator Project developer Project tester Project scientist
Send message Joined: 23 Sep 12 Posts: 159 Credit: 16,977,106 RAC: 0
|
After a summer and autumn of intense development, we are ready to restart the nbody runs. We have updated the code and have working binaries for Linux 64bit, Linux 32bit, Windows 64bit and Mac 64bit. At a later date I will be posting the binary for Windows 32.
Please let us know how the new code is running.
The previous n-body code was labeled 1.00 by mistake; this new release continues the versioning system previously in use and is version 0.94.
Thank you all!
Jeff Thompson
|
|
Jake Bauer Project developer Project tester Project scientist
Send message Joined: 20 Aug 12 Posts: 66 Credit: 406,916 RAC: 0
|
This is excellent. I haven't gotten any jobs yet, but I'm sure I will. I can't wait to see how the results are turning out later.
Jake
|
|
Arif Mert Kapicioglu
Send message Joined: 14 Dec 09 Posts: 161 Credit: 589,318,064 RAC: 0
|
Hello,
Immediate error on nbody tasks.
Exit code -1073741515 (0xffffffffc0000135) Unknown error number
|
|
BadBike2
Send message Joined: 11 Feb 11 Posts: 5 Credit: 21,146,417 RAC: 0
|
I'm getting computation errors here as well. The task "Exit Status" reads:
"-1073741515 (0xffffffffc0000135) Unknown error number"
I should note that I'm getting crash notices within Vista itself. I'm running Vista SP2, and computing with a Phenom 9850 and HD7770.
|
|
Andrey Fesenko (anfes)
Send message Joined: 4 Oct 08 Posts: 1 Credit: 293,485,667 RAC: 0
|
|
|
Jeffery M. Thompson Volunteer moderator Project administrator Project developer Project tester Project scientist
Send message Joined: 23 Sep 12 Posts: 159 Credit: 16,977,106 RAC: 0
|
We are noticing these trends.
Every Linux job that has failed is on version 1.00 and needs to be on version 0.94.
Linux users may need to manually update their clients.
The other issues are on Windows clients, and we are researching them at this time.
We are disabling the Windows client and retesting it.
Jeff Thompson
|
|
Cliff Harding
Send message Joined: 2 Jul 09 Posts: 27 Credit: 253,069,838 RAC: 0
|
I only run Milkyway when I run out of work for other projects. After attaching to the project, I noticed the following on the N-BODY tasks (system is i7/950, Win7/64-bit, 6Gb ram, 1 x EVGA GTX660SC 2Gb, 1 x EVGA GTX460SE 2 1Gb, Nvidia 310.33):
11/12/2012 16:23:57 | Milkyway@Home | [sched_op] Reason: Unrecoverable error for task ps_nbody_plus_slice_emd_1_1352203202_7185_0 ( - exit code -1073741515 (0xc0000135))
11/12/2012 16:23:57 | Milkyway@Home | [sched_op] Reason: Unrecoverable error for task de_nbody_plus_slice_emd_1_1352203202_367_1 ( - exit code -1073741515 (0xc0000135))
11/12/2012 16:23:57 | Milkyway@Home | [sched_op] Reason: Unrecoverable error for task ps_nbody_plus_slice_emd_1_1352203202_7186_0 ( - exit code -1073741515 (0xc0000135))
11/12/2012 16:23:59 | Milkyway@Home | [sched_op] Reason: Unrecoverable error for task ps_nbody_plus_slice_emd_1_1352203202_7167_0 ( - exit code -1073741515 (0xc0000135))
11/12/2012 16:23:59 | Milkyway@Home | [sched_op] Reason: Unrecoverable error for task ps_nbody_plus_slice_emd_1_1352203202_6336_1 ( - exit code -1073741515 (0xc0000135))
11/12/2012 16:23:59 | Milkyway@Home | [sched_op] Reason: Unrecoverable error for task de_nbody_plus_slice_emd_1_1352203202_175_1 ( - exit code -1073741515 (0xc0000135))
11/12/2012 16:24:00 | Milkyway@Home | [sched_op] Reason: Unrecoverable error for task de_nbody_plus_slice_emd_1_1352203202_1248_2 ( - exit code -1073741515 (0xc0000135))
11/12/2012 16:24:00 | Milkyway@Home | [sched_op] Reason: Unrecoverable error for task de_nbody_plus_slice_emd_1_1352203202_6994_1 ( - exit code -1073741515 (0xc0000135))
11/12/2012 16:24:00 | Milkyway@Home | [sched_op] Reason: Unrecoverable error for task de_nbody_plus_slice_emd_1_1352203202_1219_3 ( - exit code -1073741515 (0xc0000135))
I have set the project to NNT (no new tasks).
[edit] There was no run time on these tasks. The tasks started and immediately ended. [/edit]
I don't buy computers, I build them!
|
|
POPSIE
Send message Joined: 25 Jan 11 Posts: 12 Credit: 16,960,651 RAC: 0
|
On Windows 8
Stderr output
<core_client_version>7.0.28</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
BOINC: parse gpu_opencl_dev_index 0
<search_application> milkyway_separation 1.02 Windows x86_64 double OpenCL </search_application>
Unrecognized XML in project preferences: max_gfx_cpu_pct
Skipping: 100
Skipping: /max_gfx_cpu_pct
BOINC GPU type suggests using OpenCL vendor 'Advanced Micro Devices, Inc.'
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Error reading astronomy parameters from file 'astronomy_parameters.txt'
Trying old parameters file
Using SSE3 path
Found 1 platform
Platform 0 information:
Name: AMD Accelerated Parallel Processing
Version: OpenCL 1.2 AMD-APP (923.1)
Vendor: Advanced Micro Devices, Inc.
Extensions: cl_khr_icd cl_amd_event_callback cl_amd_offline_devices cl_khr_d3d10_sharing
Profile: FULL_PROFILE
Using device 0 on platform 0
Found 2 CL devices
Device 'ATI RV770' (Advanced Micro Devices, Inc.:0x1002) (CL_DEVICE_TYPE_GPU)
Driver version: CAL 1.4.1720
Version: OpenCL 1.0 AMD-APP (923.1)
Compute capability: 0.0
Max compute units: 10
Clock frequency: 575 Mhz
Global mem size: 268435456
Local mem size: 16384
Max const buf size: 65536
Double extension: cl_khr_fp64
Build log:
--------------------------------------------------------------------------------
LOOP UNROLL: pragma unroll (line 288)
Unrolled as requested!
LOOP UNROLL: pragma unroll (line 280)
Unrolled as requested!
LOOP UNROLL: pragma unroll (line 273)
Unrolled as requested!
LOOP UNROLL: pragma unroll (line 244)
Unrolled as requested!
LOOP UNROLL: pragma unroll (line 202)
Unrolled as requested!
--------------------------------------------------------------------------------
Using AMD IL kernel
Binary status (0): CL_SUCCESS
Estimated AMD GPU GFLOP/s: 920 SP GFLOP/s, 184 DP FLOP/s
Using a target frequency of 60.0
Using a block size of 2560 with 145 blocks/chunk
Using clWaitForEvents() for polling (mode -1)
Range: { nu_steps = 640, mu_steps = 1600, r_steps = 1400 }
Iteration area: 2240000
Chunk estimate: 6
Num chunks: 7
Chunk size: 371200
Added area: 358400
Effective area: 2598400
Initial wait: 13 ms
Integration time: 91.939081 s. Average time per iteration = 143.654814 ms
Integral 0 time = 93.596521 s
Estimated AMD GPU GFLOP/s: 920 SP GFLOP/s, 184 DP FLOP/s
Using a target frequency of 60.0
Using a block size of 2560 with 145 blocks/chunk
Using clWaitForEvents() for polling (mode -1)
Range: { nu_steps = 640, mu_steps = 800, r_steps = 1400 }
Iteration area: 1120000
Chunk estimate: 3
Num chunks: 4
Chunk size: 371200
Added area: 364800
Effective area: 1484800
Initial wait: 13 ms
Integration time: 40.721433 s. Average time per iteration = 63.627240 ms
Integral 1 time = 41.992103 s
Estimated AMD GPU GFLOP/s: 920 SP GFLOP/s, 184 DP FLOP/s
Using a target frequency of 60.0
Using a block size of 2560 with 145 blocks/chunk
Using clWaitForEvents() for polling (mode -1)
Range: { nu_steps = 640, mu_steps = 800, r_steps = 1400 }
Iteration area: 1120000
Chunk estimate: 3
Num chunks: 4
Chunk size: 371200
Added area: 364800
Effective area: 1484800
Initial wait: 13 ms
Integration time: 42.888748 s. Average time per iteration = 67.013669 ms
Integral 2 time = 43.638151 s
Running likelihood with 66200 stars
Likelihood time = 0.342848 s
Non-finite result
Failed to calculate likelihood
<background_integral> 0.000170892611007 </background_integral>
<stream_integral> 0.000000000000000 0.084049238267364 79.889389034944514 </stream_integral>
<background_likelihood> -3.006227472513750 </background_likelihood>
<stream_only_likelihood> -1.#IND00000000000 -219.635048495225450 -3.920174169135421 </stream_only_likelihood>
<search_likelihood> -241.000000000000000 </search_likelihood>
14:19:26 (3632): called boinc_finish
</stderr_txt>
]]>
|
|
POPSIE
Send message Joined: 25 Jan 11 Posts: 12 Credit: 16,960,651 RAC: 0
|
On Linux OpenSuse with MilkyWay@Home N-Body Simulation v0.94
Exit status -1073741515 (0xffffffffc0000135) Unknown error number
<core_client_version>7.0.28</core_client_version>
<![CDATA[
<message>
- exit code -1073741515 (0xc0000135)
</message>
]]>
On Linux OpenSuse with MilkyWay@Home N-Body Simulation v1.00
Exit status 193 (0xc1) EXIT_SIGNAL
<core_client_version>7.0.28</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
<search_application> milkyway_nbody 0.94 Linux x86 double , Crlibm </search_application>
Warning: not applying timestep correction for workunit with min version 0.80
<search_likelihood>-78.873167893643412</search_likelihood>
*** glibc detected *** ../../projects/milkyway.cs.rpi.edu_milkyway/milkyway_nbody_1.00_i686-pc-linux-gnu__mt: free(): invalid next size (normal): 0x094e0140 ***
======= Backtrace: =========
/lib/i386-linux-gnu/libc.so.6(+0x73e42)[0x575e42]
../../projects/milkyway.cs.rpi.edu_milkyway/milkyway_nbody_1.00_i686-pc-linux-gnu__mt(destroyNBodyState+0x72)[0x806c0d2]
../../projects/milkyway.cs.rpi.edu_milkyway/milkyway_nbody_1.00_i686-pc-linux-gnu__mt(nbMain+0x2ae)[0x80649fe]
../../projects/milkyway.cs.rpi.edu_milkyway/milkyway_nbody_1.00_i686-pc-linux-gnu__mt(main+0x28d)[0x806258d]
/lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x51b4d3]
../../projects/milkyway.cs.rpi.edu_milkyway/milkyway_nbody_1.00_i686-pc-linux-gnu__mt[0x8062745]
======= Memory map: ========
0016d000-00184000 r-xp 00000000 2b:00 12003 /rofs/lib/i386-linux-gnu/libpthread-2.15.so
00184000-00185000 r--p 00016000 2b:00 12003 /rofs/lib/i386-linux-gnu/libpthread-2.15.so
00185000-00186000 rw-p 00017000 2b:00 12003 /rofs/lib/i386-linux-gnu/libpthread-2.15.so
00186000-00188000 rw-p 00000000 00:00 0
00502000-006a1000 r-xp 00000000 2b:00 12054 /rofs/lib/i386-linux-gnu/libc-2.15.so
006a1000-006a3000 r--p 0019f000 2b:00 12054 /rofs/lib/i386-linux-gnu/libc-2.15.so
006a3000-006a4000 rw-p 001a1000 2b:00 12054 /rofs/lib/i386-linux-gnu/libc-2.15.so
006a4000-006a7000 rw-p 00000000 00:00 0
007ac000-007ad000 r-xp 00000000 00:00 0 [vdso]
007b2000-007b9000 r-xp 00000000 2b:00 11978 /rofs/lib/i386-linux-gnu/librt-2.15.so
007b9000-007ba000 r--p 00006000 2b:00 11978 /rofs/lib/i386-linux-gnu/librt-2.15.so
007ba000-007bb000 rw-p 00007000 2b:00 11978 /rofs/lib/i386-linux-gnu/librt-2.15.so
00d9c000-00dc6000 r-xp 00000000 2b:00 12060 /rofs/lib/i386-linux-gnu/libm-2.15.so
00dc6000-00dc7000 r--p 00029000 2b:00 12060 /rofs/lib/i386-linux-gnu/libm-2.15.so
00dc7000-00dc8000 rw-p 0002a000 2b:00 12060 /rofs/lib/i386-linux-gnu/libm-2.15.so
00dcf000-00deb000 r-xp 00000000 2b:00 12056 /rofs/lib/i386-linux-gnu/libgcc_s.so.1
00deb000-00dec000 r--p 0001b000 2b:00 12056 /rofs/lib/i386-linux-gnu/libgcc_s.so.1
00dec000-00ded000 rw-p 0001c000 2b:00 12056 /rofs/lib/i386-linux-gnu/libgcc_s.so.1
00e00000-00e20000 r-xp 00000000 2b:00 12062 /rofs/lib/i386-linux-gnu/ld-2.15.so
00e20000-00e21000 r--p 0001f000 2b:00 12062 /rofs/lib/i386-linux-gnu/ld-2.15.so
00e21000-00e22000 rw-p 00020000 2b:00 12062 /rofs/lib/i386-linux-gnu/ld-2.15.so
08048000-08120000 r-xp 00000000 00:19 47848874 /opt/x86/boinc/work/pc23/projects/milkyway.cs.rpi.edu_milkyway/milkyway_nbody_1.00_i686-pc-linux-gnu__mt
08120000-08121000 r--p 000d8000 00:19 47848874 /opt/x86/boinc/work/pc23/projects/milkyway.cs.rpi.edu_milkyway/milkyway_nbody_1.00_i686-pc-linux-gnu__mt
08121000-08122000 rw-p 000d9000 00:19 47848874 /opt/x86/boinc/work/pc23/projects/milkyway.cs.rpi.edu_milkyway/milkyway_nbody_1.00_i686-pc-linux-gnu__mt
08122000-08143000 rw-p 00000000 00:00 0
09254000-09504000 rw-p 00000000 00:00 0 [heap]
b76be000-b7736000 rw-s 00000000 00:19 47972438 /opt/x86/boinc/work/pc23/slots/2/boinc_milkyway_nbody_0.94_i686-pc-linux-gnu__sse2_2
b7785000-b7788000 rw-p 00000000 00:00 0
b7788000-b7789000 rw-p 00000000 00:00 0
b7789000-b778a000 ---p 00000000 00:00 0
b778a000-b778d000 rw-p 00000000 00:00 0
b778d000-b778f000 rw-s 00000000 00:19 47972632 /opt/x86/boinc/work/pc23/slots/2/boinc_mmap_file
b778f000-b7791000 rw-p 00000000 00:00 0
bfeb9000-bfeda000 rw-p 00000000 00:00 0 [stack]
SIGABRT: abort called
Stack trace (12 frames):
../../projects/milkyway.cs.rpi.edu_milkyway/milkyway_nbody_1.00_i686-pc-linux-gnu__mt(boinc_catch_signal+0x1ad)[0x80c7ca7]
[0x7ac400]
[0x7ac416]
/lib/i386-linux-gnu/libc.so.6(gsignal+0x4f)[0x5301ef]
/lib/i386-linux-gnu/libc.so.6(abort+0x175)[0x533835]
/lib/i386-linux-gnu/libc.so.6(+0x692fa)[0x56b2fa]
/lib/i386-linux-gnu/libc.so.6(+0x73e42)[0x575e42]
../../projects/milkyway.cs.rpi.edu_milkyway/milkyway_nbody_1.00_i686-pc-linux-gnu__mt(destroyNBodyState+0x72)[0x806c0d2]
../../projects/milkyway.cs.rpi.edu_milkyway/milkyway_nbody_1.00_i686-pc-linux-gnu__mt(nbMain+0x2ae)[0x80649fe]
../../projects/milkyway.cs.rpi.edu_milkyway/milkyway_nbody_1.00_i686-pc-linux-gnu__mt(main+0x28d)[0x806258d]
/lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x51b4d3]
../../projects/milkyway.cs.rpi.edu_milkyway/milkyway_nbody_1.00_i686-pc-linux-gnu__mt[0x8062745]
Exiting...
</stderr_txt>
]]>
Or on Linux OpenSuse with MilkyWay@Home N-Body Simulation v1.00
Server state: Over
Outcome: Success
Client state: Done
Validate state: Workunit error - check skipped
<core_client_version>7.0.31</core_client_version>
<![CDATA[
<stderr_txt>
<search_application> milkyway_nbody 0.94 Darwin x86_64 double OpenMP, Crlibm </search_application>
Using OpenMP 1 max threads on a system with 12 processors
Warning: not applying timestep correction for workunit with min version 0.80
<search_likelihood>-78.873167893643412</search_likelihood>
20:35:00 (39930): called boinc_finish
</stderr_txt>
]]>
Why is it trying to use NVIDIA? This PC has no GPU!
<core_client_version>6.12.34</core_client_version>
<![CDATA[
<message>
process exited with code 7 (0x7, -249)
</message>
<stderr_txt>
<search_application> milkyway_nbody 0.94 Linux x86_64 double OpenCL, Crlibm </search_application>
FATAL: Module nvidia not found.
Error getting number of platform (-1001): CL_PLATFORM_NOT_FOUND_KHR
Failed to get information about device
02:40:48 (29064): called boinc_finish
</stderr_txt>
]]>
|
|
GALAXY-VOYAGER
Send message Joined: 25 Oct 12 Posts: 3 Credit: 164,089 RAC: 0
|
It's really too early for me to tell if there have been any changes. I've only been using BOINC projects since 21st October 2012. I haven't had any noticeable problems with M@H so far, except that the Time Remaining clock has been counting upwards instead of downwards, but this has been happening with other projects too. I was informed in another forum that this is normal because that's the way the program works with the tasks. However, I can't see why it should operate that way. Why would a countdown clock count upwards? It's like the timer on a microwave oven, for example, that has been set for 30 minutes, but instead of counting down to zero, it counts upwards and the remaining time just increases as time passes. Thank goodness the Progress column shows that the percentage of work done is actually increasing, which confirms that the work unit is being processed.
Also, the Elapsed Time clock IS working correctly and DOES increase as expected. So that's another thing that shows it's doing its job.
Anyway, I hope that the changes you have made work the way you intend them to.
|
|
Byron Leigh Hatch @ team Carl ...
Send message Joined: 20 Mar 09 Posts: 11 Credit: 1,736,134 RAC: 0
|
Richard Haselgrove asked if someone would copy and paste the following:
Could somebody with posting rights copy the following information to the project's News thread, please? Their message boards are locked up so tight that a new user can't even post an explanation for why their new app isn't going to give them enough credit to post...
0xC0000135 isn't exactly an unknown error - it means 'The application failed to initialize properly'. That's usually a missing DLL - and no, thank you Google, NOT usually the silly 'dot Net' framework.
Sure enough, trusty old Dependency Walker indicates a need for LIBGOMP_64-1.DLL, and the app_version supplied for the app doesn't reference it:
<app_version>
<app_name>milkyway_nbody</app_name>
<version_num>94</version_num>
<platform>windows_x86_64</platform>
<avg_ncpus>1.000000</avg_ncpus>
<max_ncpus>1.000000</max_ncpus>
<flops>2503477159.822932</flops>
<api_version>6.13.0</api_version>
<file_ref>
<file_name>milkyway_nbody_0.94_windows_x86_64__mt.exe</file_name>
<main_program/>
</file_ref>
</app_version>
Also, I was expecting this to be a multi-threaded app, and I see _mt at the end of the file name: yet there's no MT plan class, and <max_ncpus> is just 1. If the app itself is going to try to use more cores than BOINC frees for it, that's going to cause major scheduling problems.
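The check described above can also be scripted. The following is a minimal sketch in Python, not a BOINC tool: it parses the `<app_version>` block quoted in this post and reports which required files are not referenced and whether a plan class is set. The helper names (`referenced_files`, `missing_dependencies`, `has_plan_class`) are hypothetical, and the list of required DLLs is taken only from the Dependency Walker finding mentioned above.

```python
import xml.etree.ElementTree as ET

# The app_version block as quoted in the post (the flops and api_version
# lines are omitted for brevity).
APP_VERSION_XML = """
<app_version>
    <app_name>milkyway_nbody</app_name>
    <version_num>94</version_num>
    <platform>windows_x86_64</platform>
    <avg_ncpus>1.000000</avg_ncpus>
    <max_ncpus>1.000000</max_ncpus>
    <file_ref>
        <file_name>milkyway_nbody_0.94_windows_x86_64__mt.exe</file_name>
        <main_program/>
    </file_ref>
</app_version>
""".strip()


def referenced_files(app_version_xml):
    """Return every <file_name> listed in the app_version block."""
    root = ET.fromstring(app_version_xml)
    return [el.text.strip() for el in root.iter("file_name") if el.text]


def missing_dependencies(app_version_xml, required):
    """Return the required files that the app_version does not reference.

    Windows file names are case-insensitive, so compare in lower case.
    """
    present = {name.lower() for name in referenced_files(app_version_xml)}
    return [name for name in required if name.lower() not in present]


def has_plan_class(app_version_xml):
    """True if the block carries a non-empty <plan_class> element."""
    plan_class = ET.fromstring(app_version_xml).findtext("plan_class")
    return bool(plan_class and plan_class.strip())


if __name__ == "__main__":
    # Hypothetical requirement list, based on the Dependency Walker result.
    print(missing_dependencies(APP_VERSION_XML, ["LIBGOMP_64-1.DLL"]))
    print(has_plan_class(APP_VERSION_XML))
```

For the block quoted here, the sketch reports the DLL as unreferenced and finds no plan class, which matches both observations in the post.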
|
|
arkayn
Send message Joined: 14 Feb 09 Posts: 999 Credit: 74,932,619 RAC: 0
|
And a further post as well.
The next error, once the DLLs are in place, is usually 0xc0000374 for memory heap corruption. They'll need to go back to code, and see where they're writing outside the bounds of allocated memory, for that one.
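The "unknown error number" shown by the client is the Windows NTSTATUS value printed as a signed 32-bit integer. As a small sketch (the function names and the two-entry lookup table are illustrative, covering only the codes mentioned in this thread), the conversion looks like this in Python:

```python
# NTSTATUS codes mentioned in this thread. The symbolic names are the
# standard Windows ones; the table is deliberately not exhaustive.
KNOWN_NTSTATUS = {
    0xC0000135: "STATUS_DLL_NOT_FOUND",
    0xC0000374: "STATUS_HEAP_CORRUPTION",
}


def to_ntstatus(exit_code):
    """Reinterpret a signed 32-bit exit code as an unsigned NTSTATUS value."""
    return exit_code & 0xFFFFFFFF


def describe(exit_code):
    """Format an exit code as hex plus its symbolic name, if known."""
    status = to_ntstatus(exit_code)
    name = KNOWN_NTSTATUS.get(status, "unknown")
    return f"0x{status:08X} {name}"


if __name__ == "__main__":
    print(describe(-1073741515))  # prints "0xC0000135 STATUS_DLL_NOT_FOUND"
```

Masking with `0xFFFFFFFF` is what turns -1073741515 into 0xC0000135, the value reported in the task logs earlier in the thread.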
|
|
Link
Send message Joined: 19 Jul 10 Posts: 624 Credit: 19,290,230 RAC: 2,073
|
It's like the timer on a microwave oven, for example, that has been set for 30 minutes, but instead of counting down to zero, it counts upwards and the remaining time just increases as time passes.
It's like a timer on an intelligent microwave oven that constantly checks the temperature of the food and, when it sees that the food is not warming as fast as initially expected, tells you it might take longer than expected to get it warm.
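To make the analogy concrete, here is a simplified sketch in Python of a progress-based estimate. It is an assumption for illustration only, not BOINC's actual estimator, and `remaining_seconds` is a hypothetical name: the remaining time is extrapolated from the elapsed time and the fraction of work done.

```python
def remaining_seconds(elapsed, fraction_done):
    """Extrapolate remaining time from progress so far.

    Assumes the rest of the task runs at the average speed observed so far.
    """
    if not 0 < fraction_done < 1:
        raise ValueError("fraction_done must be strictly between 0 and 1")
    return elapsed * (1 - fraction_done) / fraction_done


if __name__ == "__main__":
    # After 600 s the task is 25% done: 1800 s remain.
    print(remaining_seconds(600, 0.25))    # prints 1800.0
    # After 1200 s it is only 37.5% done (it slowed down): 2000 s remain.
    print(remaining_seconds(1200, 0.375))  # prints 2000.0
```

Progress went up between the two readings, yet the remaining-time estimate grew from 1800 s to 2000 s, because the task advanced more slowly than its earlier pace suggested.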
|
|
Steinar1965
Send message Joined: 13 Aug 11 Posts: 2 Credit: 899,289 RAC: 0
|
Several n-body tasks fail after running for 100+ hours. They seem to run to completion, but the status says failure when they finish.
|
|