Welcome to MilkyWay@home

N-Body 1.08

Jake Bauer
Project developer
Project tester
Project scientist
Joined: 20 Aug 12
Posts: 66
Credit: 406,916
RAC: 0
Message 57574 - Posted: 20 Mar 2013, 16:26:50 UTC

Hello users,

I have just updated the N-Body binaries. Expect a new release tonight. Post errors here!

Thanks,

Jake
ID: 57574
Jimmy Gondek
Joined: 28 Sep 11
Posts: 60
Credit: 22,764,173
RAC: 0
Message 57577 - Posted: 20 Mar 2013, 19:18:56 UTC - in response to Message 57574.  

Hi Jake,

A couple of stderr outputs as examples below; there are more in my task bin. Is this what you're looking for?

Task 424020748
Name ps_nbody_100K_EMD_32013_2_1358941502_230897_0
Workunit 326324298
Created 20 Mar 2013 | 18:12:21 UTC
Sent 20 Mar 2013 | 18:32:31 UTC
Received 20 Mar 2013 | 18:58:55 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x0)
Computer ID 330727
Report deadline 1 Apr 2013 | 18:32:31 UTC
Run time 1,584.00
CPU time 3,518.23
Validate state Checked, but no consensus yet
Credit 0.00
Application version MilkyWay@Home N-Body Simulation v1.08
Stderr output

<core_client_version>6.12.43</core_client_version>
<![CDATA[
<stderr_txt>
<search_application> milkyway_nbody 1.08 Darwin x86_64 double OpenMP, Crlibm </search_application>
Number of particles in bins is very small compared to total. (7 << 100000). Skipping distance calculation
<search_likelihood>-9999999.900000000372529</search_likelihood>
14:58:48 (73108): called boinc_finish

</stderr_txt>
]]>


Task 423995278
Name de_nbody_100K_EMD_32013_1358941502_229457_1
Workunit 326299430
Created 20 Mar 2013 | 17:26:12 UTC
Sent 20 Mar 2013 | 17:42:41 UTC
Received 20 Mar 2013 | 18:12:50 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x0)
Computer ID 330727
Report deadline 1 Apr 2013 | 17:42:41 UTC
Run time 1,809.00
CPU time 6,633.10
Validate state Checked, but no consensus yet
Credit 0.00
Application version MilkyWay@Home N-Body Simulation v1.08
Stderr output

<core_client_version>6.12.43</core_client_version>
<![CDATA[
<stderr_txt>
<search_application> milkyway_nbody 1.08 Darwin x86_64 double OpenMP, Crlibm </search_application>
<search_likelihood>-14517.243707935935163</search_likelihood>
14:12:44 (72750): called boinc_finish

</stderr_txt>
]]>


ID: 57577
Trambambaj
Joined: 13 Feb 13
Posts: 1
Credit: 436,515
RAC: 0
Message 57579 - Posted: 20 Mar 2013, 20:27:27 UTC

Hi,
I received a task that is estimated at 7915 hours on CPU.
http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=326343808
ID: 57579
Richard Haselgrove
Joined: 4 Sep 12
Posts: 219
Credit: 456,474
RAC: 0
Message 57580 - Posted: 20 Mar 2013, 20:52:21 UTC
Last modified: 20 Mar 2013, 21:12:16 UTC

ARGH!

The Applications page is still showing plan classes (opencl_amd_ati) and (opencl_nvidia) for Linux, and *no* plan class (MT) for Windows.

For the record, can we confirm that N-Body 1.08 is still supposed to be a CPU-only, multi-threaded, application?

Never mind, I'll go grab a task and see what I can make of it.

Edit - yes, OpenMP is still reporting into stderr.txt "Using 1 max threads on a system with 4 processors", and Process Explorer confirms one worker thread using 24% CPU. I'll transfer the new executables down to my anonymous platform machine, and see how it looks multithreaded.

This task started with a 1156 hour (7 week) estimate, but completed 7.5% in the first 10 minutes.
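
As a rough cross-check of that estimate, the observed progress can be extrapolated linearly. This is only a sketch: it assumes progress stays linear for the whole run, and the function name is made up for illustration.

```python
def projected_runtime_hours(elapsed_minutes, fraction_done):
    """Linearly extrapolate the total run time from the progress so far."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_minutes / fraction_done / 60.0


# Figures from this post: 7.5% done after 10 minutes, against a 1156 hour estimate.
projected = projected_runtime_hours(10, 0.075)
overestimate_factor = 1156 / projected
print(f"projected total: {projected:.2f} h")
print(f"initial estimate is about {overestimate_factor:.0f} times too high")
```

On these numbers the task should finish in a little over two hours, so the initial estimate is off by a factor of several hundred.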
ID: 57580
TLSI2000
Joined: 15 Mar 10
Posts: 17
Credit: 1,221,936,867
RAC: 0
Message 57584 - Posted: 21 Mar 2013, 3:07:59 UTC

These seem to be running fine, with run times of 2 to 4 hours on an older 2.4 GHz AMD.

But the credit calculation seems to be a bit odd:

Run time (sec)   CPU time (sec)   Credit   Application
6,357.59         6,357.59         26.84    MilkyWay@Home N-Body Simulation v1.08
6,404.13         6,404.13         27.04    MilkyWay@Home N-Body Simulation v1.08
9,509.63         9,496.64         13.22    MilkyWay@Home N-Body Simulation v1.08
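
To make the oddity concrete, here is a small sketch that turns the three results above into credit per CPU-hour. It assumes the run and CPU times are in seconds, as on the BOINC task pages; the helper name is made up for illustration.

```python
# (run time s, CPU time s, credit) copied from the three results above
results = [
    (6357.59, 6357.59, 26.84),
    (6404.13, 6404.13, 27.04),
    (9509.63, 9496.64, 13.22),
]


def credit_per_cpu_hour(cpu_seconds, credit):
    """Credit granted per hour of CPU time."""
    return credit / (cpu_seconds / 3600.0)


rates = [credit_per_cpu_hour(cpu, credit) for _, cpu, credit in results]
for (_, cpu, credit), rate in zip(results, rates):
    print(f"{cpu:>9.2f} s  {credit:>6.2f} credits  {rate:5.2f} credits/CPU-hour")
```

The first two tasks work out to roughly 15 credits per CPU-hour and the third to roughly 5, so the longest task was paid at about a third of the rate of the other two.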
ID: 57584
Richard Haselgrove
Joined: 4 Sep 12
Posts: 219
Credit: 456,474
RAC: 0
Message 57597 - Posted: 21 Mar 2013, 15:16:04 UTC

I've put through a few tasks on both hosts now:

Host 479865 - running 'au naturel', single threaded (roughly the same run time and CPU time)

Host 465695 - running under app_info.xml with --nthreads 3, and CPU times to match.

Observations:
The stock host got a grossly exaggerated runtime estimate for the very first task (only), but settled immediately to reasonable values thereafter. That makes sense - I was probably too early for app_version.pfc_scale to have been established for the first one.

The anonymous platform host is still getting distorted runtimes - the most recent one an initial estimate of 983 hours. That may be because for anonymous platform cases, the application details record isn't re-initialised for each new app_version: it appears the server thinks my i7 is much slower than it really is, which might be the case if a different base <rsc_fpops_est> has been used for this app/batch. That probably accounts for the extremely low credit scores for that host, too (apologies to wingmates).
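
The mechanism described here can be illustrated with a deliberately simplified model. It assumes the estimate is just the workunit's <rsc_fpops_est> divided by the speed the server believes the host has; the real BOINC server logic has more terms, and every number below is hypothetical.

```python
def estimated_runtime_hours(rsc_fpops_est, assumed_flops):
    """Run time estimate if it were simply work divided by assumed speed."""
    return rsc_fpops_est / assumed_flops / 3600.0


# Hypothetical numbers, chosen only to show the scaling.
fpops_est = 1.0e13       # hypothetical <rsc_fpops_est> for one workunit
true_speed = 3.0e9       # hypothetical real throughput of the host, in FLOPS
believed_speed = 3.0e6   # hypothetical stale server-side figure, in FLOPS

realistic = estimated_runtime_hours(fpops_est, true_speed)
distorted = estimated_runtime_hours(fpops_est, believed_speed)
print(f"with the true speed:     {realistic:.2f} h")
print(f"with the believed speed: {distorted:.2f} h")
```

In this model, a host that the server believes to be 1000 times slower than it really is gets an estimate 1000 times too long.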

Once this new task has finished, I'll force that machine to get a new HostID and thus reset the speed and usage data - see if that cures it.

One thing I haven't tested yet is restarting from checkpoints - I'll leave that to someone else.
ID: 57597
DJStarfox
Joined: 29 Sep 10
Posts: 54
Credit: 1,342,886
RAC: 0
Message 57607 - Posted: 22 Mar 2013, 12:34:33 UTC - in response to Message 57574.  
Last modified: 22 Mar 2013, 12:43:09 UTC

I'm getting SELINUX errors because the nbody application is trying to access /home. No, I will not disable SELINUX. Your application should not access my filesystem outside its working directory.

Edit:
The real problem is that it says it requires GLIBC 2.14. I have libc.so.6, but I guess it's not the right version.

./milkyway_nbody_1.08_x86_64-pc-linux-gnu__mt: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by ./milkyway_nbody_1.08_x86_64-pc-linux-gnu__mt)
linux-vdso.so.1 => (0x00007fff693ff000)
librt.so.1 => /lib64/librt.so.1 (0x0000003315200000)
libm.so.6 => /lib64/libm.so.6 (0x0000003088400000)
libgomp.so.1 => /usr/lib64/libgomp.so.1 (0x000000326d800000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003315e00000)
libc.so.6 => /lib64/libc.so.6 (0x0000003087800000)
/lib64/ld-linux-x86-64.so.2 (0x0000003087400000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x0000003088800000)
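
For anyone who wants to check their own machine before fetching work, here is a minimal sketch that compares the installed glibc against the 2.14 named in the error message. It assumes a Linux system where Python's os.confstr knows CS_GNU_LIBC_VERSION; the helper names are made up for illustration.

```python
import os
import re


def parse_glibc_version(text):
    """Return (major, minor) from a string such as 'glibc 2.12', else None."""
    match = re.search(r"(\d+)\.(\d+)", text or "")
    return (int(match.group(1)), int(match.group(2))) if match else None


def satisfies(installed, required=(2, 14)):
    """True if the installed glibc is at least the required version."""
    return installed is not None and installed >= required


def installed_glibc():
    """Ask the C library for its version; None on systems without glibc."""
    try:
        return parse_glibc_version(os.confstr("CS_GNU_LIBC_VERSION"))
    except (AttributeError, ValueError, OSError):
        return None


if __name__ == "__main__":
    version = installed_glibc()
    print("installed glibc:", version)
    print("new enough for GLIBC_2.14:", satisfies(version))
```

Versions are compared as integer tuples, so 2.5 correctly counts as older than 2.14, which a plain string comparison would get wrong.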
ID: 57607
Jeffery M. Thompson
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 23 Sep 12
Posts: 159
Credit: 16,977,106
RAC: 0
Message 57617 - Posted: 22 Mar 2013, 19:24:38 UTC

I am seeing the Glibc errors coming across, but on older versions of the BOINC client.

I am guessing you are running BOINC 6.10 or 6.12, if it matches the pattern I have been observing. I don't have a resolution at this time as I am still researching the error, but I would suggest updating to the latest BOINC client to see if the error persists.


Jeff Thompson
ID: 57617
floyd
Joined: 13 Sep 11
Posts: 17
Credit: 3,263,835
RAC: 0
Message 57623 - Posted: 23 Mar 2013, 11:52:41 UTC - in response to Message 57617.  

I am seeing the Glibc errors coming across, but on older versions of the BOINC client.

I am guessing you are running BOINC 6.10 or 6.12 if it matches the pattern I have been observing.


That's just because older clients are more likely to run on older systems with an older glibc. AFAICS the real reason is that the application is dynamically linked (BTW the i686 binary is not) and it specifically depends on GLIBC_2.14, just as stated in the error message. To be more precise, it's just memcpy from that version. Whether this requirement is really necessary is up to the developers. If in doubt I'd suggest a static binary.
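
A sketch of how such a dependency can be tracked down: group the symbols a binary imports by the GLIBC version tag attached to them, using text in the style of `objdump -T` output. The sample lines are illustrative only; the memcpy / GLIBC_2.14 pairing is the one from this thread, the other two lines are made up.

```python
import re
from collections import defaultdict

# Matches e.g. "GLIBC_2.14  memcpy"; some binutils versions wrap the tag in ().
SYMBOL_RE = re.compile(r"GLIBC_(\d+(?:\.\d+)*)\)?\s+(\w+)")


def symbols_by_glibc_version(objdump_text):
    """Group imported symbol names by the GLIBC version tag they require."""
    grouped = defaultdict(list)
    for line in objdump_text.splitlines():
        match = SYMBOL_RE.search(line)
        if match:
            version = tuple(int(part) for part in match.group(1).split("."))
            grouped[version].append(match.group(2))
    return dict(grouped)


# Illustrative input; only the memcpy line reflects what was reported here.
sample = """\
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.2.5 printf
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.14  memcpy
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.2.5 malloc
"""

grouped = symbols_by_glibc_version(sample)
newest = max(grouped)
print("newest required version:", newest, "because of", grouped[newest])
```

If a single symbol is all that pulls in the newer version, as with memcpy here, the binary could in principle be built against an older glibc or linked statically.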
ID: 57623
DJStarfox
Joined: 29 Sep 10
Posts: 54
Credit: 1,342,886
RAC: 0
Message 57626 - Posted: 23 Mar 2013, 17:29:01 UTC

I agree. Statically link it, or I'm not going to be able to run the latest n-body application.
ID: 57626
Overtonesinger
Joined: 15 Feb 10
Posts: 63
Credit: 1,836,010
RAC: 0
Message 57647 - Posted: 25 Mar 2013, 22:01:19 UTC

unlikely result :) :) :)

- seems like only 2 stars from the given sample are nearly fitting the Sagittarius Dwarf stream? :O

<core_client_version>7.0.58</core_client_version>
<![CDATA[
<stderr_txt>
<search_application> milkyway_nbody 1.08 Windows x86_64 double  OpenMP, Crlibm </search_application>
Using OpenMP 1 max threads on a system with 4 processors
Number of particles in bins is very small compared to total. (2 << 100000). Skipping distance calculation
<search_likelihood>-9999999.900000000400000</search_likelihood>
22:44:58 (4740): called boinc_finish

</stderr_txt>
]]>



http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=328674025
Melwen - Child of the Fangorn Forest
Rig "BRISINGR" [ASUS G73-JH, i7 720QM 1.73, 4x2GB DDR3 1333 CL7, ATi HD5870M 1GB GDDR5],bought on 2011-02-24
ID: 57647
Alinator
Joined: 7 Jun 08
Posts: 464
Credit: 56,639,936
RAC: 0
Message 57650 - Posted: 26 Mar 2013, 1:37:29 UTC

Looks like there are still some problems with checkpointing on Winboxes as well.

Although I haven't had the problem on any of mine, I don't have an "out of the box" standard installation of Windows on any of my machines.

http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=327231097
ID: 57650
lucas_smith
Joined: 23 Feb 13
Posts: 3
Credit: 18,695
RAC: 0
Message 57656 - Posted: 26 Mar 2013, 14:09:44 UTC

Hi everyone! Sorry but I'm new and don't know how to find the download for the new nbody release. Where should I look for such releases in the future? Can you assist me? Thank you!
ID: 57656
Alinator
Joined: 7 Jun 08
Posts: 464
Credit: 56,639,936
RAC: 0
Message 57658 - Posted: 26 Mar 2013, 15:20:35 UTC - in response to Message 57656.  

I checked your host, and at this point you don't need to do anything. You have already run 4 of them successfully.
ID: 57658
lucas_smith
Joined: 23 Feb 13
Posts: 3
Credit: 18,695
RAC: 0
Message 57659 - Posted: 26 Mar 2013, 15:47:08 UTC - in response to Message 57658.  

Thank you! Does this mean that it is downloaded automatically and that I will not have to take action in the future? I appreciate the help!
ID: 57659
jay_e
Joined: 24 Mar 13
Posts: 11
Credit: 25,297
RAC: 0
Message 57666 - Posted: 26 Mar 2013, 23:43:13 UTC
Last modified: 26 Mar 2013, 23:48:22 UTC

Greetings.

I have an 8 core CPU and set the CPU preferences to 50%.
4 CPU tasks are loaded and 1 GPU task.
Yet the CPU monitor shows all 8 cores are running at 100%.
I set no new tasks, let all tasks finish, stopped work from all other projects and rebooted. The problem repeats.
Summary: Ubuntu-Linux and de_nbody_100K_EMD_32013_2_1358941502_444444_0 .

details follow.
Tue 26 Mar 2013 07:03:52 PM EDT |  | Starting BOINC client version 7.0.27 for x86_64-pc-linux-gnu
Tue 26 Mar 2013 07:03:52 PM EDT |  | log flags: file_xfer, sched_ops, task
Tue 26 Mar 2013 07:03:52 PM EDT |  | Libraries: libcurl/7.29.0 OpenSSL/1.0.1c zlib/1.2.7 libidn/1.25 librtmp/2.3
Tue 26 Mar 2013 07:03:52 PM EDT |  | Data directory: /var/lib/boinc-client
Tue 26 Mar 2013 07:03:52 PM EDT |  | Processor: 8 AuthenticAMD AMD FX(tm)-8150 Eight-Core Processor [Family 21 Model 1 Stepping 2]
Tue 26 Mar 2013 07:03:52 PM EDT |  | Processor: 2.00 MB cache
Tue 26 Mar 2013 07:03:52 PM EDT |  | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf pni pclmulqdq monitor ssse3 cx16 sse4_1 sse4_2 popcnt aes xsave avx lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 nodeid_msr topoext perfctr_core arat cpb hw_pstate npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
Tue 26 Mar 2013 07:03:52 PM EDT |  | OS: Linux: 3.8.0-13-generic
Tue 26 Mar 2013 07:03:52 PM EDT |  | Memory: 7.70 GB physical, 8.04 GB virtual
Tue 26 Mar 2013 07:03:52 PM EDT |  | Disk: 18.33 GB total, 16.27 GB free
Tue 26 Mar 2013 07:03:52 PM EDT |  | Local time is UTC -4 hours
Tue 26 Mar 2013 07:03:52 PM EDT |  | ATI GPU 0: Capeverde (CAL version 1.4.1741, 2048MB, 1710MB available, 2048 GFLOPS peak)
Tue 26 Mar 2013 07:03:52 PM EDT |  | OpenCL: ATI GPU 0: Capeverde (driver version 1084.4 (VM), device version OpenCL 1.2 AMD-APP (1084.4), 2048MB, 1710MB available)
Tue 26 Mar 2013 07:03:52 PM EDT |  | Config: use all coprocessors
Tue 26 Mar 2013 07:03:52 PM EDT |  | Config: GUI RPC allowed from:
Tue 26 Mar 2013 07:03:52 PM EDT |  | A new version of BOINC is available. <a href=http://boinc.berkeley.edu/download.php>Download it.</a>
Tue 26 Mar 2013 07:03:52 PM EDT | malariacontrol.net | URL http://www.malariacontrol.net/; Computer ID 621946; resource share 20
Tue 26 Mar 2013 07:03:52 PM EDT | LHC@home 1.0 | URL http://lhcathomeclassic.cern.ch/sixtrack/; Computer ID 10282414; resource share 20
Tue 26 Mar 2013 07:03:52 PM EDT | World Community Grid | URL http://www.worldcommunitygrid.org/; Computer ID 2325898; resource share 40
Tue 26 Mar 2013 07:03:52 PM EDT | Milkyway@Home | URL http://milkyway.cs.rpi.edu/milkyway/; Computer ID 508164; resource share 20
Tue 26 Mar 2013 07:03:52 PM EDT |  | General prefs: from http://setiathome.berkeley.edu/ (last modified 24-Mar-2013 04:56:00)
Tue 26 Mar 2013 07:03:52 PM EDT |  | Host location: none
Tue 26 Mar 2013 07:03:52 PM EDT |  | General prefs: using your defaults
Tue 26 Mar 2013 07:03:52 PM EDT |  | Preferences:
Tue 26 Mar 2013 07:03:52 PM EDT |  | max memory usage when active: 7494.78MB
Tue 26 Mar 2013 07:03:52 PM EDT |  | max memory usage when idle: 7494.78MB
Tue 26 Mar 2013 07:03:52 PM EDT |  | max disk usage: 10.00GB
Tue 26 Mar 2013 07:03:52 PM EDT |  | max CPUs used: 7
Tue 26 Mar 2013 07:03:52 PM EDT |  | (to change preferences, visit the web site of an attached project, or select Preferences in the Manager)
Tue 26 Mar 2013 07:03:52 PM EDT |  | Not using a proxy
Tue 26 Mar 2013 07:04:17 PM EDT | LHC@home 1.0 | update requested by user
Tue 26 Mar 2013 07:04:19 PM EDT | LHC@home 1.0 | Sending scheduler request: Requested by user.
Tue 26 Mar 2013 07:04:19 PM EDT | LHC@home 1.0 | Not reporting or requesting tasks
Tue 26 Mar 2013 07:04:20 PM EDT | LHC@home 1.0 | work fetch resumed by user
Tue 26 Mar 2013 07:04:21 PM EDT | LHC@home 1.0 | Scheduler request completed
Tue 26 Mar 2013 07:04:31 PM EDT | LHC@home 1.0 | Sending scheduler request: To fetch work.
Tue 26 Mar 2013 07:04:31 PM EDT | LHC@home 1.0 | Requesting new tasks for CPU
Tue 26 Mar 2013 07:04:33 PM EDT | LHC@home 1.0 | Scheduler request completed: got 0 new tasks
Tue 26 Mar 2013 07:04:33 PM EDT | LHC@home 1.0 | Project has no tasks available
Tue 26 Mar 2013 07:05:21 PM EDT | LHC@home 1.0 | work fetch suspended by user
Tue 26 Mar 2013 07:05:53 PM EDT |  | General prefs: from http://setiathome.berkeley.edu/ (last modified 24-Mar-2013 04:56:00)
Tue 26 Mar 2013 07:05:53 PM EDT |  | Host location: none
Tue 26 Mar 2013 07:05:53 PM EDT |  | General prefs: using your defaults
Tue 26 Mar 2013 07:05:53 PM EDT |  | Reading preferences override file
Tue 26 Mar 2013 07:05:53 PM EDT |  | Preferences:
Tue 26 Mar 2013 07:05:53 PM EDT |  | max memory usage when active: 7494.78MB
Tue 26 Mar 2013 07:05:53 PM EDT |  | max memory usage when idle: 7494.78MB
Tue 26 Mar 2013 07:05:53 PM EDT |  | max disk usage: 10.00GB
Tue 26 Mar 2013 07:05:53 PM EDT |  | Number of usable CPUs has changed from 7 to 4.
     (This is where I set preferences to 50% before allowing ANY work.)
Tue 26 Mar 2013 07:05:53 PM EDT |  | max CPUs used: 4
Tue 26 Mar 2013 07:05:53 PM EDT |  | (to change preferences, visit the web site of an attached project, or select Preferences in the Manager)
Tue 26 Mar 2013 07:05:59 PM EDT | Milkyway@Home | work fetch resumed by user
Tue 26 Mar 2013 07:06:53 PM EDT | Milkyway@Home | Sending scheduler request: To fetch work.
Tue 26 Mar 2013 07:06:53 PM EDT | Milkyway@Home | Requesting new tasks for CPU and ATI
Tue 26 Mar 2013 07:06:55 PM EDT | Milkyway@Home | Scheduler request completed: got 5 new tasks
Tue 26 Mar 2013 07:06:57 PM EDT | Milkyway@Home | Starting task de_nbody_100K_EMD_32013_2_1358941502_444444_0 using milkyway_nbody version 108 (opencl_amd_ati) in slot 0
Tue 26 Mar 2013 07:06:57 PM EDT | Milkyway@Home | Starting task ps_nbody_100K_EMD_32013_2_1358941502_274697_2 using milkyway_nbody version 108 in slot 1
Tue 26 Mar 2013 07:06:57 PM EDT | Milkyway@Home | Starting task de_separation_23_3s_sSgr_1_1358941502_28794660_0 using milkyway version 101 in slot 2
Tue 26 Mar 2013 07:06:57 PM EDT | Milkyway@Home | Starting task ps_nbody_100K_EMD_32013_2_1358941502_444422_0 using milkyway_nbody version 108 in slot 3
Tue 26 Mar 2013 07:06:57 PM EDT | Milkyway@Home | Starting task de_separation_23_3s_sSgr_1_1358941502_28794661_0 using milkyway version 101 in slot 4

App_config.xml and cc_config.xml
<app_config>
  <app>
    <name>hcc1</name>
    <max_concurrent>2</max_concurrent>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>0.5</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>milkyway</name>
    <max_concurrent>2</max_concurrent>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>0.5</cpu_usage>
    </gpu_versions>
  </app>
</app_config>

=================================================
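
Hand-edited files such as the app_config.xml above and the cc_config.xml below are easy to break with a stray character or a mismatched closing tag. A minimal sketch of a well-formedness check using Python's standard library follows. It only checks XML syntax, not whether BOINC recognises the element names, and the two test strings are made up for illustration.

```python
import xml.etree.ElementTree as ET


def xml_problem(text):
    """Return None if text is well-formed XML, else the parser's message."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return str(err)
    return None


# Made-up fragments: one well-formed, one with a mismatched closing tag.
good = "<app_config><app><name>milkyway</name></app></app_config>"
bad = (
    "<app_config><gpu_versions><gpu_usage>0.5</gpu_usage>"
    "</gpu_version></app_config>"
)

print("good:", xml_problem(good))
print("bad: ", xml_problem(bad))
```

To check a real file, read it with open(path).read() and pass the text to xml_problem.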

<cc_config>
    <options>
        <use_all_gpus>1</use_all_gpus>
    </options>
</cc_config>


Need more data?
In comparison, when using WCG Help Conquer Cancer GPU WU,
the total CPU utilization is ~50%

Is there a way to see if something is spinning in a loop?
It's not a reliable measurement, but the fans sound like they are running for a 100% load.

One weirdness.
The BOINC Task page, on the line describing the GPU task, says:
"Running (0.05 CPUs + 1 ATI GPU)"
The xml file was not set to 0.05 for CPU.
And, it looks like FOUR CPUs are attached/linked/associated with the GPU task.
If I suspend the GPU task, the utilization goes to using 4 of the 8 CPUs at 100% and the other 4 are idle.
Resuming the single GPU task sets all 8 cores to 100%. This is not temporary, but lasts all the time the GPU is running.
I have not observed, yet, what happens when the WU finishes and uploads finished data and gets new data into the GPU.
I'll try to observe this and report later.

T H A N K S,
Jay

--edit - add fglrx versions --
Package fglrx:
i 2:9.010-0ubuntu2 raring 500

Package fglrx-amdcccle:
i A 2:9.010-0ubuntu2 raring 500

Ubuntu 13.04.
ID: 57666
jay_e
Joined: 24 Mar 13
Posts: 11
Credit: 25,297
RAC: 0
Message 57667 - Posted: 27 Mar 2013, 0:26:33 UTC - in response to Message 57607.  

-- additional data to previous post --

Data when GPU task changes.

The overload stopped when the GPU task finished.
System Monitor shows 5 of 8 cores at 100%.

But then, while I was writing this post, all 8 cores went back to 100%.
Tue 26 Mar 2013 07:50:18 PM EDT | Milkyway@Home | Computation for task de_nbody_100K_EMD_32013_2_1358941502_373121_1 finished

Hmmm. Not sure how BOINC and MW list the CPU task that handles the GPU loading/unloading.

I did check when I suspended the GPU task and the overload stopped. It *was* the GPU task - not the CPU task that I suspended on the BOINC Manager screen - when the overload previously stopped.
The overload lasted for the time that the GPU task ran - approx. 20 minutes.
The GPU is a Radeon HD 7750 with 2GB memory - slower - but less heat and watts.

Here is link to the 1st completed GPU task - no errors in stderr.
http://milkyway.cs.rpi.edu/milkyway/result.php?resultid=428297681

After the load went back to 100%.
I did a ps -ef to get a task list - but all of the parameters did not fit on the display.
boinc     2551  2473 60 19:06 ?        00:42:17 ../../projects/milkyway.cs.rpi.edu_milkyway/milkyway_nbody_1.08_x86_64-pc-linux-gnu__mt -f nbody_parameters.lua -h histogram.txt --seed 230516087
boinc     2552  2473 59 19:06 ?        00:41:49 ../../projects/milkyway.cs.rpi.edu_milkyway/milkyway_separation_1.01_x86_64-pc-linux-gnu -np 20 -p 0.401795116392895 11.0570420466829 20 120 9.23
boinc     2553  2473 60 19:06 ?        00:42:02 ../../projects/milkyway.cs.rpi.edu_milkyway/milkyway_nbody_1.08_x86_64-pc-linux-gnu__mt -f nbody_parameters.lua -h histogram.txt --seed 244208335
boinc     2554  2473 60 19:06 ?        00:41:55 ../../projects/milkyway.cs.rpi.edu_milkyway/milkyway_separation_1.01_x86_64-pc-linux-gnu -np 20 -p 0.917377772089099 1 20 218.173156674949 9.3681

root      2733     2  0 19:44 ?        00:00:00 [flush-8:0]

root      2797     2  0 19:52 ?        00:00:00 [kworker/4:2]
boinc     2830  2473 99 20:04 ?        01:04:40 ../../projects/milkyway.cs.rpi.edu_milkyway/milkyway_nbody_1.08_x86_64-pc-linux-gnu_mt__opencl_amd_ati -f nbody_parameters.lua -h histogram.txt -


Enjoy!
Jay
ID: 57667
jay_e
Joined: 24 Mar 13
Posts: 11
Credit: 25,297
RAC: 0
Message 57669 - Posted: 27 Mar 2013, 1:06:06 UTC
Last modified: 27 Mar 2013, 1:07:55 UTC

2nd 'continued' posting to CPU using 100% when 50% specified.

OK.
It looks like the overload happens on every other GPU WU.

I checked to see if the problem was in the system monitor.

I went back to the sysstat package and ran sar -P ALL
here is what it reported during the overload:
08:45:35 PM     CPU     %user     %nice   %system   %iowait    %steal     %idle
08:45:45 PM     all      1.64     92.98      0.82      0.00      0.00      4.56
08:45:45 PM       0      0.70     96.72      0.70      0.00      0.00      1.89
08:45:45 PM       1      0.00     94.31      0.50      0.00      0.00      5.19
08:45:45 PM       2      1.00     94.41      0.70      0.00      0.00      3.90
08:45:45 PM       3      0.70     94.31      0.70      0.00      0.00      4.29
08:45:45 PM       4      5.32     87.56      1.71      0.00      0.00      5.42
08:45:45 PM       5      3.09     89.82      1.20      0.00      0.00      5.89
08:45:45 PM       6      1.71     92.38      0.50      0.00      0.00      5.42
08:45:45 PM       7      0.70     94.49      0.40      0.00      0.00      4.40

this shows that all 8, indeed, are used.
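
The same conclusion can be reached mechanically. Here is a small sketch that counts busy cores from sar rows like the ones above. It assumes the 12-hour time format shown (so each per-CPU row has nine whitespace-separated fields) and an arbitrary "busy" threshold of less than 10% idle.

```python
SAR_ROWS = """\
08:45:35 PM     CPU     %user     %nice   %system   %iowait    %steal     %idle
08:45:45 PM     all      1.64     92.98      0.82      0.00      0.00      4.56
08:45:45 PM       0      0.70     96.72      0.70      0.00      0.00      1.89
08:45:45 PM       1      0.00     94.31      0.50      0.00      0.00      5.19
08:45:45 PM       2      1.00     94.41      0.70      0.00      0.00      3.90
08:45:45 PM       3      0.70     94.31      0.70      0.00      0.00      4.29
08:45:45 PM       4      5.32     87.56      1.71      0.00      0.00      5.42
08:45:45 PM       5      3.09     89.82      1.20      0.00      0.00      5.89
08:45:45 PM       6      1.71     92.38      0.50      0.00      0.00      5.42
08:45:45 PM       7      0.70     94.49      0.40      0.00      0.00      4.40
"""


def busy_cpus(sar_text, idle_threshold=10.0):
    """Return the CPU numbers whose %idle is below idle_threshold."""
    busy = []
    for line in sar_text.splitlines():
        fields = line.split()
        # Per-CPU rows: time, AM/PM, CPU number, then six percentages.
        if len(fields) == 9 and fields[2].isdigit():
            if float(fields[8]) < idle_threshold:
                busy.append(int(fields[2]))
    return busy


busy = busy_cpus(SAR_ROWS)
print(f"{len(busy)} busy cores: {busy}")
```

The header row and the "all" summary row are skipped because their third field is not a CPU number.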

The BOINC status only shows 4 CPU plus one CPU-GPU task.

A ps -ef only shows the following (but there are 3 more running at 100% somewhere):

$ ps -ef | grep boinc
boinc     2473     1  0 19:03 ?        00:01:01 /usr/bin/boinc --check_all_logins --redirectio --dir /var/lib/boinc-client
jay       2514     1  1 19:04 ?        00:01:59 /usr/bin/boincmgr
boinc     2551  2473 69 19:06 ?        01:15:17 ../../projects/milkyway.cs.rpi.edu_milkyway/milkyway_nbody_1.08_x86_64-pc-linux-gnu__mt -f nbody_parameters.lua -h histogram.txt --seed 230516087 -np 6 -p 2.2613531307244 2.30756208857862 0.280775306084978 0.307090154017686 13.7441723154459 0.146058620134541
boinc     2552  2473 68 19:06 ?        01:15:02 ../../projects/milkyway.cs.rpi.edu_milkyway/milkyway_separation_1.01_x86_64-pc-linux-gnu -np 20 -p 0.401795116392895 11.0570420466829 20 120 9.23227259465482 6.02408145224687 -4.65891254542505 13.49007896143 20 122.412517562509 2.3 0.569129901562 -6.28318530717959 2.69928641156317 20 244 2.4 4.0146428615553 6.28318530717959 0.984356404794499
boinc     2554  2473 68 19:06 ?        01:14:56 ../../projects/milkyway.cs.rpi.edu_milkyway/milkyway_separation_1.01_x86_64-pc-linux-gnu -np 20 -p 0.917377772089099 1 20 218.173156674949 9.36815519919617 6.28318530717959 5.3483147091067 16.745039480715 4.81361567974091 151.560387347829 2.3 6.28318530717959 -6.28318530717959 3.30858427949722 20 244 2.4 5.39780165528236 -4.43232117875659 4.42432041794522
boinc     3038  2473 66 20:37 ?        00:12:06 ../../projects/milkyway.cs.rpi.edu_milkyway/milkyway_nbody_1.08_x86_64-pc-linux-gnu__mt -f nbody_parameters.lua -h histogram.txt --seed 25278614 -np 6 -p 1.5 1.5 0.5 0.5 15 0.128067006109071
boinc     3043  2473 99 20:44 ?        01:01:30 ../../projects/milkyway.cs.rpi.edu_milkyway/milkyway_nbody_1.08_x86_64-pc-linux-gnu_mt__opencl_amd_ati -f nbody_parameters.lua -h histogram.txt --seed 130617792 -np 6 -p 2.5 1.66034131823107 0.5 0.330835734494028 15 0.131445339787751 --device 0

I tried running ps -ef as root - same thing.

Ah-hah:
"htop" shows 9 tasks with different PIDs running opencl amd ati tasks.

Hmmmm
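
One way to check whether those extra htop rows are threads of a single process rather than separate processes: ps -ef prints one row per process, while the kernel reports the thread count in the "Threads:" field of /proc/<pid>/status. A minimal, Linux-only sketch follows; the status excerpt is shortened and made up, with the 9 mirroring the htop observation.

```python
import re


def thread_count(status_text):
    """Return the 'Threads:' value from /proc/<pid>/status text, or None."""
    match = re.search(r"^Threads:\s+(\d+)$", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def thread_count_for_pid(pid):
    """Linux only: read /proc/<pid>/status for a running process."""
    with open(f"/proc/{pid}/status") as handle:
        return thread_count(handle.read())


# Shortened, made-up excerpt of a status file.
sample_status = "Name:\tmilkyway_nbody\nPid:\t3043\nThreads:\t9\n"
print(thread_count(sample_status))
```

Calling thread_count_for_pid with the PID of the opencl_amd_ati process from the ps listing would show how many threads that one process is running.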

Anyone else see this?
Should I change to Beta fglrx drivers?

Thanks,
Jay
ID: 57669
Richard Haselgrove
Joined: 4 Sep 12
Posts: 219
Credit: 456,474
RAC: 0
Message 57672 - Posted: 27 Mar 2013, 9:30:01 UTC - in response to Message 57669.  

Anyone else see this?
Should I change to Beta fglrx drivers?

No, don't change anything at your end.

The N-Body application has been designed and programmed to use every available CPU in your system. It is not a GPU application, and should be going nowhere near your ATI card.

Unfortunately, it has been deployed (repeatedly) on the Milkyway server as if it were a GPU program, and the server sends out resource settings (0.05 CPUs + 1 ATI GPU) which tell your computer to treat it as a GPU application.

We have pointed out this mistake many, many times since the N-Body project was restarted 6 months ago, but unfortunately nobody at the project seems to understand, or even to be listening.
ID: 57672
Richard Haselgrove
Joined: 4 Sep 12
Posts: 219
Credit: 456,474
RAC: 0
Message 57673 - Posted: 27 Mar 2013, 10:12:49 UTC

@ admins,

"A person who won't read has no advantage over one who can't read."

Mark Twain
ID: 57673

©2024 Astroinformatics Group