Welcome to MilkyWay@home

N-Body + Mac OS X + i7 = no good ?

Message boards : Number crunching : N-Body + Mac OS X + i7 = no good ?

[AF>Le_Pommier] Jerome_C2005
Joined: 1 Apr 08
Posts: 30
Credit: 84,549,863
RAC: 0
Message 49345 - Posted: 15 Jun 2011, 20:48:19 UTC

Hi,

I'm running BOINC 6.10.58 on an i7 iMac with Mac OS X 10.6.7, and yesterday I realized that the multithreaded N-Body 0.40 was causing a real problem. The WU (this one) would run normally, keeping 8 cores busy, then suddenly stop, after far less time than my 240 min "switch between applications" setting. Now the weirdest part: BOINC was no longer able to allocate the 8 cores to the other projects (I have many projects running at the same time), and only a variable number of WUs (from 3 to 5) would run... Then after some time, the same story: N-Body running on 8 cores, stopping, and a different number of WUs from other projects restarting...

I tried stopping and restarting the client, manually killing the processes, and rebooting the Mac: same thing.

Then I suspended MilkyWay, and everything has been back to normal since.

I see there is no setting to select applications within MilkyWay, so I cannot block N-Body. I'll keep MilkyWay suspended for the moment...

Has anybody else experienced this on Mac OS X?

Thanks.
DJStarfox

Joined: 29 Sep 10
Posts: 54
Credit: 1,343,989
RAC: 48
Message 49406 - Posted: 18 Jun 2011, 1:58:48 UTC - in response to Message 49345.  
Last modified: 18 Jun 2011, 1:59:17 UTC

They just released N-Body 0.60 for Mac OS X.
Matt Arsenault
Volunteer moderator
Project developer
Project tester
Project scientist

Joined: 8 May 10
Posts: 576
Credit: 15,979,383
RAC: 0
Message 49412 - Posted: 18 Jun 2011, 12:16:25 UTC - in response to Message 49406.  

DJStarfox wrote: "They just released N-Body 0.60 for Mac OS X."
The only new thing there is the 32-bit OS X build. The 64-bit OS X N-Body has always been there.
Matt Arsenault
Volunteer moderator
Project developer
Project tester
Project scientist

Joined: 8 May 10
Posts: 576
Credit: 15,979,383
RAC: 0
Message 49413 - Posted: 18 Jun 2011, 12:21:10 UTC - in response to Message 49345.  

I'm not really sure what problem you're describing. It sounds more or less like the expected behaviour for multithreaded applications: BOINC should schedule some number of threads at different times alongside other applications. I might be missing something.
[AF>Le_Pommier] Jerome_C2005
Joined: 1 Apr 08
Posts: 30
Credit: 84,549,863
RAC: 0
Message 50156 - Posted: 18 Jul 2011, 6:19:44 UTC
Last modified: 18 Jul 2011, 7:16:34 UTC

Of course it is not normal that BOINC would only run 3 or 5 cores when it normally uses 8, just because an N-Body task has just run!

I'm using 6.12.33 now. I had stopped MilkyWay until now and decided to give it another try. When N-Body starts, I receive errors from a few other projects (not all of the 8 that were running before):

Lun 18 jul 04:34:29 2011 | ibercivis | Task Mr.Wilson_10_01_50_52_824277135_2133303569_2: no shared memory segment
Lun 18 jul 04:34:29 2011 | ibercivis | Task Mr.Wilson_10_01_50_52_824277135_2133303569_2 exited with zero status but no 'finished' file
Lun 18 jul 04:34:29 2011 | ibercivis | If this happens repeatedly you may need to reset the project.

I get the same for CPDN, but not for the other 6 running projects.

Then it finishes after only 10 min, and BOINC is able to restart 8 projects right after that, including the ibercivis task with that "error", which apparently did not affect it (it finished with valid status).

For CPDN I don't know, because BOINC has not restarted it yet.

So I'd say things have improved, though it's still a bit weird.


Another thing: I can see it ran 2 tasks during the night:

Lun 18 jul 00:18:04 2011 | Milkyway@home | Sending scheduler request: To fetch work.
Lun 18 jul 00:18:04 2011 | Milkyway@home | Requesting new tasks for CPU
Lun 18 jul 00:18:06 2011 | Milkyway@home | Scheduler request completed: got 1 new tasks
Lun 18 jul 00:18:08 2011 | Milkyway@home | Started download of milkyway_nbody_0.60_x86_64-apple-darwin__mt
Lun 18 jul 00:18:11 2011 | Milkyway@home | Finished download of milkyway_nbody_0.60_x86_64-apple-darwin__mt

Lun 18 jul 01:13:14 2011 | Milkyway@home | Starting task ps_nbody_test3_480599_1 using milkyway_nbody version 60
Lun 18 jul 01:22:42 2011 | Milkyway@home | Computation for task ps_nbody_test3_480599_1 finished
Lun 18 jul 01:22:43 2011 | Milkyway@home | Sending scheduler request: To fetch work.
Lun 18 jul 01:22:43 2011 | Milkyway@home | Reporting 1 completed tasks, requesting new tasks for CPU
Lun 18 jul 01:22:44 2011 | Milkyway@home | Scheduler request completed: got 1 new tasks

Lun 18 jul 04:34:17 2011 | Milkyway@home | Starting task ps_nbody_test3_504091_0 using milkyway_nbody version 60
Lun 18 jul 04:47:11 2011 | Milkyway@home | Computation for task ps_nbody_test3_504091_0 finished
Lun 18 jul 04:47:11 2011 | Milkyway@home | Sending scheduler request: To fetch work.
Lun 18 jul 04:47:11 2011 | Milkyway@home | Reporting 1 completed tasks, requesting new tasks for CPU
Lun 18 jul 04:47:13 2011 | Milkyway@home | Scheduler request completed: got 1 new tasks

But as you can see, the usual download sequence is missing after "got 1 new tasks" both the first and second time a task finishes. Also, on my account page on the website I can only see one pending WU, with no trace of the other two...


Edit: the behavior really is very strange. Now I can see that another N-Body task started:

Lun 18 jul 08:59:01 2011 | Milkyway@home | Starting task ps_nbody_test3_498129_1 using milkyway_nbody version 60

and 15 seconds later all the other WUs started again:

Lun 18 jul 08:59:16 2011 | World Community Grid | Restarting task CMD2_2051-2IAE_B.clustersOccur-3D1M_A.clustersOccur_6_1 using hcmd2 version 640
Lun 18 jul 08:59:16 2011 | Einstein@Home | Restarting task h1_0303.55_S6GC1__1179_S6BucketA_0 using einstein_S6Bucket version 101
Lun 18 jul 08:59:16 2011 | ibercivis | Restarting task Mr.Wilson_18_04_44_25_372159899_2071232659_1 using wilson version 6
Lun 18 jul 08:59:16 2011 | Poem@Home | Restarting task poempp_gvpj_1310904715_747262093_0 using poempp version 6
Lun 18 jul 08:59:16 2011 | Test4Theory@Home | Restarting task uc_1310460186_4776_0 using cernvm version 601
Lun 18 jul 08:59:16 2011 | malariacontrol.net | Restarting task wu_1167_35_10675_0_1310959567_0 using openMalariaB version 657
Lun 18 jul 08:59:16 2011 | Leiden Classical | Restarting task wu_898976128_1309873376_21394_0 using classical version 556
Lun 18 jul 08:59:16 2011 | NFS@Home | Starting task S2m1061a_508041_0 using lasievef version 108

So N-Body got 15 seconds of run time and then went back to sleep. I really don't think this is normal behavior...

Also, and this may be interesting: the task remained in memory while using no CPU, even though my BOINC is set up so that WUs are not kept in memory when suspended. It's the only one behaving like that among almost all my projects (apart from Enigma, where this doesn't work well either).
[AF>Le_Pommier] Jerome_C2005
Joined: 1 Apr 08
Posts: 30
Credit: 84,549,863
RAC: 0
Message 50190 - Posted: 18 Jul 2011, 17:47:24 UTC

Definitely something wrong. Here are all the MilkyWay entries for today (since I left home this morning) in the BOINC message list:

Lun 18 jul 12:25:56 2011 | Milkyway@home | Resuming task ps_nbody_test3_498129_1 using milkyway_nbody version 60

Lun 18 jul 14:28:28 2011 | Milkyway@home | Restarting task ps_nbody_test3_498129_1 using milkyway_nbody version 60

Lun 18 jul 15:36:37 2011 | Milkyway@home | Restarting task ps_nbody_test3_498129_1 using milkyway_nbody version 60

Lun 18 jul 19:13:14 2011 | Milkyway@home | Restarting task ps_nbody_test3_498129_1 using milkyway_nbody version 60

And each time, 10 seconds later, all the other projects restart (successfully, unlike the initial problem I had back in June). This WU has accumulated... 30 seconds of computing so far, with more than 24 hours remaining. Good luck!!!
[AF>Le_Pommier] Jerome_C2005
Joined: 1 Apr 08
Posts: 30
Credit: 84,549,863
RAC: 0
Message 50255 - Posted: 19 Jul 2011, 21:29:18 UTC

...then at some point in the afternoon, it decided to run for 10 min without stopping (wow) and finished. It then requested a new WU, also N-Body, which ran for a full 10 min and also finished... Then it requested another WU and got a MW@home 0.82 "classic non-MT WU", which behaves like the other projects' tasks, using a single core alongside them...

So I don't know what to make of those N-Body units, which tend to behave very strangely, on Mac OS X at least.

©2024 Astroinformatics Group