Welcome to MilkyWay@home

Use more threads than available?



Message boards : Number crunching : Use more threads than available?

Mr P Hucker
Joined: 5 Jul 11
Posts: 750
Credit: 361,752,181
RAC: 388,607
Message 73581 - Posted: 18 May 2022, 4:35:24 UTC

I'm running 2 Milkyway Nbody tasks, 12 threads each, on a 24-thread machine. But a lot of the time they do preparatory work on 1 core. Can I get BOINC to run 3 or 4 of them at once, i.e. four 12-thread tasks on a 24-thread machine? This is my current app_config:

<app_config>
   <app_version>
       <app_name>milkyway_nbody</app_name>
       <plan_class>mt</plan_class>
       <avg_ncpus>12</avg_ncpus>
       <cmdline>--nthreads 12</cmdline>
   </app_version>
</app_config>
Mr P Hucker
Joined: 5 Jul 11
Posts: 750
Credit: 361,752,181
RAC: 388,607
Message 73582 - Posted: 18 May 2022, 5:00:13 UTC - in response to Message 73581.  

I think I've worked this out. Can somebody please confirm this is what those variables mean? (The page at https://boinc.berkeley.edu/wiki/Client_configuration isn't clear.)

I've changed it to

<app_config>
   <app_version>
       <app_name>milkyway_nbody</app_name>
       <plan_class>mt</plan_class>
       <avg_ncpus>4</avg_ncpus>
       <cmdline>--nthreads 12</cmdline>
   </app_version>
</app_config>
Which I think means they're allowed to use up to 12 threads each, but on average they will use 4, so 6 will run concurrently in the scheduler.
AndreyOR

Joined: 13 Oct 21
Posts: 43
Credit: 115,300,840
RAC: 337,066
Message 73583 - Posted: 18 May 2022, 7:19:13 UTC

I must say that I don't yet know the difference between the avg_ncpus and --nthreads entries. To me, --nthreads seems redundant, if it even does anything (it doesn't seem to, from a quick test I tried).

I think multithread apps work as follows. If configured to use 4 cores, the task will take up 4 cores, use 1 to set up (the other 3 are idle), then use all 4 to compute, and finally use 1 to wrap up (3 are idle). Once finished, the 4 cores are released. The setup and wrap-up seem brief, less than a minute. If you want the most efficient use of cores, you'd have to run each task single-core. I found that for the highest throughput (most tasks per unit of time), a 3-5 core setup should be used. Even if you find a way to run multiple tasks per core, I think it'll slow everything down significantly.
mikey
Joined: 8 May 09
Posts: 3038
Credit: 513,179,147
RAC: 253,409
Message 73586 - Posted: 18 May 2022, 11:00:25 UTC - in response to Message 73583.  

AndreyOR wrote: "I must say that I don't yet know the difference between the avg_ncpus and --nthreads entries. To me, --nthreads seems redundant, if it even does anything (it doesn't seem to, from a quick test I tried)."

Try this page (yes, it's old, but so is BOINC): https://boinc.berkeley.edu/wiki/Client_configuration
HRFMguy
Joined: 12 Nov 21
Posts: 221
Credit: 280,681,436
RAC: 1,498,030
Message 73594 - Posted: 18 May 2022, 18:16:05 UTC - in response to Message 73582.  

Here is what I'm using:
<app_config>
   <app_version>
       <app_name>milkyway</app_name>
       <plan_class>opencl_ati_101</plan_class>
       <avg_ncpus>0.866</avg_ncpus>
       <ngpus>0.333</ngpus>
   </app_version>
   <app_version>
       <app_name>milkyway_nbody</app_name>
       <max_concurrent>4</max_concurrent>
       <plan_class>mt</plan_class>
       <avg_ncpus>4</avg_ncpus>
       <cmdline>--nthreads 4</cmdline>
   </app_version>
   <!--Your comment-->
</app_config>

It yields 6 simultaneous jobs running, 4 threads each.
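
A side note on the config above: <max_concurrent> sits inside <app_version>, whereas the Client_configuration page linked in this thread (as I read it) lists it as a child of a separate <app> element. If the client ignores it where it is, that would be consistent with 6 tasks running rather than 4 (24 threads / 4 per task = 6). Below is a minimal Python sketch of a checker for this kind of misplacement. The function name is made up for illustration, and the set of allowed tags is my recollection of the wiki page, so treat it as an assumption rather than an authoritative schema.

```python
import xml.etree.ElementTree as ET

# Children of <app_version> as recalled from the BOINC wiki (an assumption,
# not an official schema); anything else is reported for manual review.
APP_VERSION_TAGS = {"app_name", "plan_class", "avg_ncpus", "ngpus", "cmdline"}


def unexpected_app_version_tags(xml_text):
    """Return (app_name, tag) pairs for unrecognised <app_version> children."""
    root = ET.fromstring(xml_text)
    found = []
    for version in root.findall("app_version"):
        app_name = version.findtext("app_name", default="?")
        for child in version:
            if child.tag not in APP_VERSION_TAGS:
                found.append((app_name, child.tag))
    return found


config = """
<app_config>
   <app_version>
       <app_name>milkyway_nbody</app_name>
       <max_concurrent>4</max_concurrent>
       <plan_class>mt</plan_class>
       <avg_ncpus>4</avg_ncpus>
       <cmdline>--nthreads 4</cmdline>
   </app_version>
</app_config>
"""

print(unexpected_app_version_tags(config))
# [('milkyway_nbody', 'max_concurrent')]
```

The checker only flags tags for a second look; whether the client really ignores them is best confirmed in the BOINC event log after re-reading the config files.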
Keith Myers
Joined: 24 Jan 11
Posts: 657
Credit: 508,257,305
RAC: 101,414
Message 73595 - Posted: 19 May 2022, 1:21:23 UTC

No, that is not what that configuration does. The nthreads parameter determines the total number of threads each MT task will use.

Each task will use 12 threads.

The avg_ncpus value tells the scheduler that a task will use only 4 threads, so the scheduler will overcommit the CPU, thinking each task uses less than it really does.

How many CPUs you have allowed BOINC to use, via the "use at most" setting in either the global or the local preferences, determines how many of the 12-thread MT tasks will fit in the allowed CPUs.
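
The mismatch between the two settings can be sketched with a little arithmetic. This is a simplified model, not BOINC's actual scheduling code: it assumes the client simply starts as many tasks as the avg_ncpus budget allows within the permitted CPUs, and that each task then launches the number of threads given by --nthreads. The function name is hypothetical.

```python
def mt_task_load(usable_cpus, avg_ncpus, nthreads):
    """Toy model (an assumption, not BOINC's real scheduler logic).

    Returns (tasks started, threads launched, threads per usable CPU).
    """
    concurrent_tasks = int(usable_cpus // avg_ncpus)
    threads_started = concurrent_tasks * nthreads
    oversubscription = threads_started / usable_cpus
    return concurrent_tasks, threads_started, oversubscription


# Original config: avg_ncpus 12, --nthreads 12, 24 usable threads.
print(mt_task_load(24, 12, 12))  # (2, 24, 1.0)

# Changed config: avg_ncpus 4, --nthreads 12, 24 usable threads.
print(mt_task_load(24, 4, 12))   # (6, 72, 3.0)
```

Under these assumptions the changed config starts 6 tasks that together launch 72 threads on a 24-thread machine, so it is up to the operating system to time-slice them.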
Mr P Hucker
Joined: 5 Jul 11
Posts: 750
Credit: 361,752,181
RAC: 388,607
Message 73596 - Posted: 19 May 2022, 3:45:56 UTC - in response to Message 73595.  

It seems my new config is correct then: I've set Nbody to use 12 threads per task with --nthreads, but told the scheduler each task uses an average of 4 threads. So it runs 6 tasks at once, each using up to 12 threads, and the whole 24-thread CPU is always in use. There's quite a lot of single-thread time in the Nbody tasks, so I avoid wastage this way.
AndreyOR

Joined: 13 Oct 21
Posts: 43
Credit: 115,300,840
RAC: 337,066
Message 73600 - Posted: 19 May 2022, 10:33:08 UTC

I recently learned a bit more about app_config and I'm skeptical. I believe that whenever avg_ncpus and the --nthreads value in cmdline differ, you're likely to create negative consequences. Looking at your history, 12-thread tasks used to take about 10 minutes; now they take 20+. It seems like you're overloading your system and it has slowed down significantly.

The benefit of running multicore is faster task completion; the drawback is that most cores sit idle during setup and wrap-up. If you're concerned about the most efficient use of cores, then run tasks single-core (at the cost of the slowest task completion). If you want the most tasks per unit of time, run them multicore (for me, 4 cores produces the most), at the cost of some idle core time during setup and wrap-up. Another idea would be to run 2-core. It minimizes the number of cores that would go idle and gives you the biggest improvement in task completion time (no other 1-core step up will give as large a gain).
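
The idle-core trade-off can be illustrated with a toy model. It assumes a task has a single-threaded phase (setup plus wrap-up) and a compute phase that scales perfectly across threads, which real Nbody tasks may not do. The function name and the numbers (1 minute serial, 60 core-minutes of compute) are hypothetical, chosen only to show the shape of the effect.

```python
def core_utilization(nthreads, serial_minutes, parallel_core_minutes):
    """Fraction of reserved core-time spent doing work, in a toy model.

    Assumes perfect scaling in the parallel phase and that all reserved
    cores except one sit idle during the serial phase.
    """
    wall_minutes = serial_minutes + parallel_core_minutes / nthreads
    busy_core_minutes = serial_minutes + parallel_core_minutes
    reserved_core_minutes = nthreads * wall_minutes
    return busy_core_minutes / reserved_core_minutes


# Hypothetical task: 1 minute single-threaded, 60 core-minutes of compute.
for n in (1, 2, 4, 12):
    print(n, round(core_utilization(n, 1.0, 60.0), 3))
```

In this model utilization is 100% for a single-core task and falls as more cores are reserved per task. It captures only the idle-core effect, so it says nothing about which thread count gives the highest measured throughput on a particular machine.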
Mr P Hucker
Joined: 5 Jul 11
Posts: 750
Credit: 361,752,181
RAC: 388,607
Message 73601 - Posted: 19 May 2022, 10:52:38 UTC - in response to Message 73600.  
Last modified: 19 May 2022, 10:54:59 UTC

AndreyOR wrote: "I recently learned a bit more about app_config and I'm skeptical. I believe that whenever you have different values for avg_ncpus and --nthreads in cmdline you're likely going to create negative consequences. Looking at your history, 12 thread tasks used to take about 10 minutes, now they're 20+. It seems like you're overloading your system and it's slowed down significantly."

I was running 2; now I'm running 6. So anything under 30 minutes per task is a speed-up.

Looking at my computer "xeon1", I was doing 2 every 10 minutes; now I'm doing 6 every 20 minutes. That's 50% more throughput.
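
For anyone who wants to check the arithmetic, here is a short Python sketch. It takes the figures quoted in this thread at face value (about 10 minutes per task with 2 running, about 20 minutes with 6 running); the function name is made up for illustration.

```python
from fractions import Fraction


def tasks_per_hour(concurrent_tasks, minutes_per_task):
    """Steady-state throughput if every task takes the same time."""
    return Fraction(concurrent_tasks * 60, minutes_per_task)


before = tasks_per_hour(2, 10)  # 2 tasks at a time, ~10 minutes each
after = tasks_per_hour(6, 20)   # 6 tasks at a time, ~20 minutes each

gain_percent = (after / before - 1) * 100
break_even_minutes = Fraction(6 * 60) / before

print(before, after, gain_percent, break_even_minutes)  # 12 18 50 30
```

So 6 concurrent tasks beat the old setup as long as each finishes in under 30 minutes, and at 20 minutes each the gain is 50%. If the real run times are longer than 20 minutes, the gain shrinks accordingly.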
mikey
Joined: 8 May 09
Posts: 3038
Credit: 513,179,147
RAC: 253,409
Message 73602 - Posted: 19 May 2022, 10:53:02 UTC - in response to Message 73600.  
Last modified: 19 May 2022, 10:54:26 UTC

Another way is to suspend some of the running tasks for a couple of minutes, then let them run again one at a time, so they aren't all hitting the slowdown points at the same time. Yes, you will also have to suspend all other MW tasks, or BOINC will start those while you are trying to get things to play nicely together.
Mr P Hucker
Joined: 5 Jul 11
Posts: 750
Credit: 361,752,181
RAC: 388,607
Message 73603 - Posted: 19 May 2022, 10:55:58 UTC - in response to Message 73602.  

I find they get out of sync by themselves anyway, especially the Nbody tasks, which aren't all the same size.

©2023 Astroinformatics Group