Welcome to MilkyWay@home

High thread count applications

rz5rqt

Joined: 5 Sep 09
Posts: 7
Credit: 551,884,976
RAC: 128,092
Message 76906 - Posted: 10 Feb 2024, 16:36:40 UTC

In another forum I read an article saying that applications with high thread counts are not as efficient. For example, a 16-thread application will not finish twice as fast as an 8-thread application. This got me thinking about how I might improve my throughput by running multiple applications at a lower thread count, and whether I can increase CPU utilization on some systems. One of my computers is a Xeon E5 2678 with 24 "CPUs". Milkyway uses 16 by default, but with a config file I can run two applications at a time with 12 CPUs apiece. Seems like a "no brainer". But how much did I gain? To "prove" that, I would need 2 test files that had equal run times. First run them sequentially, then run them concurrently. Anyone here ever see any data like this? Pointers to other articles?
mikey

Joined: 8 May 09
Posts: 3322
Credit: 520,669,822
RAC: 33,322
Message 76912 - Posted: 11 Feb 2024, 10:45:26 UTC - in response to Message 76906.  

Why not run some regular tasks one way and then some the other way, say for about 24 hours each, and then take the average times of each and see which is better? The problem with choosing just one task is that it could be a one-off and not representative of real-life tasks. If you do want to run just one task, you can run it outside of BOINC, but I don't know how to do that.
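
A minimal sketch of that averaging approach in Python, assuming you have copied the run times (in seconds) of completed tasks for each configuration from your task list. The numbers are made up purely for illustration, and `tasks_per_day` is a hypothetical helper name, not part of BOINC.

```python
from statistics import mean

def tasks_per_day(run_times_s, concurrent_tasks):
    """Estimate daily throughput from observed per-task run times.

    run_times_s: wall-clock run times in seconds for tasks run in one configuration.
    concurrent_tasks: how many tasks ran side by side in that configuration.
    """
    return concurrent_tasks * 86400 / mean(run_times_s)

# Illustrative, made-up run times; replace with your own observations.
one_at_16_threads = [600.0, 640.0, 580.0]
two_at_12_threads = [900.0, 950.0, 850.0]

print("1 x 16 threads:", round(tasks_per_day(one_at_16_threads, 1), 1), "tasks/day")
print("2 x 12 threads:", round(tasks_per_day(two_at_12_threads, 2), 1), "tasks/day")
```

Because workunits vary in size, the more tasks there are in each list, the more the averages can be trusted.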
xii5ku

Joined: 1 Jan 17
Posts: 34
Credit: 100,687,188
RAC: 285,657
Message 76913 - Posted: 11 Feb 2024, 14:21:08 UTC

My personal recipe for answering this question, whenever a project with variable workunit sizes is involved, is to test-run tasks outside of BOINC from a little script, with all tasks generated from one and the same workunit. Using a fixed workunit for the tests makes the process fully repeatable and very precise. The script launches as many tasks as I want, with as many software threads as I want, and measures the time each test run takes. (Or it kills the test run after a set length of time and measures the progress the tasks made within that time.)

I haven't looked into applying this recipe to the NBody application yet. If NBody tasks receive only deterministic input parameters from the workunit, then it may be doable. But if there are also randomized initial values per task, this recipe won't be applicable.
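
A rough Python sketch of such a timing script, not the poster's actual script. It assumes the application honours the OMP_NUM_THREADS environment variable; `timed_run` is a hypothetical helper, and the command is a placeholder (a short sleep) standing in for the real executable and its fixed workunit arguments.

```python
import os
import subprocess
import sys
import time

def timed_run(cmd, n_tasks, n_threads):
    """Launch n_tasks copies of cmd at once; return wall-clock seconds until all exit.

    OMP_NUM_THREADS is set for each copy. This only has an effect if the
    application is OpenMP-based and honours that variable (an assumption).
    """
    env = dict(os.environ, OMP_NUM_THREADS=str(n_threads))
    start = time.perf_counter()
    procs = [subprocess.Popen(cmd, env=env) for _ in range(n_tasks)]
    for p in procs:
        p.wait()
    return time.perf_counter() - start

if __name__ == "__main__":
    # Placeholder command so the sketch runs anywhere; replace it with the
    # real application and its fixed workunit arguments.
    cmd = [sys.executable, "-c", "import time; time.sleep(0.2)"]
    for n_tasks, n_threads in [(1, 16), (2, 12)]:
        elapsed = timed_run(cmd, n_tasks, n_threads)
        print(f"{n_tasks} task(s) x {n_threads} threads: {elapsed:.2f} s")
```

In a real test, each copy would probably need its own working directory (Popen's `cwd` argument) so that the output files of concurrent tasks do not collide.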
rz5rqt

Joined: 5 Sep 09
Posts: 7
Credit: 551,884,976
RAC: 128,092
Message 76914 - Posted: 11 Feb 2024, 15:33:59 UTC - in response to Message 76912.  

True. But using real data would mean slightly different results if tested multiple times over multiple days, because the data packets we crunch are different lengths: an unknown variable. When we get through this big backlog of _01 tasks and start getting credit again, I could use recent credit numbers to see an improvement. That would be very close, but because of my background I would kind of like to know to the .01%. Now, I don't expect to go down some rabbit hole for these numbers, but I do like to look at things like this. Used to do it for a living. Just thought that if anyone else out there had done some "all variables controlled" testing, I would take advantage of their work. Thanks for responding. I see we both started back in 2009.
Keith Myers

Joined: 24 Jan 11
Posts: 708
Credit: 544,186,038
RAC: 124,689
Message 76915 - Posted: 11 Feb 2024, 22:06:42 UTC - in response to Message 76906.  

Grab a task file from a local computer and copy it to a temp folder along with the MW MT app. Stop BOINC. Open a command window, cd to the temp folder, and run the MT executable with the task file as its input. The application will run the task file with the default 16 threads.

Save a copy of the stderr.txt output in the folder for later comparison. Then delete the output result file, the boinc_finish file, the lockfile if any, and the stderr.txt file. Don't delete the task file.

Then run the application again, but this time with the num_threads 12 parameter after the executable name and before the task file name.

If the application won't accept the num_threads parameter directly, then just export the value to your environment.
export OMP_NUM_THREADS=12

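Then compare the stderr.txt files: subtract the start time from the finish time to find the elapsed time for each run. Two 12-thread tasks running side by side finish two tasks in one 12-thread elapsed time, while the 16-thread setup needs two elapsed times back to back. So if the elapsed time of the num_threads 12 run is less than double that of the 16-thread run, running two 12-thread tasks at once will be best for production, provided running two at once doesn't slow each one down much.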
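
A small Python sketch of that comparison, judged by the time needed to finish two tasks each way. The timestamp layout is an assumption (adjust `FMT` to whatever your stderr.txt actually shows), the timestamps are made up, and both function names are hypothetical.

```python
from datetime import datetime

# Assumed timestamp layout; adjust to whatever your stderr.txt actually shows.
FMT = "%Y-%m-%d %H:%M:%S"

def elapsed_seconds(start, finish, fmt=FMT):
    """Elapsed wall-clock seconds between two timestamp strings."""
    return (datetime.strptime(finish, fmt) - datetime.strptime(start, fmt)).total_seconds()

def two_by_12_wins(t16_s, t12_s):
    """True if two concurrent 12-thread tasks beat 16-thread tasks run one after another.

    Two tasks take about t12_s side by side versus 2 * t16_s back to back.
    Assumes running two copies at once does not slow each copy down.
    """
    return t12_s < 2 * t16_s

# Made-up timestamps for illustration only.
t16 = elapsed_seconds("2024-02-11 10:00:00", "2024-02-11 10:10:00")
t12 = elapsed_seconds("2024-02-11 11:00:00", "2024-02-11 11:13:00")
print(t16, t12, two_by_12_wins(t16, t12))
```

A single 12-thread run measured on an otherwise idle machine is an optimistic estimate, so timing two copies running at the same time would be the safer check.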
dchhbfx

Joined: 11 Feb 24
Posts: 1
Credit: 33,281
RAC: 30
Message 76959 - Posted: 6 Mar 2024, 12:43:09 UTC - in response to Message 76915.  

hello
mikey

Joined: 8 May 09
Posts: 3322
Credit: 520,669,822
RAC: 33,322
Message 76961 - Posted: 7 Mar 2024, 11:47:42 UTC - in response to Message 76959.  

Welcome!! Nice bunch of computers you have there and they seem to be doing great!!

©2024 Astroinformatics Group