AMD Radeon R9 Fury X - app_info.xml and apps

Sutaru Tsureku Joined: 30 Apr 09 Posts: 101 Credit: 29,871,946 RAC: 334	Message 64939 - Posted: 25 Jul 2016, 20:50:28 UTC

Because of my Message 64935... Among other things I have four AMD Radeon R9 Fury X VGA cards, and it is still not possible to get work if the project runs 'stock'. I need to use an app_info.xml file. Are the following entries and apps correct?

[Windows 64-bit; the N-Body Sim. app is the non-multithreaded (single-thread) CPU app]

<app_info>
    <app>
        <name>milkyway_nbody</name>
        <user_friendly_name>Milkyway N-Body Sim.</user_friendly_name>
    </app>
    <file_info>
        <name>milkyway_nbody_1.62_windows_x86_64.exe</name>
        <executable/>
    </file_info>
    <app_version>
        <app_name>milkyway_nbody</app_name>
        <version_num>162</version_num>
        <platform>windows_x86_64</platform>
        <file_ref>
            <file_name>milkyway_nbody_1.62_windows_x86_64.exe</file_name>
            <main_program/>
        </file_ref>
    </app_version>
    <app>
        <name>milkyway</name>
        <user_friendly_name>Milkyway</user_friendly_name>
    </app>
    <file_info>
        <name>milkyway_1.36_windows_x86_64.exe</name>
        <executable/>
    </file_info>
    <app_version>
        <app_name>milkyway</app_name>
        <version_num>136</version_num>
        <platform>windows_x86_64</platform>
        <file_ref>
            <file_name>milkyway_1.36_windows_x86_64.exe</file_name>
            <main_program/>
        </file_ref>
    </app_version>
    <app>
        <name>milkyway</name>
    </app>
    <file_info>
        <name>milkyway_1.36_windows_x86_64__opencl_ati_101.exe</name>
        <executable/>
    </file_info>
    <app_version>
        <app_name>milkyway</app_name>
        <version_num>136</version_num>
        <platform>windows_x86_64</platform>
        <avg_ncpus>1</avg_ncpus>
        <max_ncpus>1</max_ncpus>
        <plan_class>opencl_ati_101</plan_class>
        <cmdline></cmdline>
        <coproc>
            <type>ATI</type>
            <count>1</count>
        </coproc>
        <file_ref>
            <file_name>milkyway_1.36_windows_x86_64__opencl_ati_101.exe</file_name>
            <main_program/>
        </file_ref>
    </app_version>
    <app>
        <name>milkyway_separation__modified_fit</name>
        <user_friendly_name>Milkyway Sep. (Mod. Fit)</user_friendly_name>
    </app>
    <file_info>
        <name>milkyway_separation__modified_fit_1.36_windows_x86_64.exe</name>
        <executable/>
    </file_info>
    <app_version>
        <app_name>milkyway_separation__modified_fit</app_name>
        <version_num>136</version_num>
        <platform>windows_x86_64</platform>
        <file_ref>
            <file_name>milkyway_separation__modified_fit_1.36_windows_x86_64.exe</file_name>
            <main_program/>
        </file_ref>
    </app_version>
    <app>
        <name>milkyway_separation__modified_fit</name>
    </app>
    <file_info>
        <name>milkyway_separation__modified_fit_1.36_windows_x86_64__opencl_ati_101.exe</name>
        <executable/>
    </file_info>
    <app_version>
        <app_name>milkyway_separation__modified_fit</app_name>
        <version_num>136</version_num>
        <platform>windows_x86_64</platform>
        <avg_ncpus>1</avg_ncpus>
        <max_ncpus>1</max_ncpus>
        <plan_class>opencl_ati_101</plan_class>
        <cmdline></cmdline>
        <coproc>
            <type>ATI</type>
            <count>1</count>
        </coproc>
        <file_ref>
            <file_name>milkyway_separation__modified_fit_1.36_windows_x86_64__opencl_ati_101.exe</file_name>
            <main_program/>
        </file_ref>
    </app_version>
</app_info>

http://milkyway.cs.rpi.edu/milkyway/download/milkyway_nbody_1.62_windows_x86_64.exe
http://milkyway.cs.rpi.edu/milkyway/download/milkyway_1.36_windows_x86_64.exe
http://milkyway.cs.rpi.edu/milkyway/download/milkyway_1.36_windows_x86_64__opencl_ati_101.exe
http://milkyway.cs.rpi.edu/milkyway/download/milkyway_separation__modified_fit_1.36_windows_x86_64.exe
http://milkyway.cs.rpi.edu/milkyway/download/milkyway_separation__modified_fit_1.36_windows_x86_64__opencl_ati_101.exe

BTW: Is the milkyway_separation__modified_fit part superfluous now, so that I could delete this part in red (and these two apps)?

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

I searched the forum because of this project preference:

"Frequency (in Hz) that should try to complete individual work chunks. Higher numbers may run slower but will provide a more responsive system. Lower may be faster but more laggy. default 60 (corresponds to 60 fps)"

It looks like only the outdated Milkyway 1.20 ATI app used this setting. The current Milkyway 1.36 ATI app doesn't use it. I set '1' (one), but the app reports:

Using a target frequency of 60.0

Is this OK, or a bug?

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

I'm a bit disappointed that a project task lasts ~ 16 seconds with the Milkyway 1.36 ATI app on one Fury X VGA card (1 WU/GPU). I looked at other PCs, e.g. hostid=590597 with 'R9 200 Series - Hawaii' VGA cards. That VGA card has just 44 Compute Units (CUs), and a task lasts ~ 15 seconds. My Fury X has 64 CUs, but a task lasts ~ 16 seconds. Is something wrong? Are there possibilities to optimize/fine-tune? Thanks.
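The question of whether the app_info.xml entries are "correct" can be checked mechanically, at least for internal consistency. The following is a minimal sketch, not an official BOINC tool: the function name `check_app_info` and the trimmed `SAMPLE` are made up for illustration, and the check only verifies that every `<app_version>` points at a declared `<app>` and that every `<file_ref>` points at a declared `<file_info>`. It says nothing about whether the project server accepts the version numbers or plan class.

```python
import xml.etree.ElementTree as ET


def check_app_info(xml_text):
    """Return a list of human-readable problems found in an app_info.xml string."""
    root = ET.fromstring(xml_text)
    problems = []
    if root.tag != "app_info":
        problems.append(f"root element is <{root.tag}>, expected <app_info>")

    # Names declared by <app> and <file_info> elements.
    app_names = {e.findtext("name") for e in root.findall("app")}
    file_names = {e.findtext("name") for e in root.findall("file_info")}

    # Every <app_version> must refer back to declared names.
    for version in root.findall("app_version"):
        app_name = version.findtext("app_name")
        if app_name not in app_names:
            problems.append(f"app_version refers to unknown app {app_name!r}")
        for ref in version.findall("file_ref"):
            file_name = ref.findtext("file_name")
            if file_name not in file_names:
                problems.append(f"file_ref refers to undeclared file {file_name!r}")
    return problems


# Trimmed illustration based on the GPU entry posted above (hypothetical excerpt).
SAMPLE = """<app_info>
    <app>
        <name>milkyway</name>
    </app>
    <file_info>
        <name>milkyway_1.36_windows_x86_64__opencl_ati_101.exe</name>
        <executable/>
    </file_info>
    <app_version>
        <app_name>milkyway</app_name>
        <version_num>136</version_num>
        <plan_class>opencl_ati_101</plan_class>
        <file_ref>
            <file_name>milkyway_1.36_windows_x86_64__opencl_ati_101.exe</file_name>
            <main_program/>
        </file_ref>
    </app_version>
</app_info>"""

print(check_app_info(SAMPLE) or "no structural problems found")
```

To check a real file, read it with `open(path, encoding="utf-8").read()` and pass the text to `check_app_info`. A typo in a file name is the kind of mistake this catches.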

mikey Joined: 8 May 09 Posts: 3321 Credit: 520,597,515 RAC: 30,434	Message 64941 - Posted: 26 Jul 2016, 11:18:51 UTC - in response to Message 64939.

Quoting Message 64939:
    I'm a bit disappointed that a project task lasts ~ 16 seconds with the Milkyway 1.36 ATI app on one Fury X VGA card (1 WU/GPU). I looked at other PCs, e.g. hostid=590597 with 'R9 200 Series - Hawaii' VGA cards. That VGA card has just 44 Compute Units (CUs), and a task lasts ~ 15 seconds. My Fury X has 64 CUs, but a task lasts ~ 16 seconds. Is something wrong? Are there possibilities to optimize/fine-tune? Thanks.

Take a look at the memory speed and at whether the R9 280 has a 128-bit, 256-bit or 384-bit memory interface. The faster the memory and the more bits transferred at one time, the faster the card crunches. It's not JUST the CUs anymore.

Kylinblue Joined: 23 Aug 11 Posts: 7 Credit: 498,188 RAC: 0	Message 64942 - Posted: 26 Jul 2016, 15:07:18 UTC - in response to Message 64941.

Quoting Message 64939:
    I'm a bit disappointed that a project task lasts ~ 16 seconds with the Milkyway 1.36 ATI app on one Fury X VGA card (1 WU/GPU). I looked at other PCs, e.g. hostid=590597 with 'R9 200 Series - Hawaii' VGA cards. That VGA card has just 44 Compute Units (CUs), and a task lasts ~ 15 seconds. My Fury X has 64 CUs, but a task lasts ~ 16 seconds. Is something wrong? Are there possibilities to optimize/fine-tune? Thanks.

Quoting Message 64941:
    Take a look at the memory speed and at whether the R9 280 has a 128-bit, 256-bit or 384-bit memory interface. The faster the memory and the more bits transferred at one time, the faster the card crunches. It's not JUST the CUs anymore.

Don't you know that the Fury X beats the R9 290X's memory bandwidth by a factor of 1.6 (512 GB/s vs. 320 GB/s)?

Sutaru Tsureku Joined: 30 Apr 09 Posts: 101 Credit: 29,871,946 RAC: 334	Message 64945 - Posted: 27 Jul 2016, 1:08:39 UTC Last modified: 27 Jul 2016, 1:43:24 UTC

I read a lot in the forum and found that the GPU apps can be fine-tuned. I found:

--non-responsive
--gpu-target-frequency N
--gpu-polling-mode N
--gpu-wait-factor N
--process-priority N
--gpu-disable-checkpointing

On both of my PCs I currently use:

--non-responsive --gpu-target-frequency 1 --gpu-polling-mode 0 --gpu-wait-factor 0.01 --process-priority 4 --gpu-disable-checkpointing

[On my Fury X: 3 WUs/GPU; on the GT 730: 2 WUs/GPU. At SETI only 1 WU/GPU is possible on the Fury X; with more tasks per GPU simultaneously I get invalid results or computation errors.]

BTW: I am still on Crimson 15.12 (1912.5 driver included). In my experience it is the 'best' driver so far, although it has a downclock bug on the Fury X: after anywhere from a few minutes to a few days, one or more VGA cards downclock permanently, and only a reboot helps. I tested all drivers (also Beta/Hotfix) up to Crimson 16.3.2, and Crimson 16.7.2 (Hotfix? but WHQL?): 16.3.2 gave invalid AP results at SETI, and 16.7.2 (July 19) froze the VGA cards.

I don't know if these are the 'strongest/fastest' possible settings, and maybe they could even be counterproductive...? Is there an overview somewhere that explains all possible cmdline settings and how to set them? If not, would it be possible to create one (maybe as a sticky thread in Number Crunching)?

This is the app_info.xml file currently used on my Fury X PC (apps as above):

<app_info>
    <app>
        <name>milkyway_nbody</name>
        <user_friendly_name>Milkyway N-Body Sim.</user_friendly_name>
    </app>
    <file_info>
        <name>milkyway_nbody_1.62_windows_x86_64.exe</name>
        <executable/>
    </file_info>
    <app_version>
        <app_name>milkyway_nbody</app_name>
        <version_num>162</version_num>
        <platform>windows_x86_64</platform>
        <file_ref>
            <file_name>milkyway_nbody_1.62_windows_x86_64.exe</file_name>
            <main_program/>
        </file_ref>
    </app_version>
    <app>
        <name>milkyway</name>
        <user_friendly_name>Milkyway</user_friendly_name>
    </app>
    <file_info>
        <name>milkyway_1.36_windows_x86_64.exe</name>
        <executable/>
    </file_info>
    <app_version>
        <app_name>milkyway</app_name>
        <version_num>136</version_num>
        <platform>windows_x86_64</platform>
        <file_ref>
            <file_name>milkyway_1.36_windows_x86_64.exe</file_name>
            <main_program/>
        </file_ref>
    </app_version>
    <app>
        <name>milkyway</name>
    </app>
    <file_info>
        <name>milkyway_1.36_windows_x86_64__opencl_ati_101.exe</name>
        <executable/>
    </file_info>
    <app_version>
        <app_name>milkyway</app_name>
        <version_num>136</version_num>
        <platform>windows_x86_64</platform>
        <avg_ncpus>0.17</avg_ncpus>
        <max_ncpus>0.17</max_ncpus>
        <plan_class>opencl_ati_101</plan_class>
        <cmdline>--non-responsive --gpu-target-frequency 1 --gpu-polling-mode 0 --gpu-wait-factor 0.01 --process-priority 4 --gpu-disable-checkpointing</cmdline>
        <coproc>
            <type>ATI</type>
            <count>0.33</count>
        </coproc>
        <file_ref>
            <file_name>milkyway_1.36_windows_x86_64__opencl_ati_101.exe</file_name>
            <main_program/>
        </file_ref>
    </app_version>
    <app>
        <name>milkyway_separation__modified_fit</name>
        <user_friendly_name>Milkyway Sep. (Mod. Fit)</user_friendly_name>
    </app>
    <file_info>
        <name>milkyway_separation__modified_fit_1.36_windows_x86_64.exe</name>
        <executable/>
    </file_info>
    <app_version>
        <app_name>milkyway_separation__modified_fit</app_name>
        <version_num>136</version_num>
        <platform>windows_x86_64</platform>
        <file_ref>
            <file_name>milkyway_separation__modified_fit_1.36_windows_x86_64.exe</file_name>
            <main_program/>
        </file_ref>
    </app_version>
    <app>
        <name>milkyway_separation__modified_fit</name>
    </app>
    <file_info>
        <name>milkyway_separation__modified_fit_1.36_windows_x86_64__opencl_ati_101.exe</name>
        <executable/>
    </file_info>
    <app_version>
        <app_name>milkyway_separation__modified_fit</app_name>
        <version_num>136</version_num>
        <platform>windows_x86_64</platform>
        <avg_ncpus>0.17</avg_ncpus>
        <max_ncpus>0.17</max_ncpus>
        <plan_class>opencl_ati_101</plan_class>
        <cmdline>--non-responsive --gpu-target-frequency 1 --gpu-polling-mode 0 --gpu-wait-factor 0.01 --process-priority 4 --gpu-disable-checkpointing</cmdline>
        <coproc>
            <type>ATI</type>
            <count>0.33</count>
        </coproc>
        <file_ref>
            <file_name>milkyway_separation__modified_fit_1.36_windows_x86_64__opencl_ati_101.exe</file_name>
            <main_program/>
        </file_ref>
    </app_version>
</app_info>

(The 'milkyway_separation__modified_fit' parts are still needed, because old resends are around.)

The Fury X PC has two Xeon CPUs with 6 cores/12 threads each, but HT is off for faster GPU app calculation, so 12 CPU cores are available in total. With the app_info.xml entries above:

2 CPU cores for GPU app support
10 CPU cores for CPU tasks

With these settings the whole PC uses ~ 1,100 W at the wall plug. But no problem, the PC has a 2,000 W PSU. At SETI it was ~ 850 W. The GPU temps are higher (~ +4 °C) and the (PC) room is hotter; ambient temps are currently 38.5 °C. :-(

With these settings it works out to ~ 10 seconds per GPU task (theoretically): the tasks last ~ 30 seconds, but then 3 tasks are finished. It's really hard work for BOINC to feed the PC with enough tasks for 24/7. :-( and ;-)

If someone else also has a Fury X VGA card, you are welcome to write in this thread and share your experiences. Thanks.
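The numbers in the post above (3 WUs per GPU, 2 CPU cores for GPU support, ~ 10 seconds per task) follow from the `<count>` and `<avg_ncpus>` values by simple arithmetic. The sketch below reproduces that arithmetic. It is only an illustration: the function name `gpu_plan` is made up, it assumes all four Fury X cards sit in this one PC, and BOINC's real scheduler may account for fractional CPUs differently.

```python
def gpu_plan(gpu_count, coproc_count, avg_ncpus, cpu_cores, batch_seconds):
    """Estimate concurrency and throughput from app_info.xml-style settings.

    coproc_count  -- the <count> value inside <coproc> (GPU fraction per task)
    avg_ncpus     -- the <avg_ncpus> value (CPU fraction budgeted per GPU task)
    batch_seconds -- observed wall time of one task when sharing the GPU
    """
    tasks_per_gpu = int(1 / coproc_count)          # 0.33 -> 3 tasks share a GPU
    concurrent_tasks = gpu_count * tasks_per_gpu   # GPU tasks running at once
    cpu_reserved = concurrent_tasks * avg_ncpus    # CPU budget for GPU support
    return {
        "tasks_per_gpu": tasks_per_gpu,
        "concurrent_gpu_tasks": concurrent_tasks,
        "cpu_cores_reserved": round(cpu_reserved, 2),
        "cpu_cores_left": cpu_cores - round(cpu_reserved),
        "seconds_per_task": batch_seconds / tasks_per_gpu,
        "gpu_tasks_per_day": concurrent_tasks * 86400 / batch_seconds,
    }


# Values taken from the post; gpu_count=4 is an assumption (see above).
plan = gpu_plan(gpu_count=4, coproc_count=0.33, avg_ncpus=0.17,
                cpu_cores=12, batch_seconds=30)
for key, value in plan.items():
    print(f"{key}: {value}")
```

Under these assumptions the PC would finish roughly 34,560 GPU tasks per day, which gives a feel for why keeping it fed 24/7 is hard work for BOINC.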

JHMarshall Joined: 24 Jul 12 Posts: 40 Credit: 7,123,301,054 RAC: 0	Message 64946 - Posted: 27 Jul 2016, 2:57:10 UTC - in response to Message 64941.

Quoting Message 64939:
    I'm a bit disappointed that a project task lasts ~ 16 seconds with the Milkyway 1.36 ATI app on one Fury X VGA card (1 WU/GPU). I looked at other PCs, e.g. hostid=590597 with 'R9 200 Series - Hawaii' VGA cards. That VGA card has just 44 Compute Units (CUs), and a task lasts ~ 15 seconds. My Fury X has 64 CUs, but a task lasts ~ 16 seconds. Is something wrong? Are there possibilities to optimize/fine-tune? Thanks.

Quoting Message 64941:
    Take a look at the memory speed and at whether the R9 280 has a 128-bit, 256-bit or 384-bit memory interface. The faster the memory and the more bits transferred at one time, the faster the card crunches. It's not JUST the CUs anymore.

It's not just memory speed here. Since MW uses double precision (DP) calculations, the card's DP compute capability is the real driver.

R9 280 series: DP = 1/4 SP
R9 290 series: DP = 1/8 SP
R9 Fury X series: DP = 1/16 SP

The R9 280 series may not be the fastest single precision performer, but the 1/4 DP to SP ratio makes it the leader in double precision for the $.

Joe
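To see what these DP:SP ratios mean in absolute terms, peak throughput can be estimated as shaders × 2 FLOPs per clock × clock speed, then scaled by the DP ratio. The sketch below does this for one card per series. The shader counts and clocks are reference-card figures as commonly published (an assumption, not taken from this thread), the function name `peak_gflops` is made up, and real application throughput will be lower than these theoretical peaks.

```python
def peak_gflops(shaders, clock_mhz, dp_ratio):
    """Return (SP GFLOPS, DP GFLOPS) theoretical peaks for a GPU.

    Assumes one fused multiply-add (2 FLOPs) per shader per clock.
    """
    sp = shaders * 2 * clock_mhz / 1000.0
    return sp, sp * dp_ratio


# (shaders, clock in MHz, DP:SP ratio) -- assumed reference specifications.
CARDS = {
    "R9 280X":   (2048, 1000, 1 / 4),
    "R9 290X":   (2816, 1000, 1 / 8),
    "R9 Fury X": (4096, 1050, 1 / 16),
}

for name, (shaders, clock_mhz, dp_ratio) in CARDS.items():
    sp, dp = peak_gflops(shaders, clock_mhz, dp_ratio)
    print(f"{name:10s} SP {sp:7.1f} GFLOPS   DP {dp:6.1f} GFLOPS")
```

With these inputs the Fury X has the highest single precision peak of the three but the lowest double precision peak, which is consistent with the explanation above.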

Jozef J Joined: 4 Mar 10 Posts: 65 Credit: 639,958,626 RAC: 0	Message 65010 - Posted: 8 Aug 2016, 20:44:33 UTC - in response to Message 64946.

Quoting Message 64939:
    I'm a bit disappointed that a project task lasts ~ 16 seconds with the Milkyway 1.36 ATI app on one Fury X VGA card (1 WU/GPU). I looked at other PCs, e.g. hostid=590597 with 'R9 200 Series - Hawaii' VGA cards. That VGA card has just 44 Compute Units (CUs), and a task lasts ~ 15 seconds. My Fury X has 64 CUs, but a task lasts ~ 16 seconds. Is something wrong? Are there possibilities to optimize/fine-tune? Thanks.

Quoting Message 64941:
    Take a look at the memory speed and at whether the R9 280 has a 128-bit, 256-bit or 384-bit memory interface. The faster the memory and the more bits transferred at one time, the faster the card crunches. It's not JUST the CUs anymore.

Quoting Message 64946:
    It's not just memory speed here. Since MW uses double precision (DP) calculations, the card's DP compute capability is the real driver.
    R9 280 series: DP = 1/4 SP
    R9 290 series: DP = 1/8 SP
    R9 Fury X series: DP = 1/16 SP
    The R9 280 series may not be the fastest single precision performer, but the 1/4 DP to SP ratio makes it the leader in double precision for the $.
    Joe

If everything in this thread is right: https://www.primegrid.com/forum_thread.php?id=6113

GPU                   FP32 GFLOPS   FP64 GFLOPS   Ratio
Radeon R9 295X2       11264         1408          FP64 = 1/8 FP32
Radeon HD 7990        7782          1946          FP64 = 1/4 FP32
GeForce GTX Titan Black  5645       1881          FP64 = 1/3 FP32
GeForce GTX 690       5622          234           FP64 = 1/24 FP32
Radeon R9 290X        5632          704           FP64 = 1/8 FP32
GeForce GTX 780 Ti    5345          223           FP64 = 1/24 FP32
Radeon HD 6990        5099          1276          FP64 = 1/4 FP32
GeForce GTX 980       4981          156           FP64 = 1/32 FP32
Radeon R9 290         4849          606           FP64 = 1/8 FP32
GeForce GTX Titan     4709          1523          FP64 = 1/3 FP32
Radeon HD 7970 GHz    4301          1075          FP64 = 1/4 FP32

"Niceee ..7970 rocks :-))"

Currently I get 15-16 seconds per task.

Jozef J Joined: 4 Mar 10 Posts: 65 Credit: 639,958,626 RAC: 0	Message 65012 - Posted: 8 Aug 2016, 22:42:42 UTC - in response to Message 65010.

It runs in 13 seconds now, with no errors. I wonder if I can run more tasks per 7970 graphics card without affecting the results. Where is the thread with the app_info?

[AF>EDLS]GuL Joined: 5 Jun 08 Posts: 21 Credit: 245,803,013 RAC: 0	Message 65013 - Posted: 9 Aug 2016, 7:23:53 UTC - in response to Message 65012.

Hi Jozef J.

Have a look here: https://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=3987&postid=64975

Cheers

Sutaru Tsureku Joined: 30 Apr 09 Posts: 101 Credit: 29,871,946 RAC: 334	Message 65390 - Posted: 5 Oct 2016, 18:55:29 UTC Last modified: 5 Oct 2016, 18:56:34 UTC

I don't know if these ratios are correct... If I look online, the 1.39 app says in the <stderr_txt>:

(...) Estimated AMD GPU GFLOP/s: 672 SP GFLOP/s, 134 DP FLOP/s (...)

This would be a 5:1 ratio for the R9 Fury X.
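The 5:1 figure can be read directly off the quoted stderr line. The sketch below parses that line and computes the ratio. The regular expression is an assumption based only on the single line quoted above (it accepts both "FLOP/s" and "GFLOP/s" for the DP value), and the function name `sp_to_dp_ratio` is made up. Note that this is the app's own estimate, so it need not match the 1/16 hardware ratio discussed earlier in the thread.

```python
import re

# The line as quoted from <stderr_txt> in the post above.
STDERR_LINE = "Estimated AMD GPU GFLOP/s: 672 SP GFLOP/s, 134 DP FLOP/s"


def sp_to_dp_ratio(line):
    """Return the SP:DP ratio from an 'Estimated AMD GPU GFLOP/s' line."""
    match = re.search(r"([\d.]+) SP GFLOP/s, ([\d.]+) DP G?FLOP/s", line)
    if match is None:
        raise ValueError("no GFLOP/s estimate found in line")
    sp = float(match.group(1))
    dp = float(match.group(2))
    return sp / dp


ratio = sp_to_dp_ratio(STDERR_LINE)
print(f"SP:DP = {ratio:.2f}:1")  # prints "SP:DP = 5.01:1"
```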

rtennill Joined: 22 Mar 09 Posts: 6 Credit: 778,044,893 RAC: 0	Message 65529 - Posted: 24 Oct 2016, 16:15:44 UTC - in response to Message 64939.

Unfortunately this is expected with modern consumer cards. This project is all about double precision compute capability. The trend for both Nvidia and AMD over the last several years has been to significantly reduce the double precision compute capability on consumer/gaming cards in favor of single precision and other features. Wikipedia has good tables for comparing the stats across generations and models. My 6970 has more double precision compute than the Fury X. My new card, the 7970, will have almost twice the double precision capability.

https://en.wikipedia.org/wiki/AMD_Radeon_Rx_300_series
https://en.wikipedia.org/wiki/Radeon_HD_7000_Series

AMD Radeon R9 Fury X - app_info.xml and apps - optimizations