Welcome to MilkyWay@home

Posts by Michael H.W. Weber

1) Message boards : News : Scheduled Maintenance Concluded (Message 65898)
Posted 18 Nov 2016 by Michael H.W. Weber
Post:
If the app processes the whole bundle BUT fails on the last of the 5 with a computational error, ALL 5 are lost, not just the one that actually failed.

That is because the entire bundle is labelled as a computational error.

Well, if that is correct, then Jake has to go back to the bench and improve the server logic with respect to the validation code.

Michael.
2) Message boards : News : Scheduled Maintenance Concluded (Message 65887)
Posted 17 Nov 2016 by Michael H.W. Weber
Post:
Is anyone running a 390 or 390X (the 290 or 290X may have the same problem)?

I still have the problem: when I run several WUs at once, after some time (from minutes to an hour or so) some WUs start to hang and go on forever, while one or two crunch on.

I have tested drivers since 15.9 and always see the same problem; Win 7 or Win 10 does not matter either. I tried different hardware setups and both new and old installations of Windows, with no difference.

I hope someone can confirm the problem, so we can start searching for the root cause and maybe even a fix.

The 290X can't run MW tasks in parallel regardless of what driver I use.
By contrast, the 280X can (but I don't use it because I have both a 290X and a 280X in the same machine).

Michael.
3) Message boards : News : Scheduled Maintenance Concluded (Message 65823)
Posted 15 Nov 2016 by Michael H.W. Weber
Post:
Just released the GPU version. It is a 32-bit application that works on 64 bit machines. Let me know if there are any issues.

Well, first of all congratulations that you finally made it happen!
The returned tasks validate as before the bundling efforts, i.e. many are instantly valid, while the majority are first marked inconclusive but then shifted to the valid bucket (a behavior I never understood, by the way...).

They should take about 5x longer than normal work units since you are crunching 5.

In fact, a 280X requires 9 secs for a single WU. The bundle of 5 completes in 38 secs, which is quicker than the 45 secs that five single WUs would take. Same with the 290X: 13 secs for a single task, 58 secs for a bundle of 5 tasks (versus 65 secs). So, the computation is certainly more time efficient.
Moreover, it is better for the GPU hardware because it does not cool down and heat up as frequently as before but is kept at a fairly constant operating temperature.
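
As a quick check of these numbers, here is a small Python sketch that compares the measured bundle runtimes with five times the single-WU runtime. The function name is purely illustrative; the timings are the ones quoted above.

```python
def bundle_saving(single_secs, bundle_secs, bundle_size=5):
    """Return (expected_secs, saved_secs, saved_fraction) for one bundle."""
    expected = single_secs * bundle_size   # runtime if each WU ran on its own
    saved = expected - bundle_secs         # time gained by bundling
    return expected, saved, saved / expected

# Timings reported in this post, in seconds: (single WU, bundle of 5).
timings = {"280X": (9, 38), "290X": (13, 58)}

for card, (single, bundle) in timings.items():
    expected, saved, fraction = bundle_saving(single, bundle)
    print(f"{card}: expected {expected} s, measured {bundle} s, "
          f"saved {saved} s ({fraction:.1%})")
```

Both cards save 7 secs per bundle, which is roughly 16% on the 280X and 11% on the 290X.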

Finally, I am unsure whether bundling of only 5 tasks will solve the DDoS-like attack issues on your server. You can easily increase the bundle size by another factor of 10 or even 100 and then disallow server contacts below a reasonable time threshold.
But let's see. As soon as you find that the 'GPU people' run out of work again, you might want to increase the bundle size as suggested.

And thanks again for taking our concerns seriously. As a result, I am quite sure you will be flooded with new results.

Michael.
4) Message boards : News : Scheduled Maintenance Concluded (Message 65733)
Posted 13 Nov 2016 by Michael H.W. Weber
Post:
I think that there is no need to suspend the project while these problems persist. Configure your MilkyWay@home preferences to only accept CPU work and not GPU work.

Ehm, the whole discussion over here is all about GPU tasks which, because of their short duration and the limited amount which a single machine can retrieve, are hammering the server down in a DDoS attack-like fashion.

Michael.

P.S.: Although there is the "no load issue" regarding GPU core and RAM clocks (going down to 300/150 MHz as soon as an MW task is initiated), both types of WUs do validate - I checked one of each of the two types on my AMD cards.
5) Message boards : News : Scheduled Maintenance Concluded (Message 65709)
Posted 12 Nov 2016 by Michael H.W. Weber
Post:
Are the individual units in the new bundle of 5 units bigger than the old units?

The 1.42s are working for me. However, even though they are taking up to 3 times as long to complete (both GPU and CPU), the credit applied is the same as for the previous version, 1.39?

Runtime on 290X went from 15 seconds to 27,066 seconds and I still get same 26.73 credits, no thanks : http://milkyway.cs.rpi.edu/milkyway/result.php?resultid=1886485669

As I said above, please check your GPU core and RAM clocks:

Starting v1.42 tasks results in an immediate reduction of the GPU core clock to 300 MHz and of the GPU RAM clock to 150 MHz on both 280X and 290X AMD graphics boards.

I will have to suspend MW until this issue is resolved.

Michael.
6) Message boards : News : Scheduled Maintenance Concluded (Message 65690)
Posted 11 Nov 2016 by Michael H.W. Weber
Post:
All tasks with version <= 1.41 produce only errors.

Starting v1.42 tasks results in an immediate reduction of the GPU core clock to 300 MHz and of the GPU RAM clock to 150 MHz on both 280X and 290X AMD graphics boards.

When reaching 100% of estimated run time, the bundle tasks on my 290X reset to zero progress and appear to restart although the total runtime duration is not reset (I guess from the file name that it might be a bundle of 5, so this will hopefully repeat 5-fold and then upload).
By contrast, the constraints tasks do complete and upload in the expected manner on my 280X. Runtime with reduced core and RAM clock is 741.63 secs as opposed to 9 secs with standard clocks. The result file first ends up in the "inconclusive" bunch - as is usual.

So, there is still something wrong with both of these task types with respect to clock resetting.

I checked with Einstein: Once a new Einstein WU starts after an MW one has finished, the core and RAM clocks go back up to regular speed. So, the downclocking is caused by the MW client.

Please inform us once you have solved the clocking issue.

Michael.
7) Message boards : Number crunching : Is this project OVER, semi abandoned or DYING? (Message 65608)
Posted 7 Nov 2016 by Michael H.W. Weber
Post:
-Click-

Michael.
8) Message boards : Number crunching : Massive server issues and Wu validation delays (Message 65561)
Posted 29 Oct 2016 by Michael H.W. Weber
Post:
I will take into consideration changing how we bundle work units (hopefully packing 4-10 workunits together would be nice), but at the moment that is technically challenging since the framework we have set up for workunits and their generation does not allow for it (imagine a lot of coding and bugs for at least 6 months).

Packing 4-10 tasks into one bundle won't solve the problem when taking runtimes per WU of 9 secs into consideration.

This is what you need to do:

1. Modify the WU generator to define criteria by which WUs are bundled.
2. Modify the application such that the bundled WUs are processed one by one individually.
3. Modify the validator to validate the WUs.

To keep things simple, use the wrapper approach:

For 1: Pack tasks into one .zip bundle until the estimated run time approaches 5 hrs or the bundle holds the maximum of 200 tasks.

For 2: Using the wrapper you only need to list the required program calls in the job.xml; initially unpack the .zip bundle on the client machine and later re-pack everything again as a .zip after computation completion.

For 3: Adapt the validator to read the .zip returned by the client and compare the individual result files with a second result (or call a different validation algorithm).
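
A rough sketch of the packing rule from step 1, written in Python purely for illustration. It is not code from RNA World or from the MilkyWay@home server; the task tuples, file names and function names are hypothetical, and only the two limits (5 hrs, 200 tasks) come from the suggestion above.

```python
import io
import zipfile

MAX_BUNDLE_SECS = 5 * 3600   # run-time cap suggested above
MAX_BUNDLE_TASKS = 200       # task-count cap suggested above

def plan_bundles(tasks, max_secs=MAX_BUNDLE_SECS, max_tasks=MAX_BUNDLE_TASKS):
    """Greedily group (name, est_secs, payload) tuples into bundles."""
    bundles, current, current_secs = [], [], 0.0
    for task in tasks:
        est_secs = task[1]
        full = len(current) >= max_tasks or current_secs + est_secs > max_secs
        if current and full:
            # Close the current bundle before either limit is exceeded.
            bundles.append(current)
            current, current_secs = [], 0.0
        current.append(task)
        current_secs += est_secs
    if current:
        bundles.append(current)
    return bundles

def write_bundle(tasks):
    """Pack one bundle into an in-memory .zip and return its bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, _est_secs, payload in tasks:
            archive.writestr(name, payload)
    return buffer.getvalue()

# Hypothetical input: 450 GPU tasks estimated at 9 secs each.
tasks = [(f"wu_{i:04d}.txt", 9.0, f"parameters for task {i}\n") for i in range(450)]
bundles = plan_bundles(tasks)
print([len(b) for b in bundles])          # [200, 200, 50]
print(len(write_bundle(bundles[0])), "bytes in the first .zip")
```

With 9-sec WUs the 200-task cap is reached first, so a full bundle runs for about 30 minutes; the 5-hr cap only matters for longer tasks.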

That's it.
No further year of fiddling around required.

:)

Michael.

P.S.: As I offered earlier, contact me or my team mates for code samples from RNA World. But don't wait too long.
9) Message boards : Number crunching : Massive server issues and Wu validation delays (Message 65546)
Posted 28 Oct 2016 by Michael H.W. Weber
Post:
A few clarifications and additional thoughts to address some of the postings above:

(1) I was talking about GPU tasks, ONLY.

(2) The idea of running tasks in parallel is nice, e.g. if you use systems which have a single 280X or several of these. In my case, however, I run combinations of 290X and 280X cards in the same machine. That's possible as they use the same driver, and this approach (a) combines different capabilities in the same system, (b) saves me hardware (power supplies) and (c) increases electrical efficiency. In contrast to the 280X, however, the 290X does not allow processing multiple MW WUs in parallel. Well, to be precise, of course you can make it do this. But then most of these end up not getting validated, so you simply waste your electricity. In short: Running tasks in parallel is not a generally applicable solution.
Moreover, it does not reduce the server load at all. On the contrary, it worsens it because, as stated above, running tasks in parallel is more time efficient than running them individually. Hence, in the same time frame, more work is requested from the server.

(3) Using scripts in an attempt to counteract the obvious issues of the MW server configuration can't be the right choice: A DC project has to expect that its volunteers are either not able to create such solutions or just don't want to spend their time on such things. In short: Participation has to be kept simple. The solution has to be implemented on the project's server side and not at the user's end.

In fact, there is no need to discuss this much further.
I have given clear suggestions which measures will help relieve the server.
A bundling of WUs is mandatory when increasing the server connection delay to ensure that machines do not idle around during these increased intervals.
Try these suggestions and then we will see.

Michael.
10) Message boards : Number crunching : Massive server issues and Wu validation delays (Message 65539)
Posted 27 Oct 2016 by Michael H.W. Weber
Post:
What I am offering is a clear description of the problem plus a solution. For the latter, no grant is required. Three first measures:

1. Increase client delay, such that connections are allowed only every 30 minutes. Yes, not nice but server stability is of priority in the current situation.

2. WU run time needs to be increased to at least 1 hour per WU; 5 hrs would be better. If a simple increase per WU is impossible for whatever reason, bundle 100 or more tasks in a single packet.

3. Keep your database small: Produce WUs only when the server has no more WUs ready for delivery. Delete WUs once they have been completed - do not save them for long.
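
To put rough numbers on measure 2, here is a small Python sketch. It assumes the per-WU runtimes reported elsewhere in these threads (9 secs on a 280X, 13 secs on a 290X) and a GPU that crunches around the clock; the function names are only illustrative.

```python
import math

def tasks_per_bundle(target_secs, secs_per_wu):
    """Number of WUs needed so that one bundle runs for at least target_secs."""
    return math.ceil(target_secs / secs_per_wu)

def results_per_day(secs_per_unit):
    """Results one continuously busy GPU returns per day (upper bound)."""
    return 86400 / secs_per_unit

for card, secs in {"280X": 9, "290X": 13}.items():
    one_hour = tasks_per_bundle(3600, secs)
    five_hours = tasks_per_bundle(5 * 3600, secs)
    print(f"{card}: {one_hour} WUs for 1 hr, {five_hours} WUs for 5 hrs, "
          f"{results_per_day(secs):.0f} single-WU results per day today")

print(f"With 1-hr bundles: {results_per_day(3600):.0f} results per GPU per day")
```

On these assumptions a single 280X returns up to 9600 results per day unbundled but only 24 with 1-hr bundles, which is the kind of reduction in server traffic that measure 2 is after.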

With these measures we run such projects on a laptop even during worldwide challenges. Without any grant.

Just try it. ;)

Michael.
11) Message boards : Number crunching : Massive server issues and Wu validation delays (Message 65537)
Posted 27 Oct 2016 by Michael H.W. Weber
Post:
For around three days now I have again been trying to support your project by using 4 GPUs (2x 280X, 2x 290X). Although I had addressed these things earlier in this forum, to date you still have not solved the following issues:

1. GPU WUs are too short (280X: 9 sec/WU, 290X: 13 sec/WU).
2. Your server hands out only a limited number of WUs at a time.

At least you have implemented the automatic detection of the 290X GPUs. Thanks for that (although it took you years to do so).

Now, your server has repeated massive database issues around every 15-30 minutes resulting in:

1. failure to upload result data
2. failure to download new WUs (which results in idling machines)
3. failure to log in to my account
4. inability to report this problem to you because your forum does not work either.

On top of that, your validator does not seem to keep up with the incoming results. Of the 11446 tasks I was able to upload to your server within the past 3 days (and this is only a tiny, tiny fraction of what would have been possible if your server didn't crash every other minute), only 1244 were validated. The rest 'hangs in the air' waiting for validation (credit acquisition is delayed accordingly).

I do not know the reason for all of these issues, but if you would like people to support this project, you need to address these issues quickly. Not in a year, please.

I think I do have an explanation for your server problem, though:
Because your WUs are so small, your server can't keep up with the connections made by all the clients working on your tasks.
At the RNA World distributed computing project, we solved exactly the same problem by simply bundling the tiny WUs to larger WU packets. This massively reduced the number of connection requests and also helped deliver more work to the clients.

PLEASE think about that when you get free advice from people running their own DC projects and servers. :)
If you like, contact Yoyo from our admin & project team and ask for code details on how we bundle the tasks.
And remember: I suggested this long back, too.
What you are currently doing is a self-induced DDoS-attack on your own server(s).

Michael.
12) Message boards : Number crunching : GPU WU runtimes too short: Wast of compute power (Message 65144)
Posted 15 Sep 2016 by Michael H.W. Weber
Post:
Multiple tasks don't run in parallel on 290X cards for unknown reasons. So bundling is the proper choice to solve the problem.

Moreover, even if you run 6 WUs in parallel, they are also completed in a few minutes, so the problem detailed above persists. And by the way, for 280X cards, you may run even up to 12 tasks in parallel (tested).

Michael.
13) Message boards : Number crunching : Issues with & proper support of AMD R9 290X GPUs (Message 65123)
Posted 9 Sep 2016 by Michael H.W. Weber
Post:
Regarding the number of invalids: Are you successfully running multiple tasks together on those other projects? I've seen comments about not being able to do that with that card.

I will look into that.

Some think it's a driver issue. I would not be surprised if that problem is also present here.

A driver issue can be excluded as detailed above.

Michael.
14) Message boards : News : GPU Issues Mega Thread (Message 65122)
Posted 9 Sep 2016 by Michael H.W. Weber
Post:
I've always gotten the impression that if you want to crunch for MilkyWay, it is best to run Nvidia cards...

Well, if double precision is of importance, you should never choose NVIDIA cards but instead AMD as these are significantly more potent in this discipline.
To the best of my knowledge, the best model on the consumer market still is the R9 280X (and here e.g. the Toxic version from Sapphire).
An exception to the rule is the NVIDIA Titan Black series, but that one is so expensive that you can afford many 280X cards for the price of one Titan Black, and together these will again outperform the Titan Black.

Also, there are no significant driver issues with AMD graphics cards as long as you are using Windows as OS.

The core problem with MW is that it will just not recognize some of the AMD cards properly while other projects have absolutely no problem doing that (Einstein, Collatz, POEM, Primegrid, SETI tested).

Michael.
15) Message boards : Number crunching : Issues with & proper support of AMD R9 290X GPUs (Message 65114)
Posted 8 Sep 2016 by Michael H.W. Weber
Post:
Any comments?

Michael.
16) Message boards : Number crunching : Issues with & proper support of AMD R9 290X GPUs (Message 65104)
Posted 5 Sep 2016 by Michael H.W. Weber
Post:
Due to its superior double precision (DP) performance (second best of AMD's consumer cards), AMD's R9 290X GPU is a valuable card worth being supported properly by the Milkyway@home project.

So far, however, this card is not even recognized as a graphics board by this project when using Windows 7. Instead, one has to manually set up the following app_info.xml file and copy it to the Milkyway@home project folder:

<app_info>
<app>
<name>milkyway_nbody</name>
<user_friendly_name>Milkyway N-Body Sim.</user_friendly_name>
</app>
<file_info>
<name>milkyway_nbody_1.62_windows_x86_64.exe</name>
<executable/>
</file_info>
<app_version>
<app_name>milkyway_nbody</app_name>
<version_num>162</version_num>
<platform>windows_x86_64</platform>
<file_ref>
<file_name>milkyway_nbody_1.62_windows_x86_64.exe</file_name>
<main_program/>
</file_ref>
</app_version>
<app>
<name>milkyway</name>
<user_friendly_name>Milkyway</user_friendly_name>
</app>
<file_info>
<name>milkyway_1.36_windows_x86_64.exe</name>
<executable/>
</file_info>
<app_version>
<app_name>milkyway</app_name>
<version_num>136</version_num>
<platform>windows_x86_64</platform>
<file_ref>
<file_name>milkyway_1.36_windows_x86_64.exe</file_name>
<main_program/>
</file_ref>
</app_version>
<app>
<name>milkyway</name>
</app>
<file_info>
<name>milkyway_1.36_windows_x86_64__opencl_ati_101.exe</name>
<executable/>
</file_info>
<app_version>
<app_name>milkyway</app_name>
<version_num>136</version_num>
<platform>windows_x86_64</platform>
<avg_ncpus>1</avg_ncpus>
<max_ncpus>1</max_ncpus>
<plan_class>opencl_ati_101</plan_class>
<cmdline></cmdline>
<coproc>
<type>ATI</type>
<count>1</count>
</coproc>
<file_ref>
<file_name>milkyway_1.36_windows_x86_64__opencl_ati_101.exe</file_name>
<main_program/>
</file_ref>
</app_version>
<app>
<name>milkyway_separation__modified_fit</name>
<user_friendly_name>Milkyway Sep. (Mod. Fit)</user_friendly_name>
</app>
<file_info>
<name>milkyway_separation__modified_fit_1.36_windows_x86_64.exe</name>
<executable/>
</file_info>
<app_version>
<app_name>milkyway_separation__modified_fit</app_name>
<version_num>136</version_num>
<platform>windows_x86_64</platform>
<file_ref>
<file_name>milkyway_separation__modified_fit_1.36_windows_x86_64.exe</file_name>
<main_program/>
</file_ref>
</app_version>
<app>
<name>milkyway_separation__modified_fit</name>
</app>
<file_info>
<name>milkyway_separation__modified_fit_1.36_windows_x86_64__opencl_ati_101.exe</name>
<executable/>
</file_info>
<app_version>
<app_name>milkyway_separation__modified_fit</app_name>
<version_num>136</version_num>
<platform>windows_x86_64</platform>
<avg_ncpus>1</avg_ncpus>
<max_ncpus>1</max_ncpus>
<plan_class>opencl_ati_101</plan_class>
<cmdline></cmdline>
<coproc>
<type>ATI</type>
<count>1</count>
</coproc>
<file_ref>
<file_name>milkyway_separation__modified_fit_1.36_windows_x86_64__opencl_ati_101.exe</file_name>
<main_program/>
</file_ref>
</app_version>
</app_info>

...followed by manual download of the corresponding executables from the Milkyway@home website (to be also stored in the project folder):

milkyway_1.36_windows_x86_64.exe
milkyway_1.36_windows_x86_64__opencl_ati_101.exe
milkyway_nbody_1.62_windows_x86_64.exe
milkyway_separation__modified_fit_1.36_windows_x86_64.exe
milkyway_separation__modified_fit_1.36_windows_x86_64__opencl_ati_101.exe

After restart of BOINC, Milkyway@home will finally start to compute tasks. Single tasks. One after the other. Task duration is around 16 seconds!

Long ago, I asked for longer GPU tasks here in this forum, because initiation of a task every 16 seconds is a massive waste of compute time and requires permanent internet connection for constant up- and downloads as the number of tasks per machine is severely limited, too. Nothing has happened.

For AMD's R9 280X, which is from the same board family and is the most performant GPU with respect to DP, GPU recognition by Milkyway@home is, in contrast to the 290X, fully automated. This card can process several tasks in parallel, so I thought it should also be possible with the 290X.

No such luck!

With the 280X the following app_config.xml does the job to run 8 tasks simultaneously, thereby significantly increasing throughput:

<app_config>
<app>
<name>milkyway</name>
<gpu_versions>
<gpu_usage>0.125</gpu_usage>
<cpu_usage>0.1</cpu_usage>
</gpu_versions>
</app>
<app>
<name>milkyway_separation__modified_fit</name>
<gpu_versions>
<gpu_usage>0.125</gpu_usage>
<cpu_usage>0.1</cpu_usage>
</gpu_versions>
</app>
</app_config>
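
For other task counts, a file like the one above can be generated rather than edited by hand. The following Python sketch is only an illustration: the function name is made up, and it relies on gpu_usage being the fraction of one GPU that each task claims, so 1/8 = 0.125 lets 8 tasks share a card.

```python
import xml.etree.ElementTree as ET

def build_app_config(app_names, tasks_per_gpu, cpu_usage=0.1):
    """Return app_config XML that lets tasks_per_gpu tasks share one GPU."""
    root = ET.Element("app_config")
    for name in app_names:
        app = ET.SubElement(root, "app")
        ET.SubElement(app, "name").text = name
        versions = ET.SubElement(app, "gpu_versions")
        # Each task claims 1/tasks_per_gpu of a GPU, e.g. 0.125 for 8 tasks.
        ET.SubElement(versions, "gpu_usage").text = str(1 / tasks_per_gpu)
        ET.SubElement(versions, "cpu_usage").text = str(cpu_usage)
    ET.indent(root)  # pretty-printing; requires Python 3.9 or newer
    return ET.tostring(root, encoding="unicode")

apps = ["milkyway", "milkyway_separation__modified_fit"]
print(build_app_config(apps, tasks_per_gpu=8))
```

Calling it with tasks_per_gpu=8 reproduces the two gpu_usage entries of 0.125 shown above; whether a given card actually validates its results at that setting is a separate question.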

When I include this file into my work folder for the 290X card, some tasks do validate, others do not. The majority does not validate. They mostly are initially categorized as inconclusive and then are directed into the "bad box".

What I want to know is why there is such a large fraction of invalid tasks. I have tested a second R9 290X, which behaves absolutely identically. Both cards work properly with ALL other tested distributed computing GPU projects. Specifically, I tested Primegrid, Folding@home, POEM@home, Collatz Conjecture, Einstein@home and SETI@home. Since I use the latest AMD drivers (and have also tested older ones from the outdated Catalyst series), I conclude two things: Neither my hardware nor the driver is the cause of the issue. Hence, something is wrong with Milkyway@home or my manual configuration as detailed above.

In order to nail down the problem, I will post three exemplary result files from my 290X card as follows.

A valid task:

Task 1764179469
Michael H.W. Weber · Log out
Name de_modfit_fast_15_3s_136_ModfitConstraints5_4_1471352126_22530160_0
Workunit 1294581971
Created 4 Sep 2016, 20:27:06 UTC
Sent 4 Sep 2016, 20:27:34 UTC
Report deadline 16 Sep 2016, 20:27:34 UTC
Received 4 Sep 2016, 20:35:22 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x0)
Computer ID 611995
Run time 1 min 19 sec
CPU time 5 sec
Validate state Valid
Credit 26.74
Device peak FLOPS 475.20 GFLOPS
Application version MilkyWay@Home
Anonymous platform (ATI GPU)
Peak working set size 91.66 MB
Peak swap size 96.41 MB
Peak disk usage 0.01 MB
Stderr output

<core_client_version>7.6.22</core_client_version>
<![CDATA[
<stderr_txt>
<search_application> milkyway_separation 1.36 Windows x86_64 double OpenCL </search_application>
Reading preferences ended prematurely
BOINC GPU type suggests using OpenCL vendor 'Advanced Micro Devices, Inc.'
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Switching to Parameter File
Using AVX path
Found 1 platform
Platform 0 information:
Name: AMD Accelerated Parallel Processing
Version: OpenCL 2.0 AMD-APP (2117.9)
Vendor: Advanced Micro Devices, Inc.
Extensions: cl_khr_icd cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_amd_event_callback cl_amd_offline_devices
Profile: FULL_PROFILE
Using device 0 on platform 0
Found 1 CL device
Device 'Hawaii' (Advanced Micro Devices, Inc.:0x1002) (CL_DEVICE_TYPE_GPU)
Board: AMD Radeon R9 200 Series
Driver version: 2117.9 (VM)
Version: OpenCL 2.0 AMD-APP (2117.9)
Compute capability: 0.0
Max compute units: 44
Clock frequency: 1080 Mhz
Global mem size: 4294967296
Local mem size: 32768
Max const buf size: 65536
Double extension: cl_khr_fp64
Build log:
--------------------------------------------------------------------------------
C:\Users\MW\AppData\Local\Temp\\OCL1492T5.cl:176:72: warning: unknown attribute 'max_constant_size' ignored
__constant real* _ap_consts __attribute__((max_constant_size(18 * sizeof(real)))),
^
C:\Users\MW\AppData\Local\Temp\\OCL1492T5.cl:178:62: warning: unknown attribute 'max_constant_size' ignored
__constant SC* sc __attribute__((max_constant_size(NSTREAM * sizeof(SC)))),
^
C:\Users\MW\AppData\Local\Temp\\OCL1492T5.cl:179:67: warning: unknown attribute 'max_constant_size' ignored
__constant real* sg_dx __attribute__((max_constant_size(256 * sizeof(real)))),
^
3 warnings generated.

--------------------------------------------------------------------------------
Estimated AMD GPU GFLOP/s: 475 SP GFLOP/s, 95 DP FLOP/s
Warning: Bizarrely low flops (95). Defaulting to 100
Using a target frequency of 60.0
Using a block size of 11264 with 4 blocks/chunk
Using clWaitForEvents() for polling (mode -1)
Range: { nu_steps = 320, mu_steps = 800, r_steps = 700 }
Iteration area: 560000
Chunk estimate: 11
Num chunks: 13
Chunk size: 45056
Added area: 25728
Effective area: 585728
Initial wait: 13 ms
Integration time: 74.696196 s. Average time per iteration = 233.425612 ms
Integral 0 time = 75.041790 s
Running likelihood with 108460 stars
Likelihood time = 2.384113 s
<background_integral> 0.000219895184258 </background_integral>
<stream_integral> 26.279213526558141 282.717596669534940 59.916729387043546 </stream_integral>
<background_likelihood> -3.519474843288314 </background_likelihood>
<stream_only_likelihood> -64.978883633533044 -3.731802400470303 -3.737878862213316 </stream_only_likelihood>
<search_likelihood> -2.974664769155546 </search_likelihood>
22:32:49 (1492): called boinc_finish

</stderr_txt>
]]>

An inconclusive task:

Task 1764185669
Michael H.W. Weber · Log out
Name de_modfit_fast_15_3s_136_ModfitConstraints5_3_1471352126_22534138_0
Workunit 1294586066
Created 4 Sep 2016, 20:31:47 UTC
Sent 4 Sep 2016, 20:32:02 UTC
Report deadline 16 Sep 2016, 20:32:02 UTC
Received 4 Sep 2016, 20:39:51 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x0)
Computer ID 611995
Run time 1 min 4 sec
CPU time 6 sec
Validate state Checked, but no consensus yet
Credit 0.00
Device peak FLOPS 475.20 GFLOPS
Application version MilkyWay@Home
Anonymous platform (ATI GPU)
Peak working set size 85.95 MB
Peak swap size 90.49 MB
Peak disk usage 0.01 MB
Stderr output

<core_client_version>7.6.22</core_client_version>
<![CDATA[
<stderr_txt>
<search_application> milkyway_separation 1.36 Windows x86_64 double OpenCL </search_application>
Reading preferences ended prematurely
BOINC GPU type suggests using OpenCL vendor 'Advanced Micro Devices, Inc.'
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Switching to Parameter File
Using AVX path
Found 1 platform
Platform 0 information:
Name: AMD Accelerated Parallel Processing
Version: OpenCL 2.0 AMD-APP (2117.9)
Vendor: Advanced Micro Devices, Inc.
Extensions: cl_khr_icd cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_amd_event_callback cl_amd_offline_devices
Profile: FULL_PROFILE
Using device 0 on platform 0
Found 1 CL device
Device 'Hawaii' (Advanced Micro Devices, Inc.:0x1002) (CL_DEVICE_TYPE_GPU)
Board: AMD Radeon R9 200 Series
Driver version: 2117.9 (VM)
Version: OpenCL 2.0 AMD-APP (2117.9)
Compute capability: 0.0
Max compute units: 44
Clock frequency: 1080 Mhz
Global mem size: 4294967296
Local mem size: 32768
Max const buf size: 65536
Double extension: cl_khr_fp64
Build log:
--------------------------------------------------------------------------------
C:\Users\MW\AppData\Local\Temp\\OCL4924T5.cl:176:72: warning: unknown attribute 'max_constant_size' ignored
__constant real* _ap_consts __attribute__((max_constant_size(18 * sizeof(real)))),
^
C:\Users\MW\AppData\Local\Temp\\OCL4924T5.cl:178:62: warning: unknown attribute 'max_constant_size' ignored
__constant SC* sc __attribute__((max_constant_size(NSTREAM * sizeof(SC)))),
^
C:\Users\MW\AppData\Local\Temp\\OCL4924T5.cl:179:67: warning: unknown attribute 'max_constant_size' ignored
__constant real* sg_dx __attribute__((max_constant_size(256 * sizeof(real)))),
^
3 warnings generated.

--------------------------------------------------------------------------------
Estimated AMD GPU GFLOP/s: 475 SP GFLOP/s, 95 DP FLOP/s
Warning: Bizarrely low flops (95). Defaulting to 100
Using a target frequency of 60.0
Using a block size of 11264 with 4 blocks/chunk
Using clWaitForEvents() for polling (mode -1)
Range: { nu_steps = 320, mu_steps = 800, r_steps = 700 }
Iteration area: 560000
Chunk estimate: 11
Num chunks: 13
Chunk size: 45056
Added area: 25728
Effective area: 585728
Initial wait: 13 ms
Integration time: 58.227728 s. Average time per iteration = 181.961650 ms
Integral 0 time = 58.710419 s
Running likelihood with 108460 stars
Likelihood time = 2.392808 s
<background_integral> 0.000256231358565 </background_integral>
<stream_integral> 36.101376679045323 312.027800009935050 101.204115896450660 </stream_integral>
<background_likelihood> -3.399075315784963 </background_likelihood>
<stream_only_likelihood> -4.530232320863708 -3.982049344512974 -3.885673391537581 </stream_only_likelihood>
<search_likelihood> -2.969699537899006 </search_likelihood>
22:37:18 (4924): called boinc_finish

</stderr_txt>
]]>

An invalid task:

Task 1764165427
Michael H.W. Weber · Log out
Name de_modfit_fast_15_3s_136_fixedangles3_3_1471352126_22478237_2
Workunit 1294526646
Created 4 Sep 2016, 20:17:26 UTC
Sent 4 Sep 2016, 20:17:28 UTC
Report deadline 16 Sep 2016, 20:17:28 UTC
Received 4 Sep 2016, 20:25:19 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x0)
Computer ID 611995
Run time 1 min 23 sec
CPU time 6 sec
Validate state Workunit error - check skipped
Credit 0.00
Device peak FLOPS 475.20 GFLOPS
Application version MilkyWay@Home
Anonymous platform (ATI GPU)
Peak working set size 86.03 MB
Peak swap size 90.61 MB
Peak disk usage 0.01 MB
Stderr output

<core_client_version>7.6.22</core_client_version>
<![CDATA[
<stderr_txt>
<search_application> milkyway_separation 1.36 Windows x86_64 double OpenCL </search_application>
Reading preferences ended prematurely
BOINC GPU type suggests using OpenCL vendor 'Advanced Micro Devices, Inc.'
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Switching to Parameter File
Using AVX path
Found 1 platform
Platform 0 information:
Name: AMD Accelerated Parallel Processing
Version: OpenCL 2.0 AMD-APP (2117.9)
Vendor: Advanced Micro Devices, Inc.
Extensions: cl_khr_icd cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_khr_dx9_media_sharing cl_amd_event_callback cl_amd_offline_devices
Profile: FULL_PROFILE
Using device 0 on platform 0
Found 1 CL device
Device 'Hawaii' (Advanced Micro Devices, Inc.:0x1002) (CL_DEVICE_TYPE_GPU)
Board: AMD Radeon R9 200 Series
Driver version: 2117.9 (VM)
Version: OpenCL 2.0 AMD-APP (2117.9)
Compute capability: 0.0
Max compute units: 44
Clock frequency: 1080 Mhz
Global mem size: 4294967296
Local mem size: 32768
Max const buf size: 65536
Double extension: cl_khr_fp64
Build log:
--------------------------------------------------------------------------------
C:\Users\MW\AppData\Local\Temp\\OCL3612T5.cl:176:72: warning: unknown attribute 'max_constant_size' ignored
__constant real* _ap_consts __attribute__((max_constant_size(18 * sizeof(real)))),
^
C:\Users\MW\AppData\Local\Temp\\OCL3612T5.cl:178:62: warning: unknown attribute 'max_constant_size' ignored
__constant SC* sc __attribute__((max_constant_size(NSTREAM * sizeof(SC)))),
^
C:\Users\MW\AppData\Local\Temp\\OCL3612T5.cl:179:67: warning: unknown attribute 'max_constant_size' ignored
__constant real* sg_dx __attribute__((max_constant_size(256 * sizeof(real)))),
^
3 warnings generated.

--------------------------------------------------------------------------------
Estimated AMD GPU GFLOP/s: 475 SP GFLOP/s, 95 DP FLOP/s
Warning: Bizarrely low flops (95). Defaulting to 100
Using a target frequency of 60.0
Using a block size of 11264 with 4 blocks/chunk
Using clWaitForEvents() for polling (mode -1)
Range: { nu_steps = 320, mu_steps = 800, r_steps = 700 }
Iteration area: 560000
Chunk estimate: 11
Num chunks: 13
Chunk size: 45056
Added area: 25728
Effective area: 585728
Initial wait: 13 ms
Integration time: 78.214496 s. Average time per iteration = 244.420301 ms
Integral 0 time = 78.526644 s
Running likelihood with 108460 stars
Likelihood time = 2.372197 s
<background_integral> 0.000221023641879 </background_integral>
<stream_integral> 56.528977956441722 341.255842918021300 64.662390777401527 </stream_integral>
<background_likelihood> -3.551674800922283 </background_likelihood>
<stream_only_likelihood> -5.829435611967259 -3.728891679824896 -3.829357027346766 </stream_only_likelihood>
<search_likelihood> -2.993528071709122 </search_likelihood>
22:23:12 (3612): called boinc_finish

</stderr_txt>
]]>

Note that these are results generated when running 8 tasks in parallel.

My system is an Intel(R) Core(TM) i5-2500K CPU @ 3.30GHz [Family 6 Model 42 Stepping 7] running Windows 7 Ultimate x64 with an MSI R9 290X Lightning GPU. The machine is equipped with 16 GB of RAM, and one CPU core is kept free to feed the GPU at maximum performance.

It is an excellent card and I am highly disappointed that this project makes so little out of it. It almost appears as if you guys have enough compute power for free. If that is the case, just let me know and I won't bother you any further as there are many projects out there which are in need of resources.

I should also note that the configuration file above previously was a different one which I had to manually update. Suddenly the older one did not work anymore. Without notice from the project on its website.

If you expect people to participate in your project in large numbers, then you need to take the utmost care to keep things as simple as possible. A person new to Milkyway@home will most likely never get an R9 290X to run for you given the manual intervention required to do so.

Michael.
17) Message boards : Number crunching : GPU WU runtimes too short: Wast of compute power (Message 64329)
Posted 11 Feb 2016 by Michael H.W. Weber
Post:
Hey Michael,

We have been looking into "bundling" work units to make them a bit bigger. This has its own problems that we are currently trying to solve. Hopefully by the end of the Summer session we will have a working option for this.

Jake W

Well, summer is over and I patiently waited for another 8 months.
How is the implementation of longer GPU WUs progressing?

Michael.
18) Message boards : Number crunching : GPU WU runtimes too short: Wast of compute power (Message 63702)
Posted 11 Jun 2015 by Michael H.W. Weber
Post:
Well, thanks for the feedback. I think it is an important issue and I hope it can be solved soon.

Michael.
19) Message boards : Number crunching : GPU WU runtimes too short: Wast of compute power (Message 63692)
Posted 10 Jun 2015 by Michael H.W. Weber
Post:
Is it possible to extend the individual runtimes of GPU WUs, please?

My AMD 290X completes its WUs within 16 to 47 sec per WU (depending on the WU type). The number of WUs that can be downloaded is limited, too. The time required to shut down one WU and start the next takes up a good fraction of these short run times, so making the WUs as short as they currently are wastes a significant proportion of computation time.
On top of that, a high frequency of internet connections is required. The system is virtually permanently up- or downloading work.

All these issues could be resolved if the WUs could be "bundled" to take, say, around 30 min to a few hours of compute time per WU.

Would that be an option?

Michael.
20) Message boards : Number crunching : AMD R9 290X does not receive any GPU work (Message 63372)
Posted 13 Apr 2015 by Michael H.W. Weber
Post:
Finally, my 290X receives WUs. These run very short, though. So far, I have observed completion times ranging only from 14 to 49 seconds per WU.
Strange...

Michael.


