Posts by BeemerBiker
1) Message boards : Number crunching : AMD FirePro S9150 (Message 67552)
Posted 21 days ago by Profile BeemerBiker


After making the change in that ati.exe I got the following results
Warp size: 64
ALU per CU: 64
Double extension: cl_khr_fp64
Double fraction: 1/4
--- ---
Estimated AMD GPU GFLOP/s: 4608 SP GFLOP/s, 1152 DP FLOP/s
Using a target frequency of 1.0
Using a block size of 10240 with 55 blocks/chunk





Isn't the FP32 to FP64 ratio 1/2?


Must I do everything?

Where the source code shows 16*4, 4, 64 (that is 40, 04, 40 in hex), change the 4 to a 2 as shown here


and get the following 1/2 ratio:

Device 'Hawaii' (Advanced Micro Devices, Inc.:0x1002) (CL_DEVICE_TYPE_GPU)
Board: AMD FirePro S9100
Driver version: 2527.9
Version: OpenCL 1.2 AMD-APP (2527.9)
Compute capability: 0.0
Max compute units: 40
Clock frequency: 900 Mhz
Global mem size: 3221225472
Local mem size: 32768
Max const buf size: 3221225472
Double extension: cl_khr_fp64
Build log:
--------------------------------------------------------------------------------
C:\Users\JSTATE~1\AppData\Local\Temp\\OCL7972T4.cl:183:72: warning: unknown attribute 'max_constant_size' ignored
__constant real* _ap_consts __attribute__((max_constant_size(18 * sizeof(real)))),
^
C:\Users\JSTATE~1\AppData\Local\Temp\\OCL7972T4.cl:185:62: warning: unknown attribute 'max_constant_size' ignored
__constant SC* sc __attribute__((max_constant_size(NSTREAM * sizeof(SC)))),
^
C:\Users\JSTATE~1\AppData\Local\Temp\\OCL7972T4.cl:186:67: warning: unknown attribute 'max_constant_size' ignored
__constant real* sg_dx __attribute__((max_constant_size(256 * sizeof(real)))),
^
3 warnings generated.
--------------------------------------------------------------------------------
Estimated AMD GPU GFLOP/s: 4608 SP GFLOP/s, 2304 DP FLOP/s
Using a target frequency of 60.0
2) Message boards : Number crunching : AMD FirePro S9150 (Message 67546)
Posted 22 days ago by Profile BeemerBiker
On paper, the S9150 and S9100 have so much potential with MilkyWay@home. Hopefully you guys can figure out what causes the invalids.


Found out a couple of things: (1) a new driver was released 5-24, and (2) how to better identify the S9xxx boards, which are not being recognized properly (this requires a binary patch to the exe).

The following info can be observed by adding
<cmdline>--verbose</cmdline>
to the app_config.xml file.
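For reference, a minimal app_config.xml sketch showing where that flag goes; the app name, plan class, and GPU usage values below are illustrative assumptions and must match what your client actually reports:

```xml
<app_config>
  <app>
    <name>milkyway</name>
    <gpu_versions>
      <!-- 0.25 would run 4 concurrent tasks per GPU; illustrative value -->
      <gpu_usage>0.25</gpu_usage>
      <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
  </app>
  <app_version>
    <app_name>milkyway</app_name>
    <plan_class>opencl_ati_101</plan_class>
    <cmdline>--verbose</cmdline>
  </app_version>
</app_config>
```

After editing, the file goes in the project directory and the client must re-read config files (or be restarted) for it to take effect.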

Warp size: 64
ALU per CU: 5
Double extension: cl_khr_fp64
Double fraction: 1/5
--- ---
Estimated AMD GPU GFLOP/s: 360 SP GFLOP/s, 72 DP FLOP/s
Warning: Bizarrely low flops (72). Defaulting to 100


The above shows that MW assumes only 5 arithmetic logic units (ALUs) per compute unit, and that 5 single-precision operations can take place in the time it takes a single double-precision operation to complete. In actuality, according to the wiki, there are 64 ALUs, and the S9150 can easily complete a double-precision operation in only twice the time it takes to do a single-precision one (5070:2530 GFLOPS).
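The estimates in these logs are consistent with a simple formula; here is a sketch of it (the formula is inferred from the --verbose numbers above, not lifted from the MilkyWay source, so treat it as an assumption):

```python
# Inferred estimate used by the ATI app:
#   SP GFLOP/s = compute_units * ALUs_per_CU * 2 (FMA = 2 FLOPs) * clock_GHz
#   DP GFLOP/s = SP GFLOP/s * double_fraction

def estimate_gflops(compute_units, alus_per_cu, clock_mhz, double_fraction):
    sp = compute_units * alus_per_cu * 2 * clock_mhz / 1000
    dp = sp * double_fraction
    return sp, dp

# Misidentified board: 5 ALUs/CU and a 1/5 DP fraction
print(estimate_gflops(40, 5, 900, 1/5))    # ~360 SP, ~72 DP -- the "bizarrely low" 72
# Patched to Hawaii: 64 ALUs/CU, 1/4 DP fraction
print(estimate_gflops(40, 64, 900, 1/4))   # ~4608 SP, ~1152 DP
# With the additional binary patch to a 1/2 DP fraction
print(estimate_gflops(40, 64, 900, 1/2))   # ~4608 SP, ~2304 DP
```

All three outputs match the numbers reported in the --verbose logs quoted in these posts, which is why the ALU count and double fraction matter so much.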

I looked at the source code and spotted a deficiency: "Hawaii" was missing from the list of AMD boards. S9xxx boards are Hawaii series.


This can be fixed (at least for me) by changing "Thames" to "Hawaii", as I do not have a Thames graphics product.

I used the free binary editor "neo" to make the change as shown here



After making the change in that ati.exe I got the following results
Warp size: 64
ALU per CU: 64
Double extension: cl_khr_fp64
Double fraction: 1/4
--- ---
Estimated AMD GPU GFLOP/s: 4608 SP GFLOP/s, 1152 DP FLOP/s
Using a target frequency of 1.0
Using a block size of 10240 with 55 blocks/chunk



Anyway, after all this work, I still do not have Tesla performance, but it is running about 20% faster. I suspect there are other factors involved, but at least my S9100 is better identified.

I am processing 4 WUs at a time on both the S9100 and S9000. Each WU is bundled as either 4 or 5 units. When I compare performance improvement I have to make sure that I am comparing the same bundles.
3) Message boards : Number crunching : AMD FirePro S9150 (Message 67536)
Posted 26 days ago by Profile BeemerBiker
did you by chance notice how hot the VRMs were?


Unfortunately, S9xxx information is not as complete as for most HD7950s. In addition to missing measurements, the clock frequency on my S9100 is not fixed at its maximum value like on the S9000 or HD79xx series. It varies with load, though it does drop to its minimum (300) with no load like the other AMD boards; it just does not stay at 800 like one would expect. Maybe this is by design. Note the S9000 (equivalent to the HD7950) is locked at 900, its maximum. According to AMD docs, the S9100 supports OpenCL 2.1, but it is being used at 1.2 according to the MW stdout report. My guess is MW did not test their program against this board to optimize their code, but I don't blame them, as this is not a widely used board: it is designed for servers and has no video output. The S9000 does have video out, but not the S9100.

[EDIT] While both HD79xx and S9000 are the same basic chip, I have given up trying to get them to co-exist on the same motherboard.



4) Message boards : Number crunching : AMD FirePro S9150 (Message 67534)
Posted 26 days ago by Profile BeemerBiker
Perhaps the errors are caused by overheating when doing too many WUs. AMD specs for the S9150 require 20 CFM at 45°C max inlet temperature.


System requirements:
- 20 CFM airflow cooling at 45°C maximum inlet temperature
- Available PCI Express x16 (dual slot), 3.0 for optimal performance
- Power supply plus one 2x4 (8-pin) and one 2x3 (6-pin) AUX power connectors
- 2GB system memory


Yes, I saw that, but even in my garage it never gets that hot. The attic on a hot day probably hits 114°F or higher in the middle of the summer.

My S9100 has only a single 8-pin connector, unlike the S9150, and less memory. I have ECC enabled and have never seen an error. There are no MW errors (invalids) when running one concurrent task at a time. The MW invalids increase exponentially as more concurrent tasks are added.

Currently it is in the garage, because the 20 CFM (or higher) blower makes too much noise. GPU-Z measures 65°C for the S9100, and slightly less for the Q9550S CPU as reported by TThrottle. It is running 3 WUs at a time and the ratio of valid to invalid stays about 500:1. When I was running 10 concurrent tasks I was getting an 8:1 ratio, and that test was run inside with A/C, probably 75°F, way under 45°C.
5) Message boards : Number crunching : New Linux system trashes all tasks (Message 67525)
Posted 21 May 2018 by Profile BeemerBiker
You are using CUDA 9.1.84, which came out last year. However, your video driver is 391.24, which is fairly recent; March 2018, it seems.

Something is not right. My Windows 7 system shows 9.1.104 for CUDA, and the driver it came with is 388.71, which is older than your driver.

Did you install the CUDA toolkit? It comes with a default driver and is only up to date when a new toolkit is released. I suggest you download last week's release, WHQL 397.64, which should get you CUDA 9.2.x.
6) Message boards : Number crunching : New Linux system trashes all tasks (Message 67522)
Posted 20 May 2018 by Profile BeemerBiker
Looks like it is picking the wrong device
Using device 2 on platform 0
Found 2 CL devices
Requested device is out of range of number found devices


Your Win7 system has 3 GTX 1070s but it only sees 2. When it picks devices 0 and 1 it gets (last time I looked at your computer) 127 valid tasks, but when it picks device 2 (the 3rd one) all tasks fail. So far, 113 of them.

Valid tasks for computer 257518
State: All (495) · In progress (240) · Validation pending (0) · Validation inconclusive (15) · Valid (127) · Invalid (0) · Error (113)
Application: All (495) · MilkyWay@Home (495) · MilkyWay@Home N-Body Simulation (0)


Strange that it sees only 2 OpenCL devices but tries the 3rd one anyway.

What does your event log show? I have three GTX 1070 Tis, and MilkyWay is using SSE4.1, not AVX, on my i9-7900X.


CUDA: NVIDIA GPU 0: GeForce GTX 1070 Ti (driver version 397.64, CUDA version 9.2, compute capability 6.1, 4096MB, 3558MB available, 8186 GFLOPS peak)
CUDA: NVIDIA GPU 1: GeForce GTX 1070 Ti (driver version 397.64, CUDA version 9.2, compute capability 6.1, 4096MB, 3558MB available, 8186 GFLOPS peak)
CUDA: NVIDIA GPU 2: GeForce GTX 1070 Ti (driver version 397.64, CUDA version 9.2, compute capability 6.1, 4096MB, 3558MB available, 8186 GFLOPS peak)
OpenCL: NVIDIA GPU 0: GeForce GTX 1070 Ti (driver version 397.64, device version OpenCL 1.2 CUDA, 8192MB, 3558MB available, 8186 GFLOPS peak)
OpenCL: NVIDIA GPU 1: GeForce GTX 1070 Ti (driver version 397.64, device version OpenCL 1.2 CUDA, 8192MB, 3558MB available, 8186 GFLOPS peak)
OpenCL: NVIDIA GPU 2: GeForce GTX 1070 Ti (driver version 397.64, device version OpenCL 1.2 CUDA, 8192MB, 3558MB available, 8186 GFLOPS peak)
Host name: JYSArea51
Processor: 20 GenuineIntel Intel(R) Core(TM) i9-7900X CPU @ 3.30GHz [Family 6 Model 85 Stepping 4]
Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 fma cx16 sse4_1 sse4_2 movebe popcnt aes f16c rdrand syscall nx lm avx avx2 vmx tm2 dca pbe fsgsbase bmi1 hle
OS: Microsoft Windows 10: Professional x64 Edition, (10.00.17134.00)
7) Message boards : Number crunching : New Linux system trashes all tasks (Message 67513)
Posted 20 May 2018 by Profile BeemerBiker
Found another errored task and this one has a lot more information in the stderr.txt output. It looks like the application had a problem compiling the OpenCL wisdom file. I have not had any issues with either Seti or Einstein compiling their OpenCL applications wisdom files.

Has anyone else had issues with Linux compiling the OpenCL wisdom files before?

Stderr output
<core_client_version>7.4.44</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)
</message>
<stderr_txt>
<search_application> milkyway_separation 1.46 Linux x86_64 double OpenCL </search_application>
[...snip...]
Build log:
--------------------------------------------------------------------------------
<kernel>:183:72: warning: unknown attribute 'max_constant_size' ignored
__constant real* _ap_consts __attribute__((max_constant_size(18 * sizeof(real)))),
^
<kernel>:185:62: warning: unknown attribute 'max_constant_size' ignored
__constant SC* sc __attribute__((max_constant_size(NSTREAM * sizeof(SC)))),
^
<kernel>:186:67: warning: unknown attribute 'max_constant_size' ignored
__constant real* sg_dx __attribute__((max_constant_size(256 * sizeof(real)))),
^
<kernel>:235:26: error: use of undeclared identifier 'inf'
tmp = mad((real) Q_INV_SQR, z * z, tmp); /* (q_invsqr * z^2) + (x^2 + y^2) */
^
<built-in>:35:19: note: expanded from here
#define Q_INV_SQR inf
^

--------------------------------------------------------------------------------
clBuildProgram: Build failure (-11): CL_BUILD_PROGRAM_FAILURE
Error building program from source (-11): CL_BUILD_PROGRAM_FAILURE
Error creating integral program from source
Failed to calculate likelihood
Background Epsilon (61.817300) must be >= 0, <= 1
18:13:51 (10595): called boinc_finish(1)

</stderr_txt>
]]>


Keith,

I recognized that Q_INV_SQR error message! Rather than duplicating stuff posted back in early 2017, I'll refer you to a thread in the Linux forum, titled "Consistent "Validate error" status", in which I mentioned some research I'd done into why the client was apparently building bad GPU kernels:

http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=4091

Also, another thread (in the Science board) called "Fix it or I'm gone":

http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=4093

In précis, it looks as if the parameter reading can get out of sync in some versions of boinclib; folks seeing that error back then cleared it by moving to newer clients (usually the 7.6 family...)

Now I know you're an enthusiastic Seti@Home person, so it might be you have a good reason for running 7.4.44 (which was reputed to have some singularities of its own!). I'll just say that I've never had any problems with Seti, Einstein or MilkyWay using client 7.6.32 or .33 (using NVidia GPUs).

Don't know whether this will have helped any in your case, but at least it explains what causes the error message!

Cheers - Al.

P.S. I think it's actually trying to build a GPU kernel, not a wisdom file (but I'm not a MilkyWay developer so I could be wrong...)


You beat me to it. I just found that 4093 thread, about INF standing for infinity, and the suggestion that old BOINC clients cause this problem. Poking around, I also found the source code that seems to cause that particular error here
8) Message boards : Number crunching : New Linux system trashes all tasks (Message 67497)
Posted 19 May 2018 by Profile BeemerBiker
This may not apply to you, but I just discovered that one of my Linux systems "lost" OpenCL. This was a minimum server install, and I mistakenly allowed it to apply updates and upgrades on its own when it was first set up several months ago.

It seems to have rebooted just recently, possibly from a power spike, but I noticed yesterday that all Asteroids, Collatz and Einstein tasks were erroring out. Collatz had almost 300 failed units.

Unaccountably, I could not reinstall my original CUDA 384 driver and had to download and install 390.59, which also failed, but at least its error messages showed up on Google and I was able to install it and get OpenCL working again.
9) Message boards : Number crunching : New Linux system trashes all tasks (Message 67484)
Posted 18 May 2018 by Profile BeemerBiker
Your line "stream sigma 0.0 is invalid" seems to be the first real error. Comparing your system with mine, the stderr is the same except I show "Using SSE4.1 path" where you have that invalid warning.

Googling that error message did not get a single hit; sorry, I'm not sure what the problem is. I assume it is related to OpenCL, as the next line after "Using SSE4.1 path" is the platform that OpenCL finds.


Your kernel 4.15.0-20 is newer than mine. I assume you have Ubuntu 18.x, while my build was 17.04.

However, you are running a really old build of BOINC. My apt-get got me 7.8.3, and you show 7.4.44, which is really old. I assume that is not a problem, since Einstein and Seti are working.

The official boinc download site shows an even older 7.4.22 but they make it clear that one should use the package manager to get the newest version.

It may not help, but you might consider getting 7.8.3.
10) Message boards : Number crunching : Which GPU? (Message 67473)
Posted 17 May 2018 by Profile BeemerBiker
Can't Milkyway use Single Precision instead? From what I've read, a double precision calculation can be emulated using two single precision calculations. This would speed things up on almost every GPU out there.


Single precision is accurate to about 7 digits and double to about 15, as illustrated here

If your accuracy requires more than 7 digits of precision, you use double precision: either in hardware or using a library that emulates double precision.
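A quick way to see the 7-vs-15-digit difference, using only Python's standard library to round-trip a value through a 32-bit IEEE-754 float:

```python
import struct

pi_64 = 3.141592653589793                               # double: ~15-16 digits
pi_32 = struct.unpack('f', struct.pack('f', pi_64))[0]  # single: ~7 digits

print(f"{pi_64:.15f}")  # 3.141592653589793
print(f"{pi_32:.15f}")  # diverges from the double after about the 7th digit
```

Any intermediate stored at single precision throws away everything past roughly the 7th significant digit, which is why emulation libraries chain two singles together to recover double-like accuracy.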

Possibly, some optimization could be done to speed things up, as the SETI Lunatics did for setiathome.

I had to program in CMS-2M on navy systems that did not support floating point hardware. It was all done using scaled arithmetic, binary angles, and trig lookup tables. Floating point hardware would have made a huge difference in cost, especially labor.
11) Message boards : Number crunching : Which GPU? (Message 67470)
Posted 17 May 2018 by Profile BeemerBiker
https://docs.google.com/spreadsheets/d/1ImSDoLeuZFvmO6xoMpy2VMs9Du_6sHT5GqcORMtx2tQ

That link should work for my Spreadsheet. Still a work in progress and there's some missing data and some extra fluff like the "fake rank" column where I decided to try and assign a very-rough estimate of overall general performance based on Cores*Clockspeed.

The main focus is showing 3 main variables (GFLOPS, Price, Watts) and how they interact.

Unfortunately due to locking/merging certain cells I cannot dynamically sort the columns how I would like. Still determining if there is a way I can do that.


Nice spreadsheet, thanks!

Want to mention that the S9000 is listed at "<225" W at the wiki, and I suspect it is no more than 200 and may be closer to 150 than 225. It has a single 8-pin PCIe connector, which has 3 12 V power leads, unlike the "200 W" 7950, which has 4 (two 6-pin) or 5 (6-pin + 8-pin).

I lost a pair of 7950s to overheating, and converted the remaining one to liquid cooling. When I got my first S9000 (new, unused, for $150, but passively cooled), I discovered the leftover HD7950 fan cooling assembly fit perfectly on the S9000. Depending on the OEM, the cooling assembly also cooled the memory chips on the GPU side of the board; MSI and PowerColor heat sinks serviced only the GPU chip. I now have 3 of the S9000s and they all run cool. I had to use a Dremel to cut a small amount of plastic from the shroud's exhaust end on a Gigabyte cooling assembly so one of the S9000s would fit in the case.

Anyone with a dead HD7950 but a working cooling assembly can essentially get a brand new, unused replacement for under $160. Crossfire is not supported, and a 3-pin Molex must be forced onto the 4-pin cooling cable to make the fans run. Since I run 24/7 this is fine for me.
12) Message boards : Number crunching : Huge number of 'Validation inconclusive' WUs (Message 67451)
Posted 11 May 2018 by Profile BeemerBiker
Oh...they are not "my" Tesla's. I wish. I got permission to play with the machines they were in. Thats also why they are hidden...didn't want the host names to get out


THIS is what happens when the host name gets out!
13) Message boards : Number crunching : New Benchmark Thread - times wanted for any hardware, CPU or GPU, old or new! (Message 67448)
Posted 10 May 2018 by Profile BeemerBiker
Just saw this thread and got the following computations using this program.

Single S9100 with Q9550s (Core 2 quad) - Three concurrent work units
Run Time (sec)   CPU Time (sec)   Credit
 93.3             19.1            227.7
100.3             19.2            227.7
 86.3             17.9            227.6
 96.3             21.1            227.7
114.4             18.5            229.7
105.3             17.8            229.4
111.4             17.7            227.6
 78.2             18.0            227.6
 93.4             13.9            228.5
 93.3             14.9            231.2
 90.3             17.7            227.6
 97.4             16.3            227.6
 94.3             16.2            227.6
128.5             14.8            231.2
 97.4             14.3            228.5
102.4             21.4            227.7
110.4             18.5            229.4
119.4             14.8            231.2
 91.3             18.0            227.6
123.4             18.6            229.7
----------------------------------
AVG: 101.3        17.4            228.6
STD:  12.7         2.1             1.3


101.3 / 3 = about 34 seconds for a single work unit


Pair of S9000s (same as HD7950) on an X5470 with a 771->775 adapter, five concurrent tasks
Run Time (sec)   CPU Time (sec)   Credit
170.5             12.0            227.6
211.8             19.0            227.7
271.5             20.5            229.3
213.1             16.5            229.4
223.1             16.9            229.7
252.5             16.6            230.8
252.5             16.5            230.8
197.8             16.6            229.4
204.3             19.4            227.7
185.9             16.3            227.6
235.5             22.3            227.3
239.4             20.3            229.3
264.2             16.9            230.8
172.7             16.7            227.6
266.4             16.4            230.8
235.0             22.3            227.3
172.0             16.4            227.7
178.4             18.8            227.7
179.7             17.0            227.7
221.3             16.6            227.6
----------------------------------
AVG: 217.4        17.7            228.7
STD:  33.1         2.3             1.3


217.4 / 5 = about 43 seconds per work unit on each board.
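The per-WU figures above follow directly from the tables; a quick check with Python's statistics module (population standard deviation appears to be what the STD row uses, but that is an assumption on my part):

```python
from statistics import mean, pstdev

# Run times (seconds) copied from the S9100 table above (3 concurrent WUs).
runtimes = [93.3, 100.3, 86.3, 96.3, 114.4, 105.3, 111.4, 78.2, 93.4, 93.3,
            90.3, 97.4, 94.3, 128.5, 97.4, 102.4, 110.4, 119.4, 91.3, 123.4]

avg = mean(runtimes)
print(avg, pstdev(runtimes))  # about 101.3 and 12.7, matching the AVG/STD rows
print(avg / 3)                # about 33.8 s per work unit at 3 concurrent
```

Dividing average wall-clock time by the concurrency level is what gives the effective seconds-per-WU figure quoted for each board.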
14) Message boards : Number crunching : Huge number of 'Validation inconclusive' WUs (Message 67447)
Posted 9 May 2018 by Profile BeemerBiker
I was crunching on 3 Tesla V100s at the same time. A WU took around 35 seconds while running 6 at a time. So a little under 6 seconds per WU, across 3 cards, which averages out to under 2 seconds per WU. That's over 43,000 a day. Now I only ran like this for a couple hours (testing), but my inconclusive total was over 2,000 in those couple hours. Out of those I threw 6 errors and 2 invalids, I have 134 still at inconclusive, and all the rest validated.


I still don't know what the issue is you guys are posting about.....


The original post was about inconclusive validations and you are correct in that if you wait them out all will eventually be validated.

OTOH, the post I made was to point out that the first system listed had 3 Titans, and 19,000 of the work units errored, which is not the same as inconclusive validations.

Looking at the task details, one observes that OpenCL was unable to find any nVidia devices although the system had 3 Titans.

My S9100 is nowhere near as fast as your Tesla. But it cost me under $300 and uses only a single 8-pin power connector, which is a plus.

Your computers are hidden, but you might want to run my program at
http://new.stateson.net/HostProjectStats to get an accurate measurement of completion time.
15) Message boards : Number crunching : Huge number of 'Validation inconclusive' WUs (Message 67434)
Posted 8 May 2018 by Profile BeemerBiker
Hello,
at present I have more than 2000 'validation inconclusive' WU (MilkyWay@Home v1.46 (opencl_ati_101)), these are of the 'Unsent' variety, on my three machines:

https://milkyway.cs.rpi.edu/milkyway//results.php?hostid=764666&offset=0&show_names=0&state=3&appid=
https://milkyway.cs.rpi.edu/milkyway//results.php?hostid=765378&offset=0&show_names=0&state=3&appid=
https://milkyway.cs.rpi.edu/milkyway//results.php?hostid=763440&offset=0&show_names=0&state=3&appid=

Any idea what's going on?

Many thanks,
max


The problem is a bug in the program. Looking at the set of errors, the first user I looked at had 3 Titans but almost 20,000 errors. If all tasks error out, the number of "pending" will rise to the total number of work units.

I thought only 80 were allowed per day. Even with 3 Titans, my math suggests it should have taken a week at 80 per 24 hours. All 19,736 were from May 7, 5 am to May 8, 1300.
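For scale, a quick back-of-the-envelope on those numbers (dates as given in the post; this only illustrates the rate, not how the quota was bypassed):

```python
from datetime import datetime

start = datetime(2018, 5, 7, 5, 0)   # May 7, 5 am
end = datetime(2018, 5, 8, 13, 0)    # May 8, 1300
hours = (end - start).total_seconds() / 3600

print(hours)               # 32.0 hours
print(19736 / hours)       # ~617 errored tasks per hour
print(19736 / hours * 24)  # ~14,800 per day, far above an 80-per-day quota
```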
16) Message boards : Number crunching : Project frequently using wrong grapics board (FP64 on FP32 only system) (Message 67429)
Posted 6 May 2018 by Profile BeemerBiker
I have been looking at why I get a lot of invalid results on my S9100 graphics board. Comparing the Task Details of my work unit with those of my wingmen, I noticed a deficiency, probably in how BOINC reports the type of graphics board and how the project chooses to use that info.

First, this work unit shows 2 errors, 2 valid and 1 (mine) invalid. 3 systems were ATI and 2 were nVidia, and the overall state was marked "too many error possible bug".

Examining each of the "error" systems shows an attempt to use the built-in Intel graphics chipset instead of the "BOINC suggested" nVidia. On both systems, the Intel GPU did not support FP64. The system with supposedly two 1060s had over 300 errors with only 27 valid, but the other system had almost 3000 errors and no other results.
I looked at the 27 valid units, and in all 27 the nVidia platform was recognized, unlike in the 364 failures. An obvious bug; probably OpenCL? Possibly MilkyWay?

This system appears to have 2 GTX 1060s, but actually that is an error in how BOINC goes about determining what's there when there are 2 or more video boards, depending on the OS. Here is a typical Task Report:
<search_application> milkyway_separation 1.46 Windows x86 double OpenCL </search_application>
BOINC GPU type suggests using OpenCL vendor 'NVIDIA Corporation'
---
Using AVX path
Found 1 platform
Platform 0 information:
  Name: Intel(R) OpenCL
  Version: OpenCL 1.2
  Vendor: Intel(R) Corporation
  Extensions: cl_khr_fp64 cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_intel_printf cl_ext_device_fission cl_intel_exec_by_local_thread cl_khr_gl_sharing cl_intel_dx9_media_sharing cl_khr_dx9_media_sharing cl_khr_d3d11_sharing
  Profile: FULL_PROFILE
Didn't find preferred platform
Using device 1 on platform 0
Failed to find number of devices (-1): CL_DEVICE_NOT_FOUND
Failed to get information about device
Error getting device and context (1): MW_CL_ERROR
Failed to calculate likelihood
-----this keeps repeating, nVidia is never found nor used although suggested----
-----that Intel graphics board does not support FP64, should have been rejected immediately----

This system has built in Intel graphics and also an nVidia 960m which is capable of (very low) double precision. BOINC, under linux, does report the correct identities of each graphics board under the hostid unlike the windows gtx1060 systems. here is the problem
<search_application> milkyway_separation 1.46 Linux x86_64 double OpenCL </search_application>
BOINC GPU type suggests using OpenCL vendor 'NVIDIA Corporation'
-----
Platform 0 information:
  Name: Intel Gen OCL Driver
  Version: OpenCL 1.2 beignet 1.1.1
  Vendor: Intel
  Extensions: cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_spir cl_khr_icd
  Profile: FULL_PROFILE
Didn't find preferred platform
Using device 0 on platform 0
Found 1 CL device
Device 'Intel(R) HD Graphics Skylake Halo GT2' (Intel:0x8086) (CL_DEVICE_TYPE_GPU)
---this repeats and there is no further mention of the nVidia board----
---that Intel chipset does not support FP64, the 960m should have been used---


So far, those two errors were because the FP64 GPU was available but not used (or couldn't be found), so there were actually two "valids" and one (mine) invalidated. The result should have been accepted.

IMHO the project should check for "double precision missing" and mark it as an error, but not use that error as part of any invalidation test.
17) Message boards : Number crunching : My number of "invalids" decreased. How is this possible? (Message 67428)
Posted 6 May 2018 by Profile BeemerBiker
Hi BeemerBiker,

It looks like the likelihood calculation is actually not being completed correctly for some small percentage of your work. Is it possible that you are running more runs than your card has memory for? Since we've moved to bundling workunits the size of a single run has increased to 500 MB on the card. Normally if you run out of memory OpenCL is supposed to throw an error, but in my experience, OpenCL sometimes fails to do what you expect it to.

Jake



I got to digging deeper into this, as there should be no reason why I am getting invalid results on my S9100 as a function of the number of concurrent tasks. Another user, melk, also discovered the same behavior. I went to my list of invalids and found some strange statistics, one of which I want to share.

First, these co-processors are 3x as fast at double precision as a typical HD7950 at the same speed (~850 MHz): FP64 is 2620 GFLOPS compared to 717. They also have 12 GB or more memory compared to 3 GB. Granted, OpenCL probably cannot access over 4 GB, and possibly uses way less than that. However, the behavior I see is that as I add concurrent tasks, the S91xx devices start generating invalid results exponentially up through about 20 concurrent WUs (all I tried), while the HD7950s generate a boatload of errors starting after 5 concurrent units; sometimes so fast that I go through all 80 I am allowed before I can suspend the project. I have a program here that I use to compute performance information when setting up a project to use concurrent tasks. What I observed I can summarize as follows:
HD7950: no invalids, and it can run 4-5 concurrent, but it looks like after 3 concurrent there is no additional benefit in throughput, and after 5 all h-ll breaks loose.

S9100
1 work unit generates no invalids that I can see.
2 work units may generate an invalid. Not sure, as I don't have time to watch it, and your policy (like all projects) drops the invalid count as data is removed from the server to make room.
3 concurrent work units probably generate invalids, but if I start with, say, 320, it may take 4-5 days before that number drops to, for example, 100.
4 concurrent work units generate enough invalids to consistently stay at 320, as I noticed it hovered at 320 without changing much over a week's time. I finally dropped ngpu to 3 as I didn't want other users to have to reprocess my invalids.
It gets much worse...
I ran 20 WUs for a short time and about one out of four were invalid. This system is extremely fast, producing a result in 35 seconds, and my run at 20 concurrent tasks for 30 minutes produced 800 invalid work units. I was not aware of this until about an hour after I started my performance test, when I got around to looking at the results. The fact that 3 out of 4 were computed correctly indicates there is a software problem in how work units are added to or removed from the GPU. This could be an OpenCL problem, not MilkyWay, as the driver is different. For example, looking at other users' S91xx results (a total of 3 users here), I also see the following warning in the "Task Details" of all results, both valid and invalid:
C:\Users\josep\AppData\Local\Temp\\OCL2292T3.cl:183:72: warning: unknown attribute 'max_constant_size' ignored
__constant real* _ap_consts __attribute__((max_constant_size(18 * sizeof(real)))),
^
C:\Users\josep\AppData\Local\Temp\\OCL2292T3.cl:185:62: warning: unknown attribute 'max_constant_size' ignored
__constant SC* sc __attribute__((max_constant_size(NSTREAM * sizeof(SC)))),
^
C:\Users\josep\AppData\Local\Temp\\OCL2292T3.cl:186:67: warning: unknown attribute 'max_constant_size' ignored
__constant real* sg_dx __attribute__((max_constant_size(256 * sizeof(real)))),
^
3 warnings generated.


I do not see the above warning on any of my HD7950 systems, and the drivers are different.

I noticed some other problems but will post on a separate thread.

Question: Is there any command-line parameter or config setting that can enable any type of debugging information? I would like to debug this problem and see if there is a fix. I am guessing that possibly the tasks are being inserted into and removed from the GPU too fast, and possibly (just a guess) the results are not being fully "saved" before they are removed from the GPU.
18) Message boards : Number crunching : AMD FirePro S9150 (Message 67393)
Posted 24 Apr 2018 by Profile BeemerBiker
Good to hear you got it working. Have some questions and an observation.

What are you using for cooling? I got a blower attachment from a 3d printer guy on ebay, but it made so much noise I put the system in my garage.



I am thinking of using a CPU liquid cooler and attaching it using a NZXT adapter.

I have an RX-560 in a Desktop Ubuntu 17.10.1 system, and I read here that this same AMD PRO driver supports the "AMD FirePro™ S-Series". I may take the S-series board off that uATX motherboard and put it into my desktop Ubuntu system, but it will not fit with that blower attachment.

I started using a Desktop Ubuntu system when I discovered that the windows version of the gridcoin wallet had serious bugs.

Your system also has the same problem with invalid tasks as mine. Your count is at 349 and slowly climbing. I hardly get any invalids on my HD7950s, nor on my S9100 if I run 1 task at a time. I am running 4 at a time right now, and I suspect these S-series boards are much faster and there is probably a bug in the OpenCL code MilkyWay uses when scheduling tasks into, and getting results out of, the graphics board.
19) Message boards : Number crunching : Titan V (Message 67375)
Posted 20 Apr 2018 by Profile BeemerBiker

Sounds to me like you should get one and find out!! 3 thousand dollars is WAAAY beyond my budget for gpu's but it sure would be nice to have one and find out how it works here!!


nVidia boards seem to hold their value, especially those Titans. OTOH, those AMD "professional" workstation boards have had huge drops in price, especially the server boards. One can easily get a new, unused S9150 for under $500, whereas the original MSRP was 10x as much. I cannot find the original article, but I read last week that some big "project" fell through in China and someone was trying to unload several lots of S9150s. Probably caused by Bitcoin losing half its value.

It would be nice if the Titan would drop by 10x, but I suspect it will never be seen on eBay for $300 except for "parts".
20) Message boards : Number crunching : n body crawling (Message 67373)
Posted 20 Apr 2018 by Profile BeemerBiker
GPUGrid's quantum chemistry application uses multiple CPUs, but is Linux-only. I believe Amicable Numbers also has a multiple-CPU application.

Also, you cannot take the initial waiting time, that 13 days, literally. It takes a while to determine how efficient the system is, and usually the time drops significantly. My problem is older Core 2 Quads that barely keep up with feeding the MilkyWay GPU tasks. I also noticed that when first attaching, or re-attaching, one can pick up an N-body task even if you have checked "DO NOT USE CPU". I posted about this over at BOINC, since it is their scheduler (I am guessing) that asks for tasks before it checks the venues, resources, or restrictions.





Copyright © 2018 AstroInformatics Group