GPU Issues Mega Thread

Message boards : News : GPU Issues Mega Thread

UnionJack
Joined: 8 Jan 10
Posts: 16
Credit: 4,783,731
RAC: 3,514

Message 65489 - Posted: 19 Oct 2016, 9:04:21 UTC

The problem is not in whether or when to use the GPU, it's that every modfit task fails. I can sit and watch while 48 tasks run for two seconds each before stopping with Computation Error.

There seems to be a coding error in modfit that prevents it from using my GPU properly, or perhaps even from using it at all. That's why I gave as much detail about the device as I could, in the hope that the devs might discover how it differs from the ones they know.
____________
Rgds
Peter.

alanb1951
Joined: 16 Mar 10
Posts: 35
Credit: 29,591,718
RAC: 11,165

Message 65499 - Posted: 19 Oct 2016, 17:17:51 UTC - in response to Message 65489.

Peter,

(Caveat - I do not work for MilkyWay@Home and am not an OpenCL programmer...)

I am not a Gentoo user, but I do know there are problems with some AMD cards and recent kernels... In my case I can't yet migrate to Ubuntu 16.04 because my old AMD card is not yet fully supported by the (newer) AMDGPU mechanisms. I'd have to use the open-source drivers (as you seem to be doing with Gentoo) and there were enough reports of problems for me not to want to go there on an active workstation!

Have you ever successfully run a Milkyway OpenCL task on this machine? If so, was it with the current version of Gentoo, the same Linux kernel, and with Clover (mesa) OpenCL?

Are you successfully running other OpenCL tasks on that machine? And if so do they also use the Clover (mesa) OpenCL stuff, and are any of them using double precision floating point?

Getting your graphics device to work with the new AMD stack might solve your problems; possibly something as simple as getting the GPU kernel compilations to use an AMD ICD would do it.

Sensible feedback on your problem may well be better sought from the Gentoo community or from experts in the open-source OpenCL software and drivers. Wherever you get help from, I hope something shows up to resolve your issues!

Al.

UnionJack
Joined: 8 Jan 10
Posts: 16
Credit: 4,783,731
RAC: 3,514

Message 65503 - Posted: 20 Oct 2016, 10:16:12 UTC - in response to Message 65499.

Hello Al,

You've got me thinking there - dangerous, but thanks!

All software in this box is as up-to-date as it can be, but you made me wonder about kernel versions and when this problem first appeared, which prompted me to look again at the amdgpu config in the kernel. I had a test-only setting in there for some reason, so I reset it and rebooted. I'm now waiting for MW@home to have some tasks for me.
____________
Rgds
Peter.

UnionJack
Joined: 8 Jan 10
Posts: 16
Credit: 4,783,731
RAC: 3,514

Message 65509 - Posted: 21 Oct 2016, 9:29:01 UTC - in response to Message 65503.

Nope, that wasn't it. I have another 43 tasks showing computation error.
____________
Rgds
Peter.

wb8ili
Joined: 18 Jul 10
Posts: 61
Credit: 240,902,526
RAC: 216,136

Message 65510 - Posted: 21 Oct 2016, 11:50:20 UTC

On UnionJack's issue -

It looks like the problem is in building the CL program. See the error below from one of the tasks.

Does anybody know what causes that?


Found 1 CL device
Device 'AMD TONGA (DRM 3.3.0 / 4.8.2-gentoo, LLVM 3.8.1)' (AMD:0x1002) (CL_DEVICE_TYPE_GPU)
Board:
Driver version: 13.0.0-rc1
Version: OpenCL 1.1 Mesa 13.0.0-rc1
Compute capability: 0.0
Max compute units: 32
Clock frequency: 0 Mhz
Global mem size: 4293156864
Local mem size: 32768
Max const buf size: 2147483647
Double extension: cl_khr_fp64
clBuildProgram: Build failure (-43): CL_INVALID_BUILD_OPTIONS
Error building program from source (-43): CL_INVALID_BUILD_OPTIONS
Error creating integral program from source
Failed to calculate likelihood
<background_integral> nan </background_integral>
<stream_integral> nan nan nan </stream_integral>
<background_likelihood> nan </background_likelihood>
<stream_only_likelihood> nan nan nan </stream_only_likelihood>
<search_likelihood> nan </search_likelihood>
15:50:55 (24757): called boinc_finish(1)
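The failure in that log happens before any computation: clBuildProgram returns -43 (CL_INVALID_BUILD_OPTIONS), every likelihood comes back as nan, and the task ends with boinc_finish(1). In OpenCL that code means the implementation rejected the option string passed to the compiler, rather than the kernel source itself. One plausible reading is that Mesa's Clover (which reports OpenCL 1.1 here) refuses one of the flags the application passes, but the flags are not shown in the log, so that is only a guess. Below is a minimal Python sketch for pulling these markers out of a task's stderr output; the function name summarize_stderr is hypothetical, and only a handful of codes from the Khronos CL/cl.h header are mapped.

```python
import re

# A few OpenCL status codes, with the values defined in the Khronos CL/cl.h header.
OPENCL_ERRORS = {
    -3: "CL_COMPILER_NOT_AVAILABLE",
    -11: "CL_BUILD_PROGRAM_FAILURE",
    -42: "CL_INVALID_BINARY",
    -43: "CL_INVALID_BUILD_OPTIONS",
    -44: "CL_INVALID_PROGRAM",
}


def summarize_stderr(text):
    """Summarize a task's stderr: OpenCL error codes, NaN result fields, exit status."""
    codes = []
    # Negative numbers in parentheses, e.g. "Build failure (-43)". The PID in
    # "15:50:55 (24757)" is positive, so it is not picked up.
    for match in re.finditer(r"\((-\d+)\)", text):
        code = int(match.group(1))
        if code not in codes:
            codes.append(code)
    errors = [(code, OPENCL_ERRORS.get(code, "unknown")) for code in codes]
    # Result tags whose value is NaN, e.g. "<search_likelihood> nan </search_likelihood>".
    nan_fields = re.findall(r"<(\w+)>\s*nan\b", text)
    # The status passed to boinc_finish(); non-zero signals an error to the client.
    finish = re.search(r"boinc_finish\((\d+)\)", text)
    exit_status = int(finish.group(1)) if finish else None
    return {"errors": errors, "nan_fields": nan_fields, "exit_status": exit_status}


if __name__ == "__main__":
    sample = """clBuildProgram: Build failure (-43): CL_INVALID_BUILD_OPTIONS
Error building program from source (-43): CL_INVALID_BUILD_OPTIONS
<search_likelihood> nan </search_likelihood>
15:50:55 (24757): called boinc_finish(1)
"""
    print(summarize_stderr(sample))
```

Pasting the stderr of several failed tasks through it would show quickly whether they all die at the same build step, which is the question the devs would need answered first.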

UnionJack
Joined: 8 Jan 10
Posts: 16
Credit: 4,783,731
RAC: 3,514

Message 65515 - Posted: 22 Oct 2016, 10:50:04 UTC

Something else is going wrong now. I decided to refuse modfit tasks until I had some solution, so I went to my preferences page, deselected modfit, told it not to accept tasks for other applications, then updated the project preferences - both on the website and in boincmgr. It still sent me modfit tasks. I found two applications listed in account_milkyway.cs.rpi.edu_milkyway.xml: numbers 3 and 7.

Next I detached from the project, checked that no milky* or Milky* files or directories existed, went back to the project preferences web page, told it to update once more, then reattached to the project in boincmgr.

It STILL sent me modfit tasks! Grr!

How do I tame this beast?
____________
Rgds
Peter.
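Since the client kept receiving modfit work after the preference change, one way to check what it actually holds is to read the application selection out of the account file Peter mentions. Below is a minimal Python sketch, assuming (this is not confirmed in the thread) that BOINC stores the selection as <app_id> elements plus an <allow_non_selected_apps> flag inside the project preferences. The function name selected_apps is hypothetical, and which ID corresponds to modfit is not stated here.

```python
import xml.etree.ElementTree as ET


def selected_apps(account_xml):
    """Return (app_ids, allow_non_selected) from a BOINC account file's contents.

    ASSUMPTION: the project preferences store each selected application as
    <app_id>N</app_id> and an <allow_non_selected_apps> flag. The exact nesting
    is not confirmed here, so the whole tree is searched.
    """
    root = ET.fromstring(account_xml)
    app_ids = [int(elem.text) for elem in root.iter("app_id")
               if elem.text and elem.text.strip().isdigit()]
    flag = root.find(".//allow_non_selected_apps")
    allow_non_selected = flag is not None and (flag.text or "").strip() == "1"
    return app_ids, allow_non_selected


if __name__ == "__main__":
    import sys
    # e.g. python3 check_prefs.py account_milkyway.cs.rpi.edu_milkyway.xml
    with open(sys.argv[1], "rb") as f:
        print(selected_apps(f.read()))
```

If the file still lists the application you deselected after a project update, that would suggest the new preferences never reached the client; if it matches the web page, the scheduler is the more likely place to look. With venues (home/school/work) the file may hold several preference sets, and this sketch collects IDs from all of them.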

draco_seti
Joined: 23 Jan 14
Posts: 8
Credit: 36,283
RAC: 0

Message 66037 - Posted: 28 Dec 2016, 20:16:53 UTC

Is there anything new on this?
I have two machines: an Intel i7 with an MSI GTX 750 Ti, and an i5 with an NVIDIA GT 730.

Both run the latest Slackware Linux
and the latest BOINC.
The 750 Ti has the latest NVIDIA driver from the long-lived branch; the GT 730 has the latest legacy driver, also from the NVIDIA site.
I put both up about a week ago.

On both I get a lot of "invalid" results at Milkyway@Home.
At SETI@home and Rosetta everything was OK: no errors and no invalid results. It happens only at Milkyway.

All hardware is stock, with no overclocking or anything like that.

I searched Google for "milkyway at home invalid", but couldn't find anything new on the topic.

Do all the thousands of other Milkyway participants with NVIDIA GPUs not have any problems?

Am I the only one so lucky...? :-O

[AF>EDLS]GuL
Joined: 5 Jun 08
Posts: 21
Credit: 235,617,506
RAC: 3,054

Message 66038 - Posted: 28 Dec 2016, 20:36:57 UTC - in response to Message 66037.

@draco_seti
It may be on your side: I just tried with a Quadro 4000 on Fedora 24 with driver 367.57 and it worked perfectly. Have you tried resetting the project?

draco_seti
Joined: 23 Jan 14
Posts: 8
Credit: 36,283
RAC: 0

Message 66039 - Posted: 28 Dec 2016, 20:40:54 UTC - in response to Message 66038.

Yes, on one machine I reset the Milkyway project; on the other I use a freshly attached client.

None of the other projects I participate in show errors, only Milkyway:

https://setiathome.berkeley.edu/show_user.php?userid=7984269

draco_seti
Joined: 23 Jan 14
Posts: 8
Credit: 36,283
RAC: 0

Message 66040 - Posted: 28 Dec 2016, 20:44:18 UTC

I have a 64-bit system, and the latest drivers from
http://www.nvidia.com/object/unix.html

Linux x86_64/AMD64/EM64T:
Latest Long Lived Branch version: 375.26 - on the i7 machine with the GTX 750 Ti
Latest Legacy GPU version (340.xx series): 340.101 - on the i5 with the GT 730

Both show almost all of their Milkyway results as invalid... :\

Peter Hucker
Joined: 5 Jul 11
Posts: 138
Credit: 27,075,696
RAC: 0

Message 66041 - Posted: 28 Dec 2016, 20:47:57 UTC - in response to Message 66040.

Maybe there is a bug in the latest Linux GPU driver? Try going back one?

[AF>EDLS]GuL
Joined: 5 Jun 08
Posts: 21
Credit: 235,617,506
RAC: 3,054

Message 66042 - Posted: 28 Dec 2016, 20:52:09 UTC - in response to Message 66040.
Last modified: 28 Dec 2016, 20:54:42 UTC

Are your cards overclocked? Milkyway is very demanding on hardware. Can you describe their ventilation? If you can add a side intake, it will help cool your VRM (Voltage Regulation Module) and memory.

Edit: for drivers, prefer your distribution's packages over the ones on the NVIDIA site. You will avoid compatibility problems.

draco_seti
Joined: 23 Jan 14
Posts: 8
Credit: 36,283
RAC: 0

Message 66043 - Posted: 28 Dec 2016, 20:57:59 UTC - in response to Message 66042.

All hardware runs at standard frequencies and voltages, and both cases are open.
The GTX 750 Ti temperature is around 60 °C, going by nvidia-smi output.

There is no evidence of problems on my side, and other projects work fine, as I said earlier....
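A single look at nvidia-smi shows the temperature at that moment; logging it while Milkyway tasks run gives a better picture of whether the card heats up under that particular load. Below is a minimal Python sketch built on nvidia-smi's --query-gpu and --format options. Note that temperature.gpu is the GPU core temperature, so this does not answer the VRM question; the helper names parse_gpu_temps and read_gpu_temps are hypothetical.

```python
import subprocess

# nvidia-smi query: one CSV line per GPU, without a header row or units.
QUERY = ["nvidia-smi", "--query-gpu=name,temperature.gpu",
         "--format=csv,noheader,nounits"]


def parse_gpu_temps(output):
    """Turn lines like 'GeForce GTX 750 Ti, 60' into [(name, degrees_C or None), ...]."""
    readings = []
    for line in output.strip().splitlines():
        # Split on the last comma so a comma inside the GPU name cannot break parsing.
        name, temp = line.rsplit(",", 1)
        temp = temp.strip()
        # Some boards report "[N/A]" instead of a number; keep those as None.
        readings.append((name.strip(), int(temp) if temp.isdigit() else None))
    return readings


def read_gpu_temps():
    """Run nvidia-smi once and return the parsed readings."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_temps(result.stdout)


if __name__ == "__main__":
    import time
    # Print a reading every 30 seconds while tasks run; stop with Ctrl+C.
    while True:
        print(time.strftime("%H:%M:%S"), read_gpu_temps())
        time.sleep(30)
```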

[AF>EDLS]GuL
Joined: 5 Jun 08
Posts: 21
Credit: 235,617,506
RAC: 3,054

Message 66044 - Posted: 28 Dec 2016, 21:23:00 UTC - in response to Message 66043.

If you can, test them under Windows with FurMark and check their VRM temperatures with GPU-Z. I don't know of any Linux software that provides this information.
You can also test your cards in a computer you know works on Milkyway.

draco_seti
Joined: 23 Jan 14
Posts: 8
Credit: 36,283
RAC: 0

Message 66047 - Posted: 29 Dec 2016, 17:16:28 UTC - in response to Message 66044.

My choice is far more rational: I set No New Tasks for the Milkyway project, since I can spend my CPU and GPU time on projects that work correctly and produce valid results :D

wb8ili
Joined: 18 Jul 10
Posts: 61
Credit: 240,902,526
RAC: 216,136

Message 66048 - Posted: 29 Dec 2016, 17:28:41 UTC

Tell me if I am wrong, but -

I don't think the combination of LINUX and NVIDIA cards has worked (returned valid results) since the tasks were bundled. At least mine haven't.

Windows + Nvidia = OK
Windows + ATI = OK
Linux + ATI = OK
Linux + NVIDIA = All results invalid.

Jake indicated he was going to fix this (see messages 65807 and 65809 on 14 Nov 2016), but as far as I can tell it didn't happen.

draco_seti
Joined: 23 Jan 14
Posts: 8
Credit: 36,283
RAC: 0

Message 66049 - Posted: 29 Dec 2016, 18:05:35 UTC - in response to Message 66048.

Thanks for the clarification. That means the problem is not on my side, but in the Milkyway software.

In any case, until I hear that the problem is solved, I don't want to spend electricity producing invalid results, so I've switched to SETI@home, Cosmology@Home, Einstein@Home, climateprediction.net and GPUGRID :)

captainjack
Joined: 22 Jun 13
Posts: 40
Credit: 43,700,032
RAC: 0

Message 66050 - Posted: 29 Dec 2016, 18:37:55 UTC

wb8ili wrote:

Tell me if I am wrong, but -

I don't think the combination of LINUX and NVIDIA cards has worked (returned valid results) since the tasks were bundled. At least mine haven't.


All of my LINUX (Ubuntu) and NVIDIA tasks are validated.

Please post the first 50 or so lines from the event log after a restart so we can see how the system in question is set up. Maybe it will give us some clues.
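Because the client appends to its event log across restarts, the first 50 lines of the log file may not come from the latest start. Below is a minimal Python sketch that takes the 50 lines beginning at the most recent "Starting BOINC client" entry and picks out the CUDA/OpenCL detection lines, which carry the driver versions discussed in this thread. It assumes the client writes its event log to stdoutdae.txt in the BOINC data directory, as standard BOINC clients do; the function names are hypothetical. As written it drops anything logged just before that entry, such as the cc_config.xml notice.

```python
from pathlib import Path

MARKER = "Starting BOINC client"   # logged near the top of each client start
GPU_TAGS = ("CUDA:", "OpenCL:")    # lines where detected GPUs and drivers are reported


def read_log(path):
    """Read the event log; errors='replace' tolerates stray non-UTF-8 bytes."""
    return Path(path).read_text(encoding="utf-8", errors="replace").splitlines()


def last_startup(lines, n_lines=50):
    """Return up to n_lines lines starting at the most recent client start."""
    starts = [i for i, line in enumerate(lines) if MARKER in line]
    begin = starts[-1] if starts else 0
    return lines[begin:begin + n_lines]


def gpu_lines(block):
    """Keep only the lines that report detected GPUs and their drivers."""
    return [line for line in block if any(tag in line for tag in GPU_TAGS)]


if __name__ == "__main__":
    import sys
    # e.g. python3 startup_log.py /home/bob/BOINC/stdoutdae.txt
    block = last_startup(read_log(sys.argv[1]))
    print("\n".join(block))
    print("--- GPU detection ---")
    print("\n".join(gpu_lines(block)))
```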

[AF>EDLS]GuL
Joined: 5 Jun 08
Posts: 21
Credit: 235,617,506
RAC: 3,054

Message 66051 - Posted: 29 Dec 2016, 19:03:01 UTC - in response to Message 66048.


I don't know if it works on all cards, but on one of mine it's OK: https://milkyway.cs.rpi.edu/milkyway/result.php?resultid=1923858198

wb8ili
Joined: 18 Jul 10
Posts: 61
Credit: 240,902,526
RAC: 216,136

Message 66052 - Posted: 29 Dec 2016, 19:05:48 UTC

captainjack -

Yes, yours are working.

Here is my event log -

Click on my user ID and go to the computer "Desktop". Check today's invalid tasks. There may be something important in the stderr output.

If there is something wrong on my end, I certainly would like to fix it.

This is not the only computer where I have this issue. Like draco_seti, my computers work fine on SETI and Einstein.



Thu 29 Dec 2016 01:45:53 PM EST | | cc_config.xml not found - using defaults
Thu 29 Dec 2016 01:45:53 PM EST | | Starting BOINC client version 7.2.42 for x86_64-pc-linux-gnu
Thu 29 Dec 2016 01:45:53 PM EST | | log flags: file_xfer, sched_ops, task
Thu 29 Dec 2016 01:45:53 PM EST | | Libraries: libcurl/7.47.0 OpenSSL/1.0.2g zlib/1.2.8 libidn/1.32 librtmp/2.3
Thu 29 Dec 2016 01:45:53 PM EST | | Data directory: /home/bob/BOINC
Thu 29 Dec 2016 01:45:53 PM EST | | CUDA: NVIDIA GPU 0: GeForce GTX 650 Ti BOOST (driver version unknown, CUDA version 8.0, compute capability 3.0, 1994MB, 1849MB available, 1586 GFLOPS peak)
Thu 29 Dec 2016 01:45:53 PM EST | | OpenCL: NVIDIA GPU 0: GeForce GTX 650 Ti BOOST (driver version 367.57, device version OpenCL 1.2 CUDA, 1994MB, 1849MB available, 1586 GFLOPS peak)
Thu 29 Dec 2016 01:45:53 PM EST | | Host name: Desktop
Thu 29 Dec 2016 01:45:53 PM EST | | Processor: 4 AuthenticAMD AMD Phenom(tm) II X4 945 Processor [Family 16 Model 4 Stepping 3]
Thu 29 Dec 2016 01:45:53 PM EST | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt hw_pstate vmmcall npt lbrv svm_lock nrip_save
Thu 29 Dec 2016 01:45:53 PM EST | | OS: Linux: 4.4.0-57-generic
Thu 29 Dec 2016 01:45:53 PM EST | | Memory: 3.86 GB physical, 4.00 GB virtual
Thu 29 Dec 2016 01:45:53 PM EST | | Disk: 69.30 GB total, 54.58 GB free
Thu 29 Dec 2016 01:45:53 PM EST | | Local time is UTC -5 hours
Thu 29 Dec 2016 01:45:53 PM EST | climateprediction.net | Found app_config.xml
Thu 29 Dec 2016 01:45:53 PM EST | SETI@home | Found app_config.xml
Thu 29 Dec 2016 01:45:53 PM EST | Einstein@Home | Found app_config.xml
Thu 29 Dec 2016 01:45:53 PM EST | Einstein@Home | Your app_config.xml file refers to an unknown application 'dummy'. Known applications: 'hsgamma_FGRP4', 'hsgamma_FGRPB1', 'einstein_O1AS20-100T', 'einstein_O1AS20-100I', 'einsteinbinary_BRP6', 'einsteinbinary_BRP4G', 'einstein_O1MD1G', 'hsgamma_FGRPB1G'
Thu 29 Dec 2016 01:45:53 PM EST | Milkyway@Home | Found app_config.xml
Thu 29 Dec 2016 01:45:53 PM EST | climateprediction.net | URL http://climateprediction.net/; Computer ID 1378464; resource share 100
Thu 29 Dec 2016 01:45:53 PM EST | SETI@home | URL http://setiathome.berkeley.edu/; Computer ID 7815982; resource share 1
Thu 29 Dec 2016 01:45:53 PM EST | Einstein@Home | URL http://einstein.phys.uwm.edu/; Computer ID 12109559; resource share 1
Thu 29 Dec 2016 01:45:53 PM EST | Milkyway@Home | URL http://milkyway.cs.rpi.edu/milkyway/; Computer ID 636700; resource share 1
Thu 29 Dec 2016 01:45:53 PM EST | Milkyway@Home | General prefs: from Milkyway@Home (last modified 07-Jul-2014 20:27:35)
Thu 29 Dec 2016 01:45:53 PM EST | Milkyway@Home | Computer location: school
Thu 29 Dec 2016 01:45:53 PM EST | Milkyway@Home | General prefs: no separate prefs for school; using your defaults
Thu 29 Dec 2016 01:45:53 PM EST | | Reading preferences override file
Thu 29 Dec 2016 01:45:53 PM EST | | Preferences:
Thu 29 Dec 2016 01:45:53 PM EST | | max memory usage when active: 3951.39MB
Thu 29 Dec 2016 01:45:53 PM EST | | max memory usage when idle: 3951.39MB
Thu 29 Dec 2016 01:45:53 PM EST | | max disk usage: 58.35GB
Thu 29 Dec 2016 01:45:53 PM EST | | (to change preferences, visit a project web site or select Preferences in the Manager)
Thu 29 Dec 2016 01:45:53 PM EST | | Not using a proxy
Thu 29 Dec 2016 01:45:54 PM EST | | Running CPU benchmarks
Thu 29 Dec 2016 01:45:54 PM EST | | Suspending computation - CPU benchmarks in progress
Thu 29 Dec 2016 01:46:18 PM EST | Milkyway@Home | work fetch resumed by user
Thu 29 Dec 2016 01:46:26 PM EST | | Benchmark results:
Thu 29 Dec 2016 01:46:26 PM EST | | Number of CPUs: 4
Thu 29 Dec 2016 01:46:26 PM EST | | 2859 floating point MIPS (Whetstone) per CPU
Thu 29 Dec 2016 01:46:26 PM EST | | 13294 integer MIPS (Dhrystone) per CPU
Thu 29 Dec 2016 01:46:29 PM EST | Milkyway@Home | update requested by user
Thu 29 Dec 2016 01:46:30 PM EST | Milkyway@Home | Sending scheduler request: Requested by user.
Thu 29 Dec 2016 01:46:30 PM EST | Milkyway@Home | Requesting new tasks for NVIDIA
Thu 29 Dec 2016 01:46:32 PM EST | Milkyway@Home | Scheduler request completed: got 8 new tasks
Thu 29 Dec 2016 01:46:34 PM EST | Milkyway@Home | Starting task de_modfit_fast_19_3s_140_bundle5_ModfitConstraints3_3_1480516808_9968346_0
Thu 29 Dec 2016 01:46:34 PM EST | Milkyway@Home | Starting task de_modfit_fast_19_3s_140_bundle5_ModfitConstraints3_3_1480516808_9968342_0
Thu 29 Dec 2016 01:46:37 PM EST | climateprediction.net | Sending scheduler request: To fetch work.
Thu 29 Dec 2016 01:46:37 PM EST | climateprediction.net | Requesting new tasks for CPU
Thu 29 Dec 2016 01:46:39 PM EST | climateprediction.net | Scheduler request completed: got 0 new tasks
Thu 29 Dec 2016 01:46:39 PM EST | climateprediction.net | Project has no tasks available
Thu 29 Dec 2016 01:48:05 PM EST | Milkyway@Home | Sending scheduler request: To fetch work.
Thu 29 Dec 2016 01:48:05 PM EST | Milkyway@Home | Requesting new tasks for NVIDIA
Thu 29 Dec 2016 01:48:07 PM EST | Milkyway@Home | Scheduler request completed: got 5 new tasks
Thu 29 Dec 2016 01:49:43 PM EST | Milkyway@Home | Sending scheduler request: To fetch work.
Thu 29 Dec 2016 01:49:43 PM EST | Milkyway@Home | Requesting new tasks for NVIDIA
Thu 29 Dec 2016 01:49:45 PM EST | Milkyway@Home | Scheduler request completed: got 3 new tasks
Thu 29 Dec 2016 01:51:21 PM EST | Milkyway@Home | Sending scheduler request: To fetch work.
Thu 29 Dec 2016 01:51:21 PM EST | Milkyway@Home | Requesting new tasks for NVIDIA
Thu 29 Dec 2016 01:51:23 PM EST | Milkyway@Home | Scheduler request completed: got 2 new tasks
Thu 29 Dec 2016 01:52:59 PM EST | Milkyway@Home | Sending scheduler request: To fetch work.
Thu 29 Dec 2016 01:52:59 PM EST | Milkyway@Home | Requesting new tasks for NVIDIA
Thu 29 Dec 2016 01:53:01 PM EST | Milkyway@Home | Scheduler request completed: got 2 new tasks
Thu 29 Dec 2016 01:54:37 PM EST | Milkyway@Home | Sending scheduler request: To fetch work.
Thu 29 Dec 2016 01:54:37 PM EST | Milkyway@Home | Requesting new tasks for NVIDIA
Thu 29 Dec 2016 01:54:39 PM EST | Milkyway@Home | Scheduler request completed: got 1 new tasks
Thu 29 Dec 2016 01:56:15 PM EST | Milkyway@Home | Sending scheduler request: To fetch work.
Thu 29 Dec 2016 01:56:15 PM EST | Milkyway@Home | Requesting new tasks for NVIDIA
Thu 29 Dec 2016 01:56:17 PM EST | Milkyway@Home | Scheduler request completed: got 1 new tasks
Thu 29 Dec 2016 01:56:27 PM EST | Milkyway@Home | work fetch suspended by user
Thu 29 Dec 2016 01:58:42 PM EST | Milkyway@Home | Computation for task de_modfit_fast_19_3s_140_bundle5_ModfitConstraints3_3_1480516808_9968346_0 finished
Thu 29 Dec 2016 01:58:42 PM EST | Milkyway@Home | Starting task de_modfit_fast_19_3s_140_bundle5_ModfitConstraints3_3_1480516808_9934708_1
Thu 29 Dec 2016 01:58:43 PM EST | Milkyway@Home | Computation for task de_modfit_fast_19_3s_140_bundle5_ModfitConstraints3_3_1480516808_9968342_0 finished
Thu 29 Dec 2016 01:58:43 PM EST | Milkyway@Home | Starting task de_modfit_fast_19_3s_140_bundle5_ModfitConstraints3_3_1480516808_9968347_0
Thu 29 Dec 2016 01:58:43 PM EST | Milkyway@Home | Sending scheduler request: To report completed tasks.
Thu 29 Dec 2016 01:58:43 PM EST | Milkyway@Home | Reporting 2 completed tasks
Thu 29 Dec 2016 01:58:43 PM EST | Milkyway@Home | Not requesting tasks: "no new tasks" requested via Manager
Thu 29 Dec 2016 01:58:45 PM EST | Milkyway@Home | Scheduler request completed

Copyright © 2019 AstroInformatics Group