MilkyWay@home Version 1.38 Released

Message boards : News : MilkyWay@home Version 1.38 Released

[VENETO] boboviz
Joined: 10 Feb 09
Posts: 29
Credit: 4,447,129
RAC: 27,333

Message 65260 - Posted: 25 Sep 2016, 7:21:21 UTC - in response to Message 65245.

I actually have a pretty good idea as to why the server isn't sending work units to new GPUs. Turns out my server updating was never updating boinc specific daemons like the scheduler, feeder and a few others.


I assume you used this tool... or not?
https://boinc.berkeley.edu/trac/wiki/ToolUpgrade

Sebastian*
Joined: 8 Apr 09
Posts: 64
Credit: 4,843,634,662
RAC: 4,025,237

Message 65286 - Posted: 26 Sep 2016, 19:05:27 UTC
Last modified: 26 Sep 2016, 19:20:45 UTC

Looks like my HD 4850 GPUs don't get work any longer. They got work on the 24th. Now, when I update BOINC, I get the message that I received 0 new tasks. Has something changed between then and now? Did the downtime cause the issue?

http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=307550

The computers with the HD 7970 get work, though.

And I get a lot of this in the BOINC Event Log:

26/09/2016 21:18:26 | Milkyway@Home | Message from task: 0

It seems to show up once a WU is finished.

Something changed because of the downtime. Any ideas?

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 440
Credit: 10,325,706
RAC: 201,402

Message 65291 - Posted: 26 Sep 2016, 19:52:53 UTC

Sebastian,

It's strange that the HD4850s don't get work anymore. I think it might have to do with the update to the scheduler I did this morning. It looks like it recognizes newer cards but not older ones. I wonder if I can write a custom patch for the scheduler to increase the number of cards it recognizes.

Jake

Sebastian*
Joined: 8 Apr 09
Posts: 64
Credit: 4,843,634,662
RAC: 4,025,237

Message 65300 - Posted: 27 Sep 2016, 1:02:03 UTC

OK, the R9 390X seems to work; it's getting work fine. I'm not sure whether it will return good results. I had to plug in a monitor to get it to run the WUs; without one, they were stuck at 100%.

Computer: http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=704045

Let me know when you have updated the scheduler. Then I will give the HD 4850s another try.

Sebastian*
Joined: 8 Apr 09
Posts: 64
Credit: 4,843,634,662
RAC: 4,025,237

Message 65322 - Posted: 27 Sep 2016, 18:20:56 UTC

On my R9 390X computer, Windows 10 just updated to the Anniversary Edition, I guess. Now, after one WU finishes (several others ran through fine), the GPU driver seems to crash and the WUs get stuck.
I had to install the GPU drivers again, so I am not sure which driver Windows used, but BOINC detected two 390X GPUs although only one is installed. With the official AMD driver everything runs fine until one WU causes the driver to crash.

The 280X GPUs seem to run fine on other Win10 boxes with the same version. Does anyone have an idea what the root cause could be? I used DDU (Display Driver Uninstaller) to get rid of the Win10 drivers.

For now, something is really broken.

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 440
Credit: 10,325,706
RAC: 201,402

Message 65325 - Posted: 27 Sep 2016, 18:36:24 UTC

Hey Sebastian,

The Windows 10 Anniversary Edition is known to uninstall the proprietary GPU drivers you had installed. I would recommend reinstalling all of the AMD drivers.

Could something else be causing issues? I noticed that you're actually calculating incorrect likelihoods for certain work units, which is concerning. Maybe the star files in your BOINC directory are somehow getting tampered with?

Jake
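Spotting "incorrect likelihoods" generally means comparing one host's result against other hosts that ran the same work unit. The sketch below is a hypothetical illustration of that idea, not MilkyWay@home's actual validator: the `{host: [likelihoods]}` layout, the helper names, and the `REL_TOL` value are all assumptions made for the example.

```python
import math

# Hypothetical relative tolerance; the project's real validator settings
# are not given in this thread.
REL_TOL = 1e-9


def likelihoods_agree(a, b, rel_tol=REL_TOL):
    """True if two hosts reported the same number of likelihoods and each
    pair of values matches within the relative tolerance."""
    if len(a) != len(b):
        return False
    return all(math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(a, b))


def disagreeing_hosts(results, rel_tol=REL_TOL):
    """Given {host_name: [likelihoods...]} for one work unit, return the
    hosts whose values match no other host in the quorum."""
    outliers = []
    for host, values in results.items():
        matches = sum(
            1
            for other, other_values in results.items()
            if other != host and likelihoods_agree(values, other_values, rel_tol)
        )
        if matches == 0:
            outliers.append(host)
    return outliers


if __name__ == "__main__":
    # Hosts A and B agree; host C reports a different first likelihood.
    quorum = {
        "A": [-3.1234567, -2.9876543],
        "B": [-3.1234567, -2.9876543],
        "C": [-3.1000000, -2.9876543],
    }
    print(disagreeing_hosts(quorum))
```

With only two hosts that disagree, both are flagged, since neither result can be preferred without a third opinion.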

Sebastian*
Joined: 8 Apr 09
Posts: 64
Credit: 4,843,634,662
RAC: 4,025,237

Message 65328 - Posted: 27 Sep 2016, 20:01:26 UTC

Reinstalling the drivers did not help.

Since the system runs on an SSD, I will try resetting the project tomorrow. It could be that the Win10 update caused so much wear on the SSD that it is starting to degrade now.

Any update on the HD 4xxx cards? (Scheduler fix for older cards)

And my 280X cards don't get work at the moment either. I will report tomorrow whether it got better.

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 440
Credit: 10,325,706
RAC: 201,402

Message 65330 - Posted: 27 Sep 2016, 20:32:50 UTC

Sebastian,

No update on the scheduler fix yet. I have been working on ensuring a stable work unit flow to users, and I think I almost have that figured out without breaking other BOINC subsystems. I'll get back to you when I have time to look into it.

Jake

Profile Werkstatt
Joined: 19 Feb 08
Posts: 350
Credit: 123,169,995
RAC: 73,213

Message 65332 - Posted: 27 Sep 2016, 20:37:11 UTC - in response to Message 65291.


I can confirm that HD4850 does not get work atm.

Profile JumpinJohnny
Joined: 29 Mar 13
Posts: 5
Credit: 31,192,336
RAC: 0

Message 65334 - Posted: 27 Sep 2016, 23:30:25 UTC - in response to Message 65332.

Ditto for the HD4870: confirmed, no work yet.

cowboy2199
Joined: 28 May 14
Posts: 2
Credit: 8,507,048
RAC: 0

Message 65335 - Posted: 28 Sep 2016, 2:34:24 UTC

Good evening Jake,

Is there any reason why my AVG antivirus would flag the EXE as a threat?

F:\Boinc Data\projects\milkyway.cs.rpi.edu_milkyway\milkyway_1.39_windows_x86_64.exe had to be set as an exception so that AVG would stop trying to kill it.

Could that be what is happening on some of the other computers with validation issues?

Vortac
Joined: 22 Apr 09
Posts: 77
Credit: 1,052,831,093
RAC: 44,963

Message 65336 - Posted: 28 Sep 2016, 6:15:45 UTC - in response to Message 65322.

Unfortunately, I had the same problem on Win10, although it was the November 2015 edition, not the Anniversary one. The video driver crashed all the time when running Milkyway. Also, the Win10 driver didn't let me raise the Power Limit on my 7970s above 20%, a hack that is certainly possible on Win7 and gives a nice boost to BOINC. I have reverted to Win7 for the time being.

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 440
Credit: 10,325,706
RAC: 201,402

Message 65337 - Posted: 28 Sep 2016, 13:36:37 UTC

Cowboy2199,

I did change the compiler I use for Windows applications: I now cross-compile from Linux instead of using a native Windows compiler. Otherwise, not much changed. As far as I understand, this is a pretty common problem for BOINC projects. The binary is safe, though. BOINC checks the signature on the binary from the server, and our code is open source, so you can see everything that goes into the binary if you are concerned.

As for the validation issues, I am not sure, although it could certainly slow things down a bit. I'll see if there is some way to keep our applications from being flagged in the future.

Jake
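BOINC's signature check relies on the project's code-signing key and is not reproduced here. A volunteer worried about a flagged executable could still compute the file's MD5 digest and compare it with the checksum the BOINC client recorded for that download (assumed here to be the `md5_cksum` entry in `client_state.xml`; check your client version). The helper name `file_md5` is hypothetical, and a matching digest only shows the file is unchanged since download; it says nothing about why an antivirus heuristic flagged it.

```python
import hashlib
import sys


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks so
    large binaries need not fit in memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Usage: python file_md5.py <path-to-executable> [...]
    for p in sys.argv[1:]:
        print(p, file_md5(p))
```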

Sebastian*
Joined: 8 Apr 09
Posts: 64
Credit: 4,843,634,662
RAC: 4,025,237

Message 65339 - Posted: 28 Sep 2016, 16:37:05 UTC

I did a clean install on the Win 10 computer with the R9 390X and installed the 16.9.2 (21 September) driver.

I still get the same problem: the driver crashes, and the WU that causes the crash gets stuck at 100%. Restarting only BOINC lets the WUs start again from 0%, but they do not actually run.

Vortac, which driver are you using on Win7 for BOINC?

And none of my 7970 or 280X GPUs get work any longer, even though the Milkyway server currently has 1150 WUs available. The 390X gets work instantly, but with the error mentioned above.

New computer ID for the PC, because of the clean install:

http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=705276

Rymorea
Joined: 6 Oct 14
Posts: 45
Credit: 10,019,624
RAC: 882

Message 65340 - Posted: 28 Sep 2016, 17:05:11 UTC - in response to Message 65339.

Hi Sebastian
I had a similar problem before, because I am using an Insider Preview build. Don't use AMD's latest beta hotfix driver; use the 16.7.3 WHQL driver. Before installing it, uninstall the current driver from the Control Panel but don't restart; instead run DDU, which reboots into Safe Mode, cleans up the remaining parts, and restarts. After that, install 16.7.3. I hope this helps.

Sebastian*
Joined: 8 Apr 09
Posts: 64
Credit: 4,843,634,662
RAC: 4,025,237

Message 65341 - Posted: 28 Sep 2016, 18:18:04 UTC

I only used DDU and installed 16.7.3, but the problem is still there. I am running some Einstein for now to see whether something similar happens there.

Nvidia also lets you choose between Windows 10 and Windows 10 Anniversary Edition when looking for their drivers. I wonder if Microsoft changed something related to GPU drivers.

I am using an app_config file to run 4 WUs at once. That might cause the problem; I have to test it later. The strange thing is that it worked more or less well before the Windows update.
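For readers unfamiliar with it: in BOINC's app_config.xml, `gpu_usage` is the fraction of a GPU each task claims, so 0.25 lets the client run four tasks per GPU, and `cpu_usage` is the CPU share reserved per task. The sketch below generates such a file. The helper name `build_app_config`, the app name `milkyway`, and the 0.05 CPU share are assumptions; check the short app name your client actually shows for the project before relying on it.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, tasks_per_gpu, cpu_per_task=0.05):
    """Build app_config.xml text that runs `tasks_per_gpu` tasks of
    `app_name` on each GPU by setting gpu_usage = 1 / tasks_per_gpu."""
    if tasks_per_gpu < 1:
        raise ValueError("tasks_per_gpu must be at least 1")
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu = ET.SubElement(app, "gpu_versions")
    # :g trims trailing zeros, e.g. 1/4 -> "0.25", 1/8 -> "0.125".
    ET.SubElement(gpu, "gpu_usage").text = f"{1 / tasks_per_gpu:g}"
    ET.SubElement(gpu, "cpu_usage").text = f"{cpu_per_task:g}"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Four tasks per GPU, as described in the post above.
    print(build_app_config("milkyway", 4))
```

The output would be saved as app_config.xml in the project's folder under projects\ and picked up via Options → Read config files in BOINC Manager. Running 8 tasks per GPU, as mentioned later in the thread, corresponds to a `gpu_usage` of 0.125.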

Rymorea
Joined: 6 Oct 14
Posts: 45
Credit: 10,019,624
RAC: 882

Message 65342 - Posted: 28 Sep 2016, 18:29:59 UTC - in response to Message 65341.

DDU cleaned the normal installation directories and registry entries, but when I looked through the registry manually I found a lot of other entries for the AMD driver. I think those come from Microsoft's automatic driver installation.

Today the new Windows 10 Insider Preview build 14936 started downloading. I hope I don't get problems again :)

Sebastian*
Joined: 8 Apr 09
Posts: 64
Credit: 4,843,634,662
RAC: 4,025,237

Message 65344 - Posted: 28 Sep 2016, 18:57:39 UTC
Last modified: 28 Sep 2016, 19:00:33 UTC

Could anyone look at this:

http://milkyway.cs.rpi.edu/milkyway/result.php?resultid=1804922945

Any idea why I get the warnings? The other two people running the task did not get them.

It only shows up on the Win10 computer with the 390X GPU. There I installed BOINC in the default directory; on all my other computers I install it in D:\Boinc\...

Vortac
Joined: 22 Apr 09
Posts: 77
Credit: 1,052,831,093
RAC: 44,963

Message 65345 - Posted: 28 Sep 2016, 22:35:51 UTC - in response to Message 65339.

On Win7, I am using the 16.9.1 drivers, but I didn't have any problems with previous versions either. I am not getting any GPU work right now either, but I don't think that's driver-related: I have two 5870s with an older driver in another machine, and they are also out of work.

Sebastian*
Joined: 8 Apr 09
Posts: 64
Credit: 4,843,634,662
RAC: 4,025,237

Message 65401 - Posted: 6 Oct 2016, 18:23:08 UTC

For those who have an R9 390 or 390X and want to run several work units at once: stay away from Windows 10 for now. I tried the 16.10.1 hotfix, 16.7.2, and 16.9.2 hotfix drivers. All of them crash when one WU reaches 100%, and all WUs get stuck where they are. When I run 8 WUs at once, one WU sometimes keeps running.

I use the app_config.xml file to run WUs in parallel on one GPU.

I will try the 390X on Win7 tomorrow to see whether it is Win10-related, which I suspect, since the problem has occurred ever since Windows 10 Version 1607.

Please post if you experience the same problems.

Copyright © 2017 AstroInformatics Group