Welcome to MilkyWay@home

MilkyWay@home Version 1.38 Released

[VENETO] boboviz

Joined: 10 Feb 09
Posts: 52
Credit: 16,291,993
RAC: 0
Message 65260 - Posted: 25 Sep 2016, 7:21:21 UTC - in response to Message 65245.  

I actually have a pretty good idea as to why the server isn't sending work units to new GPUs. It turns out my server update process was never updating BOINC-specific daemons like the scheduler, the feeder, and a few others.


Did you use this, or not?
https://boinc.berkeley.edu/trac/wiki/ToolUpgrade
ID: 65260
Sebastian*

Joined: 8 Apr 09
Posts: 70
Credit: 11,027,167,827
RAC: 0
Message 65286 - Posted: 26 Sep 2016, 19:05:27 UTC
Last modified: 26 Sep 2016, 19:20:45 UTC

It looks like my HD 4850 GPUs no longer get work. They got work on the 24th. Now, when I update BOINC, I get the message that I received 0 new tasks. Has something changed between then and now? Did the downtime cause the issue?

http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=307550

The computers with the HD 7970 get work, though.

And I get a lot of this in the BOINC Event Log:

26/09/2016 21:18:26 | Milkyway@Home | Message from task: 0

It seems to show up once a WU finishes.

Something was changed because of the downtime. Any ideas?
ID: 65286
Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist

Joined: 25 Feb 13
Posts: 580
Credit: 94,200,158
RAC: 0
Message 65291 - Posted: 26 Sep 2016, 19:52:53 UTC

Sebastian,

It's strange that the HD 4850s don't get work anymore. I think it might have to do with the update to the scheduler I did this morning. It looks like it is recognizing newer cards but not older cards. I wonder if I can write a custom patch for the scheduler to increase the number of cards it recognizes.

Jake
ID: 65291
Sebastian*

Joined: 8 Apr 09
Posts: 70
Credit: 11,027,167,827
RAC: 0
Message 65300 - Posted: 27 Sep 2016, 1:02:03 UTC

OK, the R9 390X seems to work. It is getting work fine. I'm not sure if it will give good results. And I had to plug in a monitor to get it to run the WUs through. Without one, they were stuck at 100%.

Computer: http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=704045

Let me know when you have updated the scheduler. Then I will give the HD 4850s another try.
ID: 65300
Sebastian*

Joined: 8 Apr 09
Posts: 70
Credit: 11,027,167,827
RAC: 0
Message 65322 - Posted: 27 Sep 2016, 18:20:56 UTC

On my R9 390X computer, Windows 10 just updated to the Anniversary edition, I guess. Now, after one WU finishes (several others run through fine), the GPU driver seems to crash and the WUs are stuck.
I had to install the GPU drivers again, so I am not sure which driver Windows used, but BOINC detected two 390X GPUs although only one is installed. With the official AMD driver everything runs fine until one WU causes the driver crash.

The 280X GPUs seem to run fine on other Win10 boxes with the version. Does anyone have an idea what the root cause could be? I used DDU (Display Driver Uninstaller) to get rid of the Win10 drivers.

For now, something is really broken.
ID: 65322
Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist

Joined: 25 Feb 13
Posts: 580
Credit: 94,200,158
RAC: 0
Message 65325 - Posted: 27 Sep 2016, 18:36:24 UTC

Hey Sebastian,

The Windows 10 anniversary edition is known to uninstall the proprietary GPU drivers you had installed. I would recommend reinstalling all of the AMD drivers.

Is there something else that could be causing issues? I noticed that you're actually calculating incorrect likelihoods for certain work units, which is concerning. Maybe the star files in your BOINC directory are somehow getting tampered with?

Jake
ID: 65325
Sebastian*

Joined: 8 Apr 09
Posts: 70
Credit: 11,027,167,827
RAC: 0
Message 65328 - Posted: 27 Sep 2016, 20:01:26 UTC

Reinstalling drivers did not help.

Since the system is running on an SSD, I will try resetting the project tomorrow. It could be that the Win10 update caused so much wear on the SSD that it is now starting to degrade.

Any update on the HD 4xxx cards? (Scheduler fix for older cards)

And my 280X cards are not getting work at the moment either. I will report tomorrow whether it has improved.
ID: 65328
Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist

Joined: 25 Feb 13
Posts: 580
Credit: 94,200,158
RAC: 0
Message 65330 - Posted: 27 Sep 2016, 20:32:50 UTC

Sebastian,

No update on the scheduler fix yet. I have been working on ensuring a stable work unit flow to users. I think I almost have that figured out without breaking other BOINC subsystems. I'll get back to you when I have time to look into it.

Jake
ID: 65330
Profile Werkstatt

Joined: 19 Feb 08
Posts: 350
Credit: 141,284,369
RAC: 0
Message 65332 - Posted: 27 Sep 2016, 20:37:11 UTC - in response to Message 65291.  

Sebastian,

It's strange that the HD 4850s don't get work anymore. I think it might have to do with the update to the scheduler I did this morning. It looks like it is recognizing newer cards but not older cards. I wonder if I can write a custom patch for the scheduler to increase the number of cards it recognizes.

Jake


I can confirm that HD4850 does not get work atm.
ID: 65332
Profile JumpinJohnny

Joined: 29 Mar 13
Posts: 5
Credit: 31,192,336
RAC: 0
Message 65334 - Posted: 27 Sep 2016, 23:30:25 UTC - in response to Message 65332.  



I can confirm that HD4850 does not get work atm.


Ditto for the HD 4870: confirmed, no work yet.
ID: 65334
cowboy2199

Joined: 28 May 14
Posts: 2
Credit: 8,507,048
RAC: 0
Message 65335 - Posted: 28 Sep 2016, 2:34:24 UTC

Good evening Jake,

Any reason why my AVG antivirus would pull the EXE as a threat?

F:\Boinc Data\projects\milkyway.cs.rpi.edu_milkyway\milkyway_1.39_windows_x86_64.exe had to be set as an exception so AVG would stop trying to kill it.

Could that be what is happening on some of the other computers with validation issues?
ID: 65335
Vortac

Joined: 22 Apr 09
Posts: 95
Credit: 4,808,181,963
RAC: 0
Message 65336 - Posted: 28 Sep 2016, 6:15:45 UTC - in response to Message 65322.  

On my R9 390X computer, Windows 10 just updated to the Anniversary edition, I guess. Now, after one WU finishes (several others run through fine), the GPU driver seems to crash and the WUs are stuck.
I had to install the GPU drivers again, so I am not sure which driver Windows used, but BOINC detected two 390X GPUs although only one is installed. With the official AMD driver everything runs fine until one WU causes the driver crash.

Unfortunately, I had the same problem on Win10, although it was the Nov 2015 edition, not the Anniversary one. The video driver crashed all the time when running MilkyWay. Also, the Win10 driver didn't allow me to increase the Power Limit for my 7970s above 20%, a hack which is certainly possible on Win7 and which gives a nice boost to BOINC. I reverted to Win7 for the time being.
ID: 65336
Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist

Joined: 25 Feb 13
Posts: 580
Credit: 94,200,158
RAC: 0
Message 65337 - Posted: 28 Sep 2016, 13:36:37 UTC

Cowboy2199,

I did change the compiler I use for Windows applications. I am now cross-compiling from Linux instead of using a native Windows compiler. Otherwise, not much changed. As far as I understand, antivirus false positives are a pretty common problem for BOINC projects. The binary is safe, though. BOINC checks the signature on the binary from the server, and our code is open source, so you can see everything that goes into the binary if you are concerned.

As for validation issues, I am not sure. The antivirus could certainly slow things down a bit. I'll see if there is some way to keep our applications from getting flagged in the future.

Jake
ID: 65337
Sebastian*

Joined: 8 Apr 09
Posts: 70
Credit: 11,027,167,827
RAC: 0
Message 65339 - Posted: 28 Sep 2016, 16:37:05 UTC

I did a clean install on the Win 10 computer with the R9 390X and installed the 16.9.2 driver (21st of September).

I still get the same problem. The driver crashes, and the WU that causes the crash gets stuck at 100%. Restarting just BOINC only makes the WUs start again from 0%, but they do not run.

Vortac, which driver are you using on Win7 for BOINC?

And none of my 7970 or 280X GPUs get work any longer. The MilkyWay server currently has 1150 WUs available. The 390X gets work instantly, but with the error mentioned above.

The PC has a new computer ID because of the clean install.

http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=705276
ID: 65339
Rymorea

Joined: 6 Oct 14
Posts: 46
Credit: 20,017,425
RAC: 0
Message 65340 - Posted: 28 Sep 2016, 17:05:11 UTC - in response to Message 65339.  

I did a clean install on the Win 10 computer with the R9 390X and installed the 16.9.2 driver (21st of September).

I still get the same problem. The driver crashes, and the WU that causes the crash gets stuck at 100%. Restarting just BOINC only makes the WUs start again from 0%, but they do not run.

Vortac, which driver are you using on Win7 for BOINC?

And none of my 7970 or 280X GPUs get work any longer. The MilkyWay server currently has 1150 WUs available. The 390X gets work instantly, but with the error mentioned above.

The PC has a new computer ID because of the clean install.

http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=705276


Hi Sebastian,
I had a similar problem before, because I am using an Insider Preview build. Don't use AMD's latest beta fix driver; use the 16.7.3 WHQL driver instead. Before installing it, uninstall the current driver from the Control Panel and don't restart. Then run DDU, which reboots into safe mode, cleans out the remaining parts, and restarts. After that, install 16.7.3. I hope this helps.
ID: 65340
Sebastian*

Joined: 8 Apr 09
Posts: 70
Credit: 11,027,167,827
RAC: 0
Message 65341 - Posted: 28 Sep 2016, 18:18:04 UTC

I only used DDU and installed 16.7.3, but the problem is still there. I am running some Einstein for now to see if something similar happens there.

Nvidia also lets you select between Windows 10 and Windows 10 Anniversary Edition when looking for their drivers. I wonder if Microsoft changed something related to the GPU drivers.

I am using an app_config file to run 4 WUs at once. That might cause the problem; I have to test it later. The strange thing is that it worked more or less well before the Windows update.
ID: 65341
Rymorea

Joined: 6 Oct 14
Posts: 46
Credit: 20,017,425
RAC: 0
Message 65342 - Posted: 28 Sep 2016, 18:29:59 UTC - in response to Message 65341.  

DDU checked the normal installation directories and registry entries, but when I looked through the registry manually I found a lot of other entries for the AMD driver. I think those come from Microsoft's automatic driver installation.

Today the new Windows 10 Insider Preview build 14936 started downloading. I hope I don't get problems again :)
ID: 65342
Sebastian*

Joined: 8 Apr 09
Posts: 70
Credit: 11,027,167,827
RAC: 0
Message 65344 - Posted: 28 Sep 2016, 18:57:39 UTC
Last modified: 28 Sep 2016, 19:00:33 UTC

Could anyone look at this:

http://milkyway.cs.rpi.edu/milkyway/result.php?resultid=1804922945

Any idea why I get the warnings? The other two people running the task did not get them.

They only show up on the Win10 computer with the 390X GPU. I've installed BOINC in the standard directory there. On all my other computers I install it in D:\Boinc\...
ID: 65344
Vortac

Joined: 22 Apr 09
Posts: 95
Credit: 4,808,181,963
RAC: 0
Message 65345 - Posted: 28 Sep 2016, 22:35:51 UTC - in response to Message 65339.  

Vortac, which driver are you using on Win7 for BOINC?

And none of my 7970 or 280X GPUs get work any longer. The MilkyWay server currently has 1150 WUs available. The 390X gets work instantly, but with the error mentioned above.

On Win7, I am using the 16.9.1 drivers, but I didn't have any problems with previous versions either. I am not getting any GPU work right now either, but I don't think that's driver related. I have two 5870s with an older driver in another machine and they are also out of work.
ID: 65345
Sebastian*

Joined: 8 Apr 09
Posts: 70
Credit: 11,027,167,827
RAC: 0
Message 65401 - Posted: 6 Oct 2016, 18:23:08 UTC

For those who have an R9 390 or 390X and want to run several work units at once: stay away from Windows 10 right now. I tried the 16.10.1 hotfix, 16.7.2, and 16.9.2 hotfix drivers. All of them crash when one WU reaches 100%, and all WUs get stuck where they are. When I run 8 WUs at once, one WU sometimes keeps running.

I use the app_config.xml file to run WUs in parallel on one GPU.
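
For anyone who has not set this up before, here is a minimal sketch of how such a file could be generated. It is an illustration and not the exact file used on this machine. The app name `milkyway`, the `cpu_usage` value of 1, and the helper name `build_app_config` are assumptions; check client_state.xml in your BOINC data directory for the real app name. The idea is that each task claims 1/N of a GPU, so BOINC schedules N tasks per GPU.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, tasks_per_gpu: int, cpu_per_task: float = 1.0) -> str:
    """Return app_config.xml text that lets tasks_per_gpu tasks share one GPU.

    build_app_config is a hypothetical helper for illustration only.
    """
    if tasks_per_gpu < 1:
        raise ValueError("tasks_per_gpu must be at least 1")

    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name

    gpu = ET.SubElement(app, "gpu_versions")
    # Each task claims 1/N of a GPU, so the client runs N tasks per GPU.
    ET.SubElement(gpu, "gpu_usage").text = f"{1.0 / tasks_per_gpu:g}"
    # CPU share reserved per GPU task (assumed value, tune for your host).
    ET.SubElement(gpu, "cpu_usage").text = f"{cpu_per_task:g}"

    return ET.tostring(root, encoding="unicode")


# "milkyway" is an assumed app name; 4 matches the "4 WUs at once" setup.
config_text = build_app_config("milkyway", 4)
print(config_text)
```

The printed text would be saved as app_config.xml in the project folder (the milkyway.cs.rpi.edu_milkyway directory under the BOINC data directory), and the client then has to re-read its config files or be restarted before the change takes effect.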

I will try the 390X on Win7 tomorrow to see whether the problem is Win10 related, which is what I think. It has occurred since Windows 10 Version 1607.

If anyone experiences the same problems, please post.
ID: 65401

©2024 Astroinformatics Group