Testing Some New Plan Classes

Message boards : News : Testing Some New Plan Classes

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 501
Credit: 34,647,251
RAC: 224

Message 67229 - Posted: 8 Mar 2018, 18:06:31 UTC

Hey Everyone,

I am going to try changing the GPU plan classes to reduce the number of workunits sent to users without double-precision GPUs. If you notice any issues on your end, please let me know.

Thanks,

Jake
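For readers unfamiliar with plan classes: the scheduler uses them to decide which app version, if any, a host's GPU is eligible for. Below is a minimal Python sketch of the filtering idea described above, assuming a hypothetical `HostGpu` record with a `supports_fp64` flag. These names are illustrative only; this is not BOINC's actual scheduler code or plan-class schema.

```python
from dataclasses import dataclass


# Hypothetical view of a host GPU as a scheduler might see it.
# Field names are illustrative, not BOINC's real schema.
@dataclass
class HostGpu:
    vendor: str          # e.g. "ati" or "nvidia"
    model: str
    supports_fp64: bool  # whether the device reports double-precision support


def eligible_for_gpu_work(gpu: HostGpu) -> bool:
    """Return True if a double-precision GPU plan class should match this GPU.

    Sketch only: GPUs without FP64 support are skipped, so their hosts
    receive no GPU workunits from this plan class.
    """
    return gpu.supports_fp64


def gpus_to_send_work(gpus: list[HostGpu]) -> list[HostGpu]:
    # Keep only the GPUs that the double-precision plan class accepts.
    return [g for g in gpus if eligible_for_gpu_work(g)]
```

In this sketch a single-precision-only card simply never matches, which is the intended effect: fewer workunits end up on hosts that would return computation errors.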

Tom*
Joined: 4 Oct 11
Posts: 38
Credit: 283,140,578
RAC: 0

Message 67230 - Posted: 8 Mar 2018, 18:46:10 UTC

Thank you Jake,

Most of my workunits that fail to complete successfully are failing because of wingmen without DP.

Bill

Slywy
Joined: 22 Jul 12
Posts: 2
Credit: 136,597
RAC: 230

Message 67235 - Posted: 10 Mar 2018, 13:57:25 UTC - in response to Message 67229.

I must not have one because I'm suddenly getting hundreds of computation errors.

Dunx
Joined: 13 Feb 11
Posts: 23
Credit: 426,625,749
RAC: 1,360,564

Message 67236 - Posted: 10 Mar 2018, 15:55:13 UTC

https://milkyway.cs.rpi.edu/milkyway/results.php?hostid=767829

Like this?

dunx

mikey
Joined: 8 May 09
Posts: 2183
Credit: 232,361,889
RAC: 230,124

Message 67240 - Posted: 11 Mar 2018, 12:26:53 UTC - in response to Message 67235.

I checked Nvidia and it's listed as single precision.
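On the host side, an OpenCL device that supports double precision advertises it in its `CL_DEVICE_EXTENSIONS` string: `cl_khr_fp64` is the standard Khronos extension, and some older AMD devices report the vendor-specific `cl_amd_fp64` instead. A minimal sketch of that check, assuming you already have the extensions string (for example from pyopencl's `Device.extensions` attribute, if that library is installed):

```python
# Extension names that signal double-precision support in OpenCL.
# cl_khr_fp64 is the standard Khronos extension; cl_amd_fp64 is an
# older AMD-specific one.
FP64_EXTENSIONS = {"cl_khr_fp64", "cl_amd_fp64"}


def has_double_precision(extensions: str) -> bool:
    """Check a space-separated CL_DEVICE_EXTENSIONS string for FP64 support."""
    # split() with no argument also tolerates trailing or repeated spaces,
    # which drivers often include in this string.
    return any(ext in FP64_EXTENSIONS for ext in extensions.split())
```

This only tells you whether the device exposes double precision at all, not how fast it is; many consumer cards support FP64 at a much lower rate than FP32.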

macgeyer
Joined: 2 Mar 18
Posts: 6
Credit: 1,241,834
RAC: 0

Message 67241 - Posted: 13 Mar 2018, 19:02:56 UTC

The website had problems in the last hour, and my computer (ID 767940) isn't getting any new tasks:
13/03/2018 19:57:35 | Milkyway@Home | Scheduler request completed: got 0 new tasks

macgeyer
Joined: 2 Mar 18
Posts: 6
Credit: 1,241,834
RAC: 0

Message 67242 - Posted: 13 Mar 2018, 19:39:02 UTC - in response to Message 67241.

Got new tasks:
13/03/2018 20:37:37 | Milkyway@Home | Scheduler request completed: got 192 new tasks
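A small parser for event-log lines like the two quoted in these posts makes it easier to see when a host stopped and resumed receiving work. This is a sketch assuming the `DD/MM/YYYY HH:MM:SS | project | message` layout shown in those lines; other BOINC client versions or locales may format the timestamp differently, and the regular expression only targets the "Scheduler request completed" message.

```python
import re
from datetime import datetime

# Matches lines such as:
# "13/03/2018 20:37:37 | Milkyway@Home | Scheduler request completed: got 192 new tasks"
# Day/month/year order follows the lines quoted in this thread (an assumption
# for other setups).
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) \| (?P<project>[^|]+?) \| "
    r"Scheduler request completed: got (?P<n>\d+) new tasks?$"
)


def parse_scheduler_reply(line: str):
    """Return (timestamp, project, tasks_received), or None if the line doesn't match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%d/%m/%Y %H:%M:%S")
    return ts, m.group("project"), int(m.group("n"))
```

Applied to the two lines above, the empty reply and the 192-task reply are 2,402 seconds (about 40 minutes) apart.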

macgeyer
Joined: 2 Mar 18
Posts: 6
Credit: 1,241,834
RAC: 0

Message 67243 - Posted: 13 Mar 2018, 19:39:02 UTC - in response to Message 67241.
Last modified: 13 Mar 2018, 19:39:59 UTC

Sorry, double posted.

Jon A. Robison
Joined: 6 Oct 10
Posts: 1
Credit: 6,695,220
RAC: 0

Message 67244 - Posted: 14 Mar 2018, 15:54:03 UTC

OK, yesterday (March 13, 2018) I was trying to post that I had 27 "Computation Error" work units and only 2 "Waiting to Report". Over the last several weeks I've been inundated with work units that end up this way. My User ID is 128146. I have changed nothing on my machine (unless MS updates did something) that should cause this. My processor is an Intel Q9550 and my video board is an ATI Radeon HD 4870 (1 GB GDDR5), and I've never had this type of problem before. My conclusion is that your work units are at fault. I've suspended the current batch of work units and may drop this work altogether, since my machine doesn't appear to be able to process your data!

mikey
Joined: 8 May 09
Posts: 2183
Credit: 232,361,889
RAC: 230,124

Message 67245 - Posted: 15 Mar 2018, 11:24:59 UTC - in response to Message 67244.

It might help to update to a newer version of BOINC; 7.2.47 is fairly old at this point.
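When checking whether an installed BOINC client is older than some target version, compare the version numbers component by component rather than as text. A minimal Python sketch; the `"7.10.0"` threshold below is a hypothetical placeholder for illustration, not an official BOINC minimum:

```python
def parse_version(v: str) -> tuple[int, ...]:
    """Turn "7.2.47" into (7, 2, 47) so versions compare numerically."""
    return tuple(int(part) for part in v.split("."))


def is_older_than(installed: str, minimum: str) -> bool:
    # Tuple comparison is element-wise, so (7, 2, 47) < (7, 10, 0) is True,
    # whereas the plain strings "7.2.47" < "7.10.0" compare as False
    # because the character "2" sorts after "1".
    return parse_version(installed) < parse_version(minimum)


# Hypothetical threshold, for illustration only.
print(is_older_than("7.2.47", "7.10.0"))  # True
```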

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 501
Credit: 34,647,251
RAC: 224

Message 67247 - Posted: 15 Mar 2018, 20:27:54 UTC

Hey Jon,

It looks to me like it's probably a driver error, since the error you are getting is "Failed to compute likelihood." Can you try a different driver version and get back to me?

Jake

BeemerBiker
Joined: 18 Nov 08
Posts: 94
Credit: 658,323,421
RAC: 8,764

Message 67252 - Posted: 17 Mar 2018, 13:26:24 UTC
Last modified: 17 Mar 2018, 13:57:47 UTC

Something strange has happened; I'm not sure if it is your plan change. Over a 30-minute period, all my ATI (AMD) Tahiti-class GPUs (7950, S9000) went from just over 3 minutes per WU to 17-19 minutes. The GPU clock went from 900 MHz down to 300 MHz, indicating very little load on the GPU. CPU load went from 11% to 30%. I changed the ngpu assignment from 4 down to 1 and suspended all CPU tasks, but that had no effect. Time to complete stayed in the 15-minute range even with only 1 WU per S9000.
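The "ngpu assignment" here is presumably the number of tasks run at once on each GPU. BOINC clients usually control that through `gpu_usage` in an `app_config.xml` file in the project's directory: a value of 0.25 means four tasks share one GPU, and 1 means one task per GPU. A sketch that builds such a file in Python; the app name `milkyway` and the `cpu_usage` value are assumptions, so check the real app name in your own client before using it, then have the client re-read its config files.

```python
import xml.etree.ElementTree as ET


def app_config_xml(app_name: str, tasks_per_gpu: int, cpu_per_task: float) -> str:
    """Build an app_config.xml body that runs `tasks_per_gpu` tasks on each GPU.

    gpu_usage is the fraction of one GPU each task claims, so 4 tasks per
    GPU gives gpu_usage = 0.25.
    """
    if tasks_per_gpu < 1:
        raise ValueError("tasks_per_gpu must be at least 1")
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name  # assumed app name
    gpu = ET.SubElement(app, "gpu_versions")
    ET.SubElement(gpu, "gpu_usage").text = f"{1 / tasks_per_gpu:g}"
    ET.SubElement(gpu, "cpu_usage").text = f"{cpu_per_task:g}"
    return ET.tostring(root, encoding="unicode")


print(app_config_xml("milkyway", 4, 0.5))
```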

The ATI Pitcairn-class card (7850) did not show any change: the GPU clock was 925 MHz and WUs were taking 12-14 minutes. However, this system had a full day of tasks queued, so maybe the problem has not shown up yet. Same for the RX 560: WUs take 18 minutes with no change, and it has a somewhat large cache.

All of the above systems are old Core 2 Quads with 8 GB of memory, if that makes any difference.

[EDIT] When I switched projects from Milkyway to Collatz, the GPU clocks shot back up to the 900 MHz range where they normally run. Something broke.

[EDIT-2] I just compared one of my slow WUs to my wingman's. His took the normal 3 or so minutes to complete the same WU, whereas mine took 18 minutes with the GPU clock running at 300 MHz instead of 900 MHz.
http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=1589386532
Something is not right. The video boards are both Tahiti class; however, the motherboards and CPUs are vastly different.
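The wingman comparison can be turned into a quick numeric check. A minimal Python sketch of one possible heuristic (not a project tool): flag a run when it is much slower than a reference run of the same WU and the GPU clock sits well below its expected value. The 2x threshold is an arbitrary illustrative choice.

```python
def slowdown(my_minutes: float, reference_minutes: float) -> float:
    """How many times longer my run took than a reference run of the same WU."""
    return my_minutes / reference_minutes


def looks_clock_starved(my_minutes: float, ref_minutes: float,
                        observed_mhz: float, expected_mhz: float,
                        threshold: float = 2.0) -> bool:
    """Flag a run that is both much slower than the reference and running
    at a much lower GPU clock than expected. The threshold is illustrative."""
    clock_drop = expected_mhz / observed_mhz
    return slowdown(my_minutes, ref_minutes) >= threshold and clock_drop >= threshold


# Figures from this post: 18 vs 3 minutes, 300 MHz vs 900 MHz.
print(slowdown(18, 3))                        # 6.0
print(looks_clock_starved(18, 3, 300, 900))   # True
```

With these numbers the runtime ratio (6x) is larger than the clock ratio (3x), so the lower clock alone may not explain the whole slowdown.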

mmonnin
Joined: 2 Oct 16
Posts: 102
Credit: 81,199,642
RAC: 18,711

Message 67253 - Posted: 17 Mar 2018, 15:12:43 UTC

Sounds more like a driver crash than a project problem, even if the load did go back up with Collatz.

BeemerBiker
Joined: 18 Nov 08
Posts: 94
Credit: 658,323,421
RAC: 8,764

Message 67255 - Posted: 17 Mar 2018, 15:38:27 UTC - in response to Message 67253.
Last modified: 17 Mar 2018, 16:10:06 UTC

Yes, that is what I thought, but it happened on two systems:

Asus P5E: starting 3/17, between 1:03am and 1:27am, went from 3 minutes per WU to 15.

MSI P7N SLI FTW: starting 3/16, between 9:56pm and 10:33pm, went from 3 minutes per WU to 17-20 minutes.

The work unit report states 900 MHz for the video boards, but I suspect that is just the core frequency being reported. GPU-Z shows 300 MHz and low temperatures in the 30s °C. Switching to Collatz, GPU-Z shows 900 MHz and temperatures into the 60s and 70s, as these are air cooled. Switching back, the frequency drops to 300 again for all the 7950s and S9000s on both systems.

It could still be a driver problem. Microsoft did release a bunch of updates on Tuesday, and the systems may have just gotten around to rebooting. I will look into it.

I brought up Afterburner but do not see how to change the clock speed from 300 to a higher number. I know that if tasks are starved for data (CPU cores busy or low on memory), the GPU clock will drop, as the card is not busy enough to run at 900. There are a lot of possibilities. None of my Linux systems show a Milkyway problem, but they have 16 threads to feed a pair of really slow NVIDIA 1050 Tis, whereas the above systems have a total of 4 threads (1 per core) for the much more productive (double-precision) Tahitis.

Looking at the report, there is a lot of information being presented. Perhaps that is why my few threads are unable to feed the GPUs?
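The "threads to feed the GPUs" argument can be put in numbers. A rough Python sketch that divides the available CPU threads by the number of GPU tasks running at once. It assumes two Tahiti cards per Core 2 Quad system running 4 tasks each, and one task per 1050 Ti on the Linux boxes; both are assumptions for illustration, and the estimate ignores CPU-only tasks and other load.

```python
def cpu_threads_per_gpu_task(cpu_threads: int, gpus: int, tasks_per_gpu: int) -> float:
    """CPU threads available per concurrently running GPU task.

    Ignores CPU-only BOINC tasks and other system load, so it is an upper bound.
    """
    concurrent_tasks = gpus * tasks_per_gpu
    if concurrent_tasks <= 0:
        raise ValueError("need at least one GPU task")
    return cpu_threads / concurrent_tasks


# Core 2 Quad: 4 threads, 2 Tahitis (assumed), 4 tasks per GPU.
print(cpu_threads_per_gpu_task(4, 2, 4))    # 0.5
# Linux box: 16 threads, 2 x 1050 Ti, 1 task per GPU (assumed).
print(cpu_threads_per_gpu_task(16, 2, 1))   # 8.0
```

Under these assumptions each GPU task on the quad-core machines has at most half a CPU thread, compared with up to eight on the Linux systems.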

[EDIT] I compared a much faster WU against a much slower one and don't really see any reason for the much longer time to complete. This inconsistency is totally unlike any Milkyway ATI tasks I have seen on the same type of system.

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 501
Credit: 34,647,251
RAC: 224

Message 67256 - Posted: 17 Mar 2018, 16:41:22 UTC

Hey BeemerBiker,

The runs I put up a couple of days ago have a few extra calculations in them, so they should take a little longer. Your credits should have been adjusted accordingly.

Please let me know if you aren't getting an increase in credits for the increased work.

Jake
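One way to check whether credit kept pace with the extra calculations is to compare credit per hour before and after the change on the same host. A minimal Python sketch; the credit and runtime figures in the usage below are made up for illustration and do not reflect the project's actual credit formula.

```python
def credit_per_hour(credit_per_wu: float, minutes_per_wu: float) -> float:
    return credit_per_wu * 60.0 / minutes_per_wu


def credit_scaled_fairly(old_credit: float, old_minutes: float,
                         new_credit: float, new_minutes: float,
                         tolerance: float = 0.05) -> bool:
    """True if credit per hour changed by no more than `tolerance` (relative).

    Assumes both runtimes come from the same host under the same conditions.
    """
    old_rate = credit_per_hour(old_credit, old_minutes)
    new_rate = credit_per_hour(new_credit, new_minutes)
    return abs(new_rate - old_rate) / old_rate <= tolerance


# Hypothetical: 20% more work paid 20% more credit keeps the hourly rate.
print(credit_scaled_fairly(100, 3, 120, 3.6))  # True
# Hypothetical: same credit bump but a 6x longer runtime does not.
print(credit_scaled_fairly(100, 3, 120, 18))   # False
```

A slowdown caused by something other than extra work, such as a driver problem, would show up as a drop in credit per hour rather than a proportional increase in credit.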

BeemerBiker
Joined: 18 Nov 08
Posts: 94
Credit: 658,323,421
RAC: 8,764

Message 67258 - Posted: 17 Mar 2018, 16:54:13 UTC - in response to Message 67256.

Yeah, I'll probably get more credit, thanks, but why are 4 of my Tahiti-class GPUs running at 300 MHz instead of 850 or 900? I suspect they are not being fed properly. It could be a problem on my end, as I don't see anyone else reporting what I am getting.

mmonnin
Send message
Joined: 2 Oct 16
Posts: 102
Credit: 81,199,642
RAC: 18,711

Message 67260 - Posted: 18 Mar 2018, 0:13:53 UTC - in response to Message 67258.

I have a 280X in Win10 running at 1070 MHz.

mikey
Joined: 8 May 09
Posts: 2183
Credit: 232,361,889
RAC: 230,124

Message 67262 - Posted: 18 Mar 2018, 12:56:40 UTC - in response to Message 67255.

The only way to reset a crashed driver in Windows is to restart the whole PC. If you have Win10, it could have done an update and crashed the driver in the process. There could also be a newer, better driver if Win10 was updated.

BeemerBiker
Joined: 18 Nov 08
Posts: 94
Credit: 658,323,421
RAC: 8,764

Message 67264 - Posted: 18 Mar 2018, 16:24:28 UTC - in response to Message 67262.
Last modified: 18 Mar 2018, 16:37:52 UTC

Yeah, the problem was drivers. I looked at my P5E and it was waiting to reboot to install drivers to fix whatever Microsoft had done the previous Tuesday. After rebooting, exactly 30 Milkyway ATI tasks reported an error, but all the remaining tasks plus the new downloads were back at their normal 3-minute time to complete. It has been working just fine for 24 hours. I don't think anything was wrong with those 30 tasks; the driver change just bumped them out. I had tried an S9000 graphics board in this system before putting it in the P7N.

The P7N, on the other hand, did not respond to a reboot like my P5E. This system worked fine on Collatz ATI tasks but ran at only 300 MHz for Milkyway. I suspected the same driver problem. The driver failed to uninstall (Win10 x64); even the ATI "cleanup" program was unable to uninstall the AMD software on this Intel system. I deleted both S9000 video boards from Device Manager and rebooted. They were recognized as W8000 video boards, but they worked, and time to complete is back down to 3 minutes per WU when running 4 on each GPU. Apparently the Radeon Adrenalin driver caused problems with mixed S9000 and 7950 graphics boards. I did not attempt to update whatever Microsoft installed to handle the "W8000", as it is working and I don't want to mess with it any more. It was just a coincidence that these problems occurred at the same time the plan classes were changed here.

The "S9000" boards were $160 new with free shipping on eBay, and I could not pass up the chance to get a new Tahiti system with 6 GB of memory, not just 3. They just cannot be mixed with normal 7950 boards, and they require DIY cooling.

mikey
Joined: 8 May 09
Posts: 2183
Credit: 232,361,889
RAC: 230,124

Message 67266 - Posted: 19 Mar 2018, 11:34:33 UTC - in response to Message 67264.

Sounds like a good deal for someone who knows how to do that stuff. I'm glad you are back and crunching fast again.







Copyright © 2018 AstroInformatics Group