Testing Some New Plan Classes
Author | Message |
---|---|
Send message Joined: 25 Feb 13 Posts: 580 Credit: 94,200,158 RAC: 0 |
Hey Everyone, I am going to try changing the GPU plan classes to reduce the number of workunits sent to users without double-precision GPUs. If you notice any issues on your end, please let me know. Thanks, Jake |
Send message Joined: 4 Oct 11 Posts: 38 Credit: 309,729,457 RAC: 0 |
Thank you, Jake. Most of my workunits that fail to complete do so because of wingers without DP. Bill |
Send message Joined: 22 Jul 12 Posts: 11 Credit: 1,008,373 RAC: 0 |
I must not have one (a double-precision GPU), because I'm suddenly getting hundreds of computation errors. |
Send message Joined: 13 Feb 11 Posts: 31 Credit: 1,403,524,537 RAC: 0 |
|
Send message Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 14 |
I checked Nvidia and it's listed as single precision. |
Send message Joined: 2 Mar 18 Posts: 9 Credit: 457,043,383 RAC: 0 |
The website had problems in the last hour, and my computer (ID: 767940) isn't getting any new tasks: 13/03/2018 19:57:35 | Milkyway@Home | Scheduler request completed: got 0 new tasks |
Send message Joined: 2 Mar 18 Posts: 9 Credit: 457,043,383 RAC: 0 |
Got new tasks: 13/03/2018 20:37:37 | Milkyway@Home | Scheduler request completed: got 192 new tasks |
Send message Joined: 2 Mar 18 Posts: 9 Credit: 457,043,383 RAC: 0 |
Sorry, double posted. |
Send message Joined: 6 Oct 10 Posts: 1 Credit: 6,703,932 RAC: 0 |
OK, yesterday (March 13, 2018) I was trying to post that I had 27 "Computation Error" work units and only 2 "Waiting to Report". Over the last several weeks I've been inundated with work units that end up this way. My User ID is 128146. I have changed nothing in my machine (unless MS updates did something) that should cause this. My processor is an Intel Q9550 and my video board is an ATI Radeon HD 4870 (1 GB GDDR5), and I've never had this type of problem before. My conclusion is that your work units are at fault. I've suspended the current batch of work units and may drop this work altogether, since my machine doesn't appear to be able to process your data!!!! |
Send message Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 14 |
It might help if you updated to a newer version of BOINC; 7.2.47 is fairly old at this point. |
Send message Joined: 25 Feb 13 Posts: 580 Credit: 94,200,158 RAC: 0 |
Hey Jon, It looks to me like it's probably a driver error, since the error you are getting is "Failed to compute likelihood." Can you try a different driver version and get back to me? Jake |
Send message Joined: 18 Nov 08 Posts: 291 Credit: 2,461,693,501 RAC: 0 |
Something strange has happened; I'm not sure if it is your plan change. Over a 30-minute period all my ATI (AMD) Tahiti class GPUs (7950, S9000) went from just over 3 minutes per WU to 17-19 minutes. The GPU clock went from 900 down to 300, indicating very little load on the GPU. CPU load went from 11% to 30%. I changed the ngpu assignment from 4 down to 1 and suspended all CPU tasks, but that had no effect. Time to complete stayed in the 15-minute range even with only 1 WU per S9000. The ATI Pitcairn class (7850) did not show any change: the GPU clock was 925 and WUs were taking 12-14 minutes. However, this system had a full day of tasks, so maybe the problem has not shown up yet. Same for the RX560: WUs take 18 minutes with no change and a somewhat large cache. All of the above systems are old Core 2 Quads with 8 GB of memory, if that makes any difference. [EDIT] When I switched project from Milkyway to Collatz, the GPU clocks shot back up to the 900 MHz range like they normally run. Something broke. [EDIT-2] Just compared one of my slow WUs to my wingman's. His took the normal 3 or so minutes to complete the same WU, where mine took 18 minutes with the GPU clock running at 300 instead of 900: http://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=1589386532 Something is not right. The video boards are both Tahiti class. However, the motherboard and CPU are vastly different. |
Send message Joined: 2 Oct 16 Posts: 167 Credit: 1,006,950,718 RAC: 686 |
Sounds more like a driver crash than a project problem, even if the load did go back up with Collatz. |
Send message Joined: 18 Nov 08 Posts: 291 Credit: 2,461,693,501 RAC: 0 |
Yes, that is what I thought, but it happened on two systems. Asus P5E: starting 3/17, 1:03am to 1:27am, went from 3 minutes per WU to 15. MSI P7N SLI FTW: starting 3/16, 9:56pm to 10:33, went from 3 minutes per WU to 17-20 minutes. The work unit report states 900 MHz for the video boards, but I suspect that is just the core frequency being reported. GPU-Z shows 300 MHz and low temps in the 30s (°C). Switching to Collatz, GPU-Z shows 900 MHz and temps into the 60s and 70s, as these are air cooled. Switching back, the frequency drops to 300 again for all 7950s and S9000s on both systems. Could still be a driver problem. Microsoft did release a bunch of stuff Tuesday and the systems may have just got around to rebooting. I will look into it. I brought up Afterburner but do not see how to change the clock speed from 300 to a higher number. I know that if tasks are starved for data (CPU cores busy or low on memory) then the GPU clock will drop, as the card is not busy enough to run at 900. There are a lot of possibilities. None of my Linux systems show a Milkyway problem, but they have 16 threads to feed a pair of really slow Nvidia 1050 Tis, whereas the above systems have a total of 4 threads (1 per core) for the much more productive (double precision) Tahitis. Looking at the report, there is a lot of info being presented. Perhaps that causes my few threads to not be able to feed the GPU? [EDIT] Compared much faster against way slower and don't really see any reason for the much longer time to complete. Consistency is totally unlike any Milkyway ATI tasks I have seen on the same type of system. |
Send message Joined: 25 Feb 13 Posts: 580 Credit: 94,200,158 RAC: 0 |
Hey BeemerBiker, So the runs I put up a couple of days ago have a few extra calculations in them, so they should take a little bit longer. Your credits should have been adjusted accordingly. Please let me know if you aren't getting an increase in credits for the increased work. Jake |
Send message Joined: 18 Nov 08 Posts: 291 Credit: 2,461,693,501 RAC: 0 |
Yeah, I probably will get more credit, thanks, but why are 4 of my Tahiti class GPUs running at 300 MHz instead of 850 or 900? I suspect they are not being fed properly. It could be a problem at my end, as I don't see anyone else reporting anything like what I am getting. |
Send message Joined: 2 Oct 16 Posts: 167 Credit: 1,006,950,718 RAC: 686 |
Hey BeemerBiker, I have a 280x in Win10 running at 1070 MHz. |
Send message Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 14 |
The only way to reset a crashed driver in Windows is to restart the whole PC. If you have Win10, it could have done an update and crashed the driver in the process. There could also be a newer, better driver if Win10 was updated. |
Send message Joined: 18 Nov 08 Posts: 291 Credit: 2,461,693,501 RAC: 0 |
Yeah, the problem was drivers. I looked at my P5E and it was waiting to reboot to install drivers to fix whatever Microsoft had done the previous Tuesday. After rebooting, exactly 30 Milkyway ATI tasks reported an error, but all the remaining tasks plus the new downloads were back at their normal 3-minute WU time to complete. It has been working just fine for 24 hours. I don't think anything was wrong with those 30 tasks; the driver change just bumped them out. I had tried an S9000 graphics board in this system before putting it in the P7N. The P7N, on the other hand, did not respond to a reboot like my P5E. This system worked fine on Collatz ATI tasks but ran at only 300 MHz for Milkyway. I suspect the same problem with the driver. The driver failed to uninstall (Win10 x64); even the ATI "cleanup" program was unable to uninstall the AMD software on this Intel system. I deleted both S9000 video boards from the system manager and rebooted. They were recognized as W8000 video boards, but they worked, and time to complete is back down to 3 minutes per WU when running 4 on each GPU. Apparently the Adrenalin Radeon driver caused problems with mixed S9000 and 7950 graphics boards. I did not attempt to update whatever Microsoft installed to handle the "W8000", as it is working and I don't want to mess with it any more. It was just a coincidence that these problems occurred at the same time as the plan classes were changed here. The S9000s were $160 new with free shipping on eBay, and I could not pass up a chance to get a new Tahiti system with 6 GB of memory, not just 3. They just cannot be mixed with normal 7950 boards, and they require DIY cooling. |
Send message Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 14 |
Sounds like a good deal for someone who knows how to do that stuff. I'm glad you are back and crunching fast again. |
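
As a rough illustration of the double-precision check that the plan class change at the top of this thread is about, here is a minimal Python sketch. It is not the project's actual plan-class code. It only assumes that a device's OpenCL extension string is available and that double-precision support shows up there as `cl_khr_fp64` (or the older AMD-specific `cl_amd_fp64`). The function name and the sample extension strings are hypothetical.

```python
# Hypothetical helper, not the project's actual plan-class logic.
# OpenCL devices advertise double-precision support through the
# cl_khr_fp64 extension (or the older AMD-specific cl_amd_fp64).
FP64_EXTENSIONS = {"cl_khr_fp64", "cl_amd_fp64"}


def supports_double_precision(extensions: str) -> bool:
    """Return True if a space-separated OpenCL extension string lists an fp64 extension."""
    return any(ext in FP64_EXTENSIONS for ext in extensions.split())


if __name__ == "__main__":
    # Made-up extension strings, for illustration only.
    dp_card = "cl_khr_byte_addressable_store cl_khr_fp64 cl_khr_icd"
    sp_card = "cl_khr_byte_addressable_store cl_khr_icd"
    for label, exts in (("DP card", dp_card), ("SP-only card", sp_card)):
        verdict = "send GPU work" if supports_double_precision(exts) else "skip GPU work"
        print(f"{label}: {verdict}")
```

A server-side filter along these lines would stop sending GPU workunits to hosts whose cards cannot do double precision, which is the failure mode several posters describe above.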
©2024 Astroinformatics Group