Author | Message |
alk44
Send message Joined: 2 Mar 20 Posts: 131 Credit: 320,375,336 RAC: 14,084
|
Just wanted to find out if anyone else is having this problem. My system is not crashing and all else seems to be fine, but Boinc will randomly crash and not restart automatically, though it restarts normally when I start it myself.
I'm running Windows10 1904 version. Temps are good with plenty of headroom.
Any suggestions are greatly appreciated.
Allen
|
|
Rick
Send message Joined: 29 Aug 21 Posts: 24 Credit: 67,446,872 RAC: 605
|
Could be thermal stress on your GPU. A sudden crash with the system still functioning points to the GPU, but go have a look at your crash log and see what upset Boinc. There is lots of info on the web on how to check the crash log.
|
|
alk44
Send message Joined: 2 Mar 20 Posts: 131 Credit: 320,375,336 RAC: 14,084
|
Thanks for the input. My processor is a Ryzen7 4700G, which means my GPU is in my CPU processor. Temps are well under the max temp for that processor. So, I'm still looking. Have accessed some of the data in the crash log, but am not finding anything that reliably points to the cause of the problem.
Thanks again.
Allen
|
|
Rick
Send message Joined: 29 Aug 21 Posts: 24 Credit: 67,446,872 RAC: 605
|
Ok, could be RAM; try swapping the sticks around. Boinc could be crashing from an 'exception' that is not system critical. Sometimes swapping the RAM fixes the problem outright, and sometimes it crashes whatever part of the system is now using the bad RAM, which helps you find the stick causing it.
|
|
Joseph Stateson
Send message Joined: 18 Nov 08 Posts: 291 Credit: 2,461,693,501 RAC: 0
|
My processor is a Ryzen7 4700G, which means my GPU is in my CPU processor. Temps are well under the max temp for that processor.
Your win10 system has only 16 real "errors" out of almost 1400 work units. IMHO that is pretty good. Except for one canceled by server the rest ran for over 1/2 hour (almost to completion for separation tasks) before dying.
I picked one work unit result at random and looked:
- Pagefile Usage -
PagefileUsage: 5324800, PeakPagefileUsage: 4513792
- Working Set Size -
WorkingSetSize: 105152, PeakWorkingSetSize: 7745536, PageFaultCount: 8404992
*** Dump of thread ID 7164 (state: Waiting): ***
8,404,992 page faults is a huge amount of memory paging. It looks like a thread was waiting for something to happen, but it did not happen and the wait timed out. This is just a guess, as I don't see the word "timeout". But a timeout could trigger a breakpoint, which is all that would be reported unless the project chose to provide more info. Einstein@home would print "out of paper" for some error message, so Milkyway is actually doing well.
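To put the dump's numbers in perspective, here is a minimal Python sketch that parses the lines above and converts them to readable units. It assumes the usage fields are byte counts (as with the Windows process memory counters) and that PageFaultCount is a plain count of faults rather than a size; the function names are made up for illustration.

```python
import re

# Dump excerpt copied from the work unit result quoted above.
DUMP = """\
- Pagefile Usage -
PagefileUsage: 5324800, PeakPagefileUsage: 4513792
- Working Set Size -
WorkingSetSize: 105152, PeakWorkingSetSize: 7745536, PageFaultCount: 8404992
"""

# Assumption: this field counts events; every other field is a size in bytes.
COUNT_FIELDS = {"PageFaultCount"}


def parse_dump(text):
    """Return {field name: integer value} for every 'Name: 123' pair."""
    return {name: int(value) for name, value in re.findall(r"(\w+):\s*(\d+)", text)}


def to_mib(num_bytes):
    """Convert a byte count to mebibytes."""
    return num_bytes / (1024 * 1024)


def summarize(fields):
    """Format each field as MiB, or as a fault count for count fields."""
    lines = []
    for name, value in fields.items():
        if name in COUNT_FIELDS:
            lines.append(f"{name}: {value:,} faults")
        else:
            lines.append(f"{name}: {to_mib(value):.2f} MiB")
    return lines


for line in summarize(parse_dump(DUMP)):
    print(line)
```

Under those assumptions the peak working set comes out at about 7.39 MiB, so the striking figure in this dump is the number of page faults, not the amount of memory held by the task.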
I suspect you are running too many concurrent tasks. I would avoid running Ryzen OpenCL tasks and CPU tasks of the same project at the same time.
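One way to cap concurrent tasks for a single project is an app_config.xml file with a project_max_concurrent entry. The Python sketch below only builds the file contents; the limit of 12 is a made-up value, not a recommendation, and the helper name is hypothetical. As far as I understand the BOINC client configuration docs, the file goes in that project's folder inside the BOINC data directory and is picked up after "Read config files" in the Manager, but check the docs for your client version.

```python
import xml.etree.ElementTree as ET


def build_app_config(max_tasks):
    """Build app_config.xml contents limiting a project's concurrent tasks."""
    root = ET.Element("app_config")
    # project_max_concurrent caps how many tasks of this project run at once.
    ET.SubElement(root, "project_max_concurrent").text = str(max_tasks)
    return ET.tostring(root, encoding="unicode")


# Hypothetical limit: leave a few of the 16 threads free for the GPU task and Windows.
print(build_app_config(12))
```

Lowering the number further is an easy experiment if the crashes come back.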
|
|
alk44
Send message Joined: 2 Mar 20 Posts: 131 Credit: 320,375,336 RAC: 14,084
|
Thanks, but I only have one 16 GB RAM stick, so swapping is not likely to solve this problem.
Nice try though!
Allen
|
|
alk44
Send message Joined: 2 Mar 20 Posts: 131 Credit: 320,375,336 RAC: 14,084
|
Hmmmm... I am running a unit on all 16 instances available and the GPU task. You could be right, but I hate to cut production. I will however reduce my CPU instances if I find the problem is continuing. Have been running flat out for a few days now with no trouble. Wonder how long it will continue.
Thanks for your help,
Allen
|
|
mikey
Send message Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0
|
Go into the Boinc Manager and click on a running CPU task, then go to the panel on the left and click on Properties to see how much memory that task is using. There will be two numbers, virtual memory size and working set size; the bigger one is the one you are concerned about. If that number is approaching 1 GB or more for each task, then there's your problem: you only have 16 GB of memory and are running 16 tasks plus a GPU task, and Windows takes at least 1 GB just for itself. Then if you try to surf the net or watch a video, it's no wonder your PC is having problems.
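The arithmetic behind that check can be sketched in a few lines of Python. This is only a rough budget under stated assumptions: the per-task figure is whatever the Properties panel shows, the 1 GB Windows reserve is the rough minimum mentioned above, and the function name and default values are made up for illustration.

```python
def memory_headroom_gb(total_ram_gb, cpu_tasks, per_cpu_task_gb,
                       gpu_task_gb=0.0, os_reserve_gb=1.0):
    """Return RAM left over (in GB) after all tasks and the OS reserve.

    A negative result means the workload does not fit in physical RAM
    and the system will have to lean on the page file.
    """
    needed = cpu_tasks * per_cpu_task_gb + gpu_task_gb + os_reserve_gb
    return total_ram_gb - needed


# Worst case described above: 16 GB of RAM, 16 CPU tasks at about 1 GB each.
headroom = memory_headroom_gb(total_ram_gb=16, cpu_tasks=16, per_cpu_task_gb=1.0)
print(f"Headroom: {headroom:+.1f} GB")
```

With those worst-case inputs the headroom is negative before the GPU task or a browser is even counted, which is why the per-task number is worth checking.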
|
|
alk44
Send message Joined: 2 Mar 20 Posts: 131 Credit: 320,375,336 RAC: 14,084
|
Thank you! I will take a look and see what there is to see.
Isn't a gig per unit quite a bit to use though?
Allen
|
|
mikey
Send message Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0
|
I don't remember, as I haven't run them in a long time; I usually run the gpu application here. But Rosetta, for example, won't even download a task if you don't have more than 8gb of ram in your pc, and won't start a task until 8.?gb of ram is free, and that's for each task. The task itself doesn't actually use that much memory, but since Rosetta doesn't actually make the tasks (they are all 3rd party tasks), there is a huge lag between the different groups making the tasks and the people running the tasks.
|
|
alk44
Send message Joined: 2 Mar 20 Posts: 131 Credit: 320,375,336 RAC: 14,084
|
Hello again. After checking, I find that the virtual memory was less than 6 megs and the working size was a little over 10 megs for a CPU task. The GPU task had much higher amounts, but far short of a GB, coming in at 81 and 100 MB respectively. Interestingly, the task running all 16 CPUs at once is checking out at 10 and 17 respectively.
Since none of this gets anywhere close to using even 1 GB of RAM I guess we can put this thought to bed.
Love the help though.
I can say that at this point, this is the longest in recent times that Boinc has not crashed. I hope it continues.
Ready for more suggestions should you think of any!
Allen
|
|
alk44
Send message Joined: 2 Mar 20 Posts: 131 Credit: 320,375,336 RAC: 14,084
|
You've made me consider another possibility. I have had a hunch about Windows Update. In Win10 it is very difficult to stop it from updating whenever it takes a notion. Since the last time it crashed (Boinc) I have told it to put a temp hold on updates and have had no trouble since that time. I was just thinking that maybe between Win10 updating and trying to run Boinc at the same time, maybe it ran out of RAM and decided that the update process is more important and just shut Boinc down so that Update could continue. Any thoughts???
Allen
|
|
mikey
Send message Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0
|
Yes, that's what Windows does now: it updates when we told it the pc is least used, which is never for most crunchers, but you MUST pick a time, and then Windows does its thing, cancelling everything else that is running to do so.
Also, a Windows page file should be anywhere from 1.5 to 3 times the amount of ram in the pc, and yes, it is adjustable: https://computerinfobits.com/adjust-page-file-windows-10/
YES, that 1.5 to 3 times the size of the ram is a guesstimate and will need to be adjusted for each person, but if you have an SSD drive it can speed things up when needed.
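As a quick worked example of that rule of thumb, here is a small Python sketch. The 1.5 and 3 multipliers are just the guesstimate above, not an official figure, and the function name is made up for illustration.

```python
def pagefile_range_gb(ram_gb, low_factor=1.5, high_factor=3.0):
    """Return the (minimum, maximum) page file size in GB for the rule of thumb."""
    return ram_gb * low_factor, ram_gb * high_factor


# A PC with 16 GB of RAM, as in this thread.
low, high = pagefile_range_gb(16)
print(f"Suggested page file: {low:.0f} to {high:.0f} GB")
```

For 16 GB of RAM that works out to a range of 24 to 48 GB, so treat it as a starting point to adjust rather than a target.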
|
|
alk44
Send message Joined: 2 Mar 20 Posts: 131 Credit: 320,375,336 RAC: 14,084
|
Yes, I know about the pagefile size and will have to check what it is set for, and yes, I do have a 512 GB SSD that this is running on. I hate the way Win10 is taking over things and am contemplating changing all of my systems over to Linux. I'm really tired of Bill Gates World!
Thanks and I will get back here soon and let you know about the pagefile size, although it might be just a MS update problem.
Allen
|
|
alk44
Send message Joined: 2 Mar 20 Posts: 131 Credit: 320,375,336 RAC: 14,084
|
mikey,
Checked the pagefile in Windows and found it to be set to just shy of 3 GB. I have reset it to just shy of 6 GB and we shall see what happens. Really was going to set it higher, but since it worked "most of the time" with 3 GB, 6 is twice as much so.....
Thanks for your help. Will let you know if it fails again, but if it does, I will increase the pagefile size first.
Allen
|
|
mikey
Send message Joined: 8 May 09 Posts: 3339 Credit: 524,010,781 RAC: 0
|
Sounds like a plan
|
|
alk44
Send message Joined: 2 Mar 20 Posts: 131 Credit: 320,375,336 RAC: 14,084
|
Hello again,
It's been quite a while since I last spoke of this problem, so I thought I would get back to you and let you know that I've not seen an app crash since.
I am running flat out on all cpu and gpu instances without any trouble for now.
Would have gotten back sooner, but the Covid decided to put me and the wife down for a while. Fortunately, we managed to come out the other side of it and are trying to catch up on what has gone dormant for some time. Both of us are over 70, so it was not fun.
Now that I have said that things are looking up with my computer problem, I'd better keep a closer eye on it for a while.
Thanks to all for helping me tackle this challenge!
Allen
|
|