Welcome to MilkyWay@home

Computation errors


Message boards : Number crunching : Computation errors

Za69uzZ
Joined: 6 Apr 12
Posts: 42
Credit: 3,215,609
RAC: 0
3 million credit badge, 8 year member badge
Message 69940 - Posted: 19 Jun 2020, 12:03:19 UTC

Isn't it fantastic? I need to find out whether the problem was at my end or at the MilkyWay@home end.
I stopped work, cancelled the tasks and let the errors be reported.
Has anyone else run into this with the current GPU work units?

mikey
Joined: 8 May 09
Posts: 2408
Credit: 450,644,880
RAC: 27,936
300 million credit badge, 10 year member badge, extraordinary contributions badge
Message 69948 - Posted: 20 Jun 2020, 11:21:40 UTC - in response to Message 69940.  
Last modified: 20 Jun 2020, 11:22:45 UTC

Since you didn't make your computers visible and didn't explain the problem, it's very hard to figure out what is going wrong.

I run GPU work units on multiple PCs and have no problems.

Za69uzZ
Joined: 6 Apr 12
Posts: 42
Credit: 3,215,609
RAC: 0
3 million credit badge, 8 year member badge
Message 69952 - Posted: 20 Jun 2020, 19:59:16 UTC - in response to Message 69948.  
Last modified: 20 Jun 2020, 20:01:19 UTC

Can someone please tell me which logging option checkboxes to set in BOINC to log the issue,
where to obtain the error details to post, and whether there's anything else I need to do?
I am not showing my PC.
The news will tell you Australia is suffering a hacking episode.
COVID-19 has the news media trying to provoke a cold war.
It is a good computer: a properly built and maintained system, run by a very experienced operator. If I were inexperienced or dumb, I would show it off.
I have just refreshed my PC with desktop software rather than server software, set new passwords as a precaution and enhanced my router security.
My drivers are not even a month old.
There shouldn't be an issue running.
When someone can please answer my questions, I can log the work units and post any further errors.
I cannot crunch any further until I can report problems properly.
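On the logging question: in BOINC Manager the logging checkboxes sit under Options → Event Log options, and each one corresponds to a flag inside <log_flags> in cc_config.xml in the BOINC data directory. The sketch below builds such a file in Python. The helper name build_cc_config is hypothetical, and task_debug and coproc_debug are simply assumed to be the flags most relevant to GPU task failures. Merge the output into any existing cc_config.xml rather than overwriting it, then use Options → Read config files.

```python
import xml.etree.ElementTree as ET


def build_cc_config(flags):
    """Return cc_config.xml text with each named <log_flags> entry set to 1."""
    root = ET.Element("cc_config")
    log_flags = ET.SubElement(root, "log_flags")
    for flag in flags:
        # BOINC reads each log flag as 0 (off) or 1 (on)
        ET.SubElement(log_flags, flag).text = "1"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Assumed choice of flags: task_debug (task start/exit details),
    # coproc_debug (GPU / coprocessor details)
    print(build_cc_config(["task_debug", "coproc_debug"]))
```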

Keith Myers
Joined: 24 Jan 11
Posts: 335
Credit: 215,097,660
RAC: 320,575
200 million credit badge, 9 year member badge, extraordinary contributions badge
Message 69953 - Posted: 20 Jun 2020, 21:00:23 UTC - in response to Message 69952.  

Allowing your computers to be visible at the project does open them up to any hacking. It only shows what they are running when your host contacts the project at each scheduler connection and your current tasks. Simply provide the URL link to the errored work after changing your project preferences for your computers to be visible. That setting has no bearing on your physical computers in your network. Whatever prevention you have in place for your network is not compromised.

Za69uzZ
Joined: 6 Apr 12
Posts: 42
Credit: 3,215,609
RAC: 0
3 million credit badge, 8 year member badge
Message 69954 - Posted: 20 Jun 2020, 22:04:04 UTC - in response to Message 69953.  

I appreciate the advice. I have successfully run tasks since then, so I am waiting to see whether the errors come back before I unhide my computer. I did a lot of work on it last night, so my morale is high this morning.
It's nice and snappy too.

mikey
Joined: 8 May 09
Posts: 2408
Credit: 450,644,880
RAC: 27,936
300 million credit badge, 10 year member badge, extraordinary contributions badge
Message 69955 - Posted: 20 Jun 2020, 23:33:30 UTC - in response to Message 69952.  

Click on my name and you can view my PCs and see what people could see about yours; it's not enough to make them any more hackable than they are right now.

One suggestion would be to roll back the drivers to slightly older ones. Another would be to stop overclocking anything, if you are.

Za69uzZ
Joined: 6 Apr 12
Posts: 42
Credit: 3,215,609
RAC: 0
3 million credit badge, 8 year member badge
Message 69957 - Posted: 21 Jun 2020, 19:43:11 UTC - in response to Message 69955.  

I discovered the cause.
It's a condition called sagging.
Being so damn heavy, this monolithic GPU doesn't fare well in a standard PC tower.
It does have a metal backplate, but that doesn't stop the stress on the PCIe slot connector.
One slight slip and it malfunctions.
I now have it running just by reseating the card in the PCIe slot and tightening the mounting bracket screw.
It really needs a GPU support bracket; I tried to get one, but it arrived damaged.
Guess I need to try again.

mikey
Joined: 8 May 09
Posts: 2408
Credit: 450,644,880
RAC: 27,936
300 million credit badge, 10 year member badge, extraordinary contributions badge
Message 69958 - Posted: 21 Jun 2020, 23:38:51 UTC - in response to Message 69957.  

If you have the room, lay the PC down on its side until you get one. Could you make one, or use a zip tie?
https://graphicscardhub.com/gpu-brace-support/ I'm assuming that's what you mean.

Za69uzZ
Joined: 6 Apr 12
Posts: 42
Credit: 3,215,609
RAC: 0
3 million credit badge, 8 year member badge
Message 69959 - Posted: 22 Jun 2020, 14:58:16 UTC - in response to Message 69958.  

You are spot on. Yes, that's what I need. I had a look around, and it's either a bracket or a PC case rather than a tower that I need.
I've actually put a ruler there for now :)

mikey
Joined: 8 May 09
Posts: 2408
Credit: 450,644,880
RAC: 27,936
300 million credit badge, 10 year member badge, extraordinary contributions badge
Message 69960 - Posted: 22 Jun 2020, 21:33:41 UTC - in response to Message 69959.  

I hope you can find one. If you're handy with wood, you could always notch a stick and put it between the top and bottom of the case; a small screw through the top and bottom of the case should hold the stick in place, since it isn't really a bracket, just a 'don't fall out' thing.

JAWAWA
Joined: 28 Jun 20
Posts: 2
Credit: 810,257
RAC: 1
500 thousand credit badge
Message 69973 - Posted: 3 Jul 2020, 21:46:47 UTC
Last modified: 3 Jul 2020, 21:52:49 UTC

Hi, everyone!

I'm new to the project and started running some tasks last week.
I'm now working on an N-Body Simulation task that has run for over 11.5 hours, but it seems stuck at 34.725% forever,
and the total estimated time has been extended from 7.5 hours to nearly 17 hours.
Is that normal, or are the files broken? What should I do now?
Thank you very much!
----------------------------------------------------------------------------------------------------------------------
Application: Milkyway@home N-Body Simulation 1.76 (mt)
Name: de_nbody_06_08_2020_v176_40k__data__2_1588605902_541679
State: Suspended - user request
Received: 29/06/2020 23:53:31
Report deadline: 11/07/2020 23:53:30
Resources: 12 CPUs
Estimated computation size: 11,205 GFLOPs
CPU time: 11:28:46
CPU time since checkpoint: 00:16:15
Elapsed time: 09:00:28
Estimated time remaining: 16:55:57
Fraction done: 34.725%
Virtual memory size: 16.23 MB
Working set size: 940.00 KB
Directory: slots/1
Process ID: 1412
Progress rate: 3.960% per hour
Executable: milkyway_nbody_1.76_windows_x86_64__mt.exe
----------------------------------------------------------------------------------------------------------------------
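The figures in these task properties are consistent with a plain linear extrapolation, remaining = elapsed × (1 − fraction done) / fraction done: 09:00:28 elapsed at 34.725% gives 16:55:57 remaining, the same value the dialog shows. The Python sketch below reproduces that arithmetic; the function names are hypothetical, and it only matches the numbers in this dump, it is not necessarily BOINC's full estimator. It also shows why a stalled fraction makes the estimate grow: while the fraction stays at 34.725%, each extra hour of running adds about (1 − f) / f ≈ 1.88 hours to the remaining time.

```python
def naive_remaining_s(elapsed_s: float, fraction_done: float) -> float:
    """Linear extrapolation: assume the rest of the task runs at the average rate so far."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_s * (1.0 - fraction_done) / fraction_done


def hms(seconds: float) -> str:
    """Format a duration in seconds as H:MM:SS."""
    s = round(seconds)
    return f"{s // 3600}:{s % 3600 // 60:02d}:{s % 60:02d}"


elapsed = 9 * 3600 + 28  # "Elapsed time: 09:00:28"
print(hms(naive_remaining_s(elapsed, 0.34725)))  # 16:55:57, as in the dialog

# If the fraction stays stuck at 34.725% for one more hour of running,
# the estimate grows instead of shrinking.
print(hms(naive_remaining_s(elapsed + 3600, 0.34725)))
```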

mikey
Joined: 8 May 09
Posts: 2408
Credit: 450,644,880
RAC: 27,936
300 million credit badge, 10 year member badge, extraordinary contributions badge
Message 69975 - Posted: 4 Jul 2020, 10:00:42 UTC - in response to Message 69973.  

It says 'Suspended', which means your settings have the task suspended for some reason. Since your PCs are hidden it's hard to tell, but possible causes are that BOINC is set not to run while you are using the PC, or that you are running some other project at the same time. N-Body tasks use as many CPU cores as your PC has, so trying to run another project at the same time won't work very well.
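Because N-Body is a multithreaded (mt) application, one way to leave some cores free for other work or other projects is an app_config.xml in the MilkyWay@home project folder inside the BOINC data directory, with an <app_version> entry that sets <avg_ncpus>. The Python sketch below generates one. It assumes the application's internal name is milkyway_nbody with plan class mt (check client_state.xml for the real values) and that the app honours the reduced CPU count; build_app_config is a hypothetical helper. After saving the file, use Options → Read config files so the client picks it up.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, plan_class: str, ncpus: int) -> str:
    """Return app_config.xml text that budgets `ncpus` CPUs per task of one app version."""
    if ncpus < 1:
        raise ValueError("ncpus must be at least 1")
    root = ET.Element("app_config")
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    ET.SubElement(version, "plan_class").text = plan_class
    # Number of CPUs the BOINC client budgets for each running task of this version
    ET.SubElement(version, "avg_ncpus").text = str(ncpus)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Assumed app name and plan class; e.g. leave half of a 12-thread CPU free
    print(build_app_config("milkyway_nbody", "mt", 6))
```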

JAWAWA
Joined: 28 Jun 20
Posts: 2
Credit: 810,257
RAC: 1
500 thousand credit badge
Message 69976 - Posted: 5 Jul 2020, 1:37:24 UTC - in response to Message 69975.  

Hi, thank you for the reply.
I suspended it myself because I noticed the progress had been stuck there for hours,
even when I forced it to run always. The elapsed time keeps ticking, but the remaining time keeps extending along with it.
Although I set all 12 cores to run the project, the CPU time is only 50%,
so I can still surf websites or watch movies while BOINC is running.
I did the same in the first few days without any issues.

Anyway, something just happened magically:
everything seems back to normal this morning, and the progress is moving on.

I'm not sure what happened, but I aborted all the other N-Body Simulation units in the queue.

mikey
Joined: 8 May 09
Posts: 2408
Credit: 450,644,880
RAC: 27,936
300 million credit badge, 10 year member badge, extraordinary contributions badge
Message 69977 - Posted: 5 Jul 2020, 10:42:40 UTC - in response to Message 69976.  

A lot of us no longer run the N-Body CPU work units here because of those types of problems; there are plenty of GPU work units that finish faster and don't seem to have any problems running. There is a 10-minute wait period before you get new tasks, but if you also run a zero-resource-share project with short work units, it works out just fine.

grumpy
Joined: 14 Dec 07
Posts: 9
Credit: 3,469,124
RAC: 5,263
3 million credit badge, 10 year member badge
Message 70064 - Posted: 28 Aug 2020, 21:50:21 UTC

I keep getting this error:

<core_client_version>7.16.7</core_client_version>
<![CDATA[
<message>
Incorrect function.
(0x1) - exit code 1 (0x1)</message>
<stderr_txt>
<search_application> milkyway_separation 1.46 Windows x86 double OpenCL </search_application>
Reading preferences ended prematurely
BOINC GPU type suggests using OpenCL vendor 'Advanced Micro Devices, Inc.'
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Switching to Parameter File 'astronomy_parameters.txt'
<number_WUs> 4 </number_WUs>
<number_params_per_WU> 26 </number_params_per_WU>
Using SSE4.1 path
Error getting number of platform (-1001): CL_PLATFORM_NOT_FOUND_KHR
Failed to get information about device
Error getting device and context (1): MW_CL_ERROR
Failed to calculate likelihood
Using SSE4.1 path
Error getting number of platform (-1001): CL_PLATFORM_NOT_FOUND_KHR
Failed to get information about device
Error getting device and context (1): MW_CL_ERROR
Failed to calculate likelihood
Using SSE4.1 path
Error getting number of platform (-1001): CL_PLATFORM_NOT_FOUND_KHR
Failed to get information about device
Error getting device and context (1): MW_CL_ERROR
Failed to calculate likelihood
Using SSE4.1 path
Error getting number of platform (-1001): CL_PLATFORM_NOT_FOUND_KHR
Failed to get information about device
Error getting device and context (1): MW_CL_ERROR
Failed to calculate likelihood
17:18:11 (956): called boinc_finish(1)

</stderr_txt>
]]>

AMD Ryzen 9 3950X 16-Core Processor [Family 23 Model 113 Stepping 0]
Coprocessors AMD AMD Radeon RX 5700 XT (8176MB) OpenCL: 2.0
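Every likelihood evaluation in that log fails at the same first step. Code -1001 is CL_PLATFORM_NOT_FOUND_KHR, defined by the cl_khr_icd extension and returned by the OpenCL ICD loader when it finds no installed platform, so the application never got as far as using the GPU. The Python sketch below pulls those codes and the final exit status out of a pasted stderr log; the function name and the one-entry lookup table are illustrative, not part of BOINC or MilkyWay@home.

```python
import re

# OpenCL error codes seen in this log; -1001 comes from the cl_khr_icd extension
KNOWN_CL_ERRORS = {-1001: "CL_PLATFORM_NOT_FOUND_KHR"}

PLATFORM_ERROR = re.compile(r"Error getting number of platform \((-?\d+)\)")
FINISH = re.compile(r"called boinc_finish\((-?\d+)\)")


def summarize_stderr(text: str) -> dict:
    """Collect OpenCL platform-query failures and the final exit status from a task's stderr."""
    codes = [int(m.group(1)) for m in PLATFORM_ERROR.finditer(text)]
    finish = FINISH.search(text)
    return {
        "platform_errors": [KNOWN_CL_ERRORS.get(c, str(c)) for c in codes],
        "exit_status": int(finish.group(1)) if finish else None,
    }


if __name__ == "__main__":
    sample = (
        "Using SSE4.1 path\n"
        "Error getting number of platform (-1001): CL_PLATFORM_NOT_FOUND_KHR\n"
        "17:18:11 (956): called boinc_finish(1)\n"
    )
    print(summarize_stderr(sample))
```

Run on the full log, it should report four CL_PLATFORM_NOT_FOUND_KHR entries, one per bundled work unit (<number_WUs> 4), and exit status 1. That points at the OpenCL runtime visible to this application rather than at the work units themselves.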

mikey
Joined: 8 May 09
Posts: 2408
Credit: 450,644,880
RAC: 27,936
300 million credit badge, 10 year member badge, extraordinary contributions badge
Message 70067 - Posted: 29 Aug 2020, 2:44:37 UTC - in response to Message 69953.  

I think you meant to say that allowing your computers to be visible at the project does NOT open them up to any hacking.

mikey
Joined: 8 May 09
Posts: 2408
Credit: 450,644,880
RAC: 27,936
300 million credit badge, 10 year member badge, extraordinary contributions badge
Message 70068 - Posted: 29 Aug 2020, 2:47:46 UTC - in response to Message 70064.  

You said THIS GPU has been crunching tasks here in the past? I did not know they had updated their database... hmm. Either way, check that your AMD drivers are still loaded and NOT the Microsoft drivers that Windows 10 updates LOVE to use instead.

mikey
Joined: 8 May 09
Posts: 2408
Credit: 450,644,880
RAC: 27,936
300 million credit badge, 10 year member badge, extraordinary contributions badge
Message 70069 - Posted: 29 Aug 2020, 2:50:46 UTC - in response to Message 69976.  

CPU time being only 50% is the problem: the tasks take a lot of memory and CPU time, and you are doing other things with the PC, so of course the N-Body tasks stall while waiting for free CPU time and memory. Maybe try running them only while you are snoozing.
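The task properties JAWAWA posted earlier in the thread give a rough way to check this. Assuming BOINC's CPU time is summed over all of the task's threads, dividing it by the elapsed (running) time gives the average number of cores the task actually kept busy. A Python sketch with hypothetical helper names:

```python
def hms_to_s(t: str) -> int:
    """Convert an H:MM:SS string from the BOINC task dialog to seconds."""
    h, m, s = (int(part) for part in t.split(":"))
    return h * 3600 + m * 60 + s


def avg_cores_used(cpu_time: str, elapsed: str) -> float:
    """Average cores busy: total CPU time across threads / wall-clock running time."""
    return hms_to_s(cpu_time) / hms_to_s(elapsed)


# Values from JAWAWA's task dialog (Resources: 12 CPUs)
cores = avg_cores_used("11:28:46", "09:00:28")
print(f"{cores:.2f} of 12 CPUs ({cores / 12:.0%})")  # 1.27 of 12 CPUs (11%)
```

For 11:28:46 of CPU time over 09:00:28 elapsed, this comes to roughly 1.27 of the 12 allotted CPUs, which is consistent with a task that spent much of its time starved of CPU.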

grumpy
Joined: 14 Dec 07
Posts: 9
Credit: 3,469,124
RAC: 5,263
3 million credit badge, 10 year member badge
Message 70102 - Posted: 3 Sep 2020, 3:30:33 UTC - in response to Message 70068.  

Actually, this is a new computer; I never used it on this project before. It works on Einstein@home. It's one of those newer cards.
The error seems to point to unknown hardware, so it's not supported... too bad! I wish they'd add it.
As for the driver, I've installed the AMD one.

©2020 Astroinformatics Group