Welcome to MilkyWay@home

download failures on GPU tasks

Questions and Answers : Unix/Linux : download failures on GPU tasks

rfzl
Joined: 17 Nov 09
Posts: 4
Credit: 111,166,377
RAC: 0
Message 68351 - Posted: 22 Mar 2019, 17:32:13 UTC

I built a new Mint 19.1 rig with an AMD processor and an Nvidia 1030 GPU. All was well until the new servers started back up. Everything seems to be in order, but I now get download failures for GPU jobs. Here is what the logs show:

Fri 22 Mar 2019 12:18:39 PM CDT | Milkyway@Home | Started download of milkyway_1.46_x86_64-pc-linux-gnu__opencl_nvidia_101
Fri 22 Mar 2019 12:20:41 PM CDT | | Project communication failed: attempting access to reference site
Fri 22 Mar 2019 12:20:41 PM CDT | Milkyway@Home | Temporarily failed download of milkyway_1.46_x86_64-pc-linux-gnu__opencl_nvidia_101: transient HTTP error
Fri 22 Mar 2019 12:20:41 PM CDT | Milkyway@Home | Backing off 00:16:09 on download of milkyway_1.46_x86_64-pc-linux-gnu__opencl_nvidia_101
Fri 22 Mar 2019 12:20:43 PM CDT | | Internet access OK - project servers may be temporarily down.

The log suggests that the issue is on the server side, yet I am not getting these errors on my other Mint machine, which is running Mint 18. Any suggestions on what the issue may be or where to look? There are no related errors in the syslog.

A bit of backstory on this machine: I had issues on the initial install due to a sour SSD drive and bad sectors. That seems to be resolved. I also played around with some of the extra library packages specific to Boinc and may have inadvertently created a conflict. I have since removed everything except the basic Boinc install (manager and client).

Thanks, Ron
ID: 68351 · Rating: 0
Gunnar Hjern
Joined: 14 Oct 16
Posts: 4
Credit: 25,072,475
RAC: 0
Message 68355 - Posted: 23 Mar 2019, 1:27:37 UTC - in response to Message 68351.  

I'm also having problems starting up some new machines (not running m@h before):
I get a lot of tasks, but they are all listed as "downloading", apparently waiting for one single file:
milkyway_1.46_x86_64-pc-linux-gnu
that just refuses to get downloaded.
It stands waiting for hours and hours, still at 0%.
It just has to be some server issue.
Hope the admins can attend to it soon, so I can get my 5 new machines to run some m@h.
//Gunnar
ID: 68355 · Rating: 0
bluestang
Joined: 13 Oct 16
Posts: 112
Credit: 1,174,293,644
RAC: 0
Message 68356 - Posted: 23 Mar 2019, 3:07:24 UTC

In another thread someone mentioned that they went to the "https://milkyway.cs.rpi.edu/milkyway/download/" folder, manually downloaded the file, and placed it in the MilkyWay folder on their machine.
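
The manual workaround can be sketched in Python as follows. This is only an illustration under assumptions, not a confirmed procedure: the project folder path is the one typically used by the Debian/Ubuntu/Mint boinc-client package and may differ on your install, the file is assumed to sit directly in the download folder, and the helper names `install_app_file` and `fetch_and_install` are made up for this example.

```python
import os
import stat
import urllib.request
from pathlib import Path

# Download folder quoted in this thread.
DOWNLOAD_BASE = "https://milkyway.cs.rpi.edu/milkyway/download/"

# ASSUMPTION: project folder of a packaged boinc-client install.
# Adjust this if your BOINC data directory lives elsewhere.
PROJECT_DIR = Path("/var/lib/boinc-client/projects/milkyway.cs.rpi.edu_milkyway")


def install_app_file(data: bytes, name: str, project_dir: Path) -> Path:
    """Write data as `name` inside project_dir and mark it executable."""
    target = project_dir / name
    tmp = target.with_name(target.name + ".part")
    tmp.write_bytes(data)
    # Add execute permission for user, group and others (the "chmod" step).
    tmp.chmod(tmp.stat().st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    # Rename into place only once the file is completely written.
    os.replace(tmp, target)
    return target


def fetch_and_install(name: str, project_dir: Path = PROJECT_DIR) -> Path:
    """Download `name` from the project's download folder and install it."""
    with urllib.request.urlopen(DOWNLOAD_BASE + name, timeout=60) as resp:
        data = resp.read()
    return install_app_file(data, name, project_dir)
```

Calling `fetch_and_install("milkyway_1.46_x86_64-pc-linux-gnu__opencl_nvidia_101")` would fetch the file named in the log above. Writing into the BOINC data directory usually needs elevated privileges, and the BOINC client must still be able to read the file afterwards.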
ID: 68356 · Rating: 0
Gunnar Hjern
Joined: 14 Oct 16
Posts: 4
Credit: 25,072,475
RAC: 0
Message 68372 - Posted: 23 Mar 2019, 23:31:39 UTC - in response to Message 68356.  

Yes, I've read it, and I actually also successfully tested it on one of my computers:
After I downloaded the file, chmod'ed it, and placed it in the correct folder, it started by itself after a while.
However, this is certainly not the correct procedure to solve the problem!
I still wonder what stopped Boinc from downloading it in the first place?
Is it some kind of certificate problem with the https-server?
//Gunnar
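
One way to narrow down the certificate question is to try a TLS handshake against the server from the affected machine. The sketch below uses only Python's standard `ssl` and `socket` modules; `classify_error` and `probe_https` are hypothetical helper names. The BOINC client uses its own HTTP stack and certificate bundle, so a result here is only a hint and does not prove what the client itself sees.

```python
import socket
import ssl


def classify_error(exc: BaseException) -> str:
    """Map an exception from a connection attempt to a rough category."""
    if isinstance(exc, ssl.SSLCertVerificationError):
        return "certificate"  # chain or hostname could not be verified
    if isinstance(exc, ssl.SSLError):
        return "tls"          # other handshake or protocol failure
    if isinstance(exc, socket.timeout):
        return "timeout"
    if isinstance(exc, OSError):
        return "network"      # DNS failure, connection refused, and so on
    return "unknown"


def probe_https(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Attempt a verified TLS handshake and describe the outcome."""
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host) as tls:
                return f"ok ({tls.version()})"
    except OSError as exc:
        return f"{classify_error(exc)}: {exc}"
```

For example, `print(probe_https("milkyway.cs.rpi.edu"))` reports either the negotiated TLS version or the category of failure. A "certificate" result would support the certificate theory, while "timeout" or "network" would point elsewhere.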
ID: 68372 · Rating: 0
rfzl
Joined: 17 Nov 09
Posts: 4
Credit: 111,166,377
RAC: 0
Message 68386 - Posted: 25 Mar 2019, 2:58:24 UTC

Just a followup to my original post.

I decided to stop all Boinc tasks and do a complete rebuild of the problem host. I re-formatted the SSD drive and loaded a brand new image of Mint 19.1 Cinnamon. Updated the Nvidia driver and installed only the base Boinc client and manager. I also did reboots between each component install to be sure there were no conflicts or issues. Once the apps were installed, I added some of the projects I support and let Boinc run. As before, I got a download error from Milkyway but none of the other projects.

I did go ahead and try the manual download of the offending project file as mentioned in the previous post. This did correct the initial problem and I am now processing Milkyway tasks. So, for now, I am up and running on my new rig.

As Mint 19 is a new long term release and has some changes from earlier versions, I can only conjecture that the issue has something to do with the combination of Mint 19.1 and the new servers on Milkyway. I also suspect that other combinations of a new install of a client OS may not suffer from the same issues. Until someone can do proper testing or can fix the issue on Milkyway's end, I'll just do what tasks I can and wait.
ID: 68386 · Rating: 0
Gunnar Hjern
Joined: 14 Oct 16
Posts: 4
Credit: 25,072,475
RAC: 0
Message 68389 - Posted: 25 Mar 2019, 9:36:29 UTC - in response to Message 68386.  

You do not need to worry that Linux Mint in particular is part of the problem - I'm having the same problem with Xubuntu 14.04, 16.04.6, and also 18.04.2!
I suspect it has something to do with the servers, or maybe the servers are using some combination of https and other settings that makes the client unable to download.
I don't know if the project admins are viewing these forums daily or if we should PM them?
//Gunnar
ID: 68389 · Rating: 0
rfzl
Joined: 17 Nov 09
Posts: 4
Credit: 111,166,377
RAC: 0
Message 68391 - Posted: 26 Mar 2019, 3:13:14 UTC - in response to Message 68389.  

I have to agree with your assessment. I found a similar error on my Win10 machine this morning. It appears that when a new task is requested or sent and it requires a new "config" file, the download fails. As this all seems to have started with the new servers, that would seem to be where the error is. There is a note that the admins are working on the work units, but nothing regarding this kind of failure. It seems likely that this issue only affects new clients that were not running prior to the server upgrade. My old client is working like a champ.

As for messaging the admin, that seems like a good idea. I have not done so, as I am usually late to the game on such things and my message ends up redundant.

Ron
ID: 68391 · Rating: 0

©2024 Astroinformatics Group