Problems downloading Stars and Volume files
Author | Message |
---|---|
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
What seems to be happening is that people are downloading a lot of work units, so some of them have already been processed and removed from the server by the time a user reaches them in their queue. I'm looking into a way to fix this so there won't be stuck downloads. |
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
I've updated our server to limit the number of in-progress work units a client can have. This should help fix the problem with people not being able to download files. |
Send message Joined: 7 Nov 07 Posts: 13 Credit: 20,143,762 RAC: 0 |
Hmmm... how about making the WUs 50x longer? That would sure save a lot of upload/download on multi-core machines. I think one of mine went through four or five hundred WUs yesterday (counting the ones that choked on download). I'm sure this must be stressing your servers as well? |
Send message Joined: 8 Oct 07 Posts: 24 Credit: 111,325 RAC: 0 |
Happy to report that I have had no stuck download problems with the latest bunch of WUs (under Windows). They arrive in pairs, which I presume, from the "Reached host limit of 2" message, is by design. Seeing a checkpoint message, I was even brave enough to exit and restart BOINC to test that, and checkpointing works OK. (That is a big issue on some other projects with much longer WUs.) Running Ubuntu on a virtual machine gives "unrecoverable error" for every WU even before it gets a chance to start, so I've set no new work on that one for now. |
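For readers wondering how a per-host cap leads to work arriving in pairs, here is a minimal sketch of the arithmetic. It is not the project's or BOINC's scheduler code; the function name `tasks_to_send` and its parameters are hypothetical, and the default limit of 2 is taken only from the "Reached host limit of 2" message quoted above.

```python
def tasks_to_send(requested: int, in_progress: int, host_limit: int = 2) -> int:
    """Return how many new tasks a host may receive under a per-host cap.

    Hypothetical illustration: `host_limit` is the maximum number of
    in-progress work units one host may hold at a time.
    """
    if requested < 0 or in_progress < 0 or host_limit < 0:
        raise ValueError("counts must be non-negative")
    # Room left under the cap; never negative.
    room = max(host_limit - in_progress, 0)
    # The host gets what it asked for, but no more than the room left.
    return min(requested, room)


if __name__ == "__main__":
    # An idle host asking for plenty of work gets exactly the cap.
    print(tasks_to_send(requested=5, in_progress=0))
    # A host already at the cap gets nothing until it reports results.
    print(tasks_to_send(requested=5, in_progress=2))
```

With a small cap like this, a host cannot queue hundreds of work units that may be removed from the server before the client gets around to downloading their files.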
Send message Joined: 29 Sep 07 Posts: 18 Credit: 4,533,464 RAC: 0 |
I got a download error - Linux 32b with BOINC 5.10.28
Sa 10 Nov 2007 15:09:33 CET|Milkyway@home|Sending scheduler request: To fetch work. Requesting 173 seconds of work, reporting 0 completed tasks
Sa 10 Nov 2007 15:09:38 CET|Milkyway@home|Scheduler request succeeded: got 1 new tasks
Sa 10 Nov 2007 15:09:40 CET|Milkyway@home|Started download of astronomy_1.07_i686-pc-linux-gnu
Sa 10 Nov 2007 15:09:40 CET|Milkyway@home|Started download of parameters_generated_1194582642_44170
Sa 10 Nov 2007 15:09:41 CET|Milkyway@home|Giving up on download of parameters_generated_1194582642_44170: file not found
Sa 10 Nov 2007 15:09:41 CET|Milkyway@home|Started download of stars.txt
Sa 10 Nov 2007 15:09:43 CET|Milkyway@home|Finished download of astronomy_1.07_i686-pc-linux-gnu
Sa 10 Nov 2007 15:09:43 CET|Milkyway@home|Giving up on download of stars.txt: file not found
Sa 10 Nov 2007 15:09:43 CET|Milkyway@home|Started download of volume.txt
Sa 10 Nov 2007 15:09:45 CET|Milkyway@home|Giving up on download of volume.txt: file not found
Edit: after that I got only this message:
Sa 10 Nov 2007 15:33:32 CET|Milkyway@home|Sending scheduler request: To fetch work. Requesting 173 seconds of work, reporting 0 completed tasks
Sa 10 Nov 2007 15:33:37 CET|Milkyway@home|Scheduler request succeeded: got 0 new tasks
Matthias |
Send message Joined: 29 Sep 07 Posts: 18 Credit: 4,533,464 RAC: 0 |
One more download error with my Linux machine
Sa 10 Nov 2007 16:06:29 CET|Milkyway@home|Sending scheduler request: To fetch work. Requesting 174 seconds of work, reporting 0 completed tasks
Sa 10 Nov 2007 16:06:34 CET|Milkyway@home|Scheduler request succeeded: got 1 new tasks
Sa 10 Nov 2007 16:06:36 CET|Milkyway@home|Started download of parameters_generated_1194740565_7292
Sa 10 Nov 2007 16:06:36 CET|Milkyway@home|Started download of stars.txt
Sa 10 Nov 2007 16:06:38 CET||Project communication failed: attempting access to reference site
Sa 10 Nov 2007 16:06:38 CET|Milkyway@home|Finished download of parameters_generated_1194740565_7292
Sa 10 Nov 2007 16:06:38 CET|Milkyway@home|Giving up on download of stars.txt: file not found
Sa 10 Nov 2007 16:06:38 CET|Milkyway@home|Started download of volume.txt
Sa 10 Nov 2007 16:06:41 CET||Access to reference site succeeded - project servers may be temporarily down.
Sa 10 Nov 2007 16:06:41 CET|Milkyway@home|Giving up on download of volume.txt: file not found
This time the download of the parameters_generated file was OK. After that, again only this message:
Sa 10 Nov 2007 16:07:39 CET|Milkyway@home|Sending scheduler request: To report completed tasks. Requesting 432 seconds of work, reporting 1 completed tasks
Sa 10 Nov 2007 16:07:44 CET|Milkyway@home|Scheduler request succeeded: got 0 new tasks
Matthias |
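For anyone who wants to pull the failed files out of a long message log like the ones above, here is a rough sketch. It assumes the pipe-separated `timestamp|project|message` layout and the "Giving up on download of NAME: REASON" wording seen in these posts; the function name `failed_downloads` is made up for illustration and is not part of BOINC.

```python
import re

# Matches the client message wording seen in the logs above (an assumption
# about the format, based only on those posts).
GIVE_UP = re.compile(r"^Giving up on download of (?P<name>\S+): (?P<reason>.+)$")


def failed_downloads(log_lines):
    """Return (timestamp, project, file name, reason) for each abandoned download."""
    failures = []
    for line in log_lines:
        # Split into at most three fields so a "|" inside the message is kept.
        parts = line.rstrip("\n").split("|", 2)
        if len(parts) != 3:
            continue  # not a client log line in the expected layout
        timestamp, project, message = parts
        match = GIVE_UP.match(message.strip())
        if match:
            failures.append((timestamp, project, match.group("name"), match.group("reason")))
    return failures


# A few lines copied from the second log above.
SAMPLE = [
    "Sa 10 Nov 2007 16:06:36 CET|Milkyway@home|Started download of stars.txt",
    "Sa 10 Nov 2007 16:06:38 CET||Project communication failed: attempting access to reference site",
    "Sa 10 Nov 2007 16:06:38 CET|Milkyway@home|Finished download of parameters_generated_1194740565_7292",
    "Sa 10 Nov 2007 16:06:38 CET|Milkyway@home|Giving up on download of stars.txt: file not found",
    "Sa 10 Nov 2007 16:06:41 CET|Milkyway@home|Giving up on download of volume.txt: file not found",
]

if __name__ == "__main__":
    for timestamp, project, name, reason in failed_downloads(SAMPLE):
        print(f"{name}: {reason} ({timestamp})")
```

Lines with an empty project field, such as the "Project communication failed" entry, still split into three parts and are simply skipped because they do not match the pattern.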
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
I think the problem with these is that the test we were running finished, so we stopped generating new work. There should be more work up and available soon. |
Send message Joined: 29 Aug 07 Posts: 115 Credit: 502,662,458 RAC: 1,621 |
"i think the problem with these is the test we were running finished so we stopped generating new work. there should be more work up and available soon."
Just FYI, even after the limit was applied to the number of WUs a machine can get at a time, the downloads are getting stuck. When I checked my machines this morning, over 10 of them had this problem.
Edit: all the stuck files were the parameters* files. None were stuck trying to download stars or volume. |
Send message Joined: 8 Oct 07 Posts: 52 Credit: 5,923,986 RAC: 5,331 |
"Edit: all the stuck files were the parameters* files. None were stuck trying to download stars or volume."
I've gotten several sets of "stuck" parameter files, and download "retries" always fail immediately. The only way around the problem I've found is to "abort" them. |
Send message Joined: 9 Nov 07 Posts: 131 Credit: 180,454 RAC: 0 |
"Edit: all the stuck files were the parameters* files. None were stuck trying to download stars or volume."
Ditto for me. I just suspended the project temporarily until the problem is identified. I even detached and reattached to see if that would help, with no luck. |
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
"Edit: all the stuck files were the parameters* files. None were stuck trying to download stars or volume."
Some of the parameter sets from the first run got messed up. The best thing to do is just abort them. With the increased work unit time and the server limiting queues, I think the problem should be fixed for future searches. |
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
I've been trying to find a way to manually restore these work units server-side so they can be downloaded without people having to abort them, or something else along those lines, but no luck so far. |
Send message Joined: 3 Oct 07 Posts: 11 Credit: 456,000 RAC: 1,530 |
I downloaded the stars file, but the volume file is still not found. |
Send message Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
"I downloaded the stars file, but the volume file is still not found."
I think I'm going to switch to having BOINC download the volume file from a fixed URL, as opposed to specifying a file in the work unit. I think this should help, as that file shouldn't ever be deleted. |
Send message Joined: 3 Oct 07 Posts: 11 Credit: 456,000 RAC: 1,530 |
A little confused: should work units be running without the volume file? Because my jobs are running ;] |
Send message Joined: 29 Aug 07 Posts: 3 Credit: 16,603,856 RAC: 0 |
Installed 5.10.8 on AMD64, RHEL5, 2.6.18-8.1.15.el5
I can see the files downloading into the folder ~/BOINC/projects/milkyway.cs.rpi.edu_milkyway
astronomy_1.07_i686-pc-linux-gnu
parameters_generated_1194765844_83
parameters_generated_1194773034_362
parameters_generated_1194773041_456
parameters_generated_1194802758_61
parameters_generated_1194802761_94
parameters_generated_1194804157_264
parameters_generated_1194804159_288
parameters_generated_1194804163_343
stars.txt
volume2.txt
But the BOINC Manager says all of the downloads have failed. Some of the files created have a zero file size, including stars.txt. The message log shows the download is successful, so it looks like they were sent from the server that way. |
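A quick way to spot the zero-size files described above is to scan the project folder. This is only a sketch: the helper name `empty_files` is hypothetical, and the path in the usage line is the one mentioned in the post, which may differ on other installs.

```python
from pathlib import Path


def empty_files(project_dir):
    """Return the sorted names of regular files in project_dir that are 0 bytes."""
    root = Path(project_dir).expanduser()
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_file() and entry.stat().st_size == 0
    )


if __name__ == "__main__":
    # Path taken from the post above; adjust it for your own BOINC install.
    folder = Path("~/BOINC/projects/milkyway.cs.rpi.edu_milkyway").expanduser()
    if folder.is_dir():
        for name in empty_files(folder):
            print(f"zero-size file: {name}")
    else:
        print(f"folder not found: {folder}")
```

Only the top level of the folder is checked, and nothing is deleted or modified; the script just reports which downloads arrived empty.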
Send message Joined: 29 Aug 07 Posts: 3 Credit: 16,603,856 RAC: 0 |
After about 15 attempts I just got a complete stars.txt file and now I'm crunching something. |
Send message Joined: 28 Aug 07 Posts: 52 Credit: 8,353,747 RAC: 0 |
If anyone has issues downloading missing files, go here: http://milkyway.cs.rpi.edu/milkyway/download/ |
©2024 Astroinformatics Group