Scheduled Server Maintenance Concluded
Joined: 22 Apr 09 · Posts: 95 · Credit: 4,808,181,963 · RAC: 0
Half of my 140 units are working fine. It's no big deal if they fail after 2 seconds, it's only wasted 2 seconds of my computer's time. With multi-GPU setups, there will be so many failed tasks that the client tries to "protect" the network by postponing communication with the server (the idea is to stop faulty machines from thrashing thousands of workunits, but it has backfired on us in this case).
Joined: 5 Jul 11 · Posts: 990 · Credit: 376,143,149 · RAC: 0
Not happening with me, only 50% of the CPU tasks are failing. So it always has plenty to do until the next update. Most of the GPU ones I'm getting are 146, which work fine. Just checked: 3 of the last 31 GPU tasks were 140 and failed. 28 were 146 and all worked.
Joined: 25 Feb 13 · Posts: 580 · Credit: 94,200,158 · RAC: 0
GIPICS, I've had issues using the WU cancelling feature in the past, especially when you try to cancel 100,000+ workunits. I let the server queue run for a full day before putting up the new runs. This shrank the queue drastically. I expect they should all be cleared out soon. Everyone, sorry for the inconvenience caused while the work units clear the queue. I tried to mitigate the issue beforehand since I knew it would be a problem. Luckily, the new application seems to be working as intended, so everything should be running well once the work units are cleared. Jake
Joined: 25 Nov 16 · Posts: 2 · Credit: 3,962,420 · RAC: 0
I have been unable to download any work units today and am totally out of MilkyWay@Home work units at this point. I am running an iMac with macOS 10.12.4. My other BOINC projects are operating fine. Do I need to update something?
Joined: 30 Mar 09 · Posts: 63 · Credit: 621,582,726 · RAC: 0
Quoting: "I have been unable to download any work units today and am totally out of MilkyWay@Home work units at this point. I am running an iMac with macOS 10.12.4. My other BOINC projects are operating fine. Do I need to update something?"
As you can see here, http://milkyway.cs.rpi.edu/milkyway/apps.php, there is no new application for the Mac yet. See also http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=4129
Joined: 24 Apr 17 · Posts: 8 · Credit: 77,149,813 · RAC: 0
It's still a massacre. Lots of faulty WUs. Unattended hosts get their communication deferred, postponed by 24 h. This is a drama...
Joined: 5 Jul 11 · Posts: 990 · Credit: 376,143,149 · RAC: 0
Why are people getting so upset? My computers are working fine (Windows 10, Intel CPUs, AMD graphics). If you don't have enough work, sign up for a backup project.
Joined: 24 Apr 17 · Posts: 8 · Credit: 77,149,813 · RAC: 0
Because unattended hosts sometimes get a BSOD, or more often just sit stuck for 24 h waiting for the countdown to end before they can send a new request for WUs. But it's not about getting upset, it's about wasting time.
Joined: 5 Jul 11 · Posts: 990 · Credit: 376,143,149 · RAC: 0
I've never had a BSOD from a broken WU, there must be something else wrong with your machine. And what is this 24 h wait? My computers ask for more WUs whenever they need them.
Joined: 22 Apr 09 · Posts: 95 · Credit: 4,808,181,963 · RAC: 0
Quoting: "I've never had a BSOD from a broken WU, there must be something else wrong with your machine."
BSODs happen, with or without BOINC, even on perfectly fine machines. Your computers have about 1000 tasks altogether. But a single PC with 3 or 4 powerful GPUs will have 30k+ tasks, and many of them will end in computation errors because they are incompatible with the new app. With so many computation errors, communication with the server gets deferred. It's by design, because servers normally don't expect so many thrashed workunits, to put it simply.
Joined: 5 Jul 11 · Posts: 990 · Credit: 376,143,149 · RAC: 0
If you're getting a BSOD, you should run memtest on a triple scan; most likely you have faulty memory. I never ever get a BSOD doing anything. Try increasing your queue size or adding a backup project at priority 0 (it will only run if you run out of Milkyway tasks).
Joined: 22 Apr 09 · Posts: 95 · Credit: 4,808,181,963 · RAC: 0
Quoting: "If you're getting a BSOD, you should run memtest on a triple scan; most likely you have faulty memory. I never ever get a BSOD doing anything."
Things are not that simple, and you are not even trying to comprehend the problem here. Milkyway allows only 80 tasks per GPU, so it's impossible to have a large queue (as you have suggested). 80 tasks aren't enough for even a full hour on a Tahiti GPU, so regular communication with the server is extremely important to obtain new tasks often enough. Of course, I have backup projects. But Tahiti GPUs are strong only in FP64 nowadays, so they excel only in Milkyway. My Gridcoin magnitude and rewards decrease whenever backup projects kick in. It's simply inefficient to use Tahitis for FP32 projects today, and PrimeGrid is the only viable FP64 alternative, though only for very long WUs which take days to crunch.
Joined: 5 Jul 11 · Posts: 990 · Credit: 376,143,149 · RAC: 0
What's the time limit for re-contacting the Milkyway server? I've seen my machines contact them far more often than once an hour. And it's only a small proportion that fail. Anyway, just be patient, as this is only a temporary problem! And if we're only getting 80 tasks per GPU, this would seem to indicate there isn't that much work to be done on this project. I really don't understand why some people need to have their machines running flat out 24/7. If it runs out of work to do sometimes, so be it. Get another project, or not. As for your BSOD, seriously, you shouldn't get that no matter what any program does wrong. Only bad memory (or sometimes a bad graphics card) can cause a BSOD. What is the code on the BSOD?
Joined: 22 Apr 09 · Posts: 95 · Credit: 4,808,181,963 · RAC: 0
Quoting: "What's the time limit for re-contacting the Milkyway server? I've seen my machines contact them far more often than once an hour. And it's only a small proportion that fail. Anyway, just be patient, as this is only a temporary problem!"
You still don't get it, and you keep repeating the same irrelevant stuff over and over again. Normally, communication is deferred for 90 seconds after every update. But, to explain it for the fourth time, lots of computation errors (which we are getting now due to the new app version) defer communication for up to 24 hrs. I hope it's clear now? As for BSODs, I am not getting any; that's just one of your superficial assumptions. And even if I did, I doubt I would ask you for help.
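The deferral described in this exchange (90 seconds after a normal update, growing to as much as 24 hours when many tasks error out) can be pictured as a capped exponential backoff. This is a hypothetical model for illustration only: the doubling-per-error rule and the function name `deferral_seconds` are assumptions, not the BOINC client's actual logic; only the 90 s and 24 h figures come from the thread.

```python
# Hypothetical model of error-driven communication deferral.
# NOT the BOINC client's real algorithm; only the two constants come from the thread.
BASE_DEFERRAL_S = 90            # normal deferral after an update, as stated in the post
MAX_DEFERRAL_S = 24 * 60 * 60   # upper bound mentioned in the thread (24 hours)


def deferral_seconds(consecutive_errors: int) -> int:
    """Assumed rule: double the base delay once per consecutive error, then cap it."""
    if consecutive_errors < 0:
        raise ValueError("consecutive_errors must be non-negative")
    # Clamp the exponent so very large error counts don't build huge integers.
    exponent = min(consecutive_errors, 32)
    return min(BASE_DEFERRAL_S * 2 ** exponent, MAX_DEFERRAL_S)


if __name__ == "__main__":
    for errors in (0, 1, 5, 10, 20):
        print(f"{errors:>2} consecutive errors -> {deferral_seconds(errors)} s")
```

Under this assumed rule, ten consecutive errors already reach the 24-hour cap, which illustrates how a multi-GPU host that burns through many failing tasks in a short time can end up waiting a full day.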
Joined: 5 Jul 11 · Posts: 990 · Credit: 376,143,149 · RAC: 0
Sorry, I didn't notice you said there's a longer deferral for errors. But I'm hardly getting any errors: only a small number of the older WUs fail, the majority of those are CPU tasks, and the GPU runs fine for 90% of tasks. If you're getting a lot more errors, perhaps there's a fault with your setup? Is it overclocked? Are you using significantly different GPUs from mine, which expose a bug in the new application? You said earlier "BSODs happen, with or without BOINC, even on perfectly fine machines." I was correcting you. BSOD is a hardware error.
Joined: 22 Apr 09 · Posts: 95 · Credit: 4,808,181,963 · RAC: 0
Quoting: "You said earlier 'BSODs happen, with or without BOINC, even on perfectly fine machines.' I was correcting you. BSOD is a hardware error."
https://support.microsoft.com/en-us/help/17074/windows-7-resolving-stop-blue-screen-errors "Stop errors (also sometimes called blue screen or black screen errors) can occur if a serious problem causes Windows 7 to shut down or restart unexpectedly. These errors can be caused by both hardware and software issues."
Joined: 5 Jul 11 · Posts: 990 · Credit: 376,143,149 · RAC: 0
Not my experience as a computer tech since 1997. I've stopped them all by replacing faulty memory, or the odd one had a dodgy GPU. Almost every software error will be caught by Windows (since about Windows 2000).
Joined: 30 Mar 09 · Posts: 63 · Credit: 621,582,726 · RAC: 0
About how many errors appear on one machine:
State: All (3620) · In progress (322) · Validation pending (0) · Validation inconclusive (439) · Valid (2639) · Invalid (13) · Error (207)
The errors here were all 1-second errors, the same as for all the wingmen! Note the "In progress (322)": this is in fact a 2-GPU machine, with only one GPU crunching for Milkyway! 80 per GPU... not in my case, luckily!
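To put the figures in that post in proportion, the per-state counts can be turned into percentages. This is a quick sketch: `TASK_COUNTS`, `REPORTED_TOTAL` and `percent` are illustrative names, and treating "In progress" and "Validation pending" as unfinished tasks (and so leaving them out of the second figure) is an assumption about how to read the task list.

```python
# Per-state task counts for one host, copied from the post.
TASK_COUNTS = {
    "In progress": 322,
    "Validation pending": 0,
    "Validation inconclusive": 439,
    "Valid": 2639,
    "Invalid": 13,
    "Error": 207,
}
REPORTED_TOTAL = 3620  # the "All" figure from the same post


def percent(state: str, counts: dict[str, int], exclude: tuple[str, ...] = ()) -> float:
    """Percentage of tasks in `state`, optionally leaving some states out of the denominator."""
    denominator = sum(n for s, n in counts.items() if s not in exclude)
    return 100.0 * counts[state] / denominator


if __name__ == "__main__":
    # Sanity check: the individual states add up to the reported "All" count.
    assert sum(TASK_COUNTS.values()) == REPORTED_TOTAL
    unfinished = ("In progress", "Validation pending")
    print(f"Errors among all tasks: {percent('Error', TASK_COUNTS):.1f}%")
    print(f"Errors among finished tasks: {percent('Error', TASK_COUNTS, unfinished):.1f}%")
```

With the posted numbers this prints 5.7% and 6.3%, respectively.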
Joined: 13 Oct 16 · Posts: 112 · Credit: 1,174,293,644 · RAC: 0
Quoting: "Not my experience as a computer tech since 1997. I've stopped them all by replacing faulty memory, or the odd one had a dodgy GPU. Almost every software error will be caught by Windows (since about Windows 2000)."
lol, you don't run enough WUs in a single day to even make a good observation anyway.
Joined: 5 Jul 11 · Posts: 990 · Credit: 376,143,149 · RAC: 0
Quoting: "lol, you don't run enough WUs in a single day to even make a good observation anyway."
I run three other projects as well, and Milkyway isn't the highest priority. Anyway, I currently pay for the electricity for the crunching, so I don't do as much as I used to. At one point I did have about 10 of the latest GPUs running 24/7.
Quoting: "About how many errors appear on one machine"
So not that many as a percentage. Nothing to worry about.
©2024 Astroinformatics Group