Scheduled Server Maintenance Concluded
Author | Message |
---|---|
Joined: 28 Nov 14 Posts: 51 Credit: 86,696,721 RAC: 0 |
Be nice if the server didn't keep on sending last gen WU again and again. As resends that is... I've started aborting any I see, but I'm just monitoring 1 rig with a quick look at the others now and then. Regards, Cliff. -- Been there Done That, still no Damn T-Shirt |
Joined: 1 Apr 10 Posts: 49 Credit: 171,863,025 RAC: 0 |
I have just been aborting them. No big deal!! Regards John |
Joined: 5 Jul 11 Posts: 990 Credit: 376,143,149 RAC: 0 |
I have just been aborting them. No big deal!! Indeed, people get too worked up about these things. Run more than one project, allow your computer to not always be running, whatever. It's hardly the end of the world if you're not calculating something 24/7. The project is being updated, some hiccups will arise, get used to it. |
Joined: 22 Apr 09 Posts: 95 Credit: 4,808,181,963 RAC: 0 |
I have just been aborting them. No big deal!! How do you abort a task on an unattended machine? |
Joined: 5 Jul 11 Posts: 990 Credit: 376,143,149 RAC: 0 |
I have just been aborting them. No big deal!! They abort themselves. |
Joined: 22 Apr 09 Posts: 95 Credit: 4,808,181,963 RAC: 0 |
They abort themselves. Ah, more of your erroneous assumptions. They don't abort themselves. They end in 'Computational error' (1 (0x1) Unknown error number). Which is completely different from 'Aborted' (203 (0xcb) EXIT_ABORTED_VIA_GUI). The important difference being that computation errors defer communication with the server and aborted tasks don't. So indeed, it's preferable to abort them, but that's not possible on unattended machines and I certainly don't have the whole day to abort deprecated Milkyway tasks. Frankly, I am tired of correcting you all the time. Can't you research anything before you post? |
Joined: 5 Jul 11 Posts: 990 Credit: 376,143,149 RAC: 0 |
They abort themselves. I don't know what your problem is, but my computers work just fine. If things end in errors, they stop all by themselves. Then BOINC finds something else to do. Stop being so bloody rude, I'm getting sick of your arrogance. |
Joined: 22 Apr 09 Posts: 95 Credit: 4,808,181,963 RAC: 0 |
I don't know what your problem is, but my computers work just fine. So you haven't noticed yet that this is the thread about the problems which have appeared after the latest Scheduled Server Maintenance? It's in the title of the thread, in case you have missed it. The computational errors mentioned here so often are the result of the latest Milkyway@home upgrades - Jake himself kindly invited us to provide some feedback on that. And your hosts are also producing such errors, check here for example: https://milkyway.cs.rpi.edu/milkyway/results.php?hostid=686691&offset=0&show_names=0&state=6&appid So, you see the problem now? You are trying to provide some advice, but you don't even know what you are talking about (not even your own hosts). You should be grateful for this education, not angry. |
Joined: 5 Jul 11 Posts: 990 Credit: 376,143,149 RAC: 0 |
I don't know what your problem is, but my computers work just fine. I've got 194 valid and only 17 in error, hardly cause for concern. It's not preventing me getting any more from the server, and if it did it would switch to Einstein, Universe, or SETI without my intervention. And I have NEVER had a blue screen from any project, apart from a computer with dodgy RAM - I used to build them and used BOINC to do a 3 day burn in test. Any errors or excessively high temperatures and I checked the hardware. It's you that keeps being abusive, I'm a calm person. |
Joined: 22 Apr 09 Posts: 95 Credit: 4,808,181,963 RAC: 0 |
I've got 194 valid and only 17 in error, hardly cause for concern. It's not preventing me getting any more from the server OK. But, to explain it once again, with multiple GPUs and higher output, you would get so many computational errors that it would prevent you from getting more tasks from the server. I guess it's the mechanism to prevent faulty machines from thrashing thousands of workunits. But in this case computational errors are not happening because of faulty hardware, but because of the recent switch to Milkyway version 1.46, deprecating thousands of older tasks which cannot be computed successfully with the newer application. In normal circumstances, 17 computational errors against 194 valids would be an 8% failure rate - very high and a strong indication of some hardware problems. |
Joined: 5 Jul 11 Posts: 990 Credit: 376,143,149 RAC: 0 |
I've got 194 valid and only 17 in error, hardly cause for concern. It's not preventing me getting any more from the server Is the server block not done on a percentage? If my 17 to 194 isn't triggering it, why should 170 to 1940? Actually I can't be sure it hasn't triggered it on mine - is that recorded in my results data somewhere? I might not notice as Milkyway is set to 25% share at the moment, all I'd see is it using the other three projects. |
Joined: 22 Apr 09 Posts: 95 Credit: 4,808,181,963 RAC: 0 |
Is the server block not done on a percentage? If my 17 to 194 isn't triggering it, why should 170 to 1940? That's a good question. Jake will know this for sure, but I believe the trigger is set on absolute numbers, not on a percentage. Because a 1% error rate on a machine with 8 powerful GPUs is still literally thousands of thrashed tasks, while on a machine with just one low-range GPU, 1% is a couple of tasks only. It makes more sense to use absolute numbers to protect the network from faulty machines. Actually I can't be sure it hasn't triggered it on mine - is that recorded in my results data somewhere? I think such events aren't shown in the Event Log, therefore the only way to spot it is to check your host's last contact time through your Milkyway account. If your host hasn't contacted the server for hours or more, communication might have been deferred due to computational errors. Or, if the machine is attended, check it through your BOINC Manager - the communication deferred timer is always shown on the Projects tab (under the Status column). |
Joined: 5 Jul 11 Posts: 990 Credit: 376,143,149 RAC: 0 |
That's a good question. Jake will know this for sure, but I believe the trigger is set on absolute numbers, not on a percentage. Because 1% error-rate on a machine with 8 powerful GPUs is still literally thousands of thrashed tasks, while on a machine with just one low-range GPU, 1% is a couple of tasks only. It makes more sense to use absolute numbers to protect the network from faulty machines. I disagree. Let's say I have 1 GPU, and you have 8 of the same model GPU in one machine. I give back 2 faulty tasks and 98 good ones. You give 16 faulty tasks and 784 good ones. We both have a 2% failure rate, and we're both giving 98% useful results. Why should your computer be treated any differently by the server? Yes, 8 times as many faulty tasks, but you're doing 8 times as many good ones too. Now if you had 8 single GPU machines, and one of them was giving all the faulty tasks, and the other 7 weren't, then I'd say the server should block your one computer. I think such events aren't shown in the Event Log, therefore the only way to spot it is to check your hosts last contact time through your Milkyway account. If your host hasn't contacted the server for hours or more, communication might have been deferred due to computational errors. Or, if the machine is attended, check it through your BOINC Manager - communication deferred timer is always shown on the Projects tab (under Status column). Can't really tell, as it flips between 4 projects. I can't see a notice in the messages saying "Message from Server - deferred for 1 day for faulty WUs" or similar. |
Joined: 28 Mar 09 Posts: 9 Credit: 16,162,511 RAC: 0 |
Hi Jake, this started this morning: <search_application> milkyway_separation 1.46 Windows x86 double OpenCL </search_application> Reading preferences ended prematurely BOINC GPU type suggests using OpenCL vendor 'Advanced Micro Devices, Inc.' Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4' Switching to Parameter File 'astronomy_parameters.txt' <number_WUs> 5 </number_WUs> <number_params_per_WU> 21 </number_params_per_WU> Number of parameters doesn't make sense The tasks from the four runs you mentioned are all failing, while those not from those runs are completing. All four types of tasks fail when number_params_per_WU is set to 21; tasks that have this parameter set to 20 run OK. |
Joined: 20 Mar 08 Posts: 108 Credit: 2,607,924,860 RAC: 0 |
Jake, what are these being generated from scratch (not quota-fill) today? de_modfit_19_3s_146_bundle5_ModfitConstraintsWithDisk_1... They all fail, and the names fit neither your list nor the old run. |
Joined: 22 Apr 09 Posts: 95 Credit: 4,808,181,963 RAC: 0 |
I can't see a notice in the messages saying "Message from Server - deferred for 1 day for faulty WUs" or similar. Well, there are so many erroneous tasks now (apparently, a new batch that is failing has arrived) that you will almost surely get a 24-hour deferment too. |
Joined: 5 Jul 11 Posts: 990 Credit: 376,143,149 RAC: 0 |
I can't see a notice in the messages saying "Message from Server - deferred for 1 day for faulty WUs" or similar. Yes I did notice a load of failures sitting on 3 of my machines this morning. I guess failures help them to fix programming problems, and they don't use much computing time. Having 4 projects helps with errors and server downtime etc. I've never had a computer sat idle. Presumably the new application will be more efficient or something, so it's worth experimenting with it. |
Joined: 20 Jun 10 Posts: 5 Credit: 744,974,914 RAC: 0 |
I can't see a notice in the messages saying "Message from Server - deferred for 1 day for faulty WUs" or similar. Same problem here! |
Joined: 10 Feb 09 Posts: 52 Credit: 16,291,993 RAC: 12 |
A LOT of wus are ok. Some wus crash after 2 seconds: Reading preferences ended prematurely. Strange. |
Joined: 5 Jul 11 Posts: 990 Credit: 376,143,149 RAC: 0 |
A LOT of wus are ok I'm getting fewer and fewer failures. I think the problem is fixed, we're just using up the broken ones. My main fast computer is getting them all succeeding now. Only one of the slower computers, which had some queued up, is failing a few. |
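The disagreement earlier in the thread about percentage versus absolute error triggers can be made concrete with a little arithmetic. The sketch below is illustrative only: nothing in the thread establishes how the Milkyway@home server actually decides to defer a host, and the two trigger functions and their threshold values (`max_rate`, `max_errors`) are invented for the example. Only the error and valid counts come from the posts above.

```python
def failure_rate(errors: int, valid: int) -> float:
    """Fraction of returned tasks that ended in error."""
    total = errors + valid
    return errors / total if total else 0.0


def blocked_by_rate(errors: int, valid: int, max_rate: float) -> bool:
    """Hypothetical percentage-based trigger (not the real server logic)."""
    return failure_rate(errors, valid) > max_rate


def blocked_by_count(errors: int, max_errors: int) -> bool:
    """Hypothetical absolute-count trigger (not the real server logic)."""
    return errors > max_errors


# (errors, valid) figures quoted in the thread.
hosts = {
    "17 errors, 194 valid": (17, 194),
    "1 GPU: 2 errors, 98 valid": (2, 98),
    "8 GPUs: 16 errors, 784 valid": (16, 784),
}

MAX_RATE = 0.05    # assumed threshold, chosen only for illustration
MAX_ERRORS = 10    # assumed threshold, chosen only for illustration

for label, (errors, valid) in hosts.items():
    print(
        f"{label}: rate={failure_rate(errors, valid):.1%}, "
        f"rate trigger={blocked_by_rate(errors, valid, MAX_RATE)}, "
        f"count trigger={blocked_by_count(errors, MAX_ERRORS)}"
    )
```

With these made-up thresholds, the 1-GPU and 8-GPU hosts share a 2% failure rate and are treated identically by the percentage trigger, while the absolute trigger blocks only the 8-GPU host. That is exactly the difference the two posters are arguing about.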
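On the question of aborting tasks on an unattended machine: BOINC ships a command-line tool, `boinccmd`, which can list tasks with `--get_tasks` and abort one with `--task <project URL> <task name> abort`. The sketch below shows one way a script might pick out tasks by name and build those abort commands. Several things here are assumptions rather than facts from the thread: the layout of the `--get_tasks` text (a `1) -----------` separator followed by `key: value` lines), the sample task names, and the name pattern are all hypothetical, and the project URL must match whatever the client itself reports. The thread also does not say whether an abort issued this way is recorded with the same exit status as an abort from the Manager.

```python
import re


def parse_tasks(get_tasks_output: str) -> list[dict[str, str]]:
    """Split `boinccmd --get_tasks` text into one dict per task.

    Assumes each task starts with a line like '1) -----------' and is
    followed by indented 'key: value' lines.
    """
    tasks: list[dict[str, str]] = []
    current = None
    for line in get_tasks_output.splitlines():
        stripped = line.strip()
        if re.match(r"^\d+\) -+$", stripped):
            current = {}
            tasks.append(current)
        elif current is not None and ": " in stripped:
            key, value = stripped.split(": ", 1)
            current[key] = value
    return tasks


def abort_commands(tasks: list[dict[str, str]], name_pattern: str) -> list[list[str]]:
    """Build boinccmd argument lists that abort tasks whose name matches."""
    matcher = re.compile(name_pattern)
    return [
        ["boinccmd", "--task", task["project URL"], task["name"], "abort"]
        for task in tasks
        if "name" in task and "project URL" in task and matcher.search(task["name"])
    ]


# Made-up sample in the assumed output layout; the task names are hypothetical.
SAMPLE = """\
======== Tasks ========
1) -----------
   name: example_old_run_0001_0
   WU name: example_old_run_0001
   project URL: https://milkyway.cs.rpi.edu/milkyway/
2) -----------
   name: example_new_run_0002_0
   WU name: example_new_run_0002
   project URL: https://milkyway.cs.rpi.edu/milkyway/
"""

# Dry run: print the commands instead of executing them.
for command in abort_commands(parse_tasks(SAMPLE), r"^example_old_run_"):
    print(" ".join(command))
```

The sketch deliberately prints the commands rather than running them. On a real host, the text would come from running `boinccmd --get_tasks` (for instance from a scheduled job), and each printed command could then be executed once the name pattern has been checked against the tasks that are actually failing.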
©2024 Astroinformatics Group