Posts by Peter Hucker
1) Message boards : Number crunching : Will n-body ever end? 1.64 (Message 66607)
Posted 15 Sep 2017 by Peter Hucker
They've always worked fine for me (on 4 completely different computers), although I may have started doing Milkyway after you experienced the problem. Maybe they fixed a bug?
2) Message boards : News : Scheduled Server Maintenance Concluded (Message 66429)
Posted 11 May 2017 by Peter Hucker
A lot of WUs are OK.
Some WUs crash after 2 seconds:
Reading preferences ended prematurely
BOINC GPU type suggests using OpenCL vendor 'Advanced Micro Devices, Inc.'
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Switching to Parameter File 'astronomy_parameters.txt'
malloc failed: 2969687792 bytes
20:35:15 (7596): called boinc_finish(1)


Strange.


I'm getting fewer and fewer. I think the problem is fixed, we're just using up the broken ones. My main fast computer is getting them all succeeding now. Only one of the slower computers which had some queued up is failing a few.
3) Message boards : News : Scheduled Server Maintenance Concluded (Message 66417)
Posted 9 May 2017 by Peter Hucker
I can't see a notice in the messages saying "Message from Server - deferred for 1 day for faulty WUs" or similar.

Well, there are so many erroneous tasks now (apparently, a new batch that is failing has arrived) that you will almost surely get a 24-hour deferment too.


Yes I did notice a load of failures sitting on 3 of my machines this morning. I guess failures help them to fix programming problems, and they don't use much computing time. Having 4 projects helps with errors and server downtime etc. I've never had a computer sat idle. Presumably the new application will be more efficient or something, so it's worth experimenting with it.
4) Message boards : News : Scheduled Server Maintenance Concluded (Message 66408)
Posted 8 May 2017 by Peter Hucker
That's a good question. Jake will know this for sure, but I believe the trigger is set on absolute numbers, not on a percentage. Because 1% error-rate on a machine with 8 powerful GPUs is still literally thousands of thrashed tasks, while on a machine with just one low-range GPU, 1% is a couple of tasks only. It makes more sense to use absolute numbers to protect the network from faulty machines.


I disagree. Let's say I have 1 GPU, and you have 8 of the same model GPU in one machine. I give back 2 faulty tasks and 98 good ones. You give 16 faulty tasks and 784 good ones. We both have a 2% failure rate, and we're both giving 98% useful results. Why should your computer be treated any differently by the server? Yes, 8 times as many faulty tasks, but you're doing 8 times as many good ones too. Now if you had 8 single GPU machines, and one of them was giving all the faulty tasks, and the other 7 weren't, then I'd say the server should block your one computer.
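To make the comparison concrete, here is a minimal sketch of the two trigger styles applied to the figures above. The thresholds (`MAX_ERRORS`, `MAX_RATE`) and the function names are made up for illustration; the actual rule the Milkyway server uses isn't stated anywhere in this thread.

```python
# Hypothetical thresholds, chosen only to illustrate the argument.
MAX_ERRORS = 10    # absolute trigger: block after more than 10 errors
MAX_RATE = 0.05    # percentage trigger: block above a 5% failure rate


def failure_rate(errors, valid):
    """Fraction of returned tasks that ended in error."""
    total = errors + valid
    return errors / total if total else 0.0


def blocked_by_absolute(errors):
    """Absolute trigger: looks only at the raw error count."""
    return errors > MAX_ERRORS


def blocked_by_rate(errors, valid):
    """Percentage trigger: looks at errors relative to work returned."""
    return failure_rate(errors, valid) > MAX_RATE


# (errors, valid) pairs from the example: one GPU vs eight of the same GPU.
hosts = {"one_gpu": (2, 98), "eight_gpu": (16, 784)}

for name, (errors, valid) in hosts.items():
    print(
        name,
        f"rate={failure_rate(errors, valid):.1%}",
        f"absolute_block={blocked_by_absolute(errors)}",
        f"rate_block={blocked_by_rate(errors, valid)}",
    )
```

Both hosts come out at a 2% failure rate, so a percentage trigger treats them the same, while an absolute trigger with this made-up limit would block only the 8-GPU machine.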

I think such events aren't shown in the Event Log, therefore the only way to spot it is to check your hosts last contact time through your Milkyway account. If your host hasn't contacted the server for hours or more, communication might have been deferred due to computational errors. Or, if the machine is attended, check it through your BOINC Manager - communication deferred timer is always shown on the Projects tab (under Status column).


Can't really tell, as it flips between 4 projects. I can't see a notice in the messages saying "Message from Server - deferred for 1 day for faulty WUs" or similar.
5) Message boards : News : Scheduled Server Maintenance Concluded (Message 66406)
Posted 8 May 2017 by Peter Hucker
I've got 194 valid and only 17 in error, hardly cause for concern. It's not preventing me getting any more from the server

OK. But, to explain it once again, with multiple GPUs and higher output, you would get so many computational errors that it would prevent you from getting more tasks from the server.

I guess it's the mechanism to prevent faulty machines from thrashing thousands of workunits. But in this case computational errors are not happening because of faulty hardware, but because of the recent switch to Milkyway version 1.46, which deprecated thousands of older tasks that cannot be computed successfully with the newer application. In normal circumstances, 17 computational errors against 194 valids would be an 8% failure rate - very high and a strong indication of some hardware problems.


Is the server block not done on a percentage? If my 17 to 194 isn't triggering it, why should 170 to 1940?

Actually I can't be sure it hasn't triggered it on mine - is that recorded in my results data somewhere? I might not notice as Milkyway is set to 25% share at the moment, all I'd see is it using the other three projects.
6) Message boards : News : Scheduled Server Maintenance Concluded (Message 66404)
Posted 8 May 2017 by Peter Hucker
I don't know what your problem is, but my computers work just fine.

So you haven't noticed yet that this is the thread about the problems which have appeared after the latest Scheduled Server Maintenance? It's in the title of the thread, in case you have missed it. The computational errors mentioned here so often are the result of the latest Milkyway@home upgrades - Jake himself kindly invited us to provide some feedback on that. And your hosts are also producing such errors, check here for example:

https://milkyway.cs.rpi.edu/milkyway/results.php?hostid=686691&offset=0&show_names=0&state=6&appid

So, you see the problem now? You are trying to provide some advice, but you don't even know what you are talking about (not even your own hosts). You should be grateful for this education, not angry.


I've got 194 valid and only 17 in error, hardly cause for concern. It's not preventing me getting any more from the server, and if it did it would switch to Einstein, Universe, or SETI without my intervention.

And I have NEVER had a blue screen from any project, apart from a computer with dodgy RAM - I used to build them and used BOINC to do a 3 day burn in test. Any errors or excessively high temperatures and I checked the hardware.

It's you that keeps being abusive, I'm a calm person.
7) Message boards : News : Scheduled Server Maintenance Concluded (Message 66399)
Posted 7 May 2017 by Peter Hucker
They abort themselves.

Ah, more of your erroneous assumptions. They don't abort themselves. They end in 'Computational error' (1 (0x1) Unknown error number). Which is completely different from 'Aborted' (203 (0xcb) EXIT_ABORTED_VIA_GUI). The important difference being that computation errors defer communication with the server and aborted tasks don't. So indeed, it's preferable to abort them, but that's not possible on unattended machines and I certainly don't have the whole day to abort deprecated Milkyway tasks.

Frankly, I am tired of correcting you all the time. Can't you research anything before you post?


I don't know what your problem is, but my computers work just fine. If things end in errors, they stop all by themselves. Then BOINC finds something else to do. Stop being so bloody rude, I'm getting sick of your arrogance.
8) Message boards : News : Scheduled Server Maintenance Concluded (Message 66397)
Posted 7 May 2017 by Peter Hucker
I have just been aborting them. No big deal!!

How do you abort a task on an unattended machine?


They abort themselves.
9) Message boards : News : Scheduled Server Maintenance Concluded (Message 66393)
Posted 7 May 2017 by Peter Hucker
I have just been aborting them. No big deal!!

Regards

John


Indeed, people get too worked up about these things. Run more than one project, allow your computer to not always be running, whatever. It's hardly the end of the world if you're not calculating something 24/7. The project is being updated, some hiccups will arise, get used to it.
10) Message boards : News : Scheduled Server Maintenance Concluded (Message 66386)
Posted 5 May 2017 by Peter Hucker
lol, you don't run enough WUs in a single day to even make good observation anyways.


I run three other projects as well, and Milkyway isn't the highest priority. Anyway, I currently pay for the electricity for the crunching, so I don't do as much as I used to. I did at one point have about 10 of the latest GPUs running 24/7.

About how many errors appear on one machine:
State: All (3620) · In progress (322) · Validation pending (0) · Validation inconclusive (439) · Valid (2639) · Invalid (13) · Error (207)

The errors here were all 1 second errors, same as all the wingmen!

Notice the 'in progress (322)';
In fact this is a 2 GPU machine, with only one GPU crunching for Milkyway!
80/GPU... not in my case, luckily!


So not that many as a percentage. Nothing to worry about.
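For anyone wanting to check the percentage, here is a quick sketch using the state counts quoted above. The dictionary keys are just labels picked for this example, and which denominator matters (all tasks or only finished ones) is an assumption either way, since the thread doesn't say what the server looks at.

```python
# Task counts copied from the quoted host summary.
counts = {
    "in_progress": 322,
    "validation_pending": 0,
    "validation_inconclusive": 439,
    "valid": 2639,
    "invalid": 13,
    "error": 207,
}

# All tasks on the host, including ones still running or awaiting validation.
total = sum(counts.values())

# Tasks with a final outcome: valid, invalid, or error.
finished = counts["valid"] + counts["invalid"] + counts["error"]

share_of_all = counts["error"] / total
share_of_finished = counts["error"] / finished

print(f"total={total}, finished={finished}")
print(f"errors: {share_of_all:.1%} of all tasks, "
      f"{share_of_finished:.1%} of finished tasks")
```

That works out to roughly 5.7% of all tasks, or about 7.2% if only tasks with a final outcome are counted.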
11) Message boards : News : Scheduled Server Maintenance Concluded (Message 66383)
Posted 5 May 2017 by Peter Hucker
Not my experience as a computer tech since 1997. I've stopped them all by replacing faulty memory, or the odd one had a dodgy GPU. Almost every software error will be caught by Windows (since about version 2000).
12) Message boards : News : Scheduled Server Maintenance Concluded (Message 66381)
Posted 5 May 2017 by Peter Hucker
Sorry I didn't notice you said there's a longer deferral for errors. But I'm hardly getting any errors, only a small amount of the older WUs fail, and the majority of those are CPU tasks, the GPU runs fine for 90% of tasks. If you're getting a lot more errors, perhaps there's a fault with your setup? Is it overclocked? Are you using significantly different GPUs to me which show up a bug in the new application?

You said earlier "BSODs happen, with or without BOINC, even on perfectly fine machines." I was correcting you. BSOD is a hardware error.
13) Message boards : News : Scheduled Server Maintenance Concluded (Message 66379)
Posted 5 May 2017 by Peter Hucker
What's the time limit for re-contacting the Milkyway server? I've seen my machines contact them far more often than once an hour. And it's only a small proportion that fail. Anyway, just be patient as this is only a temporary problem!

And if we're only getting 80 tasks per GPU, this would seem to indicate there isn't that much work to be done on this project.

I really don't understand why some people need to have their machines running flat out 24/7. If it runs out of work to do sometimes, so be it. Get another project, or not.

As for your BSOD, seriously you shouldn't get that no matter what any program does wrong. Only bad memory (or sometimes a bad graphics card) can cause a BSOD. What is the code on the BSOD?
14) Message boards : News : Scheduled Server Maintenance Concluded (Message 66377)
Posted 5 May 2017 by Peter Hucker
If you're getting a BSOD, you should run memtest on a triple scan, most likely you have faulty memory. I never ever get a BSOD doing anything.

Try increasing your queue size or adding a backup project on priority 0 (it will only run if you run out of Milkyway tasks).
15) Message boards : News : Scheduled Server Maintenance Concluded (Message 66375)
Posted 5 May 2017 by Peter Hucker
I've never had a BSOD from a broken WU, there must be something else wrong with your machine.

And what is this 24 h wait? My computers ask for more WU whenever they need it.
16) Message boards : News : Scheduled Server Maintenance Concluded (Message 66372)
Posted 5 May 2017 by Peter Hucker
Why are people getting so upset? My computers are working fine (Windows 10, Intel CPUs, AMD graphics). If you don't have enough work, sign up for a backup project.
17) Message boards : News : Scheduled Server Maintenance Concluded (Message 66365)
Posted 4 May 2017 by Peter Hucker
Not happening with me, only 50% of the CPU tasks are failing. So it always has plenty to do until the next update. Most of the GPU ones I'm getting are 146 which work fine. Just checked - 3 of the last 31 GPU tasks were 140 and failed. 28 were 146 and all worked.
18) Message boards : News : Scheduled Server Maintenance Concluded (Message 66363)
Posted 4 May 2017 by Peter Hucker
Half of my 140 units are working fine. It's no big deal if they fail after 2 seconds, it's only wasted 2 seconds of my computer's time.
19) Message boards : News : Scheduled Server Maintenance Concluded (Message 66361)
Posted 4 May 2017 by Peter Hucker
I'm getting the same, they stop after 1 or 2 seconds. I don't know how long I've had them for though, as I run 4 projects.
20) Message boards : News : Scheduled Server Maintenance Concluded (Message 66354)
Posted 3 May 2017 by Peter Hucker
I see the remaining time has been fixed on the 5 bundles. Thanks for that.




Copyright © 2017 AstroInformatics Group