Welcome to MilkyWay@home

Posts by Vortac

1) Message boards : Number crunching : Invalids Exit status 0 (0x0) after server came back (Message 68206)
Posted 19 days ago by Vortac
Post:
Same here - lots of invalids, on all machines. I had zero invalids before the last outage.
2) Message boards : Number crunching : Server can't open database? (Message 68200)
Posted 21 days ago by Vortac
Post:
I noticed that both SETI and Milkyway are no longer on the Gridcoin whitelist. Both went on the "Excluded Projects" list in March.

Those are just occasional glitches. Milkyway and SETI are back already. Collatz will be back soon too, I guess.
3) Message boards : News : Database Maintenance 12-18-2018 - Ended (Message 67956)
Posted 20 Dec 2018 by Vortac
Post:
Database was down again today, for even longer than usual. I hoped this maintenance would fix those problems.
4) Message boards : Number crunching : Titan V (Message 67299)
Posted 3 Apr 2018 by Vortac
Post:
There's some info here:
https://steemit.com/gridcoin/@cautilus/my-quest-to-use-an-nvidia-titan-v-for-boinc-titan-v-and-1080-ti-boinc-benchmarks
5) Message boards : Number crunching : Errors (Message 66647)
Posted 18 Sep 2017 by Vortac
Post:
Hey Everyone,

Sorry, I put up a bad set of runs on Friday. They're down now and I'm going to check out the errors now.

Sorry again,

Jake

The proverbial Friday afternoon: nothing implemented on a Friday afternoon works correctly. The problem usually becomes self-evident right after the last employee has left the building :)
6) Message boards : Number crunching : Errors (Message 66615)
Posted 16 Sep 2017 by Vortac
Post:
Another bad run, most likely.
7) Message boards : Number crunching : Trouble with a Titan Black (Message 66603)
Posted 11 Sep 2017 by Vortac
Post:
My guess is that Titan Black shows its full strength only with CUDA applications.
8) Message boards : Number crunching : AMD VEGA FE (Message 66525)
Posted 6 Jul 2017 by Vortac
Post:
Hello.
What about AMD VEGA FE? Do you think it would have the right power to crunch on Collatz, Milkyway or Einstein?
Thanks for the info. I am thinking of purchasing one just for BOINC.

HD7970 (AMD Tahiti) still trumps it in FP64. I guess it would be better suited for Einstein or Collatz.
9) Message boards : News : Scheduled Server Maintenance Concluded (Message 66416)
Posted 9 May 2017 by Vortac
Post:
I can't see a notice in the messages saying "Message from Server - deferred for 1 day for faulty WUs" or similar.

Well, there are so many erroneous tasks now (apparently, a new failing batch has arrived) that you will almost surely get a 24-hour deferment too.
10) Message boards : News : Scheduled Server Maintenance Concluded (Message 66407)
Posted 8 May 2017 by Vortac
Post:
Is the server block not done on a percentage? If my 17 to 194 isn't triggering it, why should 170 to 1940?

That's a good question. Jake will know this for sure, but I believe the trigger is set on absolute numbers, not on a percentage. A 1% error rate on a machine with 8 powerful GPUs still means literally thousands of thrashed tasks, while on a machine with just one low-range GPU, 1% is only a couple of tasks. It makes more sense to use absolute numbers to protect the network from faulty machines.


Actually I can't be sure it hasn't triggered it on mine - is that recorded in my results data somewhere?

I think such events aren't shown in the Event Log, so the only way to spot one is to check your host's last contact time through your Milkyway account. If your host hasn't contacted the server for hours or more, communication might have been deferred due to computational errors. Or, if the machine is attended, check it in BOINC Manager: the communication-deferred timer is always shown on the Projects tab (under the Status column).
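
The difference between the two kinds of trigger can be sketched in a few lines of Python. This is only an illustration: the function names and both thresholds (100 errors, 10%) are made up for the example and are not the real server settings, which are not stated in this thread.

```python
def defer_on_absolute(errors: int, max_errors: int = 100) -> bool:
    """Hypothetical trigger: defer once the raw error count reaches a cap."""
    return errors >= max_errors


def defer_on_rate(errors: int, valids: int, max_rate: float = 0.10) -> bool:
    """Hypothetical trigger: defer once the share of failed tasks reaches a cap."""
    total = errors + valids
    if total == 0:
        return False
    return errors / total >= max_rate


# Counts quoted in the thread: a small host and a host with ten times the output.
small_host = (17, 194)
big_host = (170, 1940)

for errors, valids in (small_host, big_host):
    print(errors, valids,
          "absolute:", defer_on_absolute(errors),
          "rate:", defer_on_rate(errors, valids))
```

Both hosts have the same error rate, so a percentage trigger treats them identically, while an absolute trigger fires only on the bigger host. That would match what is being reported here, if the guess about absolute numbers is right.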
11) Message boards : News : Scheduled Server Maintenance Concluded (Message 66405)
Posted 8 May 2017 by Vortac
Post:
I've got 194 valid and only 17 in error, hardly cause for concern. It's not preventing me getting any more from the server

OK. But, to explain it once again, with multiple GPUs and higher output, you would get so many computational errors that it would prevent you from getting more tasks from the server.

I guess it's the mechanism to prevent faulty machines from thrashing thousands of workunits. But in this case computational errors are not happening because of faulty hardware, but because of the recent switch to Milkyway version 1.46, which deprecated thousands of older tasks that cannot be computed successfully with the newer application. Under normal circumstances, 17 computational errors against 194 valid results would be an 8% failure rate: very high and a strong indication of hardware problems.
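
For anyone who wants to check the 8% figure against their own host, here is a minimal sketch. The function name is just for illustration; the two counts are the ones quoted above.

```python
def failure_rate(errors: int, valids: int) -> float:
    """Share of finished tasks that ended in a computation error."""
    total = errors + valids
    if total == 0:
        raise ValueError("no finished tasks to compute a rate from")
    return errors / total


# 17 errors and 194 valid results, as quoted in the post above.
print(f"{failure_rate(17, 194):.1%}")  # prints 8.1%
```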
12) Message boards : News : Scheduled Server Maintenance Concluded (Message 66401)
Posted 7 May 2017 by Vortac
Post:
I don't know what your problem is, but my computers work just fine.

So you haven't noticed yet that this is the thread about the problems which have appeared after the latest Scheduled Server Maintenance? It's in the title of the thread, in case you have missed it. The computational errors mentioned here so often are the result of the latest Milkyway@home upgrades - Jake himself kindly invited us to provide some feedback on that. And your hosts are also producing such errors, check here for example:

https://milkyway.cs.rpi.edu/milkyway/results.php?hostid=686691&offset=0&show_names=0&state=6&appid

So, you see the problem now? You are trying to provide some advice, but you don't even know what you are talking about (not even your own hosts). You should be grateful for this education, not angry.
13) Message boards : News : Scheduled Server Maintenance Concluded (Message 66398)
Posted 7 May 2017 by Vortac
Post:
They abort themselves.

Ah, more of your erroneous assumptions. They don't abort themselves. They end in 'Computational error' (1 (0x1) Unknown error number), which is completely different from 'Aborted' (203 (0xcb) EXIT_ABORTED_VIA_GUI). The important difference is that computation errors defer communication with the server and aborted tasks don't. So indeed, it's preferable to abort them, but that's not possible on unattended machines and I certainly don't have the whole day to abort deprecated Milkyway tasks.

Frankly, I am tired of correcting you all the time. Can't you research anything before you post?
14) Message boards : News : Scheduled Server Maintenance Concluded (Message 66396)
Posted 7 May 2017 by Vortac
Post:
I have just been aborting them. No big deal!!

How do you abort a task on an unattended machine?
15) Message boards : News : Scheduled Server Maintenance Concluded (Message 66382)
Posted 5 May 2017 by Vortac
Post:
You said earlier "BSODs happen, with or without BOINC, even on perfectly fine machines." I was correcting you. BSOD is a hardware error.

https://support.microsoft.com/en-us/help/17074/windows-7-resolving-stop-blue-screen-errors

"Stop errors (also sometimes called blue screen or black screen errors) can occur if a serious problem causes Windows 7 to shut down or restart unexpectedly. These errors can be caused by both hardware and software issues."
16) Message boards : News : Scheduled Server Maintenance Concluded (Message 66380)
Posted 5 May 2017 by Vortac
Post:
What's the time limit for re-contacting the Milkyway server? I've seen my machines contact them far more often than once an hour. And it's only a small proportion that fail. Anyway, just be patient as this is only a temporary problem!

And if we're only getting 80 tasks per GPU, this would seem to indicate there isn't that much work to be done on this project.

I really don't understand why some people need to have their machines running flat out 24/7. If it runs out of work to do sometimes, so be it. Get another project, or not.

As for your BSOD, seriously you shouldn't get that no matter what any program does wrong. Only bad memory (or sometimes a bad graphics card) can cause a BSOD. What is the code on the BSOD?

You still don't get it, just repeating the same irrelevant stuff over and over again. Normally, communication is deferred for 90 seconds after every Update. But, to explain it for the fourth time, plenty of computation errors (which we are getting now due to the new app version) are deferring communication for up to 24 hours. I hope it's clear now.

As for BSODs, I am not getting any, that's just one of your superficial assumptions. And even if I did, I doubt I would ask you for help.
17) Message boards : News : Scheduled Server Maintenance Concluded (Message 66378)
Posted 5 May 2017 by Vortac
Post:
If you're getting a BSOD, you should run memtest on a triple scan, most likely you have faulty memory. I never ever get a BSOD doing anything.

Try increasing your queue size or adding a backup project on priority 0 (it will only run if you run out of Milkyway tasks).

Things are not that simple and you are not even trying to comprehend the problem here. Milkyway allows only 80 tasks per GPU, so it's impossible to have a large queue (as you have suggested). 80 tasks aren't enough for even a full hour with a Tahiti GPU, therefore regular communication with the server is extremely important, to be able to obtain new tasks often enough.

Of course, I have backup projects. But Tahiti GPUs are strong only in FP64 nowadays, therefore they excel only in Milkyway. My Gridcoin magnitude and rewards are decreased whenever backup projects kick in. It's simply inefficient to use Tahitis for FP32 projects today and only PrimeGrid is a viable FP64 alternative, however only for very long WUs which take days to crunch.
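
To put rough numbers on why the 80-task cap hurts, here is a small sketch. The 80 tasks per GPU and the 24-hour deferral come from this thread; the 40 seconds per task is an assumed figure, chosen only so that a full queue lasts under an hour as described above.

```python
def queue_runtime_seconds(tasks_per_gpu: int = 80,
                          seconds_per_task: float = 40.0) -> float:
    """Time one GPU needs to drain a full queue (seconds_per_task is assumed)."""
    return tasks_per_gpu * seconds_per_task


def idle_fraction(deferral_hours: float, runtime_seconds: float) -> float:
    """Share of a deferral window during which the GPU has no Milkyway work."""
    deferral_seconds = deferral_hours * 3600
    if deferral_seconds <= 0:
        return 0.0
    idle_seconds = max(0.0, deferral_seconds - runtime_seconds)
    return idle_seconds / deferral_seconds


runtime = queue_runtime_seconds()
print(f"queue lasts about {runtime / 60:.0f} minutes")
print(f"idle for {idle_fraction(24, runtime):.0%} of a 24-hour deferral")
```

Under these assumptions the GPU would sit without Milkyway work for almost the whole deferral window, which is when the backup projects kick in.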
18) Message boards : News : Scheduled Server Maintenance Concluded (Message 66376)
Posted 5 May 2017 by Vortac
Post:
I've never had a BSOD from a broken WU, there must be something else wrong with your machine.

And what is this 24 h wait? My computers ask for more WU whenever they need it.

BSODs happen, with or without BOINC, even on perfectly fine machines.

Your computers have about 1000 tasks altogether. But a single PC with 3 or 4 powerful GPUs will have 30k+ tasks and many of them will end in computational errors, because they are incompatible with the new app. With so many computational errors, communication with the server gets deferred. It's by design, because servers normally don't expect so many thrashed workunits, to put it simply.
19) Message boards : News : Scheduled Server Maintenance Concluded (Message 66364)
Posted 4 May 2017 by Vortac
Post:
Half of my 140 units are working fine. It's no big deal if they fail after 2 seconds, it's only wasted 2 seconds of my computer's time.

With multi-GPU setups, there will be so many failed tasks that the client tries to "protect" the network by postponing the communication with the server (the idea is to stop faulty machines from thrashing thousands of workunits, but it has backfired on us in this case).
20) Message boards : News : Scheduled Server Maintenance Concluded (Message 66358)
Posted 4 May 2017 by Vortac
Post:
I am still getting dozens of de_modfit_fast_19_3s_140_bundle5_ModfitConstraints3 tasks. They all error out immediately, deferring communication with the server. In turn, when there are plenty of such errored tasks, communication gets deferred for hours and the queue runs empty. The only solution is to force a manual update or to abort such tasks before they error out. But that's possible only on attended machines. Left alone, when there are plenty of such errors, the communication gets deferred further and further.


