Welcome to MilkyWay@home

Posts by Vortac

21) Message boards : Number crunching : Trouble with a Titan Black (Message 66603)
Posted 11 Sep 2017 by Vortac
Post:
My guess is that Titan Black shows its full strength only with CUDA applications.
22) Message boards : Number crunching : AMD VEGA FE (Message 66525)
Posted 6 Jul 2017 by Vortac
Post:
Hello.
What about AMD VEGA FE? Do you think it would have the right power to crunch on Collatz, Milkyway or Einstein?
Thanks for the info. I am thinking of purchasing one just for BOINC.

HD7970 (AMD Tahiti) still trumps it in FP64. I guess it would be better suited for Einstein or Collatz.
23) Message boards : News : Scheduled Server Maintenance Concluded (Message 66416)
Posted 9 May 2017 by Vortac
Post:
I can't see a notice in the messages saying "Message from Server - deferred for 1 day for faulty WUs" or similar.

Well, there are so many erroneous tasks now (apparently, a new failing batch has arrived) that you will almost surely get a 24-hour deferment too.
24) Message boards : News : Scheduled Server Maintenance Concluded (Message 66407)
Posted 8 May 2017 by Vortac
Post:
Is the server block not done on a percentage? If my 17 to 194 isn't triggering it, why should 170 to 1940?

That's a good question. Jake will know this for sure, but I believe the trigger is set on absolute numbers, not on a percentage, because a 1% error rate on a machine with 8 powerful GPUs is still literally thousands of thrashed tasks, while on a machine with just one low-range GPU, 1% is only a couple of tasks. It makes more sense to use absolute numbers to protect the network from faulty machines.
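
The difference between a percentage trigger and an absolute trigger can be sketched in a few lines of Python. This is only an illustration of the reasoning above: the limit value and the function names are hypothetical, and nothing here is taken from the actual server or client code.

```python
# Hypothetical sketch: the limit below is an arbitrary example value, not a
# documented MilkyWay@home or BOINC setting.
ERROR_COUNT_LIMIT = 100


def error_rate(errors: int, valid: int) -> float:
    """Fraction of finished tasks that ended in error."""
    return errors / (errors + valid)


def exceeds_absolute_limit(errors: int, limit: int = ERROR_COUNT_LIMIT) -> bool:
    """An absolute trigger looks only at the raw error count."""
    return errors > limit


# The two cases from the question: same ratio, ten times the throughput.
hosts = {"single GPU": (17, 194), "multi GPU": (170, 1940)}

for name, (errors, valid) in hosts.items():
    print(f"{name}: rate={error_rate(errors, valid):.1%}, "
          f"over limit={exceeds_absolute_limit(errors)}")
```

Both hosts have the same error rate, but only the higher-output host crosses the assumed absolute limit, which would match the behaviour described in this thread.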


Actually I can't be sure it hasn't triggered it on mine - is that recorded in my results data somewhere?

I think such events aren't shown in the Event Log, so the only way to spot it is to check your host's last contact time through your Milkyway account. If your host hasn't contacted the server for hours or more, communication might have been deferred due to computational errors. Or, if the machine is attended, check it through your BOINC Manager - the communication deferred timer is always shown on the Projects tab (under the Status column).
25) Message boards : News : Scheduled Server Maintenance Concluded (Message 66405)
Posted 8 May 2017 by Vortac
Post:
I've got 194 valid and only 17 in error, hardly cause for concern. It's not preventing me getting any more from the server

OK. But, to explain it once again, with multiple GPUs and higher output, you would get so many computational errors that it would prevent you from getting more tasks from the server.

I guess it's the mechanism to prevent faulty machines from thrashing thousands of workunits. But in this case computational errors are not happening because of faulty hardware, but because of the recent switch to Milkyway version 1.46, deprecating thousands of older tasks which cannot be computed successfully with the newer application. In normal circumstances, 17 computational errors against 194 valids would be an 8% failure rate (17 of 211 tasks) - very high and a strong indication of some hardware problems.
26) Message boards : News : Scheduled Server Maintenance Concluded (Message 66401)
Posted 7 May 2017 by Vortac
Post:
I don't know what your problem is, but my computers work just fine.

So you haven't noticed yet that this is the thread about the problems which have appeared after the latest Scheduled Server Maintenance? It's in the title of the thread, in case you have missed it. The computational errors mentioned here so often are the result of the latest Milkyway@home upgrades - Jake himself kindly invited us to provide some feedback on that. And your hosts are also producing such errors, check here for example:

https://milkyway.cs.rpi.edu/milkyway/results.php?hostid=686691&offset=0&show_names=0&state=6&appid

So, you see the problem now? You are trying to provide some advice, but you don't even know what you are talking about (not even your own hosts). You should be grateful for this education, not angry.
27) Message boards : News : Scheduled Server Maintenance Concluded (Message 66398)
Posted 7 May 2017 by Vortac
Post:
They abort themselves.

Ah, more of your erroneous assumptions. They don't abort themselves. They end in 'Computational error' (1 (0x1) Unknown error number). Which is completely different from 'Aborted' (203 (0xcb) EXIT_ABORTED_VIA_GUI). The important difference being that computation errors defer communication with the server and aborted tasks don't. So indeed, it's preferable to abort them, but that's not possible on unattended machines and I certainly don't have the whole day to abort deprecated Milkyway tasks.
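
The distinction between the two outcomes can be written down as a small Python sketch. The two status codes and their labels are the ones quoted above; the function name is hypothetical, and the deferral behaviour is encoded only as described in this post, not as taken from the BOINC source.

```python
# Status codes and labels exactly as quoted in the post above.
COMPUTATION_ERROR = 0x1  # "1 (0x1) Unknown error number"
ABORTED_VIA_GUI = 0xCB   # "203 (0xcb) EXIT_ABORTED_VIA_GUI"


def defers_communication(exit_status: int) -> bool:
    """Per the post: computation errors defer server contact, aborts do not.

    Only the two codes named in the post are handled; any other value is
    outside the scope of this sketch.
    """
    if exit_status == COMPUTATION_ERROR:
        return True
    if exit_status == ABORTED_VIA_GUI:
        return False
    raise ValueError(f"exit status {exit_status} is not covered by this sketch")
```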

Frankly, I am tired of correcting you all the time. Can't you research anything before you post?
28) Message boards : News : Scheduled Server Maintenance Concluded (Message 66396)
Posted 7 May 2017 by Vortac
Post:
I have just been aborting them. No big deal!!

How do you abort a task on an unattended machine?
29) Message boards : News : Scheduled Server Maintenance Concluded (Message 66382)
Posted 5 May 2017 by Vortac
Post:
You said earlier "BSODs happen, with or without BOINC, even on perfectly fine machines." I was correcting you. BSOD is a hardware error.

https://support.microsoft.com/en-us/help/17074/windows-7-resolving-stop-blue-screen-errors

"Stop errors (also sometimes called blue screen or black screen errors) can occur if a serious problem causes Windows 7 to shut down or restart unexpectedly. These errors can be caused by both hardware and software issues."
30) Message boards : News : Scheduled Server Maintenance Concluded (Message 66380)
Posted 5 May 2017 by Vortac
Post:
What's the time limit for re-contacting the Milkyway server? I've seen my machines contact them far more often than once an hour. And it's only a small proportion that fail. Anyway, just be patient as this is only a temporary problem!

And if we're only getting 80 tasks per GPU, this would seem to indicate there isn't that much work to be done on this project.

I really don't understand why some people need to have their machines running flat out 24/7. If it runs out of work to do sometimes, so be it. Get another project, or not.

As for your BSOD, seriously you shouldn't get that no matter what any program does wrong. Only bad memory (or sometimes a bad graphics card) can cause a BSOD. What is the code on the BSOD?

You still don't get it, just repeating the same irrelevant stuff over and over again. Normally, communication is deferred for 90 seconds after every Update. But, to explain it for the fourth time, plenty of computation errors (which we are getting now due to the new app version) are deferring communication for up to 24hrs. I hope it's clear now?

As for BSODs, I am not getting any, that's just one of your superficial assumptions. And even if I did, I doubt I would ask you for help.
31) Message boards : News : Scheduled Server Maintenance Concluded (Message 66378)
Posted 5 May 2017 by Vortac
Post:
If you're getting a BSOD, you should run memtest on a triple scan, most likely you have faulty memory. I never ever get a BSOD doing anything.

Try increasing your queue size or adding a backup project on priority 0 (it will only run if you run out of Milkyway tasks).

Things are not that simple and you are not even trying to comprehend the problem here. Milkyway allows only 80 tasks per GPU, so it's impossible to have a large queue (as you have suggested). 80 tasks aren't enough for even a full hour with a Tahiti GPU, therefore regular communication with the server is extremely important, to be able to obtain new tasks often enough.

Of course, I have backup projects. But Tahiti GPUs are strong only in FP64 nowadays, therefore they excel only in Milkyway. My Gridcoin magnitude and rewards are decreased whenever backup projects kick in. It's simply inefficient to use Tahitis for FP32 projects today and only PrimeGrid is a viable FP64 alternative, however only for very long WUs which take days to crunch.
32) Message boards : News : Scheduled Server Maintenance Concluded (Message 66376)
Posted 5 May 2017 by Vortac
Post:
I've never had a BSOD from a broken WU, there must be something else wrong with your machine.

And what is this 24 h wait? My computers ask for more WU whenever they need it.

BSODs happen, with or without BOINC, even on perfectly fine machines.

Your computers have about 1000 tasks all together. But a single PC with 3 or 4 powerful GPUs will have 30k+ tasks and many of them will end in computational errors, because they are incompatible with the new app. With so many computational errors, communication with the server gets deferred. It's by design, because servers normally don't expect so many thrashed workunits, to put it simply.
33) Message boards : News : Scheduled Server Maintenance Concluded (Message 66364)
Posted 4 May 2017 by Vortac
Post:
Half of my 140 units are working fine. It's no big deal if they fail after 2 seconds, it's only wasted 2 seconds of my computer's time.

With multi-GPU setups, there will be so many failed tasks that the client tries to "protect" the network by postponing the communication with the server (the idea is to stop faulty machines from thrashing thousands of workunits, but it has backfired on us in this case).
34) Message boards : News : Scheduled Server Maintenance Concluded (Message 66358)
Posted 4 May 2017 by Vortac
Post:
I am still getting tens of de_modfit_fast_19_3s_140_bundle5_ModfitConstraints3 tasks. They are all erroring out immediately, deferring communication with the server. In turn, when there are plenty of such errored tasks, communication gets deferred for hours and the queue gets empty. The only solution is to force a manual update or to abort such tasks before they get errored out. But that's possible only for attended machines. Left alone, when there are plenty of such errors, the communication gets deferred further and further.
35) Message boards : News : Scheduled Server Maintenance May 2nd (Message 66317)
Posted 2 May 2017 by Vortac
Post:
It's OK, we'll have more challenges in the future anyway :)
36) Message boards : News : Scheduled Server Maintenance May 2nd (Message 66315)
Posted 2 May 2017 by Vortac
Post:
Argh, we are in the middle of a challenge here

https://boincstats.com/en/stats/challenge/team/chat/928
37) Message boards : Number crunching : Crunching numbers without a GPU (Message 66250)
Posted 28 Mar 2017 by Vortac
Post:
Of course. If you check the Milkyway applications page, you will see that every GPU app has a CPU version. And N-Body applications are actually CPU-only.

http://milkyway.cs.rpi.edu/milkyway/apps.php
38) Message boards : Number crunching : Weekly Maintenance? (Message 66248)
Posted 28 Mar 2017 by Vortac
Post:
Yes, with a powerful GPU (like 280x) it's absolutely impossible to cache enough workunits to avoid running out of work during maintenance which usually takes about two hours (I think). A 280x can crunch through 250-300 workunits in two hours, but it's possible to cache only 80.
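
As a rough check of the numbers above, here is a minimal Python sketch. It assumes a constant crunching rate and uses only the figures from this post (80 cached tasks, 250-300 workunits per two hours, roughly two hours of maintenance); the function name is made up for illustration.

```python
CACHE_LIMIT = 80           # tasks per GPU, as stated in the post
MAINTENANCE_MINUTES = 120  # "about two hours", per the post


def cache_minutes(tasks_per_two_hours: int, cache: int = CACHE_LIMIT) -> float:
    """Minutes until a full cache is exhausted at a constant throughput."""
    tasks_per_minute = tasks_per_two_hours / 120
    return cache / tasks_per_minute


for throughput in (250, 300):
    minutes = cache_minutes(throughput)
    idle = MAINTENANCE_MINUTES - minutes
    print(f"{throughput} WUs per 2h: cache lasts {minutes:.1f} min, "
          f"GPU idle for about {idle:.1f} min")
```

Under these assumptions a full cache lasts a little over half an hour, so the card would sit idle for most of a two-hour maintenance window.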
39) Message boards : Number crunching : Thousands of validation errors, no good work? (Message 66240)
Posted 23 Mar 2017 by Vortac
Post:
I noticed recently that my credit average on Milkyway has dropped. On investigation, I see I have thousands of work units with validation errors, and nothing (recently) that has validated successfully.

Given I don't see the message boards exploding, I presume this is specific to my computer -- what should I look for?

Check the GPU temperatures - maybe the fan has failed and the card is overheating?

If that's not the case, it is possible that the card is dying. Try to lower the clocks (even below factory settings) and then monitor if there's any improvement.
40) Message boards : Number crunching : Congratulations Stefan on doing Task 2000000000 (Message 66239)
Posted 23 Mar 2017 by Vortac
Post:
Stefan is most likely running AMD Tahiti HD7970. Most affordable FP64 power ever.


