Welcome to MilkyWay@home

Posts by Frank

1) Message boards : News : Server Maintenance 12:00 PM ET (16:00 UTC) 9/23/2022 (Message 74445)
Posted 14 Oct 2022 by Frank
Post:
I am certainly relieved that the worrisome Notice won't be showing up on my computer any longer. However, I am concerned that the Project may be crashing. Task completions are down 40% and are falling.
You can't look at any of the various elements of the process and say that it is working well: Task creation, Task distribution, Task execution, Task completion. Task validation and Task error detection are seriously flawed.
If we can't get back on course we are going to crash. What I need to know is, are we going to correct our course and get this turkey under control or load the lifeboats?
2) Message boards : Number crunching : Can't Complete WU In Time (Message 73990)
Posted 22 Jul 2022 by Frank
Post:
Hal,
You are chasing ghosts. You don't have some weird intermittent fault in your computers. What you don't have is an appreciation that once your computer uploads the results of your computation, the job is not completed. Until the task is validated, invalidated, or errors out, its clock is still running. If the validation process causes enough delay, you can run out of time and Milky will cancel the task.
It ain't right but it is the way it is. And, it isn't you or your computers; keep on truckin'.
3) Message boards : News : News General (Message 73644)
Posted 22 May 2022 by Frank
Post:
I don't think overload of the servers is a problem. Just look at the numbers of users active on a given day. It has been hanging around 6,000 for weeks and the number of active contributors is around 15,000.
However, I have seen the response time of the servers stretch out substantially. I believe the slowness of server response may lie on the Internet. Could be electronic interference or overload of an intermediate server (Internet speed will always be controlled by the slowest server and collisions simply stop transmission until there is a channel open).
As for the "Timed out - no response" errors, I encountered 27 of them yesterday. They did not error in 2 minutes but rather in 2 hours.
If a reboot can stabilize server operation I would encourage it. Probably in the dead of night.
I am encouraged by your attitude and intelligence. Maybe the crater can wait. However, the befuddlement fix cannot.
4) Message boards : News : News General (Message 73631)
Posted 21 May 2022 by Frank
Post:
I have a simple question. Are we done here or has MW cratered? Well, maybe MW is just over-subscribed and doesn't need all the computing power they have available.
MW seems hopelessly befuddled. The supply of runnable tasks is sporadic, at best. Validation is a bad joke; the copies of tasks required for validation don't all get sent. Tasks error out for no response after 2 minutes in the client. What kind of a project plan could tolerate 8 million unsent N-Body tasks? Why are there 0 Separation tasks, especially on a weekend? Talk about erratic, what about the servers? I could continue this list but it isn't necessary.
There are a ton of problems. There is no money to generate fixes. So, are we done here?
5) Message boards : News : News General (Message 73094)
Posted 24 Apr 2022 by Frank
Post:
5 w/u, the son of 7 w/u. Today, I encountered two that invalidated the tasks. I think the 5 w/u corruption is a sickness (a variant of the 7 w/u disease) that can be present in untold numbers of tasks yet to be run. Yes, it's a pandemic. It has to be corrected.
And no, there were no computer errors, except those errors committed by the computer that built the tasks.
6) Message boards : News : News General (Message 73082)
Posted 23 Apr 2022 by Frank
Post:
I am sure you all remember the w/u=7 tasks that caused invalidations. Well, this morning I encountered the son of w/u=7. Its signature is w/u=5 and it causes validation errors. Que paso?
7) Message boards : News : New Separation Runs (Message 71825)
Posted 25 Feb 2022 by Frank
Post:
Well, maybe I wouldn't have misunderstood if you had stated that you were merely considering moving your computers to Einstein. My advice to you would have been the same. Don't go; Einstein@Home is not a nice place.
8) Message boards : News : New Separation Runs (Message 71780)
Posted 21 Feb 2022 by Frank
Post:
Yo Tom,
I would hate to see you truck off to Einstein. From my point of view, you have been the glue that holds this Project together. If you are a little restless and out of sorts it may be due to the much dreaded winters in northern New York. You may find that Utopia is nowhere near Einstein.
There seems to be some rhetoric bouncing around: "I love science", "I love recognition of contribution", etc. We all do it for our own reasons and you can't deny the validity (there's that word again) of any of them. I, personally, do it to avoid freezing to death during the Wyoming winter (far more potent than NY winters). My computers heat my house.
Your job is not to define a good MilkyWay client. I, for one, will find it difficult to do without your help. They might even miss my 132 CPUs and 36 GPUs (hey, I was a SETI guy for 21 years).
9) Message boards : News : Validator Outage (Message 71351)
Posted 12 Nov 2021 by Frank
Post:
Tom,
The 7 WUs are back. Back consuming 1.62 times the energy required to run the normal 4 WU tasks and wasting 1.62 times the computing time. It is a serious problem.
I know you will "unstick" the hung 7 WU task and all will be well for a couple of days. That isn't a fix; it is a workaround. And it is not fair to you; you have to spend a bunch of time mucking around in the software to keep the system running, again and again.
What must we do to get a solution to this problem? It has to be fixed. We can't trivialize this problem as an inconvenience we can tolerate. Some users might tolerate it but many will not (including me).
Save the Wilkyway!! Is that cosmic or what?
10) Message boards : News : Validator Outage (Message 71307)
Posted 5 Nov 2021 by Frank
Post:
Tom,
We need you to unstick the software again. Today I encountered about 60 of the 7 WU tasks. They were all sent today and run today. On average they spent 6 hours in my computers. So they are fresh, indicating to me that you have a stuck one.
And no, I haven't forgotten my war against Errors While Computing. Did you know that Rosetta@home is experiencing a bunch of flawed tasks, even as we speak? Milkyway and Rosetta are both BOINC users. Makes one wonder whether the malfeasant software might live in BOINC servers. I would like to see the end of Errors While Computing; they imply my computers erred. It ain't true.
11) Message boards : News : Validator Outage (Message 71288)
Posted 30 Oct 2021 by Frank
Post:
I don't disagree with anything you put in your last message, but I am wondering when the source code can be worked on.
You have validated how the 7 WU Tasks are coming from within the Project's domain. This problem showed up in September. It had been doing well up to that point. What changed?
Maybe all you have to do is reinstall the program that is causing the corruption. Or, maybe a subroutine can be added to the client app to make sure that if the number of WUs is greater than 4 the task is aborted.
Believe it or not, 7 WU Tasks cost the client 1.62 times the computation and power, and there are a total of three clients that are going to have to run these corrupt tasks. They are not free. We crunchers need a fix.
12) Message boards : News : Validator Outage (Message 71283)
Posted 28 Oct 2021 by Frank
Post:
7 WU tasks are back and becoming more prevalent. Over the past 24 hours I have encountered 60. There is still trouble in River City.
13) Message boards : News : Validator Outage (Message 71254)
Posted 19 Oct 2021 by Frank
Post:
I thought it must be Halloween already since I was frightened. The dreaded 7 WU Invalids were back in their hundreds (actually 120s). All the corrupt tasks were sent at 6:00 UTC on October 19, 2021.
Hopefully, all of the bad tasks had been buried in the reservoir of "Waiting to be sent" tasks. If so, their numbers will diminish over the next few days. Hopefully.
14) Message boards : News : Validator Outage (Message 71252)
Posted 17 Oct 2021 by Frank
Post:
Tom Donlon, you done good.
In the last 24 hours I have completed 11,000 tasks and experienced 2 Invalids (7 WU) and 0 Errors While Computing.
15) Message boards : News : Validator Outage (Message 71243)
Posted 13 Oct 2021 by Frank
Post:
These "Invalids" are really becoming irritating. I am encountering about 450 per day and each is consuming 1.625 times the normal computer time. Are we making any progress on eliminating the 7 WU tasks?
I won't even bitch about the "Error While Computing" Tasks. They are not Errors While Computing since no computing is ever done. Actually, they are Initialization Errors. I don't care what they are; I just want them to go away.
On the Invalids, I am fairly certain that tasks that end up as 7 WU are sent as 4 WU. The assessment of 7 WUs is made by the clients' computers. Maybe what we need is a subroutine in Initialization that tests the WU Count before computation starts and, if it is not 4, aborts the run. I don't like creating "workarounds" as fixes but it would be better than what we have today.
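To make the suggestion concrete, here is a rough sketch of such a pre-check. It is written in Python purely for illustration and is not the MilkyWay@home app's code; the names `check_wu_count`, `run_task` and `WorkunitCountError` are hypothetical, and the expected count of 4 is simply the figure for normal tasks mentioned in this post.

```python
# Hypothetical sketch: none of these names come from the MilkyWay@home client.
EXPECTED_WU_COUNT = 4  # assumption: a normal task bundles 4 WUs, per the post


class WorkunitCountError(Exception):
    """Raised when a task does not carry the expected number of WUs."""


def check_wu_count(wu_count, expected=EXPECTED_WU_COUNT):
    """Abort before any computation if the WU count is not the expected one."""
    if wu_count != expected:
        raise WorkunitCountError(
            f"task has {wu_count} WUs, expected {expected}; aborting before compute"
        )


def run_task(wu_params, compute):
    """Run compute() on each WU, but only after the count check passes."""
    check_wu_count(len(wu_params))
    return [compute(p) for p in wu_params]


if __name__ == "__main__":
    # A normal 4 WU task runs to completion.
    print(run_task([1.0, 2.0, 3.0, 4.0], lambda p: p * 2))
    # A 7 WU task is rejected before any compute time is spent on it.
    try:
        run_task(list(range(7)), lambda p: p)
    except WorkunitCountError as err:
        print("aborted:", err)
```

The point of checking first is that a bad task costs the cruncher next to nothing, instead of running to the end and then being marked Invalid. It remains a workaround: it would not explain where the extra WUs come from.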
16) Message boards : News : Validator Outage (Message 71195)
Posted 30 Sep 2021 by Frank
Post:
Keith Myers posted (message 71189) to this thread and related that the heart of the Milkyway app may be somewhat mushy and could cause both validation and "errors while computing" problems. In response to his posting he received "Not what this thread is about".
We need to pay attention to what Keith says. He's only been doing this for 25 years and may be the strongest expert we have. The connection between the validation and the errors and the poor coding will probably become evident when the problems are fixed.
I am not real happy about either of the problems. I encounter about 200 Invalids every day along with about 30 Computer Errors.
If the 7 w/u tasks are messing up validation, how is it that we are getting more 7 w/u tasks all the time? Something must be happening between the creation of the task and its starting execution. Could Internet noise be causing the corruption of the task?
The Errors While Computing find that the executing computer has tried to run an unknown command. It seems that the "unknown command" is not due to an app problem; otherwise, all tasks would error after experiencing the first one. Did you know that Internet noise causes most transfer errors? It drives me nuts on my computers, cell phones and smart TVs (some folks say that is more like a putt than a drive).
I really want these problems to be fixed, soon, so lots of luck.
17) Message boards : News : New Separation Runs 6/9/2021 (Message 70948)
Posted 10 Jul 2021 by Frank
Post:
I am seeing 11.44% Invalids and it is increasing daily. I have seen reports where four computers (2 Windows 10 and 2 Linux) tried to evaluate 4 completed tasks. 2 Tasks were judged to be Invalid, 1 Windows 10 and 1 Linux. Makes me think that the problem doesn't reside in an operating system.
Keep smilin'.
18) Message boards : News : New Separation Runs 6/9/2021 (Message 70928)
Posted 27 Jun 2021 by Frank
Post:
Well, if you wanted Invalids on the gapfix_bgset3 you got them by their hundreds. Personally, I don't like them. They are mainly occurring on CPU tasks but some show up on GPU tasks.
19) Questions and Answers : Windows : Milkyway@Home Uses Only One of Three GPUs (Message 69725)
Posted 17 Apr 2020 by Frank
Post:
Keith,
I plumb forgot to credit you for your assistance during my journey through the dark ages. You provided valuable insight during the journey and continue to do so now. I'll research your current suggestion and see if I can improve the stability of my setup. Thanks again!
20) Questions and Answers : Windows : Milkyway@Home Uses Only One of Three GPUs (Message 69723)
Posted 16 Apr 2020 by Frank
Post:
A while ago I reported that MilkyWay was using only 1 of 3 available GPUs on several of my computers. I was baffled so I asked for help. Very quickly I had a number of respondents that provided helpful information. Unfortunately, after checking on configurations and settings as suggested, I still had the problem. It had become a major mystery.
Well, the mystery is solved. I know what was preventing my idle GPUs from getting work. It all boils down to CPU load. MW captures all the CPU power available for its CPU Tasks. If MW tries to start a GPU Task it will normally fit within the CPU domain using the small chunks of time not committed to the running CPU Tasks. When MW tries to start a second and third GPU Task, MW will stop a couple of CPU Tasks to make room for the GPUs. Hooray. About 6 CPU Tasks are running along with 3 GPU Tasks. I'm a happy camper but it is a house of cards.
If the six CPU Tasks happen to be individual Separation tasks and an Nbody task comes looking for a home, MW will kick all the Separation tasks into the waiting to run queue. I guess those tasks in the waiting to run queue count against available CPU time. MW will not tolerate more than 100% of CPU time being used, so MW waits for the GPU Tasks to complete and then won't allow any to start. You are down to 6 Processors and 1 GPU. I know because I've been there and done that.
To prevent this from happening to me and my guys (computers) I just went to the Milkyway@Home Preferences and selected only one of the CPU Task types (Nbody or Separation) to be allowed. They don't seem to play well together. Now I have all five of the computers I have committed to MilkyWay running 3 GPUs each.
So, thanks to Mikey and Joseph Stateson for the help and guiding nudges. Without them I would still be wandering in the Never-Never.



©2024 Astroinformatics Group