Welcome to MilkyWay@home

Computation errors


Message boards : Number crunching : Computation errors
Andy_Taximan

Joined: 15 May 16
Posts: 3
Credit: 38,418,196
RAC: 0
Message 68915 - Posted: 25 Jul 2019, 13:32:00 UTC

Can somebody take a look? I seem to be getting too many errors today. I'm running the PC at stock settings now.
https://milkyway.cs.rpi.edu/milkyway/results.php?hostid=808617&offset=0&show_names=0&state=6&appid=
Marsinph
Joined: 13 Nov 10
Posts: 22
Credit: 108,259,694
RAC: 3
Message 68916 - Posted: 25 Jul 2019, 13:52:59 UTC - in response to Message 68915.  

Hello,
It was the same for me this morning (about 50%). The admin knows and is working on it.
See https://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=4487
Joseph Stateson
Joined: 18 Nov 08
Posts: 233
Credit: 1,277,266,415
RAC: 20
Message 68917 - Posted: 25 Jul 2019, 14:22:11 UTC - in response to Message 68915.  
Last modified: 25 Jul 2019, 14:40:14 UTC

I thought this was fixed after reading the post under News. Quote: "Thanks for bringing that to my attention, none of the runs are returning data. I'll try to fix that as soon as I can."

I did get about 100-200 w/o error but this morning I see another 253 errored out and 212 more waiting to error out.

Compounding the problem is that Einstein coincidentally has a series of bad runs that affect older GPUs like my S9000 boards.

[EDIT] Only a few more errored out. Looks like there are just a few of the bad ones left. Out of those 253 "waiting", only about 5 were bad, and I now have over 500 downloaded and do not see any more errors.

Thanks for the fix.
Andy_Taximan

Joined: 15 May 16
Posts: 3
Credit: 38,418,196
RAC: 0
Message 68918 - Posted: 25 Jul 2019, 14:29:38 UTC

Ah, at least they're only taking 2 seconds to error.
Joseph Stateson
Joined: 18 Nov 08
Posts: 233
Credit: 1,277,266,415
RAC: 20
Message 68919 - Posted: 25 Jul 2019, 14:34:55 UTC - in response to Message 68918.  

Andy_Taximan wrote: "Ah, at least they're only taking 2 seconds to error."

Agreed, unlike the new batch of Einstein tasks that run for 6-7 hours and then show 35-45 days to complete.
Marsinph
Joined: 13 Nov 10
Posts: 22
Credit: 108,259,694
RAC: 3
Message 68921 - Posted: 25 Jul 2019, 15:08:14 UTC - in response to Message 68917.  

JStateSon,
I think it is more a bad series of tasks than a problem with old GPUs.
A Radeon RX580 is not old hardware!

I will let everything I have finish (about 2 hours if there are no errors) while waiting for the next series to download.

Best regards





Tom Donlon
Volunteer moderator
Project developer
Project tester
Project scientist

Joined: 10 Apr 19
Posts: 45
Credit: 41,430,296
RAC: 109,097
Message 68922 - Posted: 25 Jul 2019, 15:48:12 UTC

Hey all,

There was a problem with a flag in the parameters for the runs last night. I fixed it around 10 PM. You may still get some bad runs as your queues purge, but after a day or two at most things should continue to run smoothly.

Tom
Marsinph
Joined: 13 Nov 10
Posts: 22
Credit: 108,259,694
RAC: 3
Message 68924 - Posted: 25 Jul 2019, 16:03:52 UTC - in response to Message 68922.  

Hello Tom,
Thank you for the update.
It is not such a big problem on our side, because they crash after 1 or 2 seconds.
It would be more of a problem if they crashed a few seconds before the end of the crunch.
Have a nice day.
Best regards

PS: if you are cold, ask Western Europe... we will send you "degrees" for free!
In Belgium, this is the first time in the last two centuries that we have reached such temperatures.
For comparison, it is summer in Egypt now, when temperatures there are also at their highest.
It is warmer here than in the Egyptian desert at the Abou-Simbel temple! "Only 39°C"
So it is entirely normal if you see fewer returned results. We are protecting our hosts.
robertmiles

Joined: 30 Sep 09
Posts: 210
Credit: 22,337,484
RAC: 0
Message 68931 - Posted: 27 Jul 2019, 1:22:26 UTC

2 computation errors for me just now.

The following error message suggests that both were due to bad input files:

Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Tom Donlon
Volunteer moderator
Project developer
Project tester
Project scientist

Joined: 10 Apr 19
Posts: 45
Credit: 41,430,296
RAC: 109,097
Message 68938 - Posted: 30 Jul 2019, 15:11:38 UTC - in response to Message 68931.  

robertmiles,

Did the workunits actually abort computation, or did they just output that error message? That's a typical output from the separation application that basically just says "you aren't using a lua file so I'm defaulting back to this parameter file". When I test these runs on my client I get that "error" every time because we don't use separation that way anymore.

- Tom
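Editor's note: for anyone puzzled by that message, the fallback Tom describes can be sketched roughly as below. This is only an illustration in Python, not the separation application's actual code. The names `parse_legacy`, `load_parameters` and `fake_lua_loader` are hypothetical, and the one-pair-per-line `key: value` layout is an assumption based on the `number_parameters: 4` fragment in the error text. The wording of the error itself is consistent with Lua syntax: after `name:` Lua expects a method name, so a parameter file that starts with `number_parameters: 4` would fail on the very first line with `'<name>' expected near '4'`.

```python
import re

# Assumed legacy layout: one "key: value" pair per line,
# e.g. "number_parameters: 4". The real file format may differ.
LEGACY_LINE = re.compile(r"^\s*([A-Za-z_]\w*)\s*:\s*(.+?)\s*$")


def parse_legacy(text):
    """Parse 'key: value' lines into a dict, skipping blank lines."""
    params = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        match = LEGACY_LINE.match(line)
        if match is None:
            raise ValueError(f"not a legacy parameter line: {line!r}")
        params[match.group(1)] = match.group(2)
    return params


def load_parameters(text, try_lua):
    """Try the Lua loader first; on failure, record a message and fall back.

    `try_lua` is a caller-supplied (hypothetical) function that raises
    an exception when `text` is not valid Lua.
    Returns (parameters, source, messages).
    """
    messages = []
    try:
        return try_lua(text), "lua", messages
    except Exception as exc:
        # The failure is recorded but is not fatal: loading continues.
        messages.append(f"Error loading Lua script: {exc}")
    return parse_legacy(text), "legacy", messages


def fake_lua_loader(text):
    """Stand-in for a real Lua interpreter: always rejects the input."""
    raise SyntaxError("'<name>' expected near '4'")


if __name__ == "__main__":
    params, source, log = load_parameters("number_parameters: 4\n", fake_lua_loader)
    print(source, params, log)
```

In this sketch the "error" line ends up in the log even though loading succeeds through the legacy path, which matches the behavior Tom describes: the message alone does not mean the workunit failed.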


©2020 Astroinformatics Group