Welcome to MilkyWay@home

Failing workunits

Matt Arsenault
Volunteer moderator
Project developer
Project tester
Project scientist

Joined: 8 May 10
Posts: 576
Credit: 15,979,383
RAC: 0
Message 42821 - Posted: 13 Oct 2010, 17:26:58 UTC

We've been experiencing issues with the RPI computer science login servers since some time over the weekend, so we've been unable to fix the problem with the failing workunits for the separation 0.4 runs. We're waiting for the servers to be restored from a backup before we can fix the issue.

White Mountain Wes
Joined: 24 Jul 09
Posts: 32
Credit: 18,088,471
RAC: 388
Message 42824 - Posted: 13 Oct 2010, 18:02:03 UTC - in response to Message 42821.  

Thank you for the update. It's greatly appreciated.

pstehno
Joined: 16 Jun 10
Posts: 6
Credit: 7,402,186
RAC: 0
Message 42844 - Posted: 14 Oct 2010, 15:21:57 UTC

Thanks for the update. Will check back to see when the problem is fixed.

John Black
Joined: 3 May 10
Posts: 74
Credit: 1,532,760
RAC: 0
Message 42854 - Posted: 14 Oct 2010, 19:28:16 UTC - in response to Message 42844.  

Hi Matt. Thanks for the update. You will know how frustrating it is: some of my WUs predicted at c. 16 hours have been running for 38 hours with 15 to go. I presume that these will error out eventually.
I've only got an E4700 core duo; one core does S@H and the other MW@H. S@H is having server problems and MW@H is erroring out with the 0.4 (or is it 0.04?) software. Why don't we just go back to the previous version? Check out the posts on Number crunching > Computation errors: everybody is having this problem and migrating to other projects.

Good luck with the fix. We, out here in the ether, hope that you manage to fix it soon and appreciate all your efforts to supply us with work for free.

w1hue
Joined: 13 Feb 09
Posts: 49
Credit: 72,372,187
RAC: 0
Message 42900 - Posted: 17 Oct 2010, 3:13:08 UTC - in response to Message 42821.  

Is that why I've gotten zero credit for two WUs that took over 40 hours to complete? I hope that I eventually get credit, but in the meantime, I'll suspend the project and let my machine crunch on other projects.

Brent
Joined: 16 Mar 10
Posts: 12
Credit: 22,284,745
RAC: 0
Message 42901 - Posted: 17 Oct 2010, 3:47:36 UTC

Well another 45 hours of wasted computer time and electricity for 2 more -161 error code failures (in addition to several previous ones). But that is it! I have aborted all of my other Milkyway tasks and will accept no more tasks until I am notified this problem has been fixed.


Orion1958
Joined: 6 Apr 09
Posts: 1
Credit: 1,560,491
RAC: 0
Message 42903 - Posted: 17 Oct 2010, 5:28:14 UTC

Hi Folks,
I too have been hit with failed work units. I've had a total of 7 failed work units so far, and it looks like number eight is about to fail as well. It started on:

    - 10/11/10: 19h 15m 55s
    - 10/12/10: 31h 13m 51s
    - 10/12/10: 28h 06m 19s
    - 10/13/10: 19h 26m 52s
    - 10/14/10: 19h 11m 04s
    - 10/16/10: 25h 17m 14s
    - 10/17/10: 35h 02m 48s



h = hours, m = minutes, s = seconds

These failed work units have taken up almost 200 hours of time.
I just wanted you all to know when it started for me.

orion1958 (RickyF)
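
For reference, the seven run times listed above can be tallied with a short Python sketch. The figures are copied from the post; the eighth work unit, which had not finished at the time of writing, is not included, so the poster's "almost 200 hours" presumably counts that one too.

```python
from datetime import timedelta

# Run times of the seven failed work units, copied from the post
# as (hours, minutes, seconds).
run_times = [
    (19, 15, 55),
    (31, 13, 51),
    (28, 6, 19),
    (19, 26, 52),
    (19, 11, 4),
    (25, 17, 14),
    (35, 2, 48),
]

# Sum the durations, starting from a zero-length timedelta.
total = sum(
    (timedelta(hours=h, minutes=m, seconds=s) for h, m, s in run_times),
    timedelta(),
)
total_hours = total.total_seconds() / 3600

print(f"{total_hours:.1f} hours")  # 177.6 hours
```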


RONALD1701
Joined: 21 Mar 09
Posts: 11
Credit: 14,806,072
RAC: 0
Message 42911 - Posted: 17 Oct 2010, 11:48:45 UTC

Hi Folks,

The WUs with 0.40 still have problems; the results are: error while computing

Two examples:

The first, WU ID 221388801, took over 17 hours of elapsed time, and the second, WU ID 219212040, took over 34 hours, on an Intel i7 965 (8 cores) with Vista 64-bit.

221388801 167364804 16 Oct 2010 17:51:48 UTC 17 Oct 2010 11:34:04 UTC Error while computing 62,653.59 60,971.77 426.59 --- MilkyWay@Home v0.40

219212040 165787964 13 Oct 2010 19:58:58 UTC 16 Oct 2010 16:57:16 UTC Error while computing 124,433.33 123,003.90 860.59 --- MilkyWay@Home v0.40

I hope that can be fixed ASAP.

Kind regards,
Ronald
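
The run times in the two task rows above can be converted to hours with a small Python sketch. It assumes that the first number with a decimal part in each row is the run time in seconds; that reading is an inference, though it matches the "over 17 hours" and "over 34 hours" quoted in the post. The helper name `run_time_hours` is made up for this illustration.

```python
import re

# Task rows copied verbatim from the post.
rows = [
    "221388801 167364804 16 Oct 2010 17:51:48 UTC 17 Oct 2010 11:34:04 UTC "
    "Error while computing 62,653.59 60,971.77 426.59 --- MilkyWay@Home v0.40",
    "219212040 165787964 13 Oct 2010 19:58:58 UTC 16 Oct 2010 16:57:16 UTC "
    "Error while computing 124,433.33 123,003.90 860.59 --- MilkyWay@Home v0.40",
]

# A number with optional thousands separators and a mandatory decimal part.
DECIMAL = re.compile(r"\b\d{1,3}(?:,\d{3})*\.\d+\b")


def run_time_hours(row):
    """Return the first decimal field of a row, read as seconds, in hours.

    Assumption: that field is the run time in seconds.
    """
    first = DECIMAL.search(row).group()
    return float(first.replace(",", "")) / 3600


for row in rows:
    task_id = row.split()[0]
    print(task_id, f"{run_time_hours(row):.1f} h")
```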

Zwickelklopps
Joined: 9 Feb 10
Posts: 1
Credit: 7,857,169
RAC: 0
Message 42928 - Posted: 18 Oct 2010, 9:32:38 UTC

Hi Folks,

For the last few days I have been getting only "accounting irregularity" for Milkyway@home tasks that were computed to 100%.

I switched the main system from XP to Win 7 at the weekend, but that isn't the problem; it occurs under both XP and Win 7.

Other workunits, e.g. from SIMAP, complete successfully.

I hope you can fix this problem soon.

Thanks & Good Luck

mfbabb2
Joined: 29 Sep 09
Posts: 18
Credit: 46,059
RAC: 0
Message 42993 - Posted: 20 Oct 2010, 3:36:02 UTC

Does the latest Milkyway WU software use SSE2 or similar? Some older machines do not have those instructions (especially AMD), and that may be the cause of this rash of Compute Errors with Illegal Instruction.

Matt Arsenault
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 8 May 10
Posts: 576
Credit: 15,979,383
RAC: 0
Message 42996 - Posted: 20 Oct 2010, 5:01:31 UTC - in response to Message 42993.  

Does the latest Milkyway WU software use SSE2 or similar?


The N-body is supposed to require SSE2 because, without it, it's a pain to get consistent results with the x87 FPU. The separation application isn't supposed to require it. I think there might have been some 'build system pollution' where the SSE2 flags were infecting the separation build.

Some older machines do not have those instructions (especially AMD), and that may be the cause of this rash of Compute Errors with Illegal Instruction.


Intel added SSE2 in 2001, and AMD added it in 2003, so really old.
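
For anyone who wants to check whether a machine reports SSE2, here is a minimal Python sketch for Linux only. It assumes the usual `/proc/cpuinfo` layout with a `flags` line; the helper names and the sample text are made up for illustration and do not come from any machine in this thread. Windows hosts need a different tool.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()  # no "flags" line found


def supports_sse2(cpuinfo_text):
    """True if the flags line lists sse2."""
    return "sse2" in cpu_flags(cpuinfo_text)


# Shortened, illustrative samples (hypothetical CPUs).
sample_with = "model name : Example CPU\nflags : fpu mmx sse sse2\n"
sample_without = "model name : Example CPU\nflags : fpu mmx sse\n"

print(supports_sse2(sample_with))     # True
print(supports_sse2(sample_without))  # False
```

On a real Linux host the text would come from `open("/proc/cpuinfo").read()`.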

Brent
Joined: 16 Mar 10
Posts: 12
Credit: 22,284,745
RAC: 0
Message 43064 - Posted: 21 Oct 2010, 16:06:56 UTC - in response to Message 42993.  

Does the latest Milkyway WU software use SSE2 or similar? Some older machines do not have those instructions (especially AMD), and that may be the cause of this rash of Compute Errors with Illegal Instruction.


Well since I am having my failures on an Intel i5-750 system with Windows 7, I don't believe this is the problem.

Sutaru Tsureku
Joined: 30 Apr 09
Posts: 99
Credit: 29,853,513
RAC: 1,056
Message 43065 - Posted: 21 Oct 2010, 16:16:12 UTC
Last modified: 21 Oct 2010, 16:25:17 UTC

A teammate and I got errors on all WUs of these series:
de_14_2s_5_x and
de_16_2s_5_x.

The errors are from a GTX295 (teammate) and from GTX260-216 cards (mine).


It looks like this, for example (GTX260-216):

Exit status -1073741819 (0xffffffffc0000005)

- exit code -1073741819 (0xc0000005)

Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x00408036 read attempt to address 0x0224B000

Engaging BOINC Windows Runtime Debugger...


..and..

(GTX295)
Reason: Access Violation (0xc0000005) at address 0x00408036 read attempt to address 0x023AF000

Reason: Access Violation (0xc0000005) at address 0x00408036 read attempt to address 0x023D6000



Is it a problem with the 0.24 cuda23 app, or with the WUs?
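
The three forms of the exit code in the log above are the same value written differently: the negative decimal is the signed reading of the 32-bit code 0xc0000005, and the longer hex string is that value sign-extended to 64 bits. A short Python sketch of the conversion (the helper name `to_unsigned` is made up for this illustration):

```python
def to_unsigned(code, bits=32):
    """Reinterpret a signed integer as an unsigned value of the given width."""
    return code & ((1 << bits) - 1)


exit_code = -1073741819  # as reported in the task output

print(hex(to_unsigned(exit_code)))      # 0xc0000005
print(hex(to_unsigned(exit_code, 64)))  # 0xffffffffc0000005
```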

RONALD1701
Joined: 21 Mar 09
Posts: 11
Credit: 14,806,072
RAC: 0
Message 43110 - Posted: 23 Oct 2010, 14:43:56 UTC

Hi Folks,

The WUs with 0.40 still have problems; the results are: Validate error

These WUs take 20 to 33 hours of elapsed time or more.

Other WUs are estimated to take much longer. Because the run time is awfully long, 80 to 120 hours or more of elapsed time, I decided to abort all those long-runners.

My system is: Intel i7 965 (8 cores) with Vista 64-bit OS.

224837318 165691129 21 Oct 2010 5:16:37 UTC 23 Oct 2010 4:31:04 UTC Validate error 53,701.63 53,047.92 358.39 --- MilkyWay@Home v0.40
224801303 169891267 20 Oct 2010 23:57:58 UTC 23 Oct 2010 8:03:05 UTC Validate error 120,921.55 119,557.60 807.73 --- MilkyWay@Home v0.40
224863918 169581572 21 Oct 2010 16:32:50 UTC 23 Oct 2010 11:41:07 UTC Validate error 78,787.94 77,470.22 523.39 --- MilkyWay@Home v0.40
224818756 169904633 21 Oct 2010 1:14:38 UTC 23 Oct 2010 11:57:10 UTC Validate error 91,987.54 90,332.60 610.29 --- MilkyWay@Home v0.40

UTC 23 Oct 2010 14:08:22 UTC Aborted by user 87,497.46 86,091.69 581.64 --- MilkyWay@Home v0.40
224776309 169873920 20 Oct 2010 23:15:04 UTC 23 Oct 2010 14:08:22 UTC Aborted by user 145,051.26 143,105.00 966.82 --- MilkyWay@Home v0.40
224773732 169871374 20 Oct 2010 23:10:36 UTC 23 Oct 2010 14:08:22 UTC Aborted by user 148,016.59 145,909.30 985.77 --- MilkyWay@Home v0.40
224733715 169771092 20 Oct 2010 22:02:27 UTC 23 Oct 2010 14:08:22 UTC Aborted by user 154,943.80 152,746.70 1,031.96 --- MilkyWay@Home v0.40

I hope that you can fix this ASAP.

Best regards,
Ronald

bbarde2
Joined: 13 Feb 10
Posts: 1
Credit: 743,176
RAC: 0
Message 43112 - Posted: 23 Oct 2010, 17:39:48 UTC

I am getting the following error and Boinc will not process any downloads.

milkyway_0.4_windows_intelx86.exe has encountered a problem and needs to close. We are sorry for the inconvenience.

I have this running on several other computers.

Does anyone have any ideas? I uninstalled and reinstalled several times as well as deleting all folders and registry entries.

Running on a Pentium III with Windows XP.

RONALD1701
Joined: 21 Mar 09
Posts: 11
Credit: 14,806,072
RAC: 0
Message 43130 - Posted: 24 Oct 2010, 9:57:48 UTC

Hi Folks,

Good news: these 8 de_separation v0.40 WU tasks went through without any problems. Thanks to Matt, Travis and the team.


226427677 171024106 23 Oct 2010 16:32:46 UTC 24 Oct 2010 9:14:53 UTC Completed and validated 60,072.07 59,266.79 400.41 213.78 MilkyWay@Home v0.40
226427676 171024105 23 Oct 2010 16:32:46 UTC 24 Oct 2010 9:32:11 UTC Completed and validated 61,135.63 60,270.72 407.19 213.78 MilkyWay@Home v0.40
226427675 171024104 23 Oct 2010 16:32:46 UTC 24 Oct 2010 8:47:22 UTC Completed and validated 58,459.32 57,540.41 388.74 213.78 MilkyWay@Home v0.40
226427674 171024103 23 Oct 2010 16:32:46 UTC 24 Oct 2010 8:46:42 UTC Completed and validated 58,395.09 57,585.02 389.05 213.78 MilkyWay@Home v0.40
226427673 171024102 23 Oct 2010 16:32:46 UTC 24 Oct 2010 8:45:47 UTC Completed and validated 58,291.51 57,536.37 388.72 213.78 MilkyWay@Home v0.40
226427660 171024089 23 Oct 2010 16:32:46 UTC 24 Oct 2010 8:36:12 UTC Completed and validated 57,774.64 57,013.30 385.18 213.78 MilkyWay@Home v0.40
226427659 171024088 23 Oct 2010 16:32:46 UTC 24 Oct 2010 9:53:58 UTC Completed and validated 58,912.21 58,050.43 392.19 213.78 MilkyWay@Home v0.40
226427658 171024087 23 Oct 2010 16:32:46 UTC 24 Oct 2010 9:03:55 UTC Completed and validated 59,452.36 58,563.46 395.66 213.78 MilkyWay@Home v0.40

Best regards,
Ronald

hazyjohn
Joined: 13 Jul 09
Posts: 1
Credit: 12,316
RAC: 0
Message 43135 - Posted: 24 Oct 2010, 16:58:45 UTC - in response to Message 43130.  

Hello,

Since last week I have been trying to get new work from Milkyway, but every time I start downloading a new task, an error occurs.

When I check the message page on the BOINC manager, it shows this:

work fetch resumed by user
update requested by user
sending scheduler request: Requested by user.
Requesting new tasks
Scheduler request completed: got 1 new tasks
Started download of stars-td82-2stream-30.txt
Started download of de_separation_82_3s_00_1_1669219_1287930653_search_parameters
Finished download of de_separation_82_3s_00_1_1669219_1287930653_search_parameters
Finished download of stars-td82-2stream-30.txt

After that, nothing happens, and after restarting the BOINC manager I get this error message:

milkyway_0.4_windows_intelx86.exe has encountered a problem and needs to close. We are sorry for the inconvenience.

And here is the signature of this error:

AppName: milkyway_0.4_windows_intel86.exe
AppVer: 0.0.0.0
ModName: milkyway_0.4_windows_intel86.exe
ModVer: 0.0.0.0
Offset: 000198f9

Can anyone tell me how to fix this? It's been going on for more than 2 weeks now.

Thanks,

John

John Black
Joined: 3 May 10
Posts: 74
Credit: 1,532,760
RAC: 0
Message 43151 - Posted: 25 Oct 2010, 7:20:05 UTC - in response to Message 43135.  
Last modified: 25 Oct 2010, 7:35:48 UTC

Ever since the new software I have been having problems. At first all I got was calculation errors but now it seems that they have disappeared to be replaced by endless calculations.

I have two WUs behaving as follows:

de_12_3s_5_606570_1287603352_0 elapsed 36:41 to completion 6:30

de_12_3s_5_606558_1287603352_0 elapsed 35:47 to completion 7:41

These are running at high priority, have kicked two other MW WUs off the processors, and are preventing a S@H WU from starting. I have another 6 MW WUs waiting to calculate, and time is running down before they are due to be reported. Although I am sure that something is wrong, I am allowing these WUs to continue to see the outcome.
I would appreciate any comment on my endless calculation problem. With only an E4700 processor with two cores and a weak GPU, I do not feel that I am making progress or contributing anything worthwhile as long as this persists.

I have not managed to successfully calculate a single MW WU in the last 14 days. HELP!

Brent
Joined: 16 Mar 10
Posts: 12
Credit: 22,284,745
RAC: 0
Message 43158 - Posted: 25 Oct 2010, 15:22:19 UTC

Bump

Is this issue being addressed?

John Black
Joined: 3 May 10
Posts: 74
Credit: 1,532,760
RAC: 0
Message 43159 - Posted: 25 Oct 2010, 15:26:41 UTC - in response to Message 43151.  

The first unit has completed and validated, but the run time of 158,069 secs seems excessive compared to previous MW runs and will reduce my throughput dramatically. Can anybody tell me if this is the expected run time, or am I doing something wrong? At this rate I will be unable to finish the WUs I have by the required reporting deadline.

©2024 Astroinformatics Group