Posts by TLSI2000

1) Message boards : Number crunching : Computational Error? (Message 58474)
Posted 1 Jun 2013 by TLSI2000
Post:
To get back to operational, I downgraded back to 13.1 and it works fine.

For this downgrade to succeed, you *must* use the separate Catalyst Uninstall utility, which is downloadable from the same page as the installers. The Uninstall option included in the Catalyst Install Manager does not do the clean-up necessary to get back to a solid re-install condition.

I am now back in operation.
Good Luck.

Uninstall Link

http://support.amd.com/us/gpudownload/windows/Pages/catalyst-uninstall-utility.aspx
2) Message boards : Number crunching : Computational Error? (Message 58407)
Posted 26 May 2013 by TLSI2000
Post:
Since I updated from AMD Catalyst 13.1 to 13.4 today, the MilkyWay work units have all been failing.

I did not expect this, and searching the forum brought up this thread.
Running BOINC 7.0.64 on Win7 64-bit with an AMD 69xx Cayman GPU.


The error log indicates an exception, and dumps a BOINC debug trace to the log.
The error seems to occur at the end of processing.

--------------------------------------------
Example for Task: 481431579 WorkUnit: 368827958 computer: 366518


Using AMD IL kernel
Binary status (0): CL_SUCCESS
Estimated AMD GPU GFLOP/s: 2765 SP GFLOP/s, 691 DP FLOP/s
Using a target frequency of 30.0
Using a block size of 6144 with 121 blocks/chunk
Using clWaitForEvents() for polling (mode -1)
Range: { nu_steps = 320, mu_steps = 1600, r_steps = 1400 }
Iteration area: 2240000
Chunk estimate: 3
Num chunks: 4
Chunk size: 743424
Added area: 733696
Effective area: 2973696
Initial wait: 27 ms
Integration time: 37.905295 s. Average time per iteration = 118.454047 ms


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x000007FEDE811DCD read attempt to address 0x00000010

Engaging BOINC Windows Runtime Debugger...
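
As a cross-check, the integration figures in the log above are consistent with some simple arithmetic. The sketch below is inferred only from the logged numbers, not from the MilkyWay@home source, and the variable names are made up for illustration. In particular, treating the chunk estimate as a floor and the chunk count as a ceiling is an assumption that happens to match this one log.

```python
import math

# Values copied from the log above.
block_size = 6144
blocks_per_chunk = 121
nu_steps, mu_steps, r_steps = 320, 1600, 1400
integration_time_s = 37.905295

# Assumed relationships (inferred from the numbers, not from the source code).
iteration_area = mu_steps * r_steps                  # "Iteration area"
chunk_size = block_size * blocks_per_chunk           # "Chunk size"
chunk_estimate = iteration_area // chunk_size        # "Chunk estimate" (floor)
num_chunks = math.ceil(iteration_area / chunk_size)  # "Num chunks" (ceiling)
effective_area = num_chunks * chunk_size             # "Effective area"
added_area = effective_area - iteration_area         # "Added area" (padding)
avg_ms = integration_time_s / nu_steps * 1000.0      # time per nu step, in ms

print(iteration_area, chunk_size, chunk_estimate, num_chunks)
print(effective_area, added_area, f"{avg_ms:.6f} ms")
```

If these relationships hold, the integration itself ran all 320 steps at the expected pace, which fits the observation that the failure comes only at the end of processing.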
3) Message boards : News : N-Body 1.08 (Message 57584)
Posted 21 Mar 2013 by TLSI2000
Post:
These seem to be running fine, with run times coming in at 2 to 4 hours
on an older AMD 2.4 GHz CPU.

But the credit calculation seems to be a bit odd:

Run time (sec)   CPU time (sec)   Credit   Application
6,357.59         6,357.59         26.84    MilkyWay@Home N-Body Simulation v1.08
6,404.13         6,404.13         27.04    MilkyWay@Home N-Body Simulation v1.08
9,509.63         9,496.64         13.22    MilkyWay@Home N-Body Simulation v1.08
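
To make the oddity concrete, here is a minimal sketch that converts the three results above into credit per CPU-hour. It assumes the times are in seconds, as BOINC normally reports them. The helper name `credit_per_cpu_hour` is only for illustration.

```python
# (run time s, CPU time s, credit) copied from the three results above.
results = [
    (6357.59, 6357.59, 26.84),
    (6404.13, 6404.13, 27.04),
    (9509.63, 9496.64, 13.22),
]


def credit_per_cpu_hour(cpu_seconds, credit):
    """Credit granted per hour of CPU time."""
    return credit / cpu_seconds * 3600.0


rates = [credit_per_cpu_hour(cpu, credit) for _, cpu, credit in results]
for (_, cpu, credit), rate in zip(results, rates):
    print(f"{cpu:>9.2f} s  {credit:>6.2f} credits  {rate:5.2f} credits/CPU-hour")
```

The first two tasks work out to roughly 15 credits per CPU-hour, while the longest one gets only about 5, so the longer task earned about a third of the rate.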
4) Message boards : News : might have found the error (Message 55782)
Posted 14 Oct 2012 by TLSI2000
Post:
And I have seen a number of errors on the version '3' WUs overnight as well.

These seem to have an #IND in the result, as in:

<stream_only_likelihood> -3.638176510600306 -10.877692835483177 -1.#IND00000000000 </stream_only_likelihood>
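
For anyone scanning their own results: `-1.#IND` is how older Microsoft C runtimes print an indefinite NaN, so the third likelihood value is not a number. Below is a minimal sketch for flagging such values in a result line. The helper `parse_msvc_float` is hypothetical and not part of BOINC or the MilkyWay@home application.

```python
import math
import re


def parse_msvc_float(token):
    """Parse a float, mapping old MSVC markers (#IND, #QNAN, #SNAN, #INF) to NaN/inf."""
    if "#INF" in token:
        return -math.inf if token.startswith("-") else math.inf
    if "#IND" in token or "#QNAN" in token or "#SNAN" in token:
        return math.nan
    return float(token)


# Result line copied from the post above.
line = ("<stream_only_likelihood> -3.638176510600306 -10.877692835483177 "
        "-1.#IND00000000000 </stream_only_likelihood>")

match = re.search(r"<stream_only_likelihood>(.*?)</stream_only_likelihood>", line)
values = [parse_msvc_float(tok) for tok in match.group(1).split()]

# Positions of values that are NaN or infinite.
bad = [i for i, v in enumerate(values) if not math.isfinite(v)]
print(values, bad)
```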
5) Message boards : News : might have found the error (Message 55773)
Posted 14 Oct 2012 by TLSI2000
Post:
A Follow-up...

After two hours of the version '3' , I have seen *zero* errors on them.

Still having a few errors on the version '1' and '2' WUs as those batches run their course.

It looks like the problem is solved.

Thx.
6) Message boards : News : another test run 'de_separation_22_3s_edge_1' (Message 55754)
Posted 13 Oct 2012 by TLSI2000
Post:
I have been seeing computational errors that occur at the end of processing on roughly one out of ten WUs

Error Examples
Milkyway@Home 1.02 MilkyWay@Home (opencl_amd_ati) de_separation_22_3s_edge_1_1350087199_193416_2 00:00:51 (00:00:01) 10/13/2012 9:41:15 AM 10/13/2012 9:43:56 AM 0.05 CPUs + 1 ATI GPU 1.96 Reported: Computation error (1,) Sagita

Milkyway@Home 1.02 MilkyWay@Home (opencl_amd_ati) ps_separation_22_3s_free_1_1350087199_232226_2 00:00:56 (00:00:02) 10/13/2012 9:30:33 AM 10/13/2012 9:32:29 AM 0.05 CPUs + 1 ATI GPU 3.57 Reported: Computation error (1,) Sagita
7) Message boards : News : NBody Update and New Runs (Message 55668)
Posted 7 Oct 2012 by TLSI2000
Post:
As we wait for the new N-body tasks, does the sub-project have an intended restart date?
8) Message boards : News : New NBody test searches (Message 55327)
Posted 10 Aug 2012 by TLSI2000
Post:
When the outstanding N-Body work units finally get down to zero, will there be another series?

The count is now at 2.
9) Message boards : News : Nbody updated to 0.60 (Message 49419)
Posted 19 Jun 2011 by TLSI2000
Post:
Thank You !!!!!

I have had all of these on my WinXP64 systems fail since the last version.

They are now going through.

Thanks !

10) Message boards : News : another attempt at the max time limit elapsed fix (Message 48925)
Posted 22 May 2011 by TLSI2000
Post:
Ever since the new version of NBody (0.40), I have been getting another error:

<core_client_version>6.12.26</core_client_version>
<![CDATA[
<message>
There are no child processes to wait for. (0x80) - exit code 128 (0x80)
</message>
]]>


This means that three of my servers cannot process for MW.

These just come up for processing and exit immediately.

no app_info - just normal processing
11) Message boards : Number crunching : annoying pop-up (Message 48658)
Posted 9 May 2011 by TLSI2000
Post:
I know it is not in there at version 6.12.15 -- so I just upgraded to 6.12.26 to be able to make that thing go away.

It really was becoming annoying.

Thanks for the info on it.
12) Message boards : News : fix to the invalid workunit problem (Message 48595)
Posted 8 May 2011 by TLSI2000
Post:
3 up - three down - all immediate computation errors


Thanks for the effort.

I'm not such a big player, so I think that I will go elsewhere for a while and come back later.

Thanks.
13) Message boards : News : fix to the invalid workunit problem (Message 48586)
Posted 8 May 2011 by TLSI2000
Post:


I have three systems, all running the MT version of the NBody code on CPUs (no GPUs)

The 32-bit runs fine on the dual processor system, without an app_info file.
Both 64-bit systems with 12 cores fail immediately, both with and without an app_info file.


So the problem is not isolated to just the GPU systems.
14) Message boards : News : N-body updated to 0.40 (Message 48443)
Posted 2 May 2011 by TLSI2000
Post:
I am looking at two servers that will not calculate an n-body correctly at all.
I have reset the project on each (twice), and currently am running with no XML file for these.

They all error out immediately with an exit status 128

I have tried several versions of the XML file presented here, but to no avail.

The version I am using is the one automatically downloaded on the resets, for a 64-bit XP server --
milkyway_nbody_0.40_windows_x86_64__mt

and the two associated DLLs are there as well.
15) Message boards : News : updated the CPU applications (Message 42748)
Posted 11 Oct 2010 by TLSI2000
Post:
a few examples:

this Milkyway@home 0.40 MilkyWay@Home de_16_2s_5_19106_1286482216_0 22:17:35 (22:13:34) 10/9/2010 4:14:37 PM 10/9/2010 6:00:44 PM Reported: Computation error (0,)

this Milkyway@home 0.40 MilkyWay@Home de_13_2s_5_609171_1286475411_0 21:31:01 (21:27:20) 10/9/2010 1:03:55 PM 10/9/2010 2:00:26 PM Reported: Computation error (0,)

this Milkyway@home 0.40 MilkyWay@Home de_16_3s_5_612076_1286476176_0 20:20:48 (20:16:59) 10/9/2010 12:24:45 PM 10/9/2010 12:54:46 PM Reported: Computation error (0,)

16) Message boards : News : updated the CPU applications (Message 42708)
Posted 9 Oct 2010 by TLSI2000
Post:
What is really painful is to watch one of the new WUs process, then get to that 20-22 hour mark and end in what looks like a 'normal' end of processing, but show up as a computation error. So far, about 6 CPU-days of processing have been thrown away. I am close to aborting all MW in the queue and going elsewhere for a while.
17) Message boards : News : started a new nbody search: de_nbody_model1_1 (Message 42052)
Posted 11 Sep 2010 by TLSI2000
Post:
Most (about 70%) abort in the first second.

On my two systems, the few that don't abort immediately are taking 20-40 minutes.

And I have had a couple of 'runaways' that completed less than 1% after 20-25 minutes, with an ever-increasing estimated time of completion well over an hour.
I aborted these manually.



