Welcome to MilkyWay@home

Broken WUs

Paul D. Buck
Joined: 12 Apr 08
Posts: 621
Credit: 161,934,067
RAC: 0
Message 31118 - Posted: 17 Sep 2009, 17:20:44 UTC - in response to Message 31116.  
Last modified: 17 Sep 2009, 17:37:58 UTC

But it is supposedly fixed in the 6.10.5 preview version Crunch3r compiled from the source in the svn trunk. By the way, they are messing around a lot with the scheduler again according to the check-in notes ;)

I think I have seen a couple of notes from folks on Collatz who are having trouble with 6.10.5 too ...

Since this was my only error (as best as I can tell with the hyperactive purge), I will stick with what is working for me.

{edit}
Most of the changes but one are cosmetic or minor, as best as I can tell. The only one I think might make any difference to the bad things most of us have seen is the change to RR Sim so that it only models up to the number of CPUs ... though I have to admit I was never able to figure out the code for RR Sim, so I cannot evaluate whether the change does what it is supposed to do ...

I do know that when you turn on the dump of RR Sim's messages, the output does not come close to a model that made sense to me as a reliable use of the resources ...

At any rate, if the model done by RR Sim is more realistic it cannot help but make the Resource Scheduler and Work Fetch modules work better ... but, one more point, I am not sure if the modeling of GPUs is properly done either ...
ID: 31118 · Rating: 0
Adi
Joined: 25 Mar 09
Posts: 5
Credit: 72,241,116
RAC: 0
Message 31515 - Posted: 25 Sep 2009, 18:34:37 UTC

Please look here:

http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=58901

all are "Completed, marked as invalid"
except a few (3-4) I manually aborted

I even changed the boinc ver (6.4.5 and 6.6.40).

The comp is a dual quad, CANNOT be overclocked, and I didn't have these issues until now.

Thanks
ID: 31515 · Rating: 0
Cluster Physik
Joined: 26 Jul 08
Posts: 627
Credit: 94,940,203
RAC: 0
Message 31519 - Posted: 25 Sep 2009, 20:05:12 UTC - in response to Message 31515.  

Strange. Can you underclock it?
The WUs complete much too fast. Either you got some WUs with wrong parameters, or your CPU is doing something wrong. The vectorized 64-bit application in particular stresses a CPU quite a lot. Maybe it gets too hot and you have to clean the cooler? But that is just some random guessing.
ID: 31519 · Rating: 0
Adi
Joined: 25 Mar 09
Posts: 5
Credit: 72,241,116
RAC: 0
Message 31532 - Posted: 25 Sep 2009, 23:16:39 UTC - in response to Message 31519.  
Last modified: 25 Sep 2009, 23:19:37 UTC

Thanks for the reply.

It's a server (Fujitsu-Siemens), so I cannot change any parameters of the procs, RAM, etc. The coolers are clean (they are redundant and hot-pluggable, as are the power supplies), and the temps of proc/RAM/system are OK.

All the other projects are OK (seti, climate, einstein, aqua).

So the problems are definitely not the computer or boinc client.

And the milky WUs you saw were completed in around 45 minutes, but are reported as taking about 10 minutes.

Right now on milky I have "No new tasks", and 0 tasks to do.
In the next few days I'll try to get a few tasks and see how they are reported.
ID: 31532 · Rating: 0
verstapp
Joined: 26 Jan 09
Posts: 589
Credit: 497,834,261
RAC: 0
Message 31549 - Posted: 26 Sep 2009, 3:34:24 UTC
Last modified: 26 Sep 2009, 3:35:46 UTC

Of course a reboot can often set things right.
- Reboot? Server? Over my dead body.
Cheers,

PeterV

.
ID: 31549 · Rating: 0
kashi
Joined: 30 Dec 07
Posts: 311
Credit: 149,490,184
RAC: 0
Message 31560 - Posted: 26 Sep 2009, 6:58:17 UTC

Changed kernels recently on the computers that are giving error results? I think Ubuntu 9.04 causes problems with some BOINC projects, perhaps 2.6.18-164.el5 is also not compatible with MilkyWay.
ID: 31560 · Rating: 0
Adi
Joined: 25 Mar 09
Posts: 5
Credit: 72,241,116
RAC: 0
Message 31575 - Posted: 26 Sep 2009, 17:33:19 UTC - in response to Message 31560.  
Last modified: 26 Sep 2009, 17:39:04 UTC

No reboot (it's not winblow$), no kernel problem (some good results were done with the same kernel).

I suspect the boinc client, because with 6.6.36 I get:

Sat 26 Sep 2009 08:16:35 PM EEST	Milkyway@home	Message from server: (won't finish in time) BOINC runs 98.5% of time, computation enabled 100.0% of that


And I know this is not true; it's a problem with the boinc scheduler in 6.6.36.
6.4.5 and 6.6.40 get WUs, but show only 1/8 of the run time for milky WUs (maybe because of the 8 processors?).

Well, I'll leave it as it is, waiting for a new boinc client and/or MW app.
I'll get from time to time some WUs, for testing.

Thanks again for replies, have a nice weekend.



LATER EDIT:

Supposition 1:
It seems the sequence 6.4.5, overwritten with 6.6.36, overwritten with 6.6.40, solved the problem.
Supposition 2:
Maybe draining the milky queue did the trick.


I'll post to confirm or refute that it's working, but, of course, even if it works, I won't know which supposition is correct :)
ID: 31575 · Rating: 0
Adi
Joined: 25 Mar 09
Posts: 5
Credit: 72,241,116
RAC: 0
Message 31638 - Posted: 27 Sep 2009, 19:59:30 UTC - in response to Message 31575.  

Problem persists :(

Right now a task finished in 00:49:57 (as reported by boinc 6.6.40,
and close to the average for the other tasks, 00:50:17),
which is 2997 seconds of computation, but on the milkyway site it says

115351319 	112943238  	26 Sep 2009 17:31:40 UTC  	27 Sep 2009 19:18:15 UTC  	Completed, marked as invalid  	561.56  	3.23  	0.00


I'll probably reset, or even detach/reattach, in the next few days.
(All the other projects are fine, and seti also runs the optimized app.)

(The errors are from trying SSE4.2, but the proc only supports SSE4.1.)
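
For anyone who wants to check the numbers, here is a minimal sketch of the arithmetic. It assumes the 561.56 in the site row is a time in seconds (the column headers are not shown above), and the helper name `hms_to_seconds` is made up for this example. It converts the client's 00:49:57 to seconds and compares it with the site figure and with the earlier "1/8 of the time" guess for an 8-core box.

```python
def hms_to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string, as shown by the BOINC client, to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Values taken from the post above.
client_seconds = hms_to_seconds("00:49:57")
site_value = 561.56  # ASSUMPTION: this column is a time in seconds
cores = 8            # dual quad-core server

ratio = client_seconds / site_value   # how much longer the client figure is
per_core = client_seconds / cores     # what "1/8 of the time" would look like

print(f"client run time : {client_seconds} s")
print(f"site figure     : {site_value} s")
print(f"client / site   : {ratio:.2f}")
print(f"client / {cores} cores: {per_core:.1f} s")
```

Under that assumption the site figure is roughly a fifth of the client's run time, so it does not line up exactly with a plain divide-by-8 either.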
ID: 31638 · Rating: 0
Adi
Joined: 25 Mar 09
Posts: 5
Credit: 72,241,116
RAC: 0
Message 31876 - Posted: 2 Oct 2009, 18:34:02 UTC - in response to Message 31638.  

Problem solved after resetting the project.

Thanks, everybody, for the help.
ID: 31876 · Rating: 0

©2024 Astroinformatics Group