Welcome to MilkyWay@home

Posts by ChertseyAl

41) Message boards : Number crunching : 20 workunit limit (Message 2372)
Posted 18 Mar 2008 by Profile ChertseyAl
Post:
5h return time? What's that, a PII@300 MHz? Or do you mean the time that passes before your BOINC client contacts the server?


Sorry, must have explained this on another thread that you missed.

1.8G machine, WU time 15m. 20WU max. Last WU in will take a minimum of 5h to get crunched and reported. FIFO and all that.

So every WU that machine crunches will take 5h from arrival to reporting. Agreed?
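
The 5-hour figure falls straight out of the queue arithmetic. A minimal Python sketch, assuming one WU is crunched at a time in strict FIFO order and that the whole batch of 20 arrives together; the function name and its parameters are invented here for illustration and are not part of BOINC:

```python
def fifo_turnaround_minutes(queue_length, minutes_per_wu):
    """Minutes from arrival to completion for each WU in a FIFO queue.

    Assumption: a single WU runs at a time and the whole batch arrives at once.
    """
    return [(position + 1) * minutes_per_wu for position in range(queue_length)]


times = fifo_turnaround_minutes(queue_length=20, minutes_per_wu=15)
print(times[0])        # first WU in the queue: 15 minutes
print(times[-1] / 60)  # last WU in the queue: 5.0 hours
```

With the cache kept topped up at 20, each new WU joins the back of the queue, which is why the same 5 hours applies to every WU and not only to the last one of a batch.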

Solution: Set a buffer of 0.001 hours, so that only one WU is ever active. But that throttles the faster hosts. Also, given the 'reliability' of MW, that ain't a good strategy ;)

Now, a BOINC client that can limit the number of WUs per project (say 2 for MW), give them priority over all else, but respect the deadlines of backup projects (and we need BU projects when MW is still flaky) to avoid starving other projects would be good.

I'm really thinking that genetic models don't suit BOINC. BOINC is great for boring, grinding, tedious number-crunching, but evolving models won't work.

BTW, this has nothing to do with MW. MW is one of my fave projects. But, as such, I'd rather see it run for the science than for the user.

Al.
42) Message boards : Number crunching : 20 workunit limit (Message 2368)
Posted 18 Mar 2008 by Profile ChertseyAl
Post:
It does. So-called "panic mode". *grin*
But it would be bad if BOINC would go into constant panic mode and only crunch Milkyways and you don't get your other WUs finished.


Exactly the point I raised yesterday.

The current computing model doesn't really suit BOINC: it relies on rapid reporting.

Options:

1) Abandon BOINC and go stand-alone.

2) As 1 above, but farm out 'child' genetic threads to BOINC.

3) Make WUs a mix of parallel genetic seeds, increasing crunching time, but feeding more improved start points back into the matrix.

I'm actually thinking that option 1 might be best. My slow host is turning over WUs in 5 hours. This is too slow. I will probably set this host to NNT as I suspect more 'old' science is just that ... OLD.

Al.
43) Message boards : Number crunching : More Work !!! Please :) (Message 2338)
Posted 17 Mar 2008 by Profile ChertseyAl
Post:
Got 20 on my lappy as well but not on the other boxes. ;-(


I seem to have got 10 on my slowest host, nothing on the faster ones.

Never mind, by tomorrow morning I may have processed a load on the other machines.

Time to crunch some Sleep@Home ...

Al.
44) Message boards : Number crunching : 20 workunit limit (Message 2331)
Posted 17 Mar 2008 by Profile ChertseyAl
Post:
Replying to myself ... Well, at least I listen to myself sometimes ;) ...

The remaining WUs are being cleared at about 132 WU/hour.

I usually clear about 15 WU/hour.

So we've either got 9 active 'consumers' or a load of totally irrelevant results waiting to be returned from hosts bunged-up with other work.
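
A rough check of that estimate, assuming every active host clears work at about the 15 WU/hour quoted above (almost certainly not true of the real host mix); the function name is made up for illustration:

```python
import math


def estimated_active_hosts(server_rate_per_hour, host_rate_per_hour):
    """How many hosts of the given speed would explain the server's clearing rate."""
    return server_rate_per_hour / host_rate_per_hour


ratio = estimated_active_hosts(132, 15)
print(round(ratio, 1))   # 8.8
print(math.ceil(ratio))  # 9
```

The same rate could equally come from many more hosts each returning the odd stale result, which is the second possibility raised above.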

Let's abort the old WUs from the server and make the science count.

I no longer run a number of projects because they let stale work trickle in way past its sell-by date, only to be discarded.

Sorry, I went all *serious* for a moment there ;)

Al.
45) Message boards : Number crunching : More Work !!! Please :) (Message 2329)
Posted 17 Mar 2008 by Profile ChertseyAl
Post:
I heard an unfounded rumor that I was somehow involved.


Yeah, and they even caught you on camera :P

see the pic:




No no no, that's JRenkar, the HERETIC that started this FIASCO.

BURN THE WITCH!!!!!

Al.
46) Message boards : Number crunching : 20 workunit limit (Message 2327)
Posted 17 Mar 2008 by Profile ChertseyAl
Post:
The server has been out of work for nearly 24 hours now, and there are still over 5000 WUs out in the wind. Seems to me that if new work generation depends on the old results, the deadline should be shortened.


And the rate that these are being cleared is very slow.

Could we use server-side aborts on 'old' WUs? Yes, it's annoying when your cache/stash gets aborted, but I can't see any point in wasting cycles on useless work.

If this isn't feasible, maybe BOINC just isn't the right platform? Maybe the work driving the next generation work should be non-BOINC and other work farmed out to a slower BOINC network.

Maybe try 12 hours and see how that goes, at least if the WU ends up on a duffer, it would time out and possibly get sent to a faster, more reliable host.


Tight deadlines are a nightmare. MW will end up in High Pri permanently and other projects will be starved.

Maybe if each WU carried work from a number of different genetic seeds and ran for a while longer?

When MW is running, it's fine for me. But my slow host may take 5 hours to turn around the last WU in a batch :(

I can't help but feel that server-side aborts are the way to go, but machines that are not on a permanent net connection are still going to waste work :(

Al.
47) Message boards : Number crunching : Smooth sailing-Quiet board (Message 2323)
Posted 17 Mar 2008 by Profile ChertseyAl
Post:
.....Travis....Dave....wake up....


Oooh, you are in SO much trouble when they turn up ;)

Al.

48) Message boards : Number crunching : GECCO2008 paper accepted (Message 2317)
Posted 17 Mar 2008 by Profile ChertseyAl
Post:
Nice paper Travis!

Question, the below paragraph seems to indicate the results returned quickest update the database and generate a new line to compute. If this is so, how useful are the slower computers as it seems their results as you said will be outdated when received. I would think a minimum "work unit crunch time" suggestion pointing out the "real time" model updating as a qualifier for computers for this project so people with slow units do not waste their time and your server space with outdated results would be needed.

In the first phase of the algorithm (while the population size is less than the maximum population size) the server is being initialized and a random population is generated. When a request work message is processed, a random parameter set is generated, and when a report work message is processed, the population is updated with the parameters and the fitness of that evaluation. When enough report work messages have been processed, the algorithm proceeds into the second phase which performs the actual genetic search. In the second phase, report work will insert the new parameters and their fitness into the population but only if they are better than the worst current member and remove the worst member if required to keep the population size the same. Otherwise the parameters and the result are discarded. Processing a request work message will either return a mutation or reproduction (crossover) from the population.
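
The two phases described in the quoted paragraph can be sketched as a toy Python class. This is only an illustration under stated assumptions, not the project's server code: higher fitness is taken to mean better, parameters are drawn from [0, 1], mutation and crossover are chosen 50/50, and the class name, method names and operators are all hypothetical.

```python
import random


class GeneticSearchServer:
    """Toy model of the two-phase scheme (assumption: higher fitness is better)."""

    def __init__(self, max_population, n_params, rng=None):
        # max_population should be at least 2 so crossover has two parents.
        self.max_population = max_population
        self.n_params = n_params
        self.rng = rng or random.Random()
        self.population = []  # list of (fitness, params) tuples

    def in_first_phase(self):
        return len(self.population) < self.max_population

    def request_work(self):
        if self.in_first_phase():
            # Phase 1: hand out a random parameter set.
            return [self.rng.uniform(0.0, 1.0) for _ in range(self.n_params)]
        if self.rng.random() < 0.5:
            # Phase 2, mutation: perturb one parameter of a random member.
            _, parent = self.rng.choice(self.population)
            child = list(parent)
            index = self.rng.randrange(self.n_params)
            child[index] += self.rng.gauss(0.0, 0.1)
            return child
        # Phase 2, crossover: take each parameter from one of two members.
        (_, a), (_, b) = self.rng.sample(self.population, 2)
        return [self.rng.choice(pair) for pair in zip(a, b)]

    def report_work(self, params, fitness):
        """Return True if the result was kept, False if it was discarded."""
        if self.in_first_phase():
            self.population.append((fitness, params))
            return True
        worst = min(self.population, key=lambda member: member[0])
        if fitness > worst[0]:
            # Swap out the worst member so the population size stays fixed.
            self.population.remove(worst)
            self.population.append((fitness, params))
            return True
        return False


server = GeneticSearchServer(max_population=3, n_params=2, rng=random.Random(0))
for fitness in (1.0, 2.0, 3.0):
    server.report_work(server.request_work(), fitness)

print(server.report_work([0.5, 0.5], 0.5))  # False: no better than the worst member
print(server.report_work([0.5, 0.5], 2.5))  # True: replaces the member with fitness 1.0
```

In this toy model a late result is never rejected for being late, only for failing to beat the current worst member, which gets harder as the population improves.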


As Travis explained in this thread?

Al.
49) Message boards : Number crunching : Smooth sailing-Quiet board (Message 2308)
Posted 17 Mar 2008 by Profile ChertseyAl
Post:


I hope it works. :-P


No good :(

We're well and truly Renkar'd ;)

Al.
50) Message boards : Number crunching : More Work !!! Please :) (Message 2300)
Posted 16 Mar 2008 by Profile ChertseyAl
Post:
Well since I jinxed us with that other post maybe if I say need more work it will work?


Nope. No Cute Kitten picture. Nice try, but no cigar.

Al.
51) Message boards : Number crunching : Smooth sailing-Quiet board (Message 2295)
Posted 16 Mar 2008 by Profile ChertseyAl
Post:
It's called the 'Post of Death' server subroutine..... :(


Don't worry - I've figured it out.

If Cori posts *AND* includes a picture of a cute kitten, we'll be OK. Seems to be some kind of 'dark force' that operates invisibly. I've not had time to extrapolate this from forum postings across all 10 dimensions though. I got distracted by a cute kitten in the garden.

Maybe Travis can factor CK (Cute Kitten) into the incomprehensible technobabble that passes for a GECCO paper.

:)

Al.
52) Message boards : Number crunching : Smooth sailing-Quiet board (Message 2290)
Posted 16 Mar 2008 by Profile ChertseyAl
Post:
Seems to be common in alpha-beta projects....guess nobody has anything to say then and is a compliment to the administrators :D

Btw-nice to see!


Yup - Everything running very nicely client-side. Plenty of work, no more freezing WUs.

The server even stood up to a post from Cori yesterday ;)

Al.
53) Message boards : Number crunching : More Work !!! Please :) (Message 2260)
Posted 14 Mar 2008 by Profile ChertseyAl
Post:
*Grin*

Cori has posted. Expect long server outages ;)

Al.
54) Message boards : Number crunching : How about more than 20 workunits at a time. (Message 2163)
Posted 11 Mar 2008 by Profile ChertseyAl
Post:
Yes, would be a good workaround for the current situation! ;-)


Alternatives:

1) Make the work units bigger, like they used to be.

2) Don't let Dave out of the lab. Push food under the door if he gets hungry (pizza is nice and thin).

3) Every time the server fails, each member of the IT department has to eat a MilkyWay bar. For those that are unfamiliar with this 'treat', it's like a mix of clay and window putty, fluffed up to make it half the weight it should be, and all covered in sickly sweet chocolate. That should focus their minds.

Al.
55) Message boards : Number crunching : Server Outages (Message 2150)
Posted 10 Mar 2008 by Profile ChertseyAl
Post:
this really needs looked at, and fixed


Yeah, we need to get rid of the FreeBSD as server OS ASAP (blame the labstaff for that one).. hopefully after spring break we'll get that all solved...

BTW... I'm getting mad without my daily dose of Milkyway ;)



Alternative: Get Dave to live in the lab, do reboots etc.

I have no idea who Dave is, but we need this guy within easy reach of the reboot button ;)

I have to say, this is the most exciting project I've ever taken part in. Random server access, Krazy Kredit, good science, and a real community spirit. Long may it last :)

Al.
56) Message boards : Number crunching : Server error: can't attach shared memory (Message 2122)
Posted 8 Mar 2008 by Profile ChertseyAl
Post:
I'm sure you are aware of this, but I'll post it anyway:

08/03/08 17:53:30|Milkyway@home|Message from server: Server error: can't attach shared memory

Same behaviour on 3 hosts. All pending uploads now cleared, it's just the reporting that fails.

Cheers,

Al.
57) Message boards : Number crunching : More Work !!! Please :) (Message 2046)
Posted 7 Mar 2008 by Profile ChertseyAl
Post:


it should be generating more work, you guys getting any yet?


Yes! Got some for 2 hosts. The second host didn't get the full amount requested - I guess I drained you dry ;)

Al.
58) Message boards : Number crunching : application v1.21/v1.22 errors/memory leaks/crashes here (Message 2024)
Posted 7 Mar 2008 by Profile ChertseyAl
Post:
1.21 runs very slightly slower than 1.19 on my P4 2.4 XP32, and very slightly faster on my Celeron 2.93 XP32 - Not much in it really.

Progress bar is still running slow on both (about a third slower than it should be).

No problems so far :)

Al.
59) Message boards : Number crunching : credit issues (Message 1975)
Posted 6 Mar 2008 by Profile ChertseyAl
Post:
Let us know if 1.19 is running that much faster on the windows environment -- if so we might have to reduce credit awarded to keep it in line with other projects.


1.19 is running slightly slower (about 5%) than 1.18, 32bit XP. i.e. Still more than twice as fast as before.

So, credit is still way too high.

But rather than reduce the credit (which is OK), could we just have longer WUs? About 30 minutes to an hour would be nice :)

[Oh, progress bar is running at 10% of actual progress (not a big deal)]

Al.


60) Message boards : Number crunching : credit issues (Message 1907)
Posted 5 Mar 2008 by Profile ChertseyAl
Post:
Let us know if 1.19 is running that much faster on the windows environment -- if so we might have to reduce credit awarded to keep it in line with other projects.


What's happening with 1.18? Might be nice if you replied to my post, it would give some idea if you've actually done anything ;)

Al.



©2024 Astroinformatics Group