Welcome to MilkyWay@home

No work being D/Led & no warning messages

Profile Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Send message
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 9579 - Posted: 2 Feb 2009, 22:55:02 UTC - in response to Message 9576.  

Travis,

Logan is right. The real problems for those of us with fast machines are twofold.

1. The estimated duration for a WU, as published by the project, is way too long (e.g. the older WUs show up at around 22 hours after a reset on my machine). After several dozen WUs have completed, the DCF has been reduced to the correct amount, but it is considered too low for BOINC to allow more work.

2. The new set of WUs started last week (I38 & I39) had estimated durations out of line with the other WUs. They took about 25% longer to crunch, but the published duration was about 2x.

People weren't seeing problem #1 because we were just above the threshold of the problem until the new WUs drove our DCFs down even lower.

My suggestion is that the published duration for each WU be cut by a factor of 10. That way the fast machines will still have a reasonable DCF, and older, slower machines can have DCFs greater than 1.0. Also, try to make sure that each new set of WUs has duration estimates in line with other outstanding sets.

Glenn


I'll update the code that's calculating the flops tonight, hopefully it should smooth things out.
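
For readers following along, the relationship Glenn describes can be sketched roughly as below. This is a simplified model and not the BOINC client's actual code: it assumes the uncorrected estimate is the published fpops estimate divided by the host's speed, and that the DCF (duration correction factor) is the multiplier that brings that estimate in line with real runtimes. The function names and every number are hypothetical and chosen only for illustration.

```python
def uncorrected_estimate_s(fpops_est: float, host_flops: float) -> float:
    """Runtime estimate in seconds before any correction: published work / host speed."""
    return fpops_est / host_flops


def dcf_to_match(actual_s: float, fpops_est: float, host_flops: float) -> float:
    """DCF at which the corrected estimate (uncorrected * DCF) equals the real runtime."""
    return actual_s / uncorrected_estimate_s(fpops_est, host_flops)


# Illustrative values only; none of these come from the project itself.
host_flops = 2.0e9                      # assumed host speed: 2 GFLOPS
fpops_est = 22 * 3600 * host_flops      # a WU that shows up as ~22 hours at DCF 1.0
actual_s = 7 * 60                       # assume it really finishes in ~7 minutes

dcf_now = dcf_to_match(actual_s, fpops_est, host_flops)
dcf_cut = dcf_to_match(actual_s, fpops_est / 10, host_flops)

print(f"DCF needed with the published estimate: {dcf_now:.4f}")
print(f"DCF needed with the estimate cut by 10: {dcf_cut:.4f}")
```

With these made-up numbers the fast host needs a DCF of about 0.005, and cutting the published estimate by a factor of 10 raises that to about 0.05, which is the effect the suggestion above is after.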

Profile Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Send message
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 9581 - Posted: 2 Feb 2009, 23:29:03 UTC - in response to Message 9579.  

I just updated the fpops estimate. Newly generated workunits should reflect the different estimate. Let me know how it works out.
Profile Labbie
Avatar

Send message
Joined: 29 Aug 07
Posts: 327
Credit: 116,463,193
RAC: 0
Message 9582 - Posted: 2 Feb 2009, 23:57:18 UTC - in response to Message 9581.  

I just updated the fpops estimate. Newly generated workunits should reflect the different estimate. Let me know how it works out.


Yep, I've got some that are estimating a whole 3 seconds. Haven't gotten to them in the queue yet tho'.


Calm Chaos Forum...Join Calm Chaos Now
Profile Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Send message
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 9583 - Posted: 3 Feb 2009, 0:06:22 UTC - in response to Message 9582.  

I just updated the fpops estimate. Newly generated workunits should reflect the different estimate. Let me know how it works out.


Yep, I've got some that are estimating a whole 3 seconds. Haven't gotten to them in the queue yet tho'.


Might take boinc a little bit to adjust :)
Profile Labbie
Avatar

Send message
Joined: 29 Aug 07
Posts: 327
Credit: 116,463,193
RAC: 0
Message 9584 - Posted: 3 Feb 2009, 0:12:09 UTC - in response to Message 9583.  

I just updated the fpops estimate. Newly generated workunits should reflect the different estimate. Let me know how it works out.


Yep, I've got some that are estimating a whole 3 seconds. Haven't gotten to them in the queue yet tho'.


Might take boinc a little bit to adjust :)



Oh yeah, I'm used to that. ;)


Calm Chaos Forum...Join Calm Chaos Now
Profile Cori
Avatar

Send message
Joined: 27 Aug 07
Posts: 647
Credit: 27,592,547
RAC: 0
Message 9585 - Posted: 3 Feb 2009, 0:16:24 UTC - in response to Message 9581.  
Last modified: 3 Feb 2009, 0:17:54 UTC

I just updated the fpops estimate. Newly generated workunits should reflect the different estimate. Let me know how it works out.

It seems to have worked, at least on my C2D lappy so far: the estimated time to completion was 4 sec at first, but it has already gone up to 6:57 min after the first WUs were crunched.
But my DCF on that host is still 1.014688 (before it was 0.01) - so that seems fine!
Didn't check my other boxes yet, because it's 1:15am here (German time) and I have to crunch my zzz unit now. *grin*
Lovely greetings, Cori
Profile Kevint
Avatar

Send message
Joined: 22 Nov 07
Posts: 285
Credit: 1,076,786,368
RAC: 0
Message 9586 - Posted: 3 Feb 2009, 0:38:43 UTC



Yep - looks like it has been fixed.


Checked a couple of boxes and they are requesting new work.


Thank YOU LOGAN (my awesome team mate) and Gleng - for figuring out what the problem was.
.
Cluster Physik

Send message
Joined: 26 Jul 08
Posts: 627
Credit: 94,940,203
RAC: 0
Message 9606 - Posted: 3 Feb 2009, 14:09:31 UTC - in response to Message 9581.  
Last modified: 3 Feb 2009, 14:17:05 UTC

I just updated the fpops estimate. Newly generated workunits should reflect the different estimate. Let me know how it works out.

Just looked at the WUs and saw you have set it to the value I sent you. But to be correct, the single stream WUs are worth ~1.25 TFlop, the ones with two streams more like ~1.9 TFlop. If you look at the runtimes for these WUs, you will see more than a 50% increase for the WUs with two streams. At least if you look at the runtimes of some (the fastest ones) optimized applications ;)

And btw., SETI is using single precision calculations. So one flop here (double precision) should be worth more than over there. Do you know the official point of view (David Anderson) for comparing single precision flops (SETI) with double precision flops (MW)?
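
As a quick check of the figures quoted above, here is the arithmetic behind the "more than 50%" remark. The two TFlop values are the ones given in the post; the variable names are made up for this sketch.

```python
single_stream_tflop = 1.25  # figure quoted in the post
two_stream_tflop = 1.9      # figure quoted in the post

# Relative increase in floating point work from one stream to two streams.
increase = two_stream_tflop / single_stream_tflop - 1.0

print(f"two-stream WUs need about {increase:.0%} more floating point work")
```

If runtime scales roughly with the amount of floating point work, that ratio is what you would expect to see in the runtimes as well.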
Profile Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Send message
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 9629 - Posted: 3 Feb 2009, 22:04:43 UTC - in response to Message 9606.  

I just updated the fpops estimate. Newly generated workunits should reflect the different estimate. Let me know how it works out.

Just looked at the WUs and saw you have set it to the value I sent you. But to be correct, the single stream WUs are worth ~1.25 TFlop, the ones with two streams more like ~1.9 TFlop. If you look at the runtimes for these WUs, you will see more than a 50% increase for the WUs with two streams. At least if you look at the runtimes of some (the fastest ones) optimized applications ;)

And btw., SETI is using single precision calculations. So one flop here (double precision) should be worth more than over there. Do you know the official point of view (David Anderson) for comparing single precision flops (SETI) with double precision flops (MW)?


Actually, what I did was calculate the number of times the calculate_probabilities function was invoked (since this is doing the vast majority of the work), then calculated the number of fpops in there.

I counted multiplies as 1, divides as 5 and sin/cos/pow/exp/sqrt as 10. Pretty gross simplification but it looks like it's a decent estimate -- if anyone has better values for these please let me know :)

I don't know what Dave A's stance on single vs double precision flops is. It seems like the estimation only has to be a rather loose estimate (or very loose considering our previous ones).
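
The counting scheme described above could look something like the sketch below. Only the weights (multiply = 1, divide = 5, sin/cos/pow/exp/sqrt = 10) come from the post; the operation counts, the number of calls, and the function names are hypothetical, and this is not the project's actual estimation code.

```python
# Weights from the post: multiply = 1, divide = 5, sin/cos/pow/exp/sqrt = 10.
# The post gives no weight for add/subtract, so they are left out here.
OP_WEIGHTS = {
    "mul": 1,
    "div": 5,
    "sin": 10,
    "cos": 10,
    "pow": 10,
    "exp": 10,
    "sqrt": 10,
}


def fpops_per_call(op_counts: dict[str, int]) -> int:
    """Weighted operation count for a single call of the hot function."""
    return sum(OP_WEIGHTS[op] * n for op, n in op_counts.items())


def workunit_fpops(op_counts: dict[str, int], calls: int) -> int:
    """Estimate for a whole WU: per-call cost times the number of calls."""
    return fpops_per_call(op_counts) * calls


# Hypothetical operation counts for one call and a hypothetical call count.
counts = {"mul": 40, "div": 4, "exp": 2, "sqrt": 1}
calls = 1_000_000

print(workunit_fpops(counts, calls))
```

Keeping the weights in one table makes it easy to swap in better values if someone measures the real cost of the math library calls.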
Cluster Physik

Send message
Joined: 26 Jul 08
Posts: 627
Credit: 94,940,203
RAC: 0
Message 9633 - Posted: 3 Feb 2009, 22:29:42 UTC - in response to Message 9629.  
Last modified: 3 Feb 2009, 22:45:41 UTC

and sin/cos/pow/exp/sqrt as 10. Pretty gross simplification but it looks like it's a decent estimate -- if anyone has better values for these please let me know :)

These instructions are far more expensive, especially when no native CPU instruction exists for them and they are implemented in the math library of your compiler. I've directly counted the floating point instructions needed in the assembler of the GPU code. These figures should be quite exact. I sent them to you some hours ago, together with two formulas for deducing the count for arbitrary WU parameters ;)

It seems like the estimation only has to be a rather loose estimate (or very loose considering our previous ones).

My comment was aimed more at calculating the credits with the same scheme SETI uses. For the runtime estimate you are right, it only has to be a loose fit.
Profile speedimic
Avatar

Send message
Joined: 22 Feb 08
Posts: 260
Credit: 57,387,048
RAC: 0
Message 10580 - Posted: 14 Feb 2009, 0:32:43 UTC - in response to Message 9509.  

I am seeing this wonky behavior on at least one of my rigs too........

Runs the cache down until the very last WU has been crunched, and then finally requests more work......


uuuhhhh... it's back!
one of my rigs stopped requesting work...
mic.


msattler

Send message
Joined: 15 Jul 08
Posts: 288
Credit: 5,474,012
RAC: 0
Message 10582 - Posted: 14 Feb 2009, 0:36:22 UTC - in response to Message 10580.  

I am seeing this wonky behavior on at least one of my rigs too........

Runs the cache down until the very last WU has been crunched, and then finally requests more work......


uuuhhhh... it's back!
one of my rigs stopped requesting work...

Hit the update button.....I just got work on several of my rigs..
I am the Kittyman.

Please visit and give a Click for Seti City.




Profile speedimic
Avatar

Send message
Joined: 22 Feb 08
Posts: 260
Credit: 57,387,048
RAC: 0
Message 10583 - Posted: 14 Feb 2009, 0:39:49 UTC - in response to Message 10582.  

LOL - as if boinc cares about users hitting buttons...

Sat 14 Feb 2009 01:37:01 AM CET|Milkyway@home|Reporting 2 completed tasks, not requesting new tasks

mic.


msattler

Send message
Joined: 15 Jul 08
Posts: 288
Credit: 5,474,012
RAC: 0
Message 10586 - Posted: 14 Feb 2009, 0:43:30 UTC - in response to Message 10583.  

LOL - as if boinc cares about users hitting buttons...

Sat 14 Feb 2009 01:37:01 AM CET|Milkyway@home|Reporting 2 completed tasks, not requesting new tasks

I would guess it's a workshare issue then....if you are running more than one project on the rig.
I am the Kittyman.

Please visit and give a Click for Seti City.




Profile speedimic
Avatar

Send message
Joined: 22 Feb 08
Posts: 260
Credit: 57,387,048
RAC: 0
Message 10590 - Posted: 14 Feb 2009, 0:52:40 UTC - in response to Message 10586.  

LOL - as if boinc cares about users hitting buttons...

Sat 14 Feb 2009 01:37:01 AM CET|Milkyway@home|Reporting 2 completed tasks, not requesting new tasks

I would guess it's a workshare issue then....if you are running more than one project on the rig.


100% MW this one.

might need a reset...

well whatever, I'll go to bed now and look if boinc likes some new work tomorrow...
mic.


msattler

Send message
Joined: 15 Jul 08
Posts: 288
Credit: 5,474,012
RAC: 0
Message 10592 - Posted: 14 Feb 2009, 0:55:49 UTC - in response to Message 10590.  

LOL - as if boinc cares about users hitting buttons...

Sat 14 Feb 2009 01:37:01 AM CET|Milkyway@home|Reporting 2 completed tasks, not requesting new tasks

I would guess it's a workshare issue then....if you are running more than one project on the rig.


100% MW this one.

might need a reset...

well whatever, I'll go to bed now and look if boinc likes some new work tomorrow...

Well, if you have no work cached, you can do a reset without losing anything, and it won't delete your opti app, if you are using one....

So try THAT button....LOL.
I am the Kittyman.

Please visit and give a Click for Seti City.




Profile banditwolf
Avatar

Send message
Joined: 12 Nov 07
Posts: 2425
Credit: 524,164
RAC: 0
Message 10594 - Posted: 14 Feb 2009, 0:58:58 UTC

Got no work, hit 'update' and got 12. So they might be sparse yet.
Doesn't expecting the unexpected make the unexpected the expected?
If it makes sense, DON'T do it.
Profile speedimic
Avatar

Send message
Joined: 22 Feb 08
Posts: 260
Credit: 57,387,048
RAC: 0
Message 10643 - Posted: 14 Feb 2009, 9:47:31 UTC - in response to Message 10592.  

Well, if you have no work cached, you can do a reset without losing anything, and it won't delete your opti app, if you are using one....

So try THAT button....LOL.


I demand a "Go to sleep and wait"-button!
Sleeping and doing nothing fixed the problem.

mic.


Profile Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Send message
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 10645 - Posted: 14 Feb 2009, 9:56:18 UTC - in response to Message 10643.  

Well, if you have no work cached, you can do a reset without losing anything, and it won't delete your opti app, if you are using one....

So try THAT button....LOL.


I demand a "Go to sleep and wait"-button!
Sleeping and doing nothing fixed the problem.


Speaking of a "go to sleep and wait" button, I think I just pressed it :) See y'all tomorrow.
msattler

Send message
Joined: 15 Jul 08
Posts: 288
Credit: 5,474,012
RAC: 0
Message 10646 - Posted: 14 Feb 2009, 10:00:19 UTC - in response to Message 10645.  

Well, if you have no work cached, you can do a reset without losing anything, and it won't delete your opti app, if you are using one....

So try THAT button....LOL.


I demand a "Go to sleep and wait"-button!
Sleeping and doing nothing fixed the problem.


Speaking of a "go to sleep and wait" button, I think I just pressed it :) See y'all tomorrow.

Rust never sleeps..........
I am the Kittyman.

Please visit and give a Click for Seti City.





©2025 Astroinformatics Group