Welcome to MilkyWay@home

maximum time limit elapsed bug

Message boards : News : maximum time limit elapsed bug

S@NL - EStorm

Joined: 15 Jul 11
Posts: 14
Credit: 5,978,191
RAC: 0
Message 50324 - Posted: 21 Jul 2011, 14:24:46 UTC - in response to Message 50322.  
Last modified: 21 Jul 2011, 14:27:25 UTC

I know, Link, which is why I ran this test: to show what happens if you do not have an app_info and have not followed your solution.

I also compared the client_state values for SETI and Milky:
milky
<rsc_fpops_est> 25262095395789.531000</rsc_fpops_est>
<rsc_fpops_bound>2526209539578953.000000</rsc_fpops_bound>

seti
<rsc_fpops_est> 8707547718264.028300</rsc_fpops_est>
<rsc_fpops_bound>500000000000000000.000000</rsc_fpops_bound>

And I think that in Milky the values translate into a larger app speed estimate than the WU size.
So they have to either increase the WU size estimate (not too happy with that) or decrease the app speed estimate (which would be the same as using an app_info).
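To make the relationship concrete, here is a minimal Python sketch. It assumes the BOINC client estimates a task's runtime as rsc_fpops_est divided by the app speed estimate (FLOPS), and aborts the task with "maximum time limit elapsed" once its elapsed time exceeds rsc_fpops_bound divided by that same speed. The fpops values are the Milky ones posted above; the two app speeds are hypothetical.

```python
from typing import Tuple

# rsc_fpops_est / rsc_fpops_bound from the Milky client_state excerpt above.
MILKY_FPOPS_EST = 25262095395789.531
MILKY_FPOPS_BOUND = 2526209539578953.0


def runtime_limits(fpops_est: float, fpops_bound: float, app_flops: float) -> Tuple[float, float]:
    """Return (estimated_seconds, max_elapsed_seconds) for one task.

    Assumption: both values are the FLOP counts divided by the app speed
    estimate in FLOPS, which is how the client is believed to derive them.
    """
    if app_flops <= 0:
        raise ValueError("app_flops must be positive")
    return fpops_est / app_flops, fpops_bound / app_flops


if __name__ == "__main__":
    # Hypothetical speed estimates: a plausible one, and an inflated one
    # that is larger than the WU size (rsc_fpops_est).
    for flops in (1e11, 5e13):
        est, limit = runtime_limits(MILKY_FPOPS_EST, MILKY_FPOPS_BOUND, flops)
        print(f"speed {flops:.0e} FLOPS: estimate {est:,.1f} s, limit {limit:,.1f} s")
```

With a 100 GFLOPS estimate the limit is roughly 25,000 seconds. Once the speed estimate exceeds the WU size (5e13 FLOPS against a task of about 2.5e13 FLOPs), the estimated runtime drops below one second and the limit to about 50 seconds, so a GPU task that really needs a few minutes is killed.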
John Clark

Joined: 4 Oct 08
Posts: 1734
Credit: 64,228,409
RAC: 0
Message 50325 - Posted: 21 Jul 2011, 14:44:07 UTC

Got the 0.82 GPU client to crunch OK and validate (4 WUs in all). This was after removing the double M from the command line in the app_info file.

But it seemed to cause 2 problems: (a) the GPU load was at 98%-99% on a single WU, which slowed the graphics to an unbearable crawl; (b) the NCI CPU work I am crunching keeps on resetting, and that only happened after Milkyway was reinstated.

I am getting BM 6.12.33 to replace the 6.12.22 I currently run, to see if things improve. But I have run 6.12.22 for months without problems, so as far as I am concerned, 6.12.22 seems to be virtually bug-free and stable. That is, until the new MW GPU client stressed it.
Go away, I was asleep


Link

Joined: 19 Jul 10
Posts: 572
Credit: 18,833,316
RAC: 639
Message 50329 - Posted: 21 Jul 2011, 16:38:17 UTC - in response to Message 50324.  

seti
<rsc_fpops_est>8707547718264.028300</rsc_fpops_est>
<rsc_fpops_bound>500000000000000000.000000</rsc_fpops_bound>

Where did you get that from? All my SETI WUs have rsc_fpops_bound = 10x rsc_fpops_est, and that's what it's supposed to be. Milkyway has 100x rsc_fpops_est, which is VERY high compared to other projects: a CPU task that should finish in, for example, 10 hours would run for 1000 hours if it got stuck.
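The ratios can be checked directly from the numbers in this thread. This small Python sketch computes the bound-to-estimate ratio for both posted excerpts, plus the worst-case runtime of a stuck task, assuming the time limit scales with that ratio. The 10-hour task and the 10x and 100x ratios are Link's figures; the helper names are made up for illustration.

```python
# (rsc_fpops_est, rsc_fpops_bound) pairs as posted in message 50324.
POSTED = {
    "milky": (25262095395789.531, 2526209539578953.0),
    "seti": (8707547718264.0283, 500000000000000000.0),
}


def bound_ratio(fpops_est: float, fpops_bound: float) -> float:
    """How many times the estimated work a task may use before it is stopped."""
    return fpops_bound / fpops_est


def worst_case_hours(expected_hours: float, ratio: float) -> float:
    """Runtime a stuck task can reach if the limit scales with the bound ratio."""
    return expected_hours * ratio


if __name__ == "__main__":
    for name, (est, bound) in POSTED.items():
        print(f"{name}: bound = {bound_ratio(est, bound):,.0f} x estimate")
    for ratio in (10, 100):
        print(f"{ratio}x: a 10 h task may run {worst_case_hours(10, ratio):,.0f} h")
```

The Milky pair comes out at 100x, matching Link's figure, while the posted SETI pair is tens of thousands of times its estimate, far from the 10x he sees on his own SETI WUs.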



I know, Link, which is why I ran this test: to show what happens if you do not have an app_info and have not followed your solution.
(...)
And I think that in Milky the values translate into a larger app speed estimate than the WU size.
So they have to either increase the WU size estimate (not too happy with that) or decrease the app speed estimate (which would be the same as using an app_info).

Without an app_info, everything usually works as it should; only a very small percentage of computers have a problem.

On every machine I have seen so far, the problem was a wrong app speed estimate on "new" machines, where "new" means a new app version (hence it started for many with 0.82) or app information that was somehow reset on the server, like the one I posted in message 50228.
Beyond

Joined: 15 Jul 08
Posts: 383
Credit: 729,293,740
RAC: 0
Message 50330 - Posted: 21 Jul 2011, 16:41:12 UTC - in response to Message 50325.  

But it seemed to cause 2 problems: (a) the GPU load was at 98%-99% on a single WU, which slowed the graphics to an unbearable crawl; (b) the NCI CPU work I am crunching keeps on resetting, and that only happened after Milkyway was reinstated.

Try this command in your app_info.xml:

<cmdline>--gpu-target-frequency 55 </cmdline>

Increase the number for less graphics lag; the default is 35. For the second issue, try 6.12.33, as various earlier 6.12.xx versions had inordinate task-switching problems.
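If you would rather script the change than edit the file by hand, the sketch below uses Python's standard xml.etree.ElementTree to add or replace a <cmdline> element in every <app_version> of an app_info.xml. The sample file is hypothetical and stripped down (a real app_info.xml also needs <file_info> and <file_ref> entries, and the app name and version number must match what the project sends); only the --gpu-target-frequency option comes from this post. The client reads app_info.xml at startup, so restart BOINC after changing it.

```python
import xml.etree.ElementTree as ET

# Hypothetical, stripped-down app_info.xml; app name and version are placeholders.
SAMPLE_APP_INFO = """<app_info>
  <app><name>milkyway</name></app>
  <app_version>
    <app_name>milkyway</app_name>
    <version_num>82</version_num>
  </app_version>
</app_info>"""


def set_cmdline(app_info_xml: str, cmdline: str) -> str:
    """Set <cmdline> on every <app_version>, creating the element if missing."""
    root = ET.fromstring(app_info_xml)
    for app_version in root.iter("app_version"):
        element = app_version.find("cmdline")
        if element is None:
            element = ET.SubElement(app_version, "cmdline")
        element.text = cmdline
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # 55 is the value suggested above; raise it further for less graphics lag.
    print(set_cmdline(SAMPLE_APP_INFO, "--gpu-target-frequency 55"))
```

Replacing an existing <cmdline> rather than appending a second one keeps the function safe to run repeatedly while you tune the number.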
S@NL - EStorm

Joined: 15 Jul 11
Posts: 14
Credit: 5,978,191
RAC: 0
Message 50331 - Posted: 21 Jul 2011, 17:06:26 UTC - in response to Message 50329.  

seti
<rsc_fpops_est>8707547718264.028300</rsc_fpops_est>
<rsc_fpops_bound>500000000000000000.000000</rsc_fpops_bound>

Where did you get that from? All my SETI WUs have rsc_fpops_bound = 10x rsc_fpops_est, and that's what it's supposed to be. Milkyway has 100x rsc_fpops_est, which is VERY high compared to other projects: a CPU task that should finish in, for example, 10 hours would run for 1000 hours if it got stuck.


As mentioned, it is from the client_state file on one of my machines. But it doesn't really matter. I don't know why the values differ so much; it could be because I replaced my 2x 5770 with a 6950.
When I started a week ago, I checked the properties of a running WU, and the app speed estimate was greater than the WU size estimate, hence the problem. It was fixed by creating an app_info file.
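One way to pick a saner speed estimate for an app_info file is to work backwards from how long a WU actually takes. The Python sketch below assumes that the optional <flops> element of an anonymous-platform <app_version> is what the client uses as the app speed estimate, and that the runtime estimate is rsc_fpops_est divided by it. The 120-second runtime is a hypothetical placeholder; substitute your own measured time.

```python
# rsc_fpops_est / rsc_fpops_bound from the Milky client_state excerpt in this thread.
FPOPS_EST = 25262095395789.531
FPOPS_BOUND = 2526209539578953.0


def flops_for_observed_runtime(fpops_est: float, observed_seconds: float, margin: float = 1.0) -> float:
    """Speed estimate (FLOPS) that makes fpops_est / flops equal observed_seconds * margin.

    A margin above 1.0 lowers the estimate, which lengthens both the runtime
    estimate and the time limit derived from rsc_fpops_bound.
    """
    if observed_seconds <= 0 or margin <= 0:
        raise ValueError("observed_seconds and margin must be positive")
    return fpops_est / (observed_seconds * margin)


if __name__ == "__main__":
    flops = flops_for_observed_runtime(FPOPS_EST, observed_seconds=120.0)  # hypothetical runtime
    print(f"<flops>{flops:.6e}</flops>")
    print(f"time limit: {FPOPS_BOUND / flops:,.0f} s")
```

With Milky's 100x bound, a WU that really takes 120 seconds would then get a limit of about 12,000 seconds instead of the under-a-minute limit an inflated speed estimate produces.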


©2024 Astroinformatics Group