Welcome to MilkyWay@home

Results valid at all? Memory leaks everywhere?

Message boards : Number crunching : Results valid at all? Memory leaks everywhere?

Michael H.W. Weber
Joined: 22 Jan 08
Posts: 29
Credit: 242,726,778
RAC: 0
Message 1645 - Posted: 5 Feb 2008, 12:17:42 UTC
Last modified: 5 Feb 2008, 12:54:19 UTC

It seems that MilkyWay@home has a massive memory leak problem. I know, it has been stated before in another thread on this board:

There are still some memory leaks left in the code (maybe 4kb or so), we're still trying to find these, but these should be fixed by the next application update.


Here are some examples:

AMD Athlon(tm) 64 X2 Dual Core Processor 3800+ [x86 Family 15 Model 75 Stepping 2]
<core_client_version>5.10.30</core_client_version>
<![CDATA[
<stderr_txt>


**********
**********

Memory Leaks Detected!!!

Memory Statistics:
0 bytes in 0 Free Blocks.
94 bytes in 3 Normal Blocks.
4652 bytes in 3 CRT Blocks.
0 bytes in 0 Ignore Blocks.
0 bytes in 0 Client Blocks.
Largest number used: 6054589 bytes.
Total allocations: -1393111203 bytes.

Dumping objects ->
c:\research\boinc\api\boinc_api.c(155) : {54} normal block at 0x00992960, 4 bytes long.
Data: < > 00 00 AB 00
c:\research\boinc\lib\parse.c(142) : {53} normal block at 0x009928D8, 86 bytes long.
Data: < <color_scheme>T> 0A 3C 63 6F 6C 6F 72 5F 73 63 68 65 6D 65 3E 54
{46} normal block at 0x00992860, 4 bytes long.
Data: < @ > 18 40 99 00
Object dump complete.

</stderr_txt>
]]>

Intel(R) Pentium(R) 4 CPU 3.20GHz [x86 Family 15 Model 4 Stepping 9] [fpu tsc pae nx sse sse2 mmx]
<core_client_version>5.8.16</core_client_version>
<![CDATA[
<stderr_txt>


**********
**********

Memory Leaks Detected!!!

Memory Statistics:
0 bytes in 0 Free Blocks.
94 bytes in 3 Normal Blocks.
4652 bytes in 3 CRT Blocks.
0 bytes in 0 Ignore Blocks.
0 bytes in 0 Client Blocks.
Largest number used: 6054572 bytes.
Total allocations: -1393052500 bytes.

Dumping objects ->
c:\research\boinc\api\boinc_api.c(155) : {49} normal block at 0x00742960, 4 bytes long.
Data: < > 00 00 85 00
c:\research\boinc\lib\parse.c(142) : {48} normal block at 0x007428D8, 86 bytes long.
Data: < <color_scheme>T> 0A 3C 63 6F 6C 6F 72 5F 73 63 68 65 6D 65 3E 54
{41} normal block at 0x00742860, 4 bytes long.
Data: <x>t > 78 3E 74 00
Object dump complete.


</stderr_txt>
]]>

AMD Athlon(TM) XP 1700+ [x86 Family 6 Model 6 Stepping 2]
<core_client_version>5.10.30</core_client_version>
<![CDATA[
<stderr_txt>


**********
**********

Memory Leaks Detected!!!

Memory Statistics:
0 bytes in 0 Free Blocks.
94 bytes in 3 Normal Blocks.
4652 bytes in 3 CRT Blocks.
0 bytes in 0 Ignore Blocks.
0 bytes in 0 Client Blocks.
Largest number used: 6054589 bytes.
Total allocations: -1392784057 bytes.

Dumping objects ->
c:\research\boinc\api\boinc_api.c(155) : {56} normal block at 0x009C2A58, 4 bytes long.
Data: < > 00 00 AD 00
c:\research\boinc\lib\parse.c(142) : {55} normal block at 0x009C29D0, 86 bytes long.
Data: < <color_scheme>T> 0A 3C 63 6F 6C 6F 72 5F 73 63 68 65 6D 65 3E 54
{48} normal block at 0x009C2958, 4 bytes long.
Data: < B > A0 42 9C 00
Object dump complete.


</stderr_txt>
]]>

AMD Athlon(tm) XP 2600+ [x86 Family 6 Model 8 Stepping 1]
<core_client_version>5.10.30</core_client_version>
<![CDATA[
<stderr_txt>


**********
**********

Memory Leaks Detected!!!

Memory Statistics:
0 bytes in 0 Free Blocks.
94 bytes in 3 Normal Blocks.
4652 bytes in 3 CRT Blocks.
0 bytes in 0 Ignore Blocks.
0 bytes in 0 Client Blocks.
Largest number used: 6054588 bytes.
Total allocations: -1393044166 bytes.

Dumping objects ->
c:\research\boinc\api\boinc_api.c(155) : {55} normal block at 0x009C2A58, 4 bytes long.
Data: < > 00 00 AD 00
c:\research\boinc\lib\parse.c(142) : {54} normal block at 0x009C29D0, 86 bytes long.
Data: < <color_scheme>T> 0A 3C 63 6F 6C 6F 72 5F 73 63 68 65 6D 65 3E 54
{47} normal block at 0x009C2958, 4 bytes long.
Data: <`A > 60 41 9C 00
Object dump complete.


</stderr_txt>
]]>

AMD Athlon(tm) 64 X2 Dual Core Processor 4800+ [x86 Family 15 Model 107 Stepping 1]
<core_client_version>5.10.30</core_client_version>
<![CDATA[
<stderr_txt>


**********
**********

Memory Leaks Detected!!!

Memory Statistics:
0 bytes in 0 Free Blocks.
94 bytes in 3 Normal Blocks.
4652 bytes in 3 CRT Blocks.
0 bytes in 0 Ignore Blocks.
0 bytes in 0 Client Blocks.
Largest number used: 6054589 bytes.
Total allocations: -1393144773 bytes.

Dumping objects ->
c:\research\boinc\api\boinc_api.c(155) : {54} normal block at 0x00992960, 4 bytes long.
Data: < > 00 00 AB 00
c:\research\boinc\lib\parse.c(142) : {53} normal block at 0x009928D8, 86 bytes long.
Data: < <color_scheme>T> 0A 3C 63 6F 6C 6F 72 5F 73 63 68 65 6D 65 3E 54
{46} normal block at 0x00992860, 4 bytes long.
Data: < @ > 18 40 99 00
Object dump complete.


</stderr_txt>
]]>
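
A side note on the negative "Total allocations" figures in the dumps above: a byte count cannot really be negative, so one plausible reading is that a running counter exceeded the range of a signed 32-bit integer and wrapped around. The dumps do not say how the counter is stored or printed, so the 32-bit width is an assumption here, not something confirmed by the BOINC or CRT sources. Under that assumption, a minimal Python sketch to recover the smallest matching unsigned value looks like this (the function name is made up for illustration):

```python
def reinterpret_as_unsigned32(reported: int) -> int:
    """Return the unsigned 32-bit value that shares the reported bit pattern.

    Assumption: the counter is 32 bits wide. If it wrapped more than once,
    the true total is this value plus some multiple of 2**32.
    """
    return reported & 0xFFFFFFFF


# "Total allocations" values copied from the five stderr dumps above.
reported_totals = [
    -1393111203,
    -1393052500,
    -1392784057,
    -1393044166,
    -1393144773,
]

for reported in reported_totals:
    unsigned = reinterpret_as_unsigned32(reported)
    print(f"{reported:>12} -> {unsigned} bytes (about {unsigned / 2**30:.2f} GiB)")
```

If this reading is right, the figure is the cumulative amount allocated over the whole run, not memory that was still held at exit, so it would say nothing about the size of the leak itself.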


Please also check this URL:
http://milkyway.cs.rpi.edu/milkyway/result.php?resultid=3291913

Well, as you can see, the issue is that this does not occur in just "some" cases. Above I have reported examples from 5 independent machines with different CPU architectures. All WUs I have checked display this type of error. Some people on our board have also reported that this error appears in every WU they checked, even though these WUs were all reported as "success".
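
To compare reports like the ones above without reading each dump by eye, the "Memory Statistics" lines can be pulled out with a small script. The following is only a sketch: the function names are hypothetical, the sample text is copied from the dumps in this post, and treating every non-free block as "still allocated at exit" is a simplifying assumption.

```python
import re

# Matches lines such as "94 bytes in 3 Normal Blocks."
BLOCK_LINE = re.compile(r"(\d+) bytes in (\d+) (\w+) Blocks\.")


def parse_leak_stats(stderr_text: str) -> dict:
    """Map each block type to a (bytes, block_count) tuple."""
    stats = {}
    for size, count, kind in BLOCK_LINE.findall(stderr_text):
        stats[kind] = (int(size), int(count))
    return stats


def bytes_still_allocated(stats: dict) -> int:
    """Sum the bytes of every block type except "Free" (already released)."""
    return sum(size for kind, (size, _count) in stats.items() if kind != "Free")


# Statistics section as it appears in each of the five dumps above.
SAMPLE = """\
0 bytes in 0 Free Blocks.
94 bytes in 3 Normal Blocks.
4652 bytes in 3 CRT Blocks.
0 bytes in 0 Ignore Blocks.
0 bytes in 0 Client Blocks.
"""

stats = parse_leak_stats(SAMPLE)
print(stats)
print(bytes_still_allocated(stats), "bytes still allocated at exit")
```

All five machines report the same 94 + 4652 bytes, which is in line with the "maybe 4kb or so" quoted at the top of this post and suggests a fixed amount left over per run, not a leak that grows with run time.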

Now my question: This doesn't mean that the entire work performed for this project is wasted, right? We are currently putting considerable effort into supporting this project, but if the results turned out to be invalid, we would simply withdraw until these issues have been resolved and then come back to support it later. ;-)

The workunit limit (for the time being) is 20. There's not too much I can do about this at the moment because of how things work server side. Any larger and most of the work people would be doing would be outdated and useless.

Here we have another issue. A quick fix would be highly appreciated. On modern machines, MilkyWay@home WUs are finished within a few minutes, making it impossible to crunch this project full time (24/7) if you do not have a permanent internet connection (just imagine a quad-core machine). This fact should at least be posted on the main page of the project if the number of WUs per machine cannot be increased (I also do not understand why it cannot, by the way).

On Windows machines, occasionally a workunit will break and this will display a popup message causing BOINC to freeze. We're still not quite sure why this is happening yet.

This is still happening, and has been for weeks now.

If you need assistance in nailing down the bugs, please let us know so that we can save some log data or whatever else you might need.

All the best for this project,
Michael.

P.S.: Other relevant threads in this forum concerning the memory leak:

http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=100
http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=197#1644
President of Rechenkraft.net e.V. - This planet's first and largest distributed computing organization.

Jayargh
Joined: 8 Oct 07
Posts: 289
Credit: 3,690,838
RAC: 0
Message 1646 - Posted: 5 Feb 2008, 16:31:02 UTC - in response to Message 1645.  
Last modified: 5 Feb 2008, 16:53:37 UTC

snip...
It seems that MilkyWay@home has a massive memory leak problem. I know, it has been stated before in another thread on this board:

There are still some memory leaks left in the code (maybe 4kb or so), we're still trying to find these, but these should be fixed by the next application update.

Memory leaks are not an indication of whether a result is valid or invalid; they are just a symptom of an app that is not fully optimized to run on a given OS. I have had memory leaks on other projects to no detriment, and the work otherwise ran and validated with no problems.

As for the cache limit, the reason has been stated elsewhere: the work generator rapidly changes parameters to reflect what the results are telling it is the best way to finish the search. Otherwise you could be crunching work that has no scientific use, crunching just for credit. I have yet to see a quad-core or 8-core machine burn through 20 results in 20 minutes with the current WU length we are running. The only issue would be a 16-core machine, but this project is still ALPHA. At some point the work will take longer per workunit, which should solve this problem.

Projects are not necessarily set up to give users constant work. They are set up for the science; all other considerations are and should be secondary, IMHO.

Michael H.W. Weber
Joined: 22 Jan 08
Posts: 29
Credit: 242,726,778
RAC: 0
Message 1647 - Posted: 5 Feb 2008, 18:02:01 UTC - in response to Message 1646.  
Last modified: 5 Feb 2008, 18:09:23 UTC

Memory leaks are not an indication of whether a result is valid or invalid; they are just a symptom of an app that is not fully optimized to run on a given OS. I have had memory leaks on other projects to no detriment, and the work otherwise ran and validated with no problems.

Ok. So, the scientific value is not affected by this issue. That is what I wanted to make sure - thanks for the information. ;-)

As for the cache limit... because the work generator rapidly changes parameters to reflect what the results are telling...

Ok. So, new work units are based on the results of previous ones. Then a cache limitation makes indeed sense. Thanks a lot for the input.

I have yet to see a quad-core or 8 core burn through 20 results in 20 minutes with the current wu length we are running...

I did not say that. However, my estimate is that four current WUs will be completed within approx. 20 minutes on a good quad-core system; hence, in less than two hours a non-networked machine will be out of work unless it is running a second project. And running a second project together with MilkyWay@home was (and is) problematic if the "keep application in memory" switch is not enabled (and this switch is in fact disabled after registering with MilkyWay@home even if it was enabled before).
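
The estimate above can be written out as a short calculation. This is a rough sketch that uses only the figures from this thread (a cap of 20 WUs, four cores, about 20 minutes per WU per core); the function name is made up for illustration, and it assumes that every WU takes the same time and that no new work arrives while the machine is offline.

```python
import math


def minutes_until_idle(cached_wus: int, cores: int, minutes_per_wu: float) -> float:
    """Estimate how long a cache of WUs keeps all cores of an offline machine busy.

    Each core works on one WU at a time, so the cache is consumed in
    rounds of `cores` WUs, each round lasting `minutes_per_wu`.
    """
    rounds = math.ceil(cached_wus / cores)
    return rounds * minutes_per_wu


# 20 cached WUs, quad-core machine, about 20 minutes per WU.
print(minutes_until_idle(cached_wus=20, cores=4, minutes_per_wu=20), "minutes")
```

With these inputs the cache lasts five rounds of 20 minutes, i.e. 100 minutes, which is where the "less than two hours" comes from.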

Projects are not necessarily set up to give users constant work... they are set up for the science...

Well, as a scientist, that is crystal clear to me. However, it would be nice to have such important introductory information on application requirements posted permanently on the project main page rather than somewhere in a forum, which most people (just willing to contribute their spare cycles) often don't have the time to read. Just an idea...
Of course I got the message this is an alpha project and for that it is quite good. So let's hope for a lot of cool results. ;-)

Michael.
President of Rechenkraft.net e.V. - This planet's first and largest distributed computing organization.

Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 1653 - Posted: 6 Feb 2008, 8:26:25 UTC - in response to Message 1647.  

Unfortunately, as of right now, increasing the WU-in-progress limit for machines would be a bit counterproductive. We only expect the length of workunits to increase as we add more wedges of stars and the modeling complexity increases, so hopefully this won't be a problem for very long.

Travis
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist
Joined: 30 Aug 07
Posts: 2046
Credit: 26,480
RAC: 0
Message 1654 - Posted: 6 Feb 2008, 8:27:08 UTC - in response to Message 1653.  

Unfortunately, as of right now, increasing the WU-in-progress limit for machines would be a bit counterproductive. We only expect the length of workunits to increase as we add more wedges of stars and the modeling complexity increases, so hopefully this won't be a problem for very long.


On another note, as of this week we have two undergrads working on this project, and on the top of the TODO list is getting the memory leaks fixed and a new version of the application out.

JLDun
Joined: 17 Nov 07
Posts: 77
Credit: 117,183
RAC: 0
Message 1665 - Posted: 7 Feb 2008, 9:57:04 UTC - in response to Message 1645.  

On another note, as of this week we have two undergrads working on this project, and on the top of the TODO list is getting the memory leaks fixed and a new version of the application out.

Nice. Thanks for the info.


P.S.: Other relevant threads in this forum concerning the memory leak:

Leaky Memory
http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=197#1644

Thanks for the plug. ;-p

banditwolf
Joined: 12 Nov 07
Posts: 2425
Credit: 524,164
RAC: 0
Message 1668 - Posted: 7 Feb 2008, 14:15:29 UTC

Thanks for the update.


©2024 Astroinformatics Group