Message boards :
Number crunching :
Results valid at all? Memory leaks everywhere?
Joined: 22 Jan 08 Posts: 29 Credit: 242,726,778 RAC: 0
It seems that MilkyWay@home has a massive memory leak problem. I know, it has been stated before in another thread on this board:

"There are still some memory leaks left in the code (maybe 4 kB or so), we're still trying to find these, but these should be fixed by the next application update."

Here is an example: AMD Athlon(tm) 64 X2 Dual Core Processor 3800+ [x86 Family 15 Model 75 Stepping 2]. Please also check this URL: http://milkyway.cs.rpi.edu/milkyway/result.php?resultid=3291913

Well, as you can see, the problem does not occur only in "some" cases. Above I have reported examples for 5 independent machines, all with different CPU architectures. All WUs I have checked display this type of error. Some people on our board have also reported that it persists in all of the WUs they checked, even though these WUs were all reported as "success".

Now my question: this doesn't mean that the entire work performed for this project is wasted, right? We are currently putting quite some effort into supporting this project, but if the results were invalid we would withdraw until these issues have been resolved, and then come back to support it later. ;-)

"The workunit limit (for the time being) is 20, there's not too much I can do about this at the moment, because of how things work server side. Any larger and most of the work people would be doing would be outdated and useless."

Here we have another issue; a quick fix would be highly appreciated. On modern machines MilkyWay@home WUs are finished within a few minutes, making it impossible to crunch this project full time (24/7) without a permanent internet connection (just imagine a quad-core machine). If the number of WUs per machine cannot be increased (the reason for which I also do not understand, by the way), this fact should at least be posted on the main page of the project.
"On Windows machines, occasionally a workunit will break and this will display a popup message causing BOINC to freeze. We're still not quite sure why this is happening yet."

This is still happening, for weeks now. If you need assistance in nailing down the bugs, please let us know so that we can save some log data or whatever else you might need.

All the best for this project, Michael.

P.S.: Other relevant threads in this forum concerning the memory leak:
http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=100
http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=197#1644

President of Rechenkraft.net e.V. - This planet's first and largest distributed computing organization.
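As a back-of-the-envelope sketch of the scale involved: the ~4 kB figure is the one quoted above, but the workunit runtime and core count below are illustrative assumptions, not project measurements, and the worst case assumed here (no memory reclaimed between workunits) is pessimistic, since each task normally runs in its own process.

```python
# Rough estimate of how a small per-workunit leak would add up over a day
# of 24/7 crunching, in the worst case where nothing is ever reclaimed.

LEAK_PER_WU_KB = 4    # leak size reported in the thread (approximate)
WU_RUNTIME_MIN = 5    # assumed runtime of one workunit on a modern CPU
CORES = 4             # assumed quad-core host

def leaked_kb_per_day(leak_kb=LEAK_PER_WU_KB,
                      runtime_min=WU_RUNTIME_MIN,
                      cores=CORES):
    # workunits completed per day across all cores
    wus_per_day = cores * (24 * 60) // runtime_min
    return wus_per_day * leak_kb

print(leaked_kb_per_day())  # 4608 kB, i.e. roughly 4.5 MB per day
```

Even under these pessimistic assumptions the total stays in the megabyte range, which is consistent with the view below that the leak is a nuisance rather than a validity problem.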
Joined: 8 Oct 07 Posts: 289 Credit: 3,690,838 RAC: 0
snip... "It seems that MilkyWay@home has a massive memory leak problem. I know, it has been stated before in another thread on this board:"

Memory leaks are not an indication of valid or invalid results; it is just a symptom of an app that is not fully optimized to run on a given OS. I have had memory leaks on other projects to no detriment; the apps otherwise ran and validated with no problems.

As for the cache limit, it has been stated elsewhere why this is so: the work generator rapidly changes parameters to reflect what the results are telling it is the best way to finish the search. Otherwise you would, or could, be crunching work that has no scientific use, crunching just to crunch for credit. I have yet to see a quad-core or 8-core machine burn through 20 results in 20 minutes with the current WU length we are running; the only issue would be a 16-core machine, but this project is still ALPHA. At some point the work will get longer in time spent per workunit, which should solve this problem.

Projects are not necessarily set up to give users constant work; they are set up for the science. All other considerations are and should be secondary, imho.
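The staleness argument here can be made concrete with a toy calculation. The runtime and core count below are assumptions for illustration, not project figures; the point is only that the larger the cache, the older the last result is by the time it is returned, and the further an iterative search will have moved on in the meantime.

```python
# Toy model: a host fetches its whole cache at once and crunches it
# CORES workunits at a time. The last result in the cache is returned
# only after every earlier batch has finished, so its age grows linearly
# with cache size, while the server keeps refining search parameters.

WU_RUNTIME_MIN = 5   # assumed runtime per workunit
CORES = 4            # assumed quad-core host

def max_result_age_min(cache_size, cores=CORES, runtime_min=WU_RUNTIME_MIN):
    """Minutes between issue and return of the last cached result."""
    batches = -(-cache_size // cores)   # ceiling division
    return batches * runtime_min

print(max_result_age_min(20))    # 25 min with the current 20-WU limit
print(max_result_age_min(200))   # 250 min (over 4 h) with a 10x cache
```

With the small limit, results describe a search state that is only minutes old; with a much larger cache, much of the returned work would refer to parameters the search abandoned hours earlier.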
Joined: 22 Jan 08 Posts: 29 Credit: 242,726,778 RAC: 0
"Memory leaks are not an indication of valid or invalid results; it is just a symptom of an app that is not fully optimized to run on a given OS. I have had memory leaks on other projects to no detriment; the apps otherwise ran and validated with no problems."

Ok. So the scientific value is not affected by this issue. That is what I wanted to make sure of; thanks for the information. ;-)

"As for the cache limit... because the work generator rapidly changes parameters to reflect what the results are telling..."

Ok. So new work units are based on the results of previous ones. Then a cache limitation does indeed make sense. Thanks a lot for the input.

"I have yet to see a quad-core or 8-core burn through 20 results in 20 minutes with the current WU length we are running..."

I did not say that. However, my estimate is that four current WUs will be completed within approx. 20 minutes on a good quad system, hence in less than two hours a non-networked machine will be out of work if it is not running a second project. And running a second project together with MilkyWay@home was (and is) problematic if the "keep application in memory" switch is not enabled (and this switch is in fact disabled after registering with MilkyWay@home, even if it was enabled before).

"Projects are not set up to necessarily have users get constant work... they are set up for the science..."

As a scientist, that is crystal clear to me. However, it would sometimes be nice to place such important introductory information on application requirements somewhere fixed on the project's main page, rather than at some place in a forum which most people (just willing to contribute their spare cycles) often don't have the time to read. Just an idea... Of course I got the message that this is an alpha project, and for that it is quite good. So let's hope for a lot of cool results. ;-)

Michael.

President of Rechenkraft.net e.V. - This planet's first and largest distributed computing organization.
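Michael's estimate can be spelled out as simple arithmetic. The 20-minutes-per-round figure is his rough guess from the post, not a benchmark:

```python
# With a 20-workunit limit, a quad-core host runs one WU per core,
# so it finishes the cache in WU_LIMIT / CORES rounds of concurrent work.

WU_LIMIT = 20
CORES = 4
MINUTES_PER_ROUND = 20   # four concurrent WUs finishing roughly together

rounds = WU_LIMIT // CORES                  # 5 rounds of 4 WUs each
minutes_until_dry = rounds * MINUTES_PER_ROUND
print(minutes_until_dry)                    # 100 minutes, under 2 hours
```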
Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0
Unfortunately, as of right now increasing the in-progress WU limit for machines would be a bit counterproductive. We only expect the length of workunits to increase as we add more wedges of stars and the modeling complexity grows, so hopefully this won't be a problem for very long.
Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0
On another note, as of this week we have two undergrads working on this project, and at the top of the TODO list is getting the memory leaks fixed and a new version of the application out.
Joined: 17 Nov 07 Posts: 77 Credit: 117,183 RAC: 0
"On another note, as of this week we have two undergrads working on this project, and at the top of the TODO list is getting the memory leaks fixed and a new version of the application out."

Nice. Thanks for the info.

Thanks for the plug. ;-p
Joined: 12 Nov 07 Posts: 2425 Credit: 524,164 RAC: 0
Thanks for the update.
©2024 Astroinformatics Group