Message boards : Application Code Discussion : found a small memory error in the code
Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
I don't think this is affecting anything at the moment, but it should be fixed and will be in v0.18. At the end of calculate_integral_convolved in evaluation_optimized.c:

```c
for (i = 0; i < ap->convolve; i++) {
    free(N[i]);
    free(r_point[i]);
    free(r3[i]);
}
```

needs to be changed to:

```c
for (i = 0; i < ia->r_steps; i++) {
    free(N[i]);
    free(r_point[i]);
    free(r3[i]);
}
```
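For illustration, a minimal self-contained sketch of the leak pattern being fixed here. The bound names follow the post, but the allocation shapes are assumptions, not the project's actual code: the rows are allocated per r_step, so the cleanup loop must also run over r_steps.

```c
#include <stdlib.h>

/* Sketch of the leak: rows are allocated per r_step, so the
   cleanup loop must free exactly r_steps rows.  Freeing only
   `convolve` rows (the old bound) leaks the remaining rows
   whenever r_steps > convolve. */
void sketch(int r_steps, int convolve) {
    double **N = malloc(sizeof(double*) * r_steps);
    for (int i = 0; i < r_steps; i++) {
        N[i] = malloc(sizeof(double) * convolve);
    }

    /* ... use N ... */

    for (int i = 0; i < r_steps; i++) {  /* not convolve */
        free(N[i]);
    }
    free(N);
}
```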
Joined: 22 Feb 08 Posts: 260 Credit: 57,387,048 RAC: 0 |
Can you please wait a while before releasing a new version? As long as there are no severe problems it's time to settle down a bit... ;) mic. |
Joined: 27 Sep 07 Posts: 8 Credit: 25,779,225 RAC: 0 |
How about a release number instead of a new version for minor bug fixes? Or is there a competition somewhere for the most versions in a week? ;)
Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
> Can you please wait a while before releasing a new version?

That's why I didn't update it yet :D This shouldn't have any effect other than a small memory leak.
Joined: 22 Feb 08 Posts: 260 Credit: 57,387,048 RAC: 0 |
> I don't think this is affecting anything at the moment, but it should be fixed and will be in v0.18.

Didn't find it there... you mean evaluation_optimized.c, right? mic.
Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
> I don't think this is affecting anything at the moment, but it should be fixed and will be in v0.18.

Whoops, yeah. Edited the post :)
Joined: 2 Jan 08 Posts: 23 Credit: 495,882,464 RAC: 0 |
Another memory error in evaluation_optimized.c, in free_constants:

```c
for (i = 0; i < ap->number_streams; i++) {
    free(xyz[i]);
}
```

must be replaced by:

```c
for (i = 0; i < ap->convolve; i++) {
    free(xyz[i]);
}
```

With this error, each work unit creates a memory leak of about 1.5 KB. Thierry.
Joined: 12 Apr 08 Posts: 621 Credit: 161,934,067 RAC: 0 |
Um, am I the only one that thinks that this might be one of the reasons my computers turn into turtles? |
Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
> Another memory error in evaluation_optimized.c...

Thanks, this will be in the newest version of the code.
Joined: 27 Aug 07 Posts: 915 Credit: 1,503,319 RAC: 0 |
> Um, am I the only one that thinks that this might be one of the reasons my computers turn into turtles?

Check the turtle section in the BOINC WIKI for help.

me@rescam.org
Joined: 12 Apr 08 Posts: 621 Credit: 161,934,067 RAC: 0 |
> Um, am I the only one that thinks that this might be one of the reasons my computers turn into turtles?

Just did, thank you... And strangely enough this code was there as an example...
Joined: 27 Aug 07 Posts: 915 Credit: 1,503,319 RAC: 0 |
> Um, am I the only one that thinks that this might be one of the reasons my computers turn into turtles?

And they made it a sticky.

me@rescam.org
Joined: 12 Apr 08 Posts: 621 Credit: 161,934,067 RAC: 0 |
> Um, am I the only one that thinks that this might be one of the reasons my computers turn into turtles?

Interesting... Well, I don't see the point of having 4 or 5 different documentation sites: the Trac wiki for development, the UCB wiki with no information to speak of, the UBW, the BOINC FAQ Service... lord only knows how many others. Fracturing the information like that means we have that many more chances to make mistakes, and updating the same fact in 5 places means you just spent 5 times as much effort as needed to get the information right. It was why I quit. It is obvious to me that the powers that be, Dr. Anderson and the project leaders, are quite simply not interested in a good repository of information about BOINC. John37xxxx has been trying to get some other social networking stuff up and running, and I wish him well, but that too will likely fail because there is no project support. Sadly, the projects just cannot see that when they don't support these efforts, they die, and that is one of the reasons we have only a couple hundred thousand participants worldwide...
Joined: 30 May 09 Posts: 9 Credit: 105,674 RAC: 0 |
Here is the list of the memory leaks I found in application source v0.18.

File boinc_astronomy.C, in void worker():

```c
...
free(sp);
free_search_parameters(s); // <-- missing
...
```

File evaluation_optimized.c, in void free_constants(ASTRONOMY_PARAMETERS *ap):

```c
...
//for (i = 0; i < ap->number_streams; i++) { // <-- wrong
for (i = 0; i < ap->convolve; i++) {         // correct
...
```

File evaluation_state.c:

```c
void free_state(EVALUATION_STATE* es) {
    int i;
    free(es->stream_integrals);
    for (i = 0; i < es->number_integrals; i++) {
        free_integral_area(es->integral[i]);
        free(es->integral[i]); // <-- missing
    }
    free(es->integral);
}

int read_checkpoint(EVALUATION_STATE* es) {
    ...
    if (1 > fscanf(file, "background_integral: %lf\n", &(es->background_integral))) return 1;
    free(es->stream_integrals); // <-- missing, read_double_array below allocates memory
    es->number_streams = read_double_array(file, "stream_integrals", &(es->stream_integrals));
    ...
```

File search_parameters.c:

```c
void free_search_parameters(SEARCH_PARAMETERS *parameters) {
    free(parameters->search_name);
    free(parameters->parameters);
    free(parameters->metadata);
    free(parameters); // <-- missing
}
```

There is also a 4-byte memory leak somewhere in the BOINC library, probably in diagnostics_win.cpp:

```cpp
diagnostics_threads.push_back(pThreadEntry);
```

Additionally, when a checkpoint is loaded, the memory has already been allocated on the basis of the workunit, but the loops that read the data are driven by the checkpoint content. This causes heap corruption when the checkpoint file was generated for another workunit, due to the difference in the data member count. The same could occur if the checkpoint file is corrupt for some reason.

In general, since the amount of required heap memory is known at the beginning, all the memory allocation calls could be reduced to one allocation per data type. This should affect performance, especially on multiprocessor machines.
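A hedged sketch of the guard being suggested for the checkpoint issue. The field names follow the post, but the checkpoint header line and the surrounding read_checkpoint logic are assumptions, not the project's actual code: compare the counts stored in the checkpoint against the counts the workunit allocated for, and reject the checkpoint on mismatch instead of reading past the pre-allocated arrays.

```c
#include <stdio.h>

/* Hypothetical guard: `es` was sized from the workunit, so the
   counts parsed from the checkpoint must match before any loop
   reads data into the pre-allocated arrays.  On mismatch, treat
   the checkpoint as stale or corrupt and recompute from scratch
   rather than corrupting the heap. */
int validate_checkpoint_counts(FILE *file,
                               int expected_integrals,
                               int expected_streams) {
    int ckpt_integrals, ckpt_streams;
    if (2 != fscanf(file, "number_integrals: %d number_streams: %d\n",
                    &ckpt_integrals, &ckpt_streams))
        return 1;  /* unreadable: discard checkpoint */
    if (ckpt_integrals != expected_integrals ||
        ckpt_streams != expected_streams)
        return 1;  /* generated for a different workunit: discard */
    return 0;      /* safe to read the arrays */
}
```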
Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0 |
> In general, since the amount of required heap memory is known at the beginning, all the memory allocation calls could be reduced to one allocation per data type. This should affect performance, especially on multiprocessor machines.

I doubt it would have any significant effect on the performance. Why should it?
Joined: 30 May 09 Posts: 9 Credit: 105,674 RAC: 0 |
All memory allocations are executed within seconds, so this will NOT affect the MW application performance. The compiler runtime usually requests larger memory blocks from the OS, and later malloc calls are managed internally by the runtime within these blocks. When the amount of necessary space is known (which is our case) it is better to allocate the memory at once, instead of giving the runtime a chance to fragment the memory or to allocate unusual space from the OS. At least this is the way I like to code.

The really slow process is the OS memory request on multiprocessor systems, due to the locks. Frequent OS memory allocation requests will affect the rest of the running processes and cause delays if some really memory-intensive applications are running in parallel. This is not the case for desktop machines. Actually, I am not sure how the BOINC environment runs the application. Probably on process termination all the memory leaks go away along with the process itself, so there might be no memory allocation problems at all.

A real benefit would be if arrays laid out as x0, y0, z0, x1, y1, z1... were replaced with x0, x1... y0, y1... z0, z1..., which allows effective loop vectorization. But this is another matter.
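For readers unfamiliar with the layout change being proposed, a minimal sketch of the two layouts; the type and field names are illustrative, not from the project source. The interleaved x0, y0, z0, x1, y1, z1... form is "array of structures" (AoS), while the x0, x1... y0, y1... z0, z1... form is "structure of arrays" (SoA), which lets a vectorizing compiler fetch several consecutive x values with one wide load.

```c
#include <stdlib.h>

/* Array-of-structures: x, y, z interleaved in memory
   (x0, y0, z0, x1, y1, z1, ...).  Consecutive x values are
   strided, which hampers wide vector loads. */
typedef struct { double x, y, z; } point_aos;

/* Structure-of-arrays: each coordinate contiguous
   (x0, x1, ..., y0, y1, ..., z0, z1, ...). */
typedef struct {
    double *x;
    double *y;
    double *z;
} points_soa;

points_soa soa_alloc(size_t n) {
    points_soa p;
    p.x = malloc(n * sizeof(double));
    p.y = malloc(n * sizeof(double));
    p.z = malloc(n * sizeof(double));
    return p;
}

/* A loop over SoA data is trivially vectorizable: each array is
   walked with unit stride, so x[i] and x[i+1] fit one wide load. */
double sum_sq(const points_soa *p, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += p->x[i] * p->x[i] + p->y[i] * p->y[i] + p->z[i] * p->z[i];
    return s;
}
```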
Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0 |
> A real benefit would be if arrays laid out as x0, y0, z0, x1, y1, z1... were replaced with x0, x1... y0, y1... z0, z1..., which allows effective loop vectorization. But this is another matter.

Do you mean the xyz[convolve][3] array? Loop vectorization also works with the current code, which can be observed with the optimized Linux apps (they use unmodified code, just with an autovectorizing compiler, giving a factor 2 or 3 speedup on capable CPUs).
Joined: 30 May 09 Posts: 9 Credit: 105,674 RAC: 0 |
> Do you mean the xyz[convolve][3] array?

Exactly. I compiled the v0.18 source on Windows. Adding some intrinsics in the innermost loop and hoisting 1/(q*q), sinb, sinl, cosb and cosl out of the loops, along with some other minor changes, resulted in a performance improvement of ~11% over the latest Gipsel SSE3 build for Windows. Half of the CPU time is consumed by the exp() function in the inner loop. Using the vectorized exp() from Intel's compiler didn't help, but storing the intermediate results in an array and executing the exp() in a separate loop did.

The rearrangement I made manually processes two subsequent loop iterations in parallel, using the XMM registers. When x[j] and x[j+1] are located sequentially in memory they can be loaded into one register with a single instruction, without wasting cycles and dirtying other registers for shuffling. This may be applicable to some other arrays too.

BTW, any suggestions on where and whether to publish my observations (source modifications, code and comments) are welcome. The message boards are messed up (just as I am doing with this post). I am not sure if anybody is still interested in CPU (not GPU) versions, since the task is highly amenable to parallelization and therefore GPU builds probably contribute almost 100% of the scientific results.
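A hedged sketch of the pairing technique described above; it is illustrative only, and the real inner loop and exp() handling in the project differ. With x[j] and x[j+1] adjacent in memory, one `_mm_loadu_pd` fills an XMM register with both doubles, and the exp() arguments are staged into a temporary array so exp() can run in a separate loop, as the post suggests.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <math.h>

/* Process two iterations per step: load x[j] and x[j+1] with one
   instruction, do the vector arithmetic, stage the exp() arguments
   into `tmp`, then evaluate exp() in its own loop.  `n` is assumed
   even here for brevity; `scale` stands in for hoisted invariants
   like 1/(q*q). */
void sketch(const double *x, double scale, double *tmp, double *out, int n) {
    __m128d vscale = _mm_set1_pd(scale);
    for (int j = 0; j < n; j += 2) {
        __m128d vx = _mm_loadu_pd(&x[j]);  /* x[j], x[j+1] in one load */
        __m128d vt = _mm_mul_pd(vx, vx);   /* x*x                      */
        vt = _mm_mul_pd(vt, vscale);       /* scale * x*x              */
        _mm_storeu_pd(&tmp[j], vt);        /* stage exp() arguments    */
    }
    for (int j = 0; j < n; j++)            /* exp() in a separate loop */
        out[j] = exp(-tmp[j]);
}
```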
Joined: 12 Nov 07 Posts: 2425 Credit: 524,164 RAC: 0 |
I'm interested in a faster CPU app. I only have one computer, but faster is better. :)

Doesn't expecting the unexpected make the unexpected the expected? If it makes sense, DON'T do it.
Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0 |
> I am not sure if anybody is still interested in CPU (not GPU) versions, since the task is highly amenable to parallelization and therefore GPU builds probably contribute almost 100% of the scientific results.

That is one of the reasons I didn't post a link to a slightly improved CPU version. It is generally about 10-20% faster than the build you are referring to (I would have to look it up, it's too long ago) and it does not use any intrinsics ;). Using a CPU is largely a waste of energy compared to a GPU, so I stopped looking at the CPU application. Furthermore, the credits would be even higher for the CPUs with faster applications; I think the project needs to consider an adjustment. For comparison, I calculated a WU worth 74.24 credits on a C2D E8400 at stock (3 GHz). It took 1820 seconds (some minor stuff was running in the background).

> BTW, any suggestions on where and whether to publish my observations (source modifications, code and comments) are welcome. The message boards are messed up (just as I am doing with this post).

You can either open a new thread or write Travis a private message (that's the way I used). I think he is only interested in general modifications that will help performance on all platforms. Some of the suggestions I've made never made it into the stock app (I guess they were forgotten after the major improvements were done), and at the moment he is supposed to be preparing the rollout of the GPU project and the CUDA apps. But maybe it will help, and it should not be wasted, as the CPU project is planned to run in parallel.