Message boards : Application Code Discussion : found a small memory error in the code
Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
I don't think this is affecting anything at the moment, but it should be fixed and will be in v0.18. At the end of calculate_integral_convolved in evaluation_optimized.c:

```c
for (i = 0; i < ap->convolve; i++) {
    free(N[i]);
    free(r_point[i]);
    free(r3[i]);
}
```

needs to be changed to:

```c
for (i = 0; i < ia->r_steps; i++) {
    free(N[i]);
    free(r_point[i]);
    free(r3[i]);
}
```
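For illustration, a minimal self-contained sketch of the leak pattern being fixed here. The bound names follow the post, but the allocation shapes are assumptions, not the project's actual code: the rows are allocated per r_step, so the cleanup loop must also run over r_steps.

```c
#include <stdlib.h>

/* Sketch of the leak: rows are allocated per r_step, so the
   cleanup loop must free exactly r_steps rows.  Freeing only
   `convolve` rows (the old bound) leaks the remaining rows
   whenever r_steps > convolve. */
void sketch(int r_steps, int convolve) {
    double **N = malloc(sizeof(double*) * r_steps);
    for (int i = 0; i < r_steps; i++) {
        N[i] = malloc(sizeof(double) * convolve);
    }

    /* ... use N ... */

    for (int i = 0; i < r_steps; i++) {  /* not convolve */
        free(N[i]);
    }
    free(N);
}
```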
Joined: 22 Feb 08 Posts: 260 Credit: 57,387,048 RAC: 0 |
Can you please wait a while before releasing a new version? As long as there are no severe problems it's time to settle down a bit... ;) mic. |
Joined: 27 Sep 07 Posts: 8 Credit: 25,779,225 RAC: 0 |
How about a release number instead of a new version for minor bug fixes? Or is there a competition somewhere for the most versions in a week? ;)
Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
> Can you please wait a while before releasing a new version?

That's why I didn't update it yet :D This shouldn't have any effect other than a small memory leak.
Joined: 22 Feb 08 Posts: 260 Credit: 57,387,048 RAC: 0 |
> I don't think this is affecting anything at the moment, but it should be fixed and will be in v0.18.

Didn't find it there... you mean evaluation_optimized.c, right? mic.
Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
> I don't think this is affecting anything at the moment, but it should be fixed and will be in v0.18.

Whoops, yeah. Edited the post :)
Joined: 2 Jan 08 Posts: 23 Credit: 495,882,464 RAC: 0 |
Another memory error in evaluation_optimized.c, in free_constants:

```c
for (i = 0; i < ap->number_streams; i++) {
    free(xyz[i]);
}
```

must be replaced by:

```c
for (i = 0; i < ap->convolve; i++) {
    free(xyz[i]);
}
```

With this error, each work unit creates a memory leak of about 1.5 KB. Thierry.
Joined: 12 Apr 08 Posts: 621 Credit: 161,934,067 RAC: 0 |
Um, am I the only one that thinks that this might be one of the reasons my computers turn into turtles? |
Joined: 30 Aug 07 Posts: 2046 Credit: 26,480 RAC: 0 |
> Another memory error in evaluation_optimized.c...

Thanks, this will be in the newest version of the code.
Joined: 27 Aug 07 Posts: 915 Credit: 1,503,319 RAC: 0 |
> Um, am I the only one that thinks that this might be one of the reasons my computers turn into turtles?

Check the turtle section in the BOINC WIKI for help.

me@rescam.org
Joined: 12 Apr 08 Posts: 621 Credit: 161,934,067 RAC: 0 |
> Um, am I the only one that thinks that this might be one of the reasons my computers turn into turtles?

Just did, thank you... And strangely enough this code was there as an example...
Joined: 27 Aug 07 Posts: 915 Credit: 1,503,319 RAC: 0 |
> Um, am I the only one that thinks that this might be one of the reasons my computers turn into turtles?

And they made it a sticky.

me@rescam.org
Joined: 12 Apr 08 Posts: 621 Credit: 161,934,067 RAC: 0 |
> Um, am I the only one that thinks that this might be one of the reasons my computers turn into turtles?

Interesting... Well, I don't see the point of having 4 or 5 different documentation sites: the Trac wiki for development, the UCB wiki with no information to speak of, the UBW, the BOINC FAQ Service... lord only knows how many others. Fracturing the information like that means we have that many more chances to make mistakes, and updating the same fact in 5 places means you just spent 5 times as much effort as needed to get the information right. It was why I quit. It is obvious to me that the powers that be, Dr. Anderson and the project leaders, are quite simply not interested in a good repository of information about BOINC. John37xxxx has been trying to get some other social networking stuff up and running, and I wish him well, but that too will likely fail because there is no project support. Sadly, the projects just cannot see that when they don't support these efforts, they die, and that is one of the reasons we have only a couple hundred thousand participants worldwide...
Joined: 30 May 09 Posts: 9 Credit: 105,674 RAC: 0 |
Here is the list of the memory leaks I found in application source v0.18.

File boinc_astronomy.C, in void worker():

```c
...
free(sp);
free_search_parameters(s); // <-- missing
...
```

File evaluation_optimized.c, in void free_constants(ASTRONOMY_PARAMETERS *ap):

```c
...
//for (i = 0; i < ap->number_streams; i++) { // <-- wrong
for (i = 0; i < ap->convolve; i++) {         // correct
...
```

File evaluation_state.c:

```c
void free_state(EVALUATION_STATE* es) {
    int i;
    free(es->stream_integrals);
    for (i = 0; i < es->number_integrals; i++) {
        free_integral_area(es->integral[i]);
        free(es->integral[i]); // <-- missing
    }
    free(es->integral);
}

int read_checkpoint(EVALUATION_STATE* es) {
    ...
    if (1 > fscanf(file, "background_integral: %lf\n", &(es->background_integral))) return 1;
    free(es->stream_integrals); // <-- missing, read_double_array below allocates memory
    es->number_streams = read_double_array(file, "stream_integrals", &(es->stream_integrals));
    ...
```

File search_parameters.c:

```c
void free_search_parameters(SEARCH_PARAMETERS *parameters) {
    free(parameters->search_name);
    free(parameters->parameters);
    free(parameters->metadata);
    free(parameters); // <-- missing
}
```

There is also a 4-byte memory leak somewhere in the BOINC library, probably in diagnostics_win.cpp:

```cpp
diagnostics_threads.push_back(pThreadEntry);
```

Additionally, when a checkpoint is loaded, the memory has already been allocated on the basis of the workunit, but the loops that read the data are driven by the checkpoint content. This causes heap corruption when the checkpoint file was generated for another workunit, due to the difference in the data member count. The same could occur if the checkpoint file is corrupt for some reason.

In general, since the amount of required heap memory is known at the beginning, all the memory allocation calls could be reduced to one allocation per data type. This should affect performance, especially on multiprocessor machines.
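A hedged sketch of the guard being suggested for the checkpoint issue. The field names follow the post, but the checkpoint header line and the surrounding read_checkpoint logic are assumptions, not the project's actual code: compare the counts stored in the checkpoint against the counts the workunit allocated for, and reject the checkpoint on mismatch instead of reading past the pre-allocated arrays.

```c
#include <stdio.h>

/* Hypothetical guard: `es` was sized from the workunit, so the
   counts parsed from the checkpoint must match before any loop
   reads data into the pre-allocated arrays.  On mismatch, treat
   the checkpoint as stale or corrupt and recompute from scratch
   rather than corrupting the heap. */
int validate_checkpoint_counts(FILE *file,
                               int expected_integrals,
                               int expected_streams) {
    int ckpt_integrals, ckpt_streams;
    if (2 != fscanf(file, "number_integrals: %d number_streams: %d\n",
                    &ckpt_integrals, &ckpt_streams))
        return 1;  /* unreadable: discard checkpoint */
    if (ckpt_integrals != expected_integrals ||
        ckpt_streams != expected_streams)
        return 1;  /* generated for a different workunit: discard */
    return 0;      /* safe to read the arrays */
}
```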
Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0 |
> In general, since the amount of required heap memory is known at the beginning, all the memory allocation calls could be reduced to one allocation per data type. This should affect performance, especially on multiprocessor machines.

I doubt it would have any significant effect on the performance. Why should it?
Joined: 30 May 09 Posts: 9 Credit: 105,674 RAC: 0 |
All memory allocations are executed within seconds, so this will NOT affect the MW application performance. The compiler runtime usually requests larger memory blocks from the OS, and later malloc calls are managed internally by the runtime within these blocks. When the amount of necessary space is known (which is our case) it is better to allocate the memory at once, instead of giving the runtime a chance to fragment the memory or to allocate unusual space from the OS. At least this is the way I like to code.

The really slow process is the OS memory request on multiprocessor systems, due to the locks. Frequent OS memory allocation requests will affect the rest of the running processes and cause delays if some really memory-intensive applications are running in parallel. This is not the case for desktop machines. Actually, I am not sure how the BOINC environment runs the application. Probably on process termination all the memory leaks go away along with the process itself, so there might be no memory allocation problems at all.

A real benefit would be if arrays laid out as x0, y0, z0, x1, y1, z1... were replaced with x0, x1... y0, y1... z0, z1..., which allows effective loop vectorization. But this is another matter.
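For readers unfamiliar with the layout change being proposed, a minimal sketch of the two layouts; the type and field names are illustrative, not from the project source. The interleaved x0, y0, z0, x1, y1, z1... form is "array of structures" (AoS), while the x0, x1... y0, y1... z0, z1... form is "structure of arrays" (SoA), which lets a vectorizing compiler fetch several consecutive x values with one wide load.

```c
#include <stdlib.h>

/* Array-of-structures: x, y, z interleaved in memory
   (x0, y0, z0, x1, y1, z1, ...).  Consecutive x values are
   strided, which hampers wide vector loads. */
typedef struct { double x, y, z; } point_aos;

/* Structure-of-arrays: each coordinate contiguous
   (x0, x1, ..., y0, y1, ..., z0, z1, ...). */
typedef struct {
    double *x;
    double *y;
    double *z;
} points_soa;

points_soa soa_alloc(size_t n) {
    points_soa p;
    p.x = malloc(n * sizeof(double));
    p.y = malloc(n * sizeof(double));
    p.z = malloc(n * sizeof(double));
    return p;
}

/* A loop over SoA data is trivially vectorizable: each array is
   walked with unit stride, so x[i] and x[i+1] fit one wide load. */
double sum_sq(const points_soa *p, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += p->x[i] * p->x[i] + p->y[i] * p->y[i] + p->z[i] * p->z[i];
    return s;
}
```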
Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0 |
> A real benefit would be if arrays laid out as x0, y0, z0, x1, y1, z1... were replaced with x0, x1... y0, y1... z0, z1..., which allows effective loop vectorization. But this is another matter.

Do you mean the xyz[convolve][3] array? Loop vectorization also works with the current code, which can be observed with the optimized Linux apps (they use unmodified code, just with an autovectorizing compiler, giving a factor 2 or 3 speedup on capable CPUs).
Joined: 30 May 09 Posts: 9 Credit: 105,674 RAC: 0 |
> Do you mean the xyz[convolve][3] array?

Exactly. I compiled the v0.18 source on Windows. Adding some intrinsics in the innermost loop and hoisting 1/(q*q), sinb, sinl, cosb and cosl out of the loops, along with some other minor changes, resulted in a performance improvement of ~11% over the latest Gipsel SSE3 build for Windows. Half of the CPU time is consumed by the exp() function in the inner loop. Using the vectorized exp() from Intel's compiler didn't help, but storing the intermediate results in an array and executing the exp() in a separate loop did.

The rearrangement I made manually processes two subsequent loop iterations in parallel, using the XMM registers. When x[j] and x[j+1] are located sequentially in memory they can be loaded into one register with a single instruction, without wasting cycles and dirtying other registers for shuffling. This may be applicable to some other arrays too.

BTW, any suggestions on where and whether to publish my observations (source modifications, code and comments) are welcome. The message boards are messed up (just as I am doing with this post). I am not sure if anybody is still interested in CPU (not GPU) versions, since the task is highly amenable to parallelization and therefore GPU builds probably contribute almost 100% of the scientific results.
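A hedged sketch of the pairing technique described above; it is illustrative only, and the real inner loop and exp() handling in the project differ. With x[j] and x[j+1] adjacent in memory, one `_mm_loadu_pd` fills an XMM register with both doubles, and the exp() arguments are staged into a temporary array so exp() can run in a separate loop, as the post suggests.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <math.h>

/* Process two iterations per step: load x[j] and x[j+1] with one
   instruction, do the vector arithmetic, stage the exp() arguments
   into `tmp`, then evaluate exp() in its own loop.  `n` is assumed
   even here for brevity; `scale` stands in for hoisted invariants
   like 1/(q*q). */
void sketch(const double *x, double scale, double *tmp, double *out, int n) {
    __m128d vscale = _mm_set1_pd(scale);
    for (int j = 0; j < n; j += 2) {
        __m128d vx = _mm_loadu_pd(&x[j]);  /* x[j], x[j+1] in one load */
        __m128d vt = _mm_mul_pd(vx, vx);   /* x*x                      */
        vt = _mm_mul_pd(vt, vscale);       /* scale * x*x              */
        _mm_storeu_pd(&tmp[j], vt);        /* stage exp() arguments    */
    }
    for (int j = 0; j < n; j++)            /* exp() in a separate loop */
        out[j] = exp(-tmp[j]);
}
```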
Joined: 12 Nov 07 Posts: 2425 Credit: 524,164 RAC: 0 |
I'm interested in a faster CPU app. I only have one computer, but faster is better. :)

Doesn't expecting the unexpected make the unexpected the expected? If it makes sense, DON'T do it.
Joined: 26 Jul 08 Posts: 627 Credit: 94,940,203 RAC: 0 |
> I am not sure if anybody is still interested in CPU (not GPU) versions, since the task is highly amenable to parallelization and therefore GPU builds probably contribute almost 100% of the scientific results.

That is one of the reasons I didn't post a link to a slightly improved CPU version. It is generally about 10-20% faster than the build you are referring to (I would have to look it up, it's too long ago) and it does not use any intrinsics ;). Using a CPU is largely a waste of energy compared to a GPU, so I stopped looking at the CPU application. Furthermore, the credits would be even higher for the CPUs with faster applications; I think the project needs to consider an adjustment. For comparison, I calculated a WU worth 74.24 credits on a C2D E8400 at stock (3 GHz). It took 1820 seconds (some minor stuff was running in the background).

> BTW, any suggestions on where and whether to publish my observations (source modifications, code and comments) are welcome. The message boards are messed up (just as I am doing with this post).

You can either open a new thread or write Travis a private message (that's the way I used). I think he is only interested in general modifications that will help performance on all platforms. Some of the suggestions I've made never made it into the stock app (I guess they were forgotten after the major improvements were done), and at the moment he is supposed to be preparing the rollout of the GPU project and the CUDA apps. But maybe it will help, and it should not be wasted, as the CPU project is planned to run in parallel.