Dev Blog: Jake Weiss

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist

Message 63770 - Posted: 25 Jun 2015, 16:26:57 UTC
Last modified: 25 Jun 2015, 16:27:33 UTC

Hey Everyone,

This is the first post to my development blog; stick with me, it's kind of long. Sidd and I were talking today about how we sometimes have long periods where we are silent on the forums. This is one attempt to prevent that silence.

So what is going on in this thread? I am going to be posting interesting things I am working on or have just completed. There may be some complaining about frustrating bugs or runs that are taking forever to converge. Some of the stuff in this blog may or may not be about the code you all commonly interact with; actually, most of my time these days is spent writing software for analyzing results.

I also hope that this blog can serve as a sort of documentation of what I have accomplished for future MilkyWay@home scientists. As we all know too well, once someone finally hits their stride in this project and has a firm understanding of what is going on, they graduate and move on to bigger and better things. Sadly, this leaves the new students lost in a pile of under-documented or undocumented code.

Some Disclaimers:

    * This blog is not designed as a place for people to report bugs. If I bring something up in the blog that people have an opinion on, I will be more than happy to discuss it, but bug reports belong in the related news posts (preferably) or in Number Crunching.

    * Grammar and spelling mistakes are likely throughout these posts. I will try my best, but some mistakes will inevitably sneak through. Sorry in advance.

    * Some of the things in this blog may be bleeding edge. As such, they may fail and never be mentioned again. Even if they don't fail, it may take a long time for them to be finished (think longer work units).



I hope this was a good enough introduction to the intended purpose of this blog. I will try to make my first productive blog post some time later today or tomorrow. Expect pretty pictures!

If you have any suggestions for things you would like to see me post in the blog, let me know.

Happy Crunching,
Jake W.

Jake Weiss
Message 63780 - Posted: 29 Jun 2015, 17:41:09 UTC
Last modified: 29 Jun 2015, 17:47:41 UTC

Hey Everyone,

This is my first blog post, not including the description I posted last week.

I have been working mostly on in-house code for the past couple of days. Most of this code is going to be used to help prove that modfit is actually doing what we say it is.

One of the ways I am trying to do this is by running parameter sweeps on the data to show that our model's best likelihood is where we say it is under close-to-ideal conditions. Using simulated data based on our fitting model, we are able to say where the likelihood should peak and then see where it actually peaks. If the two are different, which at the moment they are, I have to revisit our simulation code and fitting code to find where we could have a bug.
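
To make that concrete, here is a minimal sketch of what a two-parameter sweep boils down to. The likelihood function below is a toy stand-in I made up for illustration (the real one comes from our fitting code), but the bookkeeping is the same: evaluate a grid, find the peak, and compare it to the parameters the data were simulated with.

    import numpy as np

    def log_likelihood(mu, theta):
        """Toy stand-in for the real likelihood of the simulated data.
        By construction, this peaks at mu = 0.5, theta = -0.2."""
        return -((mu - 0.5) ** 2 + (theta + 0.2) ** 2)

    # Sweep two stream parameters over a grid, holding all others fixed.
    mu_grid = np.linspace(-1.0, 2.0, 61)
    theta_grid = np.linspace(-1.5, 1.5, 61)
    surface = np.array([[log_likelihood(m, t) for t in theta_grid]
                        for m in mu_grid])

    # Find where the likelihood actually peaks on the grid...
    i, j = np.unravel_index(np.argmax(surface), surface.shape)
    print(f"peak: mu = {mu_grid[i]:.2f}, theta = {theta_grid[j]:.2f}")
    # ...and compare it to the values used to simulate the data. If they
    # disagree by more than a grid step, something is wrong in the
    # simulator or the fitter.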

An example of one of the parameter sweeps I just finished is shown in the following figure.

[Figure: a two-dimensional parameter sweep over two stream-angle parameters, colored by likelihood]
In this figure you should see two of our parameters for the angle of a stream on the two axes, with a color corresponding to the likelihood. The higher the likelihood (the color axis), the better our model fits our simulated data. Currently the expected best point is the blue point, which, as you can see, does not correspond to the best likelihood in the figure. I still have some work to do.

My working theory right now is that the code we use to simulate the data is not working correctly. Simulating data with the same density distribution our model is trying to fit is interesting and challenging because the model we use, while relatively simple to understand for fitting, has some nuances when used in "reverse". The big issue I have been dealing with over the last couple of days is how long a stream needs to be to ensure its entire width falls within our data (which, if wrong, would cause the issues seen in the figure above).

This problem, while seemingly simple, becomes very difficult to wrap your head around when you consider that these streams can actually be much larger than a data wedge. You have to make sure the entire plane of each of the stream's end caps no longer intersects your wedge. You can think of this like collision detection in a video game, where you check whether two objects collide with each other. I am currently working on writing an algorithm to solve this problem; it should not take me much longer.
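
For the curious, here is a rough sketch of the idea. This is not our actual simulator code: the wedge is reduced to a generic point-membership test you would supply, and the end cap is sampled numerically out to a few stream widths instead of being tested against the wedge analytically.

    import numpy as np

    def cap_clears_wedge(center, axis, sigma, inside_wedge, n_sigma=3.0):
        """True if the end-cap disk at `center`, perpendicular to the unit
        vector `axis` and sampled out to n_sigma stream widths, lies
        entirely outside the wedge. `inside_wedge(point)` is the
        membership test for the (bounded) wedge."""
        # Build two unit vectors spanning the plane of the cap.
        helper = np.array([1.0, 0.0, 0.0])
        if abs(np.dot(helper, axis)) > 0.9:  # avoid a near-parallel cross product
            helper = np.array([0.0, 1.0, 0.0])
        u = np.cross(axis, helper)
        u /= np.linalg.norm(u)
        v = np.cross(axis, u)
        # Sample rings of points across the cap and test each one.
        for r in np.linspace(0.0, n_sigma * sigma, 8):
            for phi in np.linspace(0.0, 2.0 * np.pi, 32, endpoint=False):
                point = center + r * (np.cos(phi) * u + np.sin(phi) * v)
                if inside_wedge(point):
                    return False
        return True

    def stream_length(midpoint, axis, sigma, inside_wedge, step=0.1):
        """Grow the cylinder symmetrically about `midpoint` until both end
        caps clear the wedge; return the full length."""
        half = step
        while not (cap_clears_wedge(midpoint + half * axis, axis, sigma, inside_wedge) and
                   cap_clears_wedge(midpoint - half * axis, axis, sigma, inside_wedge)):
            half += step
        return 2.0 * half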

So why is this actually a problem? Well, let's think about what happens if the stream is cut short: we would actually be changing the density distribution of the stream we are generating, and it would no longer look like the model we are trying to simulate.

Anyway, sorry if this got a bit technical or ranty or is a bit jumbled. It's my first attempt at this, so hopefully they get better from here.

As always, Happy Crunching,

Jake W.

Jake Weiss
Message 63824 - Posted: 22 Jul 2015, 13:35:31 UTC

Rant Wednesday,

Sorry it has been so long since my last blog post, but this bug is more deeply rooted in our in-house testing code than I once thought. Every time I think I have found a fix, it pops back up in a different parameter.

At this point I have one-dimensional parameter sweeps mapping out each parameter in the model. Sadly, I can't do two-dimensional sweeps over all of the parameter pairs to look for covariances, because with 20 parameters you would need (20 * 19) / 2 = 190 sweeps (you divide by two because sweeping parameters a and b is the same as sweeping b and a). One-dimensional parameter sweeps take about 10 minutes each to run, which is not terrible for 20 sweeps. The two-dimensional sweeps take closer to 8 or 9 hours each, which means running 190 of them is completely unreasonable. I would add some one-dimensional plots, but they are just lines, which aren't really too pretty to look at.
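
For a back-of-the-envelope sense of scale (taking 8.5 hours as the midpoint of that 8-9 hour figure):

    from math import comb

    n_params = 20
    pairs = comb(n_params, 2)           # 20 * 19 / 2 = 190 distinct parameter pairs

    print(n_params * 10 / 60, "hours")  # all 1-D sweeps at ~10 minutes each: ~3.3 hours
    print(pairs * 8.5 / 24, "days")     # all 2-D sweeps at ~8.5 hours each: ~67 days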

As of now, the parameters for the first generated test stream look like they peak in the correct place on the parameter sweeps, but the parameters of the second stream are completely off. Most of them do not even peak within the rather large range I am sweeping over. For the third generated test stream, all but one of the parameters peak where they should.

Long story short, I am still working on getting consistent and correct test data so that I can show whether our new MilkyWay@home modfit client code is doing what we say it does.

On a happier note, I received a message today saying that BOINC has found a fix for the truncated stderr.txt files. This bug has been causing about a 3% validation error rate on MilkyWay@home and SETI@home work units. The fix will be available in the next BOINC client version (7.6.6), and I will mention it in Number Crunching and the News section soon. When it is out, please update your clients as soon as possible.

I recently gave a talk at the Dudley Observatory, which hosts a local group of amateur astronomers. They often invite professors and other academics from the New York Capital Region and surrounding areas to give astronomy and astrophysics talks. Sadly, no one recorded the talk, so there won't be a YouTube video for you all to watch. Sorry.


Well, I think that's all I have for now. I will continue to keep you up to date with the happenings here at MilkyWay@home HQ, so keep checking back.

Jake W.

Jake Weiss
Message 63849 - Posted: 4 Aug 2015, 15:54:29 UTC

Happy Tuesday everyone!

So, a warning to everyone: this is going to be a very long post. You can read it, skim it, or just read the TL;DR at the bottom, whatever you like. Since this is kind of a technical post and I know the audience is diverse, I will try to help you decide how to parse it: general crunchers (TL;DR), more science-savvy crunchers interested in how MW@home works behind the scenes (read/skim), and future MW@home scientists who will inevitably be lost in the massive project that is MW@home (please read thoroughly and reference all of the papers).

Throughout the post you will see links to help define terms that you may not have heard before. Obviously, not every word will have this nice feature, so if you have any questions or want better clarification, feel free to ask! I will answer as soon as possible.

So here we go:

Good news! I found the bug which prevented us from generating test data for modfit. Before I get to the bug, let's first talk about the basic steps required to create test data. This will hopefully be a simple enough explanation for everyone to understand, while also giving enough insight into why this bug took so long to find.

Building a Wedge:

Step zero involves some pre-calculations needed to generate the data. Mainly, we calculate how many stars we expect to find in the background and in each stream.

Step one in generating test data for MilkyWay@home is to generate a background distribution of stars consistent with the one expected by our application. This is pretty easy to do since we already know the distribution we expect: just normalize it by dividing by the function's maximum in the wedge to get the probability that a star will be present at any given point in our data wedge. Then we use a method called rejection sampling to generate stars according to this distribution. This method is not the fastest or most efficient, but it is the easiest to code and debug, and since it is not in a time-sensitive application, we chose it.
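
As a toy illustration of rejection sampling (not our actual code, and with a completely made-up background density):

    import numpy as np

    rng = np.random.default_rng(42)

    def rejection_sample(density, bounds, density_max, n_stars):
        """Draw star positions from an arbitrary density. `density` maps a
        point to its (un-normalized) density, `density_max` is the maximum
        of that function inside the wedge, and `bounds` is a (lo, hi) pair
        per axis."""
        lows = np.array([lo for lo, hi in bounds])
        highs = np.array([hi for lo, hi in bounds])
        stars = []
        while len(stars) < n_stars:
            candidate = rng.uniform(lows, highs)  # uniform point in the wedge
            # Accept with probability density / density_max -- the accepted
            # points end up distributed according to `density`.
            if rng.uniform() < density(candidate) / density_max:
                stars.append(candidate)
        return np.array(stars)

    # Made-up background that falls off with distance from the origin:
    background = lambda p: (1.0 + p @ p) ** -1.5
    stars = rejection_sample(background, [(-1, 1), (-1, 1), (0, 2)], 1.0, 5000)
    print(stars.shape)  # (5000, 3)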

Step two is to generate each stream. To do this, we first calculate the length each stream runs through the wedge. Each stream is modeled as a cylinder in our data, and the density of the stream drops with radius from the center line of the cylinder. Calculating the length involves making the stream incrementally longer until each end cap is completely out of the wedge. We need to ensure the entire end cap is out of the wedge because it is possible for a stream to still have stars within the wedge even if the center of the end cap is outside of it. (I drew a few crude pictures to prove this to myself, but I currently don't have any worthy of putting on here.) Once we know the length of the stream, we can generate a cylindrical distribution of stars from a uniform random number along the z-axis, a normal random number along the x-axis, and a different normal random number along the y-axis.
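
In the stream's own coordinate frame, that generation step is short. Something like this sketch (illustrative only; the rotation and translation into the wedge's coordinates are left out):

    import numpy as np

    rng = np.random.default_rng(42)

    def generate_stream(n_stars, length, sigma):
        """Stars in the stream frame: uniform along the cylinder's axis
        (z), Gaussian across it (x and y), so the density falls off with
        radius from the center line."""
        z = rng.uniform(-length / 2.0, length / 2.0, n_stars)  # along the stream
        x = rng.normal(0.0, sigma, n_stars)                    # across the stream
        y = rng.normal(0.0, sigma, n_stars)
        return np.column_stack([x, y, z])

    stream = generate_stream(2000, length=20.0, sigma=1.5)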

Once all of the streams and the background are generated, we have what I call an ideal distribution of stars. This is what the distribution would look like if all of the stars had a single brightness and there were no random errors or observational effects. Since MilkyWay@home expects and accounts for observational effects and intrinsic characteristics in our data, we must put these into our simulated data as well.

The final step in generating the test data is to simulate observational effects and intrinsic star properties on our stars. We start by "fuzzing" out the star brightnesses to match the distribution we expect to find in our data: basically, we use each star's current brightness as the center of a distribution and then make the star brighter or dimmer according to that distribution. Next we add in the detection efficiency of the telescope used to take our actual data, and simulate the number of stars which will no longer meet our selection criteria due to random error in the brightness measurements (especially for distant stars). This results in our test data having fewer stars farther away from us than closer. This makes sense: dimmer stars are less likely to be picked up by our telescope than bright ones, and brightness measurements become less accurate the dimmer the object, so we expect fewer stars to fall within our selection criteria farther away (dimmer).
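
Here is a toy version of that last step. The spread and magnitude limit below are placeholder numbers I picked for illustration, not the survey's real error model:

    import numpy as np

    rng = np.random.default_rng(42)

    def apply_observational_effects(mags, spread=0.6, limit=22.5):
        """Fuzz each star's apparent magnitude around its ideal value,
        then apply a detection-efficiency cut that preferentially removes
        faint (i.e. distant) stars. `spread` and `limit` are made up."""
        fuzzed = mags + rng.normal(0.0, spread, mags.size)
        # Toy sigmoid efficiency: ~1 for bright stars, ~0 past the limit.
        efficiency = 1.0 / (1.0 + np.exp(fuzzed - limit))
        detected = rng.uniform(size=mags.size) < efficiency
        return fuzzed[detected]

    observed = apply_observational_effects(rng.uniform(16.0, 24.0, 10000))
    print(observed.size, "of 10000 stars survive the cut")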

The Bug:

So our bug was in the way we calculated the length of the stream in step two. Instead of increasing the length of the cylinder along the z-axis, essentially the direction along the cylinder, we increased the length along the x-axis. The result was a strange shape that resembled a stream but was not really a cylinder, and it was also 90 degrees off of the orientation it was supposed to have. This bug is now fixed, and our test data simulator seems to be working.
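
Boiled down to a toy snippet (an illustration of the mistake, not our actual simulator source), the difference was a single axis:

    import numpy as np

    center = np.zeros(3)
    x_hat = np.array([1.0, 0.0, 0.0])  # across the cylinder
    z_hat = np.array([0.0, 0.0, 1.0])  # along the cylinder's axis
    half_length = 4.0

    cap_buggy = center + half_length * x_hat  # bug: stepped the end caps out sideways,
                                              # giving a stream-like shape 90 degrees off
    cap_fixed = center + half_length * z_hat  # fix: step along the cylinder itself
    print(cap_buggy, cap_fixed)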

For some proof that it is working now, let's look at the same plot I linked earlier, but now with our new simulated data.

[Figure: the same two-parameter sweep colored by likelihood, with a blue dot at the expected peak and a green dot at the actual peak]
In this plot, the blue dot represents the expected location of the likelihood peak and the green dot represents the actual likelihood peak in this parameter sweep. They are almost perfectly on top of each other! The small difference is easily accounted for by the step sizes used in our sweep and is well within our expected error tolerance.

Next Steps:

I put new runs up for Separation Modfit over the weekend:

de_modfit_fast_15_3s_136_sim1Aug1_1
de_modfit_fast_15_3s_136_sim1Aug1_2
de_nonmodfit_fast_15_3s_136_sim1Aug1_1
de_nonmodfit_fast_15_3s_136_sim1Aug1_2

These runs are going to show how easily modfit optimizes to the expected values. I am also running nonmodfit runs, which will show what differences we expect to see between the two. This is important because it will give us a sanity check when we run modfit on real data and want to compare it to Matt Newby's results, which were run before modfit. All of these results will be included in a paper I am writing for publication explaining the modfit algorithm and how it changes our fits.

I hope you guys appreciate the transparency that this blog is bringing to our development process. If you have any suggestions for things you want me to include in my blog posts, or questions about what I just posted, please let me know.

Jake

TL;DR: I found the bug in the test data simulation program, and we have new runs up with the correct data. If you like pretty pictures and want proof, look at the parameter sweep and see that the two dots are almost the same.

Jake Weiss
Message 64679 - Posted: 17 Jun 2016, 18:30:51 UTC

Hello Everyone,

Looks like it has been a long time since I updated my dev blog. Sorry about that. On a personal note, I have finished almost all of my graduate classes over the last two semesters (hence my lack of activity on the project), and now I am devoting almost all of my time to research.

What does this mean for MilkyWay@home?

It means there will be someone dedicated to keeping everything as up to date as possible and to developing new and cool things for all of you. My most recent projects have included a complete update of everything on the server and implementing site-wide HTTPS. I have also done some behind-the-scenes work improving our backup systems and keeping Separation producing scientific results. My next big project involves combining the MilkyWay@home and MilkyWay@home Separation (Modfit) projects into a single project. I have some other interesting projects planned too, but I am not ready to make any promises yet.

What science has been done with MilkyWay@home Modfit recently?

I have enough evidence that the algorithm is working and robust enough to be used on a large scale to fit the entire Milky Way halo. I am close to having preliminary results for the first hundred thousand or so stars that I am fitting. These preliminary results, plus evidence for the robustness and accuracy of the algorithm, will be included in a paper, hopefully before the end of the summer. Since this is my first scientific publication as a lead author, I expect it to be a little slow going. As per usual, when it is published, I will put a link to it with the rest of the MilkyWay@home publications. I will also try to summarize the results in layman's terms in a future dev blog post.

Can I run MilkyWay@home on a Raspberry Pi?

YES! There are now instructions on how to compile the MilkyWay@home code to run on a Raspberry Pi. This may end up being an officially supported platform in the not-so-distant future, but until then, please use the instructions submitted by cbeckham to get up and running. He was kind enough to test and write them for us.

I guess that's all I have for today.

Happy Crunching.

Jake

Jake Weiss
Message 64781 - Posted: 30 Jun 2016, 13:44:36 UTC

Hello Everyone,

In the newest installment of my dev blog, I'm going to go through some of the things I have just finished, as well as outline some of the work I have planned for the rest of the summer.

Recently Completed work:

* Released Modfit as part of the officially supported MilkyWay@home application.

* Improved the run time of the Mac application by a factor of 8.

Current projects:

* Recompile the Linux 32-bit binaries as static builds.

* Fix the inevitable segfaults of doom that will occur after recompiling the Linux 32-bit binaries.

* Fix the MilkyWay@home automated build system.

* Look into building a 32-bit Windows application and fix all of the compilation errors that are inevitable (and are why we stopped supporting it in the first place). (I've already spent 2-3 days' worth of time on this.)

* Write a scientific paper on the results we've recently gotten back from MW@home proving Modfit is awesome and the way of the future for our project.

* Fix the work unit availability problems for GPU users.

* Maintain runs on MilkyWay@home and ensure continued science output from the project.

* Update MW@home science page.

This is not a comprehensive list, but these are essentially the major goals I have right now. I will try to give a more fun, technical blog post some time next week, maybe outlining some of the issues we are seeing and their solutions, but for now I am going to get back to work.

Happy Crunching.

Jake