I recently dug up a talk that I gave last semester, and Travis suggested that I post it up for the community - so here it is: (ppt)
I'll give a quick run-down of the talk; feel free to post questions below!
I gave this talk as part of the regular Astrophysics Seminar here at RPI's Department of Physics and Astronomy. I know I gave this last October, but the science and methods are still the same today. Below I'll add a quick narrative for some of the slides:
(forgive me for the numerous Wikipedia links; most of Wikipedia's science articles are well-written and understandable, and they were easy to find) :)
- Slide #3
Early astronomy relied on observers to gather data "by eye." Renaissance astronomers, such as Tycho Brahe, did this with instruments similar to surveying tools - line up a star along a sight, then read the dials to get the position. With the advent of telescopes, astronomers could not only see fainter stars, but also make more accurate measurements. Edwin Hubble (image, right) and his contemporaries would look through a telescope, but would gather much of their data through photographic plates. Photographic plates allowed a lot of data to be recorded at once. Modern CCD cameras, such as those on the Hubble Space Telescope (image, left), do roughly the same thing as photographic plates, but are cheaper, better, and faster. CCD cameras allow modern astronomers to take large amounts of data very quickly. In order to glean information from these enormous data sets, we need quick and robust computational methods. (center image: a command that we all wish worked...)
- Slide #4
This slide discusses the Sloan Digital Sky Survey (SDSS). More detailed information on the survey can be found at the SDSS website. I will mention that the survey only includes data from the North and South "Galactic Caps" - that is, directly above the galaxy and directly below, relative to Earth. This is because the disk of the Milky Way galaxy is full of dust which obscures distant views; therefore, we must look above or below the disk if we want to see anything that is not in the disk.
- Slide #5
The Sagittarius Dwarf Galaxy is a small galaxy that is currently merging with our Milky Way Galaxy. Since galaxies are not "hard" objects, they pass through each other several times until gravity sorts everything into one coherent galaxy. When a small galaxy (such as Sag.) collides with a much larger one (like the Milky Way), the bigger one barely shows any effect while the smaller one orbits. As the smaller galaxy orbits, it loses stars to the more powerful gravitational pull of the large galaxy, "fading away" as it gets absorbed. While orbiting, the smaller galaxy gets stretched out by tidal forces - the same type of forces that caused comet Shoemaker-Levy 9 to break up and dramatically crash into Jupiter. These tidal forces have created "tidal tails" that precede and follow the core of the Sag. Dwarf as it orbits.
The tidal tails exist far from our position (the Sag. Dwarf's center is about 28 kpc away, while the distance to the tails varies depending on position), so there are some errors involved in the distances. Our goal is to map out the position of the Sag. tails, and then fit an orbit. Just like mapping the orbit of Io lets you find the mass of Jupiter, mapping the orbit of the Sag. Dwarf will let you know not just the mass of the Milky Way, but also the distribution of the mass. Then we will know where to look for dark matter. Also, by knowing where the Sag. stream is, we can then remove it from the data, and study what is left over, including the general galactic halo structure.
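To make the Io comparison concrete, here is a small sketch of how an orbit gives you a mass, using Kepler's third law for a small body orbiting a much heavier one. The orbital values for Io are approximate textbook numbers that I have filled in for illustration, and the function name is just a placeholder. The Milky Way case is far more complicated (the mass is spread out rather than concentrated at a point), so treat this only as the simplest version of the idea.

```python
import math

G = 6.674e-11  # gravitational constant, m^3 kg^-1 s^-2

def central_mass(semi_major_axis_m, period_s):
    """Mass of the central body from Kepler's third law.

    Assumes the orbiting body is much lighter than the central one.
    """
    return 4.0 * math.pi ** 2 * semi_major_axis_m ** 3 / (G * period_s ** 2)

# Approximate orbital values for Io (illustrative, not from the talk).
io_axis_m = 4.217e8            # semi-major axis in meters
io_period_s = 1.769 * 86400.0  # orbital period in seconds

jupiter_mass = central_mass(io_axis_m, io_period_s)
print(f"Estimated mass of Jupiter: {jupiter_mass:.3e} kg")
```

The result comes out at roughly 1.9 x 10^27 kg, which is close to the accepted mass of Jupiter.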
- Slide #6
This is a diagram that I created to show the structure of the Milky Way galaxy, our position relative to the Sag. Dwarf and its tails, and how an SDSS data stripe fits into the picture. The Sun's position should be a bit farther out than I placed it, but in general the figure is roughly to scale. You'll note that there are two disks in the Milky Way - the thin disk and the thick disk. The thin disk contains most of the dust and gas in the galaxy, while the thick disk contains most of the stars - they do overlap. I can answer specific questions below.
- Slide #7
This slide shows a side view of SDSS data stripe #82. (left image) You'll notice a denser (darker) region on the left - this is the Sag. stream. (The dense part on the right is due to the disk of our galaxy.) In order to map the stream, we needed to determine distances. Distances can be determined from the brightness and type of stars, but selecting the type is difficult. To make sure that we were selecting only one type of star, we chose a band on the H-R diagram that contains only one type of star (lower right image). We chose "F-turnoff" stars, so named because they are F-type stars at the "turn-off" from the main sequence. (Stars are "on the main sequence" when they are happily and stably fusing hydrogen in their cores, like our Sun.) You'll notice from the diagram that only F-turnoff stars exist in the region between the two green lines; this is the region of the H-R diagram that we selected to use for distancing. We are looking into using Blue Horizontal Branch stars, but F-turnoff stars are more plentiful, and therefore more desirable as distance tools.
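As a rough illustration of how brightness turns into distance once you know the type of star, here is the standard distance modulus relation in code. The absolute magnitude of 4.2 for F-turnoff stars and the apparent magnitude of 21.4 are values I am assuming for the example, not numbers taken from the slides, and the sketch ignores dust extinction.

```python
def distance_pc(apparent_mag, absolute_mag):
    """Distance in parsecs from the distance modulus m - M = 5*log10(d) - 5."""
    return 10.0 ** ((apparent_mag - absolute_mag + 5.0) / 5.0)

# Assumed typical absolute magnitude of an F-turnoff star (illustrative).
ASSUMED_F_TURNOFF_ABS_MAG = 4.2

# Hypothetical star observed at apparent magnitude 21.4.
d = distance_pc(21.4, ASSUMED_F_TURNOFF_ABS_MAG)
print(f"Distance: {d / 1000.0:.1f} kpc")
```

Because every star in the selected band is assumed to have about the same intrinsic brightness, a fainter star simply means a more distant star. The spread in real F-turnoff brightnesses is one source of the distance errors mentioned earlier.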
- Slide #8
In order to map the Sag. stream, we need a model. Choosing a model is tricky, as you do not know everything about the object that you are trying to model. So we have to guess a model, test it, and then modify it if necessary. Initially, we choose a model that contains the physics that we believe governs the situation, and if this turns out to be wrong, then that means there is more physics at work. This is the essence of science - discovering something that we don't know, and then figuring it out.
The model is fairly mathematical; I will summarize it as choosing a Hernquist-type distribution for the background halo stars (the equations at lower left), and modelling the stream as a cylinder that cuts through the data set. We also assume that the stream has a Gaussian distribution of star densities with respect to the center of the stream. In other words, as you get farther from the center of the stream, the number of stream stars per volume of space goes down.
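For readers who think better in code, here is a minimal sketch of the two ingredients described above. It uses the textbook Hernquist profile for the background and a Gaussian falloff with perpendicular distance from a cylinder axis for the stream. The exact equations on the slide may differ (for example, extra exponents or a flattening factor), and all function and parameter names here are made up for illustration.

```python
import math

def hernquist_density(r, scale_radius):
    """Unnormalized textbook Hernquist profile: rho ~ 1 / (r * (r + a)^3)."""
    return 1.0 / (r * (r + scale_radius) ** 3)

def stream_density(point, axis_point, axis_dir, sigma):
    """Gaussian falloff with perpendicular distance from the cylinder axis.

    point, axis_point and axis_dir are (x, y, z) tuples; the result is 1.0
    on the axis and drops as the point moves away from it.
    """
    norm = math.sqrt(sum(c * c for c in axis_dir))
    unit = [c / norm for c in axis_dir]
    offset = [p - a for p, a in zip(point, axis_point)]
    # Component of the offset that lies along the axis.
    along = sum(o * u for o, u in zip(offset, unit))
    # Squared length of what remains: the perpendicular distance squared.
    perp_sq = sum((o - along * u) ** 2 for o, u in zip(offset, unit))
    return math.exp(-perp_sq / (2.0 * sigma ** 2))

def model_density(point, stream_weight, scale_radius, axis_point, axis_dir, sigma):
    """Background plus weighted stream; the galactic center is the origin."""
    r = math.sqrt(sum(c * c for c in point))
    background = hernquist_density(r, scale_radius)
    return background + stream_weight * stream_density(point, axis_point, axis_dir, sigma)
```

The quantities `stream_weight`, `axis_point`, `axis_dir`, and `sigma` stand in for the kind of parameters that get fitted; the real parameterization is on the slide.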
- Slide #9
The next step is to fit our model to the data. This is done through a Maximum Likelihood calculation. The details of the maximum likelihood problem are very mathematical, but the general idea is that every set of parameters has a probability of being the correct parameters - this probability is then related to the likelihood that these parameters are correct. So then we search for the parameters that are the most probable, that is, the parameters that have the maximum likelihood. Note that every likelihood calculation takes time, so it is to our advantage to use as few parameter sets as possible to find the maximum likelihood.
Also, the Central Limit Theorem suggests that for large data sets (such as ours), Gaussian distributions are a reasonable choice in the model.
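Here is a toy version of a maximum likelihood search, just to show the shape of the calculation. It fits a single parameter (the mean of a Gaussian) to simulated data by trying a grid of candidate values. The data, the grid, and the names are all invented for this sketch; the real problem has many more parameters and a far more expensive likelihood.

```python
import math
import random

random.seed(0)
# Simulated "observations": 2000 draws from a Gaussian centered on 3.0.
data = [random.gauss(3.0, 1.0) for _ in range(2000)]

def log_likelihood(mu, sigma, points):
    """Sum of the log of the Gaussian probability density over all points."""
    norm = -0.5 * math.log(2.0 * math.pi * sigma ** 2)
    return sum(norm - (x - mu) ** 2 / (2.0 * sigma ** 2) for x in points)

# Each candidate costs one full pass over the data, so fewer is better.
candidates = [i * 0.05 for i in range(121)]  # 0.00, 0.05, ..., 6.00
best_mu = max(candidates, key=lambda mu: log_likelihood(mu, 1.0, data))

sample_mean = sum(data) / len(data)
print(f"best candidate: {best_mu:.2f}, sample mean: {sample_mean:.3f}")
```

A brute-force grid like this needs 121 likelihood evaluations for one parameter. With many parameters the grid becomes hopeless, which is why smarter search methods are needed.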
- Slide #10
I think this slide explains itself well; but again, feel free to ask questions.
- Slide #11
The "no free lunch" metaphor illustrates the idea that there is no "best" algorithm for every problem, and that every problem has its own best solution. In my example, you have three poor students, each wanting an inexpensive lunch. But each eatery specializes in a different kind of food, so somebody is going to end up not being happy.
(and yes, the use of three doomed characters from a Shakespearean tragedy was intentional) :)
- Slide #12
Conjugate Gradient Descent is a search method that looks at how the likelihood changes in the neighborhood of the current set of parameters, then moves in the direction of the best likelihood, thereby moving to a maximum very quickly. However, it can quickly move to nearby "local" maxima, and miss the "global" (overall) maximum.
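The local-versus-global problem is easy to see in one dimension. The sketch below uses plain gradient ascent (the simplest relative of the conjugate gradient method, not the conjugate version itself) on a made-up "likelihood" with a small peak near x = 2 and a taller peak near x = -2. The function, step size, and starting points are all chosen just for illustration.

```python
import math

def likelihood(x):
    """Toy function: small peak near x = 2, taller peak near x = -2."""
    return math.exp(-(x - 2.0) ** 2) + 2.0 * math.exp(-(x + 2.0) ** 2)

def gradient_ascent(f, start, step=0.1, iterations=500, h=1e-5):
    """Repeatedly step uphill along a numerically estimated slope."""
    x = start
    for _ in range(iterations):
        slope = (f(x + h) - f(x - h)) / (2.0 * h)  # central difference
        x += step * slope
    return x

local_peak = gradient_ascent(likelihood, start=1.0)    # climbs the small peak
global_peak = gradient_ascent(likelihood, start=-1.0)  # climbs the tall peak
print(f"from x=1.0:  x={local_peak:.3f}, value={likelihood(local_peak):.3f}")
print(f"from x=-1.0: x={global_peak:.3f}, value={likelihood(global_peak):.3f}")
```

Both runs converge quickly, but the one that starts at x = 1.0 ends up on the smaller peak and never finds the taller one. Where you start determines which maximum you find.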
- Slide #13
The line search is a method that makes gradient descent faster. I think the slide explains things well, but questions are always welcome.
- Slide #14
Markov Chain Monte Carlo is a rather simple and quick way to search for a maximum likelihood. There are lots of different Monte Carlo methods out there. We actually don't use Monte Carlo methods, as they don't lend themselves well to our problem.
- Slides #15-17
These slides highlight the genetic algorithms that run on Milkyway@home. Travis explains these in great detail in his thesis and several of his talks, so I will direct your attention to those for more information.
- Slide #18
Outdated Milkyway@home statistics. Kind of quaint, actually. :)
- Slides #19 and #22
These are examples of stripes after the maximum likelihood parameters have been found. #19 is a stream (left) separated from the background (right). #22 is (from left to right, top to bottom) the original data set, three separated streams, and the separated background.
I hope this provides more information on the astronomy side of Milkyway@home. I am sorry that I can only offer the PowerPoint version of my talk; if anybody would like to convert it into other formats, that would be awesome. :) Feel free to ask questions!