MilkyWay@Home Progress Report (Old)
Joined: 11 May 09 Posts: 30 Credit: 81,093 RAC: 0
Hello MW@Home,

We have been getting a lot of questions lately about whether we are getting real scientific results, why the new sgr runs are getting better likelihoods, what the numbers being crunched actually mean, what our scientific goals are, and so on. In response we've posted our publications on the front page, but, as another BOINC user put it: you guys aren't astrophysicists. So I'm going to try to summarize the physics side of this project in layman's terms, from the data collection to the current achievements to the future plans.

Basically, one day we came into the office and were greeted by this: Just kidding. In the beginning there were stars. Then there were people, and every night these people would look up at the sky and notice how intriguingly complex a place the heavens really were. Wandering planets travel through an ever-turning mosaic of mythology and mysticism. The sky was one of the biggest mysteries of the ancient world: what do these pictures mean?

Our project begins with the Sloan Digital Sky Survey (http://www.sdss.org/), an ambitious project whose goal is to map as large a portion of the sky as possible. To date the SDSS has mapped about a quarter of the sky, cataloguing over 300 million objects. These are plots, in galactic coordinates (l, b), of the sky covered by the Sloan Digital Sky Survey (SDSS1 on the left and SDSS2 on the right); l and b are galactic longitude and latitude, with the equator being the galactic plane. But what really is a list of a couple million points in 3D space, other than a massive problem to tackle? Sure, we can plot all these points together and get a gorgeous map of the sky, but once again: what do these pictures mean? That's where research astrophysicists come into play. One of the hot spots in galactic astronomy (astronomy relating to just the Milky Way) at the moment is stellar stream mapping.
The general idea is that the Milky Way Galaxy actually has a few smaller galaxies mixed in with it, probably from galactic collisions that began sometime in the distant past and continue to this day (see the simulation by Kathryn V. Johnston at Columbia University of how a galactic collision turns a galaxy into a stream). Don't worry: actual objects like stars or planets very seldom collide, since there is so much empty space that a collision is highly improbable. The Sagittarius Dwarf Galaxy is one of the closer galaxies residing within our own, and it is our particular area of interest.

In general, an astrophysics problem revolves around building a computer model that replicates what we see in the sky; if a model matches exactly, we can build on the information that model reveals to work on a bigger, more involved problem. Currently, the MW@Home BOINC application is made to model plates of stars. We input a 2.5-degree-wide cross section of data (the shape is commonly called a wedge, or stripe), and the program attempts to create a new, uniformly dense wedge of stars from the input wedge by removing one or more streams. The streams it removes are always modeled as cylinders, and their stellar density falls off in a Gaussian manner (denser in the middle, sparser at the edges).

Here is a sample separation. The upper right circle is the input wedge: a density map of that cross section of sky, with darker areas being more stellarly dense and lighter areas less dense. The lower circle is the removed cylinder of stars, and the upper left circle is the (hopefully uniform) wedge of stars that remains after removing the stars in the lower circle. For reference, we (the Solar System) are at the exact center of these plots, since all of our data was gathered here on Earth.
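To make "cylindrical, with Gaussian fall-off" concrete, here is a minimal Python sketch of a stream's relative density at a point. It assumes the stream axis is a straight line through a point `center` along `direction`, and that the density depends only on the perpendicular distance from that axis. The function names are hypothetical, and this is an illustration rather than the MW@Home application's code.

```python
import math


def distance_to_axis(point, center, direction):
    """Perpendicular distance from `point` to the line through `center` along `direction`."""
    length = math.sqrt(sum(d * d for d in direction))
    unit = [d / length for d in direction]
    offset = [p - c for p, c in zip(point, center)]
    # Remove the component of the offset that runs along the stream axis.
    along = sum(o * u for o, u in zip(offset, unit))
    perpendicular = [o - along * u for o, u in zip(offset, unit)]
    return math.sqrt(sum(x * x for x in perpendicular))


def relative_stream_density(point, center, direction, sigma):
    """Unnormalized density: 1 on the stream axis, Gaussian fall-off away from it."""
    d = distance_to_axis(point, center, direction)
    return math.exp(-(d * d) / (2.0 * sigma * sigma))


# A stream running along the z-axis with width sigma = 1 (arbitrary units).
on_axis = relative_stream_density((0.0, 0.0, 5.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), 1.0)
one_sigma_out = relative_stream_density((1.0, 0.0, 5.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), 1.0)
```

Moving along the axis leaves the density unchanged, which is exactly what makes the removed shape a cylinder rather than a blob.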
Each stream removed has 6 parameters: weight (the % of stars in the stream), mu (a measure of angular position in the stripe, given by the ticks on the circumference of the above plots), r (a measure of distance, given by the radial ticks above), phi (one 3D angle indicating the direction of the removed cylinder), theta (the second required angle), and sigma (a measure of width). Each wedge background has 2 parameters: q (a measure of the flatness of the spheroid) and r0 (a measure of the radius of the spheroid core). So every run has 2 + 6n parameters, where n is the number of streams being modeled.

Here is a top-down view of one possible model for the Sagittarius Dwarf Stream. The middle galaxy represents the Milky Way, with the Sun being the green dot within the disk. The blue stars are the general areas of the Sagittarius dwarf that we study. This view is in the plane of the Sagittarius dwarf stream, so imagine we are looking down on top of a semi-flat structure (see the 3D model produced by David Law at the University of Virginia).

What we want is to end up with as many data points as possible from BOINC: we can use mu and r to plot the location in space, and the angles phi and theta to plot the direction of the stream. What we end up with is a picture similar to the above. Here is a plot of all of the data point positions and directions found by Nathan Cole; it is exactly the same as the picture just before it, just less artsy. Here is the corresponding plot in a plane perpendicular to the above. Imagine that you had the prior plot on a piece of paper and tilted it until all you could see was a line. That line (which represents a plane) is the middle line in this plot. So putting these two plots together yields a 3D interpretation of the found points and stream directions.
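The 2 + 6n bookkeeping can be sketched as a flat parameter vector. This sketch assumes, hypothetically, that the two background values come first and that each stream's six values follow in the order listed above; the real application's parameter files may order them differently.

```python
from dataclasses import dataclass

BACKGROUND_PARAMS = 2  # q, r0
STREAM_PARAMS = 6      # weight, mu, r, phi, theta, sigma


@dataclass
class Stream:
    weight: float  # fraction of stars in the stream
    mu: float      # angular position along the stripe
    r: float       # distance
    phi: float     # first direction angle
    theta: float   # second direction angle
    sigma: float   # width


def parameter_count(n_streams: int) -> int:
    """Total number of fitted parameters: 2 + 6n."""
    return BACKGROUND_PARAMS + STREAM_PARAMS * n_streams


def unpack(params):
    """Split a flat parameter list into (q, r0, [Stream, ...]).

    The ordering is an assumption made for this sketch.
    """
    extra = len(params) - BACKGROUND_PARAMS
    if extra < 0 or extra % STREAM_PARAMS != 0:
        raise ValueError(f"{len(params)} values is not of the form 2 + 6n")
    q, r0 = params[0], params[1]
    streams = [
        Stream(*params[i:i + STREAM_PARAMS])
        for i in range(BACKGROUND_PARAMS, len(params), STREAM_PARAMS)
    ]
    return q, r0, streams


# Placeholder values, not fitted results: one background plus one stream.
q, r0, streams = unpack([0.5, 6.0, 0.1, 180.0, 30.0, 0.5, 2.0, 3.0])
```

Each extra stream adds six dimensions to the search, so a 3-stream run already has 20 parameters to optimize.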
So from each run we want to obtain 3 good indicators. First, the separation plot should leave a near-uniform background; if there are still overdensities in the output, we are not getting an accurate picture of the 2 spheroid parameters. Second, the vectors in the plane should be cohesive: we want the stream to flow rather than zigzag through space, as it were. Third, we want the vectors in the perpendicular plane to be close to parallel to the plane; again, we want it to flow, not zigzag. We did all that, and Nate wrote his thesis on it.

So what are we doing now? At this point we want to refine our results and make them more accurate. To do this we have stitched all the SDSS data together and cut out wedges that are perpendicular to the stream. The idea is that a perpendicular cross section is much easier to decipher than a skewed one, so our error measurements will be smaller and the likelihoods higher. I have just now begun runs on BOINC using this new geometry (all the recent *_sgr_* runs), although I have been working with it since the beginning of last summer on the 88-processor WCL grid here at RPI. For reference, it took me about a week per run on the MPI grid; now I am getting about 5 runs per day on BOINC, which is amazing.

Here is a simple diagram illustrating the idea: the blue line is the stream in question and the black lines represent wedges of data. The left side shows SDSS stripes and the right side shows the improved perpendicular SGR stripes. Here is a juxtaposition of one of Nate's wedges (left, SDSS stripe 13) and one of mine (right, SGR stripe 35) in the same area of 3D space. Notice how his stream stretches almost all the way across the stripe because it is tilted relative to the stripe, while mine is nice and compact. This translates to smaller errors in our reported findings.
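A little geometry shows why a tilted stream smears across a stripe. Treating the wedge as a thin flat slab (an idealization, since real wedges are 2.5 degrees wide and fan out with distance), the cut through a cylinder is an ellipse whose long axis is the stream's diameter divided by the cosine of the tilt between the stream axis and the slab's normal. This sketch only evaluates that formula; the function name is hypothetical.

```python
import math


def footprint_long_axis(stream_diameter, tilt_deg):
    """Long axis of the ellipse where a cylinder crosses a thin flat slab.

    tilt_deg is the angle between the stream axis and the slab's normal;
    0 means the stripe cuts straight across the stream.
    """
    if not 0.0 <= tilt_deg < 90.0:
        raise ValueError("tilt must be in [0, 90) degrees")
    return stream_diameter / math.cos(math.radians(tilt_deg))


for tilt in (0, 30, 60, 80):
    ratio = footprint_long_axis(1.0, tilt)
    print(f"tilt {tilt:2d} deg: footprint is {ratio:.2f} times the stream width")
```

At a 60-degree tilt the footprint is already twice the stream's width, and it grows without bound as the stream approaches lying inside the slab, consistent with the stretched-out stream described above.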
So our basic BOINC goal now is to remap the whole stream so that the result is not only cohesive, like Nate's findings, but also more accurate than them. After that (a couple of months from now, optimistically) I will attempt to map any other streams we can find in the data and remove them as well. Then my fellow student, Matt Newby, can get into the meat of his project, which is modeling the 2 spheroid parameters throughout the entire sky (imagine putting 30 uniform wedges into BOINC at once and searching for 0 streams).

These are both tremendous topics in modern astrophysics. First of all, the location and direction of the Sagittarius stream are still somewhat debated: some people, like Nate, believe that the stream passes by us, while others think that it crashes down on top of the Sun. And the spheroid has yet to be accurately modeled. Such a model would make galactic simulations much easier to create, since they would have fewer unknown variables, and it could also provide valuable clues to the dark matter problem.

I will reserve some posts below for uploading new versions of the plots shown above (vectors and separations) for the most recently crunched stripes, so you can check this thread periodically to see the science side of the progress here at MW@Home. On a related topic, an enterprising individual could make a screensaver from these images (perhaps the vectors walking across the screen, or a separation emerging from a parent wedge); I regret to say that my screensaver-building skills are limited to slideshows, and such endeavors aren't exactly high priority here at the lab, haha. If you PM me, I can possibly find higher-resolution images as well.

I hope this helps you understand what it is you are crunching. And thanks for helping us get this far!

John Vickers
Joined: 11 May 09 Posts: 30 Credit: 81,093 RAC: 0
June 17 - Latest results. These are the best results from the past week. This week we ran mostly "F" runs (named something like ps_sgr_*F_*_*), which use 5-degree wedges instead of 2.5-degree wedges. The images show the total field in the top right, the stream in the bottom right, and the background in the top left.
[Images: SGR stripes 12, 18, 22, 26 and 30, followed by the combined vectors for this week]
Joined: 11 May 09 Posts: 30 Credit: 81,093 RAC: 0
Reserved
Joined: 11 May 09 Posts: 30 Credit: 81,093 RAC: 0
Reserved
Joined: 11 May 09 Posts: 30 Credit: 81,093 RAC: 0
Reserved
Joined: 21 Feb 09 Posts: 180 Credit: 27,806,824 RAC: 0
John, Travis, Dave - we're all happy to crunch for MW to help along the stream deciphering. John, this is one of the best (and, to be honest, most fun) explanations of a BOINC project I have seen. It makes me want to turn my work into a BOINC project someday. Congratulations to Nate on his thesis, too. Thank goodness I don't have to think about mine for another year.
Joined: 18 Nov 07 Posts: 280 Credit: 2,442,757 RAC: 0
Wow, thanks for the explanation, John. Very good post! I was wondering about the 'fitness' we see in the graphs on the main page. What does it mean exactly? What does a fitness of 2.5 mean, and how much closer is it than 3.0? How close do you need to get before you can say with confidence that you've modelled the streams correctly, and how close do you think you can get before assumptions like, say, the uniform distribution of stars or the cylindricality of the stripes get in the way?
Joined: 22 Dec 07 Posts: 51 Credit: 2,405,016 RAC: 0
John, I'm sitting here reading your post and I'm seriously impressed. Very well presented, and especially very well explained for us non-astrophysicists. I was going to ask more or less the same question as Emanuel. One presumes that "fitness" means how close the project's results are to what we see in the sky, as you mention? Basically, how close to the expected results do the actual results have to be to produce significant data? Cheers Chris Seejay **Proud Member and Founder of BOINC Team Allprojectstats.com**
Joined: 11 May 09 Posts: 30 Credit: 81,093 RAC: 0
Hey MW@Home,

I'm actually not very sure of the math behind the fitness calculation, and things like "how close" are pretty arbitrary. I've asked Matt to try to help me answer this. The specifics behind that calculation can also be found in Nate's thesis.

The algorithm we work with is a maximum likelihood calculator, so results with higher (less negative) fitnesses are better: an answer of -2.5 is better than an answer of -3.0, and an answer of -0.0 would be an absolutely perfect answer. On some of my stripes -2.6 is the highest I can get, while others get all the way up to -2.4; it's really variable. It is difficult, if not impossible, to compare these fitnesses between stripes, since the number depends on factors such as star count and stripe volume, which vary from stripe to stripe. Similarly, how close we need to be to get accepted answers differs from stripe to stripe. A good guideline is that our error on a parameter should be at least an order of magnitude smaller than the found value. For example, if we calculate a distance of 29 kpc, we want our error to be on the order of 1 kpc or less. Our errors are based on a Hessian calculation, which accounts for all the variables.

"Uniform density of stars" was perhaps a poor choice of words on my part, since that density is actually modeled by the parameters q and r0. As you get farther and farther from the middle of the Milky Way, space gets less and less stellarly dense (how far and how quickly are predicted by q and r0). So basically the program removes the stream and checks how well the remaining field fits a q and r0 model. Imagine the program pulls a stream out, but it's just a little off from the real stream, so a crescent-shaped overdensity is left behind. The program then goes through the stars, checking whether the density is correct at each position based on q and r0. When it runs into parts of the sky that are too dense (like the crescent) or too sparse, it lowers the likelihood.
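As a toy illustration of Hessian-based errors and the order-of-magnitude guideline, here is a one-parameter sketch: a Gaussian log-likelihood for a single distance, with made-up data and an assumed 5 kpc scatter per point. The error is the inverse square root of the negative curvature (second derivative) of the log-likelihood at its peak; with several parameters the same idea uses the inverse of the full Hessian matrix. None of this is the project's actual likelihood.

```python
import math


def log_likelihood(mu, data, sigma):
    """Gaussian log-likelihood of a single location parameter (constants dropped)."""
    return -sum((x - mu) ** 2 for x in data) / (2.0 * sigma ** 2)


def curvature(f, x, h=1e-3):
    """Central finite-difference second derivative of f at x."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


def hessian_error(f, x_best, h=1e-3):
    """1-sigma error from the curvature of the log-likelihood at its maximum."""
    c = curvature(f, x_best, h)
    if c >= 0:
        raise ValueError("x_best is not at a maximum of f")
    return math.sqrt(-1.0 / c)


# Made-up distances (kpc) spread around 29 kpc, each with an assumed 5 kpc scatter.
data = [29.0 + (i - 49.5) * 0.1 for i in range(100)]
scatter = 5.0
best = sum(data) / len(data)  # for this likelihood the maximum is at the mean
error = hessian_error(lambda m: log_likelihood(m, data, scatter), best)
# Guideline from the post: error at least an order of magnitude below the value.
well_constrained = error <= abs(best) / 10.0
print(f"distance = {best:.2f} +/- {error:.2f} kpc, well constrained: {well_constrained}")
```

Here the estimate comes out at about 0.5 kpc on a 29 kpc distance, comfortably inside the guideline. A sharper likelihood peak (more stars, less scatter) means larger curvature and a smaller error.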
So, in general, you want to take the highest likelihood of all your runs and call that the answer. Unfortunately, we are running into an issue where the mathematical best fit is physically impossible. For example, a stream could fit well mathematically in a way that wouldn't continue into the next stripe, but we know that the stream still exists in the next stripe, so that answer cannot be right! So we need to look at "families" of results and pick the best from the family of reasonable results. If the errors are small and all the findings fit with each other when we look at the big picture, we can assume that we have a proper answer. The fitness is basically just a tool for finding candidate best answers, since sometimes it can lead you to an answer that is impossible.

John Vickers
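Picking the best result from the family of reasonable results could look like the following sketch. The plausibility rule here (a fitted distance within some tolerance of the neighboring stripe's distance), the 3 kpc default tolerance, and the run names are all hypothetical stand-ins for the physical judgment described above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RunResult:
    name: str
    fitness: float       # log-likelihood: higher (less negative) is better
    distance_kpc: float  # fitted stream distance r


def best_plausible(results: List[RunResult], neighbor_distance_kpc: float,
                   tolerance_kpc: float = 3.0) -> Optional[RunResult]:
    """Highest-fitness run among those consistent with the neighboring stripe.

    The consistency rule and the default tolerance are hypothetical.
    """
    plausible = [res for res in results
                 if abs(res.distance_kpc - neighbor_distance_kpc) <= tolerance_kpc]
    if not plausible:
        return None
    return max(plausible, key=lambda res: res.fitness)


runs = [
    RunResult("run_a", -2.40, 45.0),  # best fitness, but inconsistent with the next stripe
    RunResult("run_b", -2.55, 28.5),
    RunResult("run_c", -2.70, 29.2),
]
chosen = best_plausible(runs, neighbor_distance_kpc=29.0)
```

With these made-up numbers, run_a is the mathematical best fit but is rejected because its stream would not line up with the neighboring stripe, so run_b is chosen.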
Joined: 6 May 09 Posts: 217 Credit: 6,856,375 RAC: 0
Excellent questions! The "fitness" is determined by the choice of parameters, and it measures how well those parameters fit the SDSS data. Our goal is to get the fitness as close to zero as we can. As we increase the accuracy of our techniques, the fitness will move closer to zero, and we narrow down the possible characteristics of the stream. On the fitness graphs, closer to zero means "better fitness." When the fitness on a run levels out and stops improving, that run has returned the best result that our techniques and data can provide. Data inherently contains errors due to technological limitations, which we account for using statistical methods, so right now improving our techniques is the key to getting better results.

In terms of data, we use F-turnoff stars (as classified on the Hertzsprung–Russell diagram) from SDSS as our data points. The absolute magnitude, which together with a star's apparent brightness determines its distance, varies from star to star, so we cannot know the exact distance to each star. Instead, we model each star's absolute magnitude as a Gaussian distribution centered on absolute magnitude 4.2. One way to improve the searches, then, is to improve the model for the absolute magnitude distribution. From what we can tell, the real distribution is not a symmetric Gaussian, so one of the projects I am working on is adapting the code to use a double-sided Gaussian magnitude distribution. Another improvement is John Vickers's work rotating the stripe coordinates so that the stripes are perpendicular to the stream (see the details above in the sticky post). The SDSS data, as you can see from the wedge plots, does not make it easy to pick a spot and call it the "stream center," or to make a concrete measurement of the stream's distribution. So in a sense we are actually finding the stream!
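Two pieces of that magnitude model can be sketched directly. The first converts an apparent magnitude to a distance with the standard distance modulus, m - M = 5 log10(d / 10 pc), using M = 4.2 from the post and ignoring extinction (the post doesn't say which photometric band the 4.2 refers to). The second is a double-sided (split) Gaussian, which uses one width below the peak and another above it while staying normalized. The widths in the example are made-up numbers, not the project's values.

```python
import math

F_TURNOFF_ABS_MAG = 4.2  # center of the absolute-magnitude distribution (from the post)


def distance_kpc(apparent_mag, absolute_mag=F_TURNOFF_ABS_MAG):
    """Distance from the distance modulus m - M = 5*log10(d / 10 pc), ignoring extinction."""
    d_pc = 10.0 ** ((apparent_mag - absolute_mag) / 5.0 + 1.0)
    return d_pc / 1000.0


def double_sided_gaussian(x, mean, sigma_low, sigma_high):
    """Split normal pdf: width sigma_low below the mean and sigma_high above it."""
    sigma = sigma_low if x < mean else sigma_high
    # Each half contributes sigma * sqrt(2*pi) / 2 to the area, so this normalizes to 1.
    norm = 2.0 / (math.sqrt(2.0 * math.pi) * (sigma_low + sigma_high))
    return norm * math.exp(-((x - mean) ** 2) / (2.0 * sigma ** 2))


# Hypothetical widths, chosen only to show the asymmetry.
SIGMA_LOW, SIGMA_HIGH = 0.3, 0.6
step = 1e-3
total = sum(
    double_sided_gaussian(i * step, F_TURNOFF_ABS_MAG, SIGMA_LOW, SIGMA_HIGH) * step
    for i in range(10001)
)
print(f"star at m = 20.0 -> about {distance_kpc(20.0):.1f} kpc if M = 4.2")
print(f"pdf integrates to {total:.4f}")
```

Because the distance depends exponentially on M, even a few tenths of a magnitude of spread in M turns into a spread of several kpc at these distances, which is why the shape of this distribution matters for the fit.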
So the project as a whole can be seen as significant: the closer the fitness gets to zero, the better we understand tidal streams, and the more we know about the galaxy.
Joined: 21 Feb 09 Posts: 180 Credit: 27,806,824 RAC: 0
Having a bash at reading Nate's thesis: I'm getting probably 60-70% of it so far, or at least the gist, if not the specific terms. A few things caught my eye:

"In principle, the algorithm could be run on any 2.5° wide great circle on the sky by simply modifying the coordinate transformations necessary for converting from the SDSS great circle system to the more traditional Galactic coordinate system."

"In this piecewise manner, the stream is modeled as a cylinder with length that is limited by the edge of the data in one stripe."

Surely with adjacent stripes you could combine the F-turnoff star data from one stripe and the next to get a better fit in the cylindrical transforms between stripes?

"Utilizing the minimum values stated above <snip> an evaluation an estimated 12,500 minutes, or approximately 200 days, would be required for a single optimization on a single processor."

Unlucky that you weren't able to put any MW@Home results into your thesis, Nate! Also, what processor was Nate talking about with the 200-day simulation =P With the feeder being changed, you can bet the 103.425 TFLOPS value will go up considerably, though that figure doesn't accurately reflect 43,850 hosts: according to BOINCStats, only 19,216 active hosts are giving 140 TFLOPS. Sorry, I'm in paper review mode and I'm just nitpicking :)

A couple of questions: How is the data from SDSS given to you? Just as a table of Galactocentric Cartesian coordinates with the star density at each point? (If so, it may be possible for someone here to develop a screensaver for MW with that data.) Also, given that the Sgr data lies across the galactic centre (if I read that correctly) and is liable to systematic errors, I'm assuming you apply the SDSS data as probability density functions. What determines the PDFs in this sense, given that I assume they wouldn't be symmetrical? Would they be determined by a function of the data points around each one? Does this not introduce additional error, or are you just reducing an equivalent signal-to-noise ratio? I'll continue reading; I'm sure I'll be back with more questions. But thanks for your time if you can answer them :)
Joined: 11 May 09 Posts: 30 Credit: 81,093 RAC: 0
Borandi,

What you've said about combining 2 wedges to make bigger wedges is a good idea, and one I have recently been playing with (all the sgr_*F_ runs; the F stands for Fat, haha). The primary issue is that we don't want the total volume to contain a curved section of stream, since the program removes straight cylinders, and the more nu space we use, the more noticeable the curvature becomes. To get a better fit of the transforms between stripes, I'm thinking that we may recombine all the data and then shift the stripes by 1.25 degrees, so that we end up with stripes in between the old ones. For example, we could combine the top half of stripe 11 with the bottom half of stripe 12 to get a stripe 11.5; this could double our data points.

I honestly have no clue what processor Nate started his work on; I wasn't even at RPI then, haha.

The SDSS data as I received it was a 1.06 GB, 14-column file, with the columns representing things such as coordinates, color, distance, etc. These 1.06 GB represent only stars classified as F-turnoff; I think all of the SDSS data is around 3.3 TB. You can look at their data through the SDSS website, http://das.sdss.org/www/html/, though I'm unsure whether it's available for download. I know Dr. Heidi Newberg worked closely with the people at SDSS, and she may have gotten the data from her colleagues.

Could you please rephrase that bit about sgr data? When you say sgr data, I'm thinking you are referring to the new sgr coordinate system, in which case the data is the same: all the points were just passed through a rotation matrix to put them in a new orientation. And regarding data past the galactic center being liable to systematic error: most of the data we are studying is far from the galactic plane and thus not particularly affected by the center. We avoid the area of the sky where you would need to peer through the galactic center.

Cheers,
John Vickers
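To illustrate why a change of coordinates leaves the data itself the same, here is a sketch that builds a rotation matrix from two elementary rotations and checks that distances between stars, and from the Sun at the origin, are unchanged. The angles and star positions are arbitrary placeholders; the actual SGR rotation matrix isn't given in this thread.

```python
import math


def rot_x(deg):
    """Rotation about the x-axis by `deg` degrees."""
    a = math.radians(deg)
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rot_z(deg):
    """Rotation about the z-axis by `deg` degrees."""
    a = math.radians(deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(a, b):
    """Product of two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def apply(m, p):
    """Rotate the 3D point p by matrix m."""
    return tuple(sum(m[i][k] * p[k] for k in range(3)) for i in range(3))


def dist(p, q):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


# Placeholder angles: NOT the actual SGR transformation.
R = matmul(rot_z(40.0), rot_x(-25.0))
# Sun-centered positions in kpc (made up), so the Sun sits at the origin.
stars = [(3.0, -1.0, 20.0), (10.0, 4.0, -7.5), (-2.0, 15.0, 1.0)]
rotated = [apply(R, s) for s in stars]
origin = (0.0, 0.0, 0.0)
```

Only the axes the stripes are cut along change; every star keeps its distance from the Sun and from every other star, so no information is gained or lost by the rotation itself.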
Joined: 7 Mar 08 Posts: 5 Credit: 231,427 RAC: 0
A very good idea would be to create a MilkyWay progress report discussion thread for questions; otherwise this thread will get as long as the optimized apps thread, you'll easily lose track of the real reports, and it will be harder to read.
Joined: 2 Sep 07 Posts: 2 Credit: 23,845 RAC: 0
I have re-written this as an article on the United BOINC website, "MilkyWay@Home Progress in plotting the stars". Thank you, John Vickers; you write very well about science, and you make it easy to read! You should install blogging software on this website, maybe at MilkywayDomain/blog. Something like WordPress is free and would let you blog properly about your project's science, which would be much better and easier than posting it in forum messages. WordPress: http://wordpress.org/ John. Website Admin, UnitedBoinc.com
Joined: 3 Jan 09 Posts: 139 Credit: 50,066,562 RAC: 0
I am very impressed with your explanation of what we have been crunching. I have a question, though: it appears that the Sagittarius dwarf galaxy has passed through the Milky Way's galactic plane at least twice. Has it been captured, and will it slowly be absorbed into our galaxy?
Joined: 16 Mar 09 Posts: 21 Credit: 52,815 RAC: 0
Great explanation!!!
Joined: 11 Nov 07 Posts: 232 Credit: 178,229,009 RAC: 0
Thank you, John, for the explanation and a funny & interesting read.
Joined: 21 Dec 07 Posts: 24 Credit: 4,567,143 RAC: 0
All very nice. You guys could do a really cool screen saver based on your subject matter. Screen savers are not just toys for us graphics fanatics; they help us sell the idea of BOINC to co-workers who stroll by our desks when they should be working. I got two others in my office to join because of screen savers: one guy was attracted by WCG's Help Conquer Cancer, the other by Einstein@home. You guys should be able to do something really cool. >>RSM http://sciencesprings.wordpress.com http://facebook.com/sciencesprings
Joined: 12 Aug 08 Posts: 253 Credit: 275,593,872 RAC: 0
It is this aforementioned interaction that makes this project my main interest. Thank you to all involved!
Joined: 13 Mar 09 Posts: 2 Credit: 43,534 RAC: 0
Okay, so I am having a hard time logging in, and every time I try to add myself to V club or the MilkyWay team, it just doesn't happen. Can anyone please help me figure out what is going on with my MilkyWay account? Much thanks to all of you who have visited me while in such a crisis. I just wish it could be a different situation. Thank you, Brenda ******L
©2025 Astroinformatics Group