Separation updated to 1.00
I've updated all of the separation applications to 1.00. Changes people might care about:
- The old CAL version is gone; it's replaced with the OpenCL application. On AMD/ATI GPUs older than the 79xx series it uses some hackery to run the same IL kernel as before, so it should be as fast. However, this also means the Radeon 38xx cards aren't supported by the new applications.
- Radeon 79xx stuff should work
- The occasional validate errors from empty / truncated stderr should stop
- AVX will be used if available on Linux and Windows (64 bit only for Windows)
- I've increased the default GPU target frequency, so GPU work should make things less laggy on average. You can now also configure this in the web preferences, so you don't need to use app_info stuff if you want to play with it.
- Partial workaround for high CPU usage with recent Nvidia drivers.*
As usual, post any problems you run into here.
* It should cut down on the CPU usage a bit while not sacrificing too much. I would recommend not using it unless you are very unhappy with the CPU usage on Nvidia. There are options to change the polling mode if you want to lower the CPU usage further while not slowing it down (--gpu-wait-factor (default = 0.75) and --gpu-polling-mode (default = -2) work similarly to how they did with the old CAL one, but slightly differently). With the default of -2 it will use mode -1 unless it is an Nvidia driver newer than the one that introduced the high CPU issue, in which case it will use mode 0. Mode -1 uses the correct waiting method, mode 0 uses the correct waiting method with an initial sleep based on time estimates, and modes > 0 are a polling period in milliseconds. The wait factor is a sort of correction to the time estimate used for the initial wait. The default is 0.75, meaning it waits for 75% of the estimated time before trying to poll.
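To make the mode selection above concrete, here is a minimal Python sketch of how the described defaults could fit together. It is not the application's actual code: the function names, the `driver_affected` flag, and the choice to skip the initial sleep for modes > 0 are assumptions made for illustration. Only the mode numbers and the 0.75 default come from the description above.

```python
def resolve_polling_mode(mode, driver_affected):
    """Turn the default mode -2 into a concrete mode.

    driver_affected is a hypothetical flag meaning "Nvidia driver newer than
    the one that introduced the high CPU usage issue".
    """
    if mode < -2:
        raise ValueError("polling mode must be >= -2")
    if mode == -2:
        return 0 if driver_affected else -1
    return mode


def wait_plan(mode, estimate_ms, wait_factor=0.75, driver_affected=False):
    """Return (initial_sleep_ms, poll_period_ms) for one GPU kernel wait.

    poll_period_ms is None when the normal blocking wait is used instead of
    periodic polling.
    """
    mode = resolve_polling_mode(mode, driver_affected)
    if mode == -1:
        # Plain blocking wait, no initial sleep.
        return (0.0, None)
    if mode == 0:
        # Sleep for a fraction of the estimated run time, then wait normally.
        return (wait_factor * estimate_ms, None)
    # Assumption: modes > 0 poll every `mode` milliseconds with no initial sleep.
    return (0.0, float(mode))


print(wait_plan(-2, 1000.0))                        # default on an unaffected driver
print(wait_plan(-2, 1000.0, driver_affected=True))  # default on an affected Nvidia driver
print(wait_plan(5, 1000.0))                         # explicit 5 ms polling period
```

With a 1000 ms estimate and the default wait factor, the mode 0 case sleeps for 750 ms before it starts waiting on the GPU.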
Remainder of the Double Credits: A Valentine's Day Present
A few weeks ago, we ran double credits in order to make up for a database error that occurred earlier in 2011. Unfortunately, the server crashed with just 12 hours left until the make-up credits were finished.
Now that the new server is up and running (and stable! very stable!), and everyone here is back from travels and winter break, we are ready to finish running those last 12 hours of double make-up credits.
If there are no significant objections, we'll run the double credits on St. Valentine's Day (February 14), from 10:00 am until 10:00 pm, US Eastern time (EST). This corresponds to 3pm (Feb14)-3am (Feb15) UTC.
Happy crunching!
5 Feb 2012 | 3:25:39 UTC
· Comment
Raised GPU work limits
I've raised the work limit to 20 tasks per GPU to see what happens.
7 Jan 2012 | 20:03:17 UTC
· Comment
New server test post
Everything should be running on the new server now.
5 Jan 2012 | 8:10:11 UTC
· Comment
Moving to a new server
Hi Everyone,
The new server is in, and we'll be migrating everything over to it this week. We expect to have everything done Tuesday or Wednesday next week. We are expecting to be able to move the entire database over, so we don't expect that you'll lose any work or credit during the transition. Sometime along the way we'll be shutting down the old server for a period of time while we migrate the database over, and redirect milkyway.cs.rpi.edu over to the new machine.
We'll be trying to make it as seamless and painless as possible for everyone. Hopefully this will solve a lot of our crashing issues as well.
--Travis
27 Dec 2011 | 19:39:05 UTC
· Comment
N-body updated to 0.80
I've updated all the N-body applications to 0.80.
These add a new kind of likelihood calculation (and soon validation method) we're probably going to switch to using that will allow adding GPU versions sometime in the near future (the GPU version is pretty much ready to go, but we need to fiddle with these kinds of issues first).
Since the n-body is statistical, for any simulation the result can be anywhere in a distribution. So far for validation we have relied on the results from any 2 systems being identical, but this is more problematic when you try to include GPUs. The new results should be "fuzzier" and more resistant to some other types of potential problems the old likelihood calculation could have.
9 Dec 2011 | 3:27:35 UTC
· Comment
New Server
A new server has been ordered. Because delivery times are slow right now (flooding in Thailand has delayed disk shipments), the estimated ship date is December 29, which means we do not expect to get the new server running until the middle to end of January 2012. We will keep watching the current server for repeats of previous software glitches, and hopefully will keep that server running smoothly for the next two months.
Thanks,
Heidi
2 Dec 2011 | 21:24:21 UTC
· Comment
Milkyway@home back up
Matthew A was able to get the database stabilized, and it looks like we are running smoothly again.
It appears that the SQL query threads were not timing out properly, and were jamming the CPU with worthless tasks. This has been remedied.
While we expect to run smoothly for now, other database issues may pop up - the FreeBSD operating system that is currently running Milkyway@home is old, and notoriously incompatible with modern SQL. Of course, we hope to not have more problems.
We are ordering a new server, and should transition over to it in mid-December. We'll post the specs once we finalize them, and keep you posted.
Cheers,
Matthew Newby
30 Nov 2011 | 21:57:32 UTC
· Comment
Recent Problems with the Database
Excuse us for a while...
It appears that the server became unresponsive shortly after everyone left for Thanksgiving break. We've determined that the SQL database is melting down and monopolizing system resources, and we are currently pursuing a solution.
If everything goes well, Matthew A. will have us back up tonight. If not, it could be another day or 2 before things stabilize.
We are in the process of ordering a new server - we hope to transition over the winter break.
We'll keep you posted.
Cheers,
Matthew N
29 Nov 2011 | 21:08:38 UTC
· Comment
Separation Status and New Runs
The separation runs that analyse the northern galactic cap (Sloan Digital Sky Survey, or SDSS Stripes 9-27) have nearly finished - soon we'll be able to wrap all of that data together, and with data from Nathan Cole's PhD thesis, we'll have a nice scientific paper out in the next few months. We'll keep you notified.
That doesn't mean that the separation code will be retired - far from it. We're starting new separation runs (the ones with "mix" in the name) that are testing the robustness of the separation code. We'll be running simulated data sets alongside real data to test several aspects of the stream-fitting process. The main questions are: What would Milkyway@home do if something different from our model exists in the data? And how different would that something have to be in order to modify our results?
Also, the release of SDSS Data Release 8 earlier this year gives us access to several stripes of data in the southern galactic cap. We are currently processing this data (it's not as continuous as the northern data, so we have to cut out areas with spotty data), and it will eventually run on Milkyway@home.
We are also looking to improve the code on Milkyway@home. My brand new paper studies the distribution function of stars in the Milky Way halo, and provides a new convolution kernel that should make Milkyway@home more effective. There are a few other results from that paper, and inside work, that we would like to implement in the Milkyway@home searches. When we update these functions, we will run stripes over again and look for differences.
Long story short: We're almost done with one part of Milkyway@home's mission, but there's still plenty to do. Happy Crunching!
Cheers - Matthew N
17 Nov 2011 | 0:49:48 UTC
· Comment
server back up
Looks like there was a problem with one of the hard drives. We've replaced it and it looks like the database is back into RAM, so hopefully we'll have some smooth sailing from here on out.
3 Nov 2011 | 19:24:33 UTC
· Comment
server issues
We'll be taking the server down tomorrow to figure out what this new hardware problem is. It looks like some issue with the interconnect is causing the server to crash repeatedly.
2 Nov 2011 | 15:46:27 UTC
· Comment
milkyway back up
Looks like the server had a hard crash. Not quite sure from what. A couple tables in the database were corrupted, but I think I've gotten them fixed and things should be working again. I'll keep posting updates here.
31 Oct 2011 | 20:18:48 UTC
· Comment
fixed the problem with no new work
Hi Everyone,
The problem should be fixed -- I put in some changes yesterday that limit the maximum number of workunits that can be in the database at any given time; since the server had around 300,000 WUs in the database when I made the change, I put the cap at 400,000. Looks like you guys requested a ton more work and there are 400,000 WUs in the database at the moment. I'm going to increase the cap to 500,000, which hopefully should be more than enough.
The reason I'm putting the cap in is to prevent the problem we had earlier when I had to wipe clean the database. For whatever reason sometimes when the database is being backed up to disk or if there's a problem with the feeder, the code that calculates the number of workunits available fails and the work generator continues to generate new workunits without stopping -- flooding the database. This extra check should prevent that from happening.
Anyways, work should be flowing again shortly, and I'm working on a better solution to this problem than just a hard cap on the number of workunits in the database (although we probably need that to keep the database from getting unresponsive).
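As a rough illustration of the hard cap described above, here is a minimal Python sketch. It is not the actual work generator code: the function name and arguments are hypothetical, and it assumes the generator can get a plain count of workunits in the database that is independent of the "workunits available" calculation that sometimes fails.

```python
def workunits_to_generate(in_database, wanted, cap=500_000):
    """Return how many new workunits may be created without exceeding the cap.

    in_database: current number of workunits in the database (assumed to be
                 a direct count, not the feeder's availability estimate)
    wanted:      how many workunits the generator would otherwise create
    cap:         hard limit on workunits in the database
    """
    if in_database < 0 or wanted < 0:
        raise ValueError("counts must be non-negative")
    room = max(0, cap - in_database)
    return min(wanted, room)


# With the original cap of 400,000 and a full database, nothing is generated.
print(workunits_to_generate(400_000, 1_000, cap=400_000))
# Raising the cap to 500,000 leaves room for new work again.
print(workunits_to_generate(400_000, 1_000, cap=500_000))
```

The first call shows why no new work was going out once the database reached the 400,000 cap. The second shows the effect of raising it.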
--Travis
25 Oct 2011 | 19:41:28 UTC
· Comment
Making Milkyway@home Better
Recently Milkyway@home has seen some major crashes, and there have been stability issues popping up. There have been plenty of rumors running around the message boards as to what the issue is. There has been discussion, from both Milkyway crunchers and staff, of buying a new server. The Milkyway staff has been discussing this issue for a while, and we've recently come to some conclusions.
There's a strong consensus that the Milkyway server hardware is not to blame. It appears that the OS is buggy, being fairly old and forced to run modern database software (among other things).
In order to update the OS (and other software), we would need a new server so that the old one could continue to run during the install and upgrade process. We have access to funding for a new server, but the issue is that we do not have the staff to oversee the installation of BOINC/MW and the eventual migration, especially since our computing support staff has seen major cutbacks in the recent past.
The donations from Milkyway crunchers have already funded one student's work on Milkyway@home during the last summer. In addition to funding that student, donations have just recently funded a Milkyway development machine which will soon be capable of compiling all of the Milkyway client software across multiple architectures simultaneously. We take your donated money seriously, and will ONLY use it to fund students or Milkyway-related hardware purchases.
So while we appreciate the recent drive by our users to donate or raise money for new hardware, it is actually support staff that we need the most. Direct monetary donations to Milkyway will be the most useful to us, and can be made through our donation page.
We are looking to hire a part-time computer scientist to clean up the server, work on server stability, and create a better experience for our crunchers. This person might also assist in migrating us over to a new server, when the time comes. We only need to raise a few hundred more dollars to meet the salary goal. We also need to raise money for next summer's salaries.
ALSO, we will be creating a hardware donation page similar to SETI@home's. Matt A. needs a few GPUs to continue his dev work, and he is also looking to build a twin to the new Milkyway dev machine. I'll let him provide the details, and all of this will be posted on the new page. Look for it in the next few days.
Donations of old GPUs will always be useful and welcome to Milkyway@home. When the hardware donation page is up, we will provide an address to which these may be sent. I believe that all donations will be tax-deductible.
Thank you for providing your valuable cycles to us, as they provide us with the ability to crunch enormous amounts of data and make Milkyway@home one of the top BOINC projects. This is all we really need to keep Milkyway@home running and successful. But if you would like to help make Milkyway@home BETTER, please consider making a small donation to help support our staff.
Cheers,
Matthew N
12 Oct 2011 | 21:45:36 UTC
· Comment
sending out work again
Think I have things squared away on our end. Let me know if you're having any problems with the workunits being sent out.
11 Oct 2011 | 4:15:28 UTC
· Comment
feel free to cancel any in progress WUs
Looks like I'm going to have to drop the result and workunit tables to get the database working again. Feel free to cancel any workunits you have in progress. I apologize for this, but it's looking like it's the only way to get the project back on its feet in any reasonable amount of time.
9 Oct 2011 | 19:03:00 UTC
· Comment
We're back! (somewhat)
As seems par for the course, if I leave for the weekend and don't have internet access, milkyway likes to crash and go into panic mode. It looks like we had a crash while some process had the database tables locked. This caused everything to freeze and back up. I'm waiting for some queries to finish up in the database, then we should be able to turn everything back on. Things might be a bit slow until then.
Thanks for hanging in there while the site was down.
--Travis
9 Oct 2011 | 17:13:34 UTC
· Comment
upgrading assimilator/validator update
Just an update -- I've pretty much gotten the new assimilator, validator and work generator daemons working (they're going to be separate now). I'm probably going to be testing work generation first (with the old assimilator/validator) to make sure things are working correctly -- probably tomorrow night I'll be sending out test workunits. After I get those sent out and coming back correctly, I'll start testing the new assimilation and validation.
--Travis
3 Oct 2011 | 19:27:24 UTC
· Comment
upgrading assimilator and validator
So I've finally gotten settled and have been making progress with the bug where some workunits are terminating early because of maximum CPU time elapsed. I'll probably be bringing things up and down a bit over the next few days as I test and debug the newer assimilator and validator.
16 Sep 2011 | 17:19:06 UTC
· Comment
Server Components Restored
One of the files on the server had its permissions corrupted, which brought most of the components to a halt. I have resolved the issue, and everything appears to be running fine now.
-Matthew N
15 Sep 2011 | 18:18:37 UTC
· Comment
Server Components Down
Several of the Milkyway@home server components have gone down. We're working to get them back up.
-Matthew N
15 Sep 2011 | 17:28:15 UTC
· Comment
Settling at UND
Hi Everybody,
I've finally somewhat gotten settled out here in UND, and have time to start working on MilkyWay a little bit more seriously again. ;) I've come up with a fix for the maximum time limit elapsed bug, and that's my main priority right now. It's going to take a bit of coding on my part but I'll keep you posted.
--Travis
22 Aug 2011 | 22:07:58 UTC
· Comment
Happy Wednesday!
Hello friends....
It's Wednesday and all is well. The assimilators appear to have been down briefly earlier but they are up now.
Have a great rest of the week!
Blurf
17 Aug 2011 | 16:49:00 UTC
· Comment
8/12/2011-Friday Update
Just updating the News Section.
Friday about Noon EST and all servers are running at this point.
Have a great weekend!
Blurf
8 Aug 2011 | 23:54:52 UTC
· Comment
maximum time limit exceeded bug
It seems like people are still (sigh) having this problem. Let me know if you're seeing it (and give me a host id) so I can try and debug it.
--Travis
25 Jul 2011 | 18:04:26 UTC
· Comment
back up (hopefully)
Got some news from labstaff:
back online assuming that the portable AC is enough to keep up with the added load until the real unit is repaired
Things have been really hot (close to 100 degrees F) here in Troy, and it looks like the air conditioning unit cooling the server room milkyway@home was in totally broke. They have a portable AC in as a replacement and things have cooled off slightly, so we should be up and running unless another heat wave comes through. No ETA on the real unit being repaired yet but we will keep you posted...
I'm also in the process of moving out to my new position at University of North Dakota these coming weeks, so if things are a little slow on my end getting things working or fixing problems, that's why.
25 Jul 2011 | 16:23:00 UTC
· Comment
Make-Up Credits to Run Wednesday and Thursday
Hello,
We'll be running two days of double-credits this week on Wednesday and Thursday to make up for outages earlier in the summer.
Cheers,
Matthew N
18 Jul 2011 | 23:11:26 UTC
· Comment
DOUBLE CREDIT FUNDRAISER CANCELLED
(Posting this to get it up as a news item; it is also in the original thread below this one in the News section.)
Dear MilkyWay@home volunteers,
It appears that I have made a misstep in management of MilkyWay@Home, regarding double credits for fundraising. I believed that this had been done before. I have since discovered that not only was I incorrect, but it is also inconsistent with the BOINC credit system. As volunteers, you have donated enormous computational resources to my group, and on top of that have created a powerful community that not only helps us with software but also teaches us how to be responsible members of your community. We will be running double credit two days this week to make up for lost credit in the past, but we will not be running a double credit fundraiser in September. I apologize for any inconvenience this has caused. I will need a little time to understand what the right way is to do this, so stay tuned.
Best Wishes,
Prof. Newberg
18 Jul 2011 | 15:00:37 UTC
· Comment
DOUBLE CREDITS Fundraiser - Request for Funding for MilkyWay@home
The big news! We have been planning a double-credit fund-raising event here at Milkyway@home. To explain the details, here's a letter from Prof. Heidi Newberg:
Dear MilkyWay@home volunteers,
I want to first thank all of you for helping us study the Milky Way galaxy by donating your CPU and GPU cycles to our project. Some of you have also contributed your computational expertise, or contributed cash or GPU cards to our projects. We appreciate all of these gifts. MilkyWay@home CPU/GPU cycles have contributed to defining the distribution of stars in the Sagittarius dwarf tidal stream. Stars in this stream were pulled off of the Sagittarius dwarf galaxy, which is in orbit around the Milky Way, by the Milky Way's gravity. In addition, novel computational techniques have been developed for asynchronous parameter searches using a heterogeneous volunteer network.
Over the next year we plan to modify MilkyWay@home to measure the overall distribution of stars in the Milky Way halo. The magnitude of your generosity has made it possible for us to dream bigger dreams. We are starting to transform MilkyWay@home to do n-body simulations of several dwarf galaxy tidal disruptions at the same time. We can then change the model parameters for the Milky Way's gravity and for the properties of the dwarf galaxies so that they match the properties of actual tidal streams in the Milky Way. This is a big job and a new field, and we would not be able to contemplate doing this if you were not volunteering the equivalent of one of the largest supercomputers in the world to this task. Presently, we have taken only the first steps towards this, by simulating the disruption of a single dwarf galaxy and varying only the parameters of the dwarf itself. We have quite a lot to learn in developing the correct techniques for solving this problem, but if we are successful we will eventually find out how much mass (mostly dark matter) is in the Milky Way, and how it is distributed in space.
A year ago, I was successful in getting an NSF grant funded that supports my astronomy research on the Milky Way, and a significant part of that grant was justified by the use of your volunteer computing cycles. We are still seeking grant funding that will support the computer science research side of MilkyWay@home. A few years ago when we were stretched for funding between grants, we raised about $7000 in private donations, most of which were from MilkyWay@home volunteers. The remaining portion of this funding is currently being used on summer salary to support Matthew Arsenault, who is working on converting our n-body code so that it will run on GPUs. Matthew recently received his BS degree from RPI, and has been working on MilkyWay@home code for more than a year. He will be entering our PhD program this fall. He will be supported as a teaching assistant during the coming school year, but I am again turning to the volunteers to help me support him in summer 2012. Currently, I do not have funding to guarantee that I can support him. The cost of a graduate student summer salary plus institutional overhead is about $10,000. Of course, we can always do more with extra funding, but students are my top priority. Matthew has done a fabulous job implementing our current n-body simulations.
We are therefore launching the following "thank you" and fundraising campaign. On September 1, 2011, we will run MilkyWay@home with double credit. Hopefully, this will at least make up for some of the credit lost when MilkyWay@home was not operational for software, hardware, or air conditioning emergencies. We understand that these have been frustrating to some users, but be assured that they were much more frustrating for us.
In addition, we will run MilkyWay@home with double credit for one additional day for every $500 raised between now and September 1, 2011. Donations can be made with paypal or credit card online at:
http://www.dudleyobservatory.org/MilkyWayAtHome/MilkyWayAtHome2.html
Payments by check can be made out to Dudley Observatory and mailed to: Dudley Observatory, 107 Nott Terrace, Suite 201, Schenectady, NY 12308, with MilkyWay@home in the comments line of the check, or with a letter that explains that the donation is to MilkyWay@home. For donations in other forms, you may contact me directly (heidi@rpi.edu). The donations are tax deductible, and you will receive a letter from Dudley Observatory that can be used for tax purposes. Donations will be used to fund graduate or undergraduate students, travel, publication, materials, or equipment costs related to Milky Way research and education. For gifts over $1000, donors may specify a person, family, or organization to be acknowledged in research publications which are partially supported by this donation, and to receive a signed, printed copy of the publication.
Again, we thank you for your support of our research project. We have been overwhelmed by the number of people who are willing to contribute.
Sincerely,
Prof. Heidi Newberg
Further questions and comments are welcome in this thread.
14 Jul 2011 | 17:54:06 UTC
· Comment
Recent Outage
Milkyway@home was down for a few hours yesterday. This was due to several networks at RPI being retuned, and Milkyway@home lost its proper pointing information when this happened. This was a minor outage and shouldn't happen again.
Thank you for your patience.
-Matthew N
14 Jul 2011 | 16:30:49 UTC
· Comment
ps_separation_17_test
A few people have been seeing workunits error out with an error like:
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Error reading astronomy parameters from file 'astronomy_parameters.txt'
Trying old parameters file
Error reading into stream_max
Error reading into optimize_parameter
Error reading stream_weight for stream 2
ps_separation_17_test uses a new parameter file which we hope will fix these problems. Let us know here if you're having any problems with it.
28 Jun 2011 | 6:07:55 UTC
· Comment
started a ps nbody search
I started up a 'ps' nbody search. Let me know if you're having any problems with these workunits.
28 Jun 2011 | 4:45:05 UTC
· Comment
started some new searches
I've taken down the 'de' searches and started up some 'ps' searches. Let me know if you're having any problems with them.
27 Jun 2011 | 23:51:05 UTC
· Comment
Added FreeBSD applications
I added both Nbody and Separation applications for FreeBSD amd64.
24 Jun 2011 | 6:07:29 UTC
· Comment
fix for old clients being awarded 0 credit for nbody workunits
Looks like the problem has been fixed. Older clients should now be receiving a more realistic amount of credit for the nbody workunits. Let me know if you're still having any problems.
23 Jun 2011 | 17:18:02 UTC
· Comment
Any remaining major credit or application problems?
I should have the problem with a few of the nbody workunits awarding 0 credit fixed today. Are there any other significant problems in credit or with the applications?
If not, I think we'll have the double credit days next Monday and Tuesday, as things have finally been running smoothly. Or would it be better to have them on a weekend?
23 Jun 2011 | 16:52:55 UTC
· Comment
maximum time limit elapsed bug
Is anyone still having this? Please let me know so I can check out the entries for your hosts in the database.
22 Jun 2011 | 18:08:26 UTC
· Comment
Nbody updated to 0.60
A pretty minor update. Report problems here.
I added a 32 bit OS X one, although it doesn't get multithreaded.
Update: For Windows, I got the names of the DLLs backwards for BOINC the first time for 0.60. Then BOINC's copying of the DLLs wasn't working for no particular reason. Then it turns out one of the DLLs I uploaded somehow was corrupted, but in any case it seems to now be working. So the Windows ones now say version 0.66, but nothing actually changed from the earlier ones in case you noticed those.
17 Jun 2011 | 18:59:35 UTC
· Comment
Separation updated to 0.82
Minor update to fix (hopefully all of) the problems with 0.80
Edit:
Another minor update for the regular 32 bit Windows application to 0.84. This is basically the same thing rebuilt so it should work on Windows 2000
13 Jun 2011 | 13:18:25 UTC
· Comment
Separation updated to 0.80
I've updated 15 of the 16 different versions of the separation for all systems. Post here if you have any problems (particularly if you have a problem with the new 32 bit Linux GPU applications...)
New features:
- Faster CPU calculations because of SSE2 intrinsics from Crunch3r. This should get a bit faster later when I get to some other stuff.
- The binaries now include the different SSE levels for the critical function (e.g. x87/SSE2/SSE3) all in one, and will use the appropriate one for the detected CPU's capabilities, so you don't need any special __sse2 version or anything. The problems on systems without SSE2 which happened sometimes for the GPU applications should also be fixed.
- Checkpointing for GPUs, for both OpenCL (Nvidia) and CAL (ATI/AMD). It will not checkpoint more often than once per 10% of progress, so it might not checkpoint as frequently as your settings specify if you have a particularly slow GPU. GPU checkpointing is a bit slow when it does happen, so if you don't want it, you can disable it with the flag --gpu-disable-checkpointing.
- The Nvidia OpenCL platform should now always be used, avoiding issues if you had both AMD's and Nvidia's installed
- More reliable chunking for different GPUs. I didn't play with the time estimates so much, but I think the actual run times should now be closer to the estimates. There should also be fewer cases where a GPU will end up using the slowest possible work size option. I'll probably have to fiddle with this some more if people still aren't happy with the lag.
- Workaround for a Catalyst driver problem where sometimes the GPU was reported as 0 MHz, resulting in much more lag, which I think was some people's problem.
- New flag that some people requested: --process-priority (-b). On Windows this is 0 (lowest) - 4 (highest) for overriding the process priority. On Linux this is the nice value.
- CAL specific: Removed --responsiveness-factor flag. Use --gpu-target-frequency or --non-responsive instead depending on what you want to do.
- Actually updated the 32-bit OS X application
- In the event of a crash on Windows, you should no longer be bothered with useless crash dialogs
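The GPU checkpointing rule in the list above can be pictured with a small Python sketch. This is only an illustration under assumptions, not the application's code: the function name and arguments are hypothetical, progress is taken to be a fraction between 0 and 1, and the 10% rule is read as "at least 10% progress since the last checkpoint".

```python
def should_checkpoint(progress, last_checkpoint_progress,
                      client_wants_checkpoint, min_step=0.10):
    """Decide whether to write a GPU checkpoint now.

    progress, last_checkpoint_progress: fractions of the workunit done (0..1)
    client_wants_checkpoint: True when the user's checkpoint interval has
                             elapsed (hypothetical flag)
    min_step: minimum progress between checkpoints (10% by default)
    """
    enough_progress = (progress - last_checkpoint_progress) >= min_step
    return bool(client_wants_checkpoint and enough_progress)


# On a slow GPU the interval may elapse after only 5% progress: skipped.
print(should_checkpoint(0.30, 0.25, client_wants_checkpoint=True))
# After 25% progress since the last checkpoint it goes ahead.
print(should_checkpoint(0.50, 0.25, client_wants_checkpoint=True))
```

This is why a slow GPU can end up checkpointing less often than the configured interval suggests.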
12 Jun 2011 | 21:03:19 UTC
· Comment
another outage this week
It seems like RPI is still working on the cooling issue, so milkyway may be going up and down this week. Labstaff contacted us and said everything should be back to normal by Thursday. If things are running smoothly after Thursday and the weekend, we'll have the double credit days next week.
--Travis
7 Jun 2011 | 17:44:53 UTC
· Comment
Outage
Sorry about the outage. RPI had some cooling issues, and many systems (including milkyway) were down.
6 Jun 2011 | 23:45:14 UTC
· Comment
testing a new search 'ps_test_1'
I'm testing a new framework for running our searches, which should make them a bit more reliable and reduce server load. I've sent out a small batch of 1000 (and won't be generating new ones) to see if this works. Let me know if they crunch correctly for you, they should be named 'ps_test_1_X' where X is some number.
25 May 2011 | 11:25:34 UTC
· Comment
trying another fix for maximum time elapsed bug
Let me know if the error is still persisting. Sorry this is taking so long.
24 May 2011 | 4:18:35 UTC
· Comment
another attempt at the max time limit elapsed fix
So I've tried yet another fix. For newly generated workunits, let me know if anyone is still getting the problem with maximum time limit elapsed.
If no one is, then we'll probably have the double credit days sometime next week or the week after.
20 May 2011 | 20:16:46 UTC
· Comment
more maximum time limit elapsed bug stuff
Tried yet another fix on the server end of things. Let me know if you're still seeing maximum time limit elapsed errors.
18 May 2011 | 21:07:56 UTC
· Comment
another change for the maximum time limit elapsed bug
I've tried yet another fix (the rsc_fpops_bound is now 10000 times higher than our estimate). I'm really hoping this should cover most everyone that's still having workunits immediately error out. Let me know if it works.
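For context, BOINC workunits carry an estimated operation count (rsc_fpops_est) and an upper bound (rsc_fpops_bound), and the client aborts a task with a maximum elapsed time error once it has run longer than roughly the bound divided by the speed the client assumes for the device. The Python sketch below works through that arithmetic. The numbers are made up and the function names are only for illustration.

```python
def fpops_bound_from_estimate(rsc_fpops_est, factor=10_000):
    """Bound on floating point operations, as a multiple of the estimate."""
    return rsc_fpops_est * factor


def max_elapsed_seconds(rsc_fpops_bound, device_flops):
    """Approximate run time after which the client gives up on the task."""
    if device_flops <= 0:
        raise ValueError("device_flops must be positive")
    return rsc_fpops_bound / device_flops


# Hypothetical workunit: 1e13 estimated operations on a device the client
# believes can do 1e11 operations per second (about 100 s of work).
est = 10**13
flops = 10**11
bound = fpops_bound_from_estimate(est)

print(est / flops)                        # estimated run time in seconds
print(max_elapsed_seconds(bound, flops))  # time limit in seconds
```

If the client's speed figure for a device is far too high, the limit shrinks by the same factor, which is one way a task could hit the limit almost immediately. A larger multiplier on the bound leaves more room for that kind of error.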
9 May 2011 | 10:24:30 UTC
· Comment
maximum time limit elapsed bug
It looks like more than a few people are still having a problem where clients are aborting workunits early because of max time limit elapsed.
If you could post here with any information about your client (and if you're using an anonymous platform or anything like that), I'd appreciate it -- so it's all in one place for us to look over.
Is anyone having this happen for the CPU applications, or is it just limited to the GPU applications?
8 May 2011 | 6:26:04 UTC
· Comment
fix to the invalid workunit problem
I think I've fixed the problem with workunits all being marked invalid. Let me know if newly reported workunits are validating ok.
7 May 2011 | 7:52:11 UTC
· Comment
n-body workunits with maximum time elapsed
I've been pretty busy trying to track down this bug (I've seen the same thing happening to us over at DNA@Home), and tried a couple changes to the database today. Is anyone still seeing this issue, or did that happen to fix it?
7 May 2011 | 5:12:52 UTC
· Comment
animation of the n-body simulations
I got this from Ben Willet today. It's an animation which gives an example of what some of the n-body simulation workunits you guys are running are simulating.
And his caption from his thesis:
This animation shows N-body simulations of the Sagittarius (large),
Orphan (medium) and GD-1 (small) stellar streams. The Milky Way Galaxy is
shown edge-on, with the Sun located on the left side of the disk. The
legend shows the +Z direction, out of the Galactic disk. The stellar
streams are created by allowing spherical collections of particles to
evolve for four billion years within the Milky Way's gravity. The
Sagittarius and Orphan streams are formed from dwarf galaxies, whereas
the GD-1 stream is formed from a globular cluster. The Orphan Stream has
been used by Rensselaer researchers to imply a total Galactic mass less
than originally believed.
4 May 2011 | 22:12:31 UTC
· Comment
testing new credit policy on nbody workunits
I finally have BOINC's new credit policy implemented for the nbody workunits. It's to the point where I need to test it live. If you're having any problems with the nbody workunits being validated incorrectly, or being assigned weird amounts of credit (note -- the new credit policy is adaptive so the credit awarded can change over time, and it might take a while to stabilize), this is the place to let me know.
Once I've gotten this debugged and working correctly, we'll have days of double credit as promised for all our recent outages, and to make up for any weirdness in testing the new credit policy with the nbody workunits.
So I'd like to thank everyone for their patience in dealing with us getting everything updated and working correctly these last few weeks.
--Travis
29 Apr 2011 | 19:55:28 UTC
· Comment
Server back up
Looks like moving the server was successful. Let us know if you're having any problems with connecting the BOINC clients (it has a new IP address).
Hopefully the new home for MilkyWay will be a bit more stable. It has better air conditioning and a few other things.
25 Apr 2011 | 20:31:44 UTC
· Comment
server being moved
Our server is being moved; hopefully we will be back up in a couple of hours.
25 Apr 2011 | 14:52:45 UTC
· Comment
ATI application updated to 0.60
This should fix the slowdown on Windows if you have a high CPU load.
You can now configure the polling mode sort of like what the old one had.
The --gpu-target-frequency <number> is similar to the -f flag the old one had. The default is 30 Hz. This can be used in place of the --responsiveness-factor one I added in the last minor release (or in addition to it, although I don't know why you would want to). I would recommend using this one instead, and I'll probably remove the other at some point.
The --gpu-polling-mode <int> is similar to the -b flag the old one had. A negative integer will use busy waiting and have a high CPU load (like what always happened in 0.58). 0 will use the same method that was used in 0.59. A positive integer sets the polling period in milliseconds. The default now is to poll every 1 ms, which seems to have solved the slowdowns with a high CPU load without increasing the CPU usage much.
Update: Now at 0.62 since I'm good at screwing up
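The two flags described above can be summarized with a small sketch. This is only an illustration of the behavior as described in this post, not the application's code: the function names are invented, and reading the target frequency as "aim for each piece of GPU work to take about 1/frequency seconds" is an assumption.

```python
def chunk_time_budget_ms(target_frequency_hz=30.0):
    """Assumed meaning of --gpu-target-frequency: time budget per piece of work."""
    return 1000.0 / target_frequency_hz


def describe_polling_mode(mode=1):
    """Interpret --gpu-polling-mode the way the 0.60 post describes it."""
    if mode < 0:
        return "busy wait"            # high CPU load, like 0.58
    if mode == 0:
        return "0.59 method"          # whatever 0.59 did
    return f"poll every {mode} ms"    # positive values are a period in ms


if __name__ == "__main__":
    print(chunk_time_budget_ms())     # budget at the default 30 Hz
    for mode in (-1, 0, 1, 5):
        print(mode, "->", describe_polling_mode(mode))
```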
12 Apr 2011 | 19:12:13 UTC
· Comment
increased WU limits
I've increased the workunit limits some more, seeing as things stabilized and the server still seems to be running nice and snappy. Right now the total limit is up to 48, keeping the max of 3 per CPU, and with a new max of 12 per GPU. I think the server should be able to handle this, given the use of the new applications not hammering the hard disks nearly as much. If things still stay nice and smooth for the next few days I'll try increasing them again.
11 Apr 2011 | 3:40:20 UTC
· Comment
ATI application updated again
I've updated the new ATI application to 0.59. This fixes the 100% CPU problem, and adds a command line flag -r <some number> or --responsiveness-factor <some number>. This number just multiplies the estimate for how long things are expected to take to decide how to break things into smaller pieces to allow the screen a chance to redraw. Numbers greater than 1 should make things more responsive. 0 ignores any need for responsiveness for maximum speed.
I'm not exactly happy with how it decides to break the problem down to keep the screen responsive; I think how it's done now can still cause slowdowns on some GPUs (and is kind of fragile), and you shouldn't have to do anything to keep things responsive. I'm not really sure the best way to do it. If you feel you actually need to use this, complaints here with what GPU you have might be useful.
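For anyone wondering what the responsiveness factor does in practice, here is a minimal sketch of the idea. It is a guess at the shape of the calculation, not the real code: the redraw budget, the function name, and the rounding are all assumptions. The only parts taken from the description above are that the factor multiplies the time estimate and that 0 disables splitting.

```python
import math


def number_of_chunks(estimated_seconds, responsiveness_factor=1.0,
                     redraw_budget_seconds=1.0 / 30.0):
    """How many pieces to split one step into so the screen can redraw.

    redraw_budget_seconds is a hypothetical target for how long the GPU
    may be busy before the display gets a turn.
    """
    if responsiveness_factor == 0:
        # 0 ignores responsiveness entirely: run the step in one piece.
        return 1
    # The factor simply scales the time estimate before splitting.
    scaled_estimate = estimated_seconds * responsiveness_factor
    return max(1, math.ceil(scaled_estimate / redraw_budget_seconds))


if __name__ == "__main__":
    for factor in (0, 1.0, 2.0):
        print(factor, number_of_chunks(1.0, factor, redraw_budget_seconds=0.25))
```

With a larger factor the step looks longer than it really is, so it gets cut into more and smaller pieces, which is why numbers greater than 1 should feel more responsive.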
11 Apr 2011 | 1:52:35 UTC
· Comment
sending out some more workunits
I'm trying something new. I'm setting a max total, max CPU, and max GPU workunits in the config_aux.xml file. Maybe this will put a cap on things.
Right now the total max will be 16, the CPU max (per processor) will be 3, and the GPU max (per processor) will also be 3. If this seems to work, and the server can handle it, we'll start to increase these limits and see how things go.
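As a worked example of how these three limits might combine for one host, here is a small sketch. How the scheduler actually applies them is not spelled out here, so the min() logic below is an assumption, and the function name is made up.

```python
def max_in_progress(n_cpus, n_gpus, total_max=16, per_cpu=3, per_gpu=3):
    """Caps on in-progress workunits for one host (assumed combination rule)."""
    cpu_cap = n_cpus * per_cpu
    gpu_cap = n_gpus * per_gpu
    return {
        "cpu": min(cpu_cap, total_max),
        "gpu": min(gpu_cap, total_max),
        # The total limit wins whenever the per-device caps add up to more.
        "total": min(cpu_cap + gpu_cap, total_max),
    }


if __name__ == "__main__":
    print(max_in_progress(n_cpus=4, n_gpus=1))   # per-device caps apply
    print(max_in_progress(n_cpus=8, n_gpus=2))   # total cap applies
```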
10 Apr 2011 | 1:04:51 UTC
· Comment
another update
So it looks like the problem of hosts getting too many WUs is somewhere within the scheduler. I'm hoping I can get some sort of feedback about this so we can get it debugged. No ETA yet, but I'm still hoping for sometime this weekend.
In the meantime I'm going to try and get the nbody assimilator working, so when we can start sending out workunits again we'll be able to send out those as well.
9 Apr 2011 | 22:59:33 UTC
· Comment
ATI application (v0.57) should be available now
I'm sending work out again. Let me know how it's working -- and if you're getting more than 3 WUs per core and 3 WUs per GPU.
9 Apr 2011 | 21:59:24 UTC
· Comment
sending out workunits
I'm starting to generate work again. Please let me know if you're getting more workunits than our max_wus_in_progress for either CPU or GPU should allow (3 and 3 respectively).
9 Apr 2011 | 20:50:37 UTC
· Comment
and a little payback
Since we had to wipe the database clean to get the server up and running, I know a lot of you lost a lot of credit on work you had done while the server has been down. So just as a thank you for everyone sticking with us while we've been dealing with these problems, and as a little payback for all the work lost; once we get the server up and running smoothly again we're going to have a temporary period of double credit (I'm thinking a day or two). Hopefully it will make up for a bit of the work that was lost.
9 Apr 2011 | 20:07:17 UTC
· Comment
an update
So labstaff installed a new root drive on milkyway today. This should hopefully improve the performance of the database in the future. I'll be sending out workunits this evening to see how things go, and to make sure that the workunit caps are working for both the CPU and GPU workunits. If the caps are in place (so our database doesn't get killed again), we should be back up and running, at least for the separation app.
When that's working I can get back to getting the nbody simulation assimilator running with the new credit scheme as the upgrade deprecated the old credit policy it was using.
9 Apr 2011 | 19:53:51 UTC
· Comment
Old ATI applications deprecated
So I figured while everything was breaking and people are already mad at us for having to wipe the workunit and result tables from the database, it would be a good time to deprecate the old ATI GPU applications. From here on out, we'll not be generating a parameter file for the applications (as all the new versions accept these from the command line, which significantly cuts down on the server's file I/O), and we'll not be accepting results coming in a file (all the new versions write the result to standard error, which goes right into the database, also saving us a lot of file I/O).
Matt A is updating the ATI application so they'll be available and automatically downloaded when new work is generated, but for people with customized app info and who are using old versions of the ATI application this is fair warning. We should probably be generating new work sometime this weekend, and the workunits being sent out will break with the older versions of the ATI applications.
9 Apr 2011 | 6:02:52 UTC
· Comment
some progress
Well 4 hours later the database is cleared out, and at least the website is functioning again (so let the flaming begin? But seriously don't kick me too hard while we're down). Next step is to clean up the upload/download directories. Hopefully have that done tonight.
9 Apr 2011 | 5:06:03 UTC
· Comment
bad news
So the big bug that happened after the server code was updated was that there wasn't a working limit for GPU work, which means a lot of clients downloaded thousands of workunits. That crippled the database and, even worse, hit the server's disk even harder when all of those workunits were reported.
I'm pretty sure the only way we're going to be able to get the server back up and running in any reasonable amount of time is to clear out the entire workunit and result tables. They're so large I can't even query their size.
So basically, to get things up and running again, I'll be clearing out the result and workunit tables of the database, as well as cleaning out our upload and download directories, which are slow almost to the point of unresponsiveness. We're hoping we can get that done sometime this weekend. After things are cleared out, we'll bring things back up (with working workunit limits for both CPUs and GPUs) and I'm hoping things will run fairly smoothly after that.
I know it sucks and I wish we had a better solution, but I think this is the best we can do. Feel free to cancel any workunits/results you have currently crunching; and try not to flame us too hard. :P
9 Apr 2011 | 1:44:25 UTC
· Comment
reducing max_wus_in_progress
I think with the new server updates people are allowed to get quite a few more workunits than before -- which is part of why the site is running so slow (there are almost 700,000 tasks floating around).
I've dropped the max_wus_in_progress to 3 for both CPUs and GPUs. Hopefully this will make the site somewhat responsive again.
8 Apr 2011 | 5:47:02 UTC
· Comment
work generation again
Work is being generated for the separation workunits. Still working on the nbody assimilator but it should hopefully be good to go by the end of the day.
7 Apr 2011 | 18:12:58 UTC
· Comment
taking the assimilator/validator down for a bit
So I'm going to be trying to fix some of all the things Matt broke last night. I think a lot of extra workunits got sent out, so I'm going to be turning off work generation for a while so the server can get back up on its feet.
Not quite sure what else we're going to need to do, but it looks like a lot of things got messed up. Just stick with us while we try and get things fixed.
--Travis
6 Apr 2011 | 22:12:33 UTC
· Comment
Server updated
I've updated most of the server side stuff (and seems like I broke everything in the process). I think I've cleaned up most of the problems, but if you notice anything else weird post it here.
6 Apr 2011 | 7:40:45 UTC
· Comment
Milkyway@Home for iPhone / iOS released
Milkyway@Home (separation) 0.56 is now available for iOS 4 devices!
For some reason they didn't allow this on the app store, so if you want to install it you have to jailbreak.
How to install MilkyWay@Home on iOS:
1. Jailbreak your device. There are many guides - Google for your device model and version of your software (Settings -> General -> About)
2. Start Cydia after jailbreaking. Allow it to Upgrade Everything.
3. Go to Manage -> Sources -> Edit -> Add and type in:
http://phajas.xen.prgmr.com/repo
4. Search for "Milkyway" and install the "Milkyway" package. The description is "Model the galaxy!"
5. Now, on your homescreen, you will find a Milkyway icon for MilkyWay@Home on your device. Tap it to get started.
6. Tap the "Start" button to begin computing. The device checkpoints if you leave.
7. Once done, send your results (via email)
You're done!
1 Apr 2011 | 6:22:39 UTC
· Comment
N-body updated to 0.40
The N-body simulation has been updated to 0.40. All systems are now using OpenMP for threading. The old static JSON configuration we were using has been replaced with Lua (so we can have totally arbitrary initial distributions of particles). Now we'll be fitting dwarf models with multiple components (e.g. a dark matter shell around the dwarf galaxy).
The old applications won't work. Any old search workunits still left also won't work.
Since some people sometimes want download links, here's the source and binaries:
Source:
http://milkyway.cs.rpi.edu/milkyway/download/src/milkyway_nbody_0.40.tar.xz
Linux:
http://milkyway.cs.rpi.edu/milkyway/download/milkyway_nbody_0.40_x86_64-pc-linux-gnu__mt
http://milkyway.cs.rpi.edu/milkyway/download/milkyway_nbody_0.40_i686-pc-linux-gnu__mt
OS X:
http://milkyway.cs.rpi.edu/milkyway/download/milkyway_nbody_0.40_x86_64-apple-darwin__mt
Windows: (also need the dlls)
http://milkyway.cs.rpi.edu/milkyway/download/milkyway_nbody_0.40_windows_intelx86__mt.exe
http://milkyway.cs.rpi.edu/milkyway/download/libgomp-1.dll
http://milkyway.cs.rpi.edu/milkyway/download/pthreadGC2.dll
http://milkyway.cs.rpi.edu/milkyway/download/milkyway_nbody_0.40_windows_x86_64__mt.exe
http://milkyway.cs.rpi.edu/milkyway/download/libgomp_64-1.dll
http://milkyway.cs.rpi.edu/milkyway/download/pthreadGC2_64.dll
28 Mar 2011 | 22:45:23 UTC
· Comment
Issues with the Milkyway@home Support Server
The Computer Science account server at RPI has crashed, making it hard for us to access the Milkyway@home server. The staff is working hard to restore it.
We have to wait this one out. In the meantime, the Milkyway@home server may start doing crazy things - please bear with us until we regain access. Thanks!
-Matthew
10 Feb 2011 | 1:44:03 UTC
· Comment
Nvidia OpenCL updated
I've updated the Nvidia/OpenCL application to 0.52 which should fix the failures on the 23* tasks.
8 Feb 2011 | 1:00:11 UTC
· Comment
a few drive errors
Looks like there were a few drive errors this morning (last night). Everything should be running again now.
7 Feb 2011 | 14:12:01 UTC
· Comment
bypassing server set cache limits
While we appreciate everyone wanting to crunch more MilkyWay@Home by increasing their cache limits, this is part of the reason why we've had so many server problems lately with an unresponsive validator. Mainly, our machine/database is not fast enough to keep up with the additional workunits this puts in the database. So if anyone is modifying their BOINC client to artificially increase their cache, we're asking you to stop so the project will be more stable (until we can further improve our hardware). A few of the really offending clients (who have cached 1k+ workunits) are being banned as they haven't responded to us, and they're hurting everyone's ability to crunch the project as a whole.
So in short, we need you guys to work with us, as we're working with limited hardware that can't handle more than 500k workunits at a time -- our cache is low partially for this reason. Second, as we've said in a bunch of previous threads, due to the nature of the science we're doing at the project, we need a low cache because this really improves the quality of the work you guys report.
As you (hopefully) know by now, we search for structure and try to optimize parameters to fit that structure within the Milky Way galaxy. And lately we've been also doing N-Body simulation of the formation of those structures. What your workunits are doing is trying to find the optimal set of parameters for those N-Body simulations to end up best representing our sky survey data or to fit those different structures (like dwarf galaxy tidal streams) from that data.
To do this, we use strategies which mimic evolution. The server keeps track of a population of good potential solutions to these problems, and then generates workunits by mutating some solutions, and using others to create offspring. You guys crunch the data and return the result -- if it's a good one we insert it into the population, which improves as a whole. Over time, we get very, very good solutions which aren't really achievable using deterministic approaches.
If people have large caches, that means the work they're crunching can come from very old versions of those populations which have since evolved quite a bit away from where they were when the user filled up their cache. When they return the results there's a lower chance for the results to improve the population of results we're currently working with.
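To make the idea concrete, here is a toy version of that loop. It is a minimal sketch under stated assumptions, not our server code: the fitness function, the mutation size, the 50/50 choice between mutation and offspring, and the "replace the worst member" rule are all invented for illustration.

```python
import random


def fitness(params):
    """Toy stand-in for a workunit's result: higher is better, best at all zeros."""
    return -sum(x * x for x in params)


def mutate(parent, rng, scale=0.1):
    return [x + rng.gauss(0.0, scale) for x in parent]


def crossover(a, b, rng):
    return [rng.choice(pair) for pair in zip(a, b)]


def generate_workunit(population, rng):
    """Pick parameters for a new workunit from the current population."""
    if rng.random() < 0.5:
        _, parent = rng.choice(population)
        return mutate(parent, rng)
    (_, a), (_, b) = rng.sample(population, 2)
    return crossover(a, b, rng)


def report_result(population, params, result_fitness):
    """Insert a returned result if it beats the worst member of the population."""
    worst = min(range(len(population)), key=lambda i: population[i][0])
    if result_fitness > population[worst][0]:
        population[worst] = (result_fitness, params)
        return True
    return False


def run_search(n_results=500, population_size=10, seed=42):
    rng = random.Random(seed)
    population = []
    for _ in range(population_size):
        params = [rng.uniform(-5.0, 5.0) for _ in range(3)]
        population.append((fitness(params), params))
    initial_best = max(f for f, _ in population)
    for _ in range(n_results):
        params = generate_workunit(population, rng)     # server side
        report_result(population, params, fitness(params))  # volunteer returns it
    final_best = max(f for f, _ in population)
    return initial_best, final_best, len(population)


if __name__ == "__main__":
    print(run_search())
```

In this toy every result is returned immediately. A result generated from a much older population has to beat a worst member that has improved in the meantime, which is the reason large caches make returned work less useful.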
So that's why our cache is so low, and we'd really appreciate it if you worked with us on this. There are other great BOINC projects out there which can help fill in missing crunch time when we go down, and the BOINC client can definitely handle running more than one at a time. So it might not be too bad to explore some of the other great research going on out there. :)
Thanks again for your time and understanding,
--Travis
24 Jan 2011 | 19:34:21 UTC
· Comment
assimilator back up
Sorry about the delay. I'm hoping the assimilator will be able to catch up with all the WUs waiting for validation. I'll be keeping an eye on things tonight to make sure everything runs smoothly.
17 Dec 2010 | 3:33:42 UTC
· Comment
project back up
It looks like there was a problem with the operating system drive on the server. We should be ordering a replacement as well as another backup hard drive, so hopefully we can head off these kinds of problems before they happen.
And thanks to Blurf for letting us know it was down.
--Travis
14 Dec 2010 | 22:52:25 UTC
· Comment
Update Nvidia drivers
If you're having problems with the Nvidia OpenCL application, try updating your drivers: http://www.nvidia.com/Download/index.aspx?lang=en-us
It seems that only fairly recent drivers work.
12 Dec 2010 | 5:29:43 UTC
· Comment
Updated separation to 0.50
I've updated the separation application to 0.50 for Windows, Linux x86/x86_64/PPC, and OS X x86_64/PPC. I've also added the Nvidia OpenCL application, but I'm not sure if the scheduler's been updated to automatically send it out. I also haven't yet removed the old CUDA one.
This should fix the crashing of 17_3s workunits and a few other things.
10 Dec 2010 | 7:57:06 UTC
· Comment
OpenCL for Nvidia available for testing
The OpenCL application for Nvidia GPUs is ready for testing for Windows and Linux x86_64. I'm particularly interested in the performance / responsiveness tradeoff on mid-low range GPUs.
Many thanks to cncguru for donating his GTX 480. If I hadn't had it, it would be about 30% slower than it is.
http://milkyway.cs.rpi.edu/milkyway/download/test/milkyway_separation_0.48_x86_64-pc-linux-gnu__cuda_opencl.tar.gz
http://milkyway.cs.rpi.edu/milkyway/download/test/milkyway_separation_0.48_windows_intelx86__cuda_opencl.zip
Extract these to the project directory. On Windows this is something like C:\ProgramData\BOINC\projects\milkyway.cs.rpi.edu_milkyway
On Ubuntu for me, this is /var/lib/boinc-client/projects/milkyway.cs.rpi.edu_milkyway
Minor update:
http://milkyway.cs.rpi.edu/milkyway/download/test/milkyway_separation_0.48.1_windows_intelx86__cuda_opencl.zip
http://milkyway.cs.rpi.edu/milkyway/download/test/milkyway_separation_0.48.1_x86_64-pc-linux-gnu__cuda_opencl.tar.gz
Another minor update:
http://milkyway.cs.rpi.edu/milkyway/download/test/milkyway_separation_0.48.2_windows_intelx86__cuda_opencl.zip
http://milkyway.cs.rpi.edu/milkyway/download/test/milkyway_separation_0.48.2_x86_64-pc-linux-gnu__cuda_opencl.tar.gz
3 Dec 2010 | 2:07:51 UTC
· Comment
disk replacement and some slides
I've been interviewing at the University of North Dakota (which is why you didn't get a news post about the drive upgrade until now), and gave a public talk to the faculty and students. I've made the slides available for all of you if you'd like to take a look at them. The talk was 'From Analyzing the Tuberculosis Genome to Modeling the Milky Way Galaxy: Using Volunteer Computing for Computational Science', and you can find a link on the project information page.
On another note, labstaff replaced the corrupted hard drive with a faster hard drive (the one previously installed was just an interim drive while we got the new one shipped). So that explains the outage and why a bunch of workunits are awaiting validation. I'm going to try to keep an eye on things and make sure they all get cleared out, but I don't get home until Wednesday, so if anything really bad happens it might need to wait until then.
--Travis
30 Nov 2010 | 2:13:19 UTC
· Comment
more disk problems
Not milkyway disks, but it looks like a lot of the servers at RPI are going down for maintenance. I'll have things back up and running once all those issues are fixed (hopefully later today).
24 Nov 2010 | 10:49:03 UTC
· Comment
work generation back on
Looks like the assimilator got through everything. I'm turning work generation back on, but I'm also going to turn immediate purge back on, just so I can make sure that all the workunits/results are getting cleaned out of the database correctly and not slowing things down. I should be able to increase the purge time back up to normal after I get that sorted out.
24 Nov 2010 | 10:44:31 UTC
· Comment
stopping work generation temporarily
I'm stopping work generation temporarily to try and get the assimilator/validator to catch up. Not quite sure what the problem is but hopefully will figure it out tonight.
24 Nov 2010 | 2:54:36 UTC
· Comment
corrupted disk
One of our disks was corrupted. It was swapped out today. We think this might have been what was causing the invalid workunits. If you're still having problems please let us know here.
--Travis
22 Nov 2010 | 23:53:09 UTC
· Comment
an update on the credit issue
So I know what the problem is with the extra credit, I'm just not sure why it's happening. For whatever reason we're getting duplicate assimilators running. I'll be fiddling around with things tonight and hopefully will have a fix.
16 Nov 2010 | 22:43:05 UTC
· Comment
updated the server side daemons to deal with the credit issue
I've updated the assimilator/validator, and I think this will fix the issue of extra credit being awarded. We'll know in a day or two. :)
11 Nov 2010 | 0:44:25 UTC
· Comment
extra credits
I'm trying to figure out why people are getting way more credits than normal. Are the workunits awarding more credit than they used to? I'm trying to see if maybe there's some kind of server bug that's awarding credit for workunits more than once.
10 Nov 2010 | 18:48:10 UTC
· Comment
scheduler update
I updated the scheduler, so hopefully it should be correctly sending out SSE2 applications. Let me know if it's working here.
--Travis
4 Nov 2010 | 21:07:36 UTC
· Comment
back up
Everything should be up and running again. I think I've fixed the last of the bugs that was causing holdup in workunits being validated.
1 Nov 2010 | 18:54:25 UTC
· Comment
downtime
I'm updating the assimilation/validation software to fix some bugs, so they'll be offline for a bit. Should be back up soon.
1 Nov 2010 | 17:17:39 UTC
· Comment
Slow workunits solved
I've updated the Windows and 32-bit Linux applications to mostly fix the very slow workunit problem. The problem ended up being in the math libraries, which affected the Windows applications as well as 32-bit Linux.
This should also fix the workunits failing on systems without SSE2 from the previous builds. I had built the BOINC libraries with SSE2, since it was required for the N-body. However, since it is irrelevant whether they use SSE2 or not, it only ended up polluting the non-SSE2 separation binaries.
Non-SSE2 versions will still be quite a bit slower. The SSE2 builds (which also cover 64-bit) should be in the range of 25-30% faster than the old 0.19s. The 32-bit Linux version now uses crlibm as a quick way to get math using SSE2, so for now it will be slightly slower than it should be. The 64-bit Linux + clang build is still the fastest, but only by < 10% now over the Windows + SSE2 version.
1 Nov 2010 | 15:57:21 UTC
· Comment
number of unvalidated results climbing
I think I figured out the problem with the server not being able to catch up with the unvalidated results. Some of the canonical results files were lost and this was causing the server to revalidate a bunch of results over and over. Should be fixed now.
29 Oct 2010 | 16:27:54 UTC
· Comment
separation assimilator working again
Had a pretty interesting bug last night which caused the separation assimilator to crash. The new versions of the applications were writing the search application multiple times to standard error, and the server was parsing what was between the first <search_application> and the last </search_application>, which threw a lot of messy stuff into the population logs of our searches and crashed everything. At any rate, it should all be resolved and working now.
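The parsing problem is easy to reproduce in a few lines. This is a minimal sketch rather than the assimilator's actual code; the helper names and the sample stderr text (including the version string) are made up, and only the two tag names come from the description above.

```python
OPEN_TAG = "<search_application>"
CLOSE_TAG = "</search_application>"


def extract_first_to_last(stderr_text):
    """Buggy behavior: everything from the first open tag to the last close tag."""
    start = stderr_text.find(OPEN_TAG)
    end = stderr_text.rfind(CLOSE_TAG)
    if start == -1 or end == -1 or end < start:
        return None
    return stderr_text[start + len(OPEN_TAG):end].strip()


def extract_first_block(stderr_text):
    """Safer behavior: stop at the first close tag after the first open tag."""
    start = stderr_text.find(OPEN_TAG)
    if start == -1:
        return None
    body_start = start + len(OPEN_TAG)
    end = stderr_text.find(CLOSE_TAG, body_start)
    if end == -1:
        return None
    return stderr_text[body_start:end].strip()


# Hypothetical stderr where the application wrote the tag twice.
SAMPLE = (
    "<search_application> milkyway_separation 0.40 </search_application>\n"
    "some other output\n"
    "<search_application> milkyway_separation 0.40 </search_application>\n"
)

if __name__ == "__main__":
    print(repr(extract_first_to_last(SAMPLE)))  # drags in the lines in between
    print(repr(extract_first_block(SAMPLE)))    # just the first block
```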
21 Oct 2010 | 18:05:10 UTC
· Comment
PPC mac version for separation workunits
We've added a PPC version for macs (for the separation workunits). Let us know how it works here.
20 Oct 2010 | 0:38:02 UTC
· Comment
Publication on N-Body Work
We recently submitted a paper to the 25th IEEE International Parallel & Distributed Processing Symposium (IPDPS), http://www.ipdps.org/, titled "Evolving N-Body Simulations to Determine the Origin and Structure of the Milky Way Galaxy’s Halo using Volunteer Computing". You can take a look at it here: http://www.cs.rpi.edu/~deselt/ipdps_2011.pdf
This paper goes into a lot of detail about what we're doing with the N-Body simulations and what we hope to learn about the Milky Way and its halo from doing them. It also has details about the different N-Body searches we have been and are running (what models of the Milky Way they are using, and things like that). Please feel free to ask any questions, however simple or complex, in this thread. We'd be more than happy to answer these, because you're the ones making this research possible after all.
Thanks again!
--Travis
19 Oct 2010 | 22:53:52 UTC
· Comment
N-body updated to 0.2
Report any problems here. These won't validate against the older versions since part of the likelihood calculation changed.
19 Oct 2010 | 4:18:33 UTC
· Comment
a fix for the output file issue
I added the <optional/> tag to the result xml so the issue some people have been having with that file not being found should hopefully be fixed. I think the fix will only be for newly generated WUs, so if you've been having this problem I'd cancel whichever ones you're running, so you can get new ones with the right result xml.
18 Oct 2010 | 3:09:45 UTC
· Comment
Standalone screensaver test available for Linux
Testing applications are available for Linux now that a major timing bug has been taken care of. Windows applications are also updated in the windows thread. Mac applications have been forwarded to the appropriate team members for compilation/testing and should be available soon.
Please feel free to leave suggestions/comments here and bug reports in this thread.
The demo is computationally expensive and will not run sufficiently fast on some machines (runs a bit choppy on a Pentium I5 2.4 GHz CPU). The "cube_test" application has a much faster demo that uses the same graphics engine. If either works, the final screensaver should run smoothly on the same machine. The final screensaver will have precalculated paths and images, taking care of the speed issue.
mwdemo_linux.tar.gz (v2)
cube_test_linux.tar.gz (v2)
To use the application, download it and extract it to a folder on the desktop or in another folder if you prefer. Then run either of the two executable files that are extracted. As a precaution, it may be a good idea to save any data in other applications before running the full-screen version. For those that do not have Windows XP or later, a zip utility such as 7-zip or WinZip can be used to open the file. The controls listed below are similar to those of Celestia with a few more added in.
Source for all files can be found here.
CONTROLS
ESC - Exit application
Tab : Start over
Shift-Tab : save new start-point
Alt-tab : leave application temporarily when in full-screen mode
Prtsc or Print-screen: saves a screenshot
MOVEMENT
< / > : accelerate backwards/forwards
F1-F9 : change acceleration (each step is 10 times faster than the previous)
Arrow : look around
Ctrl-arrow : rotate view
Shift-arrow : revolve camera around galaxy center (inverted)
Alt-arrow : accelerate up/down/left/right
Spacebar : stop motion
Backspace : focus on center of galaxy
Shift-backspace : focus on Sol
Enter : travel to center of galaxy
Shift-enter : travel to Sol (gives an Earth view of the wedge)
APPEARANCE
1 / 2 / 3 : select object 1-3 (wedge, stream, galaxy)
+ / - : increase/decrease luminosity of object
Shift- + / - : increase/decrease star blur radius
Alt- + / - : increase/decrease number of stars visible
A : toggle axes view
C : toggle camera view
Notes:
If the application runs a little choppy, then it may help to hold keys a little longer if they do not activate every time, esp 1 / 2 / 3 when choosing an object.
(Alt-minus) and (Shift-minus) will quickly increase the frame-rate since they require fewer screen-writes.
As you move away from the stars, they will combine and brighten. The brightness can be reduced manually (see above).
Update: fixed broken links
Update: added replacement keys for foreign keyboards:
p and m keys replace + / -
b and f keys replace < / >
16 Oct 2010 | 2:05:05 UTC
· Comment
Failing workunits
We've been experiencing issues with the RPI computer science login servers since some time over the weekend, so we've been unable to fix the problem with the failing workunits for the separation 0.4 runs. We're waiting for the servers to be restored from a backup before we can fix the issue.
13 Oct 2010 | 17:26:58 UTC
· Comment
updated the CPU applications
Due to the validation issues, I've updated the CPU applications. Hopefully this should fix the problem. Let me know how they're working here.
--Travis
7 Oct 2010 | 18:08:46 UTC
· Comment
adaptive validation
I've turned on adaptive validation, so let me know if you're having any validation issues. What's going to happen now is that all results that would improve our searches are still always validated. However, those that won't improve our searches will be validated based on your error rate. The more bad results you return the more frequently your workunits will require validation.
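In rough terms, the decision looks something like the sketch below. The exact rule isn't given here, so treat this as an illustration only: using the host's error rate directly as the probability of a check, the optional minimum check rate, and the function name are all assumptions.

```python
import random


def needs_validation(improves_search, host_error_rate, rng, min_check_rate=0.0):
    """Decide whether a returned result gets validated.

    improves_search: would this result improve one of the searches?
    host_error_rate: fraction of this host's past results that were bad (0..1).
    min_check_rate:  hypothetical floor so good hosts are still spot-checked.
    """
    if improves_search:
        # Anything that would change the search is always validated.
        return True
    check_probability = min(1.0, max(min_check_rate, host_error_rate))
    return rng.random() < check_probability


if __name__ == "__main__":
    rng = random.Random(0)
    for rate in (0.0, 0.1, 0.5):
        checked = sum(needs_validation(False, rate, rng) for _ in range(10000))
        print(f"error rate {rate}: {checked} of 10000 results checked")
```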
6 Oct 2010 | 17:48:36 UTC
· Comment
Milkyway Screensaver Testing
I am back on the project after having been unexpectedly pulled away for some time. I will be releasing full screensaver demos in the next few weeks, hopefully for all platforms under this thread. Please feel free to leave suggestions/comments here and bug reports in this thread.
The following is the standalone application that can be used to determine whether the demo will work on anyone's machine (currently for Windows machines). The MilkyWay@Home screensaver does not use the conventional libraries used by most BOINC screensavers, so testing is necessary to make sure things run smoothly.
The demo is computationally expensive and will not run sufficiently fast on some machines (runs a bit choppy on a Pentium I5 2.4 GHz CPU). The "cube_test" application has a much faster demo that uses the same graphics engine. If either works, the final screensaver should run smoothly on the same machine. The final screensaver will have precalculated paths and images, taking care of the speed issue.
mwdemo_win.zip (v2)
cube_test_win.zip (v2)
To use the application, download it and extract it to a folder on the desktop or in another folder if you prefer. Then run either of the two executable files that are extracted. As a precaution, it may be a good idea to save any data in other applications before running the full-screen version. For those that do not have Windows XP or later, a zip utility such as 7-zip or WinZip can be used to open the file. The controls listed below are similar to those of Celestia with a few more added in.
Source for all files can be found here.
CONTROLS
ESC : exit application
Tab : start over
Shift-Tab : save new start-point
Alt-Tab : leave application temporarily when in full-screen mode
Prtsc or Print-screen : save a screenshot
MOVEMENT
< / > : accelerate backwards / forwards (also b and f for foreign keyboards)
F1-F9 : change acceleration (each step is 10 times faster than the previous)
Arrow : look around
Ctrl-arrow : rotate view
Shift-arrow : revolve camera around galaxy center (inverted)
Alt-arrow : accelerate up/down/left/right
Spacebar : stop motion
Backspace : focus on center of galaxy
Shift-backspace : focus on Sol
Enter : travel to center of galaxy
Shift-enter : travel to Sol (gives an Earth view of the wedge)
APPEARANCE
1 / 2 / 3 : select object 1-3 (wedge, stream, galaxy)
+ / - : increase/decrease luminosity of object
Shift- + / - : increase/decrease star blur radius
Alt- + / - : increase/decrease number of stars visible
A : toggle axes view
C : toggle camera view
Notes:
If the application runs a little choppy, keys may not register every time; holding them a little longer may help, especially 1 / 2 / 3 when choosing an object.
(Alt-minus) and (Shift-minus) will quickly increase the frame-rate since they require fewer screen-writes.
As you move away from the stars, they will combine and brighten. The brightness can be reduced manually (see above).
Update: fixed broken links
Update: added replacement keys for foreign keyboards:
p and m keys replace + / -
b and f keys replace < / >
30 Sep 2010 | 5:54:06 UTC
· Comment
New Nbody Searches
I just started up new nbody searches:
de_nbody_model#_1
Where # = 2 through 6
Let me know how they run!
-Matthew
17 Sep 2010 | 19:39:09 UTC
· Comment
updated the nbody applications again
Now at v0.06. Let us know how they're running here.
14 Sep 2010 | 8:09:24 UTC
· Comment
started a new nbody search: de_nbody_model1_1
The workunits should take much longer to complete. Let me know how they are doing here (and I suppose you can complain if the credit is too much/too little). This should hopefully fix the problem with the workunits terminating prematurely as well.
11 Sep 2010 | 2:21:42 UTC
· Comment
looking for a linux pro :)
Long story short: I'm trying to get Linux (Ubuntu 10.04) triple booted on a Mac Pro with an ATI HD 4870, so we can compile some new ATI applications. After the initial screen with the keyboard and the little guy, there's a blinking cursor and then everything goes black. I get a little farther with the alternate install CD; however, after hitting enter to install, the screen goes black. Trying vga=771 and xforcevesa hasn't fixed it either. Does anyone have any idea what I need to do to get Linux installed?
11 Sep 2010 | 2:08:13 UTC
· Comment
milkyway nbody applications updated
I've updated the nbody applications to v0.04. Let us know how they're doing here. I think they have quite a few updates that Matt A. did that will make them more stable. I'm going to start up some longer WUs tonight for them.
7 Sep 2010 | 22:06:08 UTC
· Comment
starting/stopping new assimilators working
You can check the server status page now and see if they're running or not (they're nbody_assimilator and separation_assimilator). I just need to do a bit more debugging on the stop end of things, but starting works, which was the most important part, as that should keep work flowing along nicely even if one of the daemons does crash.
On another note, we're going to be releasing another version of the nbody simulation application this week, as a couple of changes needed to be made to start it crunching on real data. We'll be starting some new searches which should bump the compute time for the nbody workunits up to the 2-10 hour range.
--Travis
7 Sep 2010 | 15:54:55 UTC
· Comment
nbody assimilator back up
Found a nasty bug in the nbody assimilator which took me awhile to debug. It should be up and running again.
On another note, I think I've figured out how to get the new assimilator/validators showing on the project status page (and also tied into the automatic start/stop scripts), so that should hopefully be running sometime tonight.
--Travis
1 Sep 2010 | 18:05:24 UTC
· Comment
had some corruption in the searches
I finally figured out why no work was being generated but both assimilators were up and running. Work should be flowing again now.
Next up is that I'm getting the new assimilators incorporated into the stop/start boinc scripts, which will mean they should be showing up on the server status page once that gets going, and automatically restarted after a server restart. So this should help a bit with the work outages.
23 Aug 2010 | 2:07:55 UTC
· Comment
windows intelx86 and x86_64 binaries added for the nbody simulations
Let us know how they're working here.
18 Aug 2010 | 3:18:52 UTC
· Comment
x86_64 linux nbody application added
Let us know how it's working here.
17 Aug 2010 | 4:26:45 UTC
· Comment
nbody simulation assimilator up and running
It looks like the nbody simulation assimilator is up and going. I know the workunits are really short, but right now we're just testing to see if the searches can get to the known best position and trying to debug the assimilator. It's a lot easier when we have a bit more of a work flow going.
Once we have things a bit more debugged, the nbody simulation workunit times should increase quite dramatically. Right now we're simulating around 4000 bodies, but we expect to increase that to around 100,000 to 1,000,000 bodies (or more).
16 Aug 2010 | 9:28:09 UTC
· Comment
added an i686 linux nbody application
Let me know if you're having any troubles with it here.
15 Aug 2010 | 23:57:52 UTC
· Comment
first nbody simulation workunits
I've sent out the first set of nbody simulation workunits, so let me know if you're having any problems with them here. Currently the only application is for 64 bit OS X, so if you're not getting them that's why. I should be putting up linux applications in the next day or so, and windows applications when Matt A. gets them to me.
15 Aug 2010 | 5:11:14 UTC
· Comment
Recent outages
Sorry about the recent outages. It looks like they've been doing some work on the power at RPI, so they've had to restart the milkyway server a few times. Hopefully they're done and we won't be having so many problems!
On another note, the n-body simulation application and workunits should be going out sometime this week. More on that once I get the first workunits sent out.
11 Aug 2010 | 2:54:25 UTC
· Comment
Screensaver Demo
This thread will be continually updated with the screensaver progress in its final stages. Here is the latest screenshot of the wedge and galaxy backdrop with the current Sagittarius stream estimate provided by David R. Law et al. Adding color uses considerably more memory for pre-drawn frames along with a decrease in speed, so at this time, a trade-off would have to be made between having real-time motion and having color given that only a small portion of the CPU is dedicated to graphics. Color seems to be the popular choice so far, but additional votes are welcome.
Click here for a full size image.
2 Aug 2010 | 21:24:54 UTC
· Comment
N-Body Simulation of the Sagittarius Stream
N-Body simulation based on data compiled by David R. Law et al.
Here the simulation is shown intersecting wedge 82.
For those interested in learning more about the origin of the Sagittarius Stream, the above animation is based on data from an N-Body simulation created by David R. Law et al. which shows how the stream may have formed. The simulation was rendered using the same engine that will be used in the MilkyWay@Home screensaver.
A MilkyWay@Home team member, Ben Willet, has written a brief summary about the simulation and what our next MilkyWay@Home project will be!
This visualization shows the two different components of the Milkyway@home project. The circular wedge that you see is a 2.5 degree wide stripe of SDSS data. The bright overdensity in the stripe is composed of stars that have been torn off of the Sagittarius Dwarf Galaxy (Sgr) as it orbits around the Milky Way. The main goal of the project is to analyze this data stripe to determine the most likely parameters of those stars (e.g. how wide the overdensity is, where it is going, and how many stars are in it). To accomplish this goal, each user's computer analyzes a stripe with a certain parameter set, and determines how well those parameters match the data. These results are fed into the BOINC platform, which takes actions to decide the next parameter values, and chooses the best answer.
The rest of the visualization is a prelude to an upcoming BOINC project. The stars in motion are taken from an N-body simulation of the Sgr Dwarf Galaxy. We take the currently known position and velocity of the Sgr Dwarf Galaxy and run it back in time for anywhere between 3 and 6 billion years within a gravitational field that approximates the Milky Way. We then place a spherical group of stars at that predicted location, and run it forward to the current day. The Milky Way's gravity tears apart the dwarf, causing it to flow out into long streams of stars. It is these stars that are contained within the data wedges. The ultimate goal of this project is to utilize N-body simulations over BOINC to find the best fit model for the Milky Way's gravitational field. The ability to do N-body simulations over BOINC is currently being developed by our team of astronomers and computer scientists, and will soon be available to users.
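The backward-then-forward orbit integration described above can be illustrated with a very small sketch. This is not the project's code: it assumes a single body in a 2D point-mass potential (a stand-in for a model of the Milky Way's gravitational field), arbitrary units, and hypothetical function names. It only shows the idea of running a known present-day position and velocity back in time and then forward again; the real simulation would place a group of stars at the past location before running forward.

```python
def acceleration(pos, gm=1.0):
    """Point-mass gravity toward the origin (assumed stand-in for a galactic potential)."""
    x, y = pos
    r3 = (x * x + y * y) ** 1.5
    return (-gm * x / r3, -gm * y / r3)


def leapfrog(pos, vel, dt, steps):
    """Kick-drift-kick leapfrog integrator; a negative dt integrates backward in time."""
    x, y = pos
    vx, vy = vel
    ax, ay = acceleration((x, y))
    for _ in range(steps):
        # Half kick: update velocity with the current acceleration.
        vx += 0.5 * dt * ax
        vy += 0.5 * dt * ay
        # Drift: update position with the half-stepped velocity.
        x += dt * vx
        y += dt * vy
        # Second half kick with the acceleration at the new position.
        ax, ay = acceleration((x, y))
        vx += 0.5 * dt * ax
        vy += 0.5 * dt * ay
    return (x, y), (vx, vy)


# Present-day state (illustrative numbers, arbitrary units).
present_pos, present_vel = (1.0, 0.0), (0.0, 0.8)

# Run the orbit back in time, then forward again to "today".
past_pos, past_vel = leapfrog(present_pos, present_vel, dt=-0.01, steps=500)
back_pos, back_vel = leapfrog(past_pos, past_vel, dt=0.01, steps=500)
```

Leapfrog is used here because it is time-reversible, so stepping back and then forward with the same step size returns to the starting state up to floating-point roundoff.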
References:
Law, Johnston & Majewski, ApJ 619 807L 2005. http://arxiv.org/abs/astro-ph/0407566
Cole, et al., ApJ 683 750C 2008. http://arxiv.org/abs/0805.2121
9 Jul 2010 | 18:26:02 UTC
· Comment
Work should be flowing
As usual, I go out of town and the server crashes. Work should be flowing, just give the server some time to catch up.
--Travis
5 Jul 2010 | 21:06:22 UTC
· Comment
Fetching project list error
We're looking into the error users are getting fetching the project list, and I think I put in a fix. Please let us know if you're still having problems.
25 Jun 2010 | 9:07:25 UTC
· Comment
New Website Coming Soon
We are currently in the process of testing our new site layout. If you'd like to get a sneak preview go to http://milkyway.cs.rpi.edu/milkyway/index_new.php
20 Jun 2010 | 1:40:45 UTC
· Comment
OSX applications updated again to v0.31
I updated the OSX applications again. This should fix the checkpointing and progress issue. Let me know how they work.
19 Jun 2010 | 0:24:42 UTC
· Comment
updating the stock osx/linux/windows applications
I'm going to be updating the stock applications in the next few days because I've discovered a bug in them when running the latest searches.
I just updated the OSX applications (to version 0.30). I had to modify the checkpointing a bit, so let me know if they are checkpointing correctly.
I'll be updating linux and windows tonight.
15 Jun 2010 | 18:50:13 UTC
· Comment
new searches
I've started up a bunch of new searches, let me know if you're having any problems with these.
15 Jun 2010 | 8:01:11 UTC
· Comment
MilkyWay@home screensaver coming soon
Hi everyone. I am new to the MilkyWay@home project and will be working with the RPI group this summer. I am currently busy with the visualization of the data. Below is a screenshot of one of the wedges being worked on as it appears in the screensaver (no alterations). I am currently optimizing the screensaver draw-routines to get a decent frame-rate while displaying realistic blur circles for the stars. Suggestions are welcome.
9 Jun 2010 | 20:06:29 UTC
· Comment
server is generating work again
Of course it had to crash. The server should be generating work again, just give it some time to catch up with all the requests.
6 Jun 2010 | 20:55:30 UTC
· Comment
away for the weekend
I'm going camping until Sunday, so please try not to break things while I'm gone!
3 Jun 2010 | 17:20:08 UTC
· Comment
more new stripes (15 and 16)
We've started up some new searches with more new areas of the sky. Let me know how these WUs are crunching. They'll start with:
de_test_s15_*
de_test_s16_*
2 Jun 2010 | 21:17:52 UTC
· Comment
new searches (de_test_s11 - s14)
I've started up some new searches (in a new area of the sky we need results for). Let me know if you're having any issues with these workunits.
1 Jun 2010 | 17:50:23 UTC
· Comment
paging cluster physik
Has anyone heard from Cluster Physik? I need to talk to him a bit about the new application and can't seem to get in contact via email or PMs. Is he still alive?
1 Jun 2010 | 16:43:14 UTC
· Comment
OSX validation fixed
I really think I have OSX validation working correctly now :) Let me know if your WUs are getting flagged as valid.
1 Jun 2010 | 16:12:12 UTC
· Comment
server back up (again)
Of course the server had to crash on the long holiday weekend. Things should be up and running again.
1 Jun 2010 | 15:36:58 UTC
· Comment
server back up
Sorry about the downtime. Yesterday was unseasonably hot, it got close to 100 degrees here in Troy, and I think this was the cause of most of the computer science machines at RPI becoming unavailable (we couldn't even check email). Now that the computers are back, I've started up the daemons and work should be flowing again.
27 May 2010 | 17:18:19 UTC
· Comment
milkyway client code now on GitHub
In order to get our code out there and make it easier to use, I've put it on GitHub, here. I think this should make the testing process a bit easier (this way I don't have to keep putting up zipped versions of our code), and I can also add people as contributors.
15 May 2010 | 1:51:32 UTC
· Comment
Update on the OS X situation
I've updated some of the server code and the OSX applications should now be correctly granted credit. Let me know if you're still having issues.
11 May 2010 | 6:13:03 UTC
· Comment
Updated OS X applications (again!)
They really should work this time, I promise. If you're running the OSX application with version < 0.29, I'd just abort the workunits.
9 May 2010 | 2:08:43 UTC
· Comment
IEEE Congress on Evolutionary Computation Paper and Conference
We've had a paper accepted to the 2010 IEEE Congress on Evolutionary Computation (IEEE CEC 2010), which is part of the 2010 IEEE World Congress on Computational Intelligence. The paper is available here and on the publications section of the main page for anyone interested in reading it.
I'll be presenting this work on Thursday, July 22nd if you happen to be at the conference. It goes into some detail about the different methods we use for finding good fits to the data and analyzes how they perform on simulations of different large scale computing systems.
Thanks again everyone for your support!
--Travis
7 May 2010 | 4:58:01 UTC
· Comment
Updated the OS X applications (again)
What was causing the crashes at the end of the workunit has been fixed. They should really work now. Sorry about all the trouble.
7 May 2010 | 4:07:19 UTC
· Comment
Updated the OS X applications
I've updated the OS X applications. This should fix the issue with checkpointing and the progress bar, let me know how they're working.
5 May 2010 | 20:44:14 UTC
· Comment
Milkyway3 v0.05 source released
I've released the v0.05 source code here (take your pick):
http://milkyway.cs.rpi.edu/milkyway/download/mw3_v0.05.zip
http://milkyway.cs.rpi.edu/milkyway/download/mw3_v0.05.tar
The previous release had some errors in the sample input files (for the aux_bg_profile); that's been fixed in this update.
I'll be posting the correct expected values for the tests in this forum thread.
5 May 2010 | 20:35:51 UTC
· Comment
Summer Research Opportunity at MilkyWay@Home
We have money available to hire an undergraduate researcher to work with the MilkyWay@Home project this summer. Candidates must be US citizens or permanent residents and enrolled as undergraduates in the Fall of 2010 at an accredited university. The position will last for 10 weeks and pay $6000. Applicants should be willing to come to RPI for the summer to work with the project.
To apply, we'll need an unofficial transcript and a CV or resume including the names and contact information of two references. Please send the information to astro [at] cs.rpi.edu.
Potential areas of research involve n-body simulation, GPU programming and web interfaces. We'll be considering applicants until Wednesday, May 12th.
5 May 2010 | 17:52:13 UTC
· Comment
MilkyWay@Home and the BOINC Pentathlon
Just letting everyone know that we're the second project in the BOINC Pentathlon. There's more information to be had about this here:
http://www.setigermany.de/boinc_pentathlon/22_en_Welcome.html
3 May 2010 | 2:13:18 UTC
· Comment
The Science of Milkyway@home
I gave a talk a while ago on the astronomy side of Milkyway@home, and put it up here. I also made a forum thread that explains things in more detail, so please ask questions there.
The .ppt has also been posted on the Milkyway@home home page under "Publications and Talks."
--Matthew
28 Apr 2010 | 22:22:56 UTC
· Comment
RPI Center for Open Source Software (RCOSS) Presentation Tomorrow
I'm going to be giving a brief presentation about MilkyWay@Home tomorrow for RPI's Center for Open Source Software (RCOSS) at 4pm in JEC 3117, if you happen to be nearby and want to attend.
But for those of you who can't, which is probably most of you :), I've made the slides available online. Feel free to ask me any questions about them. Here they are:
[keynote] [powerpoint]
--Travis
23 Apr 2010 | 4:07:14 UTC
· Comment
milkyway3 v0.04 source released
I've released the v0.04 source code here (take your pick):
http://milkyway.cs.rpi.edu/milkyway/download/mw3_v0.04.zip
http://milkyway.cs.rpi.edu/milkyway/download/mw3_v0.04.tar
The new code contains a new method for modeling the background of the Milky Way's halo, which we hope will more accurately represent the background. Right now, there is no good model for this, and finding a good model is one of the goals of this project, so if we determine a good one this will be a major result. We're very interested to see how this new model performs.
There have also been a couple of changes to one of the input parameter files, and to the output of the application. It will now output individual likelihoods for the background and each stream being modeled, as well as the total likelihood. This will allow us to optimize over them individually and hopefully get better fits for our models.
This source code also includes Kahan summation, which should make the results for the GPUs and CPUs closer.
There are new test parameter files included with the source, and if you want more information about the parameter files and expected results visit the forum thread in the news section.
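For readers unfamiliar with the Kahan summation mentioned above, here is a minimal Python sketch of the general technique. It is not taken from the milkyway3 source; the function names are made up for illustration. The idea is to carry a small correction term that recovers the low-order bits lost each time a small value is added to a large running total, so the final sum depends much less on rounding at each step.

```python
def naive_sum(values):
    """Plain left-to-right floating point summation, for comparison."""
    total = 0.0
    for value in values:
        total += value
    return total


def kahan_sum(values):
    """Kahan (compensated) summation of an iterable of floats."""
    total = 0.0
    compensation = 0.0  # rounding error from earlier additions, not yet applied
    for value in values:
        corrected = value - compensation
        new_total = total + corrected
        # (new_total - total) is what was actually added; subtracting what we
        # meant to add leaves the rounding error made in this step.
        compensation = (new_total - total) - corrected
        total = new_total
    return total


# One large term followed by many terms too small to register individually.
values = [1.0] + [1e-16] * 100000

print(naive_sum(values))  # the small terms are lost one at a time
print(kahan_sum(values))  # close to 1.00000000001
```

In the naive loop each tiny term is rounded away against the running total, while the compensated version accumulates them in the correction term until they are large enough to show up.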
22 Apr 2010 | 21:37:50 UTC
· Comment
issue with not getting work resolved
I think I found part of the issue some users were having not being able to get work. The feeder wasn't set to interleave workunits from multiple applications, so it got filled with WUs for the new application, which doesn't have all OS/architectures supported yet. That meant that the unsupported OS/architectures couldn't get workunits. It should be fixed now.
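To illustrate the interleaving described above, here is a rough Python sketch. It is not BOINC's feeder code; the function name, the data layout and the slot count are assumptions made for the example. It fills a fixed number of feeder slots by taking workunits round-robin from each application's queue, so one application with a large backlog cannot occupy every slot.

```python
from collections import deque


def fill_feeder(queues, slots):
    """Fill up to `slots` entries, taking one workunit per application in turn.

    `queues` maps an application name to a list of its pending workunits
    (a hypothetical layout chosen for this sketch).
    """
    pending = {app: deque(workunits) for app, workunits in queues.items()}
    feeder = []
    while len(feeder) < slots and any(pending.values()):
        for app, queue in pending.items():
            if queue and len(feeder) < slots:
                feeder.append((app, queue.popleft()))
    return feeder


queues = {
    "milkyway": ["a1", "a2", "a3"],
    "milkyway3": ["b1", "b2", "b3", "b4"],
}
print(fill_feeder(queues, 4))
```

With four slots, the two applications each get two entries, so hosts that can only run one of the applications still find work in the feeder.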
21 Apr 2010 | 7:53:10 UTC
· Comment
testing validator for new application
I'm testing the validator for the new application (milkyway3). It's currently going to be using the same validation strategy as the regular validator. Let me know if you have any issues.
21 Apr 2010 | 6:34:31 UTC
· Comment
testing new application (milkyway3)
I've sent out some workunits for the milkyway3 application. Let me know if you're having any problems with them.
20 Apr 2010 | 7:38:28 UTC
· Comment
bad workunits
I think I've fixed the problem with the recent batches of workunits not working. Let me know if they're still having problems.
19 Apr 2010 | 5:20:38 UTC
· Comment
How the new validator works
It seems lately there's been a bit of confusion as to how the new validator works, so I'm making this post to help explain everything to everyone.
All workunits are initially generated with a quorum of 1, so they all go through a first pass of the validator. During this first time through, the validator checks to see if the result is going to be inserted into one of the populations of our evolutionary algorithms. If it is going to, that means we'll be using it to generate new workunits. Because of this we need to validate it to make sure the result is a good one. The validator will then set the quorum to 2 and wait for another result. When this happens, your result will be set to "Completed, validation inconclusive." This doesn't mean your result was invalid or anything was wrong, just that the server is waiting for another result to validate it.
If your result won't improve one of our populations, there's still a chance that we're going to validate it. This is to make sure that people aren't using bad applications or scripts to scam the server for credit. In this case, the server will again increase the quorum of that workunit to 2, and your result will be set to "Completed, validation inconclusive." Again, nothing here is wrong with your result, it's just that the server is waiting for another result to validate against.
Some results are simply marked valid and awarded credit without being checked, because we won't be using them to improve our search populations and we didn't pick them for extra validation. Previously, this happened to all results that didn't improve our searches, which is why you didn't see too many results being verified.
So this is how the new validation is working. Again, if you're seeing "Completed, validation inconclusive," that doesn't mean you won't be getting credit, it just means it went through the validator once and we're waiting for another result to compare it against. After that (if it's a valid result) it will be awarded credit.
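The first pass described above can be sketched roughly as follows. This is only an illustration, not the validator's actual code: the `Workunit` class, the `first_pass` function and the `spot_check_probability` parameter are hypothetical names, and how the server really picks results for extra validation is not shown here.

```python
import random
from dataclasses import dataclass


@dataclass
class Workunit:
    quorum: int = 1          # every workunit starts with a quorum of 1
    status: str = "pending"


def first_pass(workunit, improves_population, spot_check_probability, rng=random):
    """Decide what happens to a workunit's first returned result."""
    # Results that would enter a search population are always validated;
    # other results are picked for validation with some probability.
    if improves_population or rng.random() < spot_check_probability:
        workunit.quorum = 2
        workunit.status = "Completed, validation inconclusive"
    else:
        workunit.status = "valid"  # marked valid and awarded credit unchecked
    return workunit


print(first_pass(Workunit(), improves_population=True, spot_check_probability=0.0))
print(first_pass(Workunit(), improves_population=False, spot_check_probability=0.0))
```

In both validated cases the result is not rejected; the workunit simply waits for a second result to compare against.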
If anyone has any other questions about the new validation system, please post inside this thread and I'll be happy to answer any questions.
--Travis
19 Apr 2010 | 3:15:46 UTC
· Comment
server down briefly
I'll be taking the server down for an hour or two tonight to update some validator and work generation code. It should be back up shortly.
19 Apr 2010 | 2:41:30 UTC
· Comment
testing new work generation
I'm testing some new server side work generation. Let me know if any of the new workunits are having problems. I should also be generating work for the new application tonight, so if you have any problems with the milkyway3 application let me know as well.
--Travis
18 Apr 2010 | 2:41:36 UTC
· Comment
DAIS 2010 paper
I've put up a link to our recently accepted paper in the 10th International Conference on Distributed Applications and Interoperable Systems (DAIS 2010) here: Validating Evolutionary Algorithms on Volunteer Computing Grids. It describes our previous validation work (the newer validator is a bit different and more strict). We still do the same validation (usually optimistic) but now we're also validating other workunits selected at random to prevent awarding credit to bad hosts.
13 Apr 2010 | 8:06:54 UTC
· Comment
workunits being generated
The database is all repaired and the file deleter/purge daemons have caught up so I'm generating work again. Let me know if there are any problems.
Thanks for bearing with us as I dealt with this database issue.
12 Apr 2010 | 2:30:09 UTC
· Comment
validation starting back up
I'm starting the validator back up, so you can get credit for your reported workunits. The database should be able to handle it.
I won't be sending out more workunits until the database is fully repaired, however.
11 Apr 2010 | 20:45:04 UTC
· Comment
database still repairing
I just woke up and the database is still repairing. It should be good to go in another hour or two.
11 Apr 2010 | 20:41:23 UTC
· Comment
database problem fixed
I found the problem, things should be back up and running in a couple hours.
11 Apr 2010 | 0:42:01 UTC
· Comment
server issues
We're having some database issues which is why the server isn't sending out any work. I've contacted labstaff and elevated the ticket to emergency so hopefully we'll have things fixed shortly. I think we might have to move to yesterday's backup.
10 Apr 2010 | 22:07:46 UTC
· Comment
upgrading the ATI 58x0 application
It seems like BOINC has been having some issues upgrading the ATI client (which is why we're still seeing some weird invalidation). If you're running MW on a 58x0 ATI GPU, please detach and reattach -- this will force your BOINC client to download the new brook32.dll or brook64.dll files, which contain part of the fix to make these applications return the right value.
If you're using an anonymous platform, please make sure you're using the correct brook32.dll and brook64.dll files from here.
Thanks again for your time and patience as we upgrade the validator and these applications.
8 Apr 2010 | 18:55:27 UTC
· Comment
stock ATI 58x0 apps updated
I've updated the stock 58x0 ATI GPU applications. Let me know if they download and work correctly.
7 Apr 2010 | 2:24:03 UTC
· Comment
ATI 58x0 GPU fix released
A big round of thanks goes to Cluster Physik for quickly updating the 58x0 ATI GPU application, which was causing the validation problems we've been experiencing lately.
If you're running MilkyWay@Home on a 58x0 ATI GPU, please upgrade your application. A link to the application is here.
Please take the time to update to this, as not only are the bad ATI 58x0 GPU applications causing their own workunits to be often flagged invalid, they report results quickly enough that they have been quoruming up against valid results and causing the valid ones to be flagged invalid.
Thanks, Travis
7 Apr 2010 | 1:39:17 UTC
· Comment
milkyway3 v0.02 source released
As I said before, we're moving over to a new application. Here's a preliminary version that we'll be using to test its validator and assimilator. It's going to have to be updated with a new fitness function which uses a different model of the Milky Way galaxy, but I'm still waiting for John Vickers to give me the go-ahead to start using that code -- should be later this week.
In the meantime, we can still start testing this code as it uses the same output format that John's new code will use. This new application should really improve server performance as it doesn't use search parameter files to send out new workunits; it will take the parameters from the command line instead. Additionally, it will report the fitness in stderr, which gets moved into the result's xml.
What this means is the server will be creating 1 less file for every workunit made (it currently creates 1 file), and receiving 1 less file for every result sent (it currently receives 1 file). This means sending workunits out and getting results in won't be hammering the filesystem nearly as hard. Right now the recent server crashes have been happening because the file deletion daemon just can't handle the sheer mass of WU files that are being generated, and it brings the server to a screeching halt which then crashes the other daemons. This new application should fix that problem so we really want to get everyone swapped over to it ASAP.
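As a rough illustration of the file-free approach described above, here is a small Python sketch. It is not the milkyway3 code: the tag name `fitness`, the function names and the argument layout are all hypothetical. It shows parameters arriving as command-line arguments, the fitness being written to stderr inside a tag, and the server side pulling the value back out of the captured stderr text.

```python
import io
import re
import sys

FITNESS_TAG = "fitness"  # hypothetical tag name, chosen for this sketch


def parse_parameters(argv):
    """Client side: read the search parameters as floats from the command line."""
    return [float(arg) for arg in argv]


def report_fitness(fitness, stream=sys.stderr):
    """Client side: write the fitness to stderr instead of to an output file."""
    stream.write(f"<{FITNESS_TAG}>{fitness!r}</{FITNESS_TAG}>\n")


def extract_fitness(stderr_text):
    """Server side: recover the fitness from the stderr text stored with a result."""
    match = re.search(rf"<{FITNESS_TAG}>(.*?)</{FITNESS_TAG}>", stderr_text)
    return float(match.group(1)) if match else None


if __name__ == "__main__":
    captured = io.StringIO()  # stands in for the stderr captured with a result
    report_fitness(-3.0123456789012345, captured)
    print(parse_parameters(["0.5", "1e-3"]))
    print(extract_fitness(captured.getvalue()))
```

Writing the value with `repr` keeps every digit of the double, which matters when results are later compared to a tight tolerance.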
Contained in this news post (if you go to the forums) is the download location for the new source, and directions on how to compile and test it.
--Travis
7 Apr 2010 | 1:28:25 UTC
· Comment
quorum down to 2
The database is having a bit of trouble keeping up with all the new results due to a quorum of 3, so for the time being I'm dropping it to a quorum of 2.
On another note, we should have source code for the new application available tomorrow.
6 Apr 2010 | 2:11:10 UTC
· Comment
validator strictness
I've lowered the strictness of the validator from 10e-11 to 10e-10. I'm hoping this should significantly reduce the number of WUs flagged invalid. If the issue persists I might have to lower it further to 10e-9. The new application will have the strictness back at 10e-11, so keep that in mind if you're compiling your own versions.
The issue we're having seems to be that the ATI 48xx GPUs and the ATI 58xx GPUs are returning different results, and if too many of either make it into the quorum they will invalidate the other results (including stock results). I'm still trying to determine if the 58xx GPU or the 48xx GPU is the one correctly validating against the stock application.
I've also updated the validator so if you check your tasks they will show what fitness they reported, so you can compare vs other tasks for the same workunit.
I'm hoping we should have this issue straightened out shortly, and thanks for your patience.
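To make the strictness setting above a little more concrete, here is a hedged Python sketch of a tolerance-based comparison within a quorum. The majority rule used here is an assumption for illustration, as are the function names and the numbers in the example; the validator's real decision rule is not described in this post.

```python
def fitness_matches(a, b, tolerance):
    """Two reported fitness values agree if they differ by at most `tolerance`."""
    return abs(a - b) <= tolerance


def validate_quorum(fitnesses, tolerance):
    """Mark each result valid if it agrees with a strict majority of the quorum.

    A result is counted as agreeing with itself. This majority rule is an
    assumption made for the sketch.
    """
    verdicts = []
    for candidate in fitnesses:
        agreeing = sum(fitness_matches(candidate, other, tolerance) for other in fitnesses)
        verdicts.append(2 * agreeing > len(fitnesses))
    return verdicts


# Illustrative values: two results that agree closely and one that is slightly off.
reported = [-3.0, -3.0 + 5e-11, -3.0 + 5e-9]
print(validate_quorum(reported, tolerance=1e-10))  # the third result is the odd one out
print(validate_quorum(reported, tolerance=1e-8))   # a looser tolerance accepts all three
```

The example also shows why a quorum dominated by results that differ from the stock application can flag the stock result invalid: whichever group forms the majority wins.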
5 Apr 2010 | 16:35:39 UTC
· Comment
testing new validator
I've started up the new validator, so please be patient as I get all the kinks worked out over the next few days. Validation will now work as follows: Every result that could improve one of our searches will be validated (with a min quorum of 3, and the fitness reported must be within 10e-11 of the quorum results; this means that single precision GPU results will be flagged invalid). Results that won't improve a search will be validated 50% of the time until the error rates of hosts stabilize in the database (this will probably take a couple weeks). Afterwards, for the results that don't improve our searches, we'll be using BOINC's adaptive validation based on hosts' error rates (the validation rate will be between 10% and 100% depending on how many errors the host typically has).
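The three cases above can be summarized in a short Python sketch. This is an illustration only: the function and parameter names are hypothetical, and the mapping from a host's error rate to a validation probability (a simple clamp to the 10%-100% range) is an assumption, not BOINC's actual adaptive validation formula.

```python
def validation_probability(improves_search, host_error_rate, error_rates_stable):
    """Return the chance that a result is sent through quorum validation."""
    if improves_search:
        return 1.0  # results that could improve a search are always validated
    if not error_rates_stable:
        return 0.5  # interim rule while host error rates settle in the database
    # Assumed mapping: use the host's error rate, clamped to [10%, 100%].
    return min(1.0, max(0.1, host_error_rate))


print(validation_probability(True, 0.0, True))     # always validated
print(validation_probability(False, 0.3, False))   # interim 50% rule
print(validation_probability(False, 0.0, True))    # reliable host: the 10% floor
print(validation_probability(False, 0.25, True))   # error-prone host: checked more often
```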
On a side note, we'll also be updating the applications this week. We've made new background models for the Milky Way that we want to test. Additionally, there are some server related performance improvements that should help the server response time. I'm hoping to have the source code available by Tuesday so people can compile their own applications, then make the full swap over to the new application sometime early next week.
4 Apr 2010 | 23:53:12 UTC
· Comment
new test workunits
I've generated a few test workunits with the new validator. I think everything should work correctly with them, but if you see any problems, that's why.
If these come back OK, I'll be starting up the new validator later today.
4 Apr 2010 | 21:53:07 UTC
· Comment
new osx applications
I've updated the OSX applications (i686 and x86_64). Let me know if they're working. I think this will solve a problem we've been having with some types of workunits not crunching correctly.
2 Apr 2010 | 19:31:41 UTC
· Comment
Server outage
I'm taking the server down tonight (and probably most of tomorrow).
I've made some big changes in the assimilator and validator which should help me implement new features and debug them in the future (mainly I rewrote them in Java so I don't have to worry about memory leaks or segmentation faults).
I'll be debugging them over the next couple days so expect some outages. Most notably, validation will be much stricter now, considering that even though I've asked nicely we still see a lot of people trying to scam the system (scripts and single precision GPU clients, for example). It's kind of sad that a few bad users have to ruin things for everyone (and make our work that much more difficult), but I guess that's the way things have to be.
On another note, we've had two papers accepted recently, one to the Distributed Applications and Interoperable Systems (DAIS 2010) conference (http://discotec.project.cwi.nl/index.php/DAIS:Main), and another to the World Congress on Evolutionary Computation (CEC 2010) http://www.wcci2010.org/topics/ieee-cec-2010. I'll be making these available after the validator/assimilator upgrades.
2 Apr 2010 | 2:36:58 UTC
· Comment
Scripting to remove WUs from certain searches
I'd also like to put a note out there to people running scripts to cancel WUs from certain searches, asking you to stop. These can lead to some bad server problems where some WUs accumulate on the server because they're not being processed fast enough. We'll be taking some steps on the server end to stop this kind of activity, but until that gets finished I'd like to ask people to stop. If it continues I'll start banning offending users or zeroing out their credit. --Travis
11 Mar 2010 | 2:53:25 UTC
· Comment
Bad New Searches
Matt tried to start up some new searches (and it looks like everything went wrong yet again). I've removed those and will start up some others (that shouldn't have any problems). I'll be meeting with Matt to go over the process of starting searches again so we stop having these problems. --Travis
11 Mar 2010 | 2:36:42 UTC
· Comment
Server Slowness
Matt has started up some new searches on a different area of the Milky Way and it looks like the server is having trouble keeping up with requests for the new data file of stars. Hopefully things will go back to normal after the new data file has been sent out to most of our users.
We're also in the process of purchasing more RAM for the server (which seems to be the current bottleneck), so hopefully this will help the problem in the future.
18 Feb 2010 | 23:50:42 UTC
· Comment
Visualization/Screensaver Work
Hey everyone,
I've joined the MilkyWay@Home project for the semester and have heard some rumblings of people wanting some sort of visualization, or screen saver, dare I say :-). We've come up with a few ideas and would like input on which sounds the most appealing to you.
The ideas are ordered from showing the most scientific data to the least. In other words, Idea 1 will give a very clear visualization of what the work units are doing, whereas Idea 3 is more eye candy with fewer details on the science. I want to know which end of the nitty-gritty/eye candy spectrum you prefer.
Idea 1: A zoomed in view of a wedge, displaying the streams being computed for the wedge with indications of progress so far, color-coding to indicate what stars are in what stream, etc. A very good view of what really is being computed, but not much of an overall picture.
Idea 2: An alternating view of the galaxy that starts zoomed out (showing the current best model of the galaxy), followed by a zoom or similar transfer to showing the currently computing wedge. These views would alternate at some specified time interval.
Idea 3: A constant zoomed out view of the galaxy with maybe some small pictures on the sides showing a currently computing wedge and some sort of progress indicator for current work units.
Obviously it would be ideal to incorporate aspects of all different views, but given time constraints, I'd like to focus my efforts on the most popular idea. I encourage you to throw out your own modifications on the ideas and we'll hash through this.
Let 'em rip
-Eric
13 Feb 2010 | 18:56:26 UTC
· Comment
MilkyWay@Home press release
It looks like physorg picked up our press release, so I've shared the story on reddit.com.
If you get the chance, please go and upvote the reddit post so we can spread the news! :)
11 Feb 2010 | 19:21:29 UTC
· Comment
RPI Press Release
RPI has put out a press release about our project. Thanks everyone for all your help! http://news.rpi.edu:80/update.do?artcenterkey=2685
10 Feb 2010 | 22:18:38 UTC
· Comment
Server Outage
Sorry about the outage. We had some hardware issues but everything should be back running smoothly now.
10 Feb 2010 | 19:18:10 UTC
· Comment
Questions and Answers forum
I actually thought we had removed those forums, but I guess the recent update of the web code made them available (as they were still in the database). If you have any questions or problems please just use the number crunching forum. I've removed the link to those other older forums.
10 Feb 2010 | 3:21:38 UTC
· Comment
Outage
Sorry about the earlier outage. Everything should be up and running smoothly now.
3 Feb 2010 | 4:11:49 UTC
· Comment
Finally fixed user of the day scripts :P
I really think I fixed them now. Finally!
29 Jan 2010 | 18:32:18 UTC
· Comment
lowering purge time
I'm lowering the purge time (sorry) to help speed up the database. This should hopefully improve response time for the webpage and downloading new workunits.
22 Jan 2010 | 7:45:04 UTC
· Comment
new searches testing different validation types
I've implemented optimistic and pessimistic validation for the particle swarm searches now as well (as opposed to just the differential evolution searches that were running before). I've tested them quite a bit so I think it's working correctly, but just warning you the server might crash a few times in the next few days while I get all the bugs sorted out.
Let me know if you're having any problems with the ps_s222_opt_1 or ps_s222_pes_1 workunits.
22 Jan 2010 | 4:08:11 UTC
· Comment
Fix for the anonymous app_plan.xml
I updated the server with the fix for people whose app_plan.xml has count set < 1. Let me know if it worked.
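For anyone who wants to check whether their own file is affected, here is a rough sketch that lists every coprocessor entry with a count below 1. It assumes the usual BOINC anonymous platform layout (an app_version element containing coproc, type and count elements); the sample file contents, the app name, the version number and the executable name are all made up for illustration, and fractional_coprocs is just a hypothetical helper, not part of BOINC or the project's tools.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample file: the app name, version number and file name
# are placeholders, not the project's real values.
SAMPLE_APP_INFO = """\
<app_info>
  <app>
    <name>milkyway</name>
  </app>
  <file_info>
    <name>example_ati_app.exe</name>
    <executable/>
  </file_info>
  <app_version>
    <app_name>milkyway</app_name>
    <version_num>100</version_num>
    <coproc>
      <type>ATI</type>
      <count>0.5</count>
    </coproc>
    <file_ref>
      <file_name>example_ati_app.exe</file_name>
      <main_program/>
    </file_ref>
  </app_version>
</app_info>
"""


def fractional_coprocs(xml_text):
    """Return (app_name, coproc_type, count) for every coproc with count < 1."""
    root = ET.fromstring(xml_text)
    found = []
    for version in root.iter("app_version"):
        app_name = version.findtext("app_name", default="?")
        for coproc in version.iter("coproc"):
            # A missing <count> is treated as 1 here (an assumption of this sketch).
            count = float(coproc.findtext("count", default="1"))
            if count < 1:
                kind = coproc.findtext("type", default="?")
                found.append((app_name, kind, count))
    return found


if __name__ == "__main__":
    for app_name, kind, count in fractional_coprocs(SAMPLE_APP_INFO):
        print(f"{app_name}: {kind} count = {count}")
```

To check a real file, read its contents into a string and pass that to fractional_coprocs in place of the sample.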
19 Jan 2010 | 5:22:48 UTC
· Comment
Need a sample app_info.xml
Some of our users have said that they're having problems using the ATI application on an anonymous platform when their settings specify no CPU work, no Nvidia GPU work, and only ATI GPU work. If anyone is having this problem, please post your app_info.xml so I can send it to the boinc_dev mailing list to get it debugged. Thanks!
--Travis
18 Jan 2010 | 4:46:59 UTC
· Comment
Patch for GPU application's CPU utilization
I applied a patch which should hopefully fix the CPU utilization of the GPU applications, which was causing some users problems with running the CPU app alongside the GPU app. Let me know how it works.
17 Jan 2010 | 3:58:14 UTC
· Comment
Database Problem
We had a problem with the database, which caused the outage. I'm trying to get more information about what happened.
14 Jan 2010 | 19:13:16 UTC
· Comment
User of the Day and other scripts
I think I finally fixed the problem with the user of the day not updating, as well as a few other scripts.
13 Jan 2010 | 23:04:34 UTC
· Comment
ATI Application
After a few weeks of testing the deployment of the ATI application through BOINC, it appears that most of the kinks have been worked out. Milkyway@Home would like to thank Cluster Physik for the development of the application and the many volunteers that assisted in reporting issues. Currently the application is available for Windows 32/64 bit and requires an ATI GPU with double precision capabilities. Milkyway@Home would also like to thank Advanced Micro Devices (AMD) for their generous donation of hardware.
13 Jan 2010 | 22:30:18 UTC
· Comment
Increased WU Deadline
I've increased the WU deadline to 8 days (up from 5). Let me know if this worked.
We've been letting our searches run a bit longer now because we're trying to get more accurate results, and we've also increased the WU size so some CPUs are having problems keeping up. Because of this I think it was a good time to increase the deadline. This might also help our project play a little nicer with other ones. Let me know how it works.
13 Jan 2010 | 22:17:56 UTC
· Comment
Old News Lost
Well, it looks like our old news couldn't make the transition to the new software. Hopefully it won't be missed too much. I think we've gotten most of the server problems ironed out, so let me know how things are working.
11 Jan 2010 | 5:07:58 UTC
· Comment
News Updates
Well, I tried to run the script to update the news forum, and it ended up crashing halfway through so we have a lot of REALLY old news :) I'm going to try and fix this.
--Travis
11 Jan 2010 | 5:04:40 UTC
· Comment
News is available as an RSS feed