Welcome to MilkyWay@home

Posts by Jake Weiss

61) Message boards : News : New Workunit Generation Pause (Message 68256)
Posted 15 Mar 2019 by Jake Weiss
Post:
Hey Everyone,

In preparation for the migration to the new server, we will be pausing the generation of new workunits today. This does not mean that we will stop sending out workunits. Instead, it means that the only workunits being sent out will be those needing cross validation. Hopefully, this will allow us to clear the workunit validation queue by Tuesday morning and simplify our transition process.

This may also mean that at some point this weekend or on Monday, there will be very few workunits left on the server to process. This is expected.

Best,

Jake
62) Message boards : Number crunching : Invalids Exit status 0 (0x0) after server came back (Message 68255)
Posted 15 Mar 2019 by Jake Weiss
Post:
Hey everyone,

I had hoped this would clear itself up as errored work units started to clear the queue. My guess was that this was caused by workunits being aborted or taking too long to be returned due to the outage.

To prevent this on the next outage, I am planning on stopping new workunit creation tonight in preparation for the migration to the new server next week. This means that no new workunits will be created, but old ones will still be validated as they're returned. Hopefully by Tuesday morning, when the switch occurs, there will be few or no outstanding workunits to be validated as we switch to the new server.

Sorry you guys feel underappreciated; that is not the case at all. The team is going through a pretty big transition in the background here as Sidd just graduated and has left the team. We are training up Eric to take his place, but there is an incredibly steep learning curve to this project. I am also working hard on making sure the transition to the new server is a smooth one.


Thank you all for your understanding.

Best,
Jake
63) Message boards : News : Planned Server Outage Tuesday March 19th (Message 68252)
Posted 13 Mar 2019 by Jake Weiss
Post:
Hey Everyone,

The new server is in and I am working on getting our project set up and migrated over. My current plan is to take the current server offline early on Tuesday and migrate the database to the new server. I'm not sure how long this will take since the database is quite large. Hopefully I will have everything back up and running sometime on Wednesday the 20th. Sorry for any inconvenience this will cause.

Thank you all for your continued support.

Best,
Jake
64) Message boards : News : New Server Installation (Message 68225)
Posted 7 Mar 2019 by Jake Weiss
Post:
That is the hope.

Jake
65) Message boards : News : New Server Installation (Message 68221)
Posted 7 Mar 2019 by Jake Weiss
Post:
Hey Everyone,

We will be receiving our new server today. This means we may experience a few planned outages over the next couple days as we get it set up and migrate our data. In addition to installing this new hardware, we will also be upgrading the server to the BOINC server version 1.0 that was recently released. As is usual during upgrades, there may be a few hiccups along the way. Please bear with us while we work to make things more stable in the long run.

Thank you all for your continued support.

Best,

Jake
66) Message boards : News : Server Downtime Over the Weekend (Message 68203)
Posted 5 Mar 2019 by Jake Weiss
Post:
Glad everything is running smoothly on your end.

Jake
67) Message boards : News : Server Downtime Over the Weekend (Message 68201)
Posted 5 Mar 2019 by Jake Weiss
Post:
They are not on different servers. I'm not sure why you were able to upload. That's strange.

Thank you everyone for being so understanding of our downtime this weekend.

Jake
68) Message boards : News : Micromanaging CPU vs GPU Workunit Limits (Message 68196)
Posted 1 Mar 2019 by Jake Weiss
Post:
Hey Everyone,

Sorry that things have been bad for so long. I just want to let you know that we have ordered a new server and it is on the way. Hopefully it will be here in a couple weeks and it should further relieve some of the database issues we have been experiencing.

Best,

Jake
69) Message boards : News : Server Downtime Over the Weekend (Message 68195)
Posted 1 Mar 2019 by Jake Weiss
Post:
Hey Everyone,

The building we house our server in will be doing maintenance on their power systems over the weekend. Unfortunately, that means there will be an outage from today at 4pm to Monday at 10am. I am sorry for the late notice.

We hope you all have a great weekend.

Best,
Jake
70) Message boards : News : BOINC Open Source Project Looking for Experienced Macintosh Developers (Message 68157)
Posted 13 Feb 2019 by Jake Weiss
Post:
Hey Everyone,

The Berkeley Open Infrastructure for Network Computing (BOINC) system is the software infrastructure used by MilkyWay@home and many other volunteer distributed computing projects. The BOINC Open Source Project is looking for volunteers to develop and maintain the BOINC client on Macintosh. The BOINC Client and Manager are C++ cross-platform code supporting MS Windows, Mac, Linux, and several other operating systems. We currently have a number of volunteer developers supporting Windows and Linux, but our main Mac developer is winding down his involvement after many years. He is prepared to help a few new Mac developers get up to speed.

If you have Mac development experience and are interested in volunteering time to help support and maintain the BOINC Mac client, please have a look at the more detailed description here: https://boinc.berkeley.edu/trac/wiki/MacDeveloper

If you are not a Mac developer, but have other skills and are interested in contributing to BOINC, the link above also has more general information.

Best,
Jake
71) Message boards : News : Micromanaging CPU vs GPU Workunit Limits (Message 68046)
Posted 18 Jan 2019 by Jake Weiss
Post:
I am thinking, why do we even have a single threaded application? How would everyone feel about having only the multithreaded one, so that if you want to run a workunit single threaded, you just choose to have 1 CPU used? Would that work?

Jake
72) Message boards : News : Micromanaging CPU vs GPU Workunit Limits (Message 68043)
Posted 18 Jan 2019 by Jake Weiss
Post:
Hi Link,

Can you explain a little more about what you want? Is there a reason you would prefer to run, say, 5 or 6 single-core runs instead of 1 multi-core run on 6 cores?

You can already limit the number of cores you crunch on through the BOINC manager so I am a little unsure what you are asking for, sorry.

Jake
73) Message boards : News : Micromanaging CPU vs GPU Workunit Limits (Message 68035)
Posted 17 Jan 2019 by Jake Weiss
Post:
Hey Everyone,

So it is looking like the server is struggling to make enough workunits for everyone. I am going to try increasing the workunit cache size to 10000 temporarily to hopefully help.

Jake

Looks like we are holding stable with this number in reserve. I'm going to leave it at this overnight and hopefully it stays good. I'll check here periodically in case there are any issues.
74) Message boards : News : Micromanaging CPU vs GPU Workunit Limits (Message 68034)
Posted 17 Jan 2019 by Jake Weiss
Post:
Hey Everyone,

I will be tweaking some config options on the server to improve database stability and allow for stockpiling of more workunits by GPU users.

The new workunit limits should be as follows:
Separation:
600 total
200 GPU (Per GPU up to 600)
40 CPU (Per CPU up to 600)
Nbody:
120 total
20 CPU (Per CPU up to 120)
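
For anyone who wants to work out what these numbers mean for a particular host, here is a rough sketch of the arithmetic. It is only one reading of the list above, not the server's actual code: the function name, the dictionary layout, and the assumption that the combined CPU and GPU count is also capped at the "total" figure are all hypothetical.

```python
# Hypothetical helper: only the numbers are taken from the limits listed above.
# Nbody has no GPU entry in the list, so its per-GPU allowance is set to 0 here.
LIMITS = {
    "separation": {"total": 600, "per_gpu": 200, "per_cpu": 40},
    "nbody": {"total": 120, "per_gpu": 0, "per_cpu": 20},
}


def max_workunits(app, n_cpus=0, n_gpus=0):
    """Return (cpu_limit, gpu_limit, combined_limit) for one host.

    Each per-device allowance scales with the device count and is capped
    at the application's total. Capping the combined figure at the same
    total is an assumption, not something stated in the post.
    """
    rule = LIMITS[app]
    cpu_limit = min(rule["per_cpu"] * n_cpus, rule["total"])
    gpu_limit = min(rule["per_gpu"] * n_gpus, rule["total"])
    combined_limit = min(cpu_limit + gpu_limit, rule["total"])
    return cpu_limit, gpu_limit, combined_limit


if __name__ == "__main__":
    # Example host with 8 CPUs and 2 GPUs running Separation.
    print(max_workunits("separation", n_cpus=8, n_gpus=2))
```

Under this reading, a host with 3 or more GPUs already reaches the 600 Separation cap on GPU work alone.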


Hopefully you guys notice this on your end. If we notice we are running out of workunits more frequently on the server, I will increase the workunit cache a bit.

Let me know what you all think about these numbers and how it is working for you.

Jake
75) Message boards : News : New Separation Runs (Message 68033)
Posted 17 Jan 2019 by Jake Weiss
Post:
Hey Bluestang,

The number of workunits we bundle is directly related to the number of streams we have in our model. More streams means a higher number of parameters to fit on the command line and also a longer integral time because of a more complex model. These choices are scientifically motivated, and sometimes we have to run those types of runs to get the science we need done. We try to compensate for the difference in computation time by changing the number of credits given for each. Unfortunately, our credit algorithm does not always give the perfect compensation for the differing computation times on all machines. Since we do not run these types of runs often, we have not worked on better refining our credit allocation for them.

Everyone,

I am about to try something to better micromanage GPU vs CPU workunits. Not sure if it will work. I will make a separate thread about this.

Jake
76) Message boards : News : New Separation Runs (Message 68025)
Posted 17 Jan 2019 by Jake Weiss
Post:
Hi Max,

There are necessary bookkeeping tasks that have to be performed between workunit runs. These include doing the final parts of our likelihood calculation on all of the stars (after the GPU computes our large integral), recording the results of the current run, cleaning up data from the current run, and formatting data for the next run. Unfortunately, this does take some time to complete. If it makes you feel any better, these same tasks would also be completed during a normal single workunit, so it shouldn't be any different in overall runtime for 5 single work units or 5 bundled work units. The reason you are seeing different run times for these final parts likely has to do with the different number of stars in different jobs, and not anything to do with your GPUs.

xtatuk,

I understand your frustration with the database; please understand that we are frustrated with it too. Unfortunately, it is something that is not easy to fix. We are working on making the error messages a little more graceful and improving caching of our data-driven web pages. This should hopefully make our website run smoother when our database is under heavy load from work requests. It should also allow our database to prioritize handling workunits when it is under heavy load. I doubt this will solve the problem entirely, but we are slowly working to improve the stability of the database through incremental changes.

Thank you all for your continued support.

Jake
77) Message boards : News : New Separation Runs (Message 68015)
Posted 16 Jan 2019 by Jake Weiss
Post:
Hey Everyone,

Just wanted you all to know I put up some new separation runs. These runs are back to fitting 3 streams and are bundled in groups of 5. The names of these runs are:

de_modfit_80_bundle5_3s_NoContraintsWithDisk200_1
de_modfit_81_bundle5_3s_NoContraintsWithDisk200_1
de_modfit_82_bundle5_3s_NoContraintsWithDisk200_1
de_modfit_83_bundle5_3s_NoContraintsWithDisk200_1
de_modfit_84_bundle5_3s_NoContraintsWithDisk200_1
de_modfit_85_bundle5_3s_NoContraintsWithDisk200_2
de_modfit_86_bundle5_3s_NoContraintsWithDisk200_1

If you have any trouble with these runs, please let me know.

Thank you all for your continued support.

Jake
78) Message boards : News : Nbody release v1.74 (Message 67954)
Posted 20 Dec 2018 by Jake Weiss
Post:
Looks like everything is all fixed.

Give us an update if you see any issues throughout the next day or two.

Thank you for your patience,

Jake
79) Message boards : News : New Badges for Membership Time (Message 67834)
Posted 5 Oct 2018 by Jake Weiss
Post:
Wow, looks like he's already at 10 years. That's incredible. I will have a meeting with Sidd, Jeff, and Eric to brainstorm new membership time badge ideas.

Jake
80) Message boards : News : Database Maintenance 9-4-2014 (Message 67797)
Posted 7 Sep 2018 by Jake Weiss
Post:
Hey Everyone,

Just wanted to explain the validation inconclusive rates after this maintenance, and most maintenance. This is not an issue with how quickly our database can handle the workunits, but how quickly our users can cross validate workunits. As many of you know, we can require up to 4 other users to cross validate workunits before they are considered valid.

At any given time, we have 300,000+ workunits being calculated by volunteers. When we take the server down for maintenance, many of these are completed while we are down. They are then all sent back to us at once to be validated, so we end up with a queue of 300,000 workunits or more that have to be validated by other volunteers to catch up. This doesn't require much work for our server or database, but it does take a long time for users to work through them all.

Sorry that there is such a backlog for validation.

Jake



©2024 Astroinformatics Group