Message boards : News : feel free to cancel any in progress WUs
| Author | Message |
|---|---|
|
Looks like I'm going to have to drop the result and workunit tables to get the database working again. Feel free to cancel any workunits you have in progress. I apologize for this but it's looking like it's the only way to get the project back on its feet in any reasonable amount of time. | |
| ID: 51312 | Rating: 0 | rate:
| |
|
Getting: 10/9/2011 3:24:06 PM|Milkyway@home|Message from server: Project is temporarily shut down for maintenance. Also, the Server & Task pages don't want to load. Related, I'm sure. ____________ Doesn't expecting the unexpected make the unexpected the expected? If it makes sense, DON'T do it. | |
| ID: 51313 | Rating: 0 | rate:
| |
|
Just wondering if - in the future - the number of tasks/WUs a person can have cached can be increased, so we can keep working while the project is down? As it is, I have been without work for about 3 days now. Thanks. | |
| ID: 51314 | Rating: 0 | rate:
| |
Getting: it's been ~2 hours since you posted - the server status page is back up, as are all the other MW@H web pages. ____________ | |
| ID: 51315 | Rating: 0 | rate:
| |
Looks like I'm going to have to drop the result and workunit tables to get the database working again. Feel free to cancel any workunits you have in progress. I apologize for this but it's looking like it's the only way to get the project back on its feet in any reasonable amount of time. I have a full set of work units which are at "Ready to Report" status. Does this mean I am not going to get credit for them? | |
| ID: 51316 | Rating: 0 | rate:
| |
Looks like I'm going to have to drop the result and workunit tables to get the database working again. Feel free to cancel any workunits you have in progress. I apologize for this but it's looking like it's the only way to get the project back on its feet in any reasonable amount of time. I'm in the same position. | |
| ID: 51317 | Rating: 0 | rate:
| |
|
I only have 11 in the ready to report stage, of course you cannot abort those as they have already completed. | |
| ID: 51318 | Rating: 0 | rate:
| |
Just wondering if - in the future - the number of tasks/WUs a person can have cached can be increased, so we can keep working while the project is down? As it is, I have been without work for about 3 days now. Thanks. Don't you have a backup project? ____________ | |
| ID: 51319 | Rating: 0 | rate:
| |
Just wondering if - in the future - the number of tasks/WUs a person can have cached can be increased, so we can keep working while the project is down? As it is, I have been without work for about 3 days now. Thanks. If we increase the cache, then this type of database crash would happen significantly more often. We just don't have a powerful enough server to increase the size of the workunit and result tables that much. ____________ | |
| ID: 51320 | Rating: 0 | rate:
| |
Looks like I'm going to have to drop the result and workunit tables to get the database working again. Feel free to cancel any workunits you have in progress. I apologize for this but it's looking like it's the only way to get the project back on its feet in any reasonable amount of time. Sadly, that's the case. :( It was taking about an hour to do a single query on the result table -- which is why everything was brought to a screeching halt. The only way I could get things responsive again was to clear the result and workunit tables. I'm going to have to lower the time workunits are kept in the database; I think the number was too high, and that's what caused the result table to get too large, become corrupt, and then crash the whole project. ____________ | |
| ID: 51321 | Rating: 0 | rate:
| |
|
What time limit are you going to try? | |
| ID: 51322 | Rating: 0 | rate:
| |
Just wondering if - in the future - the number of tasks/WUs a person can have cached can be increased, so we can keep working while the project is down? As it is, I have been without work for about 3 days now. Thanks. I originally started with SETI and stayed with them for many years. However, about 18 months ago, their application kept locking up and I started doing MW as my backup...then I quit SETI altogether due to the probs. I thought I had found a good project with MW ... Guess I'm going to have to look around and see what else I can find to give this machine something to do. | |
| ID: 51323 | Rating: 0 | rate:
| |
|
You could try World Community Grid. They have a bunch of projects, such as searching for a drug for a certain tropical disease (sorry, it's a pain to recall the spelling) that infects several million people a year, mostly children. There is no vaccine, and existing treatments can be fatal or have serious side effects. | |
| ID: 51324 | Rating: 0 | rate:
| |
What time limit are you going to try? right now 20% of a day, so about 5 hours. ____________ | |
| ID: 51325 | Rating: 0 | rate:
| |
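For reference, the retention figure quoted above converts as follows. This is only a minimal sketch: the 20% comes from the post, while the helper names and the throughput number are hypothetical and are there just to illustrate that the number of rows kept scales roughly as completion rate times retention time.

```python
def retention_hours(fraction_of_day: float) -> float:
    """Convert a retention window given as a fraction of a day into hours."""
    return fraction_of_day * 24.0


def rows_retained(results_per_hour: float, hours_kept: float) -> float:
    """Steady-state rows still in the table: arrival rate times retention time."""
    return results_per_hour * hours_kept


hours = retention_hours(0.20)  # 20% of a day, the figure from the post

# HYPOTHETICAL throughput, not a measured Milkyway@home number.
example_rate = 100_000

print(f"retention window: {hours:.1f} hours")
print(f"rows kept at {example_rate:,} results/hour: {rows_retained(example_rate, hours):,.0f}")
```

Halving the retention window halves the steady-state table size under this simple model, which is presumably why it is the first knob to turn.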
If we increase the cache, then this type of database crash would happen significantly more often. We just don't have a powerful enough server to increase the size of the workunit and result tables that much. Oy.. :( What's your DB size, and especially the result table size, when you start having problems? And I take it you can't increase the work size, so that there's more work for people to do with fewer results/workunits in the DB? | |
| ID: 51326 | Rating: 0 | rate:
| |
|
So, what's the status on getting things flowing again? | |
| ID: 51327 | Rating: 0 | rate:
| |
So, what's the status on getting things flowing again? Sometime tomorrow it's looking like. Things aren't quite ready yet and I do need to try and get some sleep. :( ____________ | |
| ID: 51328 | Rating: 0 | rate:
| |
|
Seems like I have done a ton of work for this project only to have my results say that the result is being checked, or some such thing, and no credit is given. Happens WAY too much. | |
| ID: 51329 | Rating: 0 | rate:
| |
|
We should donate something to get a new server for MW. Having this project stable would be good any day O.o | |
| ID: 51330 | Rating: 0 | rate:
| |
|
If you are looking for another Astro project, consider Einstein (assuming you are running CPU projects). If you are running an ATI GPU configuration, there are a few options, though not astro oriented: Collatz (excellent stability there, though the credit payout is lower than MW, for what it's worth), Moowrapper, or Dnet. Those projects also work with CUDA GPUs (and all three of them do NOT require double precision GPUs).
____________ | |
| ID: 51331 | Rating: 0 | rate:
| |
(snip)... We just don't have a powerful enough server to increase the size of the workunit and result tables that much. What. Don't you have some nice NSF grant money you could throw at it? ;-) But seriously, what kind of hardware would we be talking about? (please be specific) I've already spent $1500 USD on my first cruncher, specifically buying double precision Radeons because I liked this project. If we're talking about a single machine built from off-the-shelf parts, I bet if you asked around the boards, enough people would rise to the occasion to make it happen. I'd be willing to get the ball rolling with, say, $500 USD. Just chip in what you can folks. Really. | |
| ID: 51332 | Rating: 0 | rate:
| |
|
This is frustrating | |
| ID: 51333 | Rating: 0 | rate:
| |
|
DC has a big problem. Volunteer computers have increasing capabilities (faster multicore CPUs and faster GPUs), while project hardware was designed for lower traffic and less WU throughput. | |
| ID: 51334 | Rating: 0 | rate:
| |
|
Well, I don't know anything about the topology of the MW WUs, so it is just a guess. If an increase of the WU count is not possible, maybe it's possible to increase the size of the WUs. | |
| ID: 51335 | Rating: 0 | rate:
| |
Well, I don't know anything about the topology of the MW WUs, so it is just a guess. If an increase of the WU count is not possible, maybe it's possible to increase the size of the WUs. Good point. Collatz gives you the choice to select 'Collatz' or 'Mini Collatz'. But with larger WUs, checkpointing must be implemented. | |
| ID: 51336 | Rating: 0 | rate:
| |
Oohh yes, I'm struggling too to feed my (mostly NVIDIA) machines with projects that have real scientific value. (It's hard with an ATI card because there is only MW. Soon GPUGrid :)) *sigh* I tried to donate, but it does not accept my credit card? O.o ____________ DSKAG Austria Research Team: http://www.research.dskag.at | |
| ID: 51337 | Rating: 0 | rate:
| |
|
DNETC and Moo Wrapper are running OK for ATI cards. Useful when MW is down. | |
| ID: 51339 | Rating: 0 | rate:
| |
DNETC and Moo Wrapper are running OK for ATI cards. Useful when MW is down. Too bad they don't run under BOINC manager. ____________ | |
| ID: 51340 | Rating: 0 | rate:
| |
|
hm? why? how do you mean that? I have dnetc as backup backup backup emergency energywasting project when SETI Backup offers no Astropulse on the MW Machine. Normaly over BOINC Manager.. | |
| ID: 51341 | Rating: 0 | rate:
| |
... Since 0.82 checkpointing is implemented. | |
| ID: 51342 | Rating: 0 | rate:
| |
|
10/10/2011 2:38:27 AM | Milkyway@home | Restarting task ps_separation_82_2s_mix0_1_3396869_0 using milkyway version 88 | |
| ID: 51343 | Rating: 0 | rate:
| |
hm? why? how do you mean that? I have dnetc as backup backup backup emergency energywasting project when SETI Backup offers no Astropulse on the MW Machine. Normaly over BOINC Manager.. You can't just attach to DNETC and Moo Wrapper via BOINC manager. ____________ | |
| ID: 51344 | Rating: 0 | rate:
| |
|
I for one would also be in favor of _much_ bigger workunits for GPUs (I would say roughly 100x bigger). As it is now, the turnaround time is ridiculously short: around every 1 - 2 min the server needs to be contacted for a new WU. And that is for a single GPU. No wonder your server cannot keep up. A side effect is that MW gets completely bullied by the backup projects. I get a maximum of roughly 25 - 35 min of work in my cache. So every time the server is unresponsive for that amount of time (and that happens quite often) my backup project immediately dumps 20 hours of work on me. If that would happen once every day (we are not very far off), I would be running 20 hours of backup project and only 4 hours of MW per day. I too have an ATI GPU, so the choices for backup projects are very limited and I find them all more or less useless, so I really don't want to be running these backup projects at all... | |
| ID: 51345 | Rating: 0 | rate:
| |
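The turnaround argument above can be put into rough numbers. This is a minimal sketch that assumes only the figures quoted in the post (one task every 1 to 2 minutes per GPU, a proposed 100x larger workunit, and 20 hours of backup work per outage); the function names are illustrative and not part of BOINC.

```python
MINUTES_PER_DAY = 24 * 60


def tasks_per_day(minutes_per_task: float) -> float:
    """Tasks one GPU finishes per day, i.e. how many new WUs it must fetch."""
    return MINUTES_PER_DAY / minutes_per_task


def mw_hours_per_day(backup_hours_per_outage: float, outages_per_day: int) -> float:
    """Hours left for MW after the backup project's work has been crunched."""
    return max(0.0, 24.0 - backup_hours_per_outage * outages_per_day)


for minutes in (1, 2):  # current turnaround quoted in the post
    current = tasks_per_day(minutes)
    enlarged = tasks_per_day(minutes * 100)  # the proposed 100x larger WU
    print(f"{minutes} min/task: {current:.0f} tasks/day now, {enlarged:.1f} with 100x WUs")

# One outage a day that triggers a 20-hour backup download.
print(f"MW hours left per day: {mw_hours_per_day(20, 1):.0f}")
```

Under these assumptions a single GPU goes from roughly 720 to 1440 fetches a day down to about 7 to 14, which is the reduction in server contact the post is arguing for.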
So every time the server is unresponsive for that amount of time (and that happens quite often) my backup project immediately dumps 20 hours of work on me. If collatz is your backup project, you can set the resource share to 0. This means, only one wu / gpu will be picked up. When that one finishes, the next one (again a single wu) is downloaded. | |
| ID: 51346 | Rating: 0 | rate:
| |
hm? why? how do you mean that? I have dnetc as backup backup backup emergency energywasting project when SETI Backup offers no Astropulse on the MW Machine. Normaly over BOINC Manager.. About DNETC: yes, you can attach it in BOINC Manager, because I was testing it over the last couple of days and had it attached to BOINC Manager. I didn't find much information about what exactly I was processing, though, so I've abandoned DNETC for now. As for Moo Wrapper, I have no info on whether it can be attached to BOINC Manager. So far, I'm crunching PrimeGRID as a backup project for MW@Home... ____________ | |
| ID: 51347 | Rating: 0 | rate:
| |
|
I second longer WUs, but one problem is the mix of CPUs and GPUs crunching. Increasing the WU length could make it impossible for CPUs to crunch for M@W in a reasonable time. So a decision has to be made on whether M@W is going to be a GPU-only project. | |
| ID: 51348 | Rating: 0 | rate:
| |
If collatz is your backup project, you can set the resource share to 0. This means, only one wu / gpu will be picked up. When that one finishes, the next one (again a single wu) is downloaded. I have Primegrid as my backup, it is the only backup project that runs on an ATI and I consider to be at least vaguely useful... If you set the resource share to zero, it only makes the project the backup. It does not limit the number of WUs that are downloaded once the backup kicks in. I think backup projects should work the way you describe, but they don't. I checked on the BOINC site. There is no way to force BOINC to only download a single WU at a time. | |
| ID: 51349 | Rating: 0 | rate:
| |
Increasing the WU length could make it impossible for CPUs to crunch for M@W in a reasonable time. Is it strictly necessary that GPU and CPU WUs do the same amount of work? If so, then you will always have a problem since GPUs are so much faster... But I am not convinced that they need to be of the same size... | |
| ID: 51350 | Rating: 0 | rate:
| |
hm? why? how do you mean that? I have dnetc as backup backup backup emergency energywasting project when SETI Backup offers no Astropulse on the MW Machine. Normaly over BOINC Manager.. Then plz explain it to me how i done it with DNETC when you know it that exactly ... ;) ____________ DSKAG Austria Research Team: http://www.research.dskag.at | |
| ID: 51351 | Rating: 0 | rate:
| |
Increasing the WU length could make it impossible for CPUs to crunch for M@W in a reasonable time. This is a question which must be answered by the project scientists. At present it seems to be a must. ____________ | |
| ID: 51352 | Rating: 0 | rate:
| |
Sorry, no. I use Collatz as backup, I use ATI cards, and it's always only one WU that is downloaded. It depends on the project's server version, but at Collatz this feature works perfectly. | |
| ID: 51354 | Rating: 0 | rate:
| |
I finished a WU last night and it went out. Didn't see any changes so THEN I read this post. Am I to understand that task is LOST? Yes. ____________ Doesn't expecting the unexpected make the unexpected the expected? If it makes sense, DON'T do it. | |
| ID: 51355 | Rating: 0 | rate:
| |
On that note... This is my idea of a budget (best value/cost ratio) server. All prices are in USD, quoted from US retailer Newegg.com on Oct 10, 2011 (I do not work for them). I think we as a community could come up with this. I will put up $500 seed money. I have started a separate thread for this in the cruncher section. All feedback, especially from those with experience with server configurations, is welcome.

Not included: case/mounting hardware, power supply, RAID hardware, which are all better left to whoever sets it up.

| Qty. | Product Description | Item # | Total Price |
|---|---|---|---|
| 1 | ASUS KGPE-D16 SSI EEB 3.61 Server Motherboard Dual Socket G34 AMD SR5690 DDR3 800/1066/1333 | N82E16813131643 | $429.99 |
| 2 | AMD Opteron 6128 Magny-Cours 2.0GHz Socket G34 115W 8-Core Server Processor OS6128WKT8EGOWOF | N82E16819105266 | $499.98 |
| 16 | Kingston 8GB 240-Pin DDR3 SDRAM DDR3 1333 ECC Registered w/ Parity Server Memory Model KVR1333D3Q8R9S/8G | N82E16820139280 | $1,215.84 |
| 2 | Mushkin Enhanced Chronos Deluxe MKNSSDCR120GB-DX 2.5" 120GB SATA III MLC Internal Solid State Drive (SSD) | N82E16820226225 | $499.98 |
| 2 | Western Digital AV-GP WD20EURS 2TB SATA 3.0Gb/s 3.5" Internal Hard Drive, Bare Drive | N82E16822136783 | $189.98 |
| 2 | Thermaltake CLS0015 70mm 1 Ball, 1 Sleeve CPU Cooler for AMD Socket G34 1U | N82E16835106158 | $71.98 |

Memory: though I would discourage it, if we can get by with 64GB, we could save ~$600.
SSDs: system/boot drive. 2x120GB is for possible RAID or swap/pagefile. If neither is wanted, we could get 1x240 for the same price or just save ~$250.
Hard drives: I don't know if this application needs much local storage, but it's cheap.

Total: $2,907.75 | |
| ID: 51356 | Rating: 0 | rate:
| |
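As a quick check of the quote above, the line totals can be added up. A minimal sketch: the prices are the ones listed in the post, with the motherboard taken as $429.99, which is the value the quoted total of $2,907.75 implies; the short part labels and variable names are illustrative only.

```python
from decimal import Decimal

# Line totals as listed in the post (USD, Newegg, Oct 10, 2011).
parts = {
    "motherboard (1x ASUS KGPE-D16)": Decimal("429.99"),
    "cpus (2x Opteron 6128)": Decimal("499.98"),
    "memory (16x 8GB DDR3 ECC)": Decimal("1215.84"),
    "ssds (2x 120GB)": Decimal("499.98"),
    "hard drives (2x 2TB)": Decimal("189.98"),
    "cpu coolers (2x)": Decimal("71.98"),
}

total = sum(parts.values(), Decimal("0"))

# The two optional savings mentioned in the post.
half_memory = parts["memory (16x 8GB DDR3 ECC)"] / 2  # 64GB instead of 128GB
one_ssd = parts["ssds (2x 120GB)"] / 2                # a single 120GB SSD

print(f"full build:            ${total:,.2f}")
print(f"saving with 64GB:      ${half_memory:,.2f}")
print(f"saving with one SSD:   ${one_ssd:,.2f}")
print(f"build with both cuts:  ${total - half_memory - one_ssd:,.2f}")
```

Decimal is used instead of float so that the cents add up exactly.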
Actually you are both correct, the difference lies in the BOINC manager version that you are running. The 6.10.xx series requests a chunk of work and it gets multiple work units. The 6.12.xx series requests 1 work unit per idle resource. I have Collatz as my backup project for the HD5830 and it only maintains 1 work unit at a time right now. ____________ | |
| ID: 51359 | Rating: 0 | rate:
| |
hm? why? how do you mean that? I have dnetc as backup backup backup emergency energywasting project when SETI Backup offers no Astropulse on the MW Machine. Normaly over BOINC Manager.. I don't know. It's not listed for me when I go to attach a project. I guess there was no good news to report today... ____________ | |
| ID: 51361 | Rating: 0 | rate:
| |
|
Many projects are not listed in there. That's nothing new ;) Try DNETC.org (or .net) as the project URL. One of them works. | |
| ID: 51365 | Rating: 0 | rate:
| |
|
It isn't listed there (that might be a developer choice at BOINC central), but you can attach by this sequence in the BOINC client.
I don't know. It's not listed for me when I go to attach a project. I guess there was no good news to report today... ____________ | |
| ID: 51367 | Rating: 0 | rate:
| |
|
I was following this discussion yesterday, since I completed a nice unit and it went out to you, only to discover belatedly that you were down. | |
| ID: 51374 | Rating: 0 | rate:
| |
|
I am gauging the cost of an appropriate server right now. | |
| ID: 51376 | Rating: 0 | rate:
| |
|
nevermind. | |
| ID: 51392 | Rating: 0 | rate:
| |
|
Travis, we had problems in the past; then you came up with the idea of clearing the database very quickly, so most of us could not see the results page. | |
| ID: 51394 | Rating: 0 | rate:
| |
Travis, we had problems in the past; then you came up with the idea of clearing the database very quickly, so most of us could not see the results page. He already has, actually. http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=2621&nowrap=true#51321 I'm going to have to lower the time workunits are kept in the database; I think the number was too high, and that's what caused the result table to get too large, become corrupt, and then crash the whole project. ____________ | |
| ID: 51404 | Rating: 0 | rate:
| |