Welcome to MilkyWay@home

Posts by HRFMguy

1) Message boards : Number crunching : Top GPU Models ranklist seem not accurate (Message 74697)
Posted 10 days ago by HRFMguy
Post:
This is one of the reasons this thread was created: https://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=3551 (New Benchmark Thread - times wanted for any hardware, CPU or GPU, old or new!)
2) Questions and Answers : Wish list : Request for visualization of data as graphics (Message 74629)
Posted 3 Nov 2022 by HRFMguy
Post:
I was wondering if it would be possible to add some graphics to the project.

Nothing too fancy, but what would be really neat is a general view of the Milky Way Galaxy with a marker showing which region, relative to Earth and the rest of the Galaxy, the computer's current work is analyzing.

There are probably other priority items your team is focusing on. However, having a visualization showing where my computer is helping out in the Galactic Map would be a great, useful screensaver to have running!

Thoughts?
+1
3) Message boards : Number crunching : Daily graphs of server_status (Message 74628)
Posted 3 Nov 2022 by HRFMguy
Post:
WOW! This validation waiting graph is looking just like a hedge fund ladder attack against a meme stock!!
4) Message boards : Number crunching : Run Multiple WU's on Your GPU (Message 74576)
Posted 26 Oct 2022 by HRFMguy
Post:
Yes, you're correct.

Based on the other discussions, I will try to write a script that works around the limit by pausing the networking.
Did this script ever get put into play?
5) Message boards : Number crunching : Daily graphs of server_status (Message 74490)
Posted 18 Oct 2022 by HRFMguy
Post:
Yes, it's obvious that the computers at Milkyway are not able to keep up with the WUs we are producing. I don't understand why, since we are seemingly not finishing any more now than when it suddenly stopped being able to keep up. The system always says everything is running, and I've heard very little from Tom on the subject.
Wish we knew what is wrong.


The workunit generators don't generate tasks (wingman tasks or initial tasks) if the WU pools have more tasks than they should. So when the nbody pool had like 100k tasks in it, any tasks that you sent back were essentially put on hold because the WU generator wouldn't make any wingman tasks for validation until the pool was cleared.

It's not so much that the "computer can't keep up" as it is that the WU pool being overfilled is a nasty bug. I want to migrate to the new server hardware ASAP, but other people have to get things compiled and sorted out before that can happen. Currently, the server is slow because it is short on memory, so in that regard the computer can't keep up.
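
To make the pool behavior described above concrete, here is a toy model in Python. It is only a sketch of the rule as stated in this thread, not the actual BOINC or MilkyWay@home server code; the function name `tasks_to_generate` and the `POOL_LIMIT` value of 1000 are assumptions chosen for illustration.

```python
# Toy model of the gating rule described above. All names and numbers
# here are hypothetical; this is not the real work generator.
POOL_LIMIT = 1000  # assumed threshold, not a confirmed server setting


def tasks_to_generate(pool_size, requested, pool_limit=POOL_LIMIT):
    """Return how many new tasks (initial or wingman) the toy generator creates."""
    if pool_size > pool_limit:
        # Pool is overfilled: nothing new is generated, so returned
        # results wait for a wingman task that does not exist yet.
        return 0
    return requested


print(tasks_to_generate(100_000, 50))  # overfilled pool -> 0
print(tasks_to_generate(800, 50))      # pool under the limit -> 50
```

In this model, results sent back while the pool is overfilled simply sit in "validation pending" until the pool drains below the limit.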



Thanks Tom, really appreciate the update info. Is the memory the type that we can help you with or is it "special" and only something we can contribute money for you to buy or is it even worth it, since you are talking about migrating to a new server anyway? It certainly will be nice when everything is updated and running smoothly again.
Thanks for your great efforts!!

Allen
+1 What Allen said. How much $$ to bulk up the memory? If the memory is upgraded after the new server gets online, the old one can be used as a backup, or it can be run in parallel with the new one. Maybe they could split the load 50/50. I presume you are going to need more hardware for the multi-galaxy sims anyway, so there's also that....
6) Message boards : Number crunching : Daily graphs of server_status (Message 74477)
Posted 17 Oct 2022 by HRFMguy
Post:
I guess it's time to do daily postings of what the graph looks like for workunits and discussions(?):
Yes, please do.
7) Message boards : News : Server Maintenance 12:00 PM ET (16:00 UTC) 9/23/2022 (Message 74447)
Posted 14 Oct 2022 by HRFMguy
Post:
I think I'm cursed so that whenever I travel, the server begins to time out. I was on a plane most of yesterday so I wasn't looking at milkyway, and then I got an email this morning telling me it had downages in the middle of the night.

This morning the server seemed to be running just fine again, but I restarted some processes and flushed the DB just in case. It all seems fine on my end, and the numbers look like they're improving.

We're very close to breaking through that 1k nbody task waiting limit, at which point these validation waiting WUs should begin to clear out.
Can you run a script once per week to do what you just did? Would that help?
8) Message boards : Number crunching : Validation Pending too many tasks (Message 74435)
Posted 13 Oct 2022 by HRFMguy
Post:
Really? I haven't seen it yet and just had to manually request tasks as the queue emptied out. Even with an empty queue I haven't been getting the max 300 tasks like before, just got 224 and even less before.
Well krap. Just watched it count down to zero and had to tickle it to send more work. Earlier today I was characterizing the AMD GPU by changing the app config file. Maybe that is what did it. I must confess, I did go all happy feet over it. At least I got the 300 WUs.
9) Message boards : Number crunching : Validation Pending too many tasks (Message 74429)
Posted 12 Oct 2022 by HRFMguy
Post:
Cheer up folks! It's only gonna get worse! The dead time between separation GPU reloads is gone! Wohoo! I seem to be getting new work at about 125 or so left to finish. Noice, so to speak.
10) Message boards : Number crunching : New Benchmark Thread - times wanted for any hardware, CPU or GPU, old or new! (Message 74410)
Posted 10 Oct 2022 by HRFMguy
Post:
Anybody using a Firepro S9150? Ebay has a few for $80 now. Or an S9170?

Theoretical performance is 2534 Gflops. My R9 280x is 1178, or about 1/2 as much.

Given as much CPU as is needed to keep it running at full load, can I get 2X the work of the R9 280x (1 WU every 30 seconds)?

Also, TDP is slightly less for the S9150.
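
For what it's worth, the arithmetic behind that question can be sketched as below. This assumes run time scales inversely with theoretical Gflops, which is an idealization; real throughput also depends on drivers, CPU feeding, and how many WUs run at once. The helper name `scaled_seconds_per_wu` is made up for this sketch, and the numbers are the ones quoted in this post.

```python
# Figures quoted in the post above (theoretical Gflops).
S9150_GFLOPS = 2534.0
R9_280X_GFLOPS = 1178.0
R9_280X_SECONDS_PER_WU = 30.0  # rate quoted for the R9 280x


def scaled_seconds_per_wu(ref_seconds, ref_gflops, new_gflops):
    """Idealized estimate: run time is inversely proportional to Gflops."""
    return ref_seconds * ref_gflops / new_gflops


speedup = S9150_GFLOPS / R9_280X_GFLOPS
estimate = scaled_seconds_per_wu(
    R9_280X_SECONDS_PER_WU, R9_280X_GFLOPS, S9150_GFLOPS
)

print(f"theoretical speedup: {speedup:.2f}x")         # 2.15x
print(f"estimated time per WU: {estimate:.1f} s")     # 13.9 s
```

So on paper the answer is "slightly better than 2X", but only a real benchmark run would confirm it.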
11) Message boards : Number crunching : Validation Pending too many tasks (Message 74409)
Posted 10 Oct 2022 by HRFMguy
Post:
I wish someone would give the Validator Server a swift kick to get a move on!! ...

... the admins still have sore feet from last time ...
And while they are at it, kick start the separation generator also. The well has run dry there.....
12) Message boards : News : News General (Message 74316)
Posted 29 Sep 2022 by HRFMguy
Post:
What about recompiling server?


The binaries will all need to be recompiled on the new server hardware. I have been keeping a list of some of the changes people have requested to our server software, but if you have suggestions feel free to share them here! When we recompile, it makes sense to do things like change the task pool sizes, remove the sleep loop in the transitioner, etc.
I think an option to get "wingman only" tasks would be nice. Hopefully that can be done by some settings at compile time.
13) Message boards : News : Server Maintenance 12:00 PM ET (16:00 UTC) 9/23/2022 (Message 74287)
Posted 26 Sep 2022 by HRFMguy
Post:
... (that's been done before.....)

I think that was a different problem than the one we are having now?
Hopefully! That last one was a nightmare! But I'm willing to move over if need be.
14) Message boards : News : Server Maintenance 12:00 PM ET (16:00 UTC) 9/23/2022 (Message 74285)
Posted 26 Sep 2022 by HRFMguy
Post:
And, if we shouldn't ignore it, should we dog-pile onto nbody to clear out the backlog? (that's been done before.....)
15) Message boards : MilkyWay@home Science : Science Summary (Message 74282)
Posted 25 Sep 2022 by HRFMguy
Post:
not that I am aware of....
16) Questions and Answers : Unix/Linux : boic sees 2 GPU's but only uses 1 (Message 74276)
Posted 24 Sep 2022 by HRFMguy
Post:
@cat22

any luck?
17) Questions and Answers : Unix/Linux : boic sees 2 GPU's but only uses 1 (Message 74242)
Posted 22 Sep 2022 by HRFMguy
Post:
I am no XML expert, but I think that gets you 2 instances running on one GPU and does not address the second GPU at all.

So, how should it be done?
Within the app_config file, put in two app_version sections, like I did below. Mine just happened to be one each of AMD and NVIDIA, but yours will be two NVIDIA. Try this and see if it works. Of course, you will need to adjust avg_ncpus and ngpus for each app_version to suit your particular situation. Just for grins, start with 1 CPU and 1 GPU for each app_version. If it works, then fine-tune from there. Good luck!

<app_config>
  <app_version>
    <app_name>milkyway</app_name>
    <plan_class>opencl_ati_101</plan_class>
    <avg_ncpus>0.866</avg_ncpus>
    <ngpus>0.333</ngpus>
  </app_version>
  <app_version>
    <app_name>milkyway</app_name>
    <plan_class>opencl_nvidia_101</plan_class>
    <avg_ncpus>0.866</avg_ncpus>
    <ngpus>1</ngpus>
  </app_version>
  <!--Your comment-->
</app_config>
18) Questions and Answers : Unix/Linux : boic sees 2 GPU's but only uses 1 (Message 74233)
Posted 22 Sep 2022 by HRFMguy
Post:
Nope. Windows 10 Pro.
19) Questions and Answers : Unix/Linux : boic sees 2 GPU's but only uses 1 (Message 74232)
Posted 22 Sep 2022 by HRFMguy
Post:
I am no XML expert, but I think that gets you 2 instances running on one GPU and does not address the second GPU at all.
20) Questions and Answers : Unix/Linux : boic sees 2 GPU's but only uses 1 (Message 74227)
Posted 21 Sep 2022 by HRFMguy
Post:
I have used this in the past with good luck. It yields 3 parallel instances running on the AMD GPU, and 1 instance on the NVIDIA GPU.

<app_config>
  <app_version>
    <app_name>milkyway</app_name>
    <plan_class>opencl_ati_101</plan_class>
    <avg_ncpus>0.866</avg_ncpus>
    <ngpus>0.333</ngpus>
  </app_version>
  <app_version>
    <app_name>milkyway</app_name>
    <plan_class>opencl_nvidia_101</plan_class>
    <avg_ncpus>0.866</avg_ncpus>
    <ngpus>1</ngpus>
  </app_version>
  <!--Your comment-->
</app_config>
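
If you want to sanity-check what a file like this should give you before restarting BOINC, here is a small Python sketch. It assumes the number of tasks sharing one GPU is roughly the whole number of times ngpus fits into 1 (so 0.333 gives 3), which matches the "3 instances on AMD, 1 on NVIDIA" result described above but is a simplification of what the client scheduler really does. The function name `tasks_per_gpu` is made up for this example.

```python
import xml.etree.ElementTree as ET

# Same settings as the app_config shown above.
APP_CONFIG = """
<app_config>
  <app_version>
    <app_name>milkyway</app_name>
    <plan_class>opencl_ati_101</plan_class>
    <avg_ncpus>0.866</avg_ncpus>
    <ngpus>0.333</ngpus>
  </app_version>
  <app_version>
    <app_name>milkyway</app_name>
    <plan_class>opencl_nvidia_101</plan_class>
    <avg_ncpus>0.866</avg_ncpus>
    <ngpus>1</ngpus>
  </app_version>
</app_config>
"""


def tasks_per_gpu(xml_text):
    """Map each plan_class to (tasks per GPU, CPU cores reserved for them)."""
    root = ET.fromstring(xml_text)
    summary = {}
    for version in root.findall("app_version"):
        plan = version.findtext("plan_class")
        ngpus = float(version.findtext("ngpus"))
        ncpus = float(version.findtext("avg_ncpus"))
        # Assumption: tasks share a GPU until their ngpus values add up to 1.
        # The tiny epsilon guards against floating-point round-off.
        count = int(1.0 / ngpus + 1e-9)
        summary[plan] = (count, count * ncpus)
    return summary


for plan, (count, cpus) in tasks_per_gpu(APP_CONFIG).items():
    print(f"{plan}: {count} task(s) per GPU, about {cpus:.3f} CPU cores")
```

With the values above this reports 3 tasks for opencl_ati_101 and 1 task for opencl_nvidia_101, so make sure you have enough free CPU cores to feed all of them.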



©2022 Astroinformatics Group