Welcome to MilkyWay@home

Posts by Keith Myers

1) Message boards : Number crunching : Delay in getting new work units untill all work units have cleared (Message 69133)
Posted 17 days ago by Keith Myers
Post:
JStateson tried the Windows AppVeyor artifact for the 7.15.0 development client at my suggestion and ran into the same work unit starvation. The work_fetch.cpp module (4/20/2019) that controls work fetch has not changed in the past 5 months; it is identical in the 7.15.0 development branch and in the latest 7.16.2 development branch, which is scheduled to go mainline, hopefully soon, once the translations are finished. Interactions with other client modules involved in work fetch that have been patched or changed in the current development branch may still fix the issue. But I have my doubts.

I believe there are still server-side misconfigurations that the expected new client will not fix.
2) Message boards : Number crunching : Delay in getting new work units untill all work units have cleared (Message 69127)
Posted 17 days ago by Keith Myers
Post:
My understanding is that the Gridcoin people are allowing miners to keep their membership in local clubs

Not fully implemented yet. Still in beta testing for a few outside teams. Going well. Probably won't have the new general release open team clients available till the first of the new year.
3) Message boards : Number crunching : Rx570 vs. gtx 1080, 1080ti, 2080 (Message 69115)
Posted 21 days ago by Keith Myers
Post:
Are there other projects that heavily utilize fp64 as well?

MilkyWay is the only project I am aware of that requires FP64. Pretty much every other project only needs single precision.
4) Message boards : Number crunching : WUs not downloaded in time - rig is idling - doing no work ... (Message 69103)
Posted 23 days ago by Keith Myers
Post:

The Windows build artifacts are over at AppVeyor.
https://ci.appveyor.com/api/buildjobs/4bvvgoug1ej0x5mh/artifacts/deploy%2Fwin-client%2Fwin-client_master_2019-09-18_15ffc98a.7z


Just got around to downloading. It is 7.15.0

does that have the new feature that 16.1 has?

Going to let it run for a while and see what happens.

thanks!

The client is up to 7.16.2 now and has about 60 more commits from master added to it. A lot of polishing. The close-with-the-red-X issue seems to be fixed now. From what I hear, the only thing holding this up as the new master release is that the translations have yet to come in.
5) Message boards : Number crunching : Isn't this a waste of my CPU resources? (Message 69095)
Posted 24 days ago by Keith Myers
Post:
Today is 20 September. In the BOINC Manager I see among others two MilkyWay tasks with the following data:

  • Task #1: progress: 44.758%, elapsed: 6d 02:33:44, remaining: 7d 12:50:53, deadline: 24 September, 04:04:47
  • Task #2: progress: 20.537%, elapsed: 3d 05:55:39, remaining: 12d 13:25:52, deadline: 25 September, 23:37:01


Does it make any sense to continue these tasks? After all, it looks like there is no chance that they will be completed before the deadline!

I still have similar problems with the tasks of the MilkyWay. Apart from MilkyWay, I also carry out tasks of several other projects (Asteroids, Einstein, SETI). No other project has such problems.

The two computers on which the MW tasks are calculated are more or less equally loaded, i.e. there are no periods of significantly higher load. What's more, one of these computers does nothing else almost all the time.

The problem is with the Milkyway@home Separation 1.46 application.

It looks as if the MilkyWay has a problem with the correct estimation of the complexity of tasks.
It seems to me that it makes no sense to calculate the tasks long after the deadline.


Your Quadro has normally been crunching tasks in 600 seconds, so the fact that the tasks you show have been running for 6 days means the card has gone walkabout with the drivers. Reboot the host.
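
The deadline question in the quoted post comes down to simple date arithmetic. Here is a minimal Python sketch of that check. It assumes the year is 2019 and takes "today" as 00:00 on 20 September, the most generous reading, since the quoted post gives no time of day. The function name will_miss_deadline is made up for illustration. The "remaining" figures are only the client's own estimates, so with a hung GPU they say little about the real finish time.

```python
from datetime import datetime, timedelta

# Assumption: year 2019, and "now" is midnight at the start of 20 September.
NOW = datetime(2019, 9, 20)

# (estimated time remaining, deadline), copied from the quoted task list.
tasks = {
    "Task #1": (timedelta(days=7, hours=12, minutes=50, seconds=53),
                datetime(2019, 9, 24, 4, 4, 47)),
    "Task #2": (timedelta(days=12, hours=13, minutes=25, seconds=52),
                datetime(2019, 9, 25, 23, 37, 1)),
}


def will_miss_deadline(remaining, deadline, now=NOW):
    """Return True if the estimated finish time falls after the deadline."""
    return now + remaining > deadline


for name, (remaining, deadline) in tasks.items():
    overshoot = NOW + remaining - deadline
    verdict = "misses" if will_miss_deadline(remaining, deadline) else "meets"
    print(f"{name}: {verdict} the deadline (estimated finish minus deadline: {overshoot})")
```

Even with the generous midnight assumption, both estimated finish times land days after the deadlines, which is why a stuck card rather than a slow one is the more likely explanation.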
6) Message boards : Number crunching : WUs not downloaded in time - rig is idling - doing no work ... (Message 69094)
Posted 24 days ago by Keith Myers
Post:
Yes, it is hard for me to diagnose these issues as I have never seen the behavior described. But I don't solely run MW on my hosts. They always have around 900 tasks in their cache and just keep topping off to reach 900. The only time I ever ran out of tasks was when the project was offline and I crunched through the entire cache. But as soon as the project came back, the first scheduler connection started refilling back to 900. I run a spoofed client compiled from the latest source, so I don't know if that is what insulates me. However, I don't remember this issue when I ran the bone-stock 7.14.2 client either.
7) Message boards : Number crunching : WUs not downloaded in time - rig is idling - doing no work ... (Message 69086)
Posted 25 days ago by Keith Myers
Post:
OK, that is encouraging. Still wondering where that 12 minute timeout is coming from. I doubt it is a hard-coded delay in the client. Could the reason be the one offered by the project scientist, who stated that you could just be hitting the RTS (ready-to-send) buffer when it is empty?
8) Message boards : Number crunching : Rx570 vs. gtx 1080, 1080ti, 2080 (Message 69084)
Posted 25 days ago by Keith Myers
Post:
I seem to do the work units in around 100 - 140 seconds for both my 1080 and 2080. And around 90 seconds for my 1080 Ti.


Milkyway needs a strong FP64 GPU.. I don't have any nvidia but only AMD to compare.. a Radeon VII can complete 4WU in 40-45secs (10-11secs each) and a Radeon 7970/280x can complete 3WU in 100sec (33secs each), a 7950/280 needs 3-4 secs more

Yes, I won't argue that the consumer Nvidia cards are deliberately FP64 crippled compared to the prosumer or research cards so they don't cannibalize the sales of Teslas and Quadros. If your primary project is MilkyWay, then an ATI/AMD card makes the most sense. I have just stayed away from ATI/AMD because of the challenge of installing the drivers and maintaining them. The Nvidia drivers just install and run with no issues ever. The ATI/AMD drivers are a complete fiasco, as the constant posts about issues in the forums attest. I seem to spend most of my time in the forums trying to help users with ATI/AMD cards that won't run compute.

I know that AMD cards are cheaper than Nvidia cards, but I sometimes wish I could just tell someone to dump the AMD card and get an Nvidia card, and be up and running instantly without the headache of the AMD drivers. All I am saying is that the perceived FP64 deficit on Nvidia is not that great in the end. Discounting a Radeon VII of course.

And if your primary project is Seti, then it is a no-brainer to get Nvidia cards, since the applications available for Nvidia on Linux are so much better and faster than anything on Windows or on AMD cards.
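
The per-work-unit figures quoted in this post are just batch time divided by the number of tasks run at once. Here is a small Python sketch of that arithmetic, using only the timings mentioned above. The function name seconds_per_wu is made up for illustration. The numbers come from different users and hosts, so treat this as a rough comparison, not a controlled benchmark.

```python
def seconds_per_wu(batch_seconds, concurrent):
    """Effective wall-clock seconds per work unit when running several at once."""
    return batch_seconds / concurrent


# (batch time in seconds, tasks run at once), as quoted in the thread.
quoted_timings = {
    "Radeon VII, fast end": (40, 4),
    "Radeon VII, slow end": (45, 4),
    "Radeon 7970/280X": (100, 3),
    "GTX 1080 Ti": (90, 1),
    "GTX 1080 / RTX 2080, fast end": (100, 1),
    "GTX 1080 / RTX 2080, slow end": (140, 1),
}

for card, (batch_seconds, concurrent) in quoted_timings.items():
    per_wu = seconds_per_wu(batch_seconds, concurrent)
    print(f"{card}: {per_wu:.1f} s per work unit")
```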
9) Message boards : Number crunching : Rx570 vs. gtx 1080, 1080ti, 2080 (Message 69083)
Posted 25 days ago by Keith Myers
Post:
I seem to do the work units in around 100 - 140 seconds for both my 1080 and 2080. And around 90 seconds for my 1080 Ti.

Thank you. This is exactly what I was looking for. How many workunits do you run at once on all 3 of those cards?
I know there are cards far better suited to this project that can wipe the floor with mine. Maybe I'll grab an r9 280x down the road. I also know that every little bit helps and I'm certainly not looking at getting into the top anything. I simply don't have the finances or physical space for that. I just wanted to make sure I could maximize output with what I do have.

I only run single tasks on each card. Primary project is Seti with the special app which requires running only singles on each card. So all my projects only run singles.
10) Message boards : Number crunching : Rx570 vs. gtx 1080, 1080ti, 2080 (Message 69082)
Posted 25 days ago by Keith Myers
Post:
And Windows does better than Linux (or Darwin), which is a bit unusual.

I disagree. Here is a task that both of us ran on a 1080 Ti: mine on Linux and my wingman's on Windows.
https://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=1803261603
My card did the task 10 seconds faster.

Are you running two cards? It looks like you have a RTX 2080.
But I have looked mainly at the N-body (many of them), and that was my conclusion there. Perhaps it does not hold true for the GPUs, and the difference in OS may not be so important.

I run either 3 or 4 cards on each host. The one I referenced has two RTX 2070s, one GTX 1080 Ti and one RTX 2080.
11) Message boards : Number crunching : WUs not downloaded in time - rig is idling - doing no work ... (Message 69081)
Posted 25 days ago by Keith Myers
Post:
Sorry to hear that didn't accomplish anything. I thought, give it a shot, what the heck. I assume that if you set the sched_op_debug flag in the logging options, it shows the client asking for 0 seconds of GPU work? Have you used the work_fetch_debug flag to see what the shortfalls are for each component and project?

I am beginning to think the consensus opinion here is correct: the server is misconfigured and doesn't allow requests for work at the same time work is reported.
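
For anyone who would rather set the two logging flags in a config file than through the Manager's logging options, here is a hedged Python sketch that builds a cc_config.xml fragment. The tag names sched_op_debug and work_fetch_debug follow the BOINC client configuration documentation. The helper build_cc_config is made up for illustration. The sketch only prints the XML: saving it as cc_config.xml in the BOINC data directory and having the client re-read its config files is left to you, and is an assumption about your setup.

```python
import xml.etree.ElementTree as ET


def build_cc_config(flags):
    """Build a <cc_config> element with each named log flag switched on."""
    root = ET.Element("cc_config")
    log_flags = ET.SubElement(root, "log_flags")
    for name in flags:
        ET.SubElement(log_flags, name).text = "1"
    return root


config = build_cc_config(["sched_op_debug", "work_fetch_debug"])

# Pretty-print where available (ET.indent was added in Python 3.9).
if hasattr(ET, "indent"):
    ET.indent(config)

xml_text = ET.tostring(config, encoding="unicode")
print(xml_text)
```

If a cc_config.xml already exists, merge the two flags into its existing log_flags section instead of overwriting the file, so other options are not lost.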
12) Message boards : Number crunching : Rx570 vs. gtx 1080, 1080ti, 2080 (Message 69074)
Posted 25 days ago by Keith Myers
Post:
And Windows does better than Linux (or Darwin), which is a bit unusual.

I disagree. Here is a task that both of us ran on a 1080 Ti: mine on Linux and my wingman's on Windows.
https://milkyway.cs.rpi.edu/milkyway/workunit.php?wuid=1803261603
My card did the task 10 seconds faster.
13) Message boards : Number crunching : Rx570 vs. gtx 1080, 1080ti, 2080 (Message 69073)
Posted 25 days ago by Keith Myers
Post:
I seem to do the work units in around 100 - 140 seconds for both my 1080 and 2080. And around 90 seconds for my 1080 Ti.
14) Message boards : Number crunching : Delay in getting new work units untill all work units have cleared (Message 69072)
Posted 25 days ago by Keith Myers
Post:
https://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=4424

Already discussed.

Can you try the latest 7.15.0 client artifact if you are running Windows? It has the most recent work_fetch module.

https://ci.appveyor.com/api/buildjobs/4bvvgoug1ej0x5mh/artifacts/deploy%2Fwin-client%2Fwin-client_master_2019-09-18_15ffc98a.7z
15) Message boards : Number crunching : WUs not downloaded in time - rig is idling - doing no work ... (Message 69071)
Posted 25 days ago by Keith Myers
Post:

The Windows build artifacts are over at AppVeyor.
https://ci.appveyor.com/api/buildjobs/4bvvgoug1ej0x5mh/artifacts/deploy%2Fwin-client%2Fwin-client_master_2019-09-18_15ffc98a.7z


Just got around to downloading. It is 7.15.0

does that have the new feature that 16.1 has?

Going to let it run for a while and see what happens.

thanks!

It is made from the current master, which is 7.15.0, and has the same work_fetch.cpp module that is in client_release/7.16.1. So in that regard they are the same. Mainly, the client has the commit for the bugfix I requested, #3076:
https://github.com/BOINC/boinc/commit/0b5bae4cc98660538b76842dea8b5cf4a16d06f6

There are other changes in 7.16.1 compared to the master 7.15.0, but in areas that I don't think have an impact on the client's inability to request new work when work is turned in at MW. That is why I suggested running one of the later artifacts that has the latest code in work_fetch.cpp. All I ask is that someone run it and see if anything changes for work scheduling.

[Edit] Just wanted to point out that the changes to work_fetch.cpp didn't just cover the issue with using a max_concurrent statement. DA also implemented many lines of new code specifically calling rr_simulation routines. Those routines help determine the correct shortfall in work requested, and they are the part that I think will have the greatest effect on work requested at MW.
16) Message boards : Number crunching : Delay in getting new work units untill all work units have cleared (Message 69060)
Posted 26 days ago by Keith Myers
Post:
The system administrators just recently updated the server software to version 1.04 during the summer and it took them a month to reconfigure everything to somewhat working condition again from the previous server code. Give them some time to figure things out again.

They will need to update again since the BOINC server code branch 1.20 has been released.
17) Message boards : Number crunching : de_modfit_80_bundle4_4s_south4s - error messages (Message 69059)
Posted 26 days ago by Keith Myers
Post:
Can't find clinfo.
The various batch files I tried to use for this do not work.
I'll just keep hunting for a solution.
Maybe just reinstall drivers.

I just had to mess with windows, because when I upgraded my system (mobo and the works) windows didn't recognize the system. So I had to use a win8.1 key and then "upgrade" to win10 which I already have. Who knows what files that wiped out.


https://boinc.berkeley.edu/dl/clinfo.zip
18) Message boards : Number crunching : WUs not downloaded in time - rig is idling - doing no work ... (Message 69053)
Posted 26 days ago by Keith Myers
Post:
Is anybody running MW exclusively on the BOINC 7.16.1 client and seeing very different pull requests? There are significant changes to work_fetch.cpp that eliminate a lot of the previous issues.


I cannot find a Windows 7.16.1 client. Poked around over at GitHub but didn't find an executable. I can no longer run 18.04 with my AMD s9000 boards (a long and sad story) and do not have the expertise to do the cross compile of the 16.1 source to Windows. You know where to find the Windows 16.1?


GitHub contains the source code. It does not house any executables. You download the source code and compile it yourself.

The client 7.16.1 branch can be found with the client tag at GitHub. Just click on the branch arrow and scroll down to the client_release/7.16.1 entry:
https://github.com/BOINC/boinc/tree/client_release/7/7.16

The Windows build artifacts are over at AppVeyor.
https://ci.appveyor.com/api/buildjobs/4bvvgoug1ej0x5mh/artifacts/deploy%2Fwin-client%2Fwin-client_master_2019-09-18_15ffc98a.7z
19) Message boards : Number crunching : WUs not downloaded in time - rig is idling - doing no work ... (Message 69051)
Posted 27 days ago by Keith Myers
Post:
Is anybody running MW exclusively on the BOINC 7.16.1 client and seeing very different pull requests? There are significant changes to work_fetch.cpp that eliminate a lot of the previous issues.
20) Message boards : Number crunching : de_modfit_80_bundle4_4s_south4s - error messages (Message 69028)
Posted 13 Sep 2019 by Keith Myers
Post:
You don't have the OpenCL component of your video drivers installed. If you are getting your drivers from Microsoft, that is the reason. Get your drivers directly from your card vendor.



©2019 Astroinformatics Group