Welcome to MilkyWay@home

Posts by marmot

1) Message boards : Number crunching : bad argument #0 to 'calculateEps2' (Expected 3 or 6 arguments) (Message 77053)
Posted 17 days ago by marmot
Post:
Well you are doing your part in retiring the "bad" tasks as fast as possible then. Thanks.

I realized that pushing through until they accumulate 4 errors and get retired was probably best, so I kept running.

The nasty thing about this is that the BOINC client stalls after so many bad WUs. I need to do a manual update of MilkyWay@home, and then the BOINC client wants to download the master file table before reporting the finished WUs and getting a new batch.
2) Message boards : Number crunching : bad argument #0 to 'calculateEps2' (Expected 3 or 6 arguments) (Message 77018)
Posted 20 days ago by marmot
Post:
It looks as if orbit-fitting tasks in the new data feed (04_03_2024 in the task name) also have the problem that the previous feed (03_29_2024) had. (And like the previous data-set, the task names don't have "orbit_fitting" in them this time, which might hint at what the previous problem was!)

I still have some retries for earlier dates and those are viable.

Cheers - Al.


Since you looked over the data-set naming, did you notice that the data sets follow a 4-day cycle (excluding weekends)?

Einstein@Home's long tasks take about 3 days. I switched to them and will check back Wednesday.

Off topic, is my Marmot avatar showing?
3) Message boards : News : Separation Application Shutting Down on Tuesday, Jun 20th (Message 76121)
Posted 2 Jul 2023 by marmot
Post:

All this mess (and waste of computing resources) could have been avoided if they had stopped Separation at the right time and in the right way.


A month's notice, not a week's, since some of us only check projects on the weekends, especially during the summer.

I finished my new server and was able to grind out 15k hours of WUs by Wednesday.
Actually, running so many WUs per card, it completed more WUs per day than when the GPUs were set to 8 WU/card. (It has 96GB of RAM compared to the 16GB in its prior host, which died during my rush when I forgot to reattach the power cable to the remaining GPU. It shouldn't have died, but two power-ons and two rounds of GPU error beeps later, the 1090T was overheating within 3 seconds. RIP 1090T box... you served me so well.)
That was surprising.
In 2020 I decided to run only 8 WU/card instead of 12 (12 got 5%-10% more RAC), figuring this subproject would go on for years longer.
The 100k hours of contribution was coming too soon.

Probably said this before, but thank you all for the Separation work.
4) Message boards : News : Separation Application Shutting Down on Tuesday, Jun 20th (Message 76120)
Posted 2 Jul 2023 by marmot
Post:
I keep a monthly electricity budget
Let me stop you there. A budget is simply a way of worrying before the event. I just wait for the bills. If something is too much, I ease off a bit.

And I refuse to save power. All they need to do is stick more wind farms up and it would be dirt cheap anyway.


Wind and solar need to store power somewhere for use overnight or on low-wind days.
The Skeptics' Guide to the Universe debates this all the time. We still need nuclear.
And in the 3 years since 2020, we have gone through 50% of the 12-year carbon budget for avoiding 1.5C of average warming.

Crypto mining that does no useful work and GPUs that aren't energy efficient are starting to be an issue.
We do science and this electricity is well used.

But, like I said, downclocking gets the best work per watt, and I'm not going to rewire my house so I can run 6 boxes, each with a 4800W power supply.
I make decisions based on science and data.
5) Message boards : News : Separation Application Shutting Down on Tuesday, Jun 20th (Message 75726)
Posted 19 Jun 2023 by marmot
Post:
I was quite enjoying 280 amps of 12V GPU power. The other projects just don't use as much electricity. I bought four blade server supplies for this project - two 2600W and two 1300W.


I keep a monthly electricity budget, and there is a thermal budget for the house before cooling costs overwhelm the electricity budget. So my machines have a 2500W total winter budget.
I've been able to underclock every video card to a sweet spot with the best credit/watt. The 280X, rated at 200W, was most efficient at about 40-50W.
Four of them, drawing the same 200W in total, could produce 65% more work than a single card.
There is the up-front cost of running 4 cards instead of 1, and of the hardware needed to host that many cards, but 8-cards-per-CPU platforms are widely available now.
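The credit-per-watt claim can be checked with a little arithmetic. A minimal Python sketch, assuming one card at its 200W rating as the baseline (normalized to 1.0 unit of work) and four cards at 50W each whose combined output is the quoted 65% higher; the function name is hypothetical:

```python
def relative_work_per_watt(cards, watts_per_card, work_per_card,
                           baseline_watts, baseline_work):
    """Work per watt of a multi-card setup divided by that of a baseline card."""
    setup_efficiency = (cards * work_per_card) / (cards * watts_per_card)
    baseline_efficiency = baseline_work / baseline_watts
    return setup_efficiency / baseline_efficiency


# Baseline: one 280X at its 200 W rating, normalized to 1.0 unit of work.
# Underclocked: four cards at 50 W each (200 W total) producing 1.65 units combined.
ratio = relative_work_per_watt(cards=4, watts_per_card=50, work_per_card=1.65 / 4,
                               baseline_watts=200, baseline_work=1.0)
print(f"{ratio:.2f}x the work per watt")
```

Because the total power is the same 200W, the work-per-watt gain equals the 65% throughput gain: each underclocked card does about 41% of a full-power card's work on 25% of the power.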
6) Message boards : News : Separation Application Shutting Down on Tuesday, Jun 20th (Message 75723)
Posted 19 Jun 2023 by marmot
Post:
I noticed late Friday night that this was ending, so I spent from Saturday night until Monday morning trying to complete my 3x MI25 machine.
Headache after headache, until I dropped into bed, exhausted, around noon Monday.

Looks like I won't make 100k WuProp hours of Separation work, which I started in Dec 2015.

At least the deadline got me to work on the build. The parts had been here since October.
Didn't seem important anymore after Julie died.

Thanks for giving my Sapphire Vapor-X R9 280X Tri-X OC a great long run.

Its high FP64 throughput and superb ability to under/overclock made it a perfect match for MilkyWay@home.
7) Message boards : News : Where MilkyWay@home Was The Last Few Days (Message 70797)
Posted 18 May 2021 by marmot
Post:
Was it a profit-making attempt (encryption for ransom) or cyber intelligence gathering?
8) Message boards : Number crunching : So??? Where's Milkyway@Home been the last few days? (Message 70796)
Posted 18 May 2021 by marmot
Post:
Please see this thread: https://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=4720#70788


Thanks.

These forums are my only source for keeping up to date on project status.
9) Message boards : Number crunching : So??? Where's Milkyway@Home been the last few days? (Message 70782)
Posted 17 May 2021 by marmot
Post:
Are you just going to let us wonder?
10) Message boards : News : Multi-threaded N-body is back (Message 70735)
Posted 14 Apr 2021 by marmot
Post:
Question about credit:

For tests on my one machine (now unhidden in preferences: https://milkyway.cs.rpi.edu/milkyway/results.php?hostid=803986&offset=0&show_names=0&state=4&appid=), a v1 Xeon 2665 (downclocked for lower heat), Separation takes 18k seconds for 227 credit (not overbooked), while N-Body takes about 10k-12k seconds of total CPU time and earns between 35 and 44 credit.
Since at some projects credit is based on total run length rather than CPU time, as a test I overbooked the 16-thread N-Body so the run time averaged about 9k seconds (run lengths from 5k to 18k seconds). The credit didn't vary from the approx. 40 credit of the first, normal N-Body test runs (about 1300-second run times).

Shouldn't an N-Body task that comes close to using 1 full core get credit similar to Separation WUs?
On average, this machine's (overbooked) N-Body tasks give about 65 credit per 18k seconds of CPU time, while Separation gives 227 credit per 18k seconds of CPU time.
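Normalizing both applications to the same CPU time makes the gap explicit. A small Python sketch using the figures above; it takes about 40 credit per 11k CPU seconds for N-Body (the midpoint of the 10k-12k range is an assumption) and 227 credit per 18k seconds for Separation:

```python
def credit_per_cpu_time(credit, cpu_seconds, reference_seconds=18_000):
    """Scale a task's credit to what it would earn per `reference_seconds` of CPU time."""
    return credit * reference_seconds / cpu_seconds


nbody = credit_per_cpu_time(credit=40, cpu_seconds=11_000)       # midpoint of 10k-12k s
separation = credit_per_cpu_time(credit=227, cpu_seconds=18_000)

print(f"N-Body:     {nbody:.1f} credit per 18k CPU seconds")
print(f"Separation: {separation:.1f} credit per 18k CPU seconds")
print(f"Separation pays {separation / nbody:.1f}x as much per CPU second")
```

This reproduces the roughly 65 vs. 227 credit comparison, a ratio of about 3.5.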

I searched the forums for this discussion, but not even the Google-based search seemed to find a prior one.
11) Message boards : Number crunching : Updated GPU Requirements (Currently not supporting GPU tasks) (Message 68773)
Posted 22 May 2019 by marmot
Post:

I'll try the latest Catalyst on the 7490m and see if some Milkyway WU's start up.
(Looks like AMD froze support at either 15.7.1 WHQL (7/29/2015) or Crimson Beta Edition 16.2.1 (3/1/2016); it's currently on Catalyst 14.9.)


Correction: It's a Radeon HD 7450m w/ OpenCL 1.2 capability and 1GB RAM.

MilkyWay won't send GPU WUs to it.

So just being capable of OpenCL 1.2 isn't enough.
What are the other criteria?
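One plausible extra criterion, offered as an assumption and not as MilkyWay's documented scheduler rule: the Separation app named in the task output quoted later on this page is the "double OpenCL" build, so the GPU probably needs double-precision (FP64) support, not just OpenCL 1.2. A Python sketch that looks for the standard `cl_khr_fp64` extension, or AMD's older `cl_amd_fp64`, in a device's extension string; listing local devices uses the third-party `pyopencl` package if it is installed:

```python
def supports_fp64(extensions: str) -> bool:
    """True if an OpenCL extension string advertises double precision."""
    names = set(extensions.split())
    return "cl_khr_fp64" in names or "cl_amd_fp64" in names


def list_devices():
    """Print each OpenCL device and whether it reports FP64 support (needs pyopencl)."""
    try:
        import pyopencl as cl
        platforms = cl.get_platforms()
    except Exception as exc:  # pyopencl missing, or no OpenCL platform installed
        print(f"cannot list OpenCL devices: {exc}")
        return
    for platform in platforms:
        try:
            devices = platform.get_devices()
        except Exception as exc:  # e.g. a platform with no usable devices
            print(f"{platform.name}: {exc}")
            continue
        for device in devices:
            print(f"{device.name}: FP64 = {supports_fp64(device.extensions)}")


if __name__ == "__main__":
    list_devices()
```

If the 7450M reports neither extension, that would be consistent with MilkyWay not sending it GPU work.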
12) Message boards : Number crunching : AMD FirePro S9150 (Message 68753)
Posted 19 May 2019 by marmot
Post:

SOLVED!!!



Great!

That's almost the perfect MSI control panel you want to see (voltage control is missing; that might require a custom BIOS mod).


I have my S9x00 boards working fine with the 2015 driver and have been able to set the clock speed, whereas I could not do that with AMD's latest "Pro Series".
The driver coders seem to be concerned exclusively with currently selling GPUs and won't preserve older cards' functionality in new drivers, especially cards with a low user base. That encourages people to buy new and keeps up the myth, on the majority of discussion forums, that it's always best to use the newest driver. [/rant]

I then downloaded and extracted AMD_OpenCL64.dll from both the 2018Q4 and the 2019Q2 and put those at \windows\system32 and also at \windows\SysWOW64
Looks like I am stuck with the correct driver but a 4-year-old OpenCL library.
How is your invalid rate on the 2015 OpenCL and driver? Has your credit/second improved by 710 MHz / 511 MHz ≈ 1.39, i.e. 39%?

If you have 200 runs on the 2019 driver/newest OpenCL and know the average GPU clock speed (~511 MHz), then do another 200 runs at the improved average clock speed (~710 MHz) on the 2015 drivers, you could compare the performance adjusted by CLOCK1/CLOCK2 and see whether the older drivers are slower than expected.
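The clock-adjusted comparison can be sketched in Python. It assumes run time scales inversely with GPU clock (a simplification: memory speed and driver overhead don't scale that way), and the two average run times are placeholders, not measured data:

```python
def expected_runtime(runtime_at_clock1, clock1_mhz, clock2_mhz):
    """Run time predicted at clock2 if run time scales inversely with clock speed."""
    return runtime_at_clock1 * clock1_mhz / clock2_mhz


def driver_slowdown(runtime1, clock1_mhz, runtime2, clock2_mhz):
    """Measured/predicted run time for batch 2; above 1.0 means slower than clock scaling predicts."""
    return runtime2 / expected_runtime(runtime1, clock1_mhz, clock2_mhz)


# Placeholder averages over ~200 runs each (replace with real numbers).
new_driver_avg_s = 100.0   # 2019 driver + newest OpenCL, ~511 MHz average
old_driver_avg_s = 75.0    # 2015 driver + OpenCL, ~710 MHz average

print(f"clock ratio: {710 / 511:.2f}")
ratio = driver_slowdown(new_driver_avg_s, 511, old_driver_avg_s, 710)
print(f"measured/predicted run time: {ratio:.3f}")
```

A result near 1.0 would mean the speedup is explained by clocks alone; noticeably above 1.0 would suggest the older driver/OpenCL pairing is slower per clock.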

It would be impossible to determine whether the speed increase comes from the base driver or from improvements to the OpenCL code if you can't keep the same driver while swapping OpenCL. It seems like there could be a way to trick the newer OpenCL into working.
13) Message boards : Number crunching : Long crunch time on new N-Body simulations? (Message 68752)
Posted 19 May 2019 by marmot
Post:
Well, this isn't very surprising since the new application is using only one single core.
Ok, true, but I think the completion time is being underestimated. I had some tasks that took 24-26 hours to complete, but I think they were originally estimated at 11 hours or so. I didn't pay close enough attention to know for sure. I'll have to check upcoming tasks to see whether this is really the case.


The BOINC client keeps a running average of completion times to estimate completion, and the old runtimes outweigh the new runtimes in the average.
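Why old run times dominate can be seen with a generic half-life-weighted running average. This Python sketch only illustrates the idea; it is not BOINC's actual estimator, and the 11-hour and 25-hour values echo the quoted post rather than real measurements:

```python
def update_average(average, new_value, days_elapsed, half_life_days):
    """Blend a new observation into a running average whose history decays with a half-life.

    After one half-life the old average keeps half its weight. A half-life of 0
    (or less) discards the history and returns the newest value.
    """
    if half_life_days <= 0:
        return new_value
    keep = 0.5 ** (days_elapsed / half_life_days)  # weight left on the old average
    return keep * average + (1 - keep) * new_value


estimate_h = 11.0  # hours, from the old, shorter tasks
for day in range(1, 4):  # one 25-hour task reported per day
    estimate_h = update_average(estimate_h, 25.0, days_elapsed=1, half_life_days=10)
    print(f"day {day}: estimate {estimate_h:.1f} h")
```

With a 10-day half-life, three days of 25-hour tasks only pull the estimate from 11 h to about 13.6 h.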

I think if you set <rec_half_life_days>0</rec_half_life_days> in cc_config.xml, restart BOINC, run it for an hour, and then set it back to the default of 10 days (<rec_half_life_days>10</rec_half_life_days>), you'll reset the running averages (of all WUs), and the estimate should be close to the right number within 24 hours.

I don't usually worry about the estimate (it's almost always wrong), so I haven't tested this.
https://boinc.berkeley.edu/wiki/Client_configuration
14) Message boards : Number crunching : AMD FirePro S9150 (Message 68732)
Posted 15 May 2019 by marmot
Post:
I tried the latest MSI, even the beta version, and when I clicked the APPLY checkmark, the changes I made went back to the default.


I had to go back to an older non-WHQL driver to get the power control functionality back on two of my cards. (Installing a driver more than 2 years newer than a card's manufacture date increases the risk of planned-obsolescence problems.)
I tried about 9 drivers; 17.11.4 was the winner.

Do you see the custom fan profile tab under the MSI settings? I get the feeling you're not concerned with fan noise. Forcing my cards' fans to 100% by 61C with custom MSI fan profiles has made all of their BIOS algorithms delay their heat-reduction actions, leading to higher clock speeds.
15) Message boards : Number crunching : AMD FirePro S9150 (Message 68729)
Posted 13 May 2019 by marmot
Post:

I show 550-650 and rarely hit the design clock of 825, as you can see in the graph.



That brings me back to my earlier question: does MSI (or an equivalent Linux app) properly downclock, downvolt, or reduce the power of your S9100?

Do you have the ability to control the clocks or the power meter?

If a GPU's internal BIOS decides its temperature is too high, it downclocks and downvolts.
If you raise the power meter percentage, it waits until higher temperatures before it starts downvolting/downclocking.

The S9100, which is a server card, might be deciding to start throttling at the 60C shown in the graph.
Increasing the power meter by +25% to +50% might get it to hold its top clock regularly.

If the BIOS/driver won't give you access to the power meter, then a custom MSI fan profile set to hit 80% fan speed by 58C and 100% by 62C would convince the BIOS algorithm to let the clocks run higher, as the card is forced to cool off sooner.
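A custom fan profile is essentially a temperature-to-fan-speed curve interpolated between set points. A Python sketch of that idea, using the 58C/80% and 62C/100% points above; the 40C/30% starting point is hypothetical, and MSI Afterburner's own interpolation may differ:

```python
from bisect import bisect_right

# (temperature in C, fan speed in %) set points, sorted by temperature.
# 58C -> 80% and 62C -> 100% come from the post; 40C -> 30% is a hypothetical floor.
FAN_CURVE = [(40, 30), (58, 80), (62, 100)]


def fan_speed(temp_c, curve=FAN_CURVE):
    """Linearly interpolate fan speed for a temperature, clamping outside the curve."""
    temps = [t for t, _ in curve]
    if temp_c <= temps[0]:
        return float(curve[0][1])
    if temp_c >= temps[-1]:
        return float(curve[-1][1])
    i = bisect_right(temps, temp_c)  # curve[i - 1] <= temp_c < curve[i]
    (t0, s0), (t1, s1) = curve[i - 1], curve[i]
    return s0 + (temp_c - t0) * (s1 - s0) / (t1 - t0)


for temp in (35, 50, 58, 60, 65):
    print(f"{temp}C -> {fan_speed(temp):.0f}% fan")
```

The steep segment between 58C and 62C is what keeps the card cool enough that the BIOS doesn't start throttling.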
16) Message boards : Number crunching : warnings & errors (Message 68728)
Posted 13 May 2019 by marmot
Post:


Made the change but don't know how to confirm the thread count. Where would I look?


I use a task manager, like Process Hacker, and look at the CPU usage of the WU.

Since you have 24 cores, a single-threaded process will show 1/24 ≈ 4.2% CPU usage.

4 threads should be using ~16.7%. You can use the properties feature of many task managers to look at the internal threads.
You should see 4 threads belonging to the WU, each taking up about 4.2% CPU and accumulating many CPU cycles.
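The expected percentages follow from dividing busy threads by the logical CPUs the task manager counts. A tiny Python sketch; it defaults to `os.cpu_count()` for the local machine and assumes each thread keeps one CPU fully busy:

```python
import os


def expected_cpu_percent(threads, logical_cpus=None):
    """Share of total CPU a task should show in a task manager if each thread keeps one CPU busy."""
    if logical_cpus is None:
        logical_cpus = os.cpu_count() or 1
    return 100.0 * min(threads, logical_cpus) / logical_cpus


for threads in (1, 4):
    print(f"{threads} thread(s) on 24 CPUs: {expected_cpu_percent(threads, 24):.1f}%")
```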



I thought the app_config would assign multiple cores to a single task? Perhaps speeding up the computations?

BOINC doesn't have an AI to perform that manipulation. On some projects you just have to choose properly in the project settings; some projects default to multithreading (YAFU, XANSONs, etc.), while others are capable of multithreading, but it's a hidden feature you need app_config to enable.
Also keep an eye out for optimized applications that you can optionally add. SETI has one with some partial multithreading.



<ngpus>x</ngpus> - the number of GPU instances used by the app

If nbody is a CPU task, then what is the purpose of adding this line to the app_config file?


MilkyWay has GPU WUs, and app_config.xml covers all WUs under MilkyWay, so if/when you run some GPU WUs, it'll be useful.


avg_ncpus tells BOINC how many threads to reserve for the job scheduling


This is the number of CPUs you want each WU to attempt to use (it might use fewer in certain phases). It's a per-WU setting.


cmdline tells BOINC how many threads the application may use at maximum
MilkyWay Nbody tasks can run on any number of threads that you care to set for them, but do keep the numbers you use for avg_ncpus and nthreads equal.


What you had built in your first post was fine; just make sure the cmdline is '--nthreads 4'.
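A minimal Python sketch that writes such an app_config.xml from a single thread count, so avg_ncpus and --nthreads cannot drift apart. The app name `milkyway_nbody` and plan class `mt` are assumptions; copy the exact values from your own client_state.xml or task names:

```python
import xml.etree.ElementTree as ET


def build_app_config(threads, app_name="milkyway_nbody", plan_class="mt"):
    """Return app_config.xml text that reserves `threads` CPUs and passes the same count to the app."""
    root = ET.Element("app_config")
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name      # assumed app name
    ET.SubElement(version, "plan_class").text = plan_class  # assumed plan class
    ET.SubElement(version, "avg_ncpus").text = str(threads)
    ET.SubElement(version, "cmdline").text = f"--nthreads {threads}"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Save the output as app_config.xml in the MilkyWay project directory.
    print(build_app_config(4))
```

Driving both values from one `threads` argument is the point: BOINC schedules around avg_ncpus, while the application itself reads --nthreads.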

Lastly, changes to app_config.xml or cc_config.xml don't take effect until you go to the Advanced view of BOINC Manager and select Options -> Read config files; you'll then see the WUs readjust to the new settings within a few seconds.
BOINC Manager has a bug where it won't show the changes in the work unit descriptions even though the changes do take effect.
So it's usually best to shut down BOINC and restart it, unless a project's WUs don't make proper save points and will lose hours of work (and there are some), or are among the rare WUs that die upon suspension/restart (not naming names, but I know of one that uses Oracle VM WUs).

To make an app_info.xml (for anonymous platforms) change take effect, you must restart BOINC.

This info will help you out on all projects.
It's good that you asked.
17) Message boards : Number crunching : warnings & errors (Message 68719)
Posted 12 May 2019 by marmot
Post:
Looking at the task output I'm seeing some errors & warnings and don't know how to address them.

<search_application> milkyway_separation 1.46 Windows x86 double OpenCL </search_application>
Reading preferences ended prematurely
BOINC GPU type suggests using OpenCL vendor 'Advanced Micro Devices, Inc.'
Error loading Lua script 'astronomy_parameters.txt': [string "number_parameters: 4..."]:1: '<name>' expected near '4'
Switching to Parameter File 'astronomy_parameters.txt'
<number_WUs> 4 </number_WUs>
<number_params_per_WU> 26 </number_params_per_WU>


This error appears in all of my visible valid work units.

You can ignore it.



C:\Users\jnedd\AppData\Local\Temp\\OCL8644T1.cl:183:72: warning: unknown attribute 'max_constant_size' ignored
__constant real* _ap_consts __attribute__((max_constant_size(18 * sizeof(real)))),
^
C:\Users\jnedd\AppData\Local\Temp\\OCL8644T1.cl:185:62: warning: unknown attribute 'max_constant_size' ignored
__constant SC* sc __attribute__((max_constant_size(NSTREAM * sizeof(SC)))),
^
C:\Users\jnedd\AppData\Local\Temp\\OCL8644T1.cl:186:67: warning: unknown attribute 'max_constant_size' ignored
__constant real* sg_dx __attribute__((max_constant_size(256 * sizeof(real)))),
^
3 warnings generated.


I've seen the warnings before too, asked about them, and was told they can be ignored.


<cmdline>--nthreads4.0</cmdline>


Is this syntax giving you 4 threads?
I've always seen this as integer input:
<cmdline>--nthreads 4</cmdline>
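Whether "--nthreads4.0" is understood depends on the app's own argument parser, which isn't shown here. As a stand-in, Python's argparse (not the parser MilkyWay uses) with an integer --nthreads option shows how a typical parser treats the two forms:

```python
import argparse
import shlex


def parse_nthreads(cmdline):
    """Return the --nthreads value a typical parser would read from a cmdline string, or None."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--nthreads", type=int)
    # parse_known_args() collects unrecognized tokens instead of exiting
    args, _unknown = parser.parse_known_args(shlex.split(cmdline))
    return args.nthreads


print(parse_nthreads("--nthreads 4"))
print(parse_nthreads("--nthreads4.0"))  # the token isn't recognized as --nthreads
```

Under this parser the run-together form is silently ignored and the thread count falls back to the default, which is why the spaced integer form is the safer choice.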


Also, looking in the project directory I no longer see the GPU configuration xml file for milkyway_1.46_windows_x86_64__opencl_ati_101. Is this not supported?


I don't know the answer to this one and would like to see the response.
18) Message boards : Cafe MilkyWay : Don't mess with us... (Message 68716)
Posted 10 May 2019 by marmot
Post:
we'll give you the plague...



https://www.washingtonpost.com/nation/2019/05/08/couple-ate-raw-marmot-believed-have-health-benefits-then-they-died-plague/?utm_term=.ac1354551c24
19) Message boards : News : New runs of MilkyWay Nbody out (Message 68715)
Posted 10 May 2019 by marmot
Post:
Great news, everyone! The app selection preferences are available again in the Project Preferences! You can now choose to opt out of Nbody or Separation. Thank you all for your help in getting this feature back!
-Eric


Good job!
Thanks
20) Message boards : Number crunching : So no way to select project campaigns anymore on the new server code (Message 68711)
Posted 9 May 2019 by marmot
Post:
I think Universe and others I don't remember right now.


All the LHC@home WUs can multithread.
YAFU is all multithreaded work.
Cosmology is VM (boinc2docker) multithreaded work.
Amicable Numbers CPU work needs a minimum of 2 cores.

RakeSearch has a user-created optimized app that's supposedly faster than mt.



©2024 Astroinformatics Group