Welcome to MilkyWay@home

Posts by Keith Myers

1) Message boards : Number crunching : bad argument #0 to 'calculateEps2' (Expected 3 or 6 arguments) (Message 77058)
Posted 7 days ago by Keith Myers
Post:
I concur. Only 11 bad tasks out of 360 today for a 3% error rate. Ten of those I retired with "too many errors".
2) Questions and Answers : Preferences : how to adjust resource share? (Message 77057)
Posted 9 days ago by Keith Myers
Post:
I processed a little over 100 OPNG tasks a day for the past month. As long as you keep asking for work every couple of minutes you are rewarded. 110 tasks today for the 9th.
3) Message boards : Number crunching : bad argument #0 to 'calculateEps2' (Expected 3 or 6 arguments) (Message 77048)
Posted 12 days ago by Keith Myers
Post:
I think someone accidentally sent normal batch to orbit-fitting queue

+1
4) Message boards : Number crunching : Why earn some WUs a very low credit? (Message 77041)
Posted 13 days ago by Keith Myers
Post:
Tasks like this one have no usable work involved, so they get no or very little credit. This one got 0.11 credits.

https://milkyway.cs.rpi.edu/milkyway/result.php?resultid=943807851

If you inspect the stderr.txt output you see this kind of statement.

Number of particles in bins is very small compared to total. (0 << 1). Skipping distance calculation
5) Message boards : Number crunching : How can I fix the estimated duration time? (Message 77039)
Posted 13 days ago by Keith Myers
Post:
Yes, if client_state says don't use DCF, then you can't affect the estimate. Only the developer of the science app can change the task generator profile and put in a proper rsc_fpops_est value for the task.
6) Message boards : Number crunching : bad argument #0 to 'calculateEps2' (Expected 3 or 6 arguments) (Message 77027)
Posted 13 days ago by Keith Myers
Post:
Well you are doing your part in retiring the "bad" tasks as fast as possible then. Thanks.
7) Message boards : Number crunching : bad argument #0 to 'calculateEps2' (Expected 3 or 6 arguments) (Message 77025)
Posted 13 days ago by Keith Myers
Post:
By the date in the task name, that one was from an earlier run with good formatting. There are a few straggler resends from earlier that still process fine with the Orbit Fitting app.

But 90% of the Orbit Fitting tasks from the past two days are bad. If you choose to still run them, then your production is likely to take the same 90% hit from all the bad tasks relative to the few good tasks.

The _04_03_2024 named series are the bad ones. Eventually the _04_03_2024 series will error out of processing after four failures each.

[Edit 1]
I'm re-enabling them now to see if the bad ones are still coming through or if they have error counted out.

[Edit 2]
Evidently still coming through so toggling them off again.
8) Message boards : Number crunching : bad argument #0 to 'calculateEps2' (Expected 3 or 6 arguments) (Message 77016)
Posted 13 days ago by Keith Myers
Post:
Just disable the Orbit-fitting app in your Project preferences here on the website. Just do the regular N-body app.

https://milkyway.cs.rpi.edu/milkyway/prefs.php?subset=project
9) Message boards : News : Admin Updates Discussion (Message 77011)
Posted 14 days ago by Keith Myers
Post:
You can also just switch off Orbit-Fitting tasks in Preferences. The normal N-body tasks are fine.
10) Message boards : News : Admin Updates Discussion (Message 77009)
Posted 14 days ago by Keith Myers
Post:
Sure, go ahead. Everyone is in the same boat.

Haven't received any response from admin Kevin either when I alerted him to the issue early yesterday.

The tasks are slowly being eliminated from issuance again after hitting the max 4 errors allowed.
11) Message boards : Number crunching : bad argument #0 to 'calculateEps2' (Expected 3 or 6 arguments) (Message 77004)
Posted 15 days ago by Keith Myers
Post:
Yep, badly formatted tasks. All mine are erroring out too.
12) Message boards : Number crunching : scheduler request timeout when reporting more than 1 result at once (Message 77000)
Posted 16 days ago by Keith Myers
Post:
This is an OLD problem that has existed at MW since its inception. It is impossible to report a task and receive a replacement task in the same scheduler connection. It can't be done without employing custom clients and workarounds.

It takes two scheduler connections, first to report a completed task and then to receive the replacement task(s) on the next scheduler connection.

And when you have depleted your cache and reported the last task, MW forces the client into a mandatory 10 minute backoff before allowing a new scheduler connection to replenish your cache.

This was always the main complaint of high production clients when the Separation work was available.

Nothing has changed in the scheduler code, but the longer-running N-body tasks mostly eliminate the problem because no tasks finish faster than the default 91 second scheduler connection interval.

But your high production host has duplicated exactly the same issue that the Separation tasks caused.
13) Message boards : News : Admin Updates Discussion (Message 76951)
Posted 1 Mar 2024 by Keith Myers
Post:
Likely the name changed for the tasks and that is why your app_config does not work anymore. If they are releasing BOTH the old N-body and whatever the new Orbit tasks are named, just use two app_version sections.
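
As a rough illustration of the two-section idea, here is a minimal Python sketch that writes out an app_config.xml with one app_version section per app. The app names, the "mt" plan class and the 12-CPU value are assumptions for illustration only; check client_state.xml on your own host for the real names.

```python
import xml.etree.ElementTree as ET

# Hypothetical app names: look up the real ones in client_state.xml.
APP_NAMES = ["milkyway_nbody", "milkyway_nbody_orbit_fitting"]


def build_app_config(app_names, plan_class="mt", avg_ncpus=12):
    """Build an <app_config> tree with one <app_version> section per app."""
    root = ET.Element("app_config")
    for name in app_names:
        version = ET.SubElement(root, "app_version")
        ET.SubElement(version, "app_name").text = name
        ET.SubElement(version, "plan_class").text = plan_class
        ET.SubElement(version, "avg_ncpus").text = str(avg_ncpus)
    return root


def to_xml(root):
    """Serialize the tree as indented text (ET.indent needs Python 3.9+)."""
    ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_xml(build_app_config(APP_NAMES)))
```

The avg_ncpus value only tells the client how many CPUs to budget for each task. A cmdline element may also be needed to pass the thread count to the app itself; check the app's own options for the exact flag.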
14) Message boards : Number crunching : GFLOPS backwards (Message 76942)
Posted 27 Feb 2024 by Keith Myers
Post:
Thanks Kevin, appreciated.
15) Message boards : Number crunching : GFLOPS backwards (Message 76939)
Posted 22 Feb 2024 by Keith Myers
Post:
A problem I have seen for a long time is the GFLOPS number for each work unit. The more GFLOPS the faster the unit runs!
For example:
4079 GFLOPS about 3 hours 30 minutes run time.
65284 GFLOPS about 20 minutes run time.

A side effect of this is that when one of those 65284 GFLOP work units is downloaded BOINC thinks it will take a full day to run and doesn't download anything else until it finishes.

You are correct. I've got one of each type running. The estimated GFLOPS/sec is the same for both tasks at 0.92 GFLOPS/sec.
But the 4079 GFLOPS task is going to run for 1 hour 50 minutes and the 65,284 GFLOPS task is only estimated to run 22 minutes.

[Edit] I pinged Kevin to this thread for his attention.
[Edit2] This is backwards from the standard BOINC client convention, where the estimated GFLOPs property of a task reflects the total amount of computation needed to crunch the task.

Disregarding BOINC's broken ability to properly calculate GFLOPS for GPUs, it should get this correct for CPU computation power based on the benchmark profile capability of each host.
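
To put numbers on that convention, here is a small Python sketch of the arithmetic, assuming the client's estimate is simply the task's estimated GFLOPs divided by the host's GFLOPS/sec rate. That is a simplification that ignores any other corrections the client applies; the function names are mine, and the figures are the ones quoted in this thread.

```python
def estimated_runtime_seconds(est_gflops, rate_gflops_per_sec):
    """Estimated runtime = total estimated work / processing rate."""
    if rate_gflops_per_sec <= 0:
        raise ValueError("rate must be positive")
    return est_gflops / rate_gflops_per_sec


def format_hms(seconds):
    """Format a duration in seconds as 'Hh MMm SSs'."""
    total = int(round(seconds))
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


if __name__ == "__main__":
    rate = 0.92  # GFLOPS/sec, the same for both tasks in the post above
    for est in (4079, 65284):
        runtime = estimated_runtime_seconds(est, rate)
        print(f"{est} GFLOPs -> {format_hms(runtime)}")
```

Under this assumption the 65284 GFLOPs task comes out at nearly 20 hours, which fits the "full day" estimate reported above, even though tasks of that type actually finish in about 20 minutes.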
16) Message boards : News : Admin Updates Discussion (Message 76927)
Posted 15 Feb 2024 by Keith Myers
Post:
Jerry wrote:
The process sending out tasks could trivially issue a quorum in rapid succession even if the number of workunits in the queue is large.
I agree. It seems desirable that those who have insight into the NBody validator review whether or not the current minimum quorum of 1 really makes sense: *If* it is very unlikely that a single result can be validated (or worse: actually impossible),¹ then NBody workunits should be configured to minimum quorum = 2 (and initial replication = 2). That's not only for the users' sake, it should (if the mentioned condition is true) also reduce the database size somewhat as it should reduce the number of workunits waiting for validation.

________
¹) I for one have never spotted a workunit which was validated from a single task. Hence it seems to me that it is indeed highly unlikely or impossible.

Separation tasks were almost always validated by a single task on "trusted" hosts.
17) Message boards : Number crunching : High thread count applications (Message 76915)
Posted 11 Feb 2024 by Keith Myers
Post:
In another forum I read an article saying that applications with high thread counts are not as efficient. For example, a 16-thread application will not finish twice as fast as an 8-thread application. This got me thinking about how I might improve my throughput by running multiple applications at a lower thread count, AND whether I can increase CPU utilization on some systems. One of my computers is a XEON E5 2678 with 24 "CPUs". Milkyway uses 16 by default but with a config file I can run two applications at a time with 12 CPUs apiece. Seems like a "no brainer". But how much did I gain? To "prove" that I would need 2 test files that had equal run times. First run sequentially, then run concurrently. Anyone here ever see any data like this? Pointers to other articles?

Grab a task file from a local computer and copy it to a temp folder along with the MW MT app. Stop BOINC. Open a command window and cd to the temp folder and run the MT executable with the task file as its input. The application will run the task file with the default 16 threads.

Save the stderr.txt output in the folder for later comparison. Then delete the output result file, the boinc_finish file, the lockfile if any and the stderr.txt file. Don't delete the task file.

Then run the application again but this time run it with the num_threads 12 parameter after the executable name and before the task file name.

If the application won't accept the num_threads parameter directly, then just export the value to your environment.
export OMP_NUM_THREADS=12

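Then compare the two stderr.txt files: subtract the start time from the finish time in each to get the elapsed time for both runs. Two 12-thread tasks running side by side finish two tasks in one 12-thread elapsed time, while the 16-thread setup needs two 16-thread elapsed times for the same two tasks. So if the elapsed time of the 12-thread run is less than double the elapsed time of the 16-thread run, running two 12-thread tasks at once will be best for production.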
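
The comparison can be sketched in a few lines of Python. This assumes you have already worked out the two elapsed times in seconds by hand, and that a 12-thread task runs just as fast next to a second 12-thread task as it did alone, which is optimistic. The function names and the timings in the example are made up for illustration.

```python
def tasks_per_hour(elapsed_seconds, concurrent_tasks=1):
    """Throughput when `concurrent_tasks` tasks run side by side."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return concurrent_tasks * 3600.0 / elapsed_seconds


def better_config(elapsed_16, elapsed_12):
    """Compare one 16-thread task at a time against two 12-thread tasks."""
    one_at_16 = tasks_per_hour(elapsed_16, concurrent_tasks=1)
    two_at_12 = tasks_per_hour(elapsed_12, concurrent_tasks=2)
    return "2 x 12 threads" if two_at_12 > one_at_16 else "1 x 16 threads"


if __name__ == "__main__":
    # Hypothetical elapsed times in seconds, not measured results.
    elapsed_16, elapsed_12 = 1000.0, 1500.0
    print("1 x 16:", tasks_per_hour(elapsed_16, 1), "tasks/hour")
    print("2 x 12:", tasks_per_hour(elapsed_12, 2), "tasks/hour")
    print("best:", better_config(elapsed_16, elapsed_12))
```

A fairer test would run two 12-thread copies at the same time in separate folders, since two tasks sharing memory bandwidth and cache will likely each run slower than one alone.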
18) Message boards : News : Admin Updates Discussion (Message 76902)
Posted 9 Feb 2024 by Keith Myers
Post:
Yes, I saw your post and the solution is perfect for the task. From the header text for that file:

// delete results without a corresponding workunit.
// (in principle these shouldn't exist)

That matches exactly the issue the database is suffering from.

Why hasn't this been done during one of the backup or maintenance evolutions the project has done since Separation ended?
19) Message boards : News : Admin Updates Discussion (Message 76890)
Posted 7 Feb 2024 by Keith Myers
Post:
Kevin, your post https://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=5069&postid=76876

mentions that some users' old Separation tasks were cleared.

What were the criteria?

Why haven't all Separation tasks been cleared from the database?

I've still got 2800 Separation tasks hanging on in Valid, Invalid and Error categories.
20) Message boards : Number crunching : Constant Validation Inconclusive Results (Message 76884)
Posted 7 Feb 2024 by Keith Myers
Post:
Nobody is getting any credit since nobody has yet had a wingman result assigned to their pending tasks.

The estimate is that around mid-month they will get through all the over-issued _0 tasks and be able to generate the wingman _1 results. Then everyone's pending results will finally be validated and they will get credit.



©2024 Astroinformatics Group