Posts by Nick Name

1) Message boards : Number crunching : Increase performance (watts)? (Message 70249)
Posted 18 Dec 2020 by Nick Name
Post:
Yes, MilkyWay requires FP64 (double-precision) compute capability. Nvidia's consumer cards have always been terrible here.
2) Message boards : Number crunching : Need help with linux and app_info (Message 69880)
Posted 31 May 2020 by Nick Name
Post:
Hi,
The whole system is dedicated to BOINC 24/7/365, so for the i9-9900K the answer is 16 CPU threads.
I only want to commit whatever number of CPU threads are required by each GPU application.
My goal is to run 2 concurrent GPU tasks - 1 task per card (which it seems to be doing now that I set the global CPU % down to 90%)
and have the remaining CPU resources crunching CPU tasks.
So if each Nvidia app actually requires a full CPU thread to keep it fed, then the remaining 14 threads should be crunching CPU tasks.
What bothers me is that by using the global "Use at most xx% of the CPUs" option I am affecting other projects, whereas a decent app_info.xml
or app_config.xml (whichever I need) would only apply to MilkyWay and leave the other (presently idle) projects alone. I would only be running a single project
at a time, not more than that concurrently. E.g. I switched to MW only because SETI isn't handing out work while they manage an overwhelming amount of returned results.
TIA

I would expect setting 90% to use all 16 threads - 14 for CPU and 2 for GPU - if all you're running is MilkyWay. The % might need tweaking if it's not working as expected.

14 / 16 = 87.5%
Set your % to 88 - generally it's best to round up rather than use a fraction.

You can also set CPU % to 100 and tweak the app_config. The following says to run one task on the GPU with CPU use set to one tenth of a thread. This should get both GPUs working if you have CPU % set to 100, for a total of 18 tasks. As stated above, this doesn't limit what it will actually use, but you can set it to manipulate BOINC scheduling.

<app_config>

 <app>
  <name>milkyway</name>
  <gpu_versions>
   <gpu_usage>1.0</gpu_usage>
   <cpu_usage>0.10</cpu_usage>
  </gpu_versions>
 </app>

</app_config>


Alternatively, set cpu_usage to 1 to keep BOINC from running more than 16 tasks total, and to make sure the GPU has a full thread available for support. You'd have to do some testing on your own to see what works best for you.
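For illustration, that variant is the same file with only cpu_usage changed (again, my numbers, not anything official):

<app_config>

 <app>
  <name>milkyway</name>
  <gpu_versions>
   <gpu_usage>1.0</gpu_usage>
   <cpu_usage>1.0</cpu_usage>
  </gpu_versions>
 </app>

</app_config>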
3) Message boards : Number crunching : Need help with linux and app_info (Message 69865)
Posted 28 May 2020 by Nick Name
Post:
After reading the thread a bit more closely, the question seems to be why the 2nd GPU is detected but not being used. app_config and app_info are irrelevant in that context. Judging by this:

CUDA: NVIDIA GPU 0: GeForce GTX 1660 Ti
CUDA: NVIDIA GPU 1: GeForce GTX 1660 Ti

Both cards are detected and both should work. This snippet from a job log, "Found 2 CL devices", shows that the MilkyWay app is seeing both cards so I think we can rule out a driver or weird OpenCL problem, or an exclusion in cc_config.

This is just a guess, but your CPU may not have a thread available to support a task on the second GPU. BOINC will typically over-commit the CPU when running GPU work: if you've set BOINC to use all 16 threads, it will run 16 CPU tasks plus at least one GPU task. I don't know how much CPU the Nvidia app schedules, but Nvidia OpenCL tasks generally take a full thread. I suggest reducing the number of threads BOINC can use and seeing if that solves the problem. Your GPU task run times are much longer than the corresponding CPU time; that's an indication the CPU is overtaxed, so reducing the load is a good idea just to help with that.
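If it helps, one way to do that without touching the global preferences is the ncpus option in cc_config.xml (in /var/lib/boinc). This is just a sketch; 14 is my guess at a reasonable cap for a 16-thread CPU feeding two GPUs:

<cc_config>
 <options>
  <ncpus>14</ncpus>
 </options>
</cc_config>

The "Use at most N% of the CPUs" computing preference does the same job if you'd rather stay in the manager.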

Another possibility is you have a CPU project that's gone into high priority mode. If that's the case it's likely keeping the 2nd GPU from running because BOINC is trying to get that work done before the deadline. If this is what's happening, usually the best thing to do is lower your work cache, i.e. Store at least N days of work / Store additional N days, then give it some time to clear out.

I'd also remove the app_config until you get things working; you can delete or rename it and then re-read the config files.
4) Message boards : Number crunching : Need help with linux and app_info (Message 69861)
Posted 27 May 2020 by Nick Name
Post:
It sounds like you need to enable all GPUs in your cc_config.
<use_all_gpus>0|1</use_all_gpus>
If 1, use all GPUs (otherwise only the most capable ones are used). Requires a client restart.

This file should be in /var/lib/boinc. Edit it with a standard text editor, set use_all_gpus to 1, and make sure it's saved as cc_config.xml. Restart BOINC. If cc_config.xml doesn't exist, you can create it via the manager: Options -> Event Log Options, then Save.
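For reference, a minimal cc_config.xml containing just that flag would look like this (keep any options you already have in yours):

<cc_config>
 <options>
  <use_all_gpus>1</use_all_gpus>
 </options>
</cc_config>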

You don't need app_info for this case, that would normally be used if you compiled your own app. An app_config will work. Create it with a text editor and save it to the project data folder.
This should be in /var/lib/boinc/projects/milkyway.cs.rpi.edu_milkyway.

<app_config>

 <app>
  <name>milkyway</name>
  <gpu_versions>
   <gpu_usage>0.49</gpu_usage>
   <cpu_usage>0.50</cpu_usage>
  </gpu_versions>
 </app>

</app_config>

This will run two tasks at a time. Adjust gpu_usage and cpu_usage depending on how many tasks you want to run. Make sure it's saved as app_config.xml. Just re-reading the config via the manager - Options -> Read Config Files - will start it working.
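For example (my numbers, purely illustrative), three tasks at a time per GPU would be:

<app_config>

 <app>
  <name>milkyway</name>
  <gpu_versions>
   <gpu_usage>0.33</gpu_usage>
   <cpu_usage>0.33</cpu_usage>
  </gpu_versions>
 </app>

</app_config>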

Finally, just to be clear re. running in parallel: If you meant Crossfire or SLI, that doesn't work for any BOINC project. You can use all your GPUs but they will run individually on separate tasks, not together on the same task.
5) Message boards : Number crunching : MilkyWay hogging processor time. (Message 69557)
Posted 19 Feb 2020 by Nick Name
Post:
I have World Community Grid tasks that are passing their due date, yet MilkyWay is hogging processor time when its tasks aren't due for another five days. There hasn't been a task switch for twenty-four hours (the switch interval is set to 60 minutes).
What's going on?

None of this - resource share etc. - should matter once work is in the queue and nearing the deadline. If there are tasks that are close to the deadline BOINC should recognize that and start running them. In severe cases of deadline pressure it should also stop running GPU work to free up a thread for these tasks. I can only think of a few reasons why that isn't happening.

1) The project (WCG in this case) is suspended, or its tasks are suspended.
2) The number of tasks allowed to run is limited in some way, for example by an app_config using the project_max_concurrent tag (see the sketch after this list). In that case BOINC will max out the number of threads it's allowed to use with other projects.
3) I've never run N-body or any other CPU work here. Maybe those jobs are hung in some way and BOINC is unable to finish them. I view this as the least likely cause, as BOINC should still be able to pause them and switch to higher-priority work.
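For reference, the sketch mentioned in point 2 - a hypothetical limit of four concurrent tasks in a project's app_config.xml - looks like this:

<app_config>
 <project_max_concurrent>4</project_max_concurrent>
</app_config>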

If none of these are causing the problem, I'd suggest heading over to the general BOINC forums and posting your question there.
6) Message boards : Number crunching : Any enhanced applications developed for Milkyway to speed up GPU work? (Message 69445)
Posted 22 Jan 2020 by Nick Name
Post:
Nice of you folks to make this available to the public.

A CUDA app is probably not worth the effort here since Nvidia has crippled double-precision performance on their consumer GPUs. It might make sense if the double-precision requirement is dropped. There have been occasional comments about testing that but nothing has changed yet. AMD is probably going to have the edge here for a long time yet, for consumer hardware anyway.
7) Message boards : Number crunching : Should I be concerned? (Message 69280)
Posted 21 Nov 2019 by Nick Name
Post:
"Validation inconclusive" is misleading on this project. Those jobs should be labeled as "Validation Pending", like they are on other projects. So, as previously stated, it's nothing to worry about. "Invalid" or "Error" are the only ones to be concerned with.

The message you see might indicate some other problem though. Typically that appears when BOINC is trying to use a GPU that's not working well. You'll need to do some troubleshooting to see what's causing that. Your host shows as having two GPUs, but your run times are consistent with BOINC trying to start a work unit on a GPU that it can't use for some reason. That WU then has to wait until a GPU becomes available. There might be a hardware issue. Load up a GPU monitoring tool like GPU-Z, HWInfo or System Information Viewer to see what your GPU loads look like.
8) Message boards : Number crunching : N-Body complaint (Message 69263)
Posted 19 Nov 2019 by Nick Name
Post:
Abort. They will be sent to another host anyway if they haven't started by the deadline.

I don't run CPU work here so I don't know if the run time you're seeing is reasonable or not. In any case you have too much work. I recommend lowering your cache, aka Store at least N days of work (N being a number) to something lower than you have now. I don't like to set it at more than one day especially for new projects. Start raising it after you have returned some valid tasks.
9) Message boards : Number crunching : Long crunch time on new N-Body simulations? (Message 69161)
Posted 6 Oct 2019 by Nick Name
Post:
My guess is that the combination of work now taking a lot longer than before, plus multi-threaded work running again has really confused things. The client is also bad at accounting for anything in app_config that might affect run times, specifically max_concurrent, so if you are using those that might be part of the problem. Regardless, a low cache is the only way to keep from getting too much work.
10) Message boards : Number crunching : Long crunch time on new N-Body simulations? (Message 69157)
Posted 5 Oct 2019 by Nick Name
Post:
Is there a way to get the initial ETA estimation set higher for these tasks? Without knowing the code, I'm guessing this is something that should be sorted out on the server end, not the user end.

The best way to handle this is lower your cache (aka Store at least N days of work in your preferences), at least until the estimates are more accurate. I don't usually set mine to more than half a day, and I set it that low precisely because of seeing this problem.
11) Message boards : Number crunching : Invalids Exit status 0 (0x0) after server came back (Message 68234)
Posted 8 Mar 2019 by Nick Name
Post:
What is a high percentage? I am at 1.3%.


I'm currently at ~5%. I was at 0% before, so this is an extremely high percentage comparatively speaking. I haven't looked at every single task, but the ones I looked at validated on other machines. However, those machines also had high numbers of invalid tasks, so there's definitely something strange going on with validation.
12) Message boards : Number crunching : longggg runs (Message 68124)
Posted 9 Feb 2019 by Nick Name
Post:
When this happens the GPU has stopped crunching. The most likely cause is a driver crash. It should never happen, but I wouldn't worry about it if it's not happening often. If it's frequent, you'll have to start doing some troubleshooting.
13) Message boards : Number crunching : Stats dropping for several days. (Message 68093)
Posted 2 Feb 2019 by Nick Name
Post:
My guess would be a database connection glitch or error that's putting your host into a 24 hour back-off. That would keep completed work from uploading and keep you from getting new work. I caught this happening on my host a couple days ago.
14) Message boards : Number crunching : database problems(?) (Message 67994)
Posted 9 Jan 2019 by Nick Name
Post:
The site in general has been pretty sluggish and unreliable the last few weeks, even outside the maintenance window. Hopefully not a sign of failing hardware.
15) Message boards : Number crunching : Nbody without disabling GPU? (Message 67959)
Posted 21 Dec 2018 by Nick Name
Post:
First, check your startup log. There should be a message saying your app_config was read, or a message flagging a problem with it. It does look like there's a syntax problem: <app_config> is in there twice. Note also that <app_version> goes at the app_config level, not nested inside <app>.


<app_config>

 <app>
  <name>milkyway</name>
  <gpu_versions>
   <gpu_usage>0.25</gpu_usage>
   <cpu_usage>0.25</cpu_usage>
  </gpu_versions>
 </app>

 <app>
  <name>milkyway_nbody</name>
  <max_concurrent>1</max_concurrent>
 </app>

 <app_version>
  <app_name>milkyway_nbody</app_name>
  <plan_class>mt</plan_class>
  <avg_ncpus>12</avg_ncpus>
 </app_version>

</app_config>

I don't run N-body, but see if that works. You might need to use the cmdline parameter to control the number of N-body threads instead of avg_ncpus; there's an example sketched below the link.

https://boinc.berkeley.edu/wiki/Client_configuration
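Purely as a sketch - I haven't verified which flag the N-body app actually accepts, so treat --nthreads as a placeholder - the cmdline goes in the app_version section:

<app_version>
 <app_name>milkyway_nbody</app_name>
 <plan_class>mt</plan_class>
 <cmdline>--nthreads 12</cmdline>
</app_version>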
16) Message boards : News : Scheduled Maintenance Concluded (Message 65839)
Posted 15 Nov 2016 by Nick Name
Post:
Thanks everyone. I caught an unbundled one and it processed at the (before bundle) normal rate, so it seems things are working as they should. Any extra time they take can be explained by the stop/start of the bundled work units running at the same time.
17) Message boards : News : Scheduled Maintenance Concluded (Message 65836)
Posted 15 Nov 2016 by Nick Name
Post:
Nick Name -

The first task you have is bundled (5 "old" tasks), ran roughly 5 times longer than the second, and gave roughly 5 times more credit than the second.

What was not expected?

It is true that not all tasks being sent out are bundled. I have a few of those.

Thanks for your comment, I think I figured out what is happening. I was surprised to see some unbundled work units. The run time also surprised me, even taking into account the fact that I'm running six at once. That's why the bundled one I linked ran for 20 minutes. At first glance it seemed some tasks that should have been bundled, weren't, and were also running unusually long.

I don't have many of these and haven't been at the computer to see one come through. If I catch one I'll probably run it by itself, just for peace of mind.
18) Message boards : News : Scheduled Maintenance Concluded (Message 65834)
Posted 15 Nov 2016 by Nick Name
Post:
Please observe these work units.

This is expected.
https://milkyway.cs.rpi.edu/milkyway/result.php?resultid=1887793313
Name: de_modfit_fast_19_3s_136_bundle5_ModfitConstraints3
Run time: 20 min 41 sec
Credit: 133.66

This is not expected.
https://milkyway.cs.rpi.edu/milkyway/result.php?resultid=1887790878
Name: de_modfit_fast_19_3s_136_ModfitConstraints3
Run time: 4 min 5 sec
Credit: 26.73

It appears some work units are getting through that are not bundled, but run 5x as long as an old single work unit and pay 1/5 as much. I have a handful of these.
19) Message boards : News : Scheduled Maintenance Concluded (Message 65715)
Posted 12 Nov 2016 by Nick Name
Post:
I can confirm that GPU load for both AMD and Nvidia is zero, on Windows 7. As others have said they are only running on the CPU.
20) Message boards : News : Scheduled Maintenance Friday November 11th (Message 65640)
Posted 10 Nov 2016 by Nick Name
Post:
Jake,

Thanks for your efforts.

The plan for the beginning is to start conservatively with 10 work units per bundle.


Does this mean we will be crunching 10 work units together (or whatever is in the bundle) at once? Right now I use an app_config to run six at a time; I intend to remove that before the new app is released, but I would not like to try running 60 at a time.

