Welcome to MilkyWay@home

Run Multiple WU's on Your GPU

Keith Myers
Joined: 24 Jan 11
Posts: 715
Credit: 556,333,220
RAC: 55,136
Message 71165 - Posted: 28 Sep 2021, 18:48:58 UTC - in response to Message 71164.  

If I remember correctly, and assume I don't . . . . MW allows 300 tasks per card and a project maximum of 900 tasks.

Toby Broom
Joined: 13 Jun 09
Posts: 24
Credit: 137,966,499
RAC: 27,036
Message 71167 - Posted: 28 Sep 2021, 20:27:35 UTC - in response to Message 71165.  

Yes, you're correct.

Based on the other discussions, I'll try to write a script that works around the limit by pausing networking.
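
For anyone curious what the network toggle itself might look like, here is a minimal Python sketch built around boinccmd's --set_network_mode option. It only covers the pause/resume step. How that would be timed to get around the task limit is not spelled out in this thread, and the function names and the 600-second duration are made up for illustration.

```python
import subprocess

VALID_MODES = ("always", "auto", "never")


def network_mode_command(mode, duration_s=0, boinccmd="boinccmd"):
    """Build the boinccmd argument list that switches the client's network mode.

    A duration of 0 makes the change permanent; a positive duration makes it
    temporary, after which the client reverts to its previous mode.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"unknown network mode: {mode!r}")
    command = [boinccmd, "--set_network_mode", mode]
    if duration_s > 0:
        command.append(str(int(duration_s)))
    return command


def set_network_mode(mode, duration_s=0, dry_run=True):
    """Run the command, or only return it when dry_run is True."""
    command = network_mode_command(mode, duration_s)
    if not dry_run:
        subprocess.run(command, check=True)
    return command


# Hypothetical usage: block network activity for 10 minutes (dry run only).
print(" ".join(set_network_mode("never", duration_s=600)))
```

With dry_run left at True nothing is executed, so the command can be inspected before it is pointed at a real client.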

Keith Myers
Joined: 24 Jan 11
Posts: 715
Credit: 556,333,220
RAC: 55,136
Message 71170 - Posted: 29 Sep 2021, 1:18:37 UTC - in response to Message 71164.  

I also assume that it would be better for the project to create larger work units? or at least an option for larger ones?

They already make larger units by bundling 4 or 5 tasks in the work unit.

But they recently ran into failed work units: for some reason the devs haven't figured out, the scheduler/splitter bundled 7 tasks into a WU, which exceeds the parameter length limit.

So making larger tasks doesn't seem possible at this time.

Ryan Munro
Joined: 22 Jun 09
Posts: 13
Credit: 76,418,496
RAC: 11,739
Message 71172 - Posted: 29 Sep 2021, 9:37:46 UTC - in response to Message 71157.  

It still won't go above 210 W. I timed it, and running 16 units at the same time does about as much work in the same time as running 4 units at once; no real difference.

Toby Broom
Joined: 13 Jun 09
Posts: 24
Credit: 137,966,499
RAC: 27,036
Message 71182 - Posted: 29 Sep 2021, 20:12:58 UTC - in response to Message 71172.  
Last modified: 29 Sep 2021, 20:14:25 UTC

I assume the 3090 just can't do enough work to reach full power. In the Ampere SMs only half of the SM can do FP64, so I imagine the best case for a 3090 is about half of max power, which is about what you see, assuming you have a 400 W BIOS. I have a Founders Edition, so my limit is 350 W, and I see about 180 W with just one WU at a time.

Even on my Titan V, it's not full bore with 7 WUs at once.

Keith Myers
Joined: 24 Jan 11
Posts: 715
Credit: 556,333,220
RAC: 55,136
Message 71183 - Posted: 29 Sep 2021, 21:25:25 UTC - in response to Message 71182.  

There are other GPU projects that will use every watt available to them on Ampere: GPUGrid, Minecraft, and Einstein, for example.
We just need Petri to give the MW app a good going-over like he has done with SETI and Einstein.

Wrend
Joined: 4 Nov 12
Posts: 96
Credit: 251,528,484
RAC: 0
Message 72357 - Posted: 30 Mar 2022, 18:12:16 UTC
Last modified: 30 Mar 2022, 18:22:17 UTC

Bearing that in mind, I'm quite pleased that this project can make good use of the DP/FP64 capabilities of my Titan Black GPUs, whereas other projects can't. Yes, the project is more niche for it, but then aptly so are my GPUs. If anyone is to be held responsible for it, I think it would be best placed at Nvidia's feet for generally limiting the capabilities of their cards to profit more from segmenting their market.

As an update to my previous post, I have applied new thermal paste to my GPUs and it made a quite significant difference, dropping full load temperatures over 10°C; and I say over since I was hitting thermal throttling levels before. It would seem the stock thermal paste had an effective lifespan of up to about 5 years for cards that are in active use.

Currently I'm back to running two tasks per GPU to help keep room temperatures and fan speeds and noise down. It's nice to be having my cards doing some good work in the background again.

Best regards.

Chooka
Joined: 13 Dec 12
Posts: 101
Credit: 1,782,758,310
RAC: 0
Message 72522 - Posted: 5 Apr 2022, 19:04:26 UTC - in response to Message 71160.  

No matter how many WUs I run (2, 4, or 8 units), my card won't use more than 210 W when crunching. It's a 3090, and under WCG, running 8 of their GPU units at the same time, it will pull the full 350 W.



With the price of graphics cards skyrocketing, I would avoid running any card at its max rating. High temps cause thermal paste to harden, making removal non-trivial. Exact OEM fan replacements can be hard to find, and one must be creative on occasion. I have a few really weird fan arrangements I can post if anyone is interested.


BUT the problem comes down to MW not being able to supply you with enough tasks in one day to keep it going. With their 10-minute back-off between sending tasks, your 3090 will be doing something else for those 10 minutes even more often.


The BOINC app I modded, 7.15.0, fixes the 10-minute wait. The latest official version is 7.16.11, and I assume the 10-minute problem still exists in that version.

Due to temperatures recently dropping here in Texas, I started up a pair of garage "racks" to crunch Einstein and WCG. I have a 3rd rack for MilkyWay, but the garage is still too hot to run that one.


YES your app did fix it, but unfortunately not everyone uses it.


I was using the modded 7.15 version that Joseph created, but I changed to another version for the PrimeGrid challenge... Now I can't get it to change back to 7.15. Each time I try to install it as per the instructions, BOINC just won't connect.
I've deleted just about every BOINC file I can find, but no luck. I'm not sure if it's due to the project file not being deleted or if it's my cloud backup. Either way, I've tried it on 3 or 4 PCs but can't get it to work any more.

I really can't believe that the 10-minute backoff still exists. It's the single biggest issue with this project, and because of it I crunch elsewhere, like Einstein. I REALLY wish someone would take the time to address this issue :( It's only going to get worse as cards get faster.

Wrend
Joined: 4 Nov 12
Posts: 96
Credit: 251,528,484
RAC: 0
Message 72579 - Posted: 8 Apr 2022, 13:33:10 UTC
Last modified: 8 Apr 2022, 13:44:16 UTC

I'm wondering what the current app names for the work unit types are, for the purposes of this file. I think it's been years since I've updated them. I currently have this to run 4 tasks per GPU on 2 SLI'd Titan Black cards:

...
<app>
    <name>milkyway</name>
    <max_concurrent>0</max_concurrent>
    <gpu_versions>
        <gpu_usage>0.25</gpu_usage>
        <cpu_usage>0.10</cpu_usage>
    </gpu_versions>
</app>

<app>
    <name>milkyway_nbody</name>
    <max_concurrent>0</max_concurrent>
    <gpu_versions>
        <gpu_usage>0.25</gpu_usage>
        <cpu_usage>0.10</cpu_usage>
    </gpu_versions>
</app>

<app>
    <name>milkyway_separation__modified_fit</name>
    <max_concurrent>0</max_concurrent>
    <gpu_versions>
        <gpu_usage>0.25</gpu_usage>
        <cpu_usage>0.10</cpu_usage>
    </gpu_versions>
</app>
...


If someone could let me know, I'd appreciate it. Thanks.
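
As a sanity check on a file like this, each gpu_usage value can be turned back into a task count: 0.25 of a GPU per task means 4 tasks at a time, as described above. A minimal Python sketch follows, assuming the elided parts of the file wrap the <app> sections in the usual <app_config> root. The sample is trimmed to one app, and the helper name tasks_per_gpu is made up.

```python
import xml.etree.ElementTree as ET

# Trimmed sample; the <app_config> wrapper is an assumption about the elided parts.
SAMPLE = """<app_config>
    <app>
        <name>milkyway</name>
        <gpu_versions>
            <gpu_usage>0.25</gpu_usage>
            <cpu_usage>0.10</cpu_usage>
        </gpu_versions>
    </app>
</app_config>
"""


def tasks_per_gpu(app_config_text):
    """Return {app name: how many tasks fit on one GPU} for each <app> entry."""
    root = ET.fromstring(app_config_text)
    counts = {}
    for app in root.iter("app"):
        name = app.findtext("name")
        usage = app.findtext("gpu_versions/gpu_usage")
        if name and usage:
            # Small epsilon guards against float rounding, e.g. 1 / 0.1.
            counts[name] = int(1.0 / float(usage) + 1e-9)
    return counts


print(tasks_per_gpu(SAMPLE))
```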

Keith Myers
Joined: 24 Jan 11
Posts: 715
Credit: 556,333,220
RAC: 55,136
Message 72581 - Posted: 8 Apr 2022, 13:49:15 UTC

You can drop the milkyway_separation__modified_fit section. It's just milkyway for the Separation tasks now.

Wrend
Joined: 4 Nov 12
Posts: 96
Credit: 251,528,484
RAC: 0
Message 72583 - Posted: 8 Apr 2022, 13:56:25 UTC - in response to Message 72581.  

You can drop the milkyway_separation__modified_fit section. It's just milkyway for the Separation tasks now.

Thanks.

Blake
Joined: 1 Jul 12
Posts: 8
Credit: 351,094,054
RAC: 0
Message 73817 - Posted: 12 Jun 2022, 5:59:42 UTC

Has anyone figured out the best <gpu_usage> setting for an Nvidia RTX 2080? It is not the 'Super' or the 'Ti', just a plain-Jane RTX 2080 with 8 GB of dedicated VRAM. I've had it running only one work unit at a time so far.

Keith Myers
Joined: 24 Jan 11
Posts: 715
Credit: 556,333,220
RAC: 55,136
Message 73819 - Posted: 12 Jun 2022, 16:35:55 UTC - in response to Message 73817.  

I use 0.5 for gpu_usage on my three 2080s. Running doubles is faster per task than running singles.
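
For reference, a gpu_usage of 0.5 is simply 1 divided by the number of tasks wanted per GPU. The sketch below generates a matching app_config.xml from that count. The helper name and the cpu_usage default of 0.5 are placeholders rather than values given in this post, so treat them as assumptions to tune.

```python
APP_CONFIG_TEMPLATE = """<app_config>
    <app>
        <name>{name}</name>
        <gpu_versions>
            <gpu_usage>{gpu_usage:g}</gpu_usage>
            <cpu_usage>{cpu_usage:g}</cpu_usage>
        </gpu_versions>
    </app>
</app_config>
"""


def build_app_config(name, tasks_per_gpu, cpu_usage=0.5):
    """Return app_config.xml text that runs `tasks_per_gpu` tasks on each GPU."""
    if tasks_per_gpu < 1:
        raise ValueError("tasks_per_gpu must be at least 1")
    return APP_CONFIG_TEMPLATE.format(
        name=name,
        gpu_usage=1.0 / tasks_per_gpu,  # 2 tasks per GPU -> 0.5
        cpu_usage=cpu_usage,            # assumed placeholder, not from the thread
    )


# Two tasks per GPU for the Separation app, which is named "milkyway".
print(build_app_config("milkyway", 2))
```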

mikey
Joined: 8 May 09
Posts: 3339
Credit: 524,010,781
RAC: 0
Message 73828 - Posted: 13 Jun 2022, 10:20:11 UTC - in response to Message 57755.  

I have been trying to run the app_config file shown above and whenever I do the Boinc event log shows an error message saying "App file not found". Has anyone else had this problem?


Yes, and haven't found the answer yet. Did try the ideas mentioned above.

edit: Mine wanted <use_all_gpus>1</use_all_gpus> in the cc_config, apparently.


One thing to be very careful of: never use a word processing program to edit the file, because BOINC can't read the extra formatting such programs leave behind when they save a file. ONLY use a plain text editor, e.g. Notepad in Windows. Then save the file in C:\ProgramData\BOINC\projects\milkyway.cs.rpi.edu_milkyway on Windows, or in the /var/lib/boinc/projects/milkyway.cs.rpi.edu_milkyway folder on Linux (the data directory name can differ between distributions, e.g. /var/lib/boinc-client).
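
A quick way to catch a file that was saved with stray formatting is to try parsing it as XML before handing it to BOINC. This is a minimal sketch using Python's standard library. The function name is made up, and it only checks that the text is well-formed with the expected root element, not that BOINC will accept every setting inside.

```python
import xml.etree.ElementTree as ET


def config_problem(text, expected_root="app_config"):
    """Return a description of the first problem found, or None if none is found."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError as err:
        return f"not well-formed XML ({err})"
    if root.tag != expected_root:
        return f"root element is <{root.tag}>, expected <{expected_root}>"
    return None


good = "<app_config><app><name>milkyway</name></app></app_config>"
rich_text = r"{\rtf1\ansi <app_config></app_config>}"  # what a word processor might save

print(config_problem(good))       # None
print(config_problem(rich_text))  # a "not well-formed XML" message
```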

After you save the file in the right place, be sure to go back to the BOINC Manager and click Options, then Read config files, so BOINC will implement the changes right away. If you changed the number of CPU cores per Nbody task, that won't take effect until the next time you get new tasks from the project.

And YES, since <use_all_gpus> is a global BOINC option that applies to all projects, it must go in the cc_config.xml file in the main BOINC folder. Whether to run more than a single GPU task at a time, or to not use all your CPU cores for the Nbody tasks, is a MilkyWay-only rule and therefore goes in the MilkyWay project folder.
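
To make the split between the two files concrete, here is a small Python sketch that shows a cc_config.xml carrying the <use_all_gpus> option and works out where each file belongs. The data folder used in the example is the default Windows location and is an assumption; the helper name is made up.

```python
from pathlib import PureWindowsPath

PROJECT_FOLDER = "milkyway.cs.rpi.edu_milkyway"

# Client-wide options live in cc_config.xml in the main BOINC data folder.
CC_CONFIG_XML = """<cc_config>
    <options>
        <use_all_gpus>1</use_all_gpus>
    </options>
</cc_config>
"""


def config_locations(data_dir):
    """Map each config file name to where it belongs under the data folder."""
    return {
        "cc_config.xml": data_dir / "cc_config.xml",
        "app_config.xml": data_dir / "projects" / PROJECT_FOLDER / "app_config.xml",
    }


# Assumed default Windows data folder; adjust for your own install.
for name, path in config_locations(PureWindowsPath(r"C:\ProgramData\BOINC")).items():
    print(f"{name}: {path}")
```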

Speedy51
Joined: 12 Jun 10
Posts: 57
Credit: 6,224,911
RAC: 3,150
Message 73867 - Posted: 19 Jun 2022, 4:08:13 UTC
Last modified: 19 Jun 2022, 4:20:25 UTC

Is it safe to run more than 1 task if GPU usage is sitting between 70 and 76%? Each task is running for 102 seconds on average

.clair.
Joined: 3 Mar 13
Posts: 84
Credit: 779,527,712
RAC: 0
Message 73871 - Posted: 19 Jun 2022, 16:22:27 UTC - in response to Message 73867.  
Last modified: 19 Jun 2022, 16:22:57 UTC

Is it safe to run more than 1 task if GPU usage is sitting between 70 and 76%? Each task is running for 102 seconds on average

Yes, you will see a good increase in throughput,
BUT keep an eye on GPU temperature: don't let it go above 70 °C long term. 60 °C is better,
and less is even better.
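
On Nvidia cards, one way to keep an eye on temperature from a script is to poll nvidia-smi. The sketch below is a rough outline that assumes nvidia-smi is on the PATH; the function names are made up, and the 70 °C default simply mirrors the ceiling suggested above.

```python
import subprocess

QUERY = ["nvidia-smi", "--query-gpu=temperature.gpu", "--format=csv,noheader,nounits"]


def parse_temperatures(smi_output):
    """Turn nvidia-smi's one-number-per-line output into a list of ints (deg C)."""
    return [int(line) for line in smi_output.splitlines() if line.strip()]


def gpus_over_limit(temps_c, limit_c=70):
    """Return the indexes of GPUs running hotter than limit_c."""
    return [index for index, temp in enumerate(temps_c) if temp > limit_c]


def read_temperatures():
    """Query the driver; needs an Nvidia GPU and nvidia-smi installed."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_temperatures(result.stdout)


# Offline example with made-up readings for two GPUs.
sample = parse_temperatures("65\n72\n")
print(gpus_over_limit(sample))  # [1]
```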

HRFMguy
Joined: 12 Nov 21
Posts: 236
Credit: 575,038,236
RAC: 0
Message 73872 - Posted: 19 Jun 2022, 19:32:29 UTC - in response to Message 73867.  

Is it safe to run more than 1 task if GPU usage is sitting between 70 and 76%? Each task is running for 102 seconds on average
yep. here is a little something I posted earlier:

https://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=4883&postid=73049#73049

Speedy51
Joined: 12 Jun 10
Posts: 57
Credit: 6,224,911
RAC: 3,150
Message 73874 - Posted: 19 Jun 2022, 22:21:32 UTC - in response to Message 73872.  
Last modified: 19 Jun 2022, 22:50:05 UTC

Is it safe to run more than 1 task if GPU usage is sitting between 70 and 76%? Each task is running for 102 seconds on average
yep. here is a little something I posted earlier:

https://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=4883&postid=73049#73049

Thanks for the guide. Unfortunately, I couldn't get it to work, so I have decided to run only 1 task at a time.

HRFMguy
Joined: 12 Nov 21
Posts: 236
Credit: 575,038,236
RAC: 0
Message 73875 - Posted: 20 Jun 2022, 0:23:06 UTC - in response to Message 73874.  

Is it safe to run more than 1 task if GPU usage is sitting between 70 and 76%? Each task is running for 102 seconds on average
yep. here is a little something I posted earlier:

https://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=4883&postid=73049#73049

Thanks for the guide unfortunately I couldn't get it to work. I have decided to only run 1 task at a time
@ speedy51, OK, no problemo, as we say in Texas. I am a bit OCD about squeezing as much performance as I can out of this hardware. My GPU was running about 50% utilization at the time I started my quest. So clearly there was room for improvement there. If you are hovering around 75%, then probably not much room for improvement. My second GPU is running 95% utilization, so not much chance at all for any improvement. And testing actually showed a loss of performance at 2 tasks, so I went back to 1 task per, on that GPU. But I would encourage you to keep on trying.

Speedy51
Joined: 12 Jun 10
Posts: 57
Credit: 6,224,911
RAC: 3,150
Message 73876 - Posted: 20 Jun 2022, 0:51:07 UTC - in response to Message 73875.  
Last modified: 20 Jun 2022, 1:31:16 UTC

So in a sense you have done exactly what I said above. Contrary to what I said, I managed to get two tasks to run; however, I actually think it is slower, because each task is taking around 3 minutes, so that's 6 minutes in total. If I were to run one at a time, I could complete 2 tasks in just over 4 minutes, so from where I am sitting running one at a time is faster. What are your thoughts? I'm currently running 0.5 CPU and 0.5 GPU.
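
One way to compare the two setups is tasks per hour rather than time per task. If the two tasks really do run side by side, a pair finishes in about 3 minutes of wall-clock time, not 6. The Python sketch below uses the rough figures from this post (about 180 seconds per task when doubled up, and about 120 seconds per task one at a time, reading "just over 4 minutes" for two); both are approximations, and the overlap is an assumption worth confirming in the task list.

```python
def tasks_per_hour(concurrent_tasks, seconds_per_task):
    """Throughput when `concurrent_tasks` run side by side, each taking `seconds_per_task`."""
    return concurrent_tasks * 3600.0 / seconds_per_task


single = tasks_per_hour(1, 120)  # one at a time, roughly 2 tasks in 4 minutes
double = tasks_per_hour(2, 180)  # two at once, roughly 3 minutes each

print(f"one at a time: {single:.1f} tasks/hour")  # one at a time: 30.0 tasks/hour
print(f"two at once:   {double:.1f} tasks/hour")  # two at once:   40.0 tasks/hour
```

On these numbers, doubling up comes out ahead. It only loses if the per-task time more than doubles compared with running singly.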

©2024 Astroinformatics Group