Welcome to MilkyWay@home

Posts by wolfman1360

1) Message boards : Number crunching : Rx570 vs. gtx 1080, 1080ti, 2080 (Message 69114)
Posted 23 Sep 2019 by wolfman1360
Post:
I have just stayed away from ATI/AMD because of the challenge of installing the drivers and maintaining them. The Nvidia drivers just install and run with no issues ever. The ATI/AMD drivers are a complete fiasco, as the constant posts about issues in the forums attest.

True enough for the latest versions. But the RX 500 series is easy enough as I understand it, though mine is on Windows. And the power efficiency of AMD is much better here; I use about 90 watts (GPU-Z) on my RX 570 for my 100 seconds; I expect it would be more like 140 watts on the Nvidia cards.

I have a bunch of Nvidias, but they are used elsewhere (on Ubuntu).

Perhaps the latest AMD drivers are having issues somewhere (I know they are on Seti in particular), but with my RX series I have never had a driver issue on any of my AMD cards.
For this project in particular, though, the fp64 performance of the Nvidia cards, at least on the consumer side of things, is pretty terrible. RTX 2080: 314.6 GFLOPS. RX 570: 318.5 GFLOPS. So for what is now around $100 you can have better performance on this project than with a $1100 GPU, and use much less power to boot. Those numbers seem to correlate with real-world performance on this project from what I can tell, too.
Are there other projects that heavily utilize fp64 as well?
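The price/performance comparison in this post can be checked with a quick calculation. This is only a sketch using the two FP64 figures and the rough prices quoted in the post; the dictionary layout and function name are made up for illustration, and power draw is left out because only the RX 570 figure was actually measured.

```python
# FP64 throughput and approximate prices as quoted in the post.
cards = {
    "RTX 2080": {"fp64_gflops": 314.6, "price_usd": 1100},
    "RX 570": {"fp64_gflops": 318.5, "price_usd": 100},
}


def gflops_per_dollar(card):
    """Return FP64 GFLOPS bought per US dollar for one card entry."""
    return card["fp64_gflops"] / card["price_usd"]


for name, card in cards.items():
    print(f"{name}: {gflops_per_dollar(card):.3f} FP64 GFLOPS per dollar")

# How many times more FP64 throughput per dollar the RX 570 offers.
ratio = gflops_per_dollar(cards["RX 570"]) / gflops_per_dollar(cards["RTX 2080"])
print(f"RX 570 advantage: about {ratio:.1f}x")
```

On these numbers the RX 570 comes out roughly eleven times ahead per dollar, which is the point the post is making.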
2) Message boards : Number crunching : Rx570 vs. gtx 1080, 1080ti, 2080 (Message 69080)
Posted 19 Sep 2019 by wolfman1360
Post:
I seem to do the work units in around 100 - 140 seconds for both my 1080 and 2080. And around 90 seconds for my 1080 Ti.

Thank you. This is exactly what I was looking for. How many workunits do you run at once on all 3 of those cards?
I know there are cards far better suited to this project that can wipe the floor with mine. Maybe I'll grab an r9 280x down the road. I also know that every little bit helps and I'm certainly not looking at getting into the top anything. I simply don't have the finances or physical space for that. I just wanted to make sure I could maximize output with what I do have.
3) Message boards : Number crunching : Rx570 vs. gtx 1080, 1080ti, 2080 (Message 69063)
Posted 19 Sep 2019 by wolfman1360
Post:
Hello everyone.
I've been an on and off contributor to this project for a while now. Right now, Einstein is taking priority, despite the resource settings saying otherwise, but that's a BOINC problem more than anything. Regardless, when I was getting work from here, the two projects seemed to be a very nice fit for each other, though maybe I will soon set Einstein to 0% resources.
Right now I've got this rx570 set to complete two concurrent wus. My app_config looks like this and I seem to be completing them within 2-2.5 minutes.
<app_config>
  <app>
    <name>milkyway</name>
    <gpu_versions>
      <gpu_usage>0.5</gpu_usage>
      <cpu_usage>0.25</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
Does that look alright for this card? I'm still very new at assigning CPU cores to GPU work. The GPU load sits at what I think is a constant 100%, with occasional brief dips, though not many. I'm not entirely sure whether I should be paying attention to that, memory used, power draw, or something else entirely.
The processor is a Ryzen 1800x which is crunching Asteroids right now and I have Boinc set to use 87% CPU, since Einstein likes to use 1 core per WU and I just have it using the website preferences as a guide.
Does any of this need to be changed at all for better optimization?
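For anyone reading the app_config above, here is a rough sketch of what the two numbers imply. As I understand it, gpu_usage is the fraction of one GPU each task is assumed to occupy and cpu_usage is the CPU fraction budgeted per GPU task; both are scheduling hints for the client and do not throttle the application itself. The function names are made up for illustration.

```python
import math


def tasks_per_gpu(gpu_usage):
    """How many tasks fit on one GPU if each claims gpu_usage of it."""
    # The small epsilon guards against float error, e.g. 1 / 0.2.
    return math.floor(1.0 / gpu_usage + 1e-9)


def cpu_budget(gpu_usage, cpu_usage, gpus=1):
    """Return (concurrent GPU tasks, total CPU fraction budgeted for them)."""
    tasks = tasks_per_gpu(gpu_usage) * gpus
    return tasks, tasks * cpu_usage


tasks, cpus = cpu_budget(gpu_usage=0.5, cpu_usage=0.25)
print(f"{tasks} concurrent tasks, {cpus} CPU budgeted in total")
```

With 0.5 and 0.25 this works out to two tasks at once and half a CPU thread budgeted between them, so the client does not set aside a whole thread for GPU support.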

Now for the interesting question. I know that this project favors AMD cards quite heavily. What kinds of runtimes can I expect from a gtx 1080, 1080 ti, or 2080? What do folks recommend setting the number of WUs to on those cards vs. CPU cores in use for the GPU? Is there anything else I should keep in mind?
thanks a ton!
4) Message boards : Number crunching : So no way to select project campaigns anymore on the new server code (Message 68705)
Posted 7 May 2019 by wolfman1360
Post:
MW used to have a mt N-body application in earlier times. GPUGrid.net currently uses an mt application called QC Chemistry that heavily uses multiple cores with default of <cmdline>--nthreads 4</cmdline>. When the application first was developed it defaulted to using all cpu cores until users complained and it was knocked down to something more sensible.


Primegrid has a ton of cpu apps and MOST can be multi-threaded like that, and so can the ones at RakeSearch, SRBase, I think Universe, and others I don't remember right now. The easiest thing is to try it and see if it works. You have the basics now; just change the app name, which is listed under apps on most projects, and put the app_config.xml file in that project's folder.


Thanks!
I'm assuming that if an app doesn't support it, it simply won't work and will fall back to a single thread per WU?
5) Questions and Answers : Unix/Linux : Attempting to run CPU tasks but get the following. Not requesting tasks: don't need (CPU: ; NVIDIA GPU: job cache full) (Message 68695)
Posted 5 May 2019 by wolfman1360
Post:
Okay, this is a strange issue.

I decided to enable CPU tasks in the PrimeGrid project settings. And what do you know, that machine is getting tasks like crazy on the CPU side of things.

Should I try resetting MWH? I have little experience in doing this so not sure if I should outright remove or reset project. I'm assuming make sure there are no tasks left before doing either one.

Interesting and a little frustrating, but hopefully I can figure this out.


Resetting will wipe out every workunit from MW you have on your pc in the process.

Are you sure you have the allow cpu tasks checkbox checked for MW? Were you running cpu wu's from other projects prior to allowing them from PG? What is your resource share set to?

Yes I do have MW set to receive cpu tasks. I have, right now, resources set to 150. No, I was not receiving cpu tasks for any other projects. I was waiting on MW to receive them since I had it exclusively set to be the one running on the cpu and gpu at the same time.
6) Message boards : Number crunching : AMD Ryzen 7 1800X CPU task takes over 20 hours? (Message 68693)
Posted 4 May 2019 by wolfman1360
Post:
I'm used to coming off of WCG, where just about every project is pretty consistent with the runtimes you receive.

And I'm used to the varying runtimes of the tasks we get at Seti depending on the origin of the tasks (which antenna) and also the way the data was gathered. Typically one type of task from Arecibo telescope takes twice as long to compute as a task from Green Bank telescope even though the task sizes are identical. I just realize that some tasks from any particular project are easier or harder to crunch and don't worry about it.

Thanks again. Exactly what I was looking for. :)
7) Message boards : Number crunching : So no way to select project campaigns anymore on the new server code (Message 68692)
Posted 4 May 2019 by wolfman1360
Post:
MW used to have a mt N-body application in earlier times. GPUGrid.net currently uses an mt application called QC Chemistry that heavily uses multiple cores with default of <cmdline>--nthreads 4</cmdline>. When the application first was developed it defaulted to using all cpu cores until users complained and it was knocked down to something more sensible.

Well that's interesting. I didn't think gpugrid had any CPU work at all for some reason. Neat.

I remember having MW use all the cores on my fx8350 back in the day.
8) Message boards : Number crunching : Lenovo x1 extreme arriving next week! Best crunching practices? (Message 68691)
Posted 4 May 2019 by wolfman1360
Post:
I'll help with the battery first, just remove the battery and run it off the wall like a desktop, no more battery problems.

As for not using all of the gpu it depends on the OS you will put on it, if it's Windows then you can use something like MSI Afterburner to change the settings to slow it down a bit. I have no clue what program in Linux will do the same.

I guess I could do that, yes. I'm not sure how much of a chore it would be to take the back panel off each time I wanted to be mobile and put the battery in/take it out. I think it's just held in by a screw.
I will be running Windows on it. I only use Linux to crunch and since this will be a machine for both work and play Windows is what I use.
I've been reading about Afterburner and I think it would allow me to undervolt the GPU too. I've never done that before, so this should be interesting.
9) Message boards : Number crunching : Lenovo x1 extreme arriving next week! Best crunching practices? (Message 68684)
Posted 4 May 2019 by wolfman1360
Post:
First off, I'm sorry for all the recent threads here lately.

I'm taking possession of a Lenovo x1 extreme sometime next week and I'm just curious on recommendations to crunch with it? i7-8750H, gtx 1050ti. I think one stick of 16 gb ram which I plan on adding another to (so it has more ram than my desktop...go figure). That gpu should be pretty solid though I think?

Is there any way to not use all of the gpu, for instance? Since that would still be tons more powerful than CPU only. I'll be getting a decent laptop cooler and will probably end up repasting and undervolting the processor to help with keeping it cool. This will be my first beast of a laptop in a thin and light chassis. I'm pretty excited. This GPU crunching is all new to me.

That being said, any hints are appreciated so I don't end up swelling the battery or something in a year or two. I definitely don't have the physical space for another desktop.
10) Message boards : Number crunching : So no way to select project campaigns anymore on the new server code (Message 68683)
Posted 4 May 2019 by wolfman1360
Post:

Does this work on all CPU projects or only on a few?

<cmdline>--nthreads 4</cmdline>



<cmdline>--nthreads 4</cmdline> is an actual command line option (could be for numerous other options besides multithreading) passed onto the WU application at startup and syntax depends on the coding culture of the author.
LLR.exe accepts "-t4", not sure it accepts "--nthreads 4"

If you don't pass on the option exactly the way the application expects it, then no multithreading.
If a project's WU doesn't actually have code to multi-thread, then the command line option is meaningless, so it will only work on CPU work units that are ready for multithreading.

(Sorry Keith, that was off topic, but interesting question to be answered)


That's exactly what I was looking for. The documentation simply provides various config examples without going into too much detail about that option in particular. Thank you for the explanation.

I don't think there are many multithreaded applications around. I haven't gotten one from MWH since I started crunching here again but it's good to know I can invoke it if I so choose (I'm assuming the reference to LLR.exe was what this meant)?
11) Message boards : Number crunching : AMD Ryzen 7 1800X CPU task takes over 20 hours? (Message 68682)
Posted 4 May 2019 by wolfman1360
Post:
Does the estimated computation size not have anything to do with how long the task takes? It seems like the bigger WU is going to finish much sooner (in less than half the time) than the supposedly smaller one.

Yes, generally it does. But it also depends on how the project scientists set up the application's expected results for a calculation. Some tasks are more difficult than others and may require more FLOPS to compute. The calculation difficulty also factors into how much credit is awarded in the classic BOINC code. But the creation of the CreditNew credit algorithm pretty much threw a monkey wrench into that business. That topic is a political minefield. The project scientists determine how much and how they award credit.

I would not bother with reinstalling Windows. That won't change the difficulty in different task calculations. You are trying to compare apples to oranges. Just let BOINC run and do its thing. You can't change how it works.

That makes sense. The task might have more difficult calculations during the process and so that smaller number may equal longer compute times.

I'm not too hung up on credit, especially with a project like this - my GPUs don't hold a candle to just about anything out there and I'm in it to progress science, not for bragging rights at this point. I'd rather throw that $100 at the projects for funding than throw together a new machine I don't actually have room for. ;)

I just wanted to make sure this was normal for these tasks. I think the NBS runtimes vary wildly per task, which was what made me question this to begin with. I wasn't sure if they should be consistent with each other.

I'm used to coming off of WCG, where just about every project is pretty consistent with the runtimes you receive.
12) Questions and Answers : Unix/Linux : Attempting to run CPU tasks but get the following. Not requesting tasks: don't need (CPU: ; NVIDIA GPU: job cache full) (Message 68681)
Posted 4 May 2019 by wolfman1360
Post:
Okay, this is a strange issue.

I decided to enable CPU tasks in the PrimeGrid project settings. And what do you know, that machine is getting tasks like crazy on the CPU side of things.

Should I try resetting MWH? I have little experience in doing this so not sure if I should outright remove or reset project. I'm assuming make sure there are no tasks left before doing either one.

Interesting and a little frustrating, but hopefully I can figure this out.
13) Message boards : Number crunching : Errors, invalid, and validation inconclusive. Anything to worry about? (Message 68680)
Posted 4 May 2019 by wolfman1360
Post:
There are a few tasks in progress that are going to take well over a day and appear to be only 15000 gflops and a bit. Meanwhile a core i5-3317U can complete a task of 60000 plus in less than half this time. So something is definitely going on,


I discovered after making a purchase for a used GPU, that Milkyway requires double precision floating point calculations and the 1060 3gb was 1 FP64 to every 32 FP32 calculation units. (Ended up buying a used 280x for Milkyway)
The benchmarks shown in details here, I think, reflect FP32 measurements, not FP64.
It's possible the performance difference on Milkyway WU between the i5-3317U and your Ryzen could be related to double precision abilities.

Do you have a BM tool that measures FP32 and FP64 you can test on the two CPU's to compare?
Or at least to see if your Ryzen is performing at expectations to reference BMs.
Speaking of reference BM's; here's a site taking a Ryzen 1700x through its arithmetic paces using the Sandra Lite application.
The Ryzen 1700x seems to have good FP64, and if the 1800x (not finding their 1800x review) isn't much different in architecture, it should be outperforming the i5-3317U.



Maybe I just need to format? Not sure what else to look for.

Before you do that, to eliminate the OS and app install, you can run a testing OS from a USB thumb drive, which is a barebones OS built only to do BOINC, and see how well the hardware does on that reference OS. I made one with Tiny7 but there is PenDriveLinux and others. You can use YUMI to build a pendrive Linux from many different distros.

Anyway, you had 6 errors out of over 600 WU's, it could just have been a power spike. But the performance difference between those two machines would bother me, especially since the 1800x is a workhorse.


I'm not sure what I'd use for that benchmark at this point.

I don't think this supports running 2 WUs at once on the graphics card - I'm seeing 100% utilization at around 78 Celsius. It's very tempting to go for the rx580 or even 590, or maybe wait for AMD to drop the ball on the new 7 nm, but I'd mainly be getting that for crunching rather than my own personal needs.
It's bothering me too. Lots.

I'm going to get rid of the page file to see if I'm actually running into memory issues before heading to bed tonight. If I wake up to a bunch of memory 000 errors I'll know I need to either a) upgrade to 32 GB of ram or b) wipe this thing clean and start with a clean slate.
Right now there are a few more WUs coming in that look like they're going to take around 12 hours on this Ryzen, so maybe that was just a fluke.

I don't mind wiping and reinstalling Windows. It's been about two years so I'm sure there's bound to be a memory leak somewhere or at least something affecting performance that I've done over the years. Reinstall Windows, reinstall BOINC (after the latest GPU drivers) and see what happens.
14) Message boards : Number crunching : So no way to select project campaigns anymore on the new server code (Message 68673)
Posted 3 May 2019 by wolfman1360
Post:
Well I was just about to ask about multithreaded CPU applications. Looks like I no longer have to since they've apparently been phased out.


An app_config.xml file will let you do it manually if you want to; most cpu apps can be done that way, but not all.



This should work in your app_config.xml for 4 threads (check the client_state.xml for this information).

<app_version>
  <app_name>milkyway_nbody</app_name>
  <plan_class>mt</plan_class>
  <avg_ncpus>4.0</avg_ncpus>
  <cmdline>--nthreads 4</cmdline>
</app_version>

Thank you.
Does this work on all CPU projects or only on a few?
I'm curious what the difference between the following is, though.
<avg_ncpus>4.0</avg_ncpus>
<cmdline>--nthreads 4</cmdline>
Aren't these essentially saying the same thing?

thanks!
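On the question of how the two lines differ, my understanding is that they are read by different programs: avg_ncpus tells the BOINC client how many CPUs to budget for each task when it schedules work, while cmdline is handed to the science application, which decides for itself how many threads to start. The sketch below builds a complete app_config.xml tree around the quoted app_version block to make that split explicit. The function name and parameters are made up for illustration, and the thread flag is application-specific, so "--nthreads 4" is only what the quoted reply suggests for this app.

```python
import xml.etree.ElementTree as ET


def build_mt_app_config(app_name, ncpus, cmdline):
    """Build an app_config tree for one multithreaded (mt) app version."""
    root = ET.Element("app_config")
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    ET.SubElement(version, "plan_class").text = "mt"
    # Read by the BOINC client: CPUs to budget per running task.
    ET.SubElement(version, "avg_ncpus").text = f"{float(ncpus):.1f}"
    # Passed through to the application unchanged; the syntax is up to the app.
    ET.SubElement(version, "cmdline").text = cmdline
    return root


root = build_mt_app_config("milkyway_nbody", 4, "--nthreads 4")
print(ET.tostring(root, encoding="unicode"))
```

If the two values disagree, the client would budget for one number of CPUs while the application starts a different number of threads, which is why the quoted example keeps them both at 4.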
15) Message boards : Number crunching : AMD Ryzen 7 1800X CPU task takes over 20 hours? (Message 68672)
Posted 3 May 2019 by wolfman1360
Post:
Ignore the estimated time remaining. That is only a guess by BOINC since it has only seen 5 tasks so far on your host. BOINC can't accurately predict runtimes on tasks until the host has validated 11 tasks for each application that are not overflows, 100% radar blanked or errors.


I'm not sure if this means anything, however here are two different tasks with drastically different results. And it looks like I'm hitting virtual memory now which could explain things.


Application: Milkyway@home N-Body Simulation 1.76
Name: de_nbody_04_23_2019_v176_40k__data__3_1556550902_54648
State: Suspended - computer is in use
Received: 2019-05-01 1:22:44 PM
Report deadline: 2019-05-13 1:19:06 PM
Estimated computation size: 15,994 GFLOPs
CPU time: 1d 04:30:31
CPU time since checkpoint: 00:00:14
Elapsed time: 1d 05:14:44
Estimated time remaining: 02:04:10
Fraction done: 93.391%
Virtual memory size: 13.61 MB
Working set size: 1.42 MB
Directory: slots/14
Process ID: 16916
Progress rate: 3.240% per hour
Executable: milkyway_nbody_1.76_windows_x86_64.exe


Application: Milkyway@home N-Body Simulation 1.76
Name: de_nbody_04_23_2019_v176_40k__data__1_1556550902_83400
State: Suspended - computer is in use
Received: 2019-05-02 11:17:26 PM
Report deadline: 2019-05-14 11:13:49 PM
Estimated computation size: 41,239 GFLOPs
CPU time: 07:56:09
CPU time since checkpoint: 00:00:13
Elapsed time: 08:12:19
Estimated time remaining: 03:34:18
Fraction done: 63.672%
Virtual memory size: 12.64 MB
Working set size: 1.42 MB
Directory: slots/7
Process ID: 13492
Progress rate: 7.920% per hour
Executable: milkyway_nbody_1.76_windows_x86_64.exe

Does the estimated computation size not have anything to do with how long the task takes? It seems like the bigger WU is going to finish much sooner (in less than half the time) than the supposedly smaller one.
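To put numbers on that comparison, here is a small sketch that projects each task's total runtime from its elapsed time and fraction done, then divides the estimated computation size by that runtime. It assumes progress is linear, which is a simplification, and it treats the "Estimated computation size" as given even though it is only the project's estimate. The helper names are made up for illustration.

```python
def hms_to_hours(text):
    """Convert 'Dd HH:MM:SS' or 'HH:MM:SS' (as shown by BOINC) to hours."""
    days = 0
    if "d " in text:
        day_part, text = text.split("d ")
        days = int(day_part)
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return days * 24 + hours + minutes / 60 + seconds / 3600


def project_task(size_gflop, elapsed, fraction_done):
    """Linear projection of total and remaining time plus effective speed."""
    elapsed_h = hms_to_hours(elapsed)
    total_h = elapsed_h / fraction_done
    return {
        "total_hours": total_h,
        "remaining_hours": total_h - elapsed_h,
        # Estimated size divided by projected runtime, in GFLOPS.
        "effective_gflops": size_gflop / (total_h * 3600),
    }


# Values copied from the two task property listings in this post.
task_small = project_task(15994, "1d 05:14:44", 0.93391)
task_large = project_task(41239, "08:12:19", 0.63672)

for label, task in (("15,994 GFLOPs task", task_small),
                    ("41,239 GFLOPs task", task_large)):
    print(f"{label}: {task['total_hours']:.1f} h total, "
          f"{task['effective_gflops']:.2f} effective GFLOPS")
```

On this projection the nominally smaller task needs about 31 hours in total against about 13 for the larger one, so the effective rate differs by roughly a factor of six. The projected remaining time for the first task, about 2.07 hours, is close to the 02:04:10 BOINC shows; for the second task the same formula gives a longer figure than BOINC's 03:34:18, so the client is evidently not using a purely linear estimate there.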

I think next week I'll be wiping this and installing a fresh copy of Windows just to be sure.

I'll give it a few days to settle. I just hope I don't get more errors in the meantime...
16) Message boards : Number crunching : Errors, invalid, and validation inconclusive. Anything to worry about? (Message 68669)
Posted 3 May 2019 by wolfman1360
Post:
Those are probably not errors. Those are the work units that most other projects call 'validation pending' or 'pending' (not sure why it's different here).

Those WU's are waiting for the wing-computer to report a second result and they may wait for up to a full deadline period if the computer got shut off, project detached without aborting the WU's, etc...


You have 6 actual errors, and like Beemer pointed out, could be your computer temps.

Two of the GPU were error: 3x "C:\Users\tcwoo\AppData\Local\Temp\\OCL8928T1.cl:186:67: warning: unknown attribute 'max_constant_size' ignored".
Not sure what that one is.

The 3 invalid on the CPU were only your computer's problem so check for cooling issues, running out of RAM (bad app, leaking memory pointers), hopefully you're not looking at an actual hardware problem.

The processor is stable at 3.7 GHz. The GPU clocks remain steady at 1244 MHz. Similarly, the memory clock remains steady. I have no overclocks or undervolts on anything. The CPU temp, as I mentioned in another thread, is staying lower than usual while 100% cpu is utilized. I don't know why. It doesn't appear to be hitting RAM limits. If I run Prime95, the cpu temp climbs up past 60, as is the norm for the machine. There are a few tasks in progress that are going to take well over a day and appear to be only 15000 gflops and a bit. Meanwhile a core i5-3317U can complete a task of 60000 plus in less than half this time. So something is definitely going on, I just don't know what. Maybe I am hitting RAM limits? But each task only takes around 14 MB and this machine has 16 GB total.

Maybe I just need to format? Not sure what else to look for.
17) Message boards : Number crunching : Errors, invalid, and validation inconclusive. Anything to worry about? (Message 68663)
Posted 3 May 2019 by wolfman1360
Post:
This computer seems to be getting a lot of validation inconclusive errors...and 3 or 4 random invalid / errored tasks.
https://milkyway.cs.rpi.edu/milkyway/results.php?hostid=803731

No overclock on the GPU or CPU so is this anything to worry about?
18) Message boards : Number crunching : So no way to select project campaigns anymore on the new server code (Message 68661)
Posted 3 May 2019 by wolfman1360
Post:
Well I was just about to ask about multithreaded CPU applications. Looks like I no longer have to since they've apparently been phased out.
19) Message boards : Number crunching : AMD Ryzen 7 1800X CPU task takes over 20 hours? (Message 68656)
Posted 3 May 2019 by wolfman1360
Post:
So I just want to make sure this is normal.

task: N-Body Simulation 1.76, de_nbody_04_23_2019_v176_40k__data__3_1556550902_54648_0. (Not sure how much of that is relevant).

But now for the interesting part.

Unless I'm missing something, this amount of data should only take a few hours on this processor. It isn't.
State: Running
Received: 2019-05-01 1:22:44 PM
Report deadline: 2019-05-13 1:19:06 PM
Estimated computation size: 15,994 GFLOPs
CPU time: 13:36:38
CPU time since checkpoint: 00:00:04
Elapsed time: 13:50:40
Estimated time remaining: 17:50:46
Fraction done: 43.686%
Virtual memory size: 13.59 MB
Working set size: 17.79 MB
Directory: slots/14
Process ID: 18784
Progress rate: 3.240% per hour
Executable: milkyway_nbody_1.76_windows_x86_64.exe

Is there something wrong with this WU or my machine? I should also note that the CPU temperatures are below their usual 60 plus and are hovering in the high 50s. The fan isn't revving up to the higher rpm range either, and it's got stable clock speeds of 3.7 GHz.

I'm running this exclusively on the CPU along with the GPU. I have the processor set to use 92% to account for the one thread that the GPU needs, and those tasks appear to be flying along just fine on the rx570. I'm not sure what to do at this point.
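As a rough check on what the 92% setting means in threads: the Ryzen 7 1800X exposes 16 logical CPUs, and the sketch below assumes the client rounds the percentage down to a whole number of threads and never goes below one. That rounding rule is my assumption rather than something stated in this thread, and the function name is made up for illustration.

```python
import math


def usable_threads(logical_cpus, cpu_percent):
    """Threads available for CPU tasks at a given 'use at most N% of CPUs'."""
    # Assumption: fractional threads are rounded down, minimum of one.
    return max(1, math.floor(logical_cpus * cpu_percent / 100))


threads = usable_threads(16, 92)
print(f"{threads} of 16 threads used, {16 - threads} left free")
```

Under that assumption 92% leaves two threads free rather than one, which is more than enough headroom for a single GPU task.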

Help appreciated!
20) Questions and Answers : Unix/Linux : Attempting to run CPU tasks but get the following. Not requesting tasks: don't need (CPU: ; NVIDIA GPU: job cache full) (Message 68653)
Posted 2 May 2019 by wolfman1360
Post:
Yes, your work cache covers both cpu and gpu work. BOINC determines whether there is room to schedule cpu or gpu work based on the total amount of estimated calculation time spread among all your projects. It is called REC or Recent Estimated Credit, and that figure gets used along with GFLOPS for each device in round-robin simulation when you request work. One way to see your total commitment for the cpu is to set work_fetch_debug in the Event Log logging options and then read through the Event Log after the work request. You don't want to leave it enabled for more than one work fetch cycle though, because it generates a lot of output. A good option to set is sched_op_debug as a permanent logging option. It doesn't add all that much to the event log, but it does show you exactly how many seconds of work you are requesting for both cpu and gpu that totals up to your days of work cache size. This is a snippet out of mine to show as an example. My work cache setting is 0.5 days of cache and 0.01 days of additional work cache. Your additional days of work should be set very low to make MW@home request work every 91 seconds.


To see what kind of commitment you have among all your attached projects, you can set rr_simulation. I think that will show you are too overcommitted to the WCG cpu task. I believe that will cause issues since even if you have a very small work cache set, just one WCG task in your work cache will swamp any other cpu work. The way to get the MW N-body mt (multi-thread) application to pull some work would be to suspend the WCG task. It may take a while for BOINC to "balance the books" and let you download a mt task. You might have to leave the WCG task suspended for a few days and hope that it doesn't go into High Priority mode once re-enabled. Try to set your work cache to a very small amount. That increases your chance of getting some MW mt tasks.


Thank you for that information. I've just set those flags now.
I have 0.5 minimum and an additional 1 day, so I've also changed this.
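For reference, here is a sketch of what those two changes amount to. The first function converts the two cache settings into a target buffer in seconds, which is the unit sched_op_debug reports; how much the client actually requests at any moment depends on what is already queued, so treat this as the size of the buffer and not of a single request. The second builds a cc_config.xml tree with the three log flags named in the quoted reply, placed under log_flags as I understand the client's config layout. Function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

SECONDS_PER_DAY = 86400


def buffer_seconds(min_days, additional_days):
    """Target work buffer in seconds for the two cache settings."""
    return (min_days + additional_days) * SECONDS_PER_DAY


def build_cc_config(flags):
    """Build a cc_config tree that switches on the given log flags."""
    root = ET.Element("cc_config")
    log_flags = ET.SubElement(root, "log_flags")
    for name in flags:
        ET.SubElement(log_flags, name).text = "1"
    return root


print(f"0.5 + 1 day:    {buffer_seconds(0.5, 1.0):.0f} s")
print(f"0.5 + 0.01 day: {buffer_seconds(0.5, 0.01):.0f} s")

config = build_cc_config(["work_fetch_debug", "sched_op_debug", "rr_simulation"])
print(ET.tostring(config, encoding="unicode"))
```

Dropping the additional day to 0.01 shrinks the target buffer to about a third of its previous size, which should leave more room for the client to ask this project for CPU work.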

MW seems to be hammering away at the GPU, however the CPU is completely idle. And this is PrimeGrid, which is sharing the GPU along with MW and Seti.

By the cpu being idle, I mean no other project is using it because I have MW exclusively set to be the one that uses it along with the gpu.

I'll wait and see what happens at this point.

This is the machine I'm referencing. I forget if I posted that before. https://milkyway.cs.rpi.edu/milkyway/show_host_detail.php?hostid=803610


