Welcome to MilkyWay@home

Posts by Sunny129

21) Message boards : News : Badges for crunchers. (Message 59678)
Posted 26 Aug 2013 by Sunny129
Post:
i couldn't care less whether or not a badge system is implemented...just don't change the look of the website. IMO it follows the format of most other BOINC project websites, which everyone knows well...
22) Message boards : Number crunching : Run Multiple WU's on Your GPU (Message 58512)
Posted 5 Jun 2013 by Sunny129
Post:
I'm currently running at 1100 MHz; it takes 1 min 20 s to do 6 WUs (3 on each card).



My results are similar to this, running a 7870 XT (a 7950 rebranded by the marketing department). I can run one in 42 seconds, or two in 83 seconds, a speed increase of less than 2%. Adding a third gives no increase at all.

At first, I thought this might be a method that only works on Nvidia cards, since this is how it has worked for me on both the old and new clients, with a couple of different mobos and a few different AMD CPUs (current CPU is an X6 1045T). I do leave spare cores.

Does anyone have any suggestions as to why it seems to work amazingly for some people, but barely, or not at all, for others with similar hardware?


while i haven't read through the entire thread in detail, i suspect that some of the folks claiming to get substantial increases in compute efficiency aren't specifically talking about the Milkyway@Home project (despite the fact that we are in the MW@H forums). remember, the app_config.xml file is generic and can be used for a number of different GPU compute-capable projects...so while people should be talking about it here in a "MW@H-only" context, the personal experiences with the app_config.xml file being talked about here may not strictly represent experiences w/ MW@H.

the bottom line is that a single MW@H task tends to max out any GPU's utilization - hence the reason most folks see only negligible increases in compute efficiency (if any at all) when running more than one MW@H task simultaneously. if you had looked at your GPU usage in GPU-Z/MSI Afterburner/etc. back when you were running only 1 MW@H task at a time, you would probably have noticed that it was already pegged at or near 100%. if a single task leaves little or no GPU capacity unused, then it should be obvious that 2 simultaneous tasks will take approx. twice as long to crunch, 3 simultaneous tasks approx. three times as long, and so on.
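To put numbers on that, here's a minimal Python sketch (the function names are just for illustration) using the 7870 XT timings quoted above: one task in 42 s versus two together in 83 s. It assumes throughput is simply tasks finished divided by wall-clock time.

```python
def per_task_seconds(batch_seconds: float, tasks: int) -> float:
    """Effective wall-clock seconds per task when `tasks` run side by side."""
    return batch_seconds / tasks


def throughput_gain(single_seconds: float, batch_seconds: float, tasks: int) -> float:
    """Fractional gain in tasks/hour versus running one task at a time."""
    return single_seconds / per_task_seconds(batch_seconds, tasks) - 1.0


# 1 task alone: 42 s; 2 tasks together: 83 s (from the post quoted above).
print(f"{throughput_gain(42.0, 83.0, 2):.1%}")  # prints 1.2%

# If one task already pegs the GPU, n tasks take about n times as long,
# so the gain is zero:
print(f"{throughput_gain(42.0, 2 * 42.0, 2):.1%}")  # prints 0.0%
```

A 1.2% gain is exactly the "less than 2%" reported, i.e. the GPU was already close to saturated with a single task.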

that said, the increase in PPD that people actually see when running multiple MW@H tasks simultaneously comes from "eliminating" the handful of seconds of each MW@H task during which the GPU sits idle, as others have already mentioned above. you see, while most of a MW@H task runs on the GPU, a small fraction of it actually runs on the CPU (only a handful of seconds, depending on the GPU). the solution is to run exactly 2 MW@H tasks simultaneously, no more, no less. by doing so and staggering their start times (i.e. not letting 2 tasks start at exactly the same time), the GPU will process 2 tasks almost all of the time, and will still process at least one task while the other is doing its few seconds of calculations on the CPU. thus, the GPU is "never" idle. i put the word "never" in quotes b/c, while that holds in theory, in reality the GPU doesn't split its resources perfectly and equally among multiple simultaneously running MW@H tasks, and so the tasks don't always stay staggered. in other words, pairs of simultaneously running MW@H tasks tend to oscillate between a staggered state and a synchronized state...don't worry though - most of the time they'll run staggered b/c it is statistically very unlikely for a pair of tasks to be in sync more often than they're out of sync.
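A rough way to see why staggering helps is a best-case model. The sketch below rests on idealized assumptions, not measurements: each task needs gpu_seconds on the GPU and then cpu_seconds of CPU-only work, one task alone saturates the GPU, and the tasks stay perfectly staggered. The 40 s / 3 s split in the example is hypothetical.

```python
def ideal_tasks_per_hour(slots: int, gpu_seconds: float, cpu_seconds: float) -> float:
    """Best-case tasks/hour for `slots` concurrent tasks sharing one GPU.

    Idealized assumptions (not measurements):
      * each task needs `gpu_seconds` of GPU time (measured running alone),
        then `cpu_seconds` of CPU-only work that leaves the GPU free;
      * one task already saturates the GPU, so sharing adds no GPU speed;
      * the tasks stay perfectly staggered.
    """
    if slots < 1:
        raise ValueError("need at least one task slot")
    # Limit 1: the GPU finishes at most one task every gpu_seconds.
    gpu_bound = 1.0 / gpu_seconds
    # Limit 2: each slot finishes at most one task every gpu + cpu seconds.
    slot_bound = slots / (gpu_seconds + cpu_seconds)
    return 3600.0 * min(gpu_bound, slot_bound)


# Hypothetical split: 40 s on the GPU, 3 s on the CPU per task.
for slots in (1, 2, 3):
    print(slots, round(ideal_tasks_per_hour(slots, 40.0, 3.0), 1))
```

With those made-up numbers, one slot gives about 83.7 tasks/hour while two or three slots both top out at 90: the second slot only recovers the CPU-phase idle time, and a third has nothing left to recover. Real pairs drift in and out of sync, so actual gains land somewhat below this ceiling.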

the reason it is pointless to run more than 2 simultaneous MW@H GPU tasks is twofold: 1) it only takes a single task to keep the GPU fully (or almost fully) utilized while the other task is running on the CPU for a short time, and 2) as i mentioned in paragraph two above, running MW@H GPU tasks in parallel doesn't reduce their effective per-task run time (unlike many other DC projects) - the only thing it does is eliminate GPU idle time.
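For reference, here's a minimal sketch of an app_config.xml that asks BOINC for 2 tasks per GPU, generated with Python's standard library. The app short name "milkyway" and the cpu_usage of 0.05 are assumptions; check the actual app name in your client_state.xml before using it.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, tasks_per_gpu: int, cpu_usage: float) -> str:
    """Return app_config.xml text that lets BOINC run `tasks_per_gpu` tasks per GPU."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu_versions = ET.SubElement(app, "gpu_versions")
    # gpu_usage is the fraction of a GPU each task claims: 0.5 -> two at a time.
    ET.SubElement(gpu_versions, "gpu_usage").text = str(1.0 / tasks_per_gpu)
    ET.SubElement(gpu_versions, "cpu_usage").text = str(cpu_usage)
    return ET.tostring(root, encoding="unicode")


# Assumed values: "milkyway" as the app name, 0.05 CPU per task.
print(build_app_config("milkyway", 2, 0.05))
```

The file goes in the project's folder under the BOINC data directory; the client picks it up after re-reading its config files or restarting.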
23) Message boards : News : Separation Runs ps_p_82_1s_dr8_4 and de_p_82_1s_dr8_4 Started (Message 57971)
Posted 22 Apr 2013 by Sunny129
Post:
just started getting new work...way to stay on top of it guys!
24) Message boards : News : Separation Runs ps_p_82_1s_dr8_4 and de_p_82_1s_dr8_4 Started (Message 57966)
Posted 22 Apr 2013 by Sunny129
Post:
same here...
25) Message boards : News : New Separation Runs Started (Message 57595)
Posted 21 Mar 2013 by Sunny129
Post:
I have a lot of errors from WUs with 21_sSgr_1 and 1358941502 in their names.

unfortunately that's just the nature of the current Separation run. if you look more closely at your errors, you'll see that those same tasks have been erroring out on all of your wingmen's hosts as well. that tells us that either there is a server-side problem, or the errors lie in the data itself and are to be expected...but perhaps the most important takeaway is that there isn't a problem with your host's hardware or software. the same thing is going on w/ one of my MW@H host machines and all of my wingmen's hosts, not to mention countless others participating in the project.
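The reasoning above (if every wingman errored on the same workunit, your host probably isn't the culprit) can be written as a rough heuristic. This is a hypothetical sketch: the Result type and the returned labels are made up for illustration and are not part of any BOINC API.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """Hypothetical record of one host's result for a workunit."""
    host_id: int
    errored: bool


def likely_cause(results: list[Result], my_host: int) -> str:
    """Rough guess at where a failed workunit's problem lies."""
    mine = [r for r in results if r.host_id == my_host]
    others = [r for r in results if r.host_id != my_host]
    if not any(r.errored for r in mine):
        return "no error on my host"
    if others and all(r.errored for r in others):
        return "workunit/server side"  # every wingman failed it too
    if any(not r.errored for r in others):
        return "check my host"  # someone else completed it fine
    return "unknown"  # no wingman results yet


print(likely_cause([Result(1, True), Result(2, True), Result(3, True)], my_host=1))
```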
26) Message boards : News : New Separation Runs Started (Message 57478)
Posted 11 Mar 2013 by Sunny129
Post:
I have started to get a lot of `de_separation` errors

Error tasks page

All completed work units fail with "Too many errors (may have bug)"

most of the errors i'm getting right now are of that variety too...i also have a handful of errors that haven't yet been discarded server-side due to "too many errors (may have bug)." all of those tasks have errored out on at least one wingman's computer and have already been sent to a 3rd wingman, which leads me to believe that they will also soon be discarded server-side for "too many errors (may have bug)."


Got 1 error WU too.
WU 320835109
"Non-finite result
Failed to calculate likelihood"

this is the error i see in the stderr outputs of all my errored tasks as well.

strangely, errors seem to come in spurts on my machine. it'll go days, weeks, and sometimes even months without a single error or invalid task, but when errors do show up, they show up in droves. i suppose that in the grand scheme of things i'm really not seeing that many errors (about 30 out of ~1500 tasks per day, or ~2%), but that's a significant waste of compute power any way you look at it. no matter what changes i make to this machine's hardware and/or software configuration, nothing seems to get rid of the errors, so i always end up going back to the original hardware and software setup and waiting it out...and sure enough, the errors eventually subside and my machine runs error-free for some time. i hardly ever feel that the errors i see are due to host-side problems...not lately anyway.
27) Message boards : Number crunching : 5970 validate errors (20% ~30% WU's) (Message 56741)
Posted 5 Jan 2013 by Sunny129
Post:
That would be my guess. I only use the one so I don't have that many. I would pick the one you want to use and adjust that one.

just so we're clear, i don't have several profiles.xml files - it's just a single profiles.xml file with multiple "MemoryClockTarget" entries. at any rate, i'll make a backup of the file in case something goes wrong while changing the values and i forget how to restore it to its original state.
28) Message boards : Number crunching : 5970 validate errors (20% ~30% WU's) (Message 56738)
Posted 4 Jan 2013 by Sunny129
Post:
it turns out i have a number of such entries in my profile:

<Feature name="MemoryClockTarget_PCI_VEN_1002&amp;DEV_679A&amp;SUBSYS_E207174B&amp;REV_00_4&amp;416692D&amp;0&amp;0010A">
<Property name="Want_0" value="15000" />
<Property name="Want_1" value="75000" />
</Feature>

<Feature name="MemoryClockTarget_PCI_VEN_1002&amp;DEV_68E1&amp;SUBSYS_30001043&amp;REV_00_4&amp;38A11392&amp;0&amp;0018A">
<Property name="Want_0" value="45000" />
<Property name="Want_1" value="45000" />
<Property name="Want_2" value="45000" />
</Feature>

<Feature name="MemoryClockTarget_PCI_VEN_1002&amp;DEV_679A&amp;SUBSYS_E207174B&amp;REV_00_4&amp;38A11392&amp;0&amp;0018A">
<Property name="Want_0" value="15000" />
<Property name="Want_1" value="75000" />
</Feature>

<Feature name="MemoryClockTarget_PCI_VEN_1002&amp;DEV_68E1&amp;SUBSYS_30001043&amp;REV_00_4&amp;416692D&amp;0&amp;0010A">
<Property name="Want_0" value="45000" />
<Property name="Want_1" value="45000" />
<Property name="Want_2" value="45000" />
</Feature>

<Feature name="MemoryClockTarget_PCI_VEN_1002&amp;DEV_6719&amp;SUBSYS_31201682&amp;REV_00_4&amp;19D28DA3&amp;0&amp;0018A">
<Property name="Want_0" value="15000" />
<Property name="Want_1" value="125000" />
</Feature>

<Feature name="MemoryClockTarget_PCI_VEN_1002&amp;DEV_6719&amp;SUBSYS_31201682&amp;REV_00_4&amp;416692D&amp;0&amp;0010A">
<Property name="Want_0" value="15000" />
<Property name="Want_1" value="125000" />
<Property name="Want_2" value="125000" />
</Feature>

<Feature name="MemoryClockTarget_PCI_VEN_1002&amp;DEV_6719&amp;SUBSYS_E182174B&amp;REV_00_4&amp;38A11392&amp;0&amp;0018A">
<Property name="Want_0" value="15000" />
<Property name="Want_1" value="130000" />
<Property name="Want_2" value="130000" />
</Feature>


some of the values in some of the entries look familiar. for instance, in the last entry, 130000 corresponds to the default 1300MHz memory clock of my HD 7950. the 15000, i believe, corresponds to my attempt to set the memory clock as low as 150MHz, even though it failed to lock in at that clock speed. the 75000 in the top entry corresponds to my HD 7950's current memory clock of 750MHz.

...i'm assuming that i just need to change the first/top "MemoryClockTarget" entry, and that the subsequent entries are just residuals from old GPU presets?
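If the pattern above holds (the stored value is the clock in MHz times 100), a short Python sketch can list every MemoryClockTarget entry in MHz. It uses two of the entries quoted above, wrapped in a hypothetical <Profile> root element so the fragment parses on its own; the real file's root element may differ, and the MHz x 100 scaling is inferred from the values above rather than documented anywhere.

```python
import re
import xml.etree.ElementTree as ET

# Two entries from the post, wrapped in a hypothetical <Profile> root.
SNIPPET = """<Profile>
<Feature name="MemoryClockTarget_PCI_VEN_1002&amp;DEV_679A&amp;SUBSYS_E207174B&amp;REV_00_4&amp;416692D&amp;0&amp;0010A">
<Property name="Want_0" value="15000" />
<Property name="Want_1" value="75000" />
</Feature>
<Feature name="MemoryClockTarget_PCI_VEN_1002&amp;DEV_6719&amp;SUBSYS_E182174B&amp;REV_00_4&amp;38A11392&amp;0&amp;0018A">
<Property name="Want_0" value="15000" />
<Property name="Want_1" value="130000" />
<Property name="Want_2" value="130000" />
</Feature>
</Profile>"""


def memory_targets_mhz(xml_text: str) -> list[tuple[str, list[float]]]:
    """List (PCI device ID, Want_* clocks in MHz) for each MemoryClockTarget entry."""
    targets = []
    for feature in ET.fromstring(xml_text).iter("Feature"):
        name = feature.get("name", "")
        if not name.startswith("MemoryClockTarget_"):
            continue
        match = re.search(r"DEV_([0-9A-F]{4})", name)
        dev = match.group(1) if match else name
        # Assumption from the post: stored value = MHz x 100 (130000 -> 1300 MHz).
        clocks = [int(p.get("value")) / 100 for p in feature.iter("Property")]
        targets.append((dev, clocks))
    return targets


for dev, clocks in memory_targets_mhz(SNIPPET):
    print(dev, clocks)
```

A list is used rather than a dict because the same device ID shows up in several entries (one per bus slot it has been seen in).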
29) Message boards : Number crunching : 5970 validate errors (20% ~30% WU's) (Message 56736)
Posted 4 Jan 2013 by Sunny129
Post:
You can underclock the memory in Catalyst if you go in and adjust the values in your profile.

could you elaborate on this? what profile? where can i find it?

TIA,
Eric
30) Message boards : Number crunching : Validator stopped. (Message 56412)
Posted 6 Dec 2012 by Sunny129
Post:
i think some folks are just scared that their completed but not-yet-validated tasks will reach their deadlines before the validator comes online again. that said, i'm not worried about it, and i'll continue to crunch MW@H.
31) Message boards : News : Apology for recent bad batches of workunits (Message 55960)
Posted 26 Oct 2012 by Sunny129
Post:
looking forward to the fix...my error rate is now up to 5%
32) Message boards : News : Apology for recent bad batches of workunits (Message 55926)
Posted 23 Oct 2012 by Sunny129
Post:
my error rate has risen to 4% (up from 3.5%) in the last 24 hours.

i know there's no way to tell if a WU is bad until it has failed for multiple wingmen...but does anyone have an educated guess as to how much longer it'll be until all these "bad" WU's are flushed out of the system?
33) Message boards : News : Apology for recent bad batches of workunits (Message 55917)
Posted 22 Oct 2012 by Sunny129
Post:
my error rate is right around 3.5% right now, up slightly from yesterday and the day before...
34) Message boards : Number crunching : Lots of crunching errors since today (Message 55893)
Posted 20 Oct 2012 by Sunny129
Post:
while i haven't checked all 116 of my errors lol, i did check the first 20, and all the wingmen on every single one of them have errors too.

*EDIT* - also, i'm having trouble calculating my current error rate b/c i'm at that balance point where my number of errors remains constant b/c older errors are being flushed from the server just as fast as new errors show up (it hovers around 115 tasks). i need to know how long errors stay on the server before they're flushed.
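The balance point described above is Little's law: in steady state, the number of errors listed on the server equals the rate at which new ones arrive times how long each one stays listed. A minimal sketch, assuming a steady error rate; the retention times and the 1500 tasks/day figure are hypothetical placeholders, since the actual retention period is exactly what's unknown here.

```python
def errors_per_day(backlog: float, retention_days: float) -> float:
    """Little's law (L = lambda * W) solved for the arrival rate lambda."""
    if retention_days <= 0:
        raise ValueError("retention_days must be positive")
    return backlog / retention_days


def error_rate(backlog: float, retention_days: float, tasks_per_day: float) -> float:
    """Fraction of a day's tasks that error out, given a steady backlog."""
    return errors_per_day(backlog, retention_days) / tasks_per_day


BACKLOG = 115  # errors listed at the balance point (from the post)
TASKS_PER_DAY = 1500  # hypothetical daily task count for this host
for days in (1, 2, 4):  # hypothetical retention periods
    print(days, round(errors_per_day(BACKLOG, days), 1),
          f"{error_rate(BACKLOG, days, TASKS_PER_DAY):.1%}")
```

Once the retention period is known, plugging it in gives the daily error count directly, without having to count errors by hand.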

thanks,
Eric
35) Message boards : Number crunching : Lots of crunching errors since today (Message 55883)
Posted 20 Oct 2012 by Sunny129
Post:
right, i understand that much...i guess i should have more specifically asked "how do you know which v1.4.xxxx corresponds to which v11.x or v12.x"?


Use the cheat sheet.
http://www.hal6000.com/seti/boinc_ati_gpu_cheat_sheet.htm

do i hear angels singing? seriously, you have no idea how long i've been looking for something like this! even when i open the hardware info page in the Catalyst Control Center, it doesn't specify the "v1.4.xxxx" driver versions that we see on our personal DC project web pages. thank you so much!
36) Message boards : Number crunching : Lots of crunching errors since today (Message 55879)
Posted 20 Oct 2012 by Sunny129
Post:
How do I find out if my GPU driver is outdated? It is an almost new computer with a 580 GPU. Milkyway is the only project being affected.

go to nVidia's official website and go to the drivers link. you'll see that the v266.58 you're running is quite old. i was running v301.42 not too long ago on my dual GTX 560 Ti machine, and that was working just fine for Einstein@Home (can't comment on Milkyway@Home)...but i updated to 306.23 when they released it, and i noticed a slight improvement in efficiency. the newest official release is now v306.97, but i have yet to update my drivers again...
37) Message boards : Number crunching : Lots of crunching errors since today (Message 55874)
Posted 20 Oct 2012 by Sunny129
Post:
right, i understand that much...i guess i should have more specifically asked "how do you know which v1.4.xxxx corresponds to which v11.x or v12.x"?
38) Message boards : News : Apology for recent bad batches of workunits (Message 55865)
Posted 19 Oct 2012 by Sunny129
Post:
apparently i'm one of those people (i'm running Catalyst driver v12.4), but i'm also one of those people who had zero problems and zero errors before this NaN/infinity issue cropped up, so i'm reluctant to change what worked so well for so long. if you guys had updated the binaries or something, then i could understand the need to update the driver version or the BOINC platform. but if the NaN/infinity issue gets fixed, then technically nothing else should be required for things to go back to normal (error-free) on my end, right?
39) Message boards : Number crunching : Lots of crunching errors since today (Message 55864)
Posted 19 Oct 2012 by Sunny129
Post:
In the past, a minority of jobs have errored out due to outdated drivers, BOINC application version, or client code.

Milkos M and Sunny129, it looks as though your GPU drivers are not at the latest versions.

while i'm not running the most current Catalyst drivers (i'm running v12.4), i can't imagine that this driver version would be a problem, considering that 1) i've been using v12.4 for months now without any problems, 2) i previously had over 100,000 consecutive valid tasks before this whole fiasco started, and 3) many folks have had more problems w/ recent Catalyst drivers than with slightly older versions (i believe it's been said that some of the most recent Catalyst drivers are missing the OpenCL libraries required to make MW@H work on a GPU).

besides, how do you know exactly what driver version i'm running? all i see on my MW@H web page and in my individual tasks is a v1.4.1720, which is Greek to me.
40) Message boards : Number crunching : Lots of crunching errors since today (Message 55854)
Posted 18 Oct 2012 by Sunny129
Post:
They are crunching thru them, you can stay there and wait it out or jump in and help move thru them. Just don't expect 100% good stuff right now.

yeah, i think i may have spoken too soon, as my error rate is back up again, this time somewhere between 2% and 3%. nevertheless, i'll stick around and help flush the bad tasks out of the system...

Yes, still getting about 5 or more bad ones a day.

i'm up to 34 errors in the last ~24 hours...



©2024 Astroinformatics Group