Welcome to MilkyWay@home

Scheduled Maintenance Concluded

bluestang

Joined: 13 Oct 16
Posts: 112
Credit: 1,174,293,644
RAC: 0
Message 65797 - Posted: 14 Nov 2016, 19:39:38 UTC - in response to Message 65786.  

Hey Everyone,

Just released the GPU version. It is a 32-bit application that works on 64-bit machines. Let me know if there are any issues. (The new tasks should take about 5x longer than normal work units, since you are crunching 5.)

Jake


Working well now. Great job! Seems to be way more efficient as well!!!
Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist

Joined: 25 Feb 13
Posts: 580
Credit: 94,200,158
RAC: 0
Message 65798 - Posted: 14 Nov 2016, 19:46:48 UTC

Peter,

I am hopeful that 5x fewer WU requests will improve server stability; that was the motivation for this upgrade. I also plan to trim the WUs table either late this week or sometime next week during another scheduled maintenance period, which should also improve WU generation further.

Jake
Rymorea

Joined: 6 Oct 14
Posts: 46
Credit: 20,017,425
RAC: 0
Message 65799 - Posted: 14 Nov 2016, 19:54:03 UTC

Hi Jake,

I see the app's CPU priority is now "normal". Will it be changed to "below normal" in the next build? When I open a video, it causes CPU spikes.
Arivald Ha'gel

Joined: 30 Apr 14
Posts: 67
Credit: 160,674,488
RAC: 0
Message 65800 - Posted: 14 Nov 2016, 19:55:54 UTC

The application works OK right now,
except for the progress reverting to 0% (but that's not an immediate issue).

So I believe it would be appropriate to increase the minimum time to contact the MilkyWay@Home server to 5 minutes, or at least to much more than the current 30s (or so).
wb8ili

Joined: 18 Jul 10
Posts: 76
Credit: 636,436,076
RAC: 25,309
Message 65801 - Posted: 14 Nov 2016, 20:02:03 UTC

New V1.43 working on Windows XP. Tasks being validated.

One cosmetic issue with progress counter. It goes:

0%->20%->0%->40%->0%->60%->0%->80%->0%->100%

Seems like it should go 0%->100% five times OR 0%->100% once.
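
A minimal sketch of the second option (one 0%->100% sweep per bundle), assuming a bundle of 5 sub-tasks where each sub-task reports its own fraction done. The function name and parameters are hypothetical and are not taken from the MilkyWay@home application:

```python
def bundle_progress(completed: int, current_fraction: float, bundle_size: int = 5) -> float:
    """Overall fraction done for a bundled task (hypothetical helper).

    completed        -- number of sub-tasks already finished
    current_fraction -- fraction done (0.0 to 1.0) of the sub-task now running
    """
    if not 0 <= completed <= bundle_size:
        raise ValueError("completed must be between 0 and bundle_size")
    if completed == bundle_size:
        return 1.0
    # Clamp the sub-task fraction so a bad report cannot push it out of range.
    current_fraction = min(max(current_fraction, 0.0), 1.0)
    return (completed + current_fraction) / bundle_size


if __name__ == "__main__":
    for done, frac in [(0, 0.0), (0, 1.0), (1, 0.0), (2, 0.5), (5, 0.0)]:
        print(f"{done} done, {frac:.0%} of current -> {bundle_progress(done, frac):.0%}")
```

With this kind of mapping the counter never drops back to 0% between sub-tasks: finishing the first sub-task and starting the second both report 20%.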


Jake - Buried in all of these posts is the issue of LINUX GPU bundled tasks failing validation. Is that on your plate?
[VENETO] boboviz

Joined: 10 Feb 09
Posts: 52
Credit: 16,291,993
RAC: 10
Message 65802 - Posted: 14 Nov 2016, 20:08:37 UTC

On my Windows 10 (and RX 260X):
CPU usage 0.46
With dedicated CPU core, 266 seconds
Without dedicated CPU core, 330 seconds
ALL Validation inconclusive... :-(
Arivald Ha'gel

Joined: 30 Apr 14
Posts: 67
Credit: 160,674,488
RAC: 0
Message 65803 - Posted: 14 Nov 2016, 20:18:26 UTC

I just love that it takes less than 5 times as long as a single, non-bundled WU, but credits are x5.
There are slight problems, but I think we will get past them.
My WUs are already validated 50/50 :)
Werkstatt

Joined: 19 Feb 08
Posts: 350
Credit: 141,284,369
RAC: 0
Message 65804 - Posted: 14 Nov 2016, 20:29:44 UTC

Great work, Jake,
works fine on three PCs now.
Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist

Joined: 25 Feb 13
Posts: 580
Credit: 94,200,158
RAC: 0
Message 65805 - Posted: 14 Nov 2016, 20:31:37 UTC

Hey Everyone,

I did see the Linux GPU ones were getting a few invalid results, but I thought that was just due to the significant number of errors on the release day. Is that still an issue now that the error stats are reduced? (Can anyone confirm there are invalid results on Linux from WUs assigned today?)

I will work on getting a fix for the cosmetic issues for the next scheduled server maintenance in a week or so. I also need to get a Mac application working and released. I'll make a new news thread with a date to expect the next updates.

Thank you all for your help with debugging and words of encouragement. The MilkyWay@home community is the best.

Jake
Rich

Joined: 14 Nov 14
Posts: 9
Credit: 214,644,261
RAC: 0
Message 65806 - Posted: 14 Nov 2016, 20:34:16 UTC

Hello Jake,

So far my 750Ti's are doing okay, thanks for all your hard work. Just got home so I'll fire up my 7970 GPUs and see how they do.

Rich
wb8ili

Joined: 18 Jul 10
Posts: 76
Credit: 636,436,076
RAC: 25,309
Message 65807 - Posted: 14 Nov 2016, 20:35:32 UTC - in response to Message 65805.  

Jake -

I just ran another task on a LINUX machine to make sure I wasn't writing about an old issue. Received a Validate error.

Here is the link:
http://milkyway.cs.rpi.edu/milkyway/result.php?resultid=1887487375
Peciak

Joined: 27 Jun 09
Posts: 12
Credit: 148,038,330
RAC: 0
Message 65808 - Posted: 14 Nov 2016, 20:47:20 UTC - in response to Message 65798.  
Last modified: 14 Nov 2016, 20:48:13 UTC


I am hopeful that 5x fewer WU requests will improve server stability; that was the motivation for this upgrade. I also plan to trim the WUs table either late this week or sometime next week during another scheduled maintenance period, which should also improve WU generation further.

Jake

Tasks are too short.
WUs validate correctly.
4xWU -> 130 sec
http://milkyway.cs.rpi.edu/milkyway/results.php?hostid=618442&offset=0&show_names=0&state=4&appid=
Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist

Joined: 25 Feb 13
Posts: 580
Credit: 94,200,158
RAC: 0
Message 65809 - Posted: 14 Nov 2016, 20:58:48 UTC

Hi wb8ili,

Thank you so much for giving me a result to look at. Looks like these should be listed as a computation error.

I will work on rereleasing that application later tonight once I get a fix. Thanks for the report.

Jake
wb8ili

Joined: 18 Jul 10
Posts: 76
Credit: 636,436,076
RAC: 25,309
Message 65810 - Posted: 14 Nov 2016, 21:12:38 UTC - in response to Message 65809.  

Hi Jake -

I hope that means you aren't just going to make it a "computational error" but actually fix whatever is causing "the problem".

By the way, do you now realize that making a major program update on a Friday afternoon is never a good idea? As a retired engineer involved in computer systems, when I saw that, I was thinking this is not going to be a good weekend for Jake.

On the other hand, if you get paid overtime and need the money, Friday afternoon is a good time!
Arivald Ha'gel

Joined: 30 Apr 14
Posts: 67
Credit: 160,674,488
RAC: 0
Message 65811 - Posted: 14 Nov 2016, 23:03:49 UTC
Last modified: 14 Nov 2016, 23:06:10 UTC

Jake,

Would it be possible to create a subproject for bundles of 25/50/100?

Right now we have subprojects:
MilkyWay@Home
MilkyWay@Home N-Body Simulation

And MilkyWay@Home is clearly CPU & GPU.
Wouldn't it be better to multiply it a little:
MilkyWay@Home CPU (single WU)
MilkyWay@Home GPU (Bundle of 5/(20?)WU - for lower end GPUs)
MilkyWay@Home GPU (Bundle of 50/(100?)WU - for high end GPUs)

We might, however, have a problem with more Hosts spamming computational errors and "unable to validate" results. This was seen some time ago, when some Hosts were "rejecting" several thousand WUs per hour.

A bundle of 5 still takes only 2 minutes (when computing 4 at the same time, so essentially 30s for 5 old WUs). Thus I think that a subproject that bundles more into a WU would still be a valid idea.
Especially since a bundle of 5 takes 120-130s on my PC, while a single WU took ~26-30s (when computing 4 at a time), i.e. 130-150s for 5 of them. There is some improvement, so bundles of 20/50/100 would potentially increase our throughput even further.

Also, as I have mentioned, increasing "min time to contact" from 30s to 5min would also decrease load on the server.
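
As a rough check on the timings quoted above (a sketch, not a benchmark; the function name is made up for illustration), the gain from bundling can be expressed as the time for 5 separate WUs divided by the time for one bundle of 5:

```python
def bundle_speedup(single_wu_s: float, bundle_s: float, bundle_size: int = 5) -> float:
    """Time for `bundle_size` separate WUs divided by the time for one bundle.

    A value above 1.0 means the bundle is faster than the same work unbundled.
    """
    return (single_wu_s * bundle_size) / bundle_s


# Ranges from the post: single WU ~26-30s, bundle of 5 takes 120-130s
# (both measured with 4 tasks running at the same time).
pessimistic = bundle_speedup(single_wu_s=26, bundle_s=130)  # 1.0
optimistic = bundle_speedup(single_wu_s=30, bundle_s=120)   # 1.25

print(f"speedup between {pessimistic:.2f}x and {optimistic:.2f}x")
```

So on these numbers the bundle is somewhere between break-even and about 25% faster, which is consistent with "some improvement" but says nothing yet about how larger bundles would behave.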
Jesse Viviano

Joined: 4 Feb 11
Posts: 86
Credit: 60,913,150
RAC: 0
Message 65812 - Posted: 14 Nov 2016, 23:12:01 UTC - in response to Message 65785.  

Actually, 32-bit applications waste plenty of time in the memory system that 64-bit applications can often avoid, because AMD doubled the number of registers in the register files from 8 to 16 while creating AMD64. Some programs can fit all of their speed-critical data into the registers, as AQUA@home's 64-bit application did when the project was active, while its 32-bit version kept having to shuffle data into and out of memory, making it significantly slower. That said, the speedup a GPU provides outweighs the problems that 32-bit x86 brings, like memory system overhead and the conversion of 32-bit OpenCL calls to 64-bit OpenCL calls. Still, every little bit of speed helps.
w1hue

Joined: 13 Feb 09
Posts: 51
Credit: 72,690,386
RAC: 1,958
Message 65813 - Posted: 14 Nov 2016, 23:17:36 UTC
Last modified: 14 Nov 2016, 23:24:53 UTC

OK, for what it's worth, I downloaded 1.43 Nvidia apps for WinXP and Win10 a while ago. Results so far:

WinXP
Validated - 2
Validation Inconclusive - 4

Win10
Validated - 0
Validation Inconclusive - 2

Several still awaiting validation for both systems.
Run times appear reasonable, but CPU times for the XP system are running around 25%. Looks better for the Win10 64-bit system, but not enough have run to get good numbers.
Mr P Hucker
Joined: 5 Jul 11
Posts: 990
Credit: 376,143,149
RAC: 0
Message 65814 - Posted: 14 Nov 2016, 23:22:06 UTC - in response to Message 65811.  

Jake,

Would it be possible to create a subproject for bundles of 25/50/100?

Right now we have subprojects:
MilkyWay@Home
MilkyWay@Home N-Body Simulation

And MilkyWay@Home is clearly CPU & GPU.
Wouldn't it be better to multiply it a little:
MilkyWay@Home CPU (single WU)
MilkyWay@Home GPU (Bundle of 5/(20?)WU - for lower end GPUs)
MilkyWay@Home GPU (Bundle of 50/(100?)WU - for high end GPUs)

We might, however, have a problem with more Hosts spamming computational errors and "unable to validate" results. This was seen some time ago, when some Hosts were "rejecting" several thousand WUs per hour.

A bundle of 5 still takes only 2 minutes (when computing 4 at the same time, so essentially 30s for 5 old WUs). Thus I think that a subproject that bundles more into a WU would still be a valid idea.
Especially since a bundle of 5 takes 120-130s on my PC, while a single WU took ~26-30s (when computing 4 at a time), i.e. 130-150s for 5 of them. There is some improvement, so bundles of 20/50/100 would potentially increase our throughput even further.

Also, as I have mentioned, increasing "min time to contact" from 30s to 5min would also decrease load on the server.


Other projects (e.g. SETI, Einstein) send much larger work units to GPUs; each WU lasts 15 minutes to an hour on my Radeon R9 290. Is there a reason MW ones are much smaller and have to be bundled?
Arivald Ha'gel

Joined: 30 Apr 14
Posts: 67
Credit: 160,674,488
RAC: 0
Message 65819 - Posted: 15 Nov 2016, 8:38:31 UTC - in response to Message 65813.  
Last modified: 15 Nov 2016, 8:57:27 UTC

OK, for what it's worth, I downloaded 1.43 Nvidia apps for WinXP and Win10 a while ago. Results so far:

WinXP
Validated - 2
Validation Inconclusive - 4

Win10
Validated - 0
Validation Inconclusive - 2

Several still awaiting validation for both systems.
Run times appear reasonable, but CPU times for the XP system are running around 25%. Looks better for the Win10 64-bit system, but not enough have run to get good numbers.


For 12h of work I got 3 "unable to validate", about 1200 "validated", and a little above 200 "validation inconclusive".

As for CPU: since the fast Modification Fit, the app has needed
1 core for the first few seconds (3-4s), then only the GPU, and then 1 CPU core again at the end for 5-6s. This essentially creates a situation where a WU needs about as much CPU time as GPU time (on my system at least) :) That's why I run a few at the same time, so the GPU never rests. I run 4, and at first I start only 2. After a few seconds, I start the next 2. That way their CPU/GPU cycles are not identical, so the GPU stays saturated constantly.

Other projects (e.g. SETI, Einstein) send much larger work units to GPUs; each WU lasts 15 minutes to an hour on my Radeon R9 290. Is there a reason MW ones are much smaller and have to be bundled?


It's just a methodology. Bigger isn't necessarily better. I thought that one of those bundled WUs, but right now I can't find that info.
For example, ClimatePrediction@Home has WUs that take 10 DAYS or even more. That doesn't mean they're great; they do have checkpointing and they upload their data periodically, but it's still quite a lot of time.
The same goes for some subprojects in PrimeGrid. There, however, a single error wastes 2-3 days' worth of GPU processing, since it's not possible to upload a partial result. Thus WU size needs to stay within reason :)

We might, however, have a problem with more Hosts spamming computational errors and "unable to validate" results. This was seen some time ago, when some Hosts were "rejecting" several thousand WUs per hour.


As for the problem with bigger WUs that I pointed out: there is a solution, which is to send bundles only to "proven" hosts, where "proven" means more than 1000 "Consecutive valid tasks". This is already tracked in "Hosts" -> "Details" -> "Application details". Of course, 1000 can be changed to any reasonable amount.
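
The idea could be sketched roughly like this. Everything here is hypothetical (the `Host` class, the function name, and the bundle sizes of 5 and 50 are illustrations based on the numbers in this thread, not part of the BOINC scheduler or the MilkyWay@home server):

```python
from dataclasses import dataclass

# "Proven" threshold from the post; any reasonable amount could be used instead.
PROVEN_THRESHOLD = 1000


@dataclass
class Host:
    host_id: int
    consecutive_valid: int  # the "Consecutive valid tasks" counter for the app


def bundle_size_for(host: Host, small: int = 5, large: int = 50,
                    threshold: int = PROVEN_THRESHOLD) -> int:
    """Pick a bundle size: large bundles only for hosts with MORE than
    `threshold` consecutive valid tasks, small bundles for everyone else."""
    return large if host.consecutive_valid > threshold else small


if __name__ == "__main__":
    for h in (Host(1, 12), Host(2, 1000), Host(3, 1001)):
        print(h.host_id, bundle_size_for(h))
```

A host that starts returning errors would see its consecutive-valid counter reset, so under this scheme it would drop back to small bundles until it has proven itself again.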

This would also increase my relative queue size. Previously I had 80 tasks, each taking 30/4 = 7.5s (since I process 4 tasks at the same time). This totaled 600s, or 10 minutes.
Right now my queue is 80 tasks, each taking 2min01sec, again with 4 running at the same time (121/4 = 30.25s each). Thus my queue gives me 2420s = 40min20s. That's a lot better. However, in the event of Server problems, that only gives me 40 minutes of work. Bigger bundles would allow us to be prepared for any Server connection problems, whether local or remote.
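
The queue arithmetic above can be reproduced with a few lines. This is only a sketch of the calculation in this post; the function name is made up, and it assumes tasks run 4 at a time with equal runtimes:

```python
def queue_seconds(tasks: int, task_runtime_s: float, concurrent: int = 4) -> float:
    """Wall-clock time a queue lasts when `concurrent` tasks run at once."""
    return tasks * task_runtime_s / concurrent


old_queue = queue_seconds(tasks=80, task_runtime_s=30)    # 600.0 s
new_queue = queue_seconds(tasks=80, task_runtime_s=121)   # 2420.0 s

for label, seconds in (("old", old_queue), ("new", new_queue)):
    minutes, rest = divmod(seconds, 60)
    print(f"{label} queue: {seconds:.0f}s = {minutes:.0f}min {rest:.0f}s")
```

The same function shows what a bigger bundle would buy: with the task count fixed at 80, the buffer grows in proportion to the runtime of each task.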
Dirk Sadowski

Joined: 30 Apr 09
Posts: 101
Credit: 29,874,293
RAC: 0
Message 65820 - Posted: 15 Nov 2016, 8:55:44 UTC - in response to Message 65780.  

Jake Weiss wrote:
Hey Everyone,

I have been able to get a working 32-bit GPU application, but for some reason, the 64-bit one has been giving me an impossible amount of trouble. For the time being I am planning to release the 32-bit application to run in both the 32- and 64-bit plan classes so everyone can get crunching again and we can get some science done while I continue to work on the 64-bit application. This application will run on 64-bit platforms just fine, and since we do not use a large amount of memory, there really isn't even a need for the application to be 64-bit.

Is there any objection to this plan? I'll give it an hour or so before I release to listen for opinions.

Thank you all for being great.

Jake


Thank you. :-)

I would love to run an x64 app on my x64 OS/hardware... :-)


But I don't know if the last x64 app really was an x64 app..., because:

MilkyWay@Home v1.39 (opencl_ati_101)
<stderr_txt>
<search_application> milkyway_separation 1.39 Windows x86 double OpenCL </search_application>

MilkyWay@Home v1.43 (opencl_ati_101)
<stderr_txt>
<search_application> milkyway_separation 1.43 Windows x86 double OpenCL </search_application>



With the 1.43 app (Win8.1 x64):

AMD R9 Fury X:
MSI Afterburner:
3 WUs/GPU: Memory Usage: 143 MB

NV GT 730:
GPU-Z:
2 WUs/GPU:
Memory Usage (Dedicated): 110 MB
Memory Usage (Dynamic): 73 MB



In the past the project delay was 60 seconds.
Then it was changed, and now it's:
Project requested delay of 91 seconds
[sched_op] Deferring communication for 00:01:31


If these settings need to be changed, please think of the very fast PCs out there (like mine with 4x R9 Fury X VGA cards ;-)), so that they can stay fed/saturated 24/7... :-)

©2024 Astroinformatics Group