New Version of Separation Modified Fit (1.32)

bcavnaugh
Joined: 14 Feb 14
Posts: 11
Credit: 43,020,818
RAC: 66,307

Message 62329 - Posted: 12 Sep 2014, 19:38:31 UTC - in response to Message 62328.

Thanks for the update!
We'll be standing by.

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 438
Credit: 9,632,888
RAC: 158,771

Message 62331 - Posted: 12 Sep 2014, 20:47:33 UTC

Looks like the download issue should be resolved and I can start working on some of the other issues again.

Jake W.

bcavnaugh
Joined: 14 Feb 14
Posts: 11
Credit: 43,020,818
RAC: 66,307

Message 62332 - Posted: 12 Sep 2014, 20:51:12 UTC - in response to Message 62331.
Last modified: 12 Sep 2014, 21:48:53 UTC

Yes and no. I got one task of each, but I cannot get any more.
It took a long time to get new tasks after your reset.
Task 827160402 | Work unit 617077081 | Sent 12 Sep 2014, 20:39:36 UTC | Reported 12 Sep 2014, 20:47:01 UTC | Completed, validation inconclusive | Run time 400.07 s | CPU time 52.63 s | Credit pending | Milkyway@Home Separation (Modified Fit) v1.32 (opencl_nvidia)

It also shows 97 tasks in progress, yet no tasks are running on this computer (ID: 561866).
I'm now looking at ID: 581205, my other rig running your project.

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 438
Credit: 9,632,888
RAC: 158,771

Message 62333 - Posted: 13 Sep 2014, 0:19:21 UTC

It will take a while for the server to push out more of my work units because there were so many errors. The server tries to distribute work units evenly among the different runs, and since so many of mine were sent out earlier, it needs to let the others catch up. Things should be back to normal soon.
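The even-distribution behaviour described above can be pictured with a small model. This is a hypothetical sketch, not the MilkyWay@Home scheduler: it assumes the server always hands the next work unit to whichever run has had the fewest sent so far, and the run names and counts are made up for illustration.

```python
import heapq


def dispatch(sent_counts: dict[str, int], n: int) -> list[str]:
    """Hypothetical model: hand out n work units, each from the run with the fewest sent so far."""
    # Min-heap of (units already sent, run name); ties go to the alphabetically first run.
    heap = [(count, run) for run, count in sent_counts.items()]
    heapq.heapify(heap)
    order = []
    for _ in range(n):
        count, run = heapq.heappop(heap)
        order.append(run)
        heapq.heappush(heap, (count + 1, run))
    return order


# A run that already had many units sent out waits while the others catch up.
print(dispatch({"modfit_run": 5, "other_a": 0, "other_b": 1}, 6))
# ['other_a', 'other_a', 'other_b', 'other_a', 'other_b', 'other_a']
```

In this model the busy run only gets new work again once every other run has caught up to its count.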

Jake W

bcavnaugh
Joined: 14 Feb 14
Posts: 11
Credit: 43,020,818
RAC: 66,307

Message 62337 - Posted: 13 Sep 2014, 15:49:15 UTC

Computer ID 561866 is still not getting any tasks.
Task 827160586 | Work unit 617077177 | Sent 12 Sep 2014, 20:39:36 UTC | Reported 12 Sep 2014, 20:47:01 UTC | Completed and validated | Run time 200.36 s | CPU time 26.07 s | Credit 106.88 | MilkyWay@Home v1.02 (opencl_nvidia)

It still shows 97 tasks in progress, for example:
Task 827054583 | Work unit 617018049 | Sent 12 Sep 2014, 16:44:48 UTC | Deadline 24 Sep 2014, 16:44:48 UTC | In progress | Run time --- | CPU time --- | Credit --- | MilkyWay@Home v1.02 (opencl_nvidia)

I have reset the project a few times now.
Thanks,
Bill

Michael Bennett
Joined: 10 Mar 09
Posts: 13
Credit: 4,300,514
RAC: 5,136

Message 62342 - Posted: 14 Sep 2014, 17:24:11 UTC - in response to Message 62283.

I'm getting error messages; I checked my stats and none of my tasks are processing. I will post the error message the next time it appears.
Mike

Larr2000
Joined: 12 Jul 14
Posts: 1
Credit: 409,907
RAC: 0

Message 62344 - Posted: 15 Sep 2014, 20:00:32 UTC - in response to Message 62283.

Hi Jake,
I have been supporting MilkyWay for a long time and decided to get the T-shirt with my donation. I was disappointed that the shirt did not have a photo of our beautiful Milky Way!

Second, I just got 750 hours of work to process before 23 Sept! I am on an iMac and do not have that kind of horsepower. Is it OK to abort those that I cannot finish by the deadline?
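A quick check of the numbers in that question: spreading 750 hours of estimated work over the time left before the deadline gives the number of processors that would have to run nonstop. This sketch takes the 750-hour figure at face value (the client's estimate may itself be wrong) and assumes the deadline falls at 00:00 UTC on 23 Sept 2014, since the post gives only the date.

```python
import math
from datetime import datetime, timezone


def cores_needed(work_hours: float, now: datetime, deadline: datetime) -> float:
    """How many processors must run 24/7 to finish work_hours of crunching before the deadline."""
    hours_left = (deadline - now).total_seconds() / 3600
    return work_hours / hours_left


# Post time from the thread; the deadline's time of day (00:00 UTC) is an assumption.
posted = datetime(2014, 9, 15, 20, 0, 32, tzinfo=timezone.utc)
deadline = datetime(2014, 9, 23, 0, 0, 0, tzinfo=timezone.utc)

needed = cores_needed(750, posted, deadline)
print(f"{needed:.2f} processors running nonstop -> at least {math.ceil(needed)}")
# 4.36 processors running nonstop -> at least 5
```

Under these assumptions, a machine with four or fewer cores could not finish the batch even if it crunched nothing else.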

Larr2000@Gmail.com

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 438
Credit: 9,632,888
RAC: 158,771

Message 62345 - Posted: 15 Sep 2014, 20:06:28 UTC

Hey Larr2000,

Which application is giving you a 750-hour work unit? If it is Separation, you can abort it. If it is nbody, the estimate is probably just wrong, because the nbody application is notorious for giving incorrect estimated run times. Sorry the shirt didn't have a photo of the Milky Way on it. For future reference, the design is posted on the fundraiser page.
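Why a work unit can show an absurd estimated run time: in simplified form, the BOINC client divides the work unit's estimated floating-point operation count (`rsc_fpops_est`, set by the project) by the speed it assumes for the application on that host, so if the operation count is off, the estimate is off by the same factor. This is a simplified sketch only; the real client also applies correction factors and its own speed measurements, and the numbers below are hypothetical.

```python
def estimated_runtime_hours(rsc_fpops_est: float, flops_per_sec: float,
                            correction: float = 1.0) -> float:
    """Simplified BOINC-style estimate: operation count / assumed speed, times a correction factor."""
    return rsc_fpops_est / flops_per_sec * correction / 3600


# Hypothetical numbers: a job needing 1.8e13 operations on a host assumed to do 1e9 per second.
accurate = estimated_runtime_hours(1.8e13, 1e9)      # 5.0 hours
overstated = estimated_runtime_hours(1.8e15, 1e9)    # 500.0 hours if the work is overstated 100x
print(accurate, overstated)
```

The estimate scales linearly with the operation count, so an application whose count is poorly predicted can report hundreds of hours for a job that finishes in a few.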

Jake W.

swiftmallard
Joined: 18 Jul 09
Posts: 289
Credit: 302,980,648
RAC: 0

Message 62346 - Posted: 15 Sep 2014, 20:13:42 UTC - in response to Message 62344.

> Is it OK to abort those that I cannot finish by the deadline?

Aborted WUs simply get sent out to other crunchers sooner than if they go past the deadline. If you cannot crunch them, the sooner you abort them, the sooner they will get completed.
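The timing argument can be made concrete. A minimal sketch, assuming the abort is reported at the client's next scheduler contact (treated here as immediate) and that otherwise the server waits for the deadline to pass before sending a replacement copy; the deadline date comes from the thread, while its time of day and the abort time are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def replacement_sent_at(deadline: datetime, aborted_at: Optional[datetime] = None) -> datetime:
    """Earliest time a replacement copy can go out (simplified model)."""
    # Assumption: an abort is reported right away; otherwise the server waits for the deadline.
    return aborted_at if aborted_at is not None else deadline


deadline = datetime(2014, 9, 23, tzinfo=timezone.utc)        # assumed 00:00 UTC deadline
aborted = datetime(2014, 9, 15, 21, 0, tzinfo=timezone.utc)  # hypothetical abort time

saved = replacement_sent_at(deadline) - replacement_sent_at(deadline, aborted)
print(saved)  # 7 days, 3:00:00
```

Every hour between the abort and the deadline is an hour the replacement copy can spend being crunched elsewhere.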

DutchDK
Joined: 13 Nov 10
Posts: 5
Credit: 18,929,782
RAC: 0

Message 62347 - Posted: 15 Sep 2014, 23:55:46 UTC

129 tasks are invalid due to workunit errors, validation errors, etc., and another 23 ended in an error:

http://milkyway.cs.rpi.edu/milkyway/results.php?userid=135022&offset=0&show_names=0&state=5&appid=

Something definitely is amiss with the new version.

alanb1951
Joined: 16 Mar 10
Posts: 23
Credit: 25,167,874
RAC: 10,837

Message 62348 - Posted: 16 Sep 2014, 2:32:39 UTC - in response to Message 62346.

> > Is it OK to abort those that I cannot finish by the deadline?
>
> Aborted WUs simply get sent out to other crunchers sooner than if they go past the deadline. If you cannot crunch them, the sooner you abort them, the sooner they will get completed.

This may well be true, but each aborted work unit appears to count as an "error" as far as validation is concerned. As a result, a task can end up as "Completed, can't validate" because of (typically) two aborts and one genuine crash. To the best of my knowledge, there's no way to stop BOINC from treating user aborts the same as errors...

This is no big deal if a GPU job gets wasted, but it's very frustrating to see several hours' worth of CPU time black-holed because a couple of users aborted MilkyWay jobs and someone else's [Nvidia, usually :-)] GPU job crashed.

However, it's been a fairly rare occurrence to date (for me, that is), so I'm merely pointing out a possible downside...
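The failure mode described above can be modelled in a few lines. BOINC work units carry a limit on error results (`max_error_results` in the server's workunit record); this sketch assumes aborts count as errors, as the post says, and that the work unit is given up once errors reach a limit of 3. Both the limit and the exact comparison are assumptions for illustration, not MilkyWay@Home's actual settings.

```python
from dataclasses import dataclass, field

ERROR_OUTCOMES = {"aborted", "crashed"}  # assumption: user aborts count exactly like crashes


@dataclass
class WorkUnit:
    max_error_results: int                      # hypothetical limit for this sketch
    outcomes: list[str] = field(default_factory=list)

    def report(self, outcome: str) -> None:
        self.outcomes.append(outcome)

    @property
    def errors(self) -> int:
        return sum(1 for o in self.outcomes if o in ERROR_OUTCOMES)

    @property
    def gave_up(self) -> bool:
        # Assumption: the server stops issuing copies once errors reach the limit.
        return self.errors >= self.max_error_results


wu = WorkUnit(max_error_results=3)
for outcome in ["success", "aborted", "aborted", "crashed"]:
    wu.report(outcome)

# The one good CPU result now has no partner to validate against.
print(wu.errors, wu.gave_up)  # 3 True
```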

mikey
Joined: 8 May 09
Posts: 2032
Credit: 180,209,082
RAC: 275,041

Message 62349 - Posted: 16 Sep 2014, 10:42:09 UTC - in response to Message 62348.

Then there is a problem somewhere in the server-side software, because aborting a unit is supposed to do nothing more than put it back in the cache of available work units. If you abort too many work units, it CAN decrease the number of work units you can get per day, but as soon as you start returning valid units again, your quota will climb back to normal exponentially. The key is to abort what you have to and crunch what you can.
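The "exponential" recovery mentioned above can be illustrated with a simplified daily-quota rule. This sketch assumes each error lowers a host's daily quota by one (never below 1) and each valid result doubles it up to a cap; the rule is a model of the behaviour described in the post, not BOINC's scheduler code, and the cap of 100 tasks per day is hypothetical.

```python
def update_quota(quota: int, outcome: str, cap: int) -> int:
    """Simplified daily-quota rule (assumption): -1 per error, x2 per valid result, kept within [1, cap]."""
    if outcome == "error":
        return max(1, quota - 1)
    if outcome == "valid":
        return min(cap, quota * 2)
    return quota


CAP = 100  # hypothetical per-host daily limit

quota = CAP
for _ in range(150):          # a long run of aborts/errors drives the quota to the floor
    quota = update_quota(quota, "error", CAP)
print(quota)                   # 1

valid_needed = 0
while quota < CAP:             # count valid results needed to get back to the cap
    quota = update_quota(quota, "valid", CAP)
    valid_needed += 1
print(valid_needed)            # 7
```

Under this model, a long streak of errors is undone by seven valid results (1, 2, 4, 8, 16, 32, 64, then capped at 100), which is why crunching what you can matters.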

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 438
Credit: 9,632,888
RAC: 158,771

Message 62351 - Posted: 16 Sep 2014, 13:01:15 UTC

I talked about the aborted work unit problem about a year ago. It is a problem, but I don't work on the server side of the project much, so I can't fix it personally. I will ask Travis about it again, since he is in charge of most server-side things.

Jake W.

SLRE
Joined: 26 Jan 09
Posts: 12
Credit: 21,904,645
RAC: 22,449

Message 62353 - Posted: 16 Sep 2014, 22:09:51 UTC
Last modified: 16 Sep 2014, 22:11:26 UTC

For info: The following (representative) jobs all errored out over the last couple of days.

ps_modfit_15_3s_130_wrap_const_1_1405680903_7765724_3
de_modfit_15_3s_132_wrap_8_1410552780_140958_1
de_modfit_15_3s_132_wrap_8_1410552780_141748_0
de_modfit_15_3s_132_wrap_8_1410552780_131333_0
de_modfit_15_3s_132_wrap_8_1410552780_133902_0
de_modfit_15_3s_132_wrap_7_1410552780_130778_1
ps_modfit_15_3s_132_wrap_3_1410552780_136533_0
de_modfit_15_3s_132_wrap_8_1410552780_131960_2

This isn't occasional; it's endemic across all modfit jobs on my Windows machine.
The machine runs Win7 with an Nvidia GT 640. I'm currently running MW jobs doubled up, so that may be contributing...

On the Linux box (Mint, 32-bit, GeForce GTX 660), most 1.30 modfit and some 1.32 teststars jobs are erroring out, as other folks have reported.

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 438
Credit: 9,632,888
RAC: 158,771

Message 62355 - Posted: 17 Sep 2014, 13:27:54 UTC

SLRE,

Are you using the most recent NVidia drivers for your cards?

Jake W.

Richard Haselgrove
Joined: 4 Sep 12
Posts: 218
Credit: 448,778
RAC: 0

Message 62356 - Posted: 17 Sep 2014, 15:49:12 UTC - in response to Message 62355.

There are hints at SETI@Home of a possible OpenCL problem with NVidia driver 340.52. The problems observed so far relate to Compute Capability 1.x cards only, but that could be the tip of the iceberg.

NVidia have reproduced the observed problem and are investigating. https://developer.nvidia.com/nvbugs/cuda/edit/1554016 (accessible to registered developers only)

DutchDK
Joined: 13 Nov 10
Posts: 5
Credit: 18,929,782
RAC: 0

Message 62358 - Posted: 17 Sep 2014, 18:12:10 UTC

I'm still seeing errors, plus "can't validate" and "validate error" results, in my task list.

Can someone who knows the new version and its code check up on it?

Mumak
Joined: 8 Apr 13
Posts: 89
Credit: 515,960,170
RAC: 4,002

Message 62359 - Posted: 18 Sep 2014, 6:27:38 UTC

I'm also getting lots of ps_modfit errors recently on an AMD HD 7950.
I had to opt out of the Modfit tasks.

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 438
Credit: 9,632,888
RAC: 158,771

Message 62361 - Posted: 19 Sep 2014, 16:31:13 UTC

DutchDK,

I am checking up on it. I coded all of the new changes, and there don't seem to be any issues with the code itself. Most likely it has to do with library issues, since we switched over to a new, more automated build system.

Jake W.

Michael Bennett
Joined: 10 Mar 09
Posts: 13
Credit: 4,300,514
RAC: 5,136

Message 62362 - Posted: 20 Sep 2014, 16:50:52 UTC - in response to Message 62353.

FYI
All my Separation (Modified Fit...) work units stop after 6-10 seconds with the message "Computation error." Only the 1.02 (opencl_nvidia) tasks get processed.
Mike

Copyright © 2017 AstroInformatics Group