Welcome to MilkyWay@home

All ps_separation tasks are getting computation errors

Message boards : Number crunching : All ps_separation tasks are getting computation errors


Michael Purcell

Joined: 19 Mar 11
Posts: 9
Credit: 29,873,951
RAC: 0
Message 60716 - Posted: 11 Jan 2014, 15:45:19 UTC

I have been running MW just fine for years on Windows 7 x64 (AMD HD 5870, latest drivers). With the latest BOINC update (7.2.33), I also installed VirtualBox. Things were good. I started running Test4Theory, which uses VB. Test4Theory and the ps_separation tasks immediately started getting computation errors (modfit tasks continued to run successfully). I learned that the current VB has problems, so I removed it and installed v4.2.16, which was supposed to be good. The errors continued. I set Test4Theory to No New Tasks and uninstalled VB. I also went through my directories deleting residual .VirtualBox references. ps_separation work units are still failing with computation errors. The modfit tasks are fine.

(I also run Cosmology@Home. It was not affected.)

Any suggestions?

Thanks.

captainjack

Joined: 22 Jun 13
Posts: 44
Credit: 64,258,609
RAC: 0
Message 60722 - Posted: 11 Jan 2014, 19:49:42 UTC

Michael,

I'm running Windows 7 x64, VirtualBox 4.3.6, and an AMD HD 6950 GPU with Catalyst 13.1 drivers. First, I'm curious why Test4Theory didn't work; it has been running fine for me. How did you install BOINC? I've read that installing BOINC as a service can sometimes cause problems.

How did you install VirtualBox? Did you use the BOINC/VirtualBox installer, or did you get VirtualBox from the VirtualBox download site?

I've also read that BOINC 7.2.33 is problematic. IIRC, someone on another project's forum reported problems adding a new project under 7.2.33. Some posts on other boards recommend installing a different version of BOINC: either the newer 7.2.37 or an older version that is known to be good.

Earlier, I was running BOINC 7.2.33 with Test4Theory, WCG, and Milkyway Separation (Modified Fit) jobs, and it was all working fine. I just upgraded to BOINC 7.2.37 and added Milkyway@home (ps_separation) jobs to the mix, and it seems to be working okay.

Let us know if you need more help. And if you want to get Test4Theory working, you should post a message on that board. It is sometimes tricky to set up, but it will work once set up properly.
Michael Purcell

Joined: 19 Mar 11
Posts: 9
Credit: 29,873,951
RAC: 0
Message 60723 - Posted: 11 Jan 2014, 20:50:16 UTC - in response to Message 60722.  

I originally installed BOINC 7.2.33 and VB 4.2.16 x64 together from https://boinc.berkeley.edu/download.php (v7.2.37 is not posted there yet). I did nothing special during the installation and did not try to make it a service.

I just now reinstalled them and let MW get new tasks. The ps_separation problem is recurring.

What I will try next is to remove Test4Theory, uninstall VB, then reinstall BOINC using the installer that does not include VB to see if I can get a clean setup that works for MW.

Do I need to do a full uninstall of BOINC first as well, or can I use the "repair" option?

Thanks.
captainjack

Joined: 22 Jun 13
Posts: 44
Credit: 64,258,609
RAC: 0
Message 60724 - Posted: 11 Jan 2014, 21:30:21 UTC

If you want to try 7.2.37, here's a link where you can get it.

http://boinc.berkeley.edu/download_all.php

I would be tempted to uninstall everything then start with a fresh copy of BOINC.

Let us know how it works out.
Richard Haselgrove

Joined: 4 Sep 12
Posts: 219
Credit: 456,474
RAC: 0
Message 60725 - Posted: 12 Jan 2014, 0:43:16 UTC - in response to Message 60724.  

If you want to try 7.2.37, here's a link where you can get it.

I'd advise against using that one. It contains experimental code, which, on reflection, has now been withdrawn as unworkable. There is nothing in it, over and above the released v7.2.33, which could possibly have a bearing on computation errors.
Michael Purcell

Joined: 19 Mar 11
Posts: 9
Credit: 29,873,951
RAC: 0
Message 60731 - Posted: 13 Jan 2014, 15:18:37 UTC - in response to Message 60725.  

Thanks for the info. v7.2.37 would not do transfers or downloads. I removed it and reinstalled v7.2.33.

Prior to the above I had tried to thoroughly remove all traces of BOINC as described at http://tinyurl.com/mx3bpad . I had also removed all my BOINC projects prior to uninstalling (after letting them complete their in-progress work units).

The computation errors on the ps_separation GPU work units are still occurring. The ps_modfit GPU work units complete just fine. The BOINC log only shows that the ps_separation work units finished; it does not contain any error codes or diagnostics.

Any suggestions what I can look into to get more info on the error?

I truly regret ever touching VirtualBox or Test4Theory.

Thanks.
Richard Haselgrove

Joined: 4 Sep 12
Posts: 219
Credit: 456,474
RAC: 0
Message 60732 - Posted: 13 Jan 2014, 17:09:17 UTC - in response to Message 60731.  

There's nothing wrong with v7.2.37 for general use - just make sure you never drop "Use at most ... % CPU time" below 100. I'm running it on 4 machines - but the normal version, not VirtualBox. Uploads and downloads are fine.

I don't know why you're getting

Access Violation (0xc0000005) at address 0x000007FEDB2A5E00 read attempt to address 0x00000010

but I'd be 99.999% sure it has nothing to do with the version of BOINC you use.
Michael Purcell

Joined: 19 Mar 11
Posts: 9
Credit: 29,873,951
RAC: 0
Message 60733 - Posted: 13 Jan 2014, 17:36:43 UTC - in response to Message 60732.  

I agree. This is not the fault of BOINC. VirtualBox (or Test4Theory) corrupted something on my system. My hope had been that a clean uninstall of everything and a reinstall of BOINC and MilkyWay would fix the problem. No such luck.

I can probably avoid the error by removing the use of the GPU. The work units will take 10x longer but at least the ps_separation work units will run. The other choice is to leave things as they are with the ps_separation computation errors and ps_modfit success. Do you have a preference?
Richard Haselgrove

Joined: 4 Sep 12
Posts: 219
Credit: 456,474
RAC: 0
Message 60734 - Posted: 13 Jan 2014, 17:48:19 UTC - in response to Message 60733.  

Do you have a preference?

Nothing to do with me. I'm just a volunteer, like you - pretty much an ex-volunteer, since they discontinued the multi-threading tests I was interested in.

Returning errors is no use to anybody.

My preference, if I had any say in the matter, would be a proper debugging session with the developers to find the cause of all these errors people keep reporting. But neither the project staff nor the volunteers seem to understand that concept.

Alternatively, you could simply de-select the ps_separation work units in your project preferences, so that your GPUs worked productively on ps_modfit tasks, and let your CPUs crunch for another project where they're needed and appropriate.
Michael Purcell

Joined: 19 Mar 11
Posts: 9
Credit: 29,873,951
RAC: 0
Message 60735 - Posted: 13 Jan 2014, 18:04:20 UTC - in response to Message 60734.  

Good point. I have updated the MW project preferences.

Something odd now: the Transfers queue has a number of downloads pending (with "project backoff"), and retries fail. They seem to be for MW EXEs, DLLs, and other data files (stars-16, stars-17, etc.). Perhaps I should remove the MW project, wait a day, and then add MW back in.
captainjack

Joined: 22 Jun 13
Posts: 44
Credit: 64,258,609
RAC: 0
Message 60736 - Posted: 13 Jan 2014, 20:00:53 UTC

Michael, did you by chance update your GPU drivers at the same time you installed BOINC 7.2.33 and VirtualBox?
Michael Purcell

Joined: 19 Mar 11
Posts: 9
Credit: 29,873,951
RAC: 0
Message 60737 - Posted: 13 Jan 2014, 20:45:52 UTC - in response to Message 60736.  

No. I updated the GPU drivers about a month before starting Test4Theory. VirtualBox was already installed but was unused by MW or Cosmology@Home. I will go ahead and refresh/update my drivers to see if that improves anything.
mikey

Joined: 8 May 09
Posts: 3321
Credit: 520,546,989
RAC: 28,039
Message 60738 - Posted: 13 Jan 2014, 23:51:53 UTC - in response to Message 60737.  

No. I updated the gpu drivers about a month before starting Test4Theory. VirtualBox was already installed but was unused by MW or Cosmology@Home. I will go ahead and refresh/update my drivers to see if that improves anything.


Be careful if you are not a gamer. GPU makers have gamers as their primary customers; we crunchers are secondary. That means the drivers favor gaming, not crunching, and the newest beta drivers are not always the best for crunching. In fact, the last two Nvidia drivers slowed crunching down by about 10% or more at most projects.
Michael Purcell

Joined: 19 Mar 11
Posts: 9
Credit: 29,873,951
RAC: 0
Message 60739 - Posted: 14 Jan 2014, 20:17:15 UTC - in response to Message 60738.  

I updated my MW profile to select these applications (and avoid the work units getting computation errors).

MilkyWay@Home: yes
MilkyWay@Home N-Body Simulation: yes
Milkyway@Home Separation: no
Milkyway@Home Separation (Modified Fit): yes

I am receiving ps_separation, de_modfit and ps_modfit work units. How do I stop the ps_separation work units from getting sent to me?
captainjack

Joined: 22 Jun 13
Posts: 44
Credit: 64,258,609
RAC: 0
Message 60741 - Posted: 15 Jan 2014, 0:59:08 UTC

Set the "MilkyWay@Home" tasks to "No".
Leave the "MilkyWay@Home Separation (Modified Fit)" tasks set to "Yes".
Michael Purcell

Joined: 19 Mar 11
Posts: 9
Credit: 29,873,951
RAC: 0
Message 60742 - Posted: 15 Jan 2014, 5:23:13 UTC - in response to Message 60741.  

Thanks!
[TA]Assimilator1

Joined: 22 Jan 11
Posts: 375
Credit: 64,657,871
RAC: 0
Message 60846 - Posted: 28 Jan 2014, 18:56:54 UTC - in response to Message 60742.  
Last modified: 28 Jan 2014, 19:00:52 UTC

I guess you missed this thread ;) http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=3400

You're getting the same problem as us; it seems to affect HD 58xx & 69xx cards :(.
48xx cards don't seem to be affected; certainly my 4870 wasn't, and my son's 4830 isn't.

Captain Jack (harkness? ;) )
I've just noticed that you're running an HD 6950 & the MW v1.02 app without problems!
How did you manage that? lol
Team AnandTech - SETI@H, DPAD, F@H, MW@H, A@H, LHC, POGS, R@H, Einstein@H, DHEP, WCG

Main rig - Ryzen 5 3600, MSI B450 G.Pro C. AC, RTX 3060Ti 8GB, 32GB DDR4 3200, Win 10 64bit
2nd rig - i7 4930k @4.1 GHz, HD 7870 XT 3GB(DS), 16GB DDR3 1866, Win7


©2024 Astroinformatics Group