Welcome to MilkyWay@home

Posts by Len LE/GE

81) Message boards : Number crunching : Download more WUs (Message 54045)
Posted 16 Apr 2012 by Len LE/GE
Post:
http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=2732&nowrap=true#52767
82) Message boards : Number crunching : Constant computational errors for each new download (Message 53870)
Posted 31 Mar 2012 by Len LE/GE
Post:
You are running MW on the CPU only.

Your errors are:
"Failed to move file 'separation_checkpoint_tmp' to 'separation_checkpoint' (15100)"
"Failed to move file 'nbody_checkpoint_tmp_2324' to 'nbody_checkpoint' (15100)"

Writing the checkpoint fails with both MW apps.
There is a known problem if your projects folder is on a USB drive, for example. I think Matt added a workaround that should fix it in the next release.
83) Message boards : Number crunching : Numerous errors on xps 720 with GTX560Ti video card (Message 53848)
Posted 29 Mar 2012 by Len LE/GE
Post:
You are using driver 296.10.

See
http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=2808
and
http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=2831
84) Message boards : Number crunching : Communication delayed: No GPU WUs for HD5870 (Message 53847)
Posted 29 Mar 2012 by Len LE/GE
Post:
Separation WUs are the same for CPU and GPU; nbody WUs run on the CPU only because there is no GPU app yet.

If BOINC doesn't see the GPU, it's almost certainly a driver problem.
When you switch to a GPU of a different type or generation, you need to reinstall the driver to get the card properly recognized.
85) Message boards : Number crunching : Problem with catalyst cc since update (Message 53617)
Posted 10 Mar 2012 by Len LE/GE
Post:
After removing the old driver and rebooting, you should check the registry to see whether all entries were deleted.
When upgrading to driver 12.1, I had a problem with OpenCL not installing properly.
The reason was that removing the old driver deleted all the old files but left some entries in the registry. After I deleted those entries by hand, the new installation worked without problems.
86) Message boards : Number crunching : CPD going down... (Message 53600)
Posted 9 Mar 2012 by Len LE/GE
Post:
It still looks the same as before to me.
Your times jump from 145 to 195, which is suspicious.
Compared to my times, I would expect times of 100 - 110 on your card.
Your WU logs show this error:
"Setting process priority to 0 (13): Permission denied"
For me, all of that points to a problem on your machine.
87) Message boards : Number crunching : HIGH CPU USAGE with new 1.02 OpenCL tasks (Message 53536)
Posted 4 Mar 2012 by Len LE/GE
Post:
What a lot of arsing about! :/


To work around those driver problems. Yes, sadly.

So I put those commands (as per your post on the 18th?) in the commandline in the shortcut for BOINC?
Re the wait factor, you didn't say in which direction the numbers affect MW.
Thanks :)


They go into the app_info.xml for MW, not into the BOINC shortcut.

My own line for the params looks like this:
<cmdline>--gpu-target-frequency 60 --gpu-wait-factor 0.20 --gpu-polling-mode 0 --process-priority 1</cmdline>
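For orientation, here is roughly where that line sits inside app_info.xml. This is a sketch, not a copy of a real file: it assumes `milkyway` is the short name of the separation app and that version 1.02 is written as `102`, and the plan class and executable name are placeholders. Keep whatever values your existing app_info.xml already has and only add or edit the `<cmdline>` element.

```xml
<!-- Sketch only: one <app_version> entry from app_info.xml.            -->
<!-- Everything except <cmdline> is a placeholder/assumption; keep the  -->
<!-- values already present in your own app_info.xml.                   -->
<app_version>
    <app_name>milkyway</app_name>                  <!-- assumed short app name -->
    <version_num>102</version_num>                 <!-- assumed: v1.02 -->
    <plan_class>YOUR_PLAN_CLASS</plan_class>       <!-- placeholder -->
    <coproc>
        <type>ATI</type>
        <count>1</count>
    </coproc>
    <!-- Everything inside this one tag is passed to the MW app -->
    <cmdline>--gpu-target-frequency 60 --gpu-wait-factor 0.20 --gpu-polling-mode 0 --process-priority 1</cmdline>
    <file_ref>
        <file_name>YOUR_EXECUTABLE_NAME</file_name>  <!-- placeholder -->
        <main_program/>
    </file_ref>
</app_version>
```

BOINC only reads app_info.xml at startup, so stop BOINC before editing and start it again afterwards.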

--gpu-target-frequency 60 is the default; if the param is missing, the value from your account profile is used. A higher frequency gives the GPU more chances for a screen refresh but leads to higher overhead (more data transfers).

--gpu-polling-mode 0 switches the app to sleep for a wait time it calculates itself before polling the GPU for results.

--process-priority 1 lowers the process priority from the default of 2; I feel more comfortable running it at 'below normal'.

--gpu-wait-factor 0.20 is far below the default of 0.75 but gives me high GPU use on my HD5850 with acceptable system response.

Sorry, you need to test for yourself how far down you have to go with the wait factor on your HD4830; it depends too much on the individual system. A lower wait factor raises GPU load, but the lower you go, the more CPU usage increases again.
With the default wait factor (0.75) my GPU load was only 70 - 80%; now I am at ~97%. (The next app version is expected to get a better GPU load with the default wait factor.)

If you want my app_info.xml to adapt it to your needs, you can send me a PM.
88) Message boards : Number crunching : HIGH CPU USAGE with new 1.02 OpenCL tasks (Message 53524)
Posted 3 Mar 2012 by Len LE/GE
Post:
Like Matt said, the problem for high cpu use is the busy-wait from the driver.
Force the app to use a sleep/wait by setting polling mode 0, then play with the wait factor to get GPU usage back up. A higher frequency gives the GPU more chances to redraw the screen; the wait before polling the GPU for an answer reduces the CPU usage. Lowering the priority of the app can help too.
After setting polling mode 0, start by setting the frequency high enough that the GUI lag is gone, then use the wait factor to push GPU use up to just below the point where the GUI lag starts again.
89) Message boards : Number crunching : HIGH CPU USAGE with new 1.02 OpenCL tasks (Message 53275)
Posted 18 Feb 2012 by Len LE/GE
Post:
You said you used the app_info from arkayn's site.
I'm pretty sure it contains a line with the cmdline tag. That's where you set params to pass them to the app.

e.g.
<cmdline>--gpu-polling-mode 0</cmdline>
forces the app to use an additional sleep before waiting for the answer from the GPU, to prevent high CPU use

<cmdline>--gpu-polling-mode 0 --process-priority 1</cmdline>
same as above, and also lowers the priority of the app to 'below normal'

--gpu-wait-factor 0.75
this is the default multiplier used to calculate the sleep time if you are using polling mode 0
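All the params go into that one cmdline tag, separated by spaces. As an illustration (the combination is just an example, not a recommendation), this spells out the default wait factor explicitly, so it is easy to lower step by step later while tuning:

```xml
<!-- Example only: polling mode 0, default wait factor written out,  -->
<!-- and 'below normal' priority, all in the same <cmdline> element. -->
<cmdline>--gpu-polling-mode 0 --gpu-wait-factor 0.75 --process-priority 1</cmdline>
```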
90) Message boards : Number crunching : tasks being sent to wrong gpu card (Message 53274)
Posted 18 Feb 2012 by Len LE/GE
Post:
http://boinc.berkeley.edu/wiki/Client_configuration :

<exclude_gpu>
Don't use the given GPU for the given project. If <device_num> is not specified, exclude all GPUs of the given type. <type> is required if your computer has more than one type of GPU; otherwise it can be omitted. <app> specifies the short name of an application (i.e. the <name> element within the <app> element in client_state.xml). If specified, only tasks for that app are excluded. You may include multiple <exclude_gpu> elements. (New in 6.13)

<exclude_gpu>
<url>project_URL</url>
[<device_num>N</device_num>]
[<type>nvidia|ati</type>]
[<app>appname</app>]
</exclude_gpu>
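Put together in a complete cc_config.xml, keeping MW off one card might look like the sketch below. It assumes the card to exclude is an ATI card reported by BOINC as device 1; check the GPU numbers BOINC prints in the event log at startup and adjust `<device_num>` and `<type>` to match. The `<url>` has to match the project URL exactly as BOINC shows it.

```xml
<!-- Sketch: cc_config.xml in the BOINC data directory.              -->
<!-- Device number and type are example values; adapt to your setup. -->
<cc_config>
    <options>
        <exclude_gpu>
            <url>http://milkyway.cs.rpi.edu/milkyway/</url>
            <device_num>1</device_num>
            <type>ati</type>
        </exclude_gpu>
    </options>
</cc_config>
```

Restarting BOINC after saving the file is the safe way to make sure the client picks up the change.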


and

The code of MW separation v1.0x has a command line param
--device [Device number passed by BOINC to use]

I have no idea whether or how that works.
Anyone with 2 gpus willing to test it?
91) Message boards : Number crunching : All work Units giving "Computational Error" (Message 53264)
Posted 18 Feb 2012 by Len LE/GE
Post:
Looking at the list of your WUs with errors:
2x separation v1.02
1x nbody v0.84 (mt)

All 3 show the same error:
Failed to update checkpoint file ('separation_checkpoint_tmp' to 'separation_checkpoint') (2): No such file or directory

You could switch off checkpointing for separation because of the short runtimes on your GPU, but your problem isn't limited to separation.
It 'smells' like the external disk is the problem.
Can you try moving your BOINC dir to an internal drive?
92) Message boards : Number crunching : HIGH CPU USAGE with new 1.02 OpenCL tasks (Message 53254)
Posted 18 Feb 2012 by Len LE/GE
Post:
By default, the app sends chunks to the GPU and waits for the answer right away. In theory, this is how it is supposed to work, without causing high CPU use.
In this situation, some drivers (in general) and others (depending on the combination of GPU and OS) don't play nice and cause this high CPU use.
The app tries to catch some of those cases and inserts a sleep time ('initial wait') before asking the GPU for the answer. As I understand it, this is how the default polling mode of -2 works.
If the app thinks you should be fine, it switches to polling mode -1 (asking for the GPU answer without an initial wait); if it thinks you are a 'high CPU issue' case, it switches to polling mode 0 (using the initial wait before asking for the GPU answer).

If you see high CPU use, you are evidently one of the cases the app doesn't catch, so it uses mode -1 instead of mode 0.
Don't be surprised if you set mode 0 in app_info and see your GPU load drop. That's what I saw on my computer. I needed to set the gpu-wait-factor far down to get the GPU load back up again. I'm still trying to understand why I had to push it that far down. There seems to be something in the calculation and use of the initial wait time that is not as simple as I first thought.

93) Message boards : Number crunching : All work Units giving "Computational Error" (Message 53250)
Posted 17 Feb 2012 by Len LE/GE
Post:
I think the app_info he used is for separation only, so he needs to add a section for nbody to run that too.
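A rough sketch of such an nbody section, added inside the existing `<app_info> ... </app_info>` next to the separation entries. Assumptions to check: `milkyway_nbody` as the short app name (compare with the `<app>` entries in client_state.xml), `84` for v0.84, and 4 CPUs for the multithreaded (mt) app; the executable name is a placeholder. If the download contains more files than the executable, each needs its own `<file_info>` and `<file_ref>`.

```xml
<!-- Sketch only: nbody entries to add inside the existing <app_info>.  -->
<!-- App name, version, CPU counts and file name are assumptions or     -->
<!-- placeholders; adapt them to the files you actually downloaded.     -->
<app>
    <name>milkyway_nbody</name>
</app>
<file_info>
    <name>NBODY_EXECUTABLE_NAME</name>
    <executable/>
</file_info>
<app_version>
    <app_name>milkyway_nbody</app_name>
    <version_num>84</version_num>
    <plan_class>mt</plan_class>
    <avg_ncpus>4</avg_ncpus>   <!-- example: 4 cores -->
    <max_ncpus>4</max_ncpus>
    <file_ref>
        <file_name>NBODY_EXECUTABLE_NAME</file_name>
        <main_program/>
    </file_ref>
</app_version>
```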
94) Message boards : Number crunching : N-Body and the Bunker (Message 53157)
Posted 14 Feb 2012 by Len LE/GE
Post:
OK, I checked the German thread he linked to.
Whatever translator was used, it's trashware :)

He is running nbody on a 64-bit Linux (OpenSUSE) with BM 7.0.8; the system is an Athlon II X4 640.
His problem is that most of the time he only gets 1 or 2 WUs. Only once in a while does he get a load of 12 (and never more).

My 2 cents are on the different cache handling in BM 7 too :)

95) Message boards : Number crunching : All work Units giving "Computational Error" (Message 53154)
Posted 14 Feb 2012 by Len LE/GE
Post:
You can get the previous and the current version on arkayn's site. They include an app_info.xml.
Stop BOINC, unpack them into your MW directory and start BOINC again.
If you need help modifying that app_info, post again with your question.
96) Message boards : Number crunching : CPU Scheduling question!! (Message 53152)
Posted 14 Feb 2012 by Len LE/GE
Post:
My first guess was the scheduler of BM 7, but since you said it happened in 6.10 and 6.12 too, I have no real idea what could cause the same fault in all 3 versions.

I just saw you are running a mix of separation v1.02 nvidia, separation v1.00 cpu and nbody v0.84.
I haven't seen anyone report that nbody confused BM into running all MW CPU apps as multithreaded (that would be a bad BM bug).

Shooting in the dark: maybe something in the code could make BM think (under some conditions) that MW separation is a multithreaded app and that it needs to free all cores for it? And then it fills all the free cores with separation apps, since they are single-threaded?
You should send Matt Arsenault a PM describing your problem as detailed as possible.
97) Message boards : Number crunching : CPU Scheduling question!! (Message 53138)
Posted 13 Feb 2012 by Len LE/GE
Post:
Are you talking about 1 mw over all cores (multithreaded app like mw nbody) or 1 mw per core (makes 6 * mw)?
98) Message boards : News : Separation updated to 1.00 (Message 53123)
Posted 13 Feb 2012 by Len LE/GE
Post:
OK, it has been a long road switching from MW v0.82 with cat 11.3 to MW v1.02 with cat 12.1. I did several tests along the way.

(The driver updates caused several hiccups, needing manual cleanup after uninstalling and a cleaner program before installing the next version.)

System is WinXP 32bit with HD5850 @775MHz
(last tweaking of app settings only days ago)

v0.82 CAL, cat 11.3: ~84.5s
v0.82 CAL, cat 11.8: ~84.5s (noticeably higher system kernel times, less responsive)
v0.82 CAL, cat 11.9: ~92.5s (system kernel times and response a little better)
v0.82 CAL, cat 12.1: ~84.5s (system kernel times high, system response bad)
v0.82 CAL, cat 12.1: ~85.3s (relaxed polling to make system responsive again)

For mw v0.82 CAL, cat 12.1 is slightly slower than 11.3 but far better than 11.9.

For collatz and moo I have the impression that cat 12.1 is a little faster than 11.3, but I have no numbers to verify it. Cat 11.8 and 11.9 did _not_ impress there either.

v1.02 OpenCL with default params showed high system kernel use; polling 0 dropped GPU use to 90%, polling 1 to ~83%. So I am using polling 0 with a much reduced wait factor for now. I still think there must be a combination of wait factor and polling > 0 that works, but I couldn't find one without losing a lot of GPU load. I need to play more with the params to get a better understanding of how they work together in this new version. The current setting will have to do for now.

v1.02 OpenCL, cat 11.9: ~81.3s (cpu time includes system kernel times here!)
v1.02 OpenCL, cat 12.1: ~80.9s (system kernel times hidden again)

Cat 11.9 showed high system kernel use, which could 'just' be brought under control by setting command line params. With cat 12.1 and the same settings, the system kernel times are roughly cut in half, which gives a little more comfort in terms of system response. I think I got pretty close (within less than 1s/WU) to the best time possible before system response goes downhill again.
99) Message boards : Number crunching : NBody app_info (Message 53119)
Posted 12 Feb 2012 by Len LE/GE
Post:
See
http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=2293&nowrap=true#46895
and
http://milkyway.cs.rpi.edu/milkyway/forum_thread.php?id=2301&nowrap=true#47062

I'm pretty sure you'll have no problem updating and adapting them for your needs :)
100) Message boards : News : Separation updated to 1.00 (Message 52990)
Posted 10 Feb 2012 by Len LE/GE
Post:
Using Catalyst 11.3, it looks like. Try a newer driver. I am sort of aware of a few crashes with different Catalyst versions on different GPUs. I think 11.3-4 and 11.10-11.12 are the most problematic versions; 12.1 seems to be working most consistently on everything. Around 11.3 the binary format stuff was introduced (which it says it supports), but it's probably crashier around then.


Upgraded from 11.3 to 11.9 (2-3% slower on mw w/ CAL, system more sluggish w/ other ATI apps too) and still getting
Exit status -1073740777 (0xffffffffc0000417) Unknown error number
when trying v1.00 OpenCL.

Thought 11.9 should be working for XP 32bit with HD5850?



©2024 Astroinformatics Group