Updated Server Daemons and Libraries

Cphipps
Joined: 5 Aug 14
Posts: 3
Credit: 7,321,238
RAC: 7,645

Message 65311 - Posted: 27 Sep 2016, 14:50:38 UTC - in response to Message 65308.

Thanks for getting back to me. It seems like I will have to update my graphics card.

Profile Wisesooth
Joined: 2 Oct 14
Posts: 33
Credit: 19,807,917
RAC: 29,268

Message 65317 - Posted: 27 Sep 2016, 16:47:11 UTC

My preferences exclude GPU work. My machines use Intel 6th-generation processors with the GPU on die. However, I am still getting computation errors on 8-thread tasks, although far fewer than before. Other crunchers who run these tasks also abort with computation errors, so the issue is repeatable across up to three additional, different users. I am getting no computation errors from another BOINC project unrelated to Milkyway.

This leads me to believe that the cause is data driven, not a code error, and in need of defensive programming to avoid the problem. If one or more subroutines (or threads) are sequentially bound, and the prerequisite thread does not complete before another thread needs to use its results, the consequences are predictable (all bad). The most common computational error is a zero-divide error. If the prerequisite thread is supposed to return an address location, it could produce a computation error if it references a location that is either protected or out of bounds.
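As a minimal sketch of the kind of defensive programming described above (purely illustrative, not taken from the Milkyway@Home code; the function name `run_dependent_steps` is hypothetical), the dependent step waits for the prerequisite thread to finish and guards against a zero divide before using its results:

```python
import threading


def run_dependent_steps(values):
    """Compute a mean where the sum and count come from a worker thread."""
    result = {}

    def producer():
        # Prerequisite step: publish the values the dependent step needs.
        result["total"] = sum(values)
        result["count"] = len(values)

    worker = threading.Thread(target=producer)
    worker.start()
    # Block until the prerequisite has finished before reading its results.
    worker.join()

    # Defensive check: an empty input would otherwise cause a zero divide.
    if result["count"] == 0:
        return None
    return result["total"] / result["count"]
```

Without the `join()` call, the dependent step could read `result` before the worker has filled it in, which is the ordering problem described above.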

I have not coded anything in years, but remember the pain. Hope this helps you to look in the right places.
____________

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 438
Credit: 9,894,655
RAC: 175,326

Message 65319 - Posted: 27 Sep 2016, 18:05:08 UTC

Wisesooth,

The erroring work units you see are from the N-body application and not Separation. N-body is still very much a beta application and will likely not work all of the time. Sidd knows about the particular problem you are seeing which is a disk usage error (we are going over the commonly allotted 50MB of disk space we are allowed to use). Hopefully Sidd will figure out why and how to fix it soon.
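For anyone who wants to see how close a task's working directory gets to that limit, here is a rough sketch. It assumes the 50MB figure means 50 MiB and that you point it at the task's slot directory yourself; the helper names are made up for illustration and this is not how the BOINC client itself enforces the limit.

```python
import os

# Assumption: the 50MB limit mentioned above, read as 50 MiB.
DISK_LIMIT_BYTES = 50 * 1024 * 1024


def directory_size(path):
    """Return the total size in bytes of regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Skip symlinks so linked files are not counted here.
            if not os.path.islink(full) and os.path.isfile(full):
                total += os.path.getsize(full)
    return total


def exceeds_limit(path, limit=DISK_LIMIT_BYTES):
    """Return True if the files under path use more than limit bytes."""
    return directory_size(path) > limit
```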

Jake

Profile Joseph
Joined: 6 Apr 12
Posts: 17
Credit: 188,957
RAC: 24

Message 65338 - Posted: 28 Sep 2016, 15:26:39 UTC

I have an ATI 4800 ... actually two of them.
I downgraded from the Windows 10 driver to a 2012 driver that has OpenCL support.
Before I downgraded the driver, I was receiving error messages about missing coprocessor support.
Now all I get is 0 work fetched.
It's disappointing that I cannot use these cards, since they have a lot of grunt.

Un4given
Joined: 14 Feb 09
Posts: 19
Credit: 38,954,868
RAC: 39,700

Message 65343 - Posted: 28 Sep 2016, 18:48:44 UTC - in response to Message 65338.

I also have AMD cards (39xx/38xx) on W10, but I'm still running the drivers I've been using all along. No weird errors, but I'm definitely seeing the "no new tasks available" message.

WMD
Joined: 15 Jun 13
Posts: 4
Credit: 147,874,449
RAC: 274,639

Message 65346 - Posted: 29 Sep 2016, 0:25:22 UTC

I'm not getting any work units either, on the Mac version. Here is a log of one attempt:

Wed Sep 28 20:17:27 2016 | Milkyway@Home | Sending scheduler request: To fetch work.
Wed Sep 28 20:17:27 2016 | Milkyway@Home | Requesting new tasks for AMD/ATI GPU
Wed Sep 28 20:17:29 2016 | Milkyway@Home | Scheduler request completed: got 0 new tasks
Wed Sep 28 20:17:29 2016 | Milkyway@Home | No tasks sent
Wed Sep 28 20:17:29 2016 | Milkyway@Home | No tasks are available for Milkyway@Home Separation (Modified Fit)
Wed Sep 28 20:17:29 2016 | Milkyway@Home | Message from server: Your app_info.xml file doesn't have a usable version of MilkyWay@Home N-Body Simulation.
Wed Sep 28 20:17:29 2016 | Milkyway@Home | Tasks for CPU are available, but your preferences are set to not accept them
Wed Sep 28 20:17:29 2016 | Milkyway@Home | Tasks for NVIDIA GPU are available, but your preferences are set to not accept them

That error from N-Body is from my custom app_info.xml; I'm simply too lazy to fix it. ;) Otherwise I'm using the regular 1.37 binary for Mac ATI opencl. I figured maybe this was the cause, so I temporarily renamed the app_info.xml and started up without it, but this had no effect (other than removing my self-inflicted n-body error, of course).
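In case it helps anyone compare logs, here is a small sketch for pulling the task counts out of event log lines like the ones above. It assumes every line has the `timestamp | project | message` shape shown in my log; the function names are just ones I made up.

```python
from datetime import datetime


def parse_event_line(line):
    """Split one 'timestamp | project | message' line into its parts."""
    stamp, project, message = (part.strip() for part in line.split("|", 2))
    when = datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")
    return when, project, message


def count_tasks_received(lines):
    """Sum the 'got N new tasks' counts over a list of event log lines."""
    prefix = "Scheduler request completed: got "
    total = 0
    for line in lines:
        _when, _project, message = parse_event_line(line)
        if message.startswith(prefix):
            # The number of tasks is the first word after the prefix.
            total += int(message[len(prefix):].split()[0])
    return total
```

Running `count_tasks_received` over a day of log lines gives a quick total of how many tasks the scheduler actually handed out.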

Profile Wisesooth
Joined: 2 Oct 14
Posts: 33
Credit: 19,807,917
RAC: 29,268

Message 65350 - Posted: 29 Sep 2016, 17:31:53 UTC - in response to Message 65319.

I increased the disk usage preference from 40 to 80 and updated BOINC Manager before I returned here to post this message.
BOINC Manager reports:
Used by BOINC: 15.91 MB
Used by milkyway: 9.9 MB
Used by other programs: 27.74 GB
Free but not available to BOINC: 823.22 GB
This machine has 16 GB RAM and has a 1 TB Seagate hard drive. Uses Win 10 64-bit OS with Intel i7 processor. The only BOINC task running on this machine is milkyway at home. My profile currently shows over 50 tasks with computational errors. Other users are reporting the same errors on these tasks.

My other machines are running single CPU tasks with no errors on non-milkyway projects. Hope this helps you. BTW, have you considered the possibility that BOINC, not milkyway, is causing this problem? After all, the only project I know about on BOINC using all available threads on a single task is milkyway.
____________

Arivald Ha'gel
Joined: 30 Apr 14
Posts: 67
Credit: 160,074,149
RAC: 0

Message 65351 - Posted: 29 Sep 2016, 19:22:16 UTC - in response to Message 65304.

My PC (Radeon 280X) also isn't getting workunits. Most often it's getting 0 tasks when requesting new tasks.

The problems started on 2016-09-12, recurred from 2016-09-14 to 2016-09-20, and have been back since 2016-09-23. The PC has a capacity of ~350k credits per day, but it has had days when it earned < 100k (even below 20k).
http://boincstats.com/en/stats/61/user/detail/1021475/lastDays


Problem still persists. I can even say it's worse. 2k credits for 2016-09-29. So like 340k too low...

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 438
Credit: 9,894,655
RAC: 175,326

Message 65356 - Posted: 30 Sep 2016, 15:15:55 UTC
Last modified: 30 Sep 2016, 15:16:36 UTC

Arivald and others,

I am going to keep tweaking things on my end to try to increase work unit availability. Might take a few more days to get the settings right. Thank you for your patience.

Jake

Ryan Tennill
Joined: 22 Mar 09
Posts: 6
Credit: 178,892,867
RAC: 350,559

Message 65357 - Posted: 30 Sep 2016, 15:23:21 UTC

Much appreciated. No new work since the 25th and even then I had perhaps a dozen tasks over the course of a week compared to ~ 140k RAC previously.

Will Guerin
Joined: 13 Jun 09
Posts: 3
Credit: 365,525
RAC: 0

Message 65360 - Posted: 30 Sep 2016, 17:19:05 UTC
Last modified: 30 Sep 2016, 17:22:28 UTC

Hello,

I am getting new jobs, but they're all ending in 'Computation Error' after 2-4 seconds. Wasn't having this issue prior, all other projects run fine. No system or hardware changes have been made on my end.

https://s15.postimg.org/mptam7ru3/Screenshot_2016_09_30_10_10_06.png

captainjack
Joined: 22 Jun 13
Posts: 40
Credit: 35,269,088
RAC: 0

Message 65361 - Posted: 30 Sep 2016, 17:45:39 UTC

Will Guerin,

The error message in your task log says that your GPUs do not support double precision, which is required for Milkyway processing.
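If you want to check a card yourself, OpenCL devices advertise double precision through the `cl_khr_fp64` extension (or the older AMD-specific `cl_amd_fp64`) in their extensions string, which tools such as clinfo print. The sketch below only inspects such a string; whether the Milkyway application performs exactly this check is an assumption on my part, and the function name is made up.

```python
# OpenCL extension names that indicate double-precision support.
FP64_EXTENSIONS = {"cl_khr_fp64", "cl_amd_fp64"}


def supports_double_precision(extensions):
    """Return True if a space-separated OpenCL extensions string lists fp64."""
    return any(name in FP64_EXTENSIONS for name in extensions.split())
```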

Sebastian*
Joined: 8 Apr 09
Posts: 64
Credit: 4,836,176,585
RAC: 3,623,699

Message 65362 - Posted: 30 Sep 2016, 22:59:14 UTC

captainjack is right, Will Guerin: your GPUs don't support double precision, which Milkyway@home requires.

My 390X seems to work now, but only when running without app_config file. It is running one WU at a time for now.

After a quick search on the web, it looks like the latest Microsoft Windows 10 update really messed up the GPU drivers. Milkyway has been causing errors since the update, and so has Einstein@home (running 2 WUs at once). But for some reason my 280X cards seem unaffected.

Can anyone confirm this?

I guess GPUs like 280X and below still work well, while 290X and above cause problems.

The conditions are Windows 10 with the latest update and an app_config that runs multiple WUs at once.
All of these have to apply for WUs to hang: one WU reaches 100% and stays there.
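For anyone trying to reproduce this, here is a sketch that generates an app_config.xml for running several WUs per GPU. The element names (`app_config`, `app`, `name`, `gpu_versions`, `gpu_usage`, `cpu_usage`) are the standard BOINC client ones; the application name `milkyway` and the CPU share are assumptions you would need to check against your own client, and the function name is made up.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, tasks_per_gpu, cpu_per_task=0.05):
    """Return app_config.xml text that lets tasks_per_gpu tasks share one GPU."""
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    gpu = ET.SubElement(app, "gpu_versions")
    # Round the GPU share down so tasks_per_gpu shares never add up to more than 1.
    gpu_share = (1000 // tasks_per_gpu) / 1000
    ET.SubElement(gpu, "gpu_usage").text = f"{gpu_share:.3f}"
    ET.SubElement(gpu, "cpu_usage").text = f"{cpu_per_task:.2f}"
    return ET.tostring(root, encoding="unicode")
```

Calling `build_app_config("milkyway", 2)` would give each task half a GPU; the output goes into the project's directory as app_config.xml.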

Sebastian*
Joined: 8 Apr 09
Posts: 64
Credit: 4,836,176,585
RAC: 3,623,699

Message 65363 - Posted: 1 Oct 2016, 9:16:54 UTC
Last modified: 1 Oct 2016, 9:42:04 UTC

After updating my HD5970 computer to Windows 10 1607 (Cumulative Update for Windows 10 Version 1607 for x64-based Systems (KB3194496)), I get some strange behavior there too. I run 4 WUs on each GPU core, so 8 WUs at once on the dual-GPU card.
After nearly a minute of heavy CPU work (3 of 6 cores running at almost 100%), the GPU gets into action.

It was running well before the update to 1607. Does anyone have information about what Microsoft changed in the drivers?

The update also affects the R9 390X, and I guess other GPUs as well.

Please post if you are having trouble with it too.

WMD
Joined: 15 Jun 13
Posts: 4
Credit: 147,874,449
RAC: 274,639

Message 65370 - Posted: 2 Oct 2016, 22:37:16 UTC

To follow up on my previous post...I have fixed my problem, sort of. It turns out that my project prefs had everything disabled except for Separation (Modified Fit). I honestly don't remember why I did that...so, I turned the basic app back on (leaving n-body off), and I began to receive some work units. At first, it seemed like I could only make that happen without my custom app_info.xml (which means Jake got it to read newer GPUs after all! :) ) but after that I put the app_info.xml back, and it's still working. Which is good, because that way I can crunch 3 WUs at a time :)

In spite of this, new WUs are still sporadic. Some updates don't give me anything, and then a minute later, I'll get a few. I'm not getting anything from Separation still.

Profile UnionJack
Joined: 8 Jan 10
Posts: 11
Credit: 3,113,148
RAC: 1,423

Message 65375 - Posted: 4 Oct 2016, 9:33:47 UTC - in response to Message 65273.
Last modified: 4 Oct 2016, 9:48:45 UTC

I'm getting Computation Error on every milkyway@home task I run. The Tasks page of BOINC manager says "Computation error (0.0176 CPUs +1 AMD/ATI GPU)". There's no corresponding text in the Event Log page, nor in any file under ~/boinc. This is a Gentoo Linux 12-core i7 box with plenty of RAM and disk space, so this isn't the Windows problem reported above. 0.0176 is about 1/58. I'm running the latest versions available of all drivers etc.

The error comes up immediately each task is started. I've reset the project but that made no difference. The job log has a date stamp of 24 Sept, and its last entry is "1474692508 ue 5545.518061 ct 1614.182000 fe 1307020000000 nm de_nbody_8_22_16_v162_2k_2_1474377580_43838_0 et 172.516924 es 0". Other projects' job logs are date-stamped today.
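For reference, a job log entry like the one quoted above can be decoded with a few lines of Python. This is only a sketch: it assumes the first field is a Unix timestamp and that the rest are alternating key/value pairs, and the function name is made up. As far as I understand, `nm` is the task name, `et` the elapsed time and `es` the exit status, but treat those meanings as assumptions.

```python
from datetime import datetime, timezone


def parse_job_log_line(line):
    """Decode one job log line into a dict of its fields."""
    tokens = line.split()
    # First token: completion time as a Unix timestamp (seconds, UTC).
    record = {"completed": datetime.fromtimestamp(int(tokens[0]), tz=timezone.utc)}
    # Remaining tokens alternate between a short key and its value.
    for key, value in zip(tokens[1::2], tokens[2::2]):
        record[key] = value
    return record
```

Decoding the entry above gives a completion date of 2016-09-24, which matches the file's date stamp.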

# lspci -n -s 01:00.0
01:00.0 0300: 1002:6938 (rev f1)
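For anyone reading that output: in `lspci -n` format the fields are slot, device class, and vendor:device ID, so this is a display controller (class 0300) from vendor 1002 (AMD/ATI). A small sketch for splitting such a line follows; the function name is made up and the pattern only covers the simple form shown here.

```python
import re

# Matches the simple "slot class: vendor:device (rev xx)" form of `lspci -n`.
PCI_LINE = re.compile(
    r"^(?P<slot>\S+)\s+(?P<cls>[0-9a-f]{4}):\s+"
    r"(?P<vendor>[0-9a-f]{4}):(?P<device>[0-9a-f]{4})"
    r"(?:\s+\(rev (?P<rev>[0-9a-f]{2})\))?$"
)


def parse_lspci_n(line):
    """Return the slot, class, vendor, device and revision of one lspci -n line."""
    match = PCI_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised lspci -n line: {line!r}")
    return match.groupdict()
```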
____________
Rgds
Peter.

Profile UnionJack
Joined: 8 Jan 10
Posts: 11
Credit: 3,113,148
RAC: 1,423

Message 65376 - Posted: 4 Oct 2016, 11:41:58 UTC - in response to Message 65375.

# lspci -n -s 01:00.0 01:00.0 0300: 1002:6938 (rev f1)

You can find the details of this device at https://pci-ids.ucw.cz/read/PC/1002/6938 . I'm peterh in that discussion.
____________
Rgds
Peter.

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 438
Credit: 9,894,655
RAC: 175,326

Message 65379 - Posted: 4 Oct 2016, 15:42:44 UTC

Hey Everyone,

I did some work tuning the database yesterday to improve insert query times for the workunit generator, after determining that this query was the bottleneck in work unit generation. It seems to have vastly improved work unit availability. If you guys are still running out of work units, please let me know.

Jake

Tom*
Joined: 4 Oct 11
Posts: 33
Credit: 268,361,555
RAC: 399,933

Message 65380 - Posted: 4 Oct 2016, 16:28:46 UTC

Thanks Jake, looks like it's back to normal at my end.

Really appreciate your hard work.

Jake Weiss
Volunteer moderator
Project developer
Project tester
Project scientist
Joined: 25 Feb 13
Posts: 438
Credit: 9,894,655
RAC: 175,326

Message 65381 - Posted: 4 Oct 2016, 16:44:48 UTC

Thank you Tom and everyone else who crunches for us. We really appreciate your work crunching our work units.

Jake


Copyright © 2017 AstroInformatics Group