Welcome to MilkyWay@home

Server Trouble

Message boards : News : Server Trouble
Mr P Hucker
Joined: 5 Jul 11
Posts: 990
Credit: 376,143,149
RAC: 0
Message 72940 - Posted: 17 Apr 2022, 23:25:16 UTC - in response to Message 72938.  

deleted duplicate post
Edit your post and change it to two spaces. It will magically vanish. Not sure why the forum software doesn't just have a delete button.
ID: 72940 · Rating: 0
Profile Tom Donlon
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 10 Apr 19
Posts: 408
Credit: 120,203,200
RAC: 0
Message 72941 - Posted: 17 Apr 2022, 23:25:18 UTC - in response to Message 72925.  
Last modified: 17 Apr 2022, 23:26:59 UTC

Every time I check server status it always says around 10,000, so I'm not sure how it's ever running out.


There are usually 10k WUs in the database waiting to go out, but users don't talk directly to the DB. Instead, when you request work, you ask the scheduler/feeder system, which keeps a buffer pool of WUs in shared memory, ready to go out. So this pool can be empty while there are still plenty of jobs in the DB, and you won't be able to get work until the pool fills back up.
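To make the difference between "jobs in the DB" and "jobs in the pool" concrete, here is a toy Python sketch. It is not BOINC code: the function name `simulate_feeder` and every number in it (pool size, sleep interval, request rate) are made-up assumptions chosen only to illustrate the idea.

```python
def simulate_feeder(db_jobs, pool_size, sleep_ticks, requests_per_tick, ticks):
    """Toy model: a feeder tops up a fixed-size pool from the DB once every
    `sleep_ticks` ticks, while clients drain the pool on every tick.
    All parameters are illustrative assumptions, not real BOINC settings."""
    pool = 0      # jobs currently sitting in the shared-memory pool
    refused = 0   # client requests that found the pool empty
    for tick in range(ticks):
        # The feeder only wakes up once per sleep interval.
        if tick % sleep_ticks == 0:
            moved = min(pool_size - pool, db_jobs)
            pool += moved
            db_jobs -= moved
        # Clients ask for work; only jobs already in the pool can be handed out.
        served = min(pool, requests_per_tick)
        pool -= served
        refused += requests_per_tick - served
    return db_jobs, refused


remaining, refused = simulate_feeder(
    db_jobs=10_000, pool_size=100, sleep_ticks=5, requests_per_tick=40, ticks=50
)
print(f"jobs still in DB: {remaining}, requests refused: {refused}")
```

With these invented numbers, half of the requests are refused even though about 9,000 jobs are still in the database. Setting `sleep_ticks=1` in the same model makes the refusals disappear, which matches the suggestion elsewhere in this thread to shorten the feeder's sleep.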
ID: 72941 · Rating: 0
Profile Tom Donlon
Volunteer moderator
Project administrator
Project developer
Project tester
Project scientist

Joined: 10 Apr 19
Posts: 408
Credit: 120,203,200
RAC: 0
Message 72942 - Posted: 17 Apr 2022, 23:26:40 UTC - in response to Message 72930.  

Seems like the feeder is sleeping too long to feed the buffer. I would suggest reducing the amount of time the feeder sleeps for.


I remember seeing this setting somewhere in the server code, but can't remember where. I might try to dig around for it tomorrow.
ID: 72942 · Rating: 0
Mr P Hucker
Joined: 5 Jul 11
Posts: 990
Credit: 376,143,149
RAC: 0
Message 72943 - Posted: 17 Apr 2022, 23:26:46 UTC - in response to Message 72939.  
Last modified: 17 Apr 2022, 23:28:04 UTC

Just installed an R9 280X last night, and it has run dry several times already!
Those cards are wonderful, I have 6. Not sure what MW is going to do when they've all expired and everyone uses the modern pitiful ones without much DP.
yep
I just bought two R9 Nanos at a stupidly low price (auctions on eBay are very cheap compared to Buy It Now). Twice the speed for everything else, half the speed for MW. So my 6 280X cards are going to be on MW permanently (well, until they go bang; I had one a long time ago that managed to explode a surface-mount capacitor, which made quite a stench).
ID: 72943 · Rating: 0
Mr P Hucker
Joined: 5 Jul 11
Posts: 990
Credit: 376,143,149
RAC: 0
Message 72945 - Posted: 17 Apr 2022, 23:30:09 UTC - in response to Message 72941.  
Last modified: 17 Apr 2022, 23:30:15 UTC

Every time I check server status it always says around 10,000, so I'm not sure how it's ever running out.
There are usually 10k WUs in the database waiting to go out, but users don't talk directly to the DB. Instead, when you request work, you ask the scheduler/feeder system, which keeps a buffer pool of WUs in shared memory, ready to go out. So this pool can be empty while there are still plenty of jobs in the DB, and you won't be able to get work until the pool fills back up.
Is it possible to make this buffer bigger? Maybe a RAM upgrade or a change of the server settings? If you need cash, we're willing to donate, please put the donation request on the home page. A lot of people don't know about it.
ID: 72945 · Rating: 0
Kiska

Joined: 31 Mar 12
Posts: 94
Credit: 152,370,647
RAC: 1,792
Message 72950 - Posted: 18 Apr 2022, 0:55:38 UTC - in response to Message 72942.  

Seems like the feeder is sleeping too long to feed the buffer. I would suggest reducing the amount of time the feeder sleeps for.


I remember seeing this setting somewhere in the server code, but can't remember where. I might try to dig around for it tomorrow.


You'd likely find this in the config.xml file: https://boinc.berkeley.edu/trac/wiki/ProjectConfigFile

Documentation is here if you need it: https://boinc.berkeley.edu/trac/wiki/ProjectOptions and here https://boinc.berkeley.edu/trac/wiki/ProjectDaemons

The other place you might find it is in the project's service files. Command-line switch documentation is here: https://boinc.berkeley.edu/trac/wiki/BackendPrograms
ID: 72950 · Rating: 0
Profile alk44
Joined: 2 Mar 20
Posts: 131
Credit: 317,859,512
RAC: 25,272
Message 72951 - Posted: 18 Apr 2022, 4:12:01 UTC - in response to Message 72739.  

?????

The server is broken, nobody knows why, sometimes it can't be bothered giving you tasks even though they're there in the queue. For some reason you hid your computers so I can't tell what OS to advise you on.


Are you sure you can't view my computers??? I thought I set it up so everyone could see them a long time ago. Now I can't figure out where to look to make sure it's set that way.
ID: 72951 · Rating: 0
Profile HRFMguy

Joined: 12 Nov 21
Posts: 236
Credit: 575,038,236
RAC: 9
Message 72952 - Posted: 18 Apr 2022, 4:15:54 UTC - in response to Message 72943.  

Just installed an R9 280X last night, and it has run dry several times already!
Those cards are wonderful, I have 6. Not sure what MW is going to do when they've all expired and everyone uses the modern pitiful ones without much DP.
yep
I just bought two R9 Nanos at a stupidly low price (auctions on eBay are very cheap compared to Buy It Now). Twice the speed for everything else, half the speed for MW. So my 6 280X cards are going to be on MW permanently (well, until they go bang; I had one a long time ago that managed to explode a surface-mount capacitor, which made quite a stench).

Yep. The R9 Nano is 512 Gflops double precision, while the R9 280X is 1024. Strangely enough, the TechPowerUp GPU database shows the 280X at 60% of the performance of the R9 Nano. I've blown up radial-lead and axial-lead caps before, but never surface mount. That would be a personal best for me!
ID: 72952 · Rating: 0
alanb1951

Joined: 16 Mar 10
Posts: 211
Credit: 107,290,474
RAC: 17,925
Message 72953 - Posted: 18 Apr 2022, 4:37:22 UTC - in response to Message 72951.  

?????

The server is broken, nobody knows why, sometimes it can't be bothered giving you tasks even though they're there in the queue. For some reason you hid your computers so I can't tell what OS to advise you on.


Are you sure you can't view my computers??? I thought I set it up so everyone could see them a long time ago. Now I can't figure out where to look to make sure it's set that way.
I think the person who posted that had tried to look at the computers of the person (spatzthecat, whose computers are hidden) who added the row of question-marks, and to whom he replied. I can see your computers if I go via the account link in your profile, so you're o.k. as is...

Cheers - Al.
ID: 72953 · Rating: 0
Profile HRFMguy

Joined: 12 Nov 21
Posts: 236
Credit: 575,038,236
RAC: 9
Message 72954 - Posted: 18 Apr 2022, 4:37:58 UTC - in response to Message 72951.  

?????

The server is broken, nobody knows why, sometimes it can't be bothered giving you tasks even though they're there in the queue. For some reason you hid your computers so I can't tell what OS to advise you on.


Are you sure you can't view my computers??? I thought I set it up so everyone could see them a long time ago. Now I can't figure out where to look to make sure it's set that way.

Go to Project > Preferences. It's the 5th check box down. I can see your computers. This question is now a candidate for the (hopefully upcoming) MW user guide.
ID: 72954 · Rating: 0
Profile Chooka
Joined: 13 Dec 12
Posts: 101
Credit: 1,782,658,327
RAC: 0
Message 72956 - Posted: 18 Apr 2022, 7:54:31 UTC
Last modified: 18 Apr 2022, 8:02:11 UTC

I haven't been getting any work for hours. Run dry.

ps.... could someone please confirm this app config is still correct? I'm unsure about the modified fit part.

<app_config>
  <app>
    <name>milkyway</name>
    <max_concurrent>16</max_concurrent>
    <gpu_versions>
      <gpu_usage>0.33</gpu_usage>
      <cpu_usage>0.2</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>milkyway_separation__modified_fit</name>
    <max_concurrent>16</max_concurrent>
    <gpu_versions>
      <gpu_usage>0.33</gpu_usage>
      <cpu_usage>0.2</cpu_usage>
    </gpu_versions>
  </app>
</app_config>

ID: 72956 · Rating: 0
San-Fernando-Valley

Joined: 13 Apr 17
Posts: 256
Credit: 604,411,638
RAC: 0
Message 72959 - Posted: 18 Apr 2022, 8:20:04 UTC - in response to Message 72956.  

Check your "Event log ..." in "Tools" !
And look in "Notices" !

It seems, to me, that there is no ".....modified_fit" application.

You are probably not getting any GPU Separation tasks because there are none to be sent.
Keep on trying.
Do update or restart BOINC, but wait a while after each action.
ID: 72959 · Rating: 0
Profile Chooka
Joined: 13 Dec 12
Posts: 101
Credit: 1,782,658,327
RAC: 0
Message 72961 - Posted: 18 Apr 2022, 9:07:45 UTC - in response to Message 72959.  

Hi!
Yes, it's an old config and might not be relevant anymore. I'm not 100% sure.
The log shows no work retrieved. It is asking for work, though.
Restarting the PC made no difference. No work on any of my PCs now. I'm sending 1 GPU over to where it excels best: PrimeGrid.
I'll just maintain some patience :)

ID: 72961 · Rating: 0
Jimbocous
Joined: 7 Mar 20
Posts: 22
Credit: 105,446,635
RAC: 1,698
Message 72962 - Posted: 18 Apr 2022, 9:36:44 UTC - in response to Message 72956.  

My current app_configs:


    <!-- Z600 milkyway -->
    <app_config>
    <app>
    <name>milkyway</name>
    <gpu_versions>
    <gpu_usage>1.0</gpu_usage>
    <cpu_usage>1.0</cpu_usage>
    </gpu_versions>
    <max_concurrent>11</max_concurrent>
    </app>

    <app_version>
    <app_name>milkyway</app_name>
    </app_version>

    <app_version>
    <app_name>milkyway</app_name>
    <plan_class>opencl_nvidia_101</plan_class>
    </app_version>

    </app_config>

    <!-- Z600B milkyway -->
    <app_config>
    <app>
    <name>milkyway</name>
    <gpu_versions>
    <gpu_usage>1.0</gpu_usage>
    <cpu_usage>0.5</cpu_usage>
    </gpu_versions>
    </app>
    <project_max_concurrent>5</project_max_concurrent>
    </app_config>



ID: 72962 · Rating: 0
Mr P Hucker
Joined: 5 Jul 11
Posts: 990
Credit: 376,143,149
RAC: 0
Message 72963 - Posted: 18 Apr 2022, 10:59:24 UTC - in response to Message 72951.  

The server is broken, nobody knows why, sometimes it can't be bothered giving you tasks even though they're there in the queue. For some reason you hid your computers so I can't tell what OS to advise you on.
Are you sure you can't view my computers??? I thought I set it up so everyone could see them a long time ago. Now I can't figure out where to look to make sure it's set that way.
I can see yours, it was spatzthecat I couldn't see.
ID: 72963 · Rating: 0
Mr P Hucker
Joined: 5 Jul 11
Posts: 990
Credit: 376,143,149
RAC: 0
Message 72964 - Posted: 18 Apr 2022, 11:04:52 UTC - in response to Message 72952.  
Last modified: 18 Apr 2022, 11:12:13 UTC

Yep. The R9 Nano is 512 Gflops double precision, while the R9 280X is 1024. Strangely enough, the TechPowerUp GPU database shows the 280X at 60% of the performance of the R9 Nano. I've blown up radial-lead and axial-lead caps before, but never surface mount. That would be a personal best for me!
Not sure where you got 60% from (an average they took of DP/SP with a benchmark?), but I checked the 32 and 64 bit speeds on techpowerup (the site I always use, and have made a spreadsheet from the info for many cards) and got:
280X: 4096/1024 Gflops
Nano: 8192/512 Gflops
So exactly double/half speed. Which is very odd considering it uses half the electricity and has the same nm die. Maybe the higher power usage for the 280X is only in DP mode (they do get louder on MW).
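The arithmetic behind "double/half" can be checked with a few lines of Python. This is only a sketch using the two pairs of figures quoted above; the dictionary and the function name `relative_speed` are illustrative, and no other card data is assumed.

```python
# Peak throughput in Gflops as quoted in the post:
# (single precision, double precision).
CARDS = {
    "R9 280X": (4096, 1024),
    "R9 Nano": (8192, 512),
}


def relative_speed(card, baseline):
    """Return (single-precision ratio, double-precision ratio) of card vs baseline."""
    sp, dp = CARDS[card]
    base_sp, base_dp = CARDS[baseline]
    return sp / base_sp, dp / base_dp


sp_ratio, dp_ratio = relative_speed("R9 Nano", "R9 280X")
print(f"Nano vs 280X: {sp_ratio:.1f}x single precision, {dp_ratio:.1f}x double precision")

# Double precision as a fraction of single precision on each card.
for name, (sp, dp) in CARDS.items():
    print(f"{name}: DP is 1/{sp // dp} of SP")
```

The second loop shows why the two cards swap places: on these figures the 280X runs double precision at a quarter of its single-precision rate, the Nano at a sixteenth.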

It was a lot of dust on some surface mount caps, on the back side of the card, no cooling whatsoever, not even a draught from a case fan. There was a loud pop and dust and cap flew everywhere. It refused to be recognised by the computer after that. I remember posting pictures of it, maybe in an electronics forum, trying to work out how to repair it. The caps were near the power input. I can't remember if I fixed it or not, I think I managed to bypass something or just shove a large cap in there (since I couldn't find out what size it should be).

I used to blow up big caps for a laugh as a teenager, outside, using a variac and diode to charge them to too high a voltage. Got one to fly up above roof height. If you seal any safety valve, it's more spectacular.

Now this is how to blow up a cap (using a very fast camera): https://youtu.be/6WUxgmMDts4
ID: 72964 · Rating: 0
Mr P Hucker
Joined: 5 Jul 11
Posts: 990
Credit: 376,143,149
RAC: 0
Message 72965 - Posted: 18 Apr 2022, 11:16:34 UTC - in response to Message 72956.  

I haven't been getting any work for hours. Run dry.

ps.... could someone please confirm this app config is still correct? I'm unsure about the modified fit part.

<app_config>
  <app>
    <name>milkyway</name>
    <max_concurrent>16</max_concurrent>
    <gpu_versions>
      <gpu_usage>0.33</gpu_usage>
      <cpu_usage>0.2</cpu_usage>
    </gpu_versions>
  </app>
  <app>
    <name>milkyway_separation__modified_fit</name>
    <max_concurrent>16</max_concurrent>
    <gpu_versions>
      <gpu_usage>0.33</gpu_usage>
      <cpu_usage>0.2</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
There are only two app names, "milkyway" (the separation (also called modfit) one), and "milkyway_nbody" (the multicore nbody one for CPU only).

Remove all of this. It doesn't do anything, but it won't be causing a problem either:

<app>
  <name>milkyway_separation__modified_fit</name>
  <max_concurrent>16</max_concurrent>
  <gpu_versions>
    <gpu_usage>0.33</gpu_usage>
    <cpu_usage>0.2</cpu_usage>
  </gpu_versions>
</app>


The rest is correct: you're allowing up to 16 milkyway tasks at once on the whole machine, three per GPU, and budgeting a fifth of a CPU core per task.
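For anyone who wants to sanity-check their own file, here is a small Python sketch that reads the same three numbers out of an app_config. The `summarize` helper is hypothetical, and treating "tasks per GPU" as the whole number of times `gpu_usage` fits into 1.0 is a simplifying assumption, not a description of how the BOINC client schedules internally.

```python
import xml.etree.ElementTree as ET

# The "milkyway" part of the config discussed above.
APP_CONFIG = """
<app_config>
  <app>
    <name>milkyway</name>
    <max_concurrent>16</max_concurrent>
    <gpu_versions>
      <gpu_usage>0.33</gpu_usage>
      <cpu_usage>0.2</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
"""


def summarize(xml_text):
    """Return {app name: (max_concurrent, tasks per GPU, CPU cores per task)}."""
    summary = {}
    for app in ET.fromstring(xml_text).findall("app"):
        name = app.findtext("name")
        # "0" is used here only as a placeholder when the tag is absent.
        max_concurrent = int(app.findtext("max_concurrent", default="0"))
        gpu_usage = float(app.findtext("gpu_versions/gpu_usage"))
        cpu_usage = float(app.findtext("gpu_versions/cpu_usage"))
        # Assumption: a task claiming 0.33 of a GPU lets three share one card.
        tasks_per_gpu = int(1.0 / gpu_usage)
        summary[name] = (max_concurrent, tasks_per_gpu, cpu_usage)
    return summary


print(summarize(APP_CONFIG))
```

Changing `gpu_usage` to 0.5 or 1.0 in the string gives two or one task per GPU under the same assumption.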
ID: 72965 · Rating: 0
Profile mikey
Joined: 8 May 09
Posts: 3329
Credit: 524,005,436
RAC: 29,355
Message 72966 - Posted: 18 Apr 2022, 11:17:21 UTC - in response to Message 72931.  

Seems like the feeder is sleeping too long to feed the buffer. I would suggest reducing the amount of time the feeder sleeps for.

Either that or someone has just dumped a ton of tasks onto the server, and it's working through validating all of them.
The BOINC server software needs a complete rewrite. Looks like at the moment you have to fiddle with lots of numbers to make it work right. Kinda like a machine from the 50s. Please use your expertise and go to GitHub and kick those "programmers" where the sun doesn't shine.


Part of it is still 32-bit, just like the client, which doesn't help either!!
ID: 72966 · Rating: 0
San-Fernando-Valley

Joined: 13 Apr 17
Posts: 256
Credit: 604,411,638
RAC: 0
Message 72969 - Posted: 18 Apr 2022, 11:27:56 UTC - in response to Message 72966.  
Last modified: 18 Apr 2022, 11:36:54 UTC

Seems like the feeder is sleeping too long to feed the buffer. I would suggest reducing the amount of time the feeder sleeps for.

Either that or someone has just dumped a ton of tasks onto the server, and it's working through validating all of them.
The BOINC server software needs a complete rewrite. Looks like at the moment you have to fiddle with lots of numbers to make it work right. Kinda like a machine from the 50s. Please use your expertise and go to GitHub and kick those "programmers" where the sun doesn't shine.


Part of it is still 32-bit, just like the client, which doesn't help either!!

... even having 64-bit, one can code sh*t ...

... or even better, having "only" 32-bit doesn't mean you are obliged to code sh*t !
ID: 72969 · Rating: 0
Profile HRFMguy

Joined: 12 Nov 21
Posts: 236
Credit: 575,038,236
RAC: 9
Message 72973 - Posted: 18 Apr 2022, 12:11:59 UTC - in response to Message 72964.  

Yep. The R9 Nano is 512 Gflops double precision, while the R9 280X is 1024. Strangely enough, the TechPowerUp GPU database shows the 280X at 60% of the performance of the R9 Nano. I've blown up radial-lead and axial-lead caps before, but never surface mount. That would be a personal best for me!
Not sure where you got 60% from (an average they took of DP/SP with a benchmark?), but I checked the 32 and 64 bit speeds on techpowerup (the site I always use, and have made a spreadsheet from the info for many cards) and got:
280X: 4096/1024 Gflops
Nano: 8192/512 Gflops
So exactly double/half speed. Which is very odd considering it uses half the electricity and has the same nm die. Maybe the higher power usage for the 280X is only in DP mode (they do get louder on MW).

It was a lot of dust on some surface mount caps, on the back side of the card, no cooling whatsoever, not even a draught from a case fan. There was a loud pop and dust and cap flew everywhere. It refused to be recognised by the computer after that. I remember posting pictures of it, maybe in an electronics forum, trying to work out how to repair it. The caps were near the power input. I can't remember if I fixed it or not, I think I managed to bypass something or just shove a large cap in there (since I couldn't find out what size it should be).

I used to blow up big caps for a laugh as a teenager, outside, using a variac and diode to charge them to too high a voltage. Got one to fly up above roof height. If you seal any safety valve, it's more spectacular.

Now this is how to blow up a cap (using a very fast camera): https://youtu.be/6WUxgmMDts4

If you go to https://www.techpowerup.com/gpu-specs/radeon-r9-nano.c2735 and scroll up about 30 devices in the relative performance chart, you will see it.
Those boys with the very fast camera were having way too much fun.
ID: 72973 · Rating: 0

©2024 Astroinformatics Group